Re: [PATCH 30/33] docs: ABI: cleanup several ABI documents
Mauro Carvalho Chehab writes:

> There are some ABI documents that, while they don't generate
> any warnings, they have issues when parsed by get_abi.pl script
> on its output result.
>
> Address them, in order to provide a clean output.
>
> Signed-off-by: Mauro Carvalho Chehab
> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem
> b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> index c1a67275c43f..8316c33862a0 100644
> --- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
> +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> @@ -11,19 +11,26 @@ Description:
>  		at 'Documentation/powerpc/papr_hcalls.rst' . Below are
>  		the flags reported in this sysfs file:
>
> -		* "not_armed" : Indicates that NVDIMM contents will not
> +		* "not_armed"
> +			Indicates that NVDIMM contents will not
>  			survive a power cycle.
> -		* "flush_fail" : Indicates that NVDIMM contents
> +		* "flush_fail"
> +			Indicates that NVDIMM contents
>  			couldn't be flushed during last
>  			shut-down event.
> -		* "restore_fail": Indicates that NVDIMM contents
> +		* "restore_fail"
> +			Indicates that NVDIMM contents
>  			couldn't be restored during NVDIMM
>  			initialization.
> -		* "encrypted" : NVDIMM contents are encrypted.
> -		* "smart_notify": There is health event for the NVDIMM.
> -		* "scrubbed": Indicating that contents of the
> +		* "encrypted"
> +			NVDIMM contents are encrypted.
> +		* "smart_notify"
> +			There is health event for the NVDIMM.
> +		* "scrubbed"
> +			Indicating that contents of the
>  			NVDIMM have been scrubbed.
> -		* "locked" : Indicating that NVDIMM contents cant
> +		* "locked"
> +			Indicating that NVDIMM contents cant
>  			be modified until next power cycle.
>
>  What:		/sys/bus/nd/devices/nmemX/papr/perf_stats
> @@ -51,4 +58,4 @@ Description:
>  		* "MedWDur " : Media Write Duration
>  		* "CchRHCnt" : Cache Read Hit Count
>  		* "CchWHCnt" : Cache Write Hit Count
> -		* "FastWCnt" : Fast Write Count
> \ No newline at end of file
> +		* "FastWCnt" : Fast Write Count

Thanks,

I am fine with proposed changes to sysfs-bus-papr-pmem.

Acked-by: Vaibhav Jain # for sysfs-bus-papr-pmem
Re: [PATCH] crypto: talitos - Fix return type of current_desc_hdr()
On Thu, Oct 08, 2020 at 09:34:56AM +, Christophe Leroy wrote:
> current_desc_hdr() returns a u32 but in fact this is a __be32,
> leading to a lot of sparse warnings.
>
> Change the return type to __be32 and ensure it is handled as
> such by the caller.
>
> Fixes: 3e721aeb3df3 ("crypto: talitos - handle descriptor not found in error path")
> Signed-off-by: Christophe Leroy
> ---
> drivers/crypto/talitos.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)

Patch applied. Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: talitos - Endianess in current_desc_hdr()
On Thu, Oct 08, 2020 at 09:34:55AM +, Christophe Leroy wrote:
> current_desc_hdr() compares the value of the current descriptor
> with the next_desc member of the talitos_desc struct.
>
> While the current descriptor is obtained from in_be32() which
> returns CPU ordered bytes, next_desc member is in big endian order.
>
> Convert the current descriptor into big endian before comparing it
> with next_desc.
>
> This fixes a sparse warning.
>
> Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on SEC1")
> Signed-off-by: Christophe Leroy
> ---
> drivers/crypto/talitos.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

Patch applied. Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
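For readers following along, here is a minimal userspace sketch of the comparison problem described in the quoted commit message. It is not the talitos code: struct demo_desc and demo_desc_matches() are hypothetical names, and htonl() stands in for the kernel's cpu_to_be32().

```c
#include <arpa/inet.h>	/* htonl(): host byte order -> big endian */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor: next_desc_be is stored in big-endian order. */
struct demo_desc {
	uint32_t next_desc_be;
};

/*
 * cur_cpu is in CPU byte order, as a register read helper would
 * return it. Convert it to big endian before comparing it with the
 * big-endian field, so the check also holds on little-endian hosts.
 */
static bool demo_desc_matches(const struct demo_desc *d, uint32_t cur_cpu)
{
	return d->next_desc_be == htonl(cur_cpu);
}
```

Comparing the raw values without the conversion would only work by accident on big-endian hosts, which is the class of mistake the sparse annotations are meant to catch.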
Re: [PATCH] powerpc/32s: Setup the early hash table at all time.
Andreas Schwab writes:
> On Oct 01 2020, Christophe Leroy wrote:
>
>> At the time being, an early hash table is set up when
>> CONFIG_KASAN is selected.
>>
>> There is nothing wrong with setting such an early hash table
>> all the time, even if it is not used. This is a statically
>> allocated 256 kB table which lies in the init data section.
>>
>> This makes the code simpler and may in the future allow setting up
>> early IO mappings with fixmap instead of hard coding BATs.
>>
>> Put create_hpte() and flush_hash_pages() in the .ref.text section
>> in order to avoid warning for the reference to early_hash[]. This
>> reference is removed by MMU_init_hw_patch() before init memory is
>> freed.
>
> This breaks booting on the iBook G4.

Do you get an oops or anything?

cheers
Re: [PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE
On 30/10/20 12:17 pm, Nathan Lynch wrote:
> While drmgr has had work in some areas to make its RTAS syscall
> interactions endian-neutral, its code for performing partition
> migration via the syscall has never worked on LE. While it is able to
> complete ibm,suspend-me successfully, it crashes when attempting the
> subsequent ibm,update-nodes call.
>
> drmgr is the only known (or plausible) user of ibm,suspend-me,
> ibm,update-nodes, and ibm,update-properties, so allow them only in
> big-endian configurations.

And there's a zero chance that drmgr will ever be fixed on LE?

--
Andrew Donnellan              OzLabs, ADL Canberra
a...@linux.ibm.com             IBM Australia Limited
Re: [PATCH] powerpc: add support for TIF_NOTIFY_SIGNAL
On 10/29/20 6:48 PM, Michael Ellerman wrote:
> Jens Axboe writes:
>> Wire up TIF_NOTIFY_SIGNAL handling for powerpc.
>>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Jens Axboe
>> ---
>>
>> 5.11 has support queued up for TIF_NOTIFY_SIGNAL, see this posting
>> for details:
>>
>> https://lore.kernel.org/io-uring/20201026203230.386348-1-ax...@kernel.dk/
>>
>> As part of that work, I'm adding TIF_NOTIFY_SIGNAL support to all archs,
>> as that will enable a set of cleanups once all of them support it. I'm
>> happy carrying this patch if need be, or it can be funnelled through the
>> arch tree. Let me know.
>
> Happy for you to take it along with the rest of the series.
>
> Acked-by: Michael Ellerman

Great, thanks Michael! Added.

--
Jens Axboe
Re: [PATCH] ibmvfc: add new fields for version 2 of several MADs
Tyrel,

> I'm going to have to ask that this patch be unstaged.

Done!

--
Martin K. Petersen	Oracle Linux Engineering
[PATCH 26/29] powerpc/pseries/hibernation: perform post-suspend fixups later
The pseries hibernate code calls post_mobility_fixup() which is sort
of a dumping ground of fixups that need to run after resuming from
suspend regardless of whether suspend was a hibernation or a
migration. Calling post_mobility_fixup() from
pseries_suspend_enable_irqs() runs this code early in resume with
devices suspended and only one CPU up, while the much more commonly
used migration case runs these fixups in a more typical process
context.

Call post_mobility_fixup() after the suspend core returns a success
status to the hibernate sysfs store method and remove
pseries_suspend_enable_irqs().

Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/suspend.c | 21 -
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 6a94cc0deb88..589a91730db8 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -50,21 +50,6 @@ static int pseries_suspend_begin(u64 stream_id)
 	return 0;
 }
 
-/**
- * pseries_suspend_enable_irqs
- *
- * Post suspend configuration updates
- *
- **/
-static void pseries_suspend_enable_irqs(void)
-{
-	/*
-	 * Update configuration which can be modified based on device tree
-	 * changes during resume.
-	 */
-	post_mobility_fixup();
-}
-
 /**
  * pseries_suspend_enter - Final phase of hibernation
  *
@@ -127,8 +112,11 @@ static ssize_t store_hibernate(struct device *dev,
 	if (!rc)
 		rc = pm_suspend(PM_SUSPEND_MEM);
 
-	if (!rc)
+	if (!rc) {
 		rc = count;
+		post_mobility_fixup();
+	}
+
 	return rc;
 }
 
@@ -214,7 +202,6 @@ static int __init pseries_suspend_init(void)
 	if ((rc = pseries_suspend_sysfs_register(&suspend_dev)))
 		return rc;
 
-	ppc_md.suspend_enable_irqs = pseries_suspend_enable_irqs;
 	suspend_set_ops(&pseries_suspend_ops);
 	return 0;
 }
--
2.25.4
[PATCH 18/29] powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops
There are three ways pseries_suspend_begin() can be reached:

1. When "mem" is written to /sys/power/state:

kobj_attr_store()
-> state_store()
  -> pm_suspend()
    -> suspend_devices_and_enter()
      -> pseries_suspend_begin()

This never works because there is no way to supply a valid stream id
using this interface, and H_VASI_STATE is called with a stream id of
zero. So this call path is useless at best.

2. When a stream id is written to /sys/devices/system/power/hibernate.
pseries_suspend_begin() is polled directly from store_hibernate()
until the stream is in the "Suspending" state (i.e. the platform is
ready for the OS to suspend execution):

dev_attr_store()
-> store_hibernate()
  -> pseries_suspend_begin()

3. When a stream id is written to /sys/devices/system/power/hibernate
(continued). After #2, pseries_suspend_begin() is called once again
from the pm core:

dev_attr_store()
-> store_hibernate()
  -> pm_suspend()
    -> suspend_devices_and_enter()
      -> pseries_suspend_begin()

This is redundant because the VASI suspend state is already known to
be Suspending.

The begin() callback of platform_suspend_ops is optional, so we can
simply remove that assignment with no loss of function.

Fixes: 32d8ad4e621d ("powerpc/pseries: Partition hibernation support")
Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/suspend.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 81e0ac58d620..3eaa9d59dc7a 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -187,7 +187,6 @@ static struct bus_type suspend_subsys = {
 
 static const struct platform_suspend_ops pseries_suspend_ops = {
 	.valid		= suspend_valid_only_mem,
-	.begin		= pseries_suspend_begin,
 	.prepare_late	= pseries_prepare_late,
 	.enter		= pseries_suspend_enter,
 };
--
2.25.4
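The point about begin() being optional can be illustrated with a small standalone sketch. This is not the pm core; struct demo_suspend_ops, demo_run_suspend() and demo_enter() are hypothetical names, modeled only loosely on platform_suspend_ops, to show why leaving an optional callback unset loses nothing.

```c
#include <stddef.h>

/* Hypothetical ops table, loosely modeled on platform_suspend_ops. */
struct demo_suspend_ops {
	int (*begin)(int state);	/* optional: may be NULL */
	int (*enter)(int state);	/* required */
};

/* Call begin() only when one is provided, then call enter(). */
static int demo_run_suspend(const struct demo_suspend_ops *ops, int state)
{
	int rc = 0;

	if (ops->begin)
		rc = ops->begin(state);
	if (rc)
		return rc;

	return ops->enter(state);
}

/* Stub enter(): accepts state 3 only (an arbitrary value for the demo). */
static int demo_enter(int state)
{
	return state == 3 ? 0 : -1;
}

static const struct demo_suspend_ops demo_ops = {
	/* .begin intentionally left unset (NULL) */
	.enter = demo_enter,
};
```

With .begin unset the caller simply skips that step, which is the behaviour the commit message relies on when it drops the assignment.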
[PATCH 29/29] powerpc/pseries/mobility: refactor node lookup during DT update
In pseries_devicetree_update(), with each call to ibm,update-nodes the
partition firmware communicates the node to be deleted or updated by
placing its phandle in the work buffer. Each of delete_dt_node(),
update_dt_node(), and add_dt_node() have duplicate lookups using the
phandle value and corresponding refcount management.

Move the lookup and of_node_put() into pseries_devicetree_update(),
and emit a warning on any failed lookups.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/mobility.c | 45 ---
 1 file changed, 16 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index d6417c9db201..5521b63898aa 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -61,18 +61,10 @@ static int mobility_rtas_call(int token, char *buf, s32 scope)
 	return rc;
 }
 
-static int delete_dt_node(__be32 phandle)
+static int delete_dt_node(struct device_node *dn)
 {
-	struct device_node *dn;
-
-	dn = of_find_node_by_phandle(be32_to_cpu(phandle));
-	if (!dn)
-		return -ENOENT;
-
 	pr_debug("removing node %pOFfp\n", dn);
-
 	dlpar_detach_node(dn);
-	of_node_put(dn);
 	return 0;
 }
 
@@ -137,10 +129,9 @@ static int update_dt_property(struct device_node *dn, struct property **prop,
 	return 0;
 }
 
-static int update_dt_node(__be32 phandle, s32 scope)
+static int update_dt_node(struct device_node *dn, s32 scope)
 {
 	struct update_props_workarea *upwa;
-	struct device_node *dn;
 	struct property *prop = NULL;
 	int i, rc, rtas_rc;
 	char *prop_data;
@@ -157,14 +148,8 @@ static int update_dt_node(__be32 phandle, s32 scope)
 	if (!rtas_buf)
 		return -ENOMEM;
 
-	dn = of_find_node_by_phandle(be32_to_cpu(phandle));
-	if (!dn) {
-		kfree(rtas_buf);
-		return -ENOENT;
-	}
-
 	upwa = (struct update_props_workarea *)&rtas_buf[0];
-	upwa->phandle = phandle;
+	upwa->phandle = cpu_to_be32(dn->phandle);
 
 	do {
 		rtas_rc = mobility_rtas_call(update_properties_token, rtas_buf,
@@ -224,21 +209,15 @@ static int update_dt_node(__be32 phandle, s32 scope)
 		cond_resched();
 	} while (rtas_rc == 1);
 
-	of_node_put(dn);
 	kfree(rtas_buf);
 	return 0;
 }
 
-static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
+static int add_dt_node(struct device_node *parent_dn, __be32 drc_index)
 {
 	struct device_node *dn;
-	struct device_node *parent_dn;
 	int rc;
 
-	parent_dn = of_find_node_by_phandle(be32_to_cpu(parent_phandle));
-	if (!parent_dn)
-		return -ENOENT;
-
 	dn = dlpar_configure_connector(drc_index, parent_dn);
 	if (!dn) {
 		of_node_put(parent_dn);
@@ -251,7 +230,6 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
 
 	pr_debug("added node %pOFfp\n", dn);
 
-	of_node_put(parent_dn);
 	return rc;
 }
 
@@ -284,22 +262,31 @@ int pseries_devicetree_update(s32 scope)
 			data++;
 
 			for (i = 0; i < node_count; i++) {
+				struct device_node *np;
 				__be32 phandle = *data++;
 				__be32 drc_index;
 
+				np = of_find_node_by_phandle(be32_to_cpu(phandle));
+				if (!np) {
+					pr_warn("Failed lookup: phandle 0x%x for action 0x%x\n",
+						be32_to_cpu(phandle), action);
+					continue;
+				}
+
 				switch (action) {
 				case DELETE_DT_NODE:
-					delete_dt_node(phandle);
+					delete_dt_node(np);
 					break;
 				case UPDATE_DT_NODE:
-					update_dt_node(phandle, scope);
+					update_dt_node(np, scope);
 					break;
 				case ADD_DT_NODE:
 					drc_index = *data++;
-					add_dt_node(phandle, drc_index);
+					add_dt_node(np, drc_index);
 					break;
 				}
 
+				of_node_put(np);
 				cond_resched();
 			}
 		}
--
2.25.4
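The shape of this refactor, with the lookup and the reference drop hoisted into the single caller, can be sketched outside the kernel as follows. Everything here is hypothetical (struct demo_node, demo_node_get_by_phandle(), demo_node_put(), demo_process()); it is a toy refcount model for illustration, not the of_* API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical refcounted node; not the kernel's struct device_node. */
struct demo_node {
	uint32_t phandle;
	int refcount;
};

static struct demo_node demo_nodes[] = {
	{ .phandle = 0x10, .refcount = 1 },
	{ .phandle = 0x20, .refcount = 1 },
};

/* Look up by phandle and take a reference; NULL when not found. */
static struct demo_node *demo_node_get_by_phandle(uint32_t phandle)
{
	size_t i;

	for (i = 0; i < sizeof(demo_nodes) / sizeof(demo_nodes[0]); i++) {
		if (demo_nodes[i].phandle == phandle) {
			demo_nodes[i].refcount++;
			return &demo_nodes[i];
		}
	}
	return NULL;
}

/* Drop the reference taken by demo_node_get_by_phandle(). */
static void demo_node_put(struct demo_node *n)
{
	n->refcount--;
}

/* Helpers receive an already looked-up node and do no refcounting. */
static int demo_delete_node(struct demo_node *n) { (void)n; return 0; }
static int demo_update_node(struct demo_node *n) { (void)n; return 0; }

enum demo_action { DEMO_DELETE, DEMO_UPDATE };

/* The caller owns the lookup, the failure report, and the single put. */
static int demo_process(enum demo_action act, uint32_t phandle)
{
	struct demo_node *n = demo_node_get_by_phandle(phandle);
	int rc;

	if (!n)
		return -1;	/* a real caller would warn and move on */

	rc = (act == DEMO_DELETE) ? demo_delete_node(n) : demo_update_node(n);
	demo_node_put(n);
	return rc;
}
```

Keeping get and put in the same function makes it easy to see that every successful lookup is balanced, which is harder to check when each helper does its own lookup.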
[PATCH 27/29] powerpc/pseries/hibernation: remove prepare_late() callback
The pseries hibernate code no longer calls into the original
join/suspend code in kernel/rtas.c, so pseries_prepare_late() and
related code don't accomplish anything now.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/suspend.c | 25
 1 file changed, 25 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 589a91730db8..1b902cbf85c5 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -15,9 +15,6 @@
 #include
 
 static struct device suspend_dev;
-static DECLARE_COMPLETION(suspend_work);
-static struct rtas_suspend_me_data suspend_data;
-static atomic_t suspending;
 
 /**
  * pseries_suspend_begin - First phase of hibernation
@@ -61,23 +58,6 @@ static int pseries_suspend_enter(suspend_state_t state)
 	return rtas_ibm_suspend_me(NULL);
 }
 
-/**
- * pseries_prepare_late - Prepare to suspend all other CPUs
- *
- * Return value:
- * 	0 on success / other on failure
- **/
-static int pseries_prepare_late(void)
-{
-	atomic_set(&suspending, 1);
-	atomic_set(&suspend_data.working, 0);
-	atomic_set(&suspend_data.done, 0);
-	atomic_set(&suspend_data.error, 0);
-	suspend_data.complete = &suspend_work;
-	reinit_completion(&suspend_work);
-	return 0;
-}
-
 /**
  * store_hibernate - Initiate partition hibernation
  * @dev:		subsys root device
@@ -152,7 +132,6 @@ static struct bus_type suspend_subsys = {
 
 static const struct platform_suspend_ops pseries_suspend_ops = {
 	.valid		= suspend_valid_only_mem,
-	.prepare_late	= pseries_prepare_late,
 	.enter		= pseries_suspend_enter,
 };
 
@@ -195,10 +174,6 @@ static int __init pseries_suspend_init(void)
 	if (!firmware_has_feature(FW_FEATURE_LPAR))
 		return 0;
 
-	suspend_data.token = rtas_token("ibm,suspend-me");
-	if (suspend_data.token == RTAS_UNKNOWN_SERVICE)
-		return 0;
-
 	if ((rc = pseries_suspend_sysfs_register(&suspend_dev)))
 		return rc;
 
--
2.25.4
[PATCH 28/29] powerpc/rtas: remove unused rtas_suspend_me_data
All code which used this type has been removed.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/include/asm/rtas-types.h | 8
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas-types.h b/arch/powerpc/include/asm/rtas-types.h
index aa420561bc10..8df6235d64d1 100644
--- a/arch/powerpc/include/asm/rtas-types.h
+++ b/arch/powerpc/include/asm/rtas-types.h
@@ -23,14 +23,6 @@ struct rtas_t {
 	struct device_node *dev;	/* virtual address pointer */
 };
 
-struct rtas_suspend_me_data {
-	atomic_t working; /* number of cpus accessing this struct */
-	atomic_t done;
-	int token; /* ibm,suspend-me */
-	atomic_t error;
-	struct completion *complete; /* wait on this until working == 0 */
-};
-
 struct rtas_error_log {
 	/* Byte 0 */
 	u8		byte0;			/* Architectural version */
--
2.25.4
[PATCH 25/29] powerpc/pseries/hibernation: remove redundant cacheinfo update
Partitions with cache nodes in the device tree can encounter the
following warning on resume:

CPU 0 already accounted in PowerPC,POWER9@0(Data)
WARNING: CPU: 0 PID: 3177 at arch/powerpc/kernel/cacheinfo.c:197 cacheinfo_cpu_online+0x640/0x820

These calls to cacheinfo_cpu_offline/online have been redundant since
commit e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo
hierarchy post-migration").

Fixes: e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration")
Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/suspend.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 703728cb95ec..6a94cc0deb88 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -13,7 +13,6 @@
 #include
 #include
 #include
-#include "../../kernel/cacheinfo.h"
 
 static struct device suspend_dev;
 static DECLARE_COMPLETION(suspend_work);
@@ -63,9 +62,7 @@ static void pseries_suspend_enable_irqs(void)
 	 * Update configuration which can be modified based on device tree
 	 * changes during resume.
 	 */
-	cacheinfo_cpu_offline(smp_processor_id());
 	post_mobility_fixup();
-	cacheinfo_cpu_online(smp_processor_id());
 }
 
 /**
--
2.25.4
[PATCH 24/29] powerpc/rtas: remove unused rtas_suspend_last_cpu()
rtas_suspend_last_cpu() is now unused, remove it and
__rtas_suspend_last_cpu() which also becomes unused.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/include/asm/rtas.h |  1 -
 arch/powerpc/kernel/rtas.c      | 45 -
 2 files changed, 46 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 86f5d07969e4..d405b4bd659b 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -256,7 +256,6 @@ extern bool rtas_indicator_present(int token, int *maxindex);
 extern int rtas_set_indicator(int indicator, int index, int new_value);
 extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
 extern void rtas_progress(char *s, unsigned short hex);
-extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
 int rtas_ibm_suspend_me(int *fw_status);
 int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 4445219e92ce..98ec9ffaa3b3 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -799,51 +799,6 @@ void rtas_os_term(char *str)
 }
 
 static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE;
-#ifdef CONFIG_PPC_PSERIES
-static int __rtas_suspend_last_cpu(struct rtas_suspend_me_data *data, int wake_when_done)
-{
-	u16 slb_size = mmu_slb_size;
-	int rc = H_MULTI_THREADS_ACTIVE;
-	int cpu;
-
-	slb_set_size(SLB_MIN_SIZE);
-	printk(KERN_DEBUG "calling ibm,suspend-me on cpu %i\n", smp_processor_id());
-
-	while (rc == H_MULTI_THREADS_ACTIVE && !atomic_read(&data->done) &&
-	       !atomic_read(&data->error))
-		rc = rtas_call(data->token, 0, 1, NULL);
-
-	if (rc || atomic_read(&data->error)) {
-		printk(KERN_DEBUG "ibm,suspend-me returned %d\n", rc);
-		slb_set_size(slb_size);
-	}
-
-	if (atomic_read(&data->error))
-		rc = atomic_read(&data->error);
-
-	atomic_set(&data->error, rc);
-	pSeries_coalesce_init();
-
-	if (wake_when_done) {
-		atomic_set(&data->done, 1);
-
-		for_each_online_cpu(cpu)
-			plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
-	}
-
-	if (atomic_dec_return(&data->working) == 0)
-		complete(data->complete);
-
-	return rc;
-}
-
-int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data)
-{
-	atomic_inc(&data->working);
-	return __rtas_suspend_last_cpu(data, 0);
-}
-
-#endif
 
 /**
  * rtas_activate_firmware() - Activate a new version of firmware.
--
2.25.4
[PATCH 22/29] powerpc/rtas: remove rtas_suspend_cpu()
rtas_suspend_cpu() no longer has users; remove it and
__rtas_suspend_cpu() which now becomes unused as well.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/include/asm/rtas.h |  1 -
 arch/powerpc/kernel/rtas.c      | 52 -
 2 files changed, 53 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 1e695e553a36..86f5d07969e4 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -256,7 +256,6 @@ extern bool rtas_indicator_present(int token, int *maxindex);
 extern int rtas_set_indicator(int indicator, int index, int new_value);
 extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
 extern void rtas_progress(char *s, unsigned short hex);
-extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
 extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
 int rtas_ibm_suspend_me(int *fw_status);
 int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 6fde38b488f7..4445219e92ce 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -843,58 +843,6 @@ int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data)
 	return __rtas_suspend_last_cpu(data, 0);
 }
 
-static int __rtas_suspend_cpu(struct rtas_suspend_me_data *data, int wake_when_done)
-{
-	long rc = H_SUCCESS;
-	unsigned long msr_save;
-	int cpu;
-
-	atomic_inc(&data->working);
-
-	/* really need to ensure MSR.EE is off for H_JOIN */
-	msr_save = mfmsr();
-	mtmsr(msr_save & ~(MSR_EE));
-
-	while (rc == H_SUCCESS && !atomic_read(&data->done) && !atomic_read(&data->error))
-		rc = plpar_hcall_norets(H_JOIN);
-
-	mtmsr(msr_save);
-
-	if (rc == H_SUCCESS) {
-		/* This cpu was prodded and the suspend is complete. */
-		goto out;
-	} else if (rc == H_CONTINUE) {
-		/* All other cpus are in H_JOIN, this cpu does
-		 * the suspend.
-		 */
-		return __rtas_suspend_last_cpu(data, wake_when_done);
-	} else {
-		printk(KERN_ERR "H_JOIN on cpu %i failed with rc = %ld\n",
-		       smp_processor_id(), rc);
-		atomic_set(&data->error, rc);
-	}
-
-	if (wake_when_done) {
-		atomic_set(&data->done, 1);
-
-		/* This cpu did the suspend or got an error; in either case,
-		 * we need to prod all other other cpus out of join state.
-		 * Extra prods are harmless.
-		 */
-		for_each_online_cpu(cpu)
-			plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu));
-	}
-out:
-	if (atomic_dec_return(&data->working) == 0)
-		complete(data->complete);
-	return rc;
-}
-
-int rtas_suspend_cpu(struct rtas_suspend_me_data *data)
-{
-	return __rtas_suspend_cpu(data, 0);
-}
-
 #endif
 
 /**
--
2.25.4
[PATCH 23/29] powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me()
rtas_suspend_last_cpu() and related code perform a lot of work that
isn't relevant to the hibernation workflow. All other CPUs are offline
when called so there is no need to place them in H_JOIN or prod them
on resume, nor is there need for retries or operations on shared
state.

Call the rtas_ibm_suspend_me() wrapper function directly from
pseries_suspend_enter() instead of using rtas_suspend_last_cpu().

Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/suspend.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 3315d698d5ab..703728cb95ec 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -76,11 +76,7 @@ static void pseries_suspend_enable_irqs(void)
  **/
 static int pseries_suspend_enter(suspend_state_t state)
 {
-	int rc = rtas_suspend_last_cpu(&suspend_data);
-
-	atomic_set(&suspending, 0);
-	atomic_set(&suspend_data.done, 1);
-	return rc;
+	return rtas_ibm_suspend_me(NULL);
 }
 
 /**
--
2.25.4
[PATCH 21/29] powerpc/machdep: remove suspend_disable_cpu()
There are no users left of the suspend_disable_cpu() callback, remove
it.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/include/asm/machdep.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 475687f24f4a..cf6ebbc16cb4 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -207,7 +207,6 @@ struct machdep_calls {
 	void (*suspend_disable_irqs)(void);
 	void (*suspend_enable_irqs)(void);
 #endif
-	int (*suspend_disable_cpu)(void);
 
 #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 	ssize_t (*cpu_probe)(const char *, size_t);
--
2.25.4
[PATCH 20/29] powerpc/pseries/hibernation: remove pseries_suspend_cpu()
Since commit 48f6e7f6d948 ("powerpc/pseries: remove cede offline state
for CPUs"), ppc_md.suspend_disable_cpu() is no longer used and all
CPUs (save one) are placed into true offline state as opposed to
H_JOIN. So pseries_suspend_cpu() is effectively unused; remove it.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/suspend.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 232621f33510..3315d698d5ab 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -48,20 +48,6 @@ static int pseries_suspend_begin(u64 stream_id)
 		       vasi_state);
 		return -EIO;
 	}
-
-	return 0;
-}
-
-/**
- * pseries_suspend_cpu - Suspend a single CPU
- *
- * Makes the H_JOIN call to suspend the CPU
- *
- **/
-static int pseries_suspend_cpu(void)
-{
-	if (atomic_read(&suspending))
-		return rtas_suspend_cpu(&suspend_data);
 
 	return 0;
 }
@@ -235,7 +221,6 @@ static int __init pseries_suspend_init(void)
 	if ((rc = pseries_suspend_sysfs_register(&suspend_dev)))
 		return rc;
 
-	ppc_md.suspend_disable_cpu = pseries_suspend_cpu;
 	ppc_md.suspend_enable_irqs = pseries_suspend_enable_irqs;
 	suspend_set_ops(&pseries_suspend_ops);
 	return 0;
--
2.25.4
[PATCH 17/29] powerpc/rtas: remove rtas_ibm_suspend_me_unsafe()
rtas_ibm_suspend_me_unsafe() is now unused; remove it and
rtas_percpu_suspend_me() which becomes unused as a result.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/include/asm/rtas.h |  1 -
 arch/powerpc/kernel/rtas.c      | 64 -
 2 files changed, 65 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index be0fc2536673..1e695e553a36 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -258,7 +258,6 @@ extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
 extern void rtas_progress(char *s, unsigned short hex);
 extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
 extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
-int rtas_ibm_suspend_me_unsafe(u64 handle);
 int rtas_ibm_suspend_me(int *fw_status);
 int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 52fb394f15d6..6fde38b488f7 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -895,70 +895,6 @@ int rtas_suspend_cpu(struct rtas_suspend_me_data *data)
 	return __rtas_suspend_cpu(data, 0);
 }
 
-static void rtas_percpu_suspend_me(void *info)
-{
-	__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1);
-}
-
-int rtas_ibm_suspend_me_unsafe(u64 handle)
-{
-	long state;
-	long rc;
-	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
-	struct rtas_suspend_me_data data;
-	DECLARE_COMPLETION_ONSTACK(done);
-
-	if (!rtas_service_present("ibm,suspend-me"))
-		return -ENOSYS;
-
-	/* Make sure the state is valid */
-	rc = plpar_hcall(H_VASI_STATE, retbuf, handle);
-
-	state = retbuf[0];
-
-	if (rc) {
-		printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned %ld\n",rc);
-		return rc;
-	} else if (state == H_VASI_ENABLED) {
-		return -EAGAIN;
-	} else if (state != H_VASI_SUSPENDING) {
-		printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned state %ld\n",
-		       state);
-		return -EIO;
-	}
-
-	atomic_set(&data.working, 0);
-	atomic_set(&data.done, 0);
-	atomic_set(&data.error, 0);
-	data.token = rtas_token("ibm,suspend-me");
-	data.complete = &done;
-
-	lock_device_hotplug();
-
-	cpu_hotplug_disable();
-
-	/* Call function on all CPUs. One of us will make the
-	 * rtas call
-	 */
-	on_each_cpu(rtas_percpu_suspend_me, &data, 0);
-
-	wait_for_completion(&done);
-
-	if (atomic_read(&data.error) != 0)
-		printk(KERN_ERR "Error doing global join\n");
-
-
-	cpu_hotplug_enable();
-
-	unlock_device_hotplug();
-
-	return atomic_read(&data.error);
-}
-#else /* CONFIG_PPC_PSERIES */
-int rtas_ibm_suspend_me_unsafe(u64 handle)
-{
-	return -ENOSYS;
-}
 #endif
 
 /**
--
2.25.4
[PATCH 15/29] powerpc/pseries/mobility: retry partition suspend after error
This is a mitigation for the relatively rare occurrence where a
virtual IOA can be in a transient state that prevents the
suspend/migration from succeeding, resulting in an error from
ibm,suspend-me.

If the join/suspend sequence returns an error, it is acceptable to
retry as long as the VASI suspend session state is still
"Suspending" (i.e. the platform is still waiting for the OS to
suspend).

Retry a few times on suspend failure while this condition holds,
progressively increasing the delay between attempts. We don't want to
retry indefinitely because firmware emits an error log event on each
unsuccessful attempt.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/mobility.c | 59 ++-
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 0f592246f345..e459cdb8286f 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -549,16 +549,71 @@ static void pseries_cancel_migration(u64 handle, int err)
 		pr_err("H_VASI_SIGNAL error: %ld\n", hvrc);
 }
 
+static int pseries_suspend(u64 handle)
+{
+	const unsigned int max_attempts = 5;
+	unsigned int retry_interval_ms = 1;
+	unsigned int attempt = 1;
+	int ret;
+
+	while (true) {
+		atomic_t counter = ATOMIC_INIT(0);
+		unsigned long vasi_state;
+		int vasi_err;
+
+		ret = stop_machine(do_join, &counter, cpu_online_mask);
+		if (ret == 0)
+			break;
+		/*
+		 * Encountered an error. If the VASI stream is still
+		 * in Suspending state, it's likely a transient
+		 * condition related to some device in the partition
+		 * and we can retry in the hope that the cause has
+		 * cleared after some delay.
+		 *
+		 * A better design would allow drivers etc to prepare
+		 * for the suspend and avoid conditions which prevent
+		 * the suspend from succeeding. For now, we have this
+		 * mitigation.
+		 */
+		pr_notice("Partition suspend attempt %u of %u error: %d\n",
+			  attempt, max_attempts, ret);
+
+		if (attempt == max_attempts)
+			break;
+
+		vasi_err = poll_vasi_state(handle, &vasi_state);
+		if (vasi_err == 0) {
+			if (vasi_state != H_VASI_SUSPENDING) {
+				pr_notice("VASI state %lu after failed suspend\n",
+					  vasi_state);
+				break;
+			}
+		} else if (vasi_err != -EOPNOTSUPP) {
+			pr_err("VASI state poll error: %d", vasi_err);
+			break;
+		}
+
+		pr_notice("Will retry partition suspend after %u ms\n",
+			  retry_interval_ms);
+
+		msleep(retry_interval_ms);
+		retry_interval_ms *= 10;
+		attempt++;
+	}
+
+	return ret;
+}
+
 static int pseries_migrate_partition(u64 handle)
 {
-	atomic_t counter = ATOMIC_INIT(0);
 	int ret;
 
 	ret = wait_for_vasi_session_suspending(handle);
 	if (ret)
 		goto out;
 
-	ret = stop_machine(do_join, &counter, cpu_online_mask);
+	ret = pseries_suspend(handle);
 	if (ret == 0)
 		post_mobility_fixup();
 	else
--
2.25.4
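To make the retry schedule concrete, the following is a standalone userspace sketch of the same bounded-retry shape: at most five attempts, with the delay multiplied by ten after each failure (1, 10, 100, 1000 ms). The callback types, struct demo_retry_stats and the stub functions are hypothetical stand-ins for the join/suspend attempt and the VASI state poll; the sketch records the delay instead of sleeping.

```c
#include <stdbool.h>

/* Hypothetical hooks standing in for the suspend attempt and state poll. */
typedef int (*demo_attempt_fn)(void *ctx);
typedef bool (*demo_can_retry_fn)(void *ctx);

struct demo_retry_stats {
	unsigned int attempts;		/* attempts actually made */
	unsigned int total_delay_ms;	/* delay that would have been slept */
};

static int demo_suspend_with_retry(demo_attempt_fn attempt,
				   demo_can_retry_fn can_retry,
				   void *ctx, struct demo_retry_stats *stats)
{
	const unsigned int max_attempts = 5;
	unsigned int retry_interval_ms = 1;
	unsigned int n = 1;
	int ret;

	for (;;) {
		ret = attempt(ctx);
		stats->attempts = n;
		if (ret == 0)
			break;		/* success */
		if (n == max_attempts)
			break;		/* give up, return the last error */
		if (!can_retry(ctx))
			break;		/* condition no longer transient */

		/* The patch sleeps here with msleep(); we only account for it. */
		stats->total_delay_ms += retry_interval_ms;
		retry_interval_ms *= 10;
		n++;
	}

	return ret;
}

/* Stub attempt: fails with -5 until the counter in ctx reaches zero. */
static int demo_fail_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -5;
	}
	return 0;
}

/* Stub poll: always reports that a retry is still acceptable. */
static bool demo_always_retry(void *ctx)
{
	(void)ctx;
	return true;
}
```

In the worst case under these assumptions the loop waits 1 + 10 + 100 + 1000 ms in total before returning the last error, which bounds both the time spent and the number of firmware error log events.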
[PATCH 19/29] powerpc/pseries/hibernation: pass stream id via function arguments
There is no need for the stream id to be a file-global variable; pass
it from store_hibernate() to pseries_suspend_begin() for the
H_VASI_STATE call.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/suspend.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 3eaa9d59dc7a..232621f33510 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -15,7 +15,6 @@
 #include
 #include "../../kernel/cacheinfo.h"
 
-static u64 stream_id;
 static struct device suspend_dev;
 static DECLARE_COMPLETION(suspend_work);
 static struct rtas_suspend_me_data suspend_data;
@@ -29,7 +28,7 @@ static atomic_t suspending;
  * Return value:
  * 	0 on success / other on failure
  **/
-static int pseries_suspend_begin(suspend_state_t state)
+static int pseries_suspend_begin(u64 stream_id)
 {
 	long vasi_state, rc;
 	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
@@ -132,6 +131,7 @@ static ssize_t store_hibernate(struct device *dev,
 			       struct device_attribute *attr,
 			       const char *buf, size_t count)
 {
+	u64 stream_id;
 	int rc;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -140,7 +140,7 @@ static ssize_t store_hibernate(struct device *dev,
 	stream_id = simple_strtoul(buf, NULL, 16);
 
 	do {
-		rc = pseries_suspend_begin(PM_SUSPEND_MEM);
+		rc = pseries_suspend_begin(stream_id);
 		if (rc == -EAGAIN)
 			ssleep(1);
 	} while (rc == -EAGAIN);
@@ -148,8 +148,6 @@ static ssize_t store_hibernate(struct device *dev,
 	if (!rc)
 		rc = pm_suspend(PM_SUSPEND_MEM);
 
-	stream_id = 0;
-
 	if (!rc)
 		rc = count;
 
--
2.25.4
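As a rough userspace illustration of the same idea, the sketch below parses a hex stream id from the written buffer and hands it to the callee as an argument, with no file-scope state. demo_suspend_begin() and demo_store_hibernate() are hypothetical names, and strtoull() stands in for the kernel's simple_strtoul().

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the H_VASI_STATE query: treats a stream
 * id of zero as invalid and anything else as ready.
 */
static int demo_suspend_begin(uint64_t stream_id)
{
	return stream_id ? 0 : -1;
}

/* Parse the hex stream id from the buffer and pass it down directly. */
static int demo_store_hibernate(const char *buf)
{
	uint64_t stream_id = strtoull(buf, NULL, 16);

	return demo_suspend_begin(stream_id);
}
```

Because the id lives only on the caller's stack, there is nothing to reset afterwards, which matches the removal of the stream_id = 0 assignment in the diff.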
[PATCH 16/29] powerpc/rtas: dispatch partition migration requests to pseries
sys_rtas() cannot call ibm,suspend-me directly in the same way it handles other inputs. Instead it must dispatch the request to code that can first perform the H_JOIN sequence before any call to ibm,suspend-me can succeed. Over time kernel/rtas.c has accreted a fair amount of platform-specific code to implement this. Since a different, more robust implementation of the suspend sequence is now in the pseries platform code, we want to dispatch the request there while minimizing additional dependence on pseries. Use a weak function that only pseries overrides. Note that invoking ibm,suspend-me via the RTAS syscall is all but deprecated; this change preserves ABI compatibility for old programs while providing to them the benefit of the new partition suspend implementation. This is a behavior change in that the kernel performs the device tree update and firmware activation before returning, but experimentation indicates this is tolerated fine by legacy user space. Signed-off-by: Nathan Lynch --- arch/powerpc/include/asm/rtas.h | 1 + arch/powerpc/kernel/rtas.c| 8 +++- arch/powerpc/platforms/pseries/mobility.c | 5 + 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index fdefe6a974eb..be0fc2536673 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -260,6 +260,7 @@ extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data); extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data); int rtas_ibm_suspend_me_unsafe(u64 handle); int rtas_ibm_suspend_me(int *fw_status); +int rtas_syscall_dispatch_ibm_suspend_me(u64 handle); struct rtc_time; extern time64_t rtas_get_boot_time(void); diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 58bbd69a233f..52fb394f15d6 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1221,6 +1221,12 @@ static bool block_rtas_call(int token, int nargs, #endif /* 
CONFIG_PPC_RTAS_FILTER */ +/* Only pseries should need to override this. */ +int __weak rtas_syscall_dispatch_ibm_suspend_me(u64 handle) +{ + return -EINVAL; +} + /* We assume to be passed big endian arguments */ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) { @@ -1271,7 +1277,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) int rc = 0; u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32) | be32_to_cpu(args.args[1]); - rc = rtas_ibm_suspend_me_unsafe(handle); + rc = rtas_syscall_dispatch_ibm_suspend_me(handle); if (rc == -EAGAIN) args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE); else if (rc == -EIO) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index e459cdb8286f..d6417c9db201 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -622,6 +622,11 @@ static int pseries_migrate_partition(u64 handle) return ret; } +int rtas_syscall_dispatch_ibm_suspend_me(u64 handle) +{ + return pseries_migrate_partition(handle); +} + static ssize_t migration_store(struct class *class, struct class_attribute *attr, const char *buf, size_t count) -- 2.25.4
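The weak-function dispatch used here can be tried in user space. This is a rough sketch assuming GCC or Clang: `dispatch_suspend()` and `handle_suspend_request()` are hypothetical names, and the attribute below is what the kernel's `__weak` macro expands to.

```c
#include <errno.h>

/*
 * Generic default, analogous to the __weak definition added to
 * kernel/rtas.c. Any non-weak definition of the same symbol in another
 * object file replaces this one at link time.
 */
__attribute__((weak)) int dispatch_suspend(unsigned long long handle)
{
	(void)handle;
	return -EINVAL;
}

/* Generic caller: it does not need to know whether an override exists. */
int handle_suspend_request(unsigned long long handle)
{
	return dispatch_suspend(handle);
}
```

Built on its own, the caller gets -EINVAL. Linking in a second file that defines a normal `dispatch_suspend()` changes the behavior without touching the generic code, which is how the series keeps kernel/rtas.c free of a hard dependency on pseries.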
[PATCH 14/29] powerpc/pseries/mobility: signal suspend cancellation to platform
If we're returning an error to user space, use H_VASI_SIGNAL to send a cancellation request to the platform. This isn't strictly required but it communicates that Linux will not attempt to complete the suspend, which allows the various entities involved to promptly end the operation in progress. Signed-off-by: Nathan Lynch --- arch/powerpc/platforms/pseries/mobility.c | 31 +++ 1 file changed, 31 insertions(+) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 44ca7d4e143d..0f592246f345 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -520,6 +520,35 @@ static int do_join(void *arg) return ret; } +/* + * Abort reason code byte 0. We use only the 'Migrating partition' value. + */ +enum vasi_aborting_entity { + ORCHESTRATOR= 1, + VSP_SOURCE = 2, + PARTITION_FIRMWARE = 3, + PLATFORM_FIRMWARE = 4, + VSP_TARGET = 5, + MIGRATING_PARTITION = 6, +}; + +static void pseries_cancel_migration(u64 handle, int err) +{ + u32 reason_code; + u32 detail; + u8 entity; + long hvrc; + + entity = MIGRATING_PARTITION; + detail = abs(err) & 0xff; + reason_code = (entity << 24) | detail; + + hvrc = plpar_hcall_norets(H_VASI_SIGNAL, handle, + H_VASI_SIGNAL_CANCEL, reason_code); + if (hvrc) + pr_err("H_VASI_SIGNAL error: %ld\n", hvrc); +} + static int pseries_migrate_partition(u64 handle) { atomic_t counter = ATOMIC_INIT(0); @@ -532,6 +561,8 @@ static int pseries_migrate_partition(u64 handle) ret = stop_machine(do_join, &counter, cpu_online_mask); if (ret == 0) post_mobility_fixup(); + else + pseries_cancel_migration(handle, ret); out: return ret; } -- 2.25.4
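The reason code passed to H_VASI_SIGNAL is a simple bit packing, shown here as a standalone sketch. `vasi_cancel_reason()` is a hypothetical helper name; the entity value 6 and the `abs(err) & 0xff` detail byte are taken from the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Value of the 'Migrating partition' entity in the patch's enum. */
#define MIGRATING_PARTITION 6u

/*
 * Entity goes in the most significant byte (byte 0 in the patch's
 * numbering); the magnitude of the errno, truncated to 8 bits, goes in
 * the least significant byte.
 */
static uint32_t vasi_cancel_reason(int err)
{
	uint32_t detail = (uint32_t)abs(err) & 0xffu;

	return (MIGRATING_PARTITION << 24) | detail;
}
```

For example, an error of -5 (EIO on Linux) yields 0x06000005.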
[PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend
The partition suspend sequence as specified in the platform architecture requires that all active processor threads call H_JOIN, which: - suspends the calling thread until it is the target of an H_PROD; or - immediately returns H_CONTINUE, if the calling thread is the last to call H_JOIN. This thread is expected to call ibm,suspend-me to completely suspend the partition. Upon returning from ibm,suspend-me the calling thread must wake all others using H_PROD. rtas_ibm_suspend_me_unsafe() uses on_each_cpu() to implement this protocol, but because of its synchronizing nature this is susceptible to deadlock versus users of stop_machine() or other callers of on_each_cpu(). Not only is stop_machine() intended for use cases like this, it handles error propagation and allows us to keep the data shared between CPUs minimal: a single atomic counter which ensures exactly one CPU will wake the others from their joined states. Switch the migration code to use stop_machine() and a less complex local implementation of the H_JOIN/ibm,suspend-me logic, which carries additional benefits: - more informative error reporting, appropriately ratelimited - resets the lockup detector / watchdog on resume to prevent lockup warnings when the OS has been suspended for a time exceeding the threshold. 
Fixes: 91dc182ca6e2 ("[PATCH] powerpc: special-case ibm,suspend-me RTAS call") Signed-off-by: Nathan Lynch --- arch/powerpc/platforms/pseries/mobility.c | 132 -- 1 file changed, 125 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 1b8ae221b98a..44ca7d4e143d 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -12,9 +12,11 @@ #include #include #include +#include #include #include #include +#include #include #include #include @@ -412,6 +414,128 @@ static int wait_for_vasi_session_suspending(u64 handle) return ret; } +static void prod_single(unsigned int target_cpu) +{ + long hvrc; + int hwid; + + hwid = get_hard_smp_processor_id(target_cpu); + hvrc = plpar_hcall_norets(H_PROD, hwid); + if (hvrc == H_SUCCESS) + return; + pr_err_ratelimited("H_PROD of CPU %u (hwid %d) error: %ld\n", + target_cpu, hwid, hvrc); +} + +static void prod_others(void) +{ + unsigned int cpu; + + for_each_online_cpu(cpu) { + if (cpu != smp_processor_id()) + prod_single(cpu); + } +} + +static u16 clamp_slb_size(void) +{ + u16 prev = mmu_slb_size; + + slb_set_size(SLB_MIN_SIZE); + + return prev; +} + +static int do_suspend(void) +{ + u16 saved_slb_size; + int status; + int ret; + + pr_info("calling ibm,suspend-me on CPU %i\n", smp_processor_id()); + + /* +* The destination processor model may have fewer SLB entries +* than the source. We reduce mmu_slb_size to a safe minimum +* before suspending in order to minimize the possibility of +* programming non-existent entries on the destination. If +* suspend fails, we restore it before returning. On success +* the OF reconfig path will update it from the new device +* tree after resuming on the destination. 
+*/ + saved_slb_size = clamp_slb_size(); + + ret = rtas_ibm_suspend_me(&status); + if (ret != 0) { + pr_err("ibm,suspend-me error: %d\n", status); + slb_set_size(saved_slb_size); + } + + return ret; +} + +static int do_join(void *arg) +{ + atomic_t *counter = arg; + long hvrc; + int ret; + + /* Must ensure MSR.EE off for H_JOIN. */ + hard_irq_disable(); + hvrc = plpar_hcall_norets(H_JOIN); + + switch (hvrc) { + case H_CONTINUE: + /* +* All other CPUs are offline or in H_JOIN. This CPU +* attempts the suspend. +*/ + ret = do_suspend(); + break; + case H_SUCCESS: + /* +* The suspend is complete and this cpu has received a +* prod. +*/ + ret = 0; + break; + case H_BAD_MODE: + case H_HARDWARE: + default: + ret = -EIO; + pr_err_ratelimited("H_JOIN error %ld on CPU %i\n", + hvrc, smp_processor_id()); + break; + } + + if (atomic_inc_return(counter) == 1) { + pr_info("CPU %u waking all threads\n", smp_processor_id()); + prod_others(); + } + /* +* Execution may have been suspended for several seconds, so +* reset the watchdog. +*/ + touch_nmi_watchdog(); + return ret; +} + +static int pseries_migrate_partition(u64 handle)
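The single shared counter that picks exactly one waker can be illustrated with C11 atomics. This is a sketch only: `should_wake_others()` is a hypothetical name, and `atomic_fetch_add()` stands in for the kernel's `atomic_inc_return()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Return true for exactly one caller: the first to increment the
 * shared counter. atomic_fetch_add() returns the value before the
 * increment, so comparing with 0 here is equivalent to the patch's
 * atomic_inc_return(counter) == 1.
 */
static bool should_wake_others(atomic_int *counter)
{
	return atomic_fetch_add(counter, 1) == 0;
}
```

Every CPU leaving H_JOIN runs the same check, whether it returned with H_CONTINUE, H_SUCCESS or an error, so whichever gets there first prods the rest and the others skip the wakeup.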
[PATCH 12/29] powerpc/pseries/mobility: extract VASI session polling logic
The behavior of rtas_ibm_suspend_me_unsafe() is to return -EAGAIN to the caller until the specified VASI suspend session state makes the transition from H_VASI_ENABLED to H_VASI_SUSPENDING. In the interest of separating concerns to prepare for a new implementation of the join/suspend sequence, extract VASI session polling logic into a couple of local functions. Waiting for the session state to reach H_VASI_SUSPENDING before calling rtas_ibm_suspend_me_unsafe() ensures that we will never get an EAGAIN result necessitating a retry. No user-visible change in behavior is intended. Signed-off-by: Nathan Lynch --- arch/powerpc/platforms/pseries/mobility.c | 76 +-- 1 file changed, 71 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index dc6abf164db7..1b8ae221b98a 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -345,6 +345,73 @@ void post_mobility_fixup(void) return; } +static int poll_vasi_state(u64 handle, unsigned long *res) +{ + unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; + long hvrc; + int ret; + + hvrc = plpar_hcall(H_VASI_STATE, retbuf, handle); + switch (hvrc) { + case H_SUCCESS: + ret = 0; + *res = retbuf[0]; + break; + case H_PARAMETER: + ret = -EINVAL; + break; + case H_FUNCTION: + ret = -EOPNOTSUPP; + break; + case H_HARDWARE: + default: + pr_err("unexpected H_VASI_STATE result %ld\n", hvrc); + ret = -EIO; + break; + } + return ret; +} + +static int wait_for_vasi_session_suspending(u64 handle) +{ + unsigned long state; + bool keep_polling; + int ret; + + /* +* Wait for transition from H_VASI_ENABLED to +* H_VASI_SUSPENDING. Treat anything else as an error. 
+*/ + do { + keep_polling = false; + ret = poll_vasi_state(handle, &state); + if (ret != 0) + break; + + switch (state) { + case H_VASI_SUSPENDING: + break; + case H_VASI_ENABLED: + keep_polling = true; + ssleep(1); + break; + default: + pr_err("unexpected H_VASI_STATE result %lu\n", state); + ret = -EIO; + break; + } + } while (keep_polling); + + /* +* Proceed even if H_VASI_STATE is unavailable. If H_JOIN or +* ibm,suspend-me are also unimplemented, we'll recover then. +*/ + if (ret == -EOPNOTSUPP) + ret = 0; + + return ret; +} + static ssize_t migration_store(struct class *class, struct class_attribute *attr, const char *buf, size_t count) @@ -356,12 +423,11 @@ static ssize_t migration_store(struct class *class, if (rc) return rc; - do { - rc = rtas_ibm_suspend_me_unsafe(streamid); - if (rc == -EAGAIN) - ssleep(1); - } while (rc == -EAGAIN); + rc = wait_for_vasi_session_suspending(streamid); + if (rc) + return rc; + rc = rtas_ibm_suspend_me_unsafe(streamid); if (rc) return rc; -- 2.25.4
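The control flow of wait_for_vasi_session_suspending() can be exercised without a hypervisor. In this sketch the state source is a hypothetical callback (`query_state_fn`), the enum values are illustrative rather than the kernel's H_VASI_* constants, and the one-second sleep is omitted.

```c
#include <errno.h>

/* Illustrative state values; not the kernel's H_VASI_* constants. */
enum session_state { SESSION_ENABLED = 1, SESSION_SUSPENDING = 3 };

/* Hypothetical state source: returns 0 and fills *state, or -errno. */
typedef int (*query_state_fn)(void *ctx, unsigned long *state);

static int wait_for_suspending(query_state_fn query, void *ctx)
{
	unsigned long state;
	int keep_polling;
	int ret;

	do {
		keep_polling = 0;
		ret = query(ctx, &state);
		if (ret != 0)
			break;

		switch (state) {
		case SESSION_SUSPENDING:
			break;
		case SESSION_ENABLED:
			keep_polling = 1;	/* the kernel sleeps 1 s here */
			break;
		default:
			ret = -EIO;
			break;
		}
	} while (keep_polling);

	/* An unavailable state query is not treated as fatal. */
	if (ret == -EOPNOTSUPP)
		ret = 0;

	return ret;
}

/* Example source: reports ENABLED until the countdown in ctx hits zero. */
static int enabled_then_suspending(void *ctx, unsigned long *state)
{
	int *polls_left = ctx;

	*state = (*polls_left)-- > 0 ? SESSION_ENABLED : SESSION_SUSPENDING;
	return 0;
}

/* Example source: the query itself is unsupported. */
static int unsupported(void *ctx, unsigned long *state)
{
	(void)ctx;
	(void)state;
	return -EOPNOTSUPP;
}
```

Both example sources make the wait return 0: the first after the state changes, the second immediately, matching the patch's decision to proceed when H_VASI_STATE is unavailable.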
[PATCH 11/29] powerpc/pseries/mobility: use rtas_activate_firmware() on resume
It's incorrect to abort post-suspend processing if ibm,activate-firmware isn't available. Use rtas_activate_firmware(), which logs this condition appropriately and allows us to proceed. Signed-off-by: Nathan Lynch --- arch/powerpc/platforms/pseries/mobility.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index ef8f5641e700..dc6abf164db7 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -312,21 +312,8 @@ int pseries_devicetree_update(s32 scope) void post_mobility_fixup(void) { int rc; - int activate_fw_token; - activate_fw_token = rtas_token("ibm,activate-firmware"); - if (activate_fw_token == RTAS_UNKNOWN_SERVICE) { - printk(KERN_ERR "Could not make post-mobility " - "activate-fw call.\n"); - return; - } - - do { - rc = rtas_call(activate_fw_token, 0, 1, NULL); - } while (rtas_busy_delay(rc)); - - if (rc) - printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc); + rtas_activate_firmware(); /* * We don't want CPUs to go online/offline while the device -- 2.25.4
[PATCH 10/29] powerpc/pseries/mobility: error message improvements
- Convert printk(KERN_ERR) to pr_err(). - Include errno in property update failure message. - Remove reference to "Post-mobility" from device tree update message: with pr_err() it will have a "mobility:" prefix. Signed-off-by: Nathan Lynch --- arch/powerpc/platforms/pseries/mobility.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index d85799b5464a..ef8f5641e700 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -208,8 +208,8 @@ static int update_dt_node(__be32 phandle, s32 scope) rc = update_dt_property(dn, &prop, prop_name, vd, prop_data); if (rc) { - printk(KERN_ERR "Could not update %s" - " property\n", prop_name); + pr_err("updating %s property failed: %d\n", + prop_name, rc); } prop_data += vd; @@ -343,8 +343,7 @@ void post_mobility_fixup(void) rc = pseries_devicetree_update(MIGRATION_SCOPE); if (rc) - printk(KERN_ERR "Post-mobility device tree update " - "failed: %d\n", rc); + pr_err("device tree update failed: %d\n", rc); cacheinfo_rebuild(); -- 2.25.4
[PATCH 09/29] powerpc/pseries/mobility: add missing break to default case
update_dt_node() has a switch statement where the default case lacks a break statement. Signed-off-by: Nathan Lynch --- arch/powerpc/platforms/pseries/mobility.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 5bcb6e5cc0f2..d85799b5464a 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -213,6 +213,7 @@ static int update_dt_node(__be32 phandle, s32 scope) } prop_data += vd; + break; } cond_resched(); -- 2.25.4
[PATCH 08/29] powerpc/pseries/mobility: don't error on absence of ibm,update-nodes
Treat the absence of the ibm,update-nodes function as benign instead
of reporting an error. If the platform does not provide that facility,
it's not a problem for Linux.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/platforms/pseries/mobility.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index b6de65cbfcd9..5bcb6e5cc0f2 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -261,7 +261,7 @@ int pseries_devicetree_update(s32 scope)
 
 	update_nodes_token = rtas_token("ibm,update-nodes");
 	if (update_nodes_token == RTAS_UNKNOWN_SERVICE)
-		return -EINVAL;
+		return 0;
 
 	rtas_buf = kzalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL);
 	if (!rtas_buf)
--
2.25.4
[PATCH 07/29] powerpc/hvcall: add token and codes for H_VASI_SIGNAL
H_VASI_SIGNAL can be used by a partition to request cancellation of its migration. To be used in future changes. Signed-off-by: Nathan Lynch --- arch/powerpc/include/asm/hvcall.h | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index c1fbccb04390..c98f5141e3fc 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -155,6 +155,14 @@ #define H_VASI_RESUMED 5 #define H_VASI_COMPLETED6 +/* VASI signal codes. Only the Cancel code is valid for H_VASI_SIGNAL. */ +#define H_VASI_SIGNAL_CANCEL1 +#define H_VASI_SIGNAL_ABORT 2 +#define H_VASI_SIGNAL_SUSPEND 3 +#define H_VASI_SIGNAL_COMPLETE 4 +#define H_VASI_SIGNAL_ENABLE5 +#define H_VASI_SIGNAL_FAILOVER 6 + /* Each control block has to be on a 4K boundary */ #define H_CB_ALIGNMENT 4096 @@ -261,6 +269,7 @@ #define H_ADD_CONN 0x284 #define H_DEL_CONN 0x288 #define H_JOIN 0x298 +#define H_VASI_SIGNAL 0x2A0 #define H_VASI_STATE0x2A4 #define H_VIOCTL 0x2A8 #define H_ENABLE_CRQ 0x2B0 -- 2.25.4
[PATCH 05/29] powerpc/rtas: add rtas_ibm_suspend_me()
Now that the name is available, provide a simple wrapper for ibm,suspend-me which returns both a Linux errno and optionally the actual RTAS status to the caller. Signed-off-by: Nathan Lynch --- arch/powerpc/include/asm/rtas.h | 1 + arch/powerpc/kernel/rtas.c | 57 + 2 files changed, 58 insertions(+) diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index 8436ed01567b..b43165fc6c2a 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -258,6 +258,7 @@ extern void rtas_progress(char *s, unsigned short hex); extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data); extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data); int rtas_ibm_suspend_me_unsafe(u64 handle); +int rtas_ibm_suspend_me(int *fw_status); struct rtc_time; extern time64_t rtas_get_boot_time(void); diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 33adefa84a42..70c570269d7b 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -684,6 +684,63 @@ int rtas_set_indicator_fast(int indicator, int index, int new_value) return rc; } +/** + * rtas_ibm_suspend_me() - Call ibm,suspend-me to suspend the LPAR. + * + * @fw_status: RTAS call status will be placed here if not NULL. + * + * rtas_ibm_suspend_me() should be called only on a CPU which has + * received H_CONTINUE from the H_JOIN hcall. All other active CPUs + * should be waiting to return from H_JOIN. + * + * rtas_ibm_suspend_me() may suspend execution of the OS + * indefinitely. Callers should take appropriate measures upon return, such as + * resetting watchdog facilities. + * + * Callers may choose to retry this call if @fw_status is + * %RTAS_THREADS_ACTIVE. + * + * Return: + * 0 - The partition has resumed from suspend, possibly after + * migration to a different host. + * -ECANCELED - The operation was aborted. + * -EAGAIN- There were other CPUs not in H_JOIN at the time of the call. 
+ * -EBUSY - Some other condition prevented the suspend from succeeding. + * -EIO - Hardware/platform error. + */ +int rtas_ibm_suspend_me(int *fw_status) +{ + int fwrc; + int ret; + + fwrc = rtas_call(rtas_token("ibm,suspend-me"), 0, 1, NULL); + + switch (fwrc) { + case 0: + ret = 0; + break; + case RTAS_SUSPEND_ABORTED: + ret = -ECANCELED; + break; + case RTAS_THREADS_ACTIVE: + ret = -EAGAIN; + break; + case RTAS_NOT_SUSPENDABLE: + case RTAS_OUTSTANDING_COPROC: + ret = -EBUSY; + break; + case -1: + default: + ret = -EIO; + break; + } + + if (fw_status) + *fw_status = fwrc; + + return ret; +} + void __noreturn rtas_restart(char *cmd) { if (rtas_flash_term_hook) -- 2.25.4
[PATCH 00/29] partition suspend updates
This series aims to improve the pseries-specific partition migration
and hibernation implementation, part of which has been living in
kernel/rtas.c. Most of that code is eliminated or moved to
platforms/pseries, and the following major functional changes are
made:

- Use stop_machine() instead of on_each_cpu() to avoid deadlock in the
  join/suspend sequence.

- Retry the join/suspend sequence on errors that are likely to be
  transient. This is a mitigation for the fact that drivers currently
  have no way to prepare for an impending partition suspension,
  sometimes resulting in a virtual adapter being in a state which
  causes the platform to fail the suspend call.

- Request cancellation of the migration via H_VASI_SIGNAL if Linux is
  going to error out of the suspend attempt. This allows the
  management console and other entities to promptly clean up their
  operations instead of relying on long timeouts to fail the
  migration.

- Little-endian users of ibm,suspend-me, ibm,update-nodes and
  ibm,update-properties via sys_rtas are blocked when
  CONFIG_PPC_RTAS_FILTER is enabled.

- Legacy user space code (drmgr) historically has driven the migration
  process by using sys_rtas to separately call ibm,suspend-me,
  ibm,activate-firmware, and ibm,update-nodes/properties, in that
  order. With these changes, when sys_rtas() dispatches
  ibm,suspend-me, the kernel performs the device tree update and
  firmware activation before returning. This is more reliable, and
  drmgr does not seem bothered by it.

- If the H_VASI_STATE hcall is absent, the implementation proceeds
  with the suspend instead of erroring out. This allows us to exercise
  these code paths in QEMU.
Nathan Lynch (29): powerpc/rtas: move rtas_call_reentrant() out of pseries guards powerpc/rtas: prevent suspend-related sys_rtas use on LE powerpc/rtas: complete ibm,suspend-me status codes powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe powerpc/rtas: add rtas_ibm_suspend_me() powerpc/rtas: add rtas_activate_firmware() powerpc/hvcall: add token and codes for H_VASI_SIGNAL powerpc/pseries/mobility: don't error on absence of ibm,update-nodes powerpc/pseries/mobility: add missing break to default case powerpc/pseries/mobility: error message improvements powerpc/pseries/mobility: use rtas_activate_firmware() on resume powerpc/pseries/mobility: extract VASI session polling logic powerpc/pseries/mobility: use stop_machine for join/suspend powerpc/pseries/mobility: signal suspend cancellation to platform powerpc/pseries/mobility: retry partition suspend after error powerpc/rtas: dispatch partition migration requests to pseries powerpc/rtas: remove rtas_ibm_suspend_me_unsafe() powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops powerpc/pseries/hibernation: pass stream id via function arguments powerpc/pseries/hibernation: remove pseries_suspend_cpu() powerpc/machdep: remove suspend_disable_cpu() powerpc/rtas: remove rtas_suspend_cpu() powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me() powerpc/rtas: remove unused rtas_suspend_last_cpu() powerpc/pseries/hibernation: remove redundant cacheinfo update powerpc/pseries/hibernation: perform post-suspend fixups later powerpc/pseries/hibernation: remove prepare_late() callback powerpc/rtas: remove unused rtas_suspend_me_data powerpc/pseries/mobility: refactor node lookup during DT update arch/powerpc/include/asm/hvcall.h | 9 + arch/powerpc/include/asm/machdep.h| 1 - arch/powerpc/include/asm/rtas-types.h | 8 - arch/powerpc/include/asm/rtas.h | 13 +- arch/powerpc/kernel/rtas.c| 245 ++- arch/powerpc/platforms/pseries/mobility.c | 361 ++ arch/powerpc/platforms/pseries/suspend.c | 79 + 7 
files changed, 420 insertions(+), 296 deletions(-) -- 2.25.4
[PATCH 04/29] powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe
The pseries partition suspend sequence requires that all active CPUs call H_JOIN, which suspends all but one of them with interrupts disabled. The "chosen" CPU is then to call ibm,suspend-me to complete the suspend. Upon returning from ibm,suspend-me, the chosen CPU is to use H_PROD to wake the joined CPUs. Using on_each_cpu() for this, as rtas_ibm_suspend_me() does to implement partition migration, is susceptible to deadlock with other users of on_each_cpu() and with users of stop_machine APIs. The callback passed to on_each_cpu() is not allowed to synchronize with other CPUs in the way it is used here. Complicating the fix is the fact that rtas_ibm_suspend_me() also occupies the function name that should be used to provide a more conventional wrapper for ibm,suspend-me. Rename rtas_ibm_suspend_me() to rtas_ibm_suspend_me_unsafe() to free up the name and indicate that it should not gain users. Signed-off-by: Nathan Lynch --- arch/powerpc/include/asm/rtas.h | 2 +- arch/powerpc/kernel/rtas.c| 6 +++--- arch/powerpc/platforms/pseries/mobility.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index f060181a0d32..8436ed01567b 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -257,7 +257,7 @@ extern int rtas_set_indicator_fast(int indicator, int index, int new_value); extern void rtas_progress(char *s, unsigned short hex); extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data); extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data); -extern int rtas_ibm_suspend_me(u64 handle); +int rtas_ibm_suspend_me_unsafe(u64 handle); struct rtc_time; extern time64_t rtas_get_boot_time(void); diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 132b2ae39009..33adefa84a42 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -843,7 +843,7 @@ static void rtas_percpu_suspend_me(void *info) 
__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1); } -int rtas_ibm_suspend_me(u64 handle) +int rtas_ibm_suspend_me_unsafe(u64 handle) { long state; long rc; @@ -898,7 +898,7 @@ int rtas_ibm_suspend_me(u64 handle) return atomic_read(&data.error); } #else /* CONFIG_PPC_PSERIES */ -int rtas_ibm_suspend_me(u64 handle) +int rtas_ibm_suspend_me_unsafe(u64 handle) { return -ENOSYS; } @@ -1184,7 +1184,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) int rc = 0; u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32) | be32_to_cpu(args.args[1]); - rc = rtas_ibm_suspend_me(handle); + rc = rtas_ibm_suspend_me_unsafe(handle); if (rc == -EAGAIN) args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE); else if (rc == -EIO) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index d6f4162478a5..b6de65cbfcd9 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -370,7 +370,7 @@ static ssize_t migration_store(struct class *class, return rc; do { - rc = rtas_ibm_suspend_me(streamid); + rc = rtas_ibm_suspend_me_unsafe(streamid); if (rc == -EAGAIN) ssleep(1); } while (rc == -EAGAIN); -- 2.25.4
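The syscall path builds the 64-bit handle from two big-endian 32-bit arguments. A user-space sketch of that assembly follows, assuming a POSIX system: `handle_from_be32_pair()` is a hypothetical name and `ntohl()` stands in for the kernel's `be32_to_cpu()`.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* ntohl(): big-endian (network order) to host */

/*
 * Combine two big-endian 32-bit words into a host-order 64-bit handle,
 * most significant word first, as sys_rtas does with args[0] and
 * args[1].
 */
static uint64_t handle_from_be32_pair(uint32_t hi_be, uint32_t lo_be)
{
	return ((uint64_t)ntohl(hi_be) << 32) | ntohl(lo_be);
}
```

The cast to a 64-bit type before the shift matters: shifting a 32-bit value left by 32 is undefined in C, which is why the kernel code casts to u64 first as well.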
[PATCH 06/29] powerpc/rtas: add rtas_activate_firmware()
Provide a documented wrapper function for the ibm,activate-firmware service, which must be called after a partition migration or hibernation. If the function is absent or the call fails, the OS will continue to run normally with the current firmware, so there is no need to perform any recovery. Just log it and continue. Signed-off-by: Nathan Lynch --- arch/powerpc/include/asm/rtas.h | 1 + arch/powerpc/kernel/rtas.c | 30 ++ 2 files changed, 31 insertions(+) diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index b43165fc6c2a..fdefe6a974eb 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -247,6 +247,7 @@ extern void __noreturn rtas_restart(char *cmd); extern void rtas_power_off(void); extern void __noreturn rtas_halt(void); extern void rtas_os_term(char *str); +void rtas_activate_firmware(void); extern int rtas_get_sensor(int sensor, int index, int *state); extern int rtas_get_sensor_fast(int sensor, int index, int *state); extern int rtas_get_power_level(int powerdomain, int *level); diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 70c570269d7b..58bbd69a233f 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -961,6 +961,36 @@ int rtas_ibm_suspend_me_unsafe(u64 handle) } #endif +/** + * rtas_activate_firmware() - Activate a new version of firmware. + * + * Activate a new version of partition firmware. The OS must call this + * after resuming from a partition hibernation or migration in order + * to maintain the ability to perform live firmware updates. It's not + * catastrophic for this method to be absent or to fail; just log the + * condition in that case. + * + * Context: This function may sleep. 
+ */ +void rtas_activate_firmware(void) +{ + int token; + int fwrc; + + token = rtas_token("ibm,activate-firmware"); + if (token == RTAS_UNKNOWN_SERVICE) { + pr_notice("ibm,activate-firmware method unavailable\n"); + return; + } + + do { + fwrc = rtas_call(token, 0, 1, NULL); + } while (rtas_busy_delay(fwrc)); + + if (fwrc) + pr_err("ibm,activate-firmware failed (%i)\n", fwrc); +} + /** * rtas_call_reentrant() - Used for reentrant rtas calls * @token: Token for desired reentrant RTAS call -- 2.25.4
[PATCH 03/29] powerpc/rtas: complete ibm,suspend-me status codes
We don't completely account for the possible return codes for ibm,suspend-me. Add definitions for these. Signed-off-by: Nathan Lynch --- arch/powerpc/include/asm/rtas.h | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index 55f9a154c95d..f060181a0d32 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -23,11 +23,16 @@ #define RTAS_RMOBUF_MAX (64 * 1024) /* RTAS return status codes */ -#define RTAS_NOT_SUSPENDABLE -9004 #define RTAS_BUSY -2/* RTAS Busy */ #define RTAS_EXTENDED_DELAY_MIN9900 #define RTAS_EXTENDED_DELAY_MAX9905 +/* statuses specific to ibm,suspend-me */ +#define RTAS_SUSPEND_ABORTED 9000 /* Suspension aborted */ +#define RTAS_NOT_SUSPENDABLE-9004 /* Partition not suspendable */ +#define RTAS_THREADS_ACTIVE -9005 /* Multiple processor threads active */ +#define RTAS_OUTSTANDING_COPROC -9006 /* Outstanding coprocessor operations */ + /* * In general to call RTAS use rtas_token("string") to lookup * an RTAS token for the given string (e.g. "event-scan"). -- 2.25.4
[PATCH 02/29] powerpc/rtas: prevent suspend-related sys_rtas use on LE
While drmgr has had work in some areas to make its RTAS syscall
interactions endian-neutral, its code for performing partition
migration via the syscall has never worked on LE. While it is able to
complete ibm,suspend-me successfully, it crashes when attempting the
subsequent ibm,update-nodes call.

drmgr is the only known (or plausible) user of ibm,suspend-me,
ibm,update-nodes, and ibm,update-properties, so allow them only in
big-endian configurations.

Signed-off-by: Nathan Lynch
---
 arch/powerpc/kernel/rtas.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index b40fc892138b..132b2ae39009 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1049,9 +1049,11 @@ static struct rtas_filter rtas_filters[] __ro_after_init = {
 	{ "set-time-for-power-on", -1, -1, -1, -1, -1 },
 	{ "ibm,set-system-parameter", -1, 1, -1, -1, -1 },
 	{ "set-time-of-day", -1, -1, -1, -1, -1 },
+#ifdef CONFIG_CPU_BIG_ENDIAN
 	{ "ibm,suspend-me", -1, -1, -1, -1, -1 },
 	{ "ibm,update-nodes", -1, 0, -1, -1, -1, 4096 },
 	{ "ibm,update-properties", -1, 0, -1, -1, -1, 4096 },
+#endif
 	{ "ibm,physical-attestation", -1, 0, 1, -1, -1 },
 };
 
--
2.25.4
[PATCH 01/29] powerpc/rtas: move rtas_call_reentrant() out of pseries guards
rtas_call_reentrant() isn't platform-dependent; move it out of CONFIG_PPC_PSERIES-guarded code. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 954f41676f69..b40fc892138b 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -897,6 +897,12 @@ int rtas_ibm_suspend_me(u64 handle) return atomic_read(&data.error); } +#else /* CONFIG_PPC_PSERIES */ +int rtas_ibm_suspend_me(u64 handle) +{ + return -ENOSYS; +} +#endif /** * rtas_call_reentrant() - Used for reentrant rtas calls @@ -948,13 +954,6 @@ int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...) return ret; } -#else /* CONFIG_PPC_PSERIES */ -int rtas_ibm_suspend_me(u64 handle) -{ - return -ENOSYS; -} -#endif - /** * Find a specific pseries error log in an RTAS extended event log. * @log: RTAS error/event log -- 2.25.4
Re: [PATCH] powerpc: add support for TIF_NOTIFY_SIGNAL
Jens Axboe writes:
> Wire up TIF_NOTIFY_SIGNAL handling for powerpc.
>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Jens Axboe
> ---
>
> 5.11 has support queued up for TIF_NOTIFY_SIGNAL, see this posting
> for details:
>
> https://lore.kernel.org/io-uring/20201026203230.386348-1-ax...@kernel.dk/
>
> As part of that work, I'm adding TIF_NOTIFY_SIGNAL support to all archs,
> as that will enable a set of cleanups once all of them support it. I'm
> happy carrying this patch if need be, or it can be funnelled through the
> arch tree. Let me know.

Happy for you to take it along with the rest of the series.

Acked-by: Michael Ellerman

cheers

> diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
> index 46a210b03d2b..53115ae61495 100644
> --- a/arch/powerpc/include/asm/thread_info.h
> +++ b/arch/powerpc/include/asm/thread_info.h
> @@ -90,6 +90,7 @@ void arch_setup_new_exec(void);
>  #define TIF_SYSCALL_TRACE	0	/* syscall trace active */
>  #define TIF_SIGPENDING		1	/* signal pending */
>  #define TIF_NEED_RESCHED	2	/* rescheduling necessary */
> +#define TIF_NOTIFY_SIGNAL	3	/* signal notifications exist */
>  #define TIF_SYSCALL_EMU		4	/* syscall emulation active */
>  #define TIF_RESTORE_TM		5	/* need to restore TM FP/VEC/VSX */
>  #define TIF_PATCH_PENDING	6	/* pending live patching update */
> @@ -115,6 +116,7 @@ void arch_setup_new_exec(void);
>  #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
>  #define _TIF_SIGPENDING		(1<<TIF_SIGPENDING)
>  #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
> +#define _TIF_NOTIFY_SIGNAL	(1<<TIF_NOTIFY_SIGNAL)
>  #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
>  #define _TIF_32BIT		(1<<TIF_32BIT)
>  #define _TIF_RESTORE_TM		(1<<TIF_RESTORE_TM)
> @@ -136,7 +138,8 @@ void arch_setup_new_exec(void);
>
>  #define _TIF_USER_WORK_MASK	(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
>  				 _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
> -				 _TIF_RESTORE_TM | _TIF_PATCH_PENDING)
> +				 _TIF_RESTORE_TM | _TIF_PATCH_PENDING | \
> +				 _TIF_NOTIFY_SIGNAL)
>  #define _TIF_PERSYSCALL_MASK	(_TIF_RESTOREALL|_TIF_NOERROR)
>
>  /* Bits in local_flags */
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index d2c356f37077..a8bb0aca1d02 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -318,7 +318,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
>  	if (thread_info_flags & _TIF_PATCH_PENDING)
>  		klp_update_patch_state(current);
>
> -	if (thread_info_flags & _TIF_SIGPENDING) {
> +	if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) {
>  		BUG_ON(regs != current->thread.regs);
>  		do_signal(current);
>  	}
> --
> 2.29.0
>
> --
> Jens Axboe
Test Results: RE: [V2,03/18] highmem: Provide generic variant of kmap_atomic*
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2, 05/18] arc/mm/highmem: Use generic kmap atomic implementation
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2, 07/18] csky/mm/highmem: Switch to generic kmap atomic
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Re: [patch V2 00/18] mm/highmem: Preemptible variant of kmap_atomic & friends
On Thu, Oct 29 2020 at 16:11, Linus Torvalds wrote:
> On Thu, Oct 29, 2020 at 3:32 PM Thomas Gleixner wrote:
>>
>> Though I wanted to share the current state of affairs before investigating
>> that further. If there is consensus in going forward with this, I'll have a
>> deeper look into this issue.
>
> Me likee. I think this looks like the right thing to do.
>
> I didn't actually apply the patches, but just from reading them it
> _looks_ to me like you do the migrate_disable() unconditionally, even
> if it's not a highmem page..
>
> That sounds like it might be a good thing for debugging, but not
> necessarily great in general.
>
> Or am I misreading things?

No, you're not misreading it, but doing it conditionally would be a
complete semantical disaster.

kmap_atomic*() also disables preemption and pagefaults unconditionally.
If that wouldn't be the case then every caller would have to have
conditionals like 'if (CONFIG_HIGHMEM)' or worse 'if (PageHighMem(page))'.

Let's not go there.

Migrate disable is a less horrible plague than preempt and pagefault
disable even if the scheduler people disagree due to the lack of theory
backing that up :)

The charm of the new interface is that users still can rely on per
cpuness independent of being on a highmem plagued system. For non
highmem systems the extra migrate disable/enable is really a minor
nuisance.

Thanks,

        tglx
Test Results: RE: [V2,06/18] ARM: highmem: Switch to generic kmap atomic
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2,08/18] microblaze/mm/highmem: Switch to generic kmap atomic
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2, 09/18] mips/mm/highmem: Switch to generic kmap atomic
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2,10/18] nds32/mm/highmem: Switch to generic kmap atomic
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2,11/18] powerpc/mm/highmem: Switch to generic kmap atomic
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2,12/18] sparc/mm/highmem: Switch to generic kmap atomic
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2,13/18] xtensa/mm/highmem: Switch to generic kmap atomic
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2,16/18] sched: highmem: Store local kmaps in task struct
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2,14/18] mm/highmem: Remove the old kmap_atomic cruft
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2,15/18] io-mapping: Cleanup atomic iomap
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
On Thu, 2020-10-29 at 10:12 +0200, Mike Rapoport wrote:
> This series goal was primarily to separate dependencies and make it
> clearer what DEBUG_PAGEALLOC and what SET_DIRECT_MAP are. As it turned
> out, there is also some lack of consistency between architectures that
> implement either of this so I tried to improve this as well.
>
> Honestly, I don't know if a thread can be paused at the time __vunmap()
> left invalid pages, but it could, there is an issue on arm64 with
> DEBUG_PAGEALLOC=n and this set fixes it.

Ah, ok. So from this and the other thread, this is about the logic in
arm's cpa for when it will try the un/map operations. I think the logic
actually works currently. And this series introduces a problem on ARM
similar to the one you are saying preexists. I put the details in the
other thread.
Re: [PATCH 2/4] PM: hibernate: improve robustness of mapping pages in the direct map
On Thu, 2020-10-29 at 09:54 +0200, Mike Rapoport wrote:
> __kernel_map_pages() on arm64 will also bail out if rodata_full is
> false:
> void __kernel_map_pages(struct page *page, int numpages, int enable)
> {
> 	if (!debug_pagealloc_enabled() && !rodata_full)
> 		return;
>
> 	set_memory_valid((unsigned long)page_address(page), numpages,
> 			 enable);
> }
>
> So using set_direct_map() to map back pages removed from the direct
> map with __kernel_map_pages() seems safe to me.

Heh, one of us must have some simple boolean error in our head. I hope
it's not me! :) I'll try one more time.

__kernel_map_pages() will bail out if rodata_full is false **AND** debug
page alloc is off. So it will only bail under conditions where there
could be nothing unmapped on the direct map.

Equivalent logic would be:
	if (!(debug_pagealloc_enabled() || rodata_full))
		return;

Or:
	if (debug_pagealloc_enabled() || rodata_full)
		set_memory_valid(blah)

So if either is on, the existing code will try to re-map. But the
set_direct_map_()'s will only work if rodata_full is on. So switching
hibernate to set_direct_map() will cause the remap to be missed for the
debug page alloc case, with !rodata_full.

It also breaks normal debug page alloc usage with !rodata_full for
similar reasons after patch 3. The pages would never get unmapped.
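The boolean argument in this reply can be checked mechanically. The following is a minimal userspace sketch, not kernel code: the model_* functions are hypothetical names, and model_set_direct_map_acts() simply encodes the claim made in the mail that the set_direct_map_()'s only take effect when rodata_full is on. It enumerates all four configurations and counts those where the existing path would remap but the proposed one would not.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the quoted arm64 guard: bail out only if BOTH options are off. */
static bool model_kernel_map_pages_acts(bool debug_pagealloc, bool rodata_full)
{
	if (!debug_pagealloc && !rodata_full)
		return false;
	return true;
}

/* Assumption taken from the mail: effective only when rodata_full is on. */
static bool model_set_direct_map_acts(bool debug_pagealloc, bool rodata_full)
{
	(void)debug_pagealloc;
	return rodata_full;
}

/* The two guard spellings given in the mail agree for every input. */
static void model_check_equivalent_forms(void)
{
	for (int d = 0; d <= 1; d++)
		for (int r = 0; r <= 1; r++) {
			bool bail_and = !d && !r;
			bool bail_or = !(d || r);

			assert(bail_and == bail_or);
		}
}

/* Count configurations where the old path remaps and the new one does not. */
static int model_count_missed_remaps(void)
{
	int missed = 0;

	for (int d = 0; d <= 1; d++)
		for (int r = 0; r <= 1; r++)
			if (model_kernel_map_pages_acts(d, r) &&
			    !model_set_direct_map_acts(d, r))
				missed++;
	return missed;
}
```

Under these assumptions exactly one configuration is missed: debug page alloc on with rodata_full off, which is the case the mail points at.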
Re: [patch V2 00/18] mm/highmem: Preemptible variant of kmap_atomic & friends
On Thu, Oct 29, 2020 at 3:32 PM Thomas Gleixner wrote:
>
> Though I wanted to share the current state of affairs before investigating
> that further. If there is consensus in going forward with this, I'll have a
> deeper look into this issue.

Me likee. I think this looks like the right thing to do.

I didn't actually apply the patches, but just from reading them it
_looks_ to me like you do the migrate_disable() unconditionally, even
if it's not a highmem page..

That sounds like it might be a good thing for debugging, but not
necessarily great in general.

Or am I misreading things?

                Linus
Test Results: RE: [V2,17/18] mm/highmem: Provide kmap_local*
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Test Results: RE: [V2,18/18] io-mapping: Provide iomap_local variant
Thanks for your contribution, unfortunately we've found some issues. Your patch failed to apply to any branch.
Re: [PATCH 12/13] PCI: dwc: Move dw_pcie_setup_rc() to DWC common code
On 10/28/20, 4:47 PM, Rob Herring wrote: > > All RC complex drivers must call dw_pcie_setup_rc(). The ordering of the > call shouldn't be too important other than being after any RC resets. > > There's a few calls of dw_pcie_setup_rc() left as drivers implementing > suspend/resume need it. > > Cc: Kishon Vijay Abraham I > Cc: Lorenzo Pieralisi > Cc: Bjorn Helgaas > Cc: Jingoo Han Acked-by: Jingoo Han Best regards, Jingoo Han > Cc: Kukjin Kim > Cc: Krzysztof Kozlowski > Cc: Richard Zhu > Cc: Lucas Stach > Cc: Shawn Guo > Cc: Sascha Hauer > Cc: Pengutronix Kernel Team > Cc: Fabio Estevam > Cc: NXP Linux Team > Cc: Murali Karicheri > Cc: Minghuan Lian > Cc: Mingkai Hu > Cc: Roy Zang > Cc: Yue Wang > Cc: Kevin Hilman > Cc: Neil Armstrong > Cc: Jerome Brunet > Cc: Martin Blumenstingl > Cc: Thomas Petazzoni > Cc: Jesper Nilsson > Cc: Gustavo Pimentel > Cc: Xiaowei Song > Cc: Binghui Wang > Cc: Andy Gross > Cc: Bjorn Andersson > Cc: Stanimir Varbanov > Cc: Pratyush Anand > Cc: Kunihiko Hayashi > Cc: Masahiro Yamada > Cc: linux-o...@vger.kernel.org > Cc: linux-samsung-...@vger.kernel.org > Cc: linuxppc-dev@lists.ozlabs.org > Cc: linux-amlo...@lists.infradead.org > Cc: linux-arm-ker...@axis.com > Cc: linux-arm-...@vger.kernel.org > Signed-off-by: Rob Herring > --- > drivers/pci/controller/dwc/pci-dra7xx.c | 1 - > drivers/pci/controller/dwc/pci-exynos.c | 1 - > drivers/pci/controller/dwc/pci-imx6.c | 1 - > drivers/pci/controller/dwc/pci-keystone.c | 2 -- > drivers/pci/controller/dwc/pci-layerscape.c | 2 -- > drivers/pci/controller/dwc/pci-meson.c| 2 -- > drivers/pci/controller/dwc/pcie-armada8k.c| 2 -- > drivers/pci/controller/dwc/pcie-artpec6.c | 1 - > drivers/pci/controller/dwc/pcie-designware-host.c | 1 + > drivers/pci/controller/dwc/pcie-designware-plat.c | 8 > drivers/pci/controller/dwc/pcie-histb.c | 3 --- > drivers/pci/controller/dwc/pcie-kirin.c | 2 -- > drivers/pci/controller/dwc/pcie-qcom.c| 1 - > drivers/pci/controller/dwc/pcie-spear13xx.c | 2 -- > 
drivers/pci/controller/dwc/pcie-uniphier.c| 2 -- > 15 files changed, 1 insertion(+), 30 deletions(-) [...]
[patch V2 18/18] io-mapping: Provide iomap_local variant
Similar to kmap local provide a iomap local variant which only disables migration, but neither disables pagefaults nor preemption. Signed-off-by: Thomas Gleixner --- V2: Split out from the large combo patch and add the !IOMAP_ATOMIC variants --- include/linux/io-mapping.h | 34 -- 1 file changed, 32 insertions(+), 2 deletions(-) --- a/include/linux/io-mapping.h +++ b/include/linux/io-mapping.h @@ -83,6 +83,23 @@ io_mapping_unmap_atomic(void __iomem *va } static inline void __iomem * +io_mapping_map_local_wc(struct io_mapping *mapping, unsigned long offset) +{ + resource_size_t phys_addr; + + BUG_ON(offset >= mapping->size); + phys_addr = mapping->base + offset; + migrate_disable(); + return __iomap_local_pfn_prot(PHYS_PFN(phys_addr), mapping->prot); +} + +static inline void io_mapping_unmap_local(void __iomem *vaddr) +{ + kunmap_local_indexed((void __force *)vaddr); + migrate_enable(); +} + +static inline void __iomem * io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset, unsigned long size) @@ -101,7 +118,7 @@ io_mapping_unmap(void __iomem *vaddr) iounmap(vaddr); } -#else +#else /* HAVE_ATOMIC_IOMAP */ #include @@ -166,7 +183,20 @@ io_mapping_unmap_atomic(void __iomem *va preempt_enable(); } -#endif /* HAVE_ATOMIC_IOMAP */ +static inline void __iomem * +io_mapping_map_local_wc(struct io_mapping *mapping, unsigned long offset) +{ + migrate_disable(); + return io_mapping_map_wc(mapping, offset, PAGE_SIZE); +} + +static inline void io_mapping_unmap_local(void __iomem *vaddr) +{ + io_mapping_unmap(vaddr); + migrate_enable(); +} + +#endif /* !HAVE_ATOMIC_IOMAP */ static inline struct io_mapping * io_mapping_create_wc(resource_size_t base,
[patch V2 17/18] mm/highmem: Provide kmap_local*
Now that the kmap atomic index is stored in task struct provide a preemptible variant. On context switch the maps of an outgoing task are removed and the map of the incoming task are restored. That's obviously slow, but highmem is slow anyway. The kmap_local.*() functions can be invoked from both preemptible and atomic context. kmap local sections disable migration to keep the resulting virtual mapping address correct, but disable neither pagefaults nor preemption. A wholesale conversion of kmap_atomic to be fully preemptible is not possible because some of the usage sites might rely on the preemption disable for serialization or on the implicit pagefault disable. Needs to be done on a case by case basis. Signed-off-by: Thomas Gleixner --- V2: Make it more consistent and add commentry --- include/linux/highmem.h | 115 +--- 1 file changed, 100 insertions(+), 15 deletions(-) --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -86,17 +86,56 @@ static inline void kunmap(struct page *p } /* - * kmap_atomic/kunmap_atomic is significantly faster than kmap/kunmap because - * no global lock is needed and because the kmap code must perform a global TLB - * invalidation when the kmap pool wraps. - * - * However when holding an atomic kmap it is not legal to sleep, so atomic - * kmaps are appropriate for short, tight code paths only. - * - * The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap - * gives a more generic (and caching) interface. But kmap_atomic can - * be used in IRQ contexts, so in some (very limited) cases we need - * it. + * For highmem systems it is required to temporarily map pages + * which reside in the portion of memory which is not covered + * by the permanent kernel mapping. + * + * This comes in three flavors: + * + * 1) kmap/kunmap: + * + *An interface to acquire longer term mappings with no restrictions + *on preemption and migration. 
This comes with an overhead as the + *mapping space is restricted and protected by a global lock. It + *also requires global TLB invalidation when the kmap pool wraps. + * + *kmap() might block when the mapping space is fully utilized until a + *slot becomes available. Only callable from preemptible thread + *context. + * + * 2) kmap_local.*()/kunmap_local.*() + * + *An interface to acquire short term mappings. Can be invoked from any + *context including interrupts. The mapping is per thread, CPU local + *and not globaly visible. It can only be used in the context which + *acquried the mapping. Nesting kmap_local.*() and kmap_atomic.*() + *mappings is allowed to a certain extent (up to KMAP_TYPE_NR). + * + *Nested kmap_local.*() and kunmap_local.*() invocations have to be + *strictly ordered because the map implementation is stack based. + * + *kmap_local.*() disables migration, but keeps preemption enabled. It's + *valid to take pagefaults in a kmap_local region unless the context in + *which the local kmap is acquired does not allow it for other reasons. + * + *If a task holding local kmaps is preempted, the maps are removed on + *context switch and restored when the task comes back on the CPU. As + *the maps are strictly CPU local it is guaranteed that the task stays + *on the CPU and the CPU cannot be unplugged until the local kmaps are + *released. + * + * 3) kmap_atomic.*()/kunmap_atomic.*() + * + *Based on the same mechanism as kmap local. Atomic kmap disables + *preemption and pagefaults. Only use if absolutely required, use + *the corresponding kmap_local variant if possible. + * + * Local and atomic kmaps are faster than kmap/kunmap, but impose + * restrictions. Only use them when required. + * + * For !HIGHMEM enabled systems the kmap flavours are not doing any mapping + * operation and kmap() won't sleep, but the kmap local and atomic variants + * still disable migration resp. pagefaults and preemption. 
*/ static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) { @@ -122,6 +161,28 @@ static inline void __kunmap_atomic(void kunmap_local_indexed(addr); } +static inline void *kmap_local_page_prot(struct page *page, pgprot_t prot) +{ + migrate_disable(); + return __kmap_local_page_prot(page, prot); +} + +static inline void *kmap_local_page(struct page *page) +{ + return kmap_local_page_prot(page, kmap_prot); +} + +static inline void *kmap_local_pfn(unsigned long pfn) +{ + migrate_disable(); + return __kmap_local_pfn_prot(pfn, kmap_prot); +} + +static inline void __kunmap_local(void *vaddr) +{ + kunmap_local_indexed(vaddr); +} + /* declarations for linux/mm/highmem.c */ unsigned int nr_free_highpages(void); extern atomic_long_t _totalhigh_pages; @@ -201,10 +262,27 @@ static inline void *kmap_atomic_pfn(unsi static inline void __kunmap_atomic(void *addr) { - /* -* Mostly nothing to do in the
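The ordering rule stated in the comment of the patch above (nested kmap_local.*() and kunmap_local.*() calls must be strictly ordered because the implementation is stack based) can be illustrated with a small userspace model. Everything here is hypothetical: struct model_kmap_stack, model_kmap_local(), model_kunmap_local() and the depth MODEL_KM_TYPE_NR are invented for the sketch and are not the kernel's implementation, which works on fixmap slots and page table entries.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_KM_TYPE_NR 4	/* assumed nesting depth for the model only */

/* Hypothetical per-context stack of live local mappings. */
struct model_kmap_stack {
	int idx;				/* number of live mappings */
	const void *slot[MODEL_KM_TYPE_NR];	/* what each slot maps */
};

/* Map: take the next free slot. Returns the slot index, or -1 if full. */
static int model_kmap_local(struct model_kmap_stack *s, const void *page)
{
	assert(s != NULL);
	if (s->idx >= MODEL_KM_TYPE_NR)
		return -1;
	s->slot[s->idx] = page;
	return s->idx++;
}

/*
 * Unmap: only the most recently taken slot may be released.
 * Returns 0 on success, -1 for an out-of-order (or spurious) unmap.
 */
static int model_kunmap_local(struct model_kmap_stack *s, int slot)
{
	assert(s != NULL);
	if (s->idx == 0 || slot != s->idx - 1)
		return -1;
	s->slot[--s->idx] = NULL;
	return 0;
}
```

In this model, mapping A then B and releasing A first is rejected; releasing B and then A succeeds and leaves the stack empty. The model reports the misuse with a return value purely so it can be tested.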
[patch V2 15/18] io-mapping: Cleanup atomic iomap
Switch the atomic iomap implementation over to kmap_local and stick the preempt/pagefault mechanics into the generic code similar to the kmap_atomic variants. Rename the x86 map function in preparation for a non-atomic variant. Signed-off-by: Thomas Gleixner --- V2: New patch to make review easier --- arch/x86/include/asm/iomap.h |9 + arch/x86/mm/iomap_32.c |6 ++ include/linux/io-mapping.h |8 ++-- 3 files changed, 9 insertions(+), 14 deletions(-) --- a/arch/x86/include/asm/iomap.h +++ b/arch/x86/include/asm/iomap.h @@ -13,14 +13,7 @@ #include #include -void __iomem *iomap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot); - -static inline void iounmap_atomic(void __iomem *vaddr) -{ - kunmap_local_indexed((void __force *)vaddr); - pagefault_enable(); - preempt_enable(); -} +void __iomem *__iomap_local_pfn_prot(unsigned long pfn, pgprot_t prot); int iomap_create_wc(resource_size_t base, unsigned long size, pgprot_t *prot); --- a/arch/x86/mm/iomap_32.c +++ b/arch/x86/mm/iomap_32.c @@ -44,7 +44,7 @@ void iomap_free(resource_size_t base, un } EXPORT_SYMBOL_GPL(iomap_free); -void __iomem *iomap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot) +void __iomem *__iomap_local_pfn_prot(unsigned long pfn, pgprot_t prot) { /* * For non-PAT systems, translate non-WB request to UC- just in @@ -60,8 +60,6 @@ void __iomem *iomap_atomic_pfn_prot(unsi /* Filter out unsupported __PAGE_KERNEL* bits: */ pgprot_val(prot) &= __default_kernel_pte_mask; - preempt_disable(); - pagefault_disable(); return (void __force __iomem *)__kmap_local_pfn_prot(pfn, prot); } -EXPORT_SYMBOL_GPL(iomap_atomic_pfn_prot); +EXPORT_SYMBOL_GPL(__iomap_local_pfn_prot); --- a/include/linux/io-mapping.h +++ b/include/linux/io-mapping.h @@ -69,13 +69,17 @@ io_mapping_map_atomic_wc(struct io_mappi BUG_ON(offset >= mapping->size); phys_addr = mapping->base + offset; - return iomap_atomic_pfn_prot(PHYS_PFN(phys_addr), mapping->prot); + preempt_disable(); + pagefault_disable(); + return 
__iomap_local_pfn_prot(PHYS_PFN(phys_addr), mapping->prot); } static inline void io_mapping_unmap_atomic(void __iomem *vaddr) { - iounmap_atomic(vaddr); + kunmap_local_indexed((void __force *)vaddr); + pagefault_enable(); + preempt_enable(); } static inline void __iomem *
[patch V2 14/18] mm/highmem: Remove the old kmap_atomic cruft
All users gone. Signed-off-by: Thomas Gleixner --- include/linux/highmem.h | 61 ++-- mm/highmem.c| 28 ++ 2 files changed, 27 insertions(+), 62 deletions(-) --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -88,31 +88,16 @@ static inline void kunmap(struct page *p * be used in IRQ contexts, so in some (very limited) cases we need * it. */ - -#ifndef CONFIG_KMAP_LOCAL -void *kmap_atomic_high_prot(struct page *page, pgprot_t prot); -void kunmap_atomic_high(void *kvaddr); - static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) { preempt_disable(); pagefault_disable(); - if (!PageHighMem(page)) - return page_address(page); - return kmap_atomic_high_prot(page, prot); -} - -static inline void __kunmap_atomic(void *vaddr) -{ - kunmap_atomic_high(vaddr); + return __kmap_local_page_prot(page, prot); } -#else /* !CONFIG_KMAP_LOCAL */ -static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) +static inline void *kmap_atomic(struct page *page) { - preempt_disable(); - pagefault_disable(); - return __kmap_local_page_prot(page, prot); + return kmap_atomic_prot(page, kmap_prot); } static inline void *kmap_atomic_pfn(unsigned long pfn) @@ -127,13 +112,6 @@ static inline void __kunmap_atomic(void kunmap_local_indexed(addr); } -#endif /* CONFIG_KMAP_LOCAL */ - -static inline void *kmap_atomic(struct page *page) -{ - return kmap_atomic_prot(page, kmap_prot); -} - /* declarations for linux/mm/highmem.c */ unsigned int nr_free_highpages(void); extern atomic_long_t _totalhigh_pages; @@ -226,39 +204,6 @@ static inline void __kunmap_atomic(void #endif /* CONFIG_HIGHMEM */ -#if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32) - -DECLARE_PER_CPU(int, __kmap_atomic_idx); - -static inline int kmap_atomic_idx_push(void) -{ - int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1; - -#ifdef CONFIG_DEBUG_HIGHMEM - WARN_ON_ONCE(in_irq() && !irqs_disabled()); - BUG_ON(idx >= KM_TYPE_NR); -#endif - return idx; -} - -static inline int kmap_atomic_idx(void) 
-{ - return __this_cpu_read(__kmap_atomic_idx) - 1; -} - -static inline void kmap_atomic_idx_pop(void) -{ -#ifdef CONFIG_DEBUG_HIGHMEM - int idx = __this_cpu_dec_return(__kmap_atomic_idx); - - BUG_ON(idx < 0); -#else - __this_cpu_dec(__kmap_atomic_idx); -#endif -} - -#endif - /* * Prevent people trying to call kunmap_atomic() as if it were kunmap() * kunmap_atomic() should get the return value of kmap_atomic, not the page. --- a/mm/highmem.c +++ b/mm/highmem.c @@ -32,10 +32,6 @@ #include #include -#if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32) -DEFINE_PER_CPU(int, __kmap_atomic_idx); -#endif - /* * Virtual_count is not a pure "count". * 0 means that it is not mapped, and has not been mapped @@ -370,6 +366,30 @@ EXPORT_SYMBOL(kunmap_high); #endif /* CONFIG_HIGHMEM */ #ifdef CONFIG_KMAP_LOCAL + +static DEFINE_PER_CPU(int, __kmap_atomic_idx); + +static inline int kmap_atomic_idx_push(void) +{ + int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1; + + WARN_ON_ONCE(in_irq() && !irqs_disabled()); + BUG_ON(idx >= KM_TYPE_NR); + return idx; +} + +static inline int kmap_atomic_idx(void) +{ + return __this_cpu_read(__kmap_atomic_idx) - 1; +} + +static inline void kmap_atomic_idx_pop(void) +{ + int idx = __this_cpu_dec_return(__kmap_atomic_idx); + + BUG_ON(idx < 0); +} + #ifndef arch_kmap_local_post_map # define arch_kmap_local_post_map(vaddr, pteval) do { } while (0) #endif
[patch V2 16/18] sched: highmem: Store local kmaps in task struct
Instead of storing the map per CPU provide and use per task storage. That prepares for local kmaps which are preemptible. The context switch code is preparatory and not yet in use because kmap_atomic() runs with preemption disabled. Will be made usable in the next step. The context switch logic is safe even when an interrupt happens after clearing or before restoring the kmaps. The kmap index in task struct is not modified so any nesting kmap in an interrupt will use unused indices and on return the counter is the same as before. Also add an assert into the return to user space code. Going back to user space with an active kmap local is a nono. Signed-off-by: Thomas Gleixner --- include/linux/highmem.h | 10 + include/linux/sched.h |9 kernel/entry/common.c |2 + kernel/fork.c |1 kernel/sched/core.c | 18 + mm/highmem.c| 96 +--- 6 files changed, 123 insertions(+), 13 deletions(-) --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -38,6 +38,16 @@ static inline void invalidate_kernel_vma void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot); void *__kmap_local_page_prot(struct page *page, pgprot_t prot); void kunmap_local_indexed(void *vaddr); +void kmap_local_fork(struct task_struct *tsk); +void __kmap_local_sched_out(void); +void __kmap_local_sched_in(void); +static inline void kmap_assert_nomap(void) +{ + DEBUG_LOCKS_WARN_ON(current->kmap_ctrl.idx); +} +#else +static inline void kmap_local_fork(struct task_struct *tsk) { } +static inline void kmap_assert_nomap(void) { } #endif #ifdef CONFIG_HIGHMEM --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -34,6 +34,7 @@ #include #include #include +#include /* task_struct member predeclarations (sorted alphabetically): */ struct audit_context; @@ -629,6 +630,13 @@ struct wake_q_node { struct wake_q_node *next; }; +struct kmap_ctrl { +#ifdef CONFIG_KMAP_LOCAL + int idx; + pte_t pteval[KM_TYPE_NR]; +#endif +}; + struct task_struct { #ifdef CONFIG_THREAD_INFO_IN_TASK /* @@ -1294,6 +1302,7 @@ struct 
task_struct { unsigned intsequential_io; unsigned intsequential_io_avg; #endif + struct kmap_ctrlkmap_ctrl; #ifdef CONFIG_DEBUG_ATOMIC_SLEEP unsigned long task_state_change; #endif --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -194,6 +195,7 @@ static void exit_to_user_mode_prepare(st /* Ensure that the address limit is intact and no locks are held */ addr_limit_user_check(); + kmap_assert_nomap(); lockdep_assert_irqs_disabled(); lockdep_sys_exit(); } --- a/kernel/fork.c +++ b/kernel/fork.c @@ -930,6 +930,7 @@ static struct task_struct *dup_task_stru account_kernel_stack(tsk, 1); kcov_task_init(tsk); + kmap_local_fork(tsk); #ifdef CONFIG_FAULT_INJECTION tsk->fail_nth = 0; --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4053,6 +4053,22 @@ static inline void finish_lock_switch(st # define finish_arch_post_lock_switch()do { } while (0) #endif +static inline void kmap_local_sched_out(void) +{ +#ifdef CONFIG_KMAP_LOCAL + if (unlikely(current->kmap_ctrl.idx)) + __kmap_local_sched_out(); +#endif +} + +static inline void kmap_local_sched_in(void) +{ +#ifdef CONFIG_KMAP_LOCAL + if (unlikely(current->kmap_ctrl.idx)) + __kmap_local_sched_in(); +#endif +} + /** * prepare_task_switch - prepare to switch tasks * @rq: the runqueue preparing to switch @@ -4075,6 +4091,7 @@ prepare_task_switch(struct rq *rq, struc perf_event_task_sched_out(prev, next); rseq_preempt(prev); fire_sched_out_preempt_notifiers(prev, next); + kmap_local_sched_out(); prepare_task(next); prepare_arch_switch(next); } @@ -4141,6 +4158,7 @@ static struct rq *finish_task_switch(str finish_lock_switch(rq); finish_arch_post_lock_switch(); kcov_finish_switch(current); + kmap_local_sched_in(); fire_sched_in_preempt_notifiers(current); /* --- a/mm/highmem.c +++ b/mm/highmem.c @@ -367,27 +367,24 @@ EXPORT_SYMBOL(kunmap_high); #ifdef CONFIG_KMAP_LOCAL -static DEFINE_PER_CPU(int, __kmap_atomic_idx); - -static inline int 
kmap_atomic_idx_push(void) +static inline int kmap_local_idx_push(void) { - int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1; + int idx = current->kmap_ctrl.idx++; WARN_ON_ONCE(in_irq() && !irqs_disabled()); BUG_ON(idx >= KM_TYPE_NR); return idx; } -static inline int kmap_atomic_idx(void) +static inline int kmap_local_idx(void) { - return __this_cpu_read(__kmap_atomic_idx) - 1
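The per-task storage described in this patch can be modelled in userspace to show why a preempted task keeps its mappings. This is a sketch under stated assumptions, not the patch's code: struct model_task loosely follows the struct kmap_ctrl shown in the diff (an index plus an array of saved entries), while struct model_cpu, model_map(), model_sched_out() and model_sched_in() are hypothetical stand-ins for the fixmap slots and the sched-out/sched-in hooks, whose bodies are not visible in the excerpt above.

```c
#include <assert.h>

#define MODEL_KM_TYPE_NR 4	/* assumed depth for the model only */

/* Hypothetical per-task state, loosely following struct kmap_ctrl. */
struct model_task {
	int idx;				/* number of live local maps */
	unsigned long pteval[MODEL_KM_TYPE_NR];	/* saved entry per live map */
};

/* Hypothetical per-CPU mapping slots standing in for the fixmap entries. */
struct model_cpu {
	unsigned long slot[MODEL_KM_TYPE_NR];
};

/* Establish a mapping: record it in the task and install it on the CPU. */
static int model_map(struct model_task *t, struct model_cpu *c,
		     unsigned long pte)
{
	assert(t->idx < MODEL_KM_TYPE_NR);
	t->pteval[t->idx] = pte;
	c->slot[t->idx] = pte;
	return t->idx++;
}

/* Outgoing task: clear its slots on the CPU; idx and pteval stay intact. */
static void model_sched_out(const struct model_task *t, struct model_cpu *c)
{
	for (int i = 0; i < t->idx; i++)
		c->slot[i] = 0;
}

/* Incoming task: reinstall its saved entries on the CPU. */
static void model_sched_in(const struct model_task *t, struct model_cpu *c)
{
	for (int i = 0; i < t->idx; i++)
		c->slot[i] = t->pteval[i];
}
```

A task with no live maps (idx == 0) makes both hooks a no-op, which corresponds to the unlikely() fast-path check in the scheduler hunk of the patch. Because the task's idx is never modified by the switch, the task sees the same slots after it is scheduled back in.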
[patch V2 13/18] xtensa/mm/highmem: Switch to generic kmap atomic
No reason having the same code in every architecture.

Signed-off-by: Thomas Gleixner
Cc: Chris Zankel
Cc: Max Filippov
Cc: linux-xte...@linux-xtensa.org
---
 arch/xtensa/Kconfig               |    1
 arch/xtensa/include/asm/highmem.h |    9 +++
 arch/xtensa/mm/highmem.c          |   44 +++---
 3 files changed, 14 insertions(+), 40 deletions(-)

--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -666,6 +666,7 @@ endchoice
 config HIGHMEM
 	bool "High Memory Support"
 	depends on MMU
+	select KMAP_LOCAL
 	help
 	  Linux can use the full amount of RAM in the system by
 	  default. However, the default MMUv2 setup only maps the
--- a/arch/xtensa/include/asm/highmem.h
+++ b/arch/xtensa/include/asm/highmem.h
@@ -68,6 +68,15 @@ static inline void flush_cache_kmaps(voi
 	flush_cache_all();
 }
 
+enum fixed_addresses kmap_local_map_idx(int type, unsigned long pfn);
+#define arch_kmap_local_map_idx		kmap_local_map_idx
+
+enum fixed_addresses kmap_local_unmap_idx(int type, unsigned long addr);
+#define arch_kmap_local_unmap_idx	kmap_local_unmap_idx
+
+#define arch_kmap_local_post_unmap(vaddr)	\
+	local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE)
+
 void kmap_init(void);
 
 #endif
--- a/arch/xtensa/mm/highmem.c
+++ b/arch/xtensa/mm/highmem.c
@@ -12,8 +12,6 @@
 #include
 #include
 
-static pte_t *kmap_pte;
-
 #if DCACHE_WAY_SIZE > PAGE_SIZE
 unsigned int last_pkmap_nr_arr[DCACHE_N_COLORS];
 wait_queue_head_t pkmap_map_wait_arr[DCACHE_N_COLORS];
@@ -37,55 +35,21 @@ static inline enum fixed_addresses kmap_
 		color;
 }
 
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
+enum fixed_addresses kmap_local_map_idx(int type, unsigned long pfn)
 {
-	enum fixed_addresses idx;
-	unsigned long vaddr;
-
-	idx = kmap_idx(kmap_atomic_idx_push(),
-		       DCACHE_ALIAS(page_to_phys(page)));
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-	BUG_ON(!pte_none(*(kmap_pte + idx)));
-#endif
-	set_pte(kmap_pte + idx, mk_pte(page, prot));
-
-	return (void *)vaddr;
+	return kmap_idx(type, DCACHE_ALIAS(pfn << PAGE_SHIFT));
 }
-EXPORT_SYMBOL(kmap_atomic_high_prot);
 
-void kunmap_atomic_high(void *kvaddr)
+enum fixed_addresses kmap_local_unmap_idx(int type, unsigned long addr)
 {
-	if (kvaddr >= (void *)FIXADDR_START &&
-	    kvaddr < (void *)FIXADDR_TOP) {
-		int idx = kmap_idx(kmap_atomic_idx(),
-				   DCACHE_ALIAS((unsigned long)kvaddr));
-
-		/*
-		 * Force other mappings to Oops if they'll try to access this
-		 * pte without first remap it. Keeping stale mappings around
-		 * is a bad idea also, in case the page changes cacheability
-		 * attributes or becomes a protected page in a hypervisor.
-		 */
-		pte_clear(&init_mm, kvaddr, kmap_pte + idx);
-		local_flush_tlb_kernel_range((unsigned long)kvaddr,
-					     (unsigned long)kvaddr + PAGE_SIZE);
-
-		kmap_atomic_idx_pop();
-	}
+	return kmap_idx(type, DCACHE_ALIAS(addr));
 }
-EXPORT_SYMBOL(kunmap_atomic_high);
 
 void __init kmap_init(void)
 {
-	unsigned long kmap_vstart;
-
 	/* Check if this memory layout is broken because PKMAP overlaps
 	 * page table.
 	 */
 	BUILD_BUG_ON(PKMAP_BASE < TLBTEMP_BASE_1 + TLBTEMP_SIZE);
-	/* cache the first kmap pte */
-	kmap_vstart = __fix_to_virt(FIX_KMAP_BEGIN);
-	kmap_pte = virt_to_kpte(kmap_vstart);
 	kmap_waitqueues_init();
 }
[patch V2 12/18] sparc/mm/highmem: Switch to generic kmap atomic
No reason having the same code in every architecture Signed-off-by: Thomas Gleixner Cc: "David S. Miller" Cc: sparcli...@vger.kernel.org --- arch/sparc/Kconfig |1 arch/sparc/include/asm/highmem.h |7 +- arch/sparc/mm/Makefile |3 - arch/sparc/mm/highmem.c | 115 --- arch/sparc/mm/srmmu.c|2 5 files changed, 6 insertions(+), 122 deletions(-) --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -139,6 +139,7 @@ config MMU config HIGHMEM bool default y if SPARC32 +select KMAP_LOCAL config ZONE_DMA bool --- a/arch/sparc/include/asm/highmem.h +++ b/arch/sparc/include/asm/highmem.h @@ -33,8 +33,6 @@ extern unsigned long highstart_pfn, high #define kmap_prot __pgprot(SRMMU_ET_PTE | SRMMU_PRIV | SRMMU_CACHE) extern pte_t *pkmap_page_table; -void kmap_init(void) __init; - /* * Right now we initialize only a single pte table. It can be extended * easily, subsequent pte tables have to be allocated in one physical @@ -53,6 +51,11 @@ void kmap_init(void) __init; #define flush_cache_kmaps()flush_cache_all() +/* FIXME: Use __flush_tlb_one(vaddr) instead of flush_cache_all() -- Anton */ +#define arch_kmap_local_post_map(vaddr, pteval)flush_cache_all() +#define arch_kmap_local_post_unmap(vaddr) flush_cache_all() + + #endif /* __KERNEL__ */ #endif /* _ASM_HIGHMEM_H */ --- a/arch/sparc/mm/Makefile +++ b/arch/sparc/mm/Makefile @@ -15,6 +15,3 @@ obj-$(CONFIG_SPARC32) += leon_mm.o # Only used by sparc64 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o - -# Only used by sparc32 -obj-$(CONFIG_HIGHMEM) += highmem.o --- a/arch/sparc/mm/highmem.c +++ /dev/null @@ -1,115 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * highmem.c: virtual kernel memory mappings for high memory - * - * Provides kernel-static versions of atomic kmap functions originally - * found as inlines in include/asm-sparc/highmem.h. These became - * needed as kmap_atomic() and kunmap_atomic() started getting - * called from within modules. 
- * -- Tomas Szepe , September 2002 - * - * But kmap_atomic() and kunmap_atomic() cannot be inlined in - * modules because they are loaded with btfixup-ped functions. - */ - -/* - * The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap - * gives a more generic (and caching) interface. But kmap_atomic can - * be used in IRQ contexts, so in some (very limited) cases we need it. - * - * XXX This is an old text. Actually, it's good to use atomic kmaps, - * provided you remember that they are atomic and not try to sleep - * with a kmap taken, much like a spinlock. Non-atomic kmaps are - * shared by CPUs, and so precious, and establishing them requires IPI. - * Atomic kmaps are lightweight and we may have NCPUS more of them. - */ -#include -#include -#include - -#include -#include -#include - -static pte_t *kmap_pte; - -void __init kmap_init(void) -{ - unsigned long address = __fix_to_virt(FIX_KMAP_BEGIN); - -/* cache the first kmap pte */ -kmap_pte = virt_to_kpte(address); -} - -void *kmap_atomic_high_prot(struct page *page, pgprot_t prot) -{ - unsigned long vaddr; - long idx, type; - - type = kmap_atomic_idx_push(); - idx = type + KM_TYPE_NR*smp_processor_id(); - vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); - -/* XXX Fix - Anton */ -#if 0 - __flush_cache_one(vaddr); -#else - flush_cache_all(); -#endif - -#ifdef CONFIG_DEBUG_HIGHMEM - BUG_ON(!pte_none(*(kmap_pte-idx))); -#endif - set_pte(kmap_pte-idx, mk_pte(page, prot)); -/* XXX Fix - Anton */ -#if 0 - __flush_tlb_one(vaddr); -#else - flush_tlb_all(); -#endif - - return (void*) vaddr; -} -EXPORT_SYMBOL(kmap_atomic_high_prot); - -void kunmap_atomic_high(void *kvaddr) -{ - unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK; - int type; - - if (vaddr < FIXADDR_START) - return; - - type = kmap_atomic_idx(); - -#ifdef CONFIG_DEBUG_HIGHMEM - { - unsigned long idx; - - idx = type + KM_TYPE_NR * smp_processor_id(); - BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN+idx)); - - /* XXX Fix - Anton */ -#if 0 - 
__flush_cache_one(vaddr); -#else - flush_cache_all(); -#endif - - /* -* force other mappings to Oops if they'll try to access -* this pte without first remap it -*/ - pte_clear(&init_mm, vaddr, kmap_pte-idx); - /* XXX Fix - Anton */ -#if 0 - __flush_tlb_one(vaddr); -#else - flush_tlb_all(); -#endif - } -#endif - - kmap_atomic_idx_pop(); -} -EXPORT_SYMBOL(kunmap_atomic_high); --- a/arch/sparc/mm/srmmu.c +++ b/arch/sparc/mm/srmmu.c @@ -971,8 +971,6 @@ void __init srmmu_paging_init(void) sparc_context_init(num_contexts); -
[patch V2 11/18] powerpc/mm/highmem: Switch to generic kmap atomic
No reason having the same code in every architecture Signed-off-by: Thomas Gleixner Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/Kconfig |1 arch/powerpc/include/asm/highmem.h |6 ++- arch/powerpc/mm/Makefile |1 arch/powerpc/mm/highmem.c | 67 - arch/powerpc/mm/mem.c |7 --- 5 files changed, 6 insertions(+), 76 deletions(-) --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -409,6 +409,7 @@ menu "Kernel options" config HIGHMEM bool "High memory support" depends on PPC32 +select KMAP_LOCAL source "kernel/Kconfig.hz" --- a/arch/powerpc/include/asm/highmem.h +++ b/arch/powerpc/include/asm/highmem.h @@ -29,7 +29,6 @@ #include #include -extern pte_t *kmap_pte; extern pte_t *pkmap_page_table; /* @@ -60,6 +59,11 @@ extern pte_t *pkmap_page_table; #define flush_cache_kmaps()flush_cache_all() +#define arch_kmap_local_post_map(vaddr, pteval)\ + local_flush_tlb_page(NULL, vaddr) +#define arch_kmap_local_post_unmap(vaddr) \ + local_flush_tlb_page(NULL, vaddr) + #endif /* __KERNEL__ */ #endif /* _ASM_HIGHMEM_H */ --- a/arch/powerpc/mm/Makefile +++ b/arch/powerpc/mm/Makefile @@ -16,7 +16,6 @@ obj-$(CONFIG_NEED_MULTIPLE_NODES) += num obj-$(CONFIG_PPC_MM_SLICES)+= slice.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o -obj-$(CONFIG_HIGHMEM) += highmem.o obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o obj-$(CONFIG_PPC_PTDUMP) += ptdump/ obj-$(CONFIG_KASAN)+= kasan/ --- a/arch/powerpc/mm/highmem.c +++ /dev/null @@ -1,67 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * highmem.c: virtual kernel memory mappings for high memory - * - * PowerPC version, stolen from the i386 version. - * - * Used in CONFIG_HIGHMEM systems for memory pages which - * are not addressable by direct kernel virtual addresses. 
- * - * Copyright (C) 1999 Gerhard Wichert, Siemens AG - * gerhard.wich...@pdb.siemens.de - * - * - * Redesigned the x86 32-bit VM architecture to deal with - * up to 16 Terrabyte physical memory. With current x86 CPUs - * we now support up to 64 Gigabytes physical RAM. - * - * Copyright (C) 1999 Ingo Molnar - * - * Reworked for PowerPC by various contributors. Moved from - * highmem.h by Benjamin Herrenschmidt (c) 2009 IBM Corp. - */ - -#include -#include - -void *kmap_atomic_high_prot(struct page *page, pgprot_t prot) -{ - unsigned long vaddr; - int idx, type; - - type = kmap_atomic_idx_push(); - idx = type + KM_TYPE_NR*smp_processor_id(); - vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); - WARN_ON(IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !pte_none(*(kmap_pte - idx))); - __set_pte_at(&init_mm, vaddr, kmap_pte-idx, mk_pte(page, prot), 1); - local_flush_tlb_page(NULL, vaddr); - - return (void*) vaddr; -} -EXPORT_SYMBOL(kmap_atomic_high_prot); - -void kunmap_atomic_high(void *kvaddr) -{ - unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK; - - if (vaddr < __fix_to_virt(FIX_KMAP_END)) - return; - - if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM)) { - int type = kmap_atomic_idx(); - unsigned int idx; - - idx = type + KM_TYPE_NR * smp_processor_id(); - WARN_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx)); - - /* -* force other mappings to Oops if they'll try to access -* this pte without first remap it -*/ - pte_clear(&init_mm, vaddr, kmap_pte-idx); - local_flush_tlb_page(NULL, vaddr); - } - - kmap_atomic_idx_pop(); -} -EXPORT_SYMBOL(kunmap_atomic_high); --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -61,11 +61,6 @@ unsigned long long memory_limit; bool init_mem_is_free; -#ifdef CONFIG_HIGHMEM -pte_t *kmap_pte; -EXPORT_SYMBOL(kmap_pte); -#endif - pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, unsigned long size, pgprot_t vma_prot) { @@ -235,8 +230,6 @@ void __init paging_init(void) map_kernel_page(PKMAP_BASE, 0, __pgprot(0));/* XXX gross */ 
pkmap_page_table = virt_to_kpte(PKMAP_BASE); - - kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN)); #endif /* CONFIG_HIGHMEM */ printk(KERN_DEBUG "Top of RAM: 0x%llx, Total RAM: 0x%llx\n",
[patch V2 10/18] nds32/mm/highmem: Switch to generic kmap atomic
The mapping code is odd and looks broken. See FIXME in the comment.

Signed-off-by: Thomas Gleixner
Cc: Nick Hu
Cc: Greentime Hu
Cc: Vincent Chen

diff --git a/arch/nds32/Kconfig.cpu b/arch/nds32/Kconfig.cpu
index f88a12fdf0f3..c7add11ea36e 100644
--- a/arch/nds32/Kconfig.cpu
+++ b/arch/nds32/Kconfig.cpu
@@ -157,6 +157,7 @@ config HW_SUPPORT_UNALIGNMENT_ACCESS
 config HIGHMEM
 	bool "High Memory Support"
 	depends on MMU && !CPU_CACHE_ALIASING
+	select KMAP_LOCAL
 	help
 	  The address space of Andes processors is only 4 Gigabytes large
 	  and it has to accommodate user address space, kernel address
diff --git a/arch/nds32/include/asm/highmem.h b/arch/nds32/include/asm/highmem.h
index fe986d0e6e3f..d844c282c090 100644
--- a/arch/nds32/include/asm/highmem.h
+++ b/arch/nds32/include/asm/highmem.h
@@ -45,11 +45,22 @@ extern pte_t *pkmap_page_table;
 extern void kmap_init(void);
 
 /*
- * The following functions are already defined by
- * when CONFIG_HIGHMEM is not set.
+ * FIXME: The below looks broken vs. a kmap_atomic() in task context which
+ * is interrupted and another kmap_atomic() happens in interrupt context.
+ * But what do I know about nds32. -- tglx
  */
-#ifdef CONFIG_HIGHMEM
-extern void *kmap_atomic_pfn(unsigned long pfn);
-#endif
+#define arch_kmap_local_post_map(vaddr, pteval)			\
+	do {							\
+		__nds32__tlbop_inv(vaddr);			\
+		__nds32__mtsr_dsb(vaddr, NDS32_SR_TLB_VPN);	\
+		__nds32__tlbop_rwr(pteval);			\
+		__nds32__isb();					\
+	} while (0)
+
+#define arch_kmap_local_pre_unmap(vaddr, pte)			\
+	do {							\
+		__nds32__tlbop_inv(vaddr);			\
+		__nds32__isb();					\
+	} while (0)
 
 #endif
diff --git a/arch/nds32/mm/Makefile b/arch/nds32/mm/Makefile
index 897ecaf5cf54..14fb2e8eb036 100644
--- a/arch/nds32/mm/Makefile
+++ b/arch/nds32/mm/Makefile
@@ -3,7 +3,6 @@ obj-y := extable.o tlb.o fault.o init.o mmap.o \
 				   mm-nds32.o cacheflush.o proc.o
 
 obj-$(CONFIG_ALIGNMENT_TRAP)	+= alignment.o
-obj-$(CONFIG_HIGHMEM)		+= highmem.o
 
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_proc.o	= $(CC_FLAGS_FTRACE)
diff --git a/arch/nds32/mm/highmem.c b/arch/nds32/mm/highmem.c
deleted file mode 100644
index 4284cd59e21a..
--- a/arch/nds32/mm/highmem.c
+++ /dev/null
@@ -1,48 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Copyright (C) 2005-2017 Andes Technology Corporation
-
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-	unsigned int idx;
-	unsigned long vaddr, pte;
-	int type;
-	pte_t *ptep;
-
-	type = kmap_atomic_idx_push();
-
-	idx = type + KM_TYPE_NR * smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	pte = (page_to_pfn(page) << PAGE_SHIFT) | prot;
-	ptep = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
-	set_pte(ptep, pte);
-
-	__nds32__tlbop_inv(vaddr);
-	__nds32__mtsr_dsb(vaddr, NDS32_SR_TLB_VPN);
-	__nds32__tlbop_rwr(pte);
-	__nds32__isb();
-	return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-	if (kvaddr >= (void *)FIXADDR_START) {
-		unsigned long vaddr = (unsigned long)kvaddr;
-		pte_t *ptep;
-		kmap_atomic_idx_pop();
-		__nds32__tlbop_inv(vaddr);
-		__nds32__isb();
-		ptep = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
-		set_pte(ptep, 0);
-	}
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
[patch V2 09/18] mips/mm/highmem: Switch to generic kmap atomic
No reason having the same code in every architecture Signed-off-by: Thomas Gleixner Cc: Thomas Bogendoerfer Cc: linux-m...@vger.kernel.org diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 8f328298f8cc..ed6b3de944a8 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -2654,6 +2654,7 @@ config MIPS_CRC_SUPPORT config HIGHMEM bool "High Memory Support" depends on 32BIT && CPU_SUPPORTS_HIGHMEM && SYS_SUPPORTS_HIGHMEM && !CPU_MIPS32_3_5_EVA + select KMAP_LOCAL config CPU_SUPPORTS_HIGHMEM bool diff --git a/arch/mips/include/asm/highmem.h b/arch/mips/include/asm/highmem.h index f1f788b57166..cb2e0fb8483b 100644 --- a/arch/mips/include/asm/highmem.h +++ b/arch/mips/include/asm/highmem.h @@ -48,11 +48,11 @@ extern pte_t *pkmap_page_table; #define ARCH_HAS_KMAP_FLUSH_TLB extern void kmap_flush_tlb(unsigned long addr); -extern void *kmap_atomic_pfn(unsigned long pfn); #define flush_cache_kmaps()BUG_ON(cpu_has_dc_aliases) -extern void kmap_init(void); +#define arch_kmap_local_post_map(vaddr, pteval) local_flush_tlb_one(vaddr) +#define arch_kmap_local_post_unmap(vaddr) local_flush_tlb_one(vaddr) #endif /* __KERNEL__ */ diff --git a/arch/mips/mm/highmem.c b/arch/mips/mm/highmem.c index 5fec7f45d79a..57e2f08f00d0 100644 --- a/arch/mips/mm/highmem.c +++ b/arch/mips/mm/highmem.c @@ -8,8 +8,6 @@ #include #include -static pte_t *kmap_pte; - unsigned long highstart_pfn, highend_pfn; void kmap_flush_tlb(unsigned long addr) @@ -17,78 +15,3 @@ void kmap_flush_tlb(unsigned long addr) flush_tlb_one(addr); } EXPORT_SYMBOL(kmap_flush_tlb); - -void *kmap_atomic_high_prot(struct page *page, pgprot_t prot) -{ - unsigned long vaddr; - int idx, type; - - type = kmap_atomic_idx_push(); - idx = type + KM_TYPE_NR*smp_processor_id(); - vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); -#ifdef CONFIG_DEBUG_HIGHMEM - BUG_ON(!pte_none(*(kmap_pte - idx))); -#endif - set_pte(kmap_pte-idx, mk_pte(page, prot)); - local_flush_tlb_one((unsigned long)vaddr); - - return (void*) vaddr; -} 
-EXPORT_SYMBOL(kmap_atomic_high_prot); - -void kunmap_atomic_high(void *kvaddr) -{ - unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK; - int type __maybe_unused; - - if (vaddr < FIXADDR_START) - return; - - type = kmap_atomic_idx(); -#ifdef CONFIG_DEBUG_HIGHMEM - { - int idx = type + KM_TYPE_NR * smp_processor_id(); - - BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx)); - - /* -* force other mappings to Oops if they'll try to access -* this pte without first remap it -*/ - pte_clear(&init_mm, vaddr, kmap_pte-idx); - local_flush_tlb_one(vaddr); - } -#endif - kmap_atomic_idx_pop(); -} -EXPORT_SYMBOL(kunmap_atomic_high); - -/* - * This is the same as kmap_atomic() but can map memory that doesn't - * have a struct page associated with it. - */ -void *kmap_atomic_pfn(unsigned long pfn) -{ - unsigned long vaddr; - int idx, type; - - preempt_disable(); - pagefault_disable(); - - type = kmap_atomic_idx_push(); - idx = type + KM_TYPE_NR*smp_processor_id(); - vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); - set_pte(kmap_pte-idx, pfn_pte(pfn, PAGE_KERNEL)); - flush_tlb_one(vaddr); - - return (void*) vaddr; -} - -void __init kmap_init(void) -{ - unsigned long kmap_vstart; - - /* cache the first kmap pte */ - kmap_vstart = __fix_to_virt(FIX_KMAP_BEGIN); - kmap_pte = virt_to_kpte(kmap_vstart); -} diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c index 6c7bbfe35ba3..e5de8e9c2ede 100644 --- a/arch/mips/mm/init.c +++ b/arch/mips/mm/init.c @@ -402,9 +402,6 @@ void __init paging_init(void) pagetable_init(); -#ifdef CONFIG_HIGHMEM - kmap_init(); -#endif #ifdef CONFIG_ZONE_DMA max_zone_pfns[ZONE_DMA] = MAX_DMA_PFN; #endif
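Note for readers following the series: the per-architecture copies removed in these patches all derive the fixmap slot the same way (idx = type + KM_TYPE_NR * smp_processor_id(), then __fix_to_virt(FIX_KMAP_BEGIN + idx)). Below is a minimal userspace sketch of that arithmetic, not kernel code. The constants, the model_ names and the plain array standing in for the per-CPU index are assumptions made up for illustration; model_fix_to_virt() assumes the usual asm-generic form of __fix_to_virt(), where addresses grow downwards from FIXADDR_TOP, which would also explain why the removed code indexes kmap_pte - idx.

```c
#include <assert.h>

/* Illustrative values only, not taken from any real configuration. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_FIXADDR_TOP	0xfffff000UL
#define MODEL_FIX_KMAP_BEGIN	4	/* hypothetical first kmap fixmap index */
#define MODEL_KM_TYPE_NR	16	/* hypothetical number of slots per CPU */
#define MODEL_NR_CPUS		4

/* Stand-in for the per-CPU __kmap_atomic_idx counter. */
static int model_kmap_idx[MODEL_NR_CPUS];

/* Models __fix_to_virt(): a higher fixmap index is a lower address. */
static unsigned long model_fix_to_virt(unsigned int idx)
{
	return MODEL_FIXADDR_TOP - ((unsigned long)idx << MODEL_PAGE_SHIFT);
}

/* Models kmap_atomic_idx_push() followed by the slot computation. */
static unsigned long model_kmap_push(int cpu)
{
	int type = model_kmap_idx[cpu]++;
	int idx = type + MODEL_KM_TYPE_NR * cpu;

	assert(type < MODEL_KM_TYPE_NR);
	return model_fix_to_virt(MODEL_FIX_KMAP_BEGIN + idx);
}

/* Models the unmap side: the address must match the top of the stack. */
static int model_kmap_pop(int cpu, unsigned long vaddr)
{
	int type = model_kmap_idx[cpu] - 1;
	int idx = type + MODEL_KM_TYPE_NR * cpu;

	if (type < 0 || vaddr != model_fix_to_virt(MODEL_FIX_KMAP_BEGIN + idx))
		return -1;	/* unmapped out of order */
	model_kmap_idx[cpu]--;
	return 0;
}
```

In this model two nested mappings on the same CPU land one page apart, and releasing them in anything other than LIFO order is rejected, which mirrors the vaddr check that the removed CONFIG_DEBUG_HIGHMEM code performs.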
[patch V2 08/18] microblaze/mm/highmem: Switch to generic kmap atomic
No reason having the same code in every architecture. Signed-off-by: Thomas Gleixner Cc: Michal Simek diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index d262ac0c8714..186a0526564c 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -170,6 +170,7 @@ config XILINX_UNCACHED_SHADOW config HIGHMEM bool "High memory support" depends on MMU + select KMAP_LOCAL help The address space of Microblaze processors is only 4 Gigabytes large and it has to accommodate user address space, kernel address diff --git a/arch/microblaze/include/asm/highmem.h b/arch/microblaze/include/asm/highmem.h index 284ca8fb54c1..4418633fb163 100644 --- a/arch/microblaze/include/asm/highmem.h +++ b/arch/microblaze/include/asm/highmem.h @@ -25,7 +25,6 @@ #include #include -extern pte_t *kmap_pte; extern pte_t *pkmap_page_table; /* @@ -52,6 +51,11 @@ extern pte_t *pkmap_page_table; #define flush_cache_kmaps(){ flush_icache(); flush_dcache(); } +#define arch_kmap_local_post_map(vaddr, pteval)\ + local_flush_tlb_page(NULL, vaddr); +#define arch_kmap_local_post_unmap(vaddr) \ + local_flush_tlb_page(NULL, vaddr); + #endif /* __KERNEL__ */ #endif /* _ASM_HIGHMEM_H */ diff --git a/arch/microblaze/mm/Makefile b/arch/microblaze/mm/Makefile index 1b16875cea70..8ced71100047 100644 --- a/arch/microblaze/mm/Makefile +++ b/arch/microblaze/mm/Makefile @@ -6,4 +6,3 @@ obj-y := consistent.o init.o obj-$(CONFIG_MMU) += pgtable.o mmu_context.o fault.o -obj-$(CONFIG_HIGHMEM) += highmem.o diff --git a/arch/microblaze/mm/highmem.c b/arch/microblaze/mm/highmem.c deleted file mode 100644 index 92e0890416c9.. --- a/arch/microblaze/mm/highmem.c +++ /dev/null @@ -1,78 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * highmem.c: virtual kernel memory mappings for high memory - * - * PowerPC version, stolen from the i386 version. - * - * Used in CONFIG_HIGHMEM systems for memory pages which - * are not addressable by direct kernel virtual addresses. 
- * - * Copyright (C) 1999 Gerhard Wichert, Siemens AG - * gerhard.wich...@pdb.siemens.de - * - * - * Redesigned the x86 32-bit VM architecture to deal with - * up to 16 Terrabyte physical memory. With current x86 CPUs - * we now support up to 64 Gigabytes physical RAM. - * - * Copyright (C) 1999 Ingo Molnar - * - * Reworked for PowerPC by various contributors. Moved from - * highmem.h by Benjamin Herrenschmidt (c) 2009 IBM Corp. - */ - -#include -#include - -/* - * The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap - * gives a more generic (and caching) interface. But kmap_atomic can - * be used in IRQ contexts, so in some (very limited) cases we need - * it. - */ -#include - -void *kmap_atomic_high_prot(struct page *page, pgprot_t prot) -{ - - unsigned long vaddr; - int idx, type; - - type = kmap_atomic_idx_push(); - idx = type + KM_TYPE_NR*smp_processor_id(); - vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); -#ifdef CONFIG_DEBUG_HIGHMEM - BUG_ON(!pte_none(*(kmap_pte-idx))); -#endif - set_pte_at(&init_mm, vaddr, kmap_pte-idx, mk_pte(page, prot)); - local_flush_tlb_page(NULL, vaddr); - - return (void *) vaddr; -} -EXPORT_SYMBOL(kmap_atomic_high_prot); - -void kunmap_atomic_high(void *kvaddr) -{ - unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK; - int type; - unsigned int idx; - - if (vaddr < __fix_to_virt(FIX_KMAP_END)) - return; - - type = kmap_atomic_idx(); - - idx = type + KM_TYPE_NR * smp_processor_id(); -#ifdef CONFIG_DEBUG_HIGHMEM - BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx)); -#endif - /* -* force other mappings to Oops if they'll try to access -* this pte without first remap it -*/ - pte_clear(&init_mm, vaddr, kmap_pte-idx); - local_flush_tlb_page(NULL, vaddr); - - kmap_atomic_idx_pop(); -} -EXPORT_SYMBOL(kunmap_atomic_high); diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c index 3344d4a1fe89..3f4e41787a4e 100644 --- a/arch/microblaze/mm/init.c +++ b/arch/microblaze/mm/init.c @@ -49,17 +49,11 @@ unsigned long 
lowmem_size; EXPORT_SYMBOL(min_low_pfn); EXPORT_SYMBOL(max_low_pfn); -#ifdef CONFIG_HIGHMEM -pte_t *kmap_pte; -EXPORT_SYMBOL(kmap_pte); - static void __init highmem_init(void) { pr_debug("%x\n", (u32)PKMAP_BASE); map_page(PKMAP_BASE, 0, 0); /* XXX gross */ pkmap_page_table = virt_to_kpte(PKMAP_BASE); - - kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN)); } static void highmem_setup(void)
[patch V2 06/18] ARM: highmem: Switch to generic kmap atomic
No reason having the same code in every architecture. Signed-off-by: Thomas Gleixner Cc: Russell King Cc: Arnd Bergmann Cc: linux-arm-ker...@lists.infradead.org diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index e00d94b16658..410235e350cc 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1499,6 +1499,7 @@ config HAVE_ARCH_PFN_VALID config HIGHMEM bool "High Memory Support" depends on MMU + select KMAP_LOCAL help The address space of ARM processors is only 4 Gigabytes large and it has to accommodate user address space, kernel address diff --git a/arch/arm/include/asm/highmem.h b/arch/arm/include/asm/highmem.h index 31811be38d78..99a99862c474 100644 --- a/arch/arm/include/asm/highmem.h +++ b/arch/arm/include/asm/highmem.h @@ -46,19 +46,32 @@ extern pte_t *pkmap_page_table; #ifdef ARCH_NEEDS_KMAP_HIGH_GET extern void *kmap_high_get(struct page *page); -#else + +static inline void *arch_kmap_local_high_get(struct page *page) +{ + if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !cache_is_vivt()) + return NULL; + return kmap_high_get(page); +} +#define arch_kmap_local_high_get arch_kmap_local_high_get + +#else /* ARCH_NEEDS_KMAP_HIGH_GET */ static inline void *kmap_high_get(struct page *page) { return NULL; } -#endif +#endif /* !ARCH_NEEDS_KMAP_HIGH_GET */ -/* - * The following functions are already defined by - * when CONFIG_HIGHMEM is not set. 
- */ -#ifdef CONFIG_HIGHMEM -extern void *kmap_atomic_pfn(unsigned long pfn); -#endif +#define arch_kmap_local_post_map(vaddr, pteval) \ + local_flush_tlb_kernel_page(vaddr) + +#define arch_kmap_local_pre_unmap(vaddr) \ +do { \ + if (cache_is_vivt())\ + __cpuc_flush_dcache_area((void *)vaddr, PAGE_SIZE); \ +} while (0) + +#define arch_kmap_local_post_unmap(vaddr) \ + local_flush_tlb_kernel_page(vaddr) #endif diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile index 7cb1699fbfc4..c4ce477c5261 100644 --- a/arch/arm/mm/Makefile +++ b/arch/arm/mm/Makefile @@ -19,7 +19,6 @@ obj-$(CONFIG_MODULES) += proc-syms.o obj-$(CONFIG_DEBUG_VIRTUAL)+= physaddr.o obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o -obj-$(CONFIG_HIGHMEM) += highmem.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o obj-$(CONFIG_ARM_PV_FIXUP) += pv-fixup-asm.o diff --git a/arch/arm/mm/highmem.c b/arch/arm/mm/highmem.c deleted file mode 100644 index 187fab227b50.. --- a/arch/arm/mm/highmem.c +++ /dev/null @@ -1,121 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * arch/arm/mm/highmem.c -- ARM highmem support - * - * Author: Nicolas Pitre - * Created:september 8, 2008 - * Copyright: Marvell Semiconductors Inc. - */ - -#include -#include -#include -#include -#include -#include -#include "mm.h" - -static inline void set_fixmap_pte(int idx, pte_t pte) -{ - unsigned long vaddr = __fix_to_virt(idx); - pte_t *ptep = virt_to_kpte(vaddr); - - set_pte_ext(ptep, pte, 0); - local_flush_tlb_kernel_page(vaddr); -} - -static inline pte_t get_fixmap_pte(unsigned long vaddr) -{ - pte_t *ptep = virt_to_kpte(vaddr); - - return *ptep; -} - -void *kmap_atomic_high_prot(struct page *page, pgprot_t prot) -{ - unsigned int idx; - unsigned long vaddr; - void *kmap; - int type; - -#ifdef CONFIG_DEBUG_HIGHMEM - /* -* There is no cache coherency issue when non VIVT, so force the -* dedicated kmap usage for better debugging purposes in that case. 
-*/ - if (!cache_is_vivt()) - kmap = NULL; - else -#endif - kmap = kmap_high_get(page); - if (kmap) - return kmap; - - type = kmap_atomic_idx_push(); - - idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id(); - vaddr = __fix_to_virt(idx); -#ifdef CONFIG_DEBUG_HIGHMEM - /* -* With debugging enabled, kunmap_atomic forces that entry to 0. -* Make sure it was indeed properly unmapped. -*/ - BUG_ON(!pte_none(get_fixmap_pte(vaddr))); -#endif - /* -* When debugging is off, kunmap_atomic leaves the previous mapping -* in place, so the contained TLB flush ensures the TLB is updated -* with the new mapping. -*/ - set_fixmap_pte(idx, mk_pte(page, prot)); - - return (void *)vaddr; -} -EXPORT_SYMBOL(kmap_atomic_high_prot); - -void kunmap_atomic_high(void *kvaddr) -{ - unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK; - int idx, type; - - if (kvaddr >= (void *)FIXADDR_START) { - type = kmap_atomic_idx(); - idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id(); - - if (cache_i
[patch V2 07/18] csky/mm/highmem: Switch to generic kmap atomic
No reason having the same code in every architecture. Signed-off-by: Thomas Gleixner Acked-by: Guo Ren Cc: linux-c...@vger.kernel.org --- arch/csky/Kconfig |1 arch/csky/include/asm/highmem.h |4 +- arch/csky/mm/highmem.c | 75 3 files changed, 5 insertions(+), 75 deletions(-) --- a/arch/csky/Kconfig +++ b/arch/csky/Kconfig @@ -286,6 +286,7 @@ config NR_CPUS config HIGHMEM bool "High Memory Support" depends on !CPU_CK610 + select KMAP_LOCAL default y config FORCE_MAX_ZONEORDER --- a/arch/csky/include/asm/highmem.h +++ b/arch/csky/include/asm/highmem.h @@ -32,10 +32,12 @@ extern pte_t *pkmap_page_table; #define ARCH_HAS_KMAP_FLUSH_TLB extern void kmap_flush_tlb(unsigned long addr); -extern void *kmap_atomic_pfn(unsigned long pfn); #define flush_cache_kmaps() do {} while (0) +#define arch_kmap_local_post_map(vaddr, pteval)kmap_flush_tlb(vaddr) +#define arch_kmap_local_post_unmap(vaddr) kmap_flush_tlb(vaddr) + extern void kmap_init(void); #endif /* __KERNEL__ */ --- a/arch/csky/mm/highmem.c +++ b/arch/csky/mm/highmem.c @@ -9,8 +9,6 @@ #include #include -static pte_t *kmap_pte; - unsigned long highstart_pfn, highend_pfn; void kmap_flush_tlb(unsigned long addr) @@ -19,67 +17,7 @@ void kmap_flush_tlb(unsigned long addr) } EXPORT_SYMBOL(kmap_flush_tlb); -void *kmap_atomic_high_prot(struct page *page, pgprot_t prot) -{ - unsigned long vaddr; - int idx, type; - - type = kmap_atomic_idx_push(); - idx = type + KM_TYPE_NR*smp_processor_id(); - vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); -#ifdef CONFIG_DEBUG_HIGHMEM - BUG_ON(!pte_none(*(kmap_pte - idx))); -#endif - set_pte(kmap_pte-idx, mk_pte(page, prot)); - flush_tlb_one((unsigned long)vaddr); - - return (void *)vaddr; -} -EXPORT_SYMBOL(kmap_atomic_high_prot); - -void kunmap_atomic_high(void *kvaddr) -{ - unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK; - int idx; - - if (vaddr < FIXADDR_START) - return; - -#ifdef CONFIG_DEBUG_HIGHMEM - idx = KM_TYPE_NR*smp_processor_id() + kmap_atomic_idx(); - - BUG_ON(vaddr != 
__fix_to_virt(FIX_KMAP_BEGIN + idx)); - - pte_clear(&init_mm, vaddr, kmap_pte - idx); - flush_tlb_one(vaddr); -#else - (void) idx; /* to kill a warning */ -#endif - kmap_atomic_idx_pop(); -} -EXPORT_SYMBOL(kunmap_atomic_high); - -/* - * This is the same as kmap_atomic() but can map memory that doesn't - * have a struct page associated with it. - */ -void *kmap_atomic_pfn(unsigned long pfn) -{ - unsigned long vaddr; - int idx, type; - - pagefault_disable(); - - type = kmap_atomic_idx_push(); - idx = type + KM_TYPE_NR*smp_processor_id(); - vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); - set_pte(kmap_pte-idx, pfn_pte(pfn, PAGE_KERNEL)); - flush_tlb_one(vaddr); - - return (void *) vaddr; -} - -static void __init kmap_pages_init(void) +void __init kmap_init(void) { unsigned long vaddr; pgd_t *pgd; @@ -96,14 +34,3 @@ static void __init kmap_pages_init(void) pte = pte_offset_kernel(pmd, vaddr); pkmap_page_table = pte; } - -void __init kmap_init(void) -{ - unsigned long vaddr; - - kmap_pages_init(); - - vaddr = __fix_to_virt(FIX_KMAP_BEGIN); - - kmap_pte = pte_offset_kernel((pmd_t *)pgd_offset_k(vaddr), vaddr); -}
[patch V2 05/18] arc/mm/highmem: Use generic kmap atomic implementation
Adopt the map ordering to match the other architectures and the generic code. Signed-off-by: Thomas Gleixner Cc: Vineet Gupta Cc: linux-snps-...@lists.infradead.org --- arch/arc/Kconfig |1 arch/arc/include/asm/highmem.h |8 ++- arch/arc/mm/highmem.c | 44 - 3 files changed, 9 insertions(+), 44 deletions(-) --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -507,6 +507,7 @@ config LINUX_RAM_BASE config HIGHMEM bool "High Memory Support" select ARCH_DISCONTIGMEM_ENABLE + select KMAP_LOCAL help With ARC 2G:2G address split, only upper 2G is directly addressable by kernel. Enable this to potentially allow access to rest of 2G and PAE --- a/arch/arc/include/asm/highmem.h +++ b/arch/arc/include/asm/highmem.h @@ -15,7 +15,10 @@ #define FIXMAP_BASE(PAGE_OFFSET - FIXMAP_SIZE - PKMAP_SIZE) #define FIXMAP_SIZEPGDIR_SIZE /* only 1 PGD worth */ #define KM_TYPE_NR ((FIXMAP_SIZE >> PAGE_SHIFT)/NR_CPUS) -#define FIXMAP_ADDR(nr)(FIXMAP_BASE + ((nr) << PAGE_SHIFT)) + +#define FIX_KMAP_BEGIN (0) +#define FIX_KMAP_END ((FIXMAP_SIZE >> PAGE_SHIFT) - 1) +#define FIXADDR_TOP(FIXMAP_BASE + FIXMAP_SIZE - PAGE_SIZE) /* start after fixmap area */ #define PKMAP_BASE (FIXMAP_BASE + FIXMAP_SIZE) @@ -29,6 +32,9 @@ extern void kmap_init(void); +#define arch_kmap_local_post_unmap(vaddr) \ + local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE) + static inline void flush_cache_kmaps(void) { flush_cache_all(); --- a/arch/arc/mm/highmem.c +++ b/arch/arc/mm/highmem.c @@ -47,48 +47,6 @@ */ extern pte_t * pkmap_page_table; -static pte_t * fixmap_page_table; - -void *kmap_atomic_high_prot(struct page *page, pgprot_t prot) -{ - int idx, cpu_idx; - unsigned long vaddr; - - cpu_idx = kmap_atomic_idx_push(); - idx = cpu_idx + KM_TYPE_NR * smp_processor_id(); - vaddr = FIXMAP_ADDR(idx); - - set_pte_at(&init_mm, vaddr, fixmap_page_table + idx, - mk_pte(page, prot)); - - return (void *)vaddr; -} -EXPORT_SYMBOL(kmap_atomic_high_prot); - -void kunmap_atomic_high(void *kv) -{ - unsigned long kvaddr = (unsigned 
long)kv; - - if (kvaddr >= FIXMAP_BASE && kvaddr < (FIXMAP_BASE + FIXMAP_SIZE)) { - - /* -* Because preemption is disabled, this vaddr can be associated -* with the current allocated index. -* But in case of multiple live kmap_atomic(), it still relies on -* callers to unmap in right order. -*/ - int cpu_idx = kmap_atomic_idx(); - int idx = cpu_idx + KM_TYPE_NR * smp_processor_id(); - - WARN_ON(kvaddr != FIXMAP_ADDR(idx)); - - pte_clear(&init_mm, kvaddr, fixmap_page_table + idx); - local_flush_tlb_kernel_range(kvaddr, kvaddr + PAGE_SIZE); - - kmap_atomic_idx_pop(); - } -} -EXPORT_SYMBOL(kunmap_atomic_high); static noinline pte_t * __init alloc_kmap_pgtable(unsigned long kvaddr) { @@ -113,5 +71,5 @@ void __init kmap_init(void) pkmap_page_table = alloc_kmap_pgtable(PKMAP_BASE); BUILD_BUG_ON(LAST_PKMAP > PTRS_PER_PTE); - fixmap_page_table = alloc_kmap_pgtable(FIXMAP_BASE); + alloc_kmap_pgtable(FIXMAP_BASE); }
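Note on "map ordering": the removed FIXMAP_ADDR(nr) counted upwards from FIXMAP_BASE, while the new FIX_KMAP_BEGIN, FIX_KMAP_END and FIXADDR_TOP definitions fit a scheme that counts downwards from the top. The following userspace sketch works through that arithmetic. It assumes the generic code resolves slots with the usual asm-generic form of __fix_to_virt(); the page size, region size and base address are made-up values, and every model_/MODEL_ name is hypothetical.

```c
#include <assert.h>

/* Illustrative values only; the real ones depend on the ARC configuration. */
#define MODEL_PAGE_SHIFT	13
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)
#define MODEL_FIXMAP_SIZE	(1UL << 21)	/* stands in for PGDIR_SIZE */
#define MODEL_FIXMAP_BASE	0x7fc00000UL	/* hypothetical base address */

/* The definitions added by the patch, expressed with the model constants. */
#define MODEL_FIX_KMAP_BEGIN	(0)
#define MODEL_FIX_KMAP_END	((MODEL_FIXMAP_SIZE >> MODEL_PAGE_SHIFT) - 1)
#define MODEL_FIXADDR_TOP \
	(MODEL_FIXMAP_BASE + MODEL_FIXMAP_SIZE - MODEL_PAGE_SIZE)

/* Old ARC ordering, the removed FIXMAP_ADDR(nr): ascending from the base. */
static unsigned long model_old_fixmap_addr(unsigned long nr)
{
	return MODEL_FIXMAP_BASE + (nr << MODEL_PAGE_SHIFT);
}

/* Generic ordering, modelled on __fix_to_virt(): descending from the top. */
static unsigned long model_fix_to_virt(unsigned long idx)
{
	return MODEL_FIXADDR_TOP - (idx << MODEL_PAGE_SHIFT);
}
```

Under these assumptions both orderings cover the same address range; index 0 simply moves from the lowest page of the fixmap area to the highest one, and FIX_KMAP_END resolves to FIXMAP_BASE.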
[patch V2 03/18] highmem: Provide generic variant of kmap_atomic*
The kmap_atomic* interfaces in all architectures are pretty much the same except for post map operations (flush) and pre- and post unmap operations. Provide a generic variant for that. Signed-off-by: Thomas Gleixner Cc: Andrew Morton Cc: linux...@kvack.org --- V2: Address review comments from Christoph (style and EXPORT variant) --- include/linux/highmem.h | 79 ++-- mm/Kconfig |3 + mm/highmem.c| 118 +++- 3 files changed, 183 insertions(+), 17 deletions(-) --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -31,9 +31,16 @@ static inline void invalidate_kernel_vma #include +/* + * Outside of CONFIG_HIGHMEM to support X86 32bit iomap_atomic() cruft. + */ +#ifdef CONFIG_KMAP_LOCAL +void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot); +void *__kmap_local_page_prot(struct page *page, pgprot_t prot); +void kunmap_local_indexed(void *vaddr); +#endif + #ifdef CONFIG_HIGHMEM -extern void *kmap_atomic_high_prot(struct page *page, pgprot_t prot); -extern void kunmap_atomic_high(void *kvaddr); #include #ifndef ARCH_HAS_KMAP_FLUSH_TLB @@ -81,6 +88,11 @@ static inline void kunmap(struct page *p * be used in IRQ contexts, so in some (very limited) cases we need * it. 
*/ + +#ifndef CONFIG_KMAP_LOCAL +void *kmap_atomic_high_prot(struct page *page, pgprot_t prot); +void kunmap_atomic_high(void *kvaddr); + static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) { preempt_disable(); @@ -89,7 +101,38 @@ static inline void *kmap_atomic_prot(str return page_address(page); return kmap_atomic_high_prot(page, prot); } -#define kmap_atomic(page) kmap_atomic_prot(page, kmap_prot) + +static inline void __kunmap_atomic(void *vaddr) +{ + kunmap_atomic_high(vaddr); +} +#else /* !CONFIG_KMAP_LOCAL */ + +static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) +{ + preempt_disable(); + pagefault_disable(); + return __kmap_local_page_prot(page, prot); +} + +static inline void *kmap_atomic_pfn(unsigned long pfn) +{ + preempt_disable(); + pagefault_disable(); + return __kmap_local_pfn_prot(pfn, kmap_prot); +} + +static inline void __kunmap_atomic(void *addr) +{ + kunmap_local_indexed(addr); +} + +#endif /* CONFIG_KMAP_LOCAL */ + +static inline void *kmap_atomic(struct page *page) +{ + return kmap_atomic_prot(page, kmap_prot); +} /* declarations for linux/mm/highmem.c */ unsigned int nr_free_highpages(void); @@ -157,21 +200,28 @@ static inline void *kmap_atomic(struct p pagefault_disable(); return page_address(page); } -#define kmap_atomic_prot(page, prot) kmap_atomic(page) -static inline void kunmap_atomic_high(void *addr) +static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) +{ + return kmap_atomic(page); +} + +static inline void *kmap_atomic_pfn(unsigned long pfn) +{ + return kmap_atomic(pfn_to_page(pfn)); +} + +static inline void __kunmap_atomic(void *addr) { /* * Mostly nothing to do in the CONFIG_HIGHMEM=n case as kunmap_atomic() -* handles re-enabling faults + preemption +* handles re-enabling faults and preemption */ #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(addr); #endif } -#define kmap_atomic_pfn(pfn) kmap_atomic(pfn_to_page(pfn)) - #define kmap_flush_unused()do {} while(0) #endif 
/* CONFIG_HIGHMEM */ @@ -213,15 +263,14 @@ static inline void kmap_atomic_idx_pop(v * Prevent people trying to call kunmap_atomic() as if it were kunmap() * kunmap_atomic() should get the return value of kmap_atomic, not the page. */ -#define kunmap_atomic(addr) \ -do {\ - BUILD_BUG_ON(__same_type((addr), struct page *)); \ - kunmap_atomic_high(addr); \ - pagefault_enable(); \ - preempt_enable(); \ +#define kunmap_atomic(__addr) \ +do { \ + BUILD_BUG_ON(__same_type((__addr), struct page *)); \ + __kunmap_atomic(__addr);\ + pagefault_enable(); \ + preempt_enable(); \ } while (0) - /* when CONFIG_HIGHMEM is not set these will be plain clear/copy_page */ #ifndef clear_user_highpage static inline void clear_user_highpage(struct page *page, unsigned long vaddr) --- a/mm/Kconfig +++ b/mm/Kconfig @@ -872,4 +872,7 @@ config ARCH_HAS_HUGEPD config MAPPING_DIRTY_HELPERS bool +config KMAP_LOCAL + bool + endmenu --- a/mm/highmem.c +++ b/mm/highmem.c @@ -30,6 +30,7 @@ #include #include
[patch V2 02/18] mm/highmem: Un-EXPORT __kmap_atomic_idx()
Nothing in modules can use that. Signed-off-by: Thomas Gleixner Reviewed-by: Christoph Hellwig Cc: Andrew Morton Cc: linux...@kvack.org --- mm/highmem.c |2 -- 1 file changed, 2 deletions(-) --- a/mm/highmem.c +++ b/mm/highmem.c @@ -108,8 +108,6 @@ static inline wait_queue_head_t *get_pkm atomic_long_t _totalhigh_pages __read_mostly; EXPORT_SYMBOL(_totalhigh_pages); -EXPORT_PER_CPU_SYMBOL(__kmap_atomic_idx); - unsigned int nr_free_highpages (void) { struct zone *zone;
[patch V2 01/18] sched: Make migrate_disable/enable() independent of RT
Now that the scheduler can deal with migrate disable properly, there is no real compelling reason to make it only available for RT. There are quite a few code paths which needlessly disable preemption in order to prevent migration, and some constructs like kmap_atomic() enforce it implicitly. Making it available independent of RT makes it possible to provide a preemptible variant of kmap_atomic() and makes the code more consistent in general. FIXME: Rework the comment in preempt.h Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira --- include/linux/preempt.h | 38 +++--- include/linux/sched.h |2 +- kernel/sched/core.c | 12 ++-- kernel/sched/sched.h|2 +- lib/smp_processor_id.c |2 +- 5 files changed, 8 insertions(+), 48 deletions(-) --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -322,7 +322,7 @@ static inline void preempt_notifier_init #endif -#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) +#ifdef CONFIG_SMP /* * Migrate-Disable and why it is undesired. @@ -382,43 +382,11 @@ static inline void preempt_notifier_init extern void migrate_disable(void); extern void migrate_enable(void); -#elif defined(CONFIG_PREEMPT_RT) +#else static inline void migrate_disable(void) { } static inline void migrate_enable(void) { } -#else /* !CONFIG_PREEMPT_RT */ - -/** - * migrate_disable - Prevent migration of the current task - * - * Maps to preempt_disable() which also disables preemption. Use - * migrate_disable() to annotate that the intent is to prevent migration, - * but not necessarily preemption. - * - * Can be invoked nested like preempt_disable() and needs the corresponding - * number of migrate_enable() invocations. - */ -static __always_inline void migrate_disable(void) -{ - preempt_disable(); -} - -/** - * migrate_enable - Allow migration of the current task - * - * Counterpart to migrate_disable(). 
- * - * As migrate_disable() can be invoked nested, only the outermost invocation - * reenables migration. - * - * Currently mapped to preempt_enable(). - */ -static __always_inline void migrate_enable(void) -{ - preempt_enable(); -} - -#endif /* CONFIG_SMP && CONFIG_PREEMPT_RT */ +#endif /* CONFIG_SMP */ #endif /* __LINUX_PREEMPT_H */ --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -715,7 +715,7 @@ struct task_struct { const cpumask_t *cpus_ptr; cpumask_t cpus_mask; void*migration_pending; -#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) +#ifdef CONFIG_SMP unsigned short migration_disabled; #endif unsigned short migration_flags; --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1696,8 +1696,6 @@ void check_preempt_curr(struct rq *rq, s #ifdef CONFIG_SMP -#ifdef CONFIG_PREEMPT_RT - static void __do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask, u32 flags); @@ -1772,8 +1770,6 @@ static inline bool rq_has_pinned_tasks(s return rq->nr_pinned; } -#endif - /* * Per-CPU kthreads are allowed to run on !active && online CPUs, see * __set_cpus_allowed_ptr() and select_fallback_rq(). 
@@ -2841,7 +2837,7 @@ void sched_set_stop_task(int cpu, struct } } -#else +#else /* CONFIG_SMP */ static inline int __set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask, @@ -2850,10 +2846,6 @@ static inline int __set_cpus_allowed_ptr return set_cpus_allowed_ptr(p, new_mask); } -#endif /* CONFIG_SMP */ - -#if !defined(CONFIG_SMP) || !defined(CONFIG_PREEMPT_RT) - static inline void migrate_disable_switch(struct rq *rq, struct task_struct *p) { } static inline bool rq_has_pinned_tasks(struct rq *rq) @@ -2861,7 +2853,7 @@ static inline bool rq_has_pinned_tasks(s return false; } -#endif +#endif /* !CONFIG_SMP */ static void ttwu_stat(struct task_struct *p, int cpu, int wake_flags) --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1056,7 +1056,7 @@ struct rq { struct cpuidle_state*idle_state; #endif -#if defined(CONFIG_PREEMPT_RT) && defined(CONFIG_SMP) +#if CONFIG_SMP unsigned intnr_pinned; #endif unsigned intpush_busy; --- a/lib/smp_processor_id.c +++ b/lib/smp_processor_id.c @@ -26,7 +26,7 @@ unsigned int check_preemption_disabled(c if (current->nr_cpus_allowed == 1) goto out; -#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) +#ifdef CONFIG_SMP if (current->migration_disabled) goto out; #endif
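The kernel-doc removed by this patch says that migrate_disable() can be invoked nested and that only the outermost migrate_enable() re-enables migration. A minimal user-space sketch of that counting rule follows. It is not the scheduler implementation: only the migration_disabled field name comes from the diff, while struct sim_task and the sim_* helpers are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one task_struct field the patch touches. */
struct sim_task {
	unsigned short migration_disabled;	/* nesting depth, 0 == migratable */
};

/* Each call adds one nesting level. */
void sim_migrate_disable(struct sim_task *p)
{
	p->migration_disabled++;
}

/* Each call removes one nesting level; calls must be balanced. */
void sim_migrate_enable(struct sim_task *p)
{
	assert(p->migration_disabled > 0);
	p->migration_disabled--;
}

/* The task may be moved to another CPU only when no level is active. */
bool sim_can_migrate(const struct sim_task *p)
{
	return p->migration_disabled == 0;
}
```

A nested disable/disable/enable sequence leaves the task pinned; only the second enable makes it migratable again. The real code additionally has to deal with affinity changes and CPU hotplug, which this sketch leaves out on purpose.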
[patch V2 04/18] x86/mm/highmem: Use generic kmap atomic implementation
Convert X86 to the generic kmap atomic implementation and make the iomap_atomic() naming convention consistent while at it. Signed-off-by: Thomas Gleixner Cc: x...@kernel.org --- arch/x86/Kconfig |3 +- arch/x86/include/asm/fixmap.h |1 arch/x86/include/asm/highmem.h | 12 ++-- arch/x86/include/asm/iomap.h | 18 ++-- arch/x86/mm/highmem_32.c | 59 - arch/x86/mm/init_32.c | 15 -- arch/x86/mm/iomap_32.c | 59 +++-- include/linux/io-mapping.h |2 - 8 files changed, 27 insertions(+), 142 deletions(-) --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -14,10 +14,11 @@ config X86_32 select ARCH_WANT_IPC_PARSE_VERSION select CLKSRC_I8253 select CLONE_BACKWARDS + select GENERIC_VDSO_32 select HAVE_DEBUG_STACKOVERFLOW + select KMAP_LOCAL select MODULES_USE_ELF_REL select OLD_SIGACTION - select GENERIC_VDSO_32 config X86_64 def_bool y --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -151,7 +151,6 @@ extern void reserve_top_address(unsigned extern int fixmaps_set; -extern pte_t *kmap_pte; extern pte_t *pkmap_page_table; void __native_set_fixmap(enum fixed_addresses idx, pte_t pte); --- a/arch/x86/include/asm/highmem.h +++ b/arch/x86/include/asm/highmem.h @@ -58,11 +58,17 @@ extern unsigned long highstart_pfn, high #define PKMAP_NR(virt) ((virt-PKMAP_BASE) >> PAGE_SHIFT) #define PKMAP_ADDR(nr) (PKMAP_BASE + ((nr) << PAGE_SHIFT)) -void *kmap_atomic_pfn(unsigned long pfn); -void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot); - #define flush_cache_kmaps()do { } while (0) +#definearch_kmap_local_post_map(vaddr, pteval) \ + arch_flush_lazy_mmu_mode() + +#definearch_kmap_local_post_unmap(vaddr) \ + do {\ + flush_tlb_one_kernel((vaddr)); \ + arch_flush_lazy_mmu_mode(); \ + } while (0) + extern void add_highpages_with_active_regions(int nid, unsigned long start_pfn, unsigned long end_pfn); --- a/arch/x86/include/asm/iomap.h +++ b/arch/x86/include/asm/iomap.h @@ -9,19 +9,21 @@ #include #include #include +#include #include #include -void __iomem * 
-iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot); +void __iomem *iomap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot); -void -iounmap_atomic(void __iomem *kvaddr); +static inline void iounmap_atomic(void __iomem *vaddr) +{ + kunmap_local_indexed((void __force *)vaddr); + pagefault_enable(); + preempt_enable(); +} -int -iomap_create_wc(resource_size_t base, unsigned long size, pgprot_t *prot); +int iomap_create_wc(resource_size_t base, unsigned long size, pgprot_t *prot); -void -iomap_free(resource_size_t base, unsigned long size); +void iomap_free(resource_size_t base, unsigned long size); #endif /* _ASM_X86_IOMAP_H */ --- a/arch/x86/mm/highmem_32.c +++ b/arch/x86/mm/highmem_32.c @@ -4,65 +4,6 @@ #include /* for totalram_pages */ #include -void *kmap_atomic_high_prot(struct page *page, pgprot_t prot) -{ - unsigned long vaddr; - int idx, type; - - type = kmap_atomic_idx_push(); - idx = type + KM_TYPE_NR*smp_processor_id(); - vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); - BUG_ON(!pte_none(*(kmap_pte-idx))); - set_pte(kmap_pte-idx, mk_pte(page, prot)); - arch_flush_lazy_mmu_mode(); - - return (void *)vaddr; -} -EXPORT_SYMBOL(kmap_atomic_high_prot); - -/* - * This is the same as kmap_atomic() but can map memory that doesn't - * have a struct page associated with it. - */ -void *kmap_atomic_pfn(unsigned long pfn) -{ - return kmap_atomic_prot_pfn(pfn, kmap_prot); -} -EXPORT_SYMBOL_GPL(kmap_atomic_pfn); - -void kunmap_atomic_high(void *kvaddr) -{ - unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK; - - if (vaddr >= __fix_to_virt(FIX_KMAP_END) && - vaddr <= __fix_to_virt(FIX_KMAP_BEGIN)) { - int idx, type; - - type = kmap_atomic_idx(); - idx = type + KM_TYPE_NR * smp_processor_id(); - -#ifdef CONFIG_DEBUG_HIGHMEM - WARN_ON_ONCE(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx)); -#endif - /* -* Force other mappings to Oops if they'll try to access this -* pte without first remap it. 
Keeping stale mappings around -* is a bad idea also, in case the page changes cacheability -* attributes or becomes a protected page in a hypervisor. -*/ - kpte_clear_flush(kmap_pte-idx, vaddr); - kmap_atomic_idx_pop(); - arch_flush_lazy_mmu_mode(); - } -#ifdef CONFIG_DEBUG_HIGHMEM - else {
[patch V2 00/18] mm/highmem: Preemptible variant of kmap_atomic & friends
Following up to the discussion in:

  https://lore.kernel.org/r/20200914204209.256266...@linutronix.de

and the initial version of this:

  https://lore.kernel.org/r/20200919091751.06...@linutronix.de

this series provides a preemptible variant of kmap_atomic & related
interfaces.

Now that the scheduler folks have wrapped their heads around the migration
disable scheduler woes, there is no real reason anymore to confine
migration disabling to RT.

As expressed in the earlier discussion by graphics and crypto folks, there
is interest in getting rid of their kmap_atomic* usage because they need
only a temporary stable map and not all the bells and whistles of
kmap_atomic*.

This series provides kmap_local.* and iomap_local variants which only
disable migration to keep the virtual mapping address stable across
preemption, but disable neither pagefaults nor preemption. The new
functions can be used in any context, but if used in atomic context the
caller has to take care of disabling pagefaults where required.

This is achieved by:

 - Removing the RT dependency from migrate_disable/enable()

 - Consolidating all kmap atomic implementations in generic code

 - Switching from per CPU storage of the kmap index to a per task storage

 - Adding a pteval array to the per task storage which contains the ptevals
   of the currently active temporary kmaps

 - Adding context switch code which checks whether the outgoing or the
   incoming task has active temporary kmaps. If so, the outgoing task's
   kmaps are removed and the incoming task's kmaps are restored.

 - Adding new interfaces k[un]map_local*() which do not disable
   preemption and can be called from any context (except NMI).

   Contrary to kmap() which provides preemptible and "persistent"
   mappings, these interfaces are meant to replace the temporary mappings
   provided by kmap_atomic*() today.

This makes it possible to get rid of conditional mapping choices and to
have preemptible short term mappings on 64bit which are today enforced to
be non-preemptible due to the highmem constraints. It clearly puts overhead
on the highmem users, but highmem is slow anyway.

This is not a wholesale conversion which makes kmap_atomic magically
preemptible because there might be usage sites which rely on the implicit
preempt disable. So this needs to be done on a case by case basis and the
call sites converted to kmap_local.

Note, that this is only lightly tested on X86 and completely untested on
all other architectures.

There is also a still to be investigated question from Linus on the initial
posting versus the per cpu / per task mapping stack depth which might need
to be made larger due to the ability to take page faults within a mapping
region.

Though I wanted to share the current state of affairs before investigating
that further. If there is consensus in going forward with this, I'll have a
deeper look into this issue.

The lot is available from

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git highmem

It is based on Peter Zijlstra's migrate disable branch which is close to
being merged into the tip tree, but still not finalized:

   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/migrate-disable

Changes vs. V1:

  - Make it truly functional by depending on migrate disable/enable (Brown paperbag)
  - Rename to kmap_local.* (Linus)
  - Fix the sched in/out issue Linus pointed out
  - Fix a few style issues (Christoph)
  - Split a few things out into separate patches to make review simpler
  - Pick up acked/reviewed tags as appropriate

Thanks,

	tglx
---
 a/arch/arm/mm/highmem.c               | 121 --
 a/arch/microblaze/mm/highmem.c        |  78
 a/arch/nds32/mm/highmem.c             |  48 ---
 a/arch/powerpc/mm/highmem.c           |  67 --
 a/arch/sparc/mm/highmem.c             | 115 -
 arch/arc/Kconfig                      |   1
 arch/arc/include/asm/highmem.h        |   8 +
 arch/arc/mm/highmem.c                 |  44 --
 arch/arm/Kconfig                      |   1
 arch/arm/include/asm/highmem.h        |  31 +++-
 arch/arm/mm/Makefile                  |   1
 arch/csky/Kconfig                     |   1
 arch/csky/include/asm/highmem.h       |   4
 arch/csky/mm/highmem.c                |  75 ---
 arch/microblaze/Kconfig               |   1
 arch/microblaze/include/asm/highmem.h |   6
 arch/microblaze/mm/Makefile           |   1
 arch/microblaze/mm/init.c             |   6
 arch/mips/Kconfig                     |   1
 arch/mips/include/asm/highmem.h       |   4
 arch/mips/mm/highmem.c                |  77
 arch/mips/mm/init.c                   |   3
 arch/nds32/Kconfig.cpu                |   1
 arch/nds32/include/asm/highmem.h      |  21 ++-
 arch/nds32/mm/Makefile                |   1
 arch/powerpc/Kconfig                  |   1
 arch/powerpc/include/asm/highmem.h    |   6
 arch/powe
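The per task bookkeeping listed in the cover letter (a kmap index plus a pteval array per task, with the outgoing task's maps removed and the incoming task's maps restored on context switch) can be modelled in user space roughly as follows. This is only a sketch under assumptions: a single simulated CPU, a made-up stack depth SIM_KM_MAX, and hypothetical sim_* names throughout. None of it is taken from the actual patches.

```c
#include <assert.h>

#define SIM_KM_MAX 4	/* hypothetical per task mapping stack depth */

typedef unsigned long sim_pteval_t;

/* Per task state: how many local maps are active and their ptevals. */
struct sim_task {
	int idx;
	sim_pteval_t pteval[SIM_KM_MAX];
};

/* Stand-in for the per CPU fixmap slots; 0 means "no mapping installed". */
struct sim_cpu {
	sim_pteval_t slot[SIM_KM_MAX];
};

/* Push a mapping: record it in the task and install it in the CPU slot. */
int sim_kmap_local(struct sim_cpu *c, struct sim_task *t, sim_pteval_t pte)
{
	int i = t->idx;

	assert(i < SIM_KM_MAX && pte != 0);
	assert(c->slot[i] == 0);	/* slot must be free */
	t->pteval[i] = pte;
	c->slot[i] = pte;
	t->idx++;
	return i;
}

/* Pop the most recent mapping of the task (strict stack order). */
void sim_kunmap_local(struct sim_cpu *c, struct sim_task *t)
{
	assert(t->idx > 0);
	t->idx--;
	c->slot[t->idx] = 0;
	t->pteval[t->idx] = 0;
}

/* Context switch: tear down prev's maps, then reinstall next's maps. */
void sim_switch(struct sim_cpu *c, struct sim_task *prev, struct sim_task *next)
{
	for (int i = 0; i < prev->idx; i++)
		c->slot[i] = 0;
	for (int i = 0; i < next->idx; i++)
		c->slot[i] = next->pteval[i];
}
```

Because migration is disabled while a map is active, the task comes back to the same CPU and thus to the same slots, which is why restoring the saved ptevals is enough to keep the virtual address stable in this model.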
Re: [PATCH 11/13] PCI: dwc: Move dw_pcie_msi_init() into core
On 10/28/20, 4:47 PM, Rob Herring wrote: > > The host drivers which call dw_pcie_msi_init() are all the ones using > the built-in MSI controller, so let's move it into the common DWC code. > > Cc: Kishon Vijay Abraham I > Cc: Lorenzo Pieralisi > Cc: Bjorn Helgaas > Cc: Jingoo Han Acked-by: Jingoo Han Best regards, Jingoo Han > Cc: Kukjin Kim > Cc: Krzysztof Kozlowski > Cc: Richard Zhu > Cc: Lucas Stach > Cc: Shawn Guo > Cc: Sascha Hauer > Cc: Pengutronix Kernel Team > Cc: Fabio Estevam > Cc: NXP Linux Team > Cc: Yue Wang > Cc: Kevin Hilman > Cc: Neil Armstrong > Cc: Jerome Brunet > Cc: Martin Blumenstingl > Cc: Jesper Nilsson > Cc: Gustavo Pimentel > Cc: Xiaowei Song > Cc: Binghui Wang > Cc: Stanimir Varbanov > Cc: Andy Gross > Cc: Bjorn Andersson > Cc: Pratyush Anand > Cc: Thierry Reding > Cc: Jonathan Hunter > Cc: Kunihiko Hayashi > Cc: Masahiro Yamada > Cc: linux-o...@vger.kernel.org > Cc: linux-samsung-...@vger.kernel.org > Cc: linux-amlo...@lists.infradead.org > Cc: linux-arm-ker...@axis.com > Cc: linux-arm-...@vger.kernel.org > Cc: linux-te...@vger.kernel.org > Signed-off-by: Rob Herring > --- > drivers/pci/controller/dwc/pci-dra7xx.c | 2 -- > drivers/pci/controller/dwc/pci-exynos.c | 4 > drivers/pci/controller/dwc/pci-imx6.c | 1 - > drivers/pci/controller/dwc/pci-meson.c| 1 - > drivers/pci/controller/dwc/pcie-artpec6.c | 1 - > drivers/pci/controller/dwc/pcie-designware-host.c | 8 +--- > drivers/pci/controller/dwc/pcie-designware-plat.c | 1 - > drivers/pci/controller/dwc/pcie-designware.h | 10 -- > drivers/pci/controller/dwc/pcie-histb.c | 2 -- > drivers/pci/controller/dwc/pcie-kirin.c | 1 - > drivers/pci/controller/dwc/pcie-qcom.c| 2 -- > drivers/pci/controller/dwc/pcie-spear13xx.c | 6 +- > drivers/pci/controller/dwc/pcie-tegra194.c| 2 -- > drivers/pci/controller/dwc/pcie-uniphier.c| 1 - > 14 files changed, 6 insertions(+), 36 deletions(-) [...]
Re: [PATCH 10/13] PCI: dwc: Move link handling into common code
On 10/28/20, 4:47 PM, Rob Herring wrote: > > All the DWC drivers do link setup and checks at roughly the same time. > Let's use the existing .start_link() hook (currently only used in EP > mode) and move the link handling to the core code. > > The behavior for a link down was inconsistent as some drivers would fail > probe in that case while others succeed. Let's standardize this to > succeed as there are usecases where devices (and the link) appear later > even without hotplug. For example, a reconfigured FPGA device. > > Cc: Kishon Vijay Abraham I > Cc: Lorenzo Pieralisi > Cc: Bjorn Helgaas > Cc: Jingoo Han Acked-by: Jingoo Han Best regards, Jingoo Han > Cc: Kukjin Kim > Cc: Krzysztof Kozlowski > Cc: Richard Zhu > Cc: Lucas Stach > Cc: Shawn Guo > Cc: Sascha Hauer > Cc: Pengutronix Kernel Team > Cc: Fabio Estevam > Cc: NXP Linux Team > Cc: Murali Karicheri > Cc: Yue Wang > Cc: Kevin Hilman > Cc: Neil Armstrong > Cc: Jerome Brunet > Cc: Martin Blumenstingl > Cc: Thomas Petazzoni > Cc: Jesper Nilsson > Cc: Gustavo Pimentel > Cc: Xiaowei Song > Cc: Binghui Wang > Cc: Andy Gross > Cc: Bjorn Andersson > Cc: Stanimir Varbanov > Cc: Pratyush Anand > Cc: Thierry Reding > Cc: Jonathan Hunter > Cc: Kunihiko Hayashi > Cc: Masahiro Yamada > Cc: linux-o...@vger.kernel.org > Cc: linux-samsung-...@vger.kernel.org > Cc: linux-amlo...@lists.infradead.org > Cc: linux-arm-ker...@axis.com > Cc: linux-arm-...@vger.kernel.org > Cc: linux-te...@vger.kernel.org > Signed-off-by: Rob Herring > --- > drivers/pci/controller/dwc/pci-dra7xx.c | 2 - > drivers/pci/controller/dwc/pci-exynos.c | 41 +++-- > drivers/pci/controller/dwc/pci-imx6.c | 9 ++-- > drivers/pci/controller/dwc/pci-keystone.c | 9 > drivers/pci/controller/dwc/pci-meson.c| 24 -- > drivers/pci/controller/dwc/pcie-armada8k.c| 39 +++- > drivers/pci/controller/dwc/pcie-artpec6.c | 2 - > .../pci/controller/dwc/pcie-designware-host.c | 9 > .../pci/controller/dwc/pcie-designware-plat.c | 3 -- > drivers/pci/controller/dwc/pcie-histb.c | 
34 +++--- > drivers/pci/controller/dwc/pcie-kirin.c | 23 ++ > drivers/pci/controller/dwc/pcie-qcom.c| 19 ++-- > drivers/pci/controller/dwc/pcie-spear13xx.c | 46 --- > drivers/pci/controller/dwc/pcie-tegra194.c| 1 - > drivers/pci/controller/dwc/pcie-uniphier.c| 13 ++ > 15 files changed, 103 insertions(+), 171 deletions(-) [...]
Re: [PATCH 09/13] PCI: dwc: Rework MSI initialization
On 10/28/20, 4:47 PM, Rob Herring wrote: > > There are 3 possible MSI implementations for the DWC host. The first is > using the built-in DWC MSI controller. The 2nd is a custom MSI > controller as part of the PCI host (keystone only). The 3rd is an > external MSI controller (typically GICv3 ITS). Currently, the last 2 > are distinguished with a .msi_host_init() hook with the 3rd option using > an empty function. However we can detect the 3rd case with the presence > of 'msi-parent' or 'msi-map' properties, so let's do that instead and > remove the empty functions. > > Cc: Murali Karicheri > Cc: Lorenzo Pieralisi > Cc: Bjorn Helgaas > Cc: Minghuan Lian > Cc: Mingkai Hu > Cc: Roy Zang > Cc: Jingoo Han Acked-by: Jingoo Han Best regards, Jingoo Han > Cc: Gustavo Pimentel > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Rob Herring > --- > drivers/pci/controller/dwc/pci-keystone.c | 9 --- > drivers/pci/controller/dwc/pci-layerscape.c | 25 --- > .../pci/controller/dwc/pcie-designware-host.c | 20 +-- > drivers/pci/controller/dwc/pcie-designware.h | 1 + > drivers/pci/controller/dwc/pcie-intel-gw.c| 9 --- > 5 files changed, 13 insertions(+), 51 deletions(-) [...]
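The three-way choice described in the quoted commit message can be sketched as a small decision function. This is an illustration only: the struct layouts, the enum and all sim_* names are hypothetical, and the order of the checks is an assumption rather than something the quoted text specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sim_msi_kind {
	SIM_MSI_DWC_BUILTIN,	/* built-in DWC MSI controller */
	SIM_MSI_HOST_CUSTOM,	/* custom controller inside the PCI host */
	SIM_MSI_EXTERNAL,	/* external controller, e.g. GICv3 ITS */
};

/* Hypothetical view of the device tree node: only the two properties. */
struct sim_dt_node {
	bool has_msi_parent;	/* 'msi-parent' present */
	bool has_msi_map;	/* 'msi-map' present */
};

/* Hypothetical host ops: a non-NULL hook means a custom controller. */
struct sim_host_ops {
	int (*msi_host_init)(void);
};

/* Example hook so that callers have something non-NULL to point at. */
int sim_custom_msi_init(void)
{
	return 0;
}

enum sim_msi_kind sim_pick_msi(const struct sim_dt_node *np,
			       const struct sim_host_ops *ops)
{
	/* An external controller is announced by DT properties alone. */
	if (np->has_msi_parent || np->has_msi_map)
		return SIM_MSI_EXTERNAL;
	/* A driver supplied hook means the host has its own controller. */
	if (ops->msi_host_init != NULL)
		return SIM_MSI_HOST_CUSTOM;
	/* Otherwise fall back to the built-in DWC controller. */
	return SIM_MSI_DWC_BUILTIN;
}
```

With detection based on the properties, a driver that relies on an external controller no longer needs an empty hook just to opt out of the built-in one.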
Re: [PATCH 08/13] PCI: dwc: Move MSI interrupt setup into DWC common code
On 10/28/20, 4:47 PM, Rob Herring wrote: > > Platforms using the built-in DWC MSI controller all have a dedicated > interrupt with "msi" name or at index 0, so let's move setting up the > interrupt to the common DWC code. > > spear13xx and dra7xx are the 2 oddballs with muxed interrupts, so > we need to prevent configuring the MSI interrupt by setting msi_irq > to negative. > > Cc: Jingoo Han Acked-by: Jingoo Han Best regards, Jingoo Han > Cc: Lorenzo Pieralisi > Cc: Bjorn Helgaas > Cc: Kukjin Kim > Cc: Krzysztof Kozlowski > Cc: Richard Zhu > Cc: Lucas Stach > Cc: Shawn Guo > Cc: Sascha Hauer > Cc: Pengutronix Kernel Team > Cc: Fabio Estevam > Cc: NXP Linux Team > Cc: Yue Wang > Cc: Kevin Hilman > Cc: Neil Armstrong > Cc: Jerome Brunet > Cc: Martin Blumenstingl > Cc: Jesper Nilsson > Cc: Gustavo Pimentel > Cc: Xiaowei Song > Cc: Binghui Wang > Cc: Stanimir Varbanov > Cc: Andy Gross > Cc: Bjorn Andersson > Cc: Pratyush Anand > Cc: Thierry Reding > Cc: Jonathan Hunter > Cc: Kunihiko Hayashi > Cc: Masahiro Yamada > Cc: linux-samsung-...@vger.kernel.org > Cc: linux-amlo...@lists.infradead.org > Cc: linux-arm-ker...@axis.com > Cc: linux-arm-...@vger.kernel.org > Cc: linux-te...@vger.kernel.org > Signed-off-by: Rob Herring > --- > drivers/pci/controller/dwc/pci-dra7xx.c | 3 +++ > drivers/pci/controller/dwc/pci-exynos.c | 6 - > drivers/pci/controller/dwc/pci-imx6.c | 6 - > drivers/pci/controller/dwc/pci-meson.c| 6 - > drivers/pci/controller/dwc/pcie-artpec6.c | 6 - > .../pci/controller/dwc/pcie-designware-host.c | 11 +- > .../pci/controller/dwc/pcie-designware-plat.c | 6 - > drivers/pci/controller/dwc/pcie-histb.c | 6 - > drivers/pci/controller/dwc/pcie-kirin.c | 22 --- > drivers/pci/controller/dwc/pcie-qcom.c| 8 --- > drivers/pci/controller/dwc/pcie-spear13xx.c | 1 + > drivers/pci/controller/dwc/pcie-tegra194.c| 8 --- > drivers/pci/controller/dwc/pcie-uniphier.c| 6 - > 13 files changed, 14 insertions(+), 81 deletions(-) [...]
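The lookup rule from the quoted commit message (use the interrupt named "msi", else the one at index 0, and skip the setup entirely when a driver has set msi_irq negative) might look like this when reduced to plain integers. The function name, its parameters, the return convention and the SIM_ENOENT value are assumptions made for the sketch, not the DWC core code.

```c
#include <assert.h>

#define SIM_ENOENT (-2)	/* hypothetical "no interrupt found" error */

/*
 * preset:        value the driver stored in msi_irq before common setup
 *                (0 == not set, < 0 == opted out, > 0 == already chosen)
 * irq_named_msi: result of looking up the interrupt named "msi"
 * irq_index0:    result of looking up the interrupt at index 0
 *
 * Returns the IRQ to use (> 0), 0 when setup must be skipped, or a
 * negative error when nothing usable was found.
 */
int sim_resolve_msi_irq(int preset, int irq_named_msi, int irq_index0)
{
	if (preset < 0)
		return 0;		/* muxed interrupt: driver handles it */
	if (preset > 0)
		return preset;		/* driver already picked an IRQ */
	if (irq_named_msi > 0)
		return irq_named_msi;	/* preferred: lookup by name */
	if (irq_index0 > 0)
		return irq_index0;	/* fallback: first interrupt */
	return SIM_ENOENT;
}
```

For the two oddballs named in the message, the negative preset is what keeps the common code from claiming an interrupt that the driver demultiplexes itself.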
Re: [PATCH 07/13] PCI: dwc: Drop the .set_num_vectors() host op
On 10/28/20, 4:47 PM, Rob Herring wrote: > > There's no reason for the .set_num_vectors() host op. Drivers needing a > non-default value can just initialize pcie_port.num_vectors directly. > > Cc: Jingoo Han Acked-by: Jingoo Han Best regards, Jingoo Han > Cc: Gustavo Pimentel > Cc: Lorenzo Pieralisi > Cc: Bjorn Helgaas > Cc: Thierry Reding > Cc: Jonathan Hunter > Cc: linux-te...@vger.kernel.org > Signed-off-by: Rob Herring > --- > .../pci/controller/dwc/pcie-designware-host.c | 19 --- > .../pci/controller/dwc/pcie-designware-plat.c | 7 +-- > drivers/pci/controller/dwc/pcie-designware.h | 1 - > drivers/pci/controller/dwc/pcie-tegra194.c| 7 +-- > 4 files changed, 6 insertions(+), 28 deletions(-) [...]
Re: [PATCH 05/13] PCI: dwc: Ensure all outbound ATU windows are reset
On 10/28/20, 4:47 PM, Rob Herring wrote: > > The Layerscape driver clears the ATU registers which may have been > configured by the bootloader. Any driver could have the same issue > and doing it for all drivers doesn't hurt, so let's move it into the > common DWC code. > > Cc: Minghuan Lian > Cc: Mingkai Hu > Cc: Roy Zang > Cc: Lorenzo Pieralisi > Cc: Bjorn Helgaas > Cc: Jingoo Han Acked-by: Jingoo Han Best regards, Jingoo Han > Cc: Gustavo Pimentel > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Rob Herring > --- > drivers/pci/controller/dwc/pci-layerscape.c | 14 -- > drivers/pci/controller/dwc/pcie-designware-host.c | 5 + > 2 files changed, 5 insertions(+), 14 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c > b/drivers/pci/controller/dwc/pci-layerscape.c > index f24f79a70d9a..53e56d54c482 100644 > --- a/drivers/pci/controller/dwc/pci-layerscape.c > +++ b/drivers/pci/controller/dwc/pci-layerscape.c > @@ -83,14 +83,6 @@ static void ls_pcie_drop_msg_tlp(struct ls_pcie *pcie) > iowrite32(val, pci->dbi_base + PCIE_STRFMR1); > } > > -static void ls_pcie_disable_outbound_atus(struct ls_pcie *pcie) > -{ > -int i; > - > -for (i = 0; i < PCIE_IATU_NUM; i++) > -dw_pcie_disable_atu(pcie->pci, i, DW_PCIE_REGION_OUTBOUND); > -} > - > static int ls1021_pcie_link_up(struct dw_pcie *pci) > { > u32 state; > @@ -136,12 +128,6 @@ static int ls_pcie_host_init(struct pcie_port *pp) > struct dw_pcie *pci = to_dw_pcie_from_pp(pp); > struct ls_pcie *pcie = to_ls_pcie(pci); > > -/* > - * Disable outbound windows configured by the bootloader to avoid > - * one transaction hitting multiple outbound windows. > - * dw_pcie_setup_rc() will reconfigure the outbound windows. 
> - */ > -ls_pcie_disable_outbound_atus(pcie); > ls_pcie_fix_error_response(pcie); > > dw_pcie_dbi_ro_wr_en(pci); > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c > b/drivers/pci/controller/dwc/pcie-designware-host.c > index cde45b2076ee..265a48f1a0ae 100644 > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > @@ -534,6 +534,7 @@ static struct pci_ops dw_pcie_ops = { > > void dw_pcie_setup_rc(struct pcie_port *pp) > { > +int i; > u32 val, ctrl, num_ctrls; > struct dw_pcie *pci = to_dw_pcie_from_pp(pp); > > @@ -583,6 +584,10 @@ void dw_pcie_setup_rc(struct pcie_port *pp) > PCI_COMMAND_MASTER | PCI_COMMAND_SERR; > dw_pcie_writel_dbi(pci, PCI_COMMAND, val); > > +/* Ensure all outbound windows are disabled so there are no multiple > matches */ > +for (i = 0; i < pci->num_viewport; i++) > +dw_pcie_disable_atu(pci, i, DW_PCIE_REGION_OUTBOUND); > + > /* >* If the platform provides its own child bus config accesses, it means >* the platform uses its own address translation component rather than > -- > 2.25.1
Re: [PATCH 03/13] PCI: dwc: Move "dbi", "dbi2", and "addr_space" resource setup into common code
On 10/28/20, 4:46 PM, Rob Herring wrote: > > Most DWC drivers use the common register resource names "dbi", "dbi2", and > "addr_space", so let's move their setup into the DWC common code. > > This means 'dbi_base' in particular is setup later, but it looks like no > drivers touch DBI registers before dw_pcie_host_init or dw_pcie_ep_init. > > Cc: Kishon Vijay Abraham I > Cc: Lorenzo Pieralisi > Cc: Bjorn Helgaas > Cc: Murali Karicheri > Cc: Minghuan Lian > Cc: Mingkai Hu > Cc: Roy Zang > Cc: Jonathan Chocron > Cc: Jesper Nilsson > Cc: Jingoo Han Acked-by: Jingoo Han Best regards, Jingoo Han > Cc: Gustavo Pimentel > Cc: Xiaowei Song > Cc: Binghui Wang > Cc: Andy Gross > Cc: Bjorn Andersson > Cc: Stanimir Varbanov > Cc: Pratyush Anand > Cc: Thierry Reding > Cc: Jonathan Hunter > Cc: Kunihiko Hayashi > Cc: Masahiro Yamada > Cc: linux-o...@vger.kernel.org > Cc: linuxppc-dev@lists.ozlabs.org > Cc: linux-arm-ker...@axis.com > Cc: linux-arm-...@vger.kernel.org > Cc: linux-te...@vger.kernel.org > Signed-off-by: Rob Herring > --- > drivers/pci/controller/dwc/pci-dra7xx.c | 8 > drivers/pci/controller/dwc/pci-keystone.c | 29 +--- > .../pci/controller/dwc/pci-layerscape-ep.c| 37 +-- > drivers/pci/controller/dwc/pcie-al.c | 9 +--- > drivers/pci/controller/dwc/pcie-artpec6.c | 43 ++ > .../pci/controller/dwc/pcie-designware-ep.c | 29 ++-- > .../pci/controller/dwc/pcie-designware-host.c | 7 +++ > .../pci/controller/dwc/pcie-designware-plat.c | 45 +-- > drivers/pci/controller/dwc/pcie-intel-gw.c| 4 -- > drivers/pci/controller/dwc/pcie-kirin.c | 5 --- > drivers/pci/controller/dwc/pcie-qcom.c| 8 > drivers/pci/controller/dwc/pcie-spear13xx.c | 11 + > drivers/pci/controller/dwc/pcie-tegra194.c| 22 - > drivers/pci/controller/dwc/pcie-uniphier-ep.c | 38 +--- > drivers/pci/controller/dwc/pcie-uniphier.c| 6 --- > 15 files changed, 47 insertions(+), 254 deletions(-) [...]
Re: [PATCH] powerpc/32s: Setup the early hash table at all time.
On Oct 01 2020, Christophe Leroy wrote:

> At the time being, an early hash table is set up when
> CONFIG_KASAN is selected.
>
> There is nothing wrong with setting such an early hash table
> all the time, even if it is not used. This is a statically
> allocated 256 kB table which lies in the init data section.
>
> This makes the code simpler and may in the future allow to
> setup early IO mappings with fixmap instead of hard coding BATs.
>
> Put create_hpte() and flush_hash_pages() in the .ref.text section
> in order to avoid warning for the reference to early_hash[]. This
> reference is removed by MMU_init_hw_patch() before init memory is
> freed.

This breaks booting on the iBook G4.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Re: [PATCH 20/33] docs: ABI: testing: make the files compatible with ReST output
On Wed, 28 Oct 2020 15:23:18 +0100 Mauro Carvalho Chehab wrote: > From: Mauro Carvalho Chehab > > Some files over there won't parse well by Sphinx. > > Fix them. > > Signed-off-by: Mauro Carvalho Chehab > Signed-off-by: Mauro Carvalho Chehab Query below... I'm going to guess a rebase issue? Other than that Acked-by: Jonathan Cameron # for IIO > diff --git a/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 > b/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 > index b7259234ad70..a10a4de3e5fe 100644 > --- a/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 > +++ b/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 > @@ -3,67 +3,85 @@ KernelVersion: 4.11 > Contact: benjamin.gaign...@st.com > Description: > Reading returns the list possible master modes which are: > - - "reset" : The UG bit from the TIMx_EGR register is > + > + > + - "reset" > + The UG bit from the TIMx_EGR register is > used as trigger output (TRGO). > - - "enable": The Counter Enable signal CNT_EN is used > + - "enable" > + The Counter Enable signal CNT_EN is used > as trigger output. > - - "update": The update event is selected as trigger output. > + - "update" > + The update event is selected as trigger output. > For instance a master timer can then be used > as a prescaler for a slave timer. > - - "compare_pulse" : The trigger output send a positive pulse > - when the CC1IF flag is to be set. > - - "OC1REF": OC1REF signal is used as trigger output. > - - "OC2REF": OC2REF signal is used as trigger output. > - - "OC3REF": OC3REF signal is used as trigger output. > - - "OC4REF": OC4REF signal is used as trigger output. > + - "compare_pulse" > + The trigger output send a positive pulse > + when the CC1IF flag is to be set. > + - "OC1REF" > + OC1REF signal is used as trigger output. > + - "OC2REF" > + OC2REF signal is used as trigger output. > + - "OC3REF" > + OC3REF signal is used as trigger output. > + - "OC4REF" > + OC4REF signal is used as trigger output. 
> + > Additional modes (on TRGO2 only): > - - "OC5REF": OC5REF signal is used as trigger output. > - - "OC6REF": OC6REF signal is used as trigger output. > + > + - "OC5REF" > + OC5REF signal is used as trigger output. > + - "OC6REF" > + OC6REF signal is used as trigger output. > - "compare_pulse_OC4REF": > - OC4REF rising or falling edges generate pulses. > + OC4REF rising or falling edges generate pulses. > - "compare_pulse_OC6REF": > - OC6REF rising or falling edges generate pulses. > + OC6REF rising or falling edges generate pulses. > - "compare_pulse_OC4REF_r_or_OC6REF_r": > - OC4REF or OC6REF rising edges generate pulses. > + OC4REF or OC6REF rising edges generate pulses. > - "compare_pulse_OC4REF_r_or_OC6REF_f": > - OC4REF rising or OC6REF falling edges generate pulses. > + OC4REF rising or OC6REF falling edges generate > + pulses. > - "compare_pulse_OC5REF_r_or_OC6REF_r": > - OC5REF or OC6REF rising edges generate pulses. > + OC5REF or OC6REF rising edges generate pulses. > - "compare_pulse_OC5REF_r_or_OC6REF_f": > - OC5REF rising or OC6REF falling edges generate pulses. > + OC5REF rising or OC6REF falling edges generate > + pulses. > > - +---+ +-++-+ > - | Prescaler +-> | Counter |+-> | Master | TRGO(2) > - +---+ +--++-+|-> | Control +--> > -|| || +-+ > - +--v+-+ OCxREF || +-+ > - | Chx compare +--> | Output | ChX > - +---+-+ | | Control +--> > - . | | +-+ > - . | |. > - +---
Re: [PATCH 1/3] powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()
# # Automatically generated file; DO NOT EDIT. # Linux/powerpc 5.10.0-rc1 Kernel Configuration # CONFIG_CC_VERSION_TEXT="gcc-4.9 (SUSE Linux) 4.9.3" CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=40903 CONFIG_LD_VERSION=23501 CONFIG_CLANG_VERSION=0 CONFIG_CC_CAN_LINK=y CONFIG_CC_CAN_LINK_STATIC=y CONFIG_CC_HAS_ASM_GOTO=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_TABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_BUILD_SALT="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_XZ is not set CONFIG_DEFAULT_INIT="" CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_WATCH_QUEUE=y CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_USELIB=y CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_SHOW_LEVEL=y CONFIG_GENERIC_IRQ_MIGRATION=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set # end of IRQ subsystem CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_ARCH_HAS_TICK_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ_FULL is not set CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y # end of Timers subsystem # CONFIG_PREEMPT_NONE is not set CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set # # CPU/Task time and stats accounting # CONFIG_VIRT_CPU_ACCOUNTING=y # CONFIG_TICK_CPU_ACCOUNTING is not set CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set CONFIG_BSD_PROCESS_ACCT=y CONFIG_BSD_PROCESS_ACCT_V3=y CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y 
CONFIG_PSI=y # CONFIG_PSI_DEFAULT_DISABLED is not set # end of CPU/Task time and stats accounting # CONFIG_CPU_ISOLATION is not set # # RCU Subsystem # CONFIG_TREE_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y CONFIG_TREE_SRCU=y CONFIG_TASKS_RCU_GENERIC=y CONFIG_TASKS_TRACE_RCU=y CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_NEED_SEGCBLIST=y # end of RCU Subsystem CONFIG_BUILD_BIN2C=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_IKHEADERS is not set CONFIG_LOG_BUF_SHIFT=18 CONFIG_LOG_CPU_MAX_BUF_SHIFT=15 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 # # Scheduler features # CONFIG_UCLAMP_TASK=y CONFIG_UCLAMP_BUCKETS_COUNT=5 # end of Scheduler features CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y CONFIG_CC_HAS_INT128=y CONFIG_CGROUPS=y CONFIG_PAGE_COUNTER=y CONFIG_MEMCG=y CONFIG_MEMCG_SWAP=y CONFIG_MEMCG_KMEM=y CONFIG_BLK_CGROUP=y CONFIG_CGROUP_WRITEBACK=y CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_CFS_BANDWIDTH=y CONFIG_RT_GROUP_SCHED=y # CONFIG_UCLAMP_TASK_GROUP is not set CONFIG_CGROUP_PIDS=y CONFIG_CGROUP_RDMA=y CONFIG_CGROUP_FREEZER=y CONFIG_CGROUP_HUGETLB=y CONFIG_CPUSETS=y CONFIG_PROC_PID_CPUSET=y CONFIG_CGROUP_DEVICE=y CONFIG_CGROUP_CPUACCT=y CONFIG_CGROUP_PERF=y CONFIG_CGROUP_BPF=y # CONFIG_CGROUP_DEBUG is not set CONFIG_SOCK_CGROUP_DATA=y CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y CONFIG_USER_NS=y CONFIG_PID_NS=y CONFIG_NET_NS=y CONFIG_CHECKPOINT_RESTORE=y # CONFIG_SCHED_AUTOGROUP is not set # CONFIG_SYSFS_DEPRECATED is not set CONFIG_RELAY=y CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" CONFIG_RD_GZIP=y CONFIG_RD_BZIP2=y CONFIG_RD_LZMA=y CONFIG_RD_XZ=y CONFIG_RD_LZO=y CONFIG_RD_LZ4=y CONFIG_RD_ZSTD=y CONFIG_BOOT_CONFIG=y # CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y # CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not set CONFIG_SYSCTL=y CONFIG_SYSCTL_EXCEPTION_TRACE=y CONFIG_BPF=y CONFIG_EXPERT=y CONFIG_MULTIUSER=y CONFIG_SGETMASK_SYSCALL=y CONFIG_SYSFS_SYSCALL=y 
CONFIG_FHANDLE=y CONFIG_POSIX_TIMERS=y CONFIG_PRINTK=y CONFIG_PRINTK_NMI=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_FUTEX_PI=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_AIO=y CONFIG_IO_URING=y CONFIG_ADVISE_SYSCALLS=y CONFIG_MEMBARRIER=y CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y CONFIG_KALLSYMS_BASE_RELATIVE=y CONFIG_BPF_SYSCALL=y CONFIG_USERMODE_DRIVER=y # CONFIG_BPF_PRELOAD is not set CONFIG_USERFAULTFD=y CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS=y CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y CONFIG_RSEQ=y # CONFIG_DEBUG_RSEQ is not set # CONFIG_EMBEDDED is not set CONFIG_HAVE_PERF_EVENTS=y # CONFIG_PC104 is not set # # Kernel Performance Events And Counters # CONFIG_PERF_EVENTS=y # end of Kernel Performance Events And Counters CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLUB_DEBUG=y CONFIG_SLUB_MEMCG_SYSFS_ON=y # CONFIG_COMPAT_BRK is not set # CONFIG_SLAB is not set CONFIG_SLUB=y # CONFIG_SLOB is not set CONFIG_SLAB_MERGE_DEFAU
Re: [PATCH v2 2/4] PM: hibernate: make direct map manipulations more explicit
On Thu, Oct 29, 2020 at 5:19 PM Mike Rapoport wrote:
>
> From: Mike Rapoport
>
> When DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP is enabled a page may be
> not present in the direct map and has to be explicitly mapped before it
> could be copied.
>
> On arm64 it is possible that a page would be removed from the direct map
> using set_direct_map_invalid_noflush() but __kernel_map_pages() will refuse
> to map this page back if DEBUG_PAGEALLOC is disabled.
>
> Introduce hibernate_map_page() that will explicitly use
> set_direct_map_{default,invalid}_noflush() for ARCH_HAS_SET_DIRECT_MAP case
> and debug_pagealloc_map_pages() for DEBUG_PAGEALLOC case.
>
> The remapping of the pages in safe_copy_page() presumes that it only
> changes protection bits in an existing PTE and so it is safe to ignore
> return value of set_direct_map_{default,invalid}_noflush().
>
> Still, add a WARN_ON() so that future changes in set_memory APIs will not
> silently break hibernation.
>
> Signed-off-by: Mike Rapoport

From the hibernation support perspective:

Acked-by: Rafael J.
Wysocki > --- > include/linux/mm.h | 12 > kernel/power/snapshot.c | 30 -- > 2 files changed, 28 insertions(+), 14 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 1fc0609056dc..14e397f3752c 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2927,16 +2927,6 @@ static inline bool debug_pagealloc_enabled_static(void) > #if defined(CONFIG_DEBUG_PAGEALLOC) || > defined(CONFIG_ARCH_HAS_SET_DIRECT_MAP) > extern void __kernel_map_pages(struct page *page, int numpages, int enable); > > -/* > - * When called in DEBUG_PAGEALLOC context, the call should most likely be > - * guarded by debug_pagealloc_enabled() or debug_pagealloc_enabled_static() > - */ > -static inline void > -kernel_map_pages(struct page *page, int numpages, int enable) > -{ > - __kernel_map_pages(page, numpages, enable); > -} > - > static inline void debug_pagealloc_map_pages(struct page *page, > int numpages, int enable) > { > @@ -2948,8 +2938,6 @@ static inline void debug_pagealloc_map_pages(struct > page *page, > extern bool kernel_page_present(struct page *page); > #endif /* CONFIG_HIBERNATION */ > #else /* CONFIG_DEBUG_PAGEALLOC || CONFIG_ARCH_HAS_SET_DIRECT_MAP */ > -static inline void > -kernel_map_pages(struct page *page, int numpages, int enable) {} > static inline void debug_pagealloc_map_pages(struct page *page, > int numpages, int enable) {} > #ifdef CONFIG_HIBERNATION > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c > index 46b1804c1ddf..054c8cce4236 100644 > --- a/kernel/power/snapshot.c > +++ b/kernel/power/snapshot.c > @@ -76,6 +76,32 @@ static inline void hibernate_restore_protect_page(void > *page_address) {} > static inline void hibernate_restore_unprotect_page(void *page_address) {} > #endif /* CONFIG_STRICT_KERNEL_RWX && CONFIG_ARCH_HAS_SET_MEMORY */ > > +static inline void hibernate_map_page(struct page *page, int enable) > +{ > + if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) { > + unsigned long addr = (unsigned 
long)page_address(page); > + int ret; > + > + /* > +* This should not fail because remapping a page here means > +* that we only update protection bits in an existing PTE. > +* It is still worth to have WARN_ON() here if something > +* changes and this will no longer be the case. > +*/ > + if (enable) > + ret = set_direct_map_default_noflush(page); > + else > + ret = set_direct_map_invalid_noflush(page); > + > + if (WARN_ON(ret)) > + return; > + > + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); > + } else { > + debug_pagealloc_map_pages(page, 1, enable); > + } > +} > + > static int swsusp_page_is_free(struct page *); > static void swsusp_set_page_forbidden(struct page *); > static void swsusp_unset_page_forbidden(struct page *); > @@ -1355,9 +1381,9 @@ static void safe_copy_page(void *dst, struct page > *s_page) > if (kernel_page_present(s_page)) { > do_copy_page(dst, page_address(s_page)); > } else { > - kernel_map_pages(s_page, 1, 1); > + hibernate_map_page(s_page, 1); > do_copy_page(dst, page_address(s_page)); > - kernel_map_pages(s_page, 1, 0); > + hibernate_map_page(s_page, 0); > } > } > > -- > 2.28.0 >
[PATCH v1 4/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
Let's use alloc_contig_pages() for allocating memory and remove the linear mapping manually via arch_remove_linear_mapping(). Mark all pages PG_offline, such that they will definitely not get touched - e.g., when hibernating. When freeing memory, try to revert what we did. The original idea was discussed in: https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915...@redhat.com This is similar to CONFIG_DEBUG_PAGEALLOC handling on other architectures, whereby only single pages are unmapped from the linear mapping. Let's mimic what memory hot(un)plug would do with the linear mapping. We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies. Simple test under QEMU TCG (10GB RAM, single NUMA node): sh-5.0# mount -t debugfs none /sys/kernel/debug/ sh-5.0# cat /sys/devices/system/memory/block_size_bytes 4000 sh-5.0# echo 0x4000 > /sys/kernel/debug/powerpc/memtrace/enable [ 71.052836][ T356] memtrace: Allocated trace memory on node 0 at 0x8000 sh-5.0# echo 0x8000 > /sys/kernel/debug/powerpc/memtrace/enable [ 75.424302][ T356] radix-mmu: Mapped 0x8000-0xc000 with 64.0 KiB pages [ 75.430549][ T356] memtrace: Freed trace memory back on node 0 [ 75.604520][ T356] memtrace: Allocated trace memory on node 0 at 0x8000 sh-5.0# echo 0x1 > /sys/kernel/debug/powerpc/memtrace/enable [ 80.418835][ T356] radix-mmu: Mapped 0x8000-0x0001 with 64.0 KiB pages [ 80.430493][ T356] memtrace: Freed trace memory back on node 0 [ 80.433882][ T356] memtrace: Failed to allocate trace memory on node 0 sh-5.0# echo 0x4000 > /sys/kernel/debug/powerpc/memtrace/enable [ 91.920158][ T356] memtrace: Allocated trace memory on node 0 at 0x8000 Note 1: We currently won't be allocating from ZONE_MOVABLE - because our pages are not movable. However, as we don't run with any memory hot(un)plug mechanism around, we could make an exception to increase the chance of allocations succeeding. Note 2: PG_reserved isn't sufficient. 
E.g., kernel_page_present() used along PG_reserved in hibernation code will always return "true" on powerpc, resulting in the pages getting touched. It's too generic - e.g., indicates boot allocations. Note 3: For now, we keep using memory_block_size_bytes() as minimum granularity. I'm not able to come up with a better guess (most probably, doing it on a section basis could be possible). Suggested-by: Michal Hocko Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Rashmica Gupta Cc: Andrew Morton Cc: Mike Rapoport Cc: Michal Hocko Cc: Oscar Salvador Cc: Wei Yang Signed-off-by: David Hildenbrand --- arch/powerpc/platforms/powernv/Kconfig| 8 +- arch/powerpc/platforms/powernv/memtrace.c | 134 -- 2 files changed, 49 insertions(+), 93 deletions(-) diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig index 938803eab0ad..619b093a0657 100644 --- a/arch/powerpc/platforms/powernv/Kconfig +++ b/arch/powerpc/platforms/powernv/Kconfig @@ -27,11 +27,11 @@ config OPAL_PRD recovery diagnostics on OpenPower machines config PPC_MEMTRACE - bool "Enable removal of RAM from kernel mappings for tracing" - depends on PPC_POWERNV && MEMORY_HOTREMOVE + bool "Enable runtime allocation of RAM for tracing" + depends on PPC_POWERNV && MEMORY_HOTPLUG && CONTIG_ALLOC help - Enabling this option allows for the removal of memory (RAM) - from the kernel mappings to be used for hardware tracing. + Enabling this option allows for runtime allocation of memory (RAM) + for hardware tracing. 
config PPC_VAS bool "IBM Virtual Accelerator Switchboard (VAS)" diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c index 6828108486f8..8f47797a78c2 100644 --- a/arch/powerpc/platforms/powernv/memtrace.c +++ b/arch/powerpc/platforms/powernv/memtrace.c @@ -50,83 +50,50 @@ static const struct file_operations memtrace_fops = { .open = simple_open, }; -static int check_memblock_online(struct memory_block *mem, void *arg) -{ - if (mem->state != MEM_ONLINE) - return -1; - - return 0; -} - -static int change_memblock_state(struct memory_block *mem, void *arg) -{ - unsigned long state = (unsigned long)arg; - - mem->state = state; - - return 0; -} - -/* called with device_hotplug_lock held */ -static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages) +static u64 memtrace_alloc_node(u32 nid, u64 size) { - const unsigned long start = PFN_PHYS(start_pfn); - const unsigned long size = PFN_PHYS(nr_pages); + const unsigned long nr_pages = PHYS_PFN(size); + unsigned long pfn, start_pfn; + struct page *page; - if (walk_memory_blocks(start, size
[PATCH v1 3/4] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
Let's revert what we did in case something goes wrong and we return an
error.

Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Rashmica Gupta
Cc: Andrew Morton
Cc: Mike Rapoport
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: Wei Yang
Signed-off-by: David Hildenbrand
---
 arch/powerpc/mm/mem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 685028451dd2..69b3e8072261 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -165,7 +165,10 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 	rc = arch_create_linear_mapping(nid, start, size, params);
 	if (rc)
 		return rc;
-	return __add_pages(nid, start_pfn, nr_pages, params);
+	rc = __add_pages(nid, start_pfn, nr_pages, params);
+	if (rc)
+		arch_remove_linear_mapping(start, size);
+	return rc;
 }
 
 void __ref arch_remove_memory(int nid, u64 start, u64 size,
-- 
2.26.2
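For readers skimming the archive, the unwind pattern this patch introduces can be tried out in userspace. The following is only a sketch under stated assumptions: every sim_* name is hypothetical and merely stands in for arch_create_linear_mapping(), __add_pages() and arch_remove_linear_mapping(); none of it is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state tracked in place of real mappings and memmap. */
struct sim_state {
	bool mapped;      /* stands in for "linear mapping exists" */
	bool pages_added; /* stands in for a successful __add_pages() */
	bool fail_add;    /* inject a failure into the second step */
};

/* Step 1: create the mapping (always succeeds in this sketch). */
static int sim_create_mapping(struct sim_state *s)
{
	s->mapped = true;
	return 0;
}

/* Undo of step 1. */
static void sim_remove_mapping(struct sim_state *s)
{
	s->mapped = false;
}

/* Step 2: may fail; -12 plays the role of -ENOMEM. */
static int sim_add_pages(struct sim_state *s)
{
	if (s->fail_add)
		return -12;
	s->pages_added = true;
	return 0;
}

/* Same shape as the patched function: undo step 1 if step 2 fails. */
static int sim_add_memory(struct sim_state *s)
{
	int rc;

	assert(s);
	rc = sim_create_mapping(s);
	if (rc)
		return rc;
	rc = sim_add_pages(s);
	if (rc)
		sim_remove_mapping(s);
	return rc;
}
```

The point of the shape is that a caller seeing an error can assume nothing was left behind; without the undo step the mapping would stay in place after a failed call.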
[PATCH v1 2/4] powerpc/mm: print warning in arch_remove_linear_mapping()
Let's print a warning similar to the one in arch_create_linear_mapping()
instead of WARN_ON_ONCE() and eventually crashing the kernel.

Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Rashmica Gupta
Cc: Andrew Morton
Cc: Mike Rapoport
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: Wei Yang
Signed-off-by: David Hildenbrand
---
 arch/powerpc/mm/mem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8a86d81f8df0..685028451dd2 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -145,7 +145,9 @@ void __ref arch_remove_linear_mapping(u64 start, u64 size)
 	flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
 
 	ret = remove_section_mapping(start, start + size);
-	WARN_ON_ONCE(ret);
+	if (ret)
+		pr_warn("Unable to remove linear mapping for 0x%llx..0x%llx: %d\n",
+			start, start + size, ret);
 
 	/* Ensure all vmalloc mappings are flushed in case they also
 	 * hit that section of memory
-- 
2.26.2
[PATCH v1 1/4] powerpc/mm: factor out creating/removing linear mapping
We want to stop abusing memory hotplug infrastructure in memtrace code to perform allocations and remove the linear mapping. Instead we will use alloc_contig_pages() and remove the identity mapping manually. Let's factor out creating/removing the linear mapping into arch_create_linear_mapping() / arch_remove_linear_mapping() - so in the future, we might be able to have whole arch_add_memory() / arch_remove_memory() be implemented in common code. Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Rashmica Gupta Cc: Andrew Morton Cc: Mike Rapoport Cc: Michal Hocko Cc: Oscar Salvador Cc: Wei Yang Signed-off-by: David Hildenbrand --- arch/powerpc/mm/mem.c | 41 +++--- include/linux/memory_hotplug.h | 3 +++ 2 files changed, 31 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 01ec2a252f09..8a86d81f8df0 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -120,34 +120,26 @@ static void flush_dcache_range_chunked(unsigned long start, unsigned long stop, } } -int __ref arch_add_memory(int nid, u64 start, u64 size, - struct mhp_params *params) +int __ref arch_create_linear_mapping(int nid, u64 start, u64 size, +struct mhp_params *params) { - unsigned long start_pfn = start >> PAGE_SHIFT; - unsigned long nr_pages = size >> PAGE_SHIFT; int rc; start = (unsigned long)__va(start); rc = create_section_mapping(start, start + size, nid, params->pgprot); if (rc) { - pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n", + pr_warn("Unable to create linear mapping for 0x%llx..0x%llx: %d\n", start, start + size, rc); return -EFAULT; } - - return __add_pages(nid, start_pfn, nr_pages, params); + return 0; } -void __ref arch_remove_memory(int nid, u64 start, u64 size, -struct vmem_altmap *altmap) +void __ref arch_remove_linear_mapping(u64 start, u64 size) { - unsigned long start_pfn = start >> PAGE_SHIFT; - unsigned long nr_pages = size >> PAGE_SHIFT; int ret; - __remove_pages(start_pfn, 
nr_pages, altmap); - /* Remove htab bolted mappings for this section of memory */ start = (unsigned long)__va(start); flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE); @@ -160,6 +152,29 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size, */ vm_unmap_aliases(); } + +int __ref arch_add_memory(int nid, u64 start, u64 size, + struct mhp_params *params) +{ + unsigned long start_pfn = start >> PAGE_SHIFT; + unsigned long nr_pages = size >> PAGE_SHIFT; + int rc; + + rc = arch_create_linear_mapping(nid, start, size, params); + if (rc) + return rc; + return __add_pages(nid, start_pfn, nr_pages, params); +} + +void __ref arch_remove_memory(int nid, u64 start, u64 size, + struct vmem_altmap *altmap) +{ + unsigned long start_pfn = start >> PAGE_SHIFT; + unsigned long nr_pages = size >> PAGE_SHIFT; + + __remove_pages(start_pfn, nr_pages, altmap); + arch_remove_linear_mapping(start, size); +} #endif #ifndef CONFIG_NEED_MULTIPLE_NODES diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index d65c6fdc5cfc..00b9e9bd3850 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -375,6 +375,9 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum); extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn, unsigned long nr_pages); +extern int arch_create_linear_mapping(int nid, u64 start, u64 size, + struct mhp_params *params); +void arch_remove_linear_mapping(u64 start, u64 size); #endif /* CONFIG_MEMORY_HOTPLUG */ #endif /* __LINUX_MEMORY_HOTPLUG_H */ -- 2.26.2
[PATCH v1 0/4] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
powernv/memtrace is the only in-kernel user that rips out random memory it never added (doesn't own) in order to allocate memory without a linear mapping. Let's stop abusing memory hot(un)plug infrastructure for that - use alloc_contig_pages() for allocating memory and remove the linear mapping manually. The original idea was discussed in: https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915...@redhat.com I only tested allocations briefly via QEMU TCG - see patch #4 for more details. David Hildenbrand (4): powerpc/mm: factor out creating/removing linear mapping powerpc/mm: print warning in arch_remove_linear_mapping() powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations arch/powerpc/mm/mem.c | 48 +--- arch/powerpc/platforms/powernv/Kconfig| 8 +- arch/powerpc/platforms/powernv/memtrace.c | 134 -- include/linux/memory_hotplug.h| 3 + 4 files changed, 86 insertions(+), 107 deletions(-) -- 2.26.2
[PATCH v2 4/4] arch, mm: make kernel_page_present() always available
From: Mike Rapoport For architectures that enable ARCH_HAS_SET_MEMORY having the ability to verify that a page is mapped in the kernel direct map can be useful regardless of hibernation. Add RISC-V implementation of kernel_page_present(), update its forward declarations and stubs to be a part of set_memory API and remove ugly ifdefery in include/linux/mm.h around current declarations of kernel_page_present(). Signed-off-by: Mike Rapoport --- arch/arm64/include/asm/cacheflush.h | 1 + arch/riscv/include/asm/set_memory.h | 1 + arch/riscv/mm/pageattr.c| 29 + arch/x86/include/asm/set_memory.h | 1 + arch/x86/mm/pat/set_memory.c| 2 -- include/linux/mm.h | 7 --- include/linux/set_memory.h | 5 + 7 files changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h index 9384fd8fc13c..45217f21f1fe 100644 --- a/arch/arm64/include/asm/cacheflush.h +++ b/arch/arm64/include/asm/cacheflush.h @@ -140,6 +140,7 @@ int set_memory_valid(unsigned long addr, int numpages, int enable); int set_direct_map_invalid_noflush(struct page *page); int set_direct_map_default_noflush(struct page *page); +bool kernel_page_present(struct page *page); #include diff --git a/arch/riscv/include/asm/set_memory.h b/arch/riscv/include/asm/set_memory.h index 4c5bae7ca01c..d690b08dff2a 100644 --- a/arch/riscv/include/asm/set_memory.h +++ b/arch/riscv/include/asm/set_memory.h @@ -24,6 +24,7 @@ static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; } int set_direct_map_invalid_noflush(struct page *page); int set_direct_map_default_noflush(struct page *page); +bool kernel_page_present(struct page *page); #endif /* __ASSEMBLY__ */ diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c index 321b09d2e2ea..87ba5a68bbb8 100644 --- a/arch/riscv/mm/pageattr.c +++ b/arch/riscv/mm/pageattr.c @@ -198,3 +198,32 @@ void __kernel_map_pages(struct page *page, int numpages, int enable) __pgprot(0), __pgprot(_PAGE_PRESENT)); } 
#endif + +bool kernel_page_present(struct page *page) +{ + unsigned long addr = (unsigned long)page_address(page); + pgd_t *pgd; + pud_t *pud; + p4d_t *p4d; + pmd_t *pmd; + pte_t *pte; + + pgd = pgd_offset_k(addr); + if (!pgd_present(*pgd)) + return false; + + p4d = p4d_offset(pgd, addr); + if (!p4d_present(*p4d)) + return false; + + pud = pud_offset(p4d, addr); + if (!pud_present(*pud)) + return false; + + pmd = pmd_offset(pud, addr); + if (!pmd_present(*pmd)) + return false; + + pte = pte_offset_kernel(pmd, addr); + return pte_present(*pte); +} diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h index 5948218f35c5..4352f08bfbb5 100644 --- a/arch/x86/include/asm/set_memory.h +++ b/arch/x86/include/asm/set_memory.h @@ -82,6 +82,7 @@ int set_pages_rw(struct page *page, int numpages); int set_direct_map_invalid_noflush(struct page *page); int set_direct_map_default_noflush(struct page *page); +bool kernel_page_present(struct page *page); extern int kernel_set_to_readonly; diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 7f248fc45317..16f878c26667 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -2228,7 +2228,6 @@ void __kernel_map_pages(struct page *page, int numpages, int enable) } #endif /* CONFIG_DEBUG_PAGEALLOC */ -#ifdef CONFIG_HIBERNATION bool kernel_page_present(struct page *page) { unsigned int level; @@ -2240,7 +2239,6 @@ bool kernel_page_present(struct page *page) pte = lookup_address((unsigned long)page_address(page), &level); return (pte_val(*pte) & _PAGE_PRESENT); } -#endif /* CONFIG_HIBERNATION */ int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address, unsigned numpages, unsigned long page_flags) diff --git a/include/linux/mm.h b/include/linux/mm.h index ab0ef6bd351d..44b82f22e76a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2937,16 +2937,9 @@ static inline void debug_pagealloc_map_pages(struct page *page, if 
(debug_pagealloc_enabled_static()) __kernel_map_pages(page, numpages, enable); } - -#ifdef CONFIG_HIBERNATION -extern bool kernel_page_present(struct page *page); -#endif /* CONFIG_HIBERNATION */ #else /* CONFIG_DEBUG_PAGEALLOC */ static inline void debug_pagealloc_map_pages(struct page *page, int numpages, int enable) {} -#ifdef CONFIG_HIBERNATION -static inline bool kernel_page_present(struct page *page) { return true; } -#endif /* CONFIG_HIBERNATION */ #endif /* CONFIG_DEBUG_PAGEALLOC */ #ifdef __HAVE_ARCH_GATE_AREA diff --git a/include/l
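The RISC-V kernel_page_present() added above descends the page-table levels and gives up as soon as one level is not present. As a rough illustration only, here is a two-level userspace model of that walk. All sim_* names, the table sizes and the address split are assumptions made for this sketch and do not correspond to any real architecture layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_ENTRIES 4     /* entries per table in this toy layout */
#define SIM_PRESENT 0x1u  /* toy "present" bit */

/* Leaf table: one toy PTE per page. */
struct sim_pte_table {
	unsigned int pte[SIM_ENTRIES];
};

/* Top level: a NULL slot means "this level is not present". */
struct sim_pgd {
	struct sim_pte_table *dir[SIM_ENTRIES];
};

/*
 * Walk the two levels for a toy page number and report whether the
 * final entry has the present bit set. A missing upper level ends
 * the walk early, as in the multi-level walk of the patch.
 */
static bool sim_page_present(const struct sim_pgd *pgd, unsigned int addr)
{
	unsigned int hi = (addr / SIM_ENTRIES) % SIM_ENTRIES;
	unsigned int lo = addr % SIM_ENTRIES;
	const struct sim_pte_table *tbl;

	assert(pgd);
	tbl = pgd->dir[hi];
	if (!tbl)
		return false;
	return (tbl->pte[lo] & SIM_PRESENT) != 0;
}
```

With a single leaf table installed in slot 1 and only its entry 1 marked present, page number 5 is reported present, while 4 (entry not present) and 9 (no leaf table) are not.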
[PATCH v2 3/4] arch, mm: restore dependency of __kernel_map_pages() of DEBUG_PAGEALLOC
From: Mike Rapoport The design of DEBUG_PAGEALLOC presumes that __kernel_map_pages() must never fail. With this assumption it wouldn't be safe to allow general usage of this function. Moreover, some architectures that implement __kernel_map_pages() have this function guarded by #ifdef DEBUG_PAGEALLOC and some refuse to map/unmap pages when page allocation debugging is disabled at runtime. As all the users of __kernel_map_pages() were converted to use debug_pagealloc_map_pages() it is safe to make it available only when DEBUG_PAGEALLOC is set. Signed-off-by: Mike Rapoport --- arch/Kconfig | 3 +++ arch/arm64/Kconfig | 4 +--- arch/arm64/mm/pageattr.c | 6 -- arch/powerpc/Kconfig | 5 + arch/riscv/Kconfig | 4 +--- arch/riscv/include/asm/pgtable.h | 2 -- arch/riscv/mm/pageattr.c | 2 ++ arch/s390/Kconfig| 4 +--- arch/sparc/Kconfig | 4 +--- arch/x86/Kconfig | 4 +--- arch/x86/mm/pat/set_memory.c | 2 ++ include/linux/mm.h | 10 +++--- 12 files changed, 24 insertions(+), 26 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 56b6ccc0e32d..56d4752b6db6 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1028,6 +1028,9 @@ config HAVE_STATIC_CALL_INLINE bool depends on HAVE_STATIC_CALL +config ARCH_SUPPORTS_DEBUG_PAGEALLOC + bool + source "kernel/gcov/Kconfig" source "scripts/gcc-plugins/Kconfig" diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index f858c352f72a..5a01dfb77b93 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -71,6 +71,7 @@ config ARM64 select ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS select ARCH_USE_SYM_ANNOTATIONS + select ARCH_SUPPORTS_DEBUG_PAGEALLOC select ARCH_SUPPORTS_MEMORY_FAILURE select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK select ARCH_SUPPORTS_ATOMIC_RMW @@ -1005,9 +1006,6 @@ config HOLES_IN_ZONE source "kernel/Kconfig.hz" -config ARCH_SUPPORTS_DEBUG_PAGEALLOC - def_bool y - config ARCH_SPARSEMEM_ENABLE def_bool y select SPARSEMEM_VMEMMAP_ENABLE diff --git a/arch/arm64/mm/pageattr.c 
b/arch/arm64/mm/pageattr.c index 1b94f5b82654..18613d8834db 100644 --- a/arch/arm64/mm/pageattr.c +++ b/arch/arm64/mm/pageattr.c @@ -178,13 +178,15 @@ int set_direct_map_default_noflush(struct page *page) PAGE_SIZE, change_page_range, &data); } +#ifdef CONFIG_DEBUG_PAGEALLOC void __kernel_map_pages(struct page *page, int numpages, int enable) { - if (!debug_pagealloc_enabled() && !rodata_full) + if (!rodata_full) return; set_memory_valid((unsigned long)page_address(page), numpages, enable); } +#endif /* CONFIG_DEBUG_PAGEALLOC */ /* * This function is used to determine if a linear map page has been marked as @@ -204,7 +206,7 @@ bool kernel_page_present(struct page *page) pte_t *ptep; unsigned long addr = (unsigned long)page_address(page); - if (!debug_pagealloc_enabled() && !rodata_full) + if (!rodata_full) return true; pgdp = pgd_offset_k(addr); diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index e9f13fe08492..ad8a83f3ddca 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -146,6 +146,7 @@ config PPC select ARCH_MIGHT_HAVE_PC_SERIO select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX select ARCH_SUPPORTS_ATOMIC_RMW + select ARCH_SUPPORTS_DEBUG_PAGEALLOCif PPC32 || PPC_BOOK3S_64 select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF if PPC64 select ARCH_USE_QUEUED_RWLOCKS if PPC_QUEUED_SPINLOCKS @@ -355,10 +356,6 @@ config PPC_OF_PLATFORM_PCI depends on PCI depends on PPC64 # not supported on 32 bits yet -config ARCH_SUPPORTS_DEBUG_PAGEALLOC - depends on PPC32 || PPC_BOOK3S_64 - def_bool y - config ARCH_SUPPORTS_UPROBES def_bool y diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 44377fd7860e..9283c6f9ae2a 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -14,6 +14,7 @@ config RISCV def_bool y select ARCH_CLOCKSOURCE_INIT select ARCH_SUPPORTS_ATOMIC_RMW + select ARCH_SUPPORTS_DEBUG_PAGEALLOC if MMU select ARCH_HAS_BINFMT_FLAT select ARCH_HAS_DEBUG_VM_PGTABLE select ARCH_HAS_DEBUG_VIRTUAL if MMU @@ -153,9 
+154,6 @@ config ARCH_SELECT_MEMORY_MODEL config ARCH_WANT_GENERAL_HUGETLB def_bool y -config ARCH_SUPPORTS_DEBUG_PAGEALLOC - def_bool y - config SYS_SUPPORTS_HUGETLBFS depends on MMU def_bool y diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 183f1f4b2ae6..41a72861987c 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/i
[PATCH] powerpc: add support for TIF_NOTIFY_SIGNAL
Wire up TIF_NOTIFY_SIGNAL handling for powerpc.

Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Jens Axboe
---

5.11 has support queued up for TIF_NOTIFY_SIGNAL, see this posting
for details:

https://lore.kernel.org/io-uring/20201026203230.386348-1-ax...@kernel.dk/

As part of that work, I'm adding TIF_NOTIFY_SIGNAL support to all archs,
as that will enable a set of cleanups once all of them support it. I'm
happy carrying this patch if need be, or it can be funnelled through the
arch tree. Let me know.

 arch/powerpc/include/asm/thread_info.h | 5 ++++-
 arch/powerpc/kernel/signal.c | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 46a210b03d2b..53115ae61495 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -90,6 +90,7 @@ void arch_setup_new_exec(void);
 #define TIF_SYSCALL_TRACE	0	/* syscall trace active */
 #define TIF_SIGPENDING		1	/* signal pending */
 #define TIF_NEED_RESCHED	2	/* rescheduling necessary */
+#define TIF_NOTIFY_SIGNAL	3	/* signal notifications exist */
 #define TIF_SYSCALL_EMU		4	/* syscall emulation active */
 #define TIF_RESTORE_TM		5	/* need to restore TM FP/VEC/VSX */
 #define TIF_PATCH_PENDING	6	/* pending live patching update */
@@ -115,6 +116,7 @@ void arch_setup_new_exec(void);
 #define _TIF_SYSCALL_TRACE	(1
[PATCH v2 2/4] PM: hibernate: make direct map manipulations more explicit
From: Mike Rapoport

When DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP is enabled, a page may not
be present in the direct map and has to be explicitly mapped before it can
be copied.

On arm64 it is possible that a page would be removed from the direct map
using set_direct_map_invalid_noflush() but __kernel_map_pages() will refuse
to map this page back if DEBUG_PAGEALLOC is disabled.

Introduce hibernate_map_page() that will explicitly use
set_direct_map_{default,invalid}_noflush() for the ARCH_HAS_SET_DIRECT_MAP
case and debug_pagealloc_map_pages() for the DEBUG_PAGEALLOC case.

The remapping of the pages in safe_copy_page() presumes that it only
changes protection bits in an existing PTE and so it is safe to ignore the
return value of set_direct_map_{default,invalid}_noflush().

Still, add a WARN_ON() so that future changes in set_memory APIs will not
silently break hibernation.

Signed-off-by: Mike Rapoport
---
 include/linux/mm.h      | 12
 kernel/power/snapshot.c | 30 --
 2 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1fc0609056dc..14e397f3752c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2927,16 +2927,6 @@ static inline bool debug_pagealloc_enabled_static(void)
 #if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_ARCH_HAS_SET_DIRECT_MAP)
 extern void __kernel_map_pages(struct page *page, int numpages, int enable);
 
-/*
- * When called in DEBUG_PAGEALLOC context, the call should most likely be
- * guarded by debug_pagealloc_enabled() or debug_pagealloc_enabled_static()
- */
-static inline void
-kernel_map_pages(struct page *page, int numpages, int enable)
-{
-	__kernel_map_pages(page, numpages, enable);
-}
-
 static inline void debug_pagealloc_map_pages(struct page *page,
 					     int numpages, int enable)
 {
@@ -2948,8 +2938,6 @@ static inline void debug_pagealloc_map_pages(struct page *page,
 extern bool kernel_page_present(struct page *page);
 #endif /* CONFIG_HIBERNATION */
 #else /* CONFIG_DEBUG_PAGEALLOC || CONFIG_ARCH_HAS_SET_DIRECT_MAP */
-static inline void
-kernel_map_pages(struct page *page, int numpages, int enable) {}
 static inline void debug_pagealloc_map_pages(struct page *page,
 					     int numpages, int enable) {}
 #ifdef CONFIG_HIBERNATION
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 46b1804c1ddf..054c8cce4236 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -76,6 +76,32 @@ static inline void hibernate_restore_protect_page(void *page_address) {}
 static inline void hibernate_restore_unprotect_page(void *page_address) {}
 #endif /* CONFIG_STRICT_KERNEL_RWX && CONFIG_ARCH_HAS_SET_MEMORY */
 
+static inline void hibernate_map_page(struct page *page, int enable)
+{
+	if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
+		unsigned long addr = (unsigned long)page_address(page);
+		int ret;
+
+		/*
+		 * This should not fail because remapping a page here means
+		 * that we only update protection bits in an existing PTE.
+		 * It is still worth to have WARN_ON() here if something
+		 * changes and this will no longer be the case.
+		 */
+		if (enable)
+			ret = set_direct_map_default_noflush(page);
+		else
+			ret = set_direct_map_invalid_noflush(page);
+
+		if (WARN_ON(ret))
+			return;
+
+		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+	} else {
+		debug_pagealloc_map_pages(page, 1, enable);
+	}
+}
+
 static int swsusp_page_is_free(struct page *);
 static void swsusp_set_page_forbidden(struct page *);
 static void swsusp_unset_page_forbidden(struct page *);
@@ -1355,9 +1381,9 @@ static void safe_copy_page(void *dst, struct page *s_page)
 	if (kernel_page_present(s_page)) {
 		do_copy_page(dst, page_address(s_page));
 	} else {
-		kernel_map_pages(s_page, 1, 1);
+		hibernate_map_page(s_page, 1);
 		do_copy_page(dst, page_address(s_page));
-		kernel_map_pages(s_page, 1, 0);
+		hibernate_map_page(s_page, 0);
 	}
 }
 
-- 
2.28.0
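For readers who want to experiment with the map/copy/unmap sequence outside
the kernel, here is a rough userspace analogy. It is only a sketch under
stated assumptions: mprotect() stands in for
set_direct_map_{default,invalid}_noflush(), a message on stderr stands in
for WARN_ON(), and the names demo_map_page(), demo_safe_copy_page() and
demo_run() are hypothetical and do not appear in the patch. It does not
model the TLB flush, and unlike kernel_page_present() the "present" state is
passed in by the caller.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/*
 * Hypothetical stand-in for hibernate_map_page(): make one page readable
 * (enable != 0) or inaccessible (enable == 0). The return value is checked
 * and reported instead of being silently dropped.
 */
static int demo_map_page(void *page, size_t page_size, int enable)
{
	int ret = mprotect(page, page_size, enable ? PROT_READ : PROT_NONE);

	if (ret)
		fprintf(stderr, "demo_map_page: mprotect() failed\n");
	return ret;
}

/*
 * Analogue of safe_copy_page(): copy directly if the page is accessible,
 * otherwise map it, copy it, and unmap it again.
 */
static int demo_safe_copy_page(void *dst, void *src, size_t page_size,
			       int present)
{
	if (present) {
		memcpy(dst, src, page_size);
		return 0;
	}

	if (demo_map_page(src, page_size, 1))
		return -1;
	memcpy(dst, src, page_size);
	return demo_map_page(src, page_size, 0);
}

/* Returns 0 if a page made inaccessible could still be copied correctly. */
int demo_run(void)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *dst = malloc(page_size);
	unsigned char *src = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ok = 0;

	if (dst && src != MAP_FAILED) {
		memset(src, 0x5a, page_size);
		/* Simulate a page that was removed from the direct map. */
		if (mprotect(src, page_size, PROT_NONE) == 0 &&
		    demo_safe_copy_page(dst, src, page_size, 0) == 0)
			ok = dst[0] == 0x5a && dst[page_size - 1] == 0x5a;
	}

	if (src != MAP_FAILED)
		munmap(src, page_size);
	free(dst);
	return ok ? 0 : 1;
}
```

Calling demo_run() from a small main() and checking for a zero return is
enough to exercise the path. The point of the sketch is the same as in the
patch: the remap is not expected to fail, but its result is still looked at
so that a failure cannot go unnoticed.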
[PATCH v2 1/4] mm: introduce debug_pagealloc_map_pages() helper
From: Mike Rapoport

When CONFIG_DEBUG_PAGEALLOC is enabled, it unmaps pages from the kernel
direct mapping after free_pages(). The pages then need to be mapped back
before they can be used. These mapping operations use __kernel_map_pages()
guarded with debug_pagealloc_enabled().

The only place that calls __kernel_map_pages() without checking whether
DEBUG_PAGEALLOC is enabled is the hibernation code, which presumes
availability of this function when ARCH_HAS_SET_DIRECT_MAP is set. Still, on
arm64, __kernel_map_pages() will bail out when DEBUG_PAGEALLOC is not
enabled, but set_direct_map_invalid_noflush() may render some pages not
present in the direct map, and the hibernation code won't be able to save
such pages.

To make page allocation debugging and hibernation interaction more robust,
the dependency on DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP has to be made
more explicit.

Start with combining the guard condition and the call to
__kernel_map_pages() into a single debug_pagealloc_map_pages() function to
emphasize that __kernel_map_pages() should not be called without
DEBUG_PAGEALLOC, and use this new function to map/unmap pages when page
allocation debug is enabled.

Signed-off-by: Mike Rapoport
Reviewed-by: David Hildenbrand
---
 include/linux/mm.h  | 10 ++
 mm/memory_hotplug.c |  3 +--
 mm/page_alloc.c     |  6 ++
 mm/slab.c           |  8 +++
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef360fe70aaf..1fc0609056dc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2936,12 +2936,22 @@ kernel_map_pages(struct page *page, int numpages, int enable)
 {
 	__kernel_map_pages(page, numpages, enable);
 }
+
+static inline void debug_pagealloc_map_pages(struct page *page,
+					     int numpages, int enable)
+{
+	if (debug_pagealloc_enabled_static())
+		__kernel_map_pages(page, numpages, enable);
+}
+
 #ifdef CONFIG_HIBERNATION
 extern bool kernel_page_present(struct page *page);
 #endif /* CONFIG_HIBERNATION */
 #else /* CONFIG_DEBUG_PAGEALLOC || CONFIG_ARCH_HAS_SET_DIRECT_MAP */
 static inline void
 kernel_map_pages(struct page *page, int numpages, int enable) {}
+static inline void debug_pagealloc_map_pages(struct page *page,
+					     int numpages, int enable) {}
 #ifdef CONFIG_HIBERNATION
 static inline bool kernel_page_present(struct page *page) { return true; }
 #endif /* CONFIG_HIBERNATION */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b44d4c7ba73b..e2b6043a4428 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -614,8 +614,7 @@ void generic_online_page(struct page *page, unsigned int order)
 	 * so we should map it first. This is better than introducing a special
 	 * case in page freeing fast path.
 	 */
-	if (debug_pagealloc_enabled_static())
-		kernel_map_pages(page, 1 << order, 1);
+	debug_pagealloc_map_pages(page, 1 << order, 1);
 	__free_pages_core(page, order);
 	totalram_pages_add(1UL << order);
 #ifdef CONFIG_HIGHMEM
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 23f5066bd4a5..9a66a1ff9193 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1272,8 +1272,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	 */
 	arch_free_page(page, order);
 
-	if (debug_pagealloc_enabled_static())
-		kernel_map_pages(page, 1 << order, 0);
+	debug_pagealloc_map_pages(page, 1 << order, 0);
 
 	kasan_free_nondeferred_pages(page, order);
 
@@ -2270,8 +2269,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	set_page_refcounted(page);
 
 	arch_alloc_page(page, order);
-	if (debug_pagealloc_enabled_static())
-		kernel_map_pages(page, 1 << order, 1);
+	debug_pagealloc_map_pages(page, 1 << order, 1);
 	kasan_alloc_pages(page, order);
 	kernel_poison_pages(page, 1 << order, 1);
 	set_page_owner(page, order, gfp_flags);
diff --git a/mm/slab.c b/mm/slab.c
index b1113561b98b..340db0ce74c4 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1431,10 +1431,8 @@ static bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static void slab_kernel_map(struct kmem_cache *cachep, void *objp, int map)
 {
-	if (!is_debug_pagealloc_cache(cachep))
-		return;
-
-	kernel_map_pages(virt_to_page(objp), cachep->size / PAGE_SIZE, map);
+	debug_pagealloc_map_pages(virt_to_page(objp),
+				  cachep->size / PAGE_SIZE, map);
 }
 
 #else
@@ -2062,7 +2060,7 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
 #if DEBUG
 	/*
-	 * If we're going to use the generic kernel_map_pages()
+	 * If we're going to use the generic debug_pagealloc_map_pages()
 	 * poisoning, then it's going to smash the c
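The idea of folding the guard and the call into one helper can be tried in
isolation with the small sketch below. It is an illustration only, not
kernel code: demo_debug_enabled stands in for
debug_pagealloc_enabled_static(), demo_low_level_map() stands in for
__kernel_map_pages(), and all demo_* names are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for the run-time debug_pagealloc_enabled_static() switch. */
static bool demo_debug_enabled;

/* Counts how often the low-level primitive is actually reached. */
static int demo_map_calls;

/* Stand-in for __kernel_map_pages(): must only run when debugging is on. */
static void demo_low_level_map(int numpages, int enable)
{
	(void)numpages;
	(void)enable;
	demo_map_calls++;
}

/*
 * Guard and call combined, in the spirit of debug_pagealloc_map_pages():
 * call sites no longer repeat the "is debugging enabled?" check and cannot
 * forget it.
 */
static void demo_debug_map_pages(int numpages, int enable)
{
	if (demo_debug_enabled)
		demo_low_level_map(numpages, enable);
}

/* Unmap then map one page and report how many low-level calls happened. */
int demo_count_calls(bool enabled)
{
	demo_debug_enabled = enabled;
	demo_map_calls = 0;

	demo_debug_map_pages(1, 0);
	demo_debug_map_pages(1, 1);

	return demo_map_calls;
}
```

With the switch off, demo_count_calls(false) reaches the primitive zero
times; with it on, demo_count_calls(true) reaches it twice. Keeping the
check inside the helper is what lets the call sites in the diff above shrink
from two lines to one.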
[PATCH v2 0/4] arch, mm: improve robustness of direct map manipulation
From: Mike Rapoport

Hi,

During recent discussion about KVM protected memory, David raised a concern
about usage of __kernel_map_pages() outside of DEBUG_PAGEALLOC scope [1].

Indeed, for architectures that define CONFIG_ARCH_HAS_SET_DIRECT_MAP it is
possible that __kernel_map_pages() would fail, but since this function is
void, the failure will go unnoticed.

Moreover, there is a lack of consistency in __kernel_map_pages() semantics
across architectures: some guard this function with
#ifdef DEBUG_PAGEALLOC, some refuse to update the direct map if page
allocation debugging is disabled at run time, and some allow modifying the
direct map regardless of DEBUG_PAGEALLOC settings.

This set straightens this out by restoring the dependency of
__kernel_map_pages() on DEBUG_PAGEALLOC and updating the call sites
accordingly.

Since the hibernation code is currently the only user of
__kernel_map_pages() outside DEBUG_PAGEALLOC, it is updated to make direct
map accesses there more explicit.

[1] https://lore.kernel.org/lkml/2759b4bf-e1e3-d006-7d86-78a403482...@redhat.com

v2 changes:
* Rephrase patch 2 changelog to better describe the change intentions and
  implications
* Move removal of kernel_map_pages() from patch 1 to patch 2, per David

v1: https://lore.kernel.org/lkml/20201025101555.3057-1-r...@kernel.org

Mike Rapoport (4):
  mm: introduce debug_pagealloc_map_pages() helper
  PM: hibernate: make direct map manipulations more explicit
  arch, mm: restore dependency of __kernel_map_pages() of DEBUG_PAGEALLOC
  arch, mm: make kernel_page_present() always available

 arch/Kconfig                        |  3 +++
 arch/arm64/Kconfig                  |  4 +---
 arch/arm64/include/asm/cacheflush.h |  1 +
 arch/arm64/mm/pageattr.c            |  6 +++--
 arch/powerpc/Kconfig                |  5 +
 arch/riscv/Kconfig                  |  4 +---
 arch/riscv/include/asm/pgtable.h    |  2 --
 arch/riscv/include/asm/set_memory.h |  1 +
 arch/riscv/mm/pageattr.c            | 31 +
 arch/s390/Kconfig                   |  4 +---
 arch/sparc/Kconfig                  |  4 +---
 arch/x86/Kconfig                    |  4 +---
 arch/x86/include/asm/set_memory.h   |  1 +
 arch/x86/mm/pat/set_memory.c        |  4 ++--
 include/linux/mm.h                  | 35 +
 include/linux/set_memory.h          |  5 +
 kernel/power/snapshot.c             | 30 +++--
 mm/memory_hotplug.c                 |  3 +--
 mm/page_alloc.c                     |  6 ++---
 mm/slab.c                           |  8 +++
 20 files changed, 103 insertions(+), 58 deletions(-)

-- 
2.28.0
Re: [PATCH] powerpc/smp: Move rcu_cpu_starting() earlier
On Wed, 2020-10-28 at 17:31 -0700, Paul E. McKenney wrote:
> On Thu, Oct 29, 2020 at 11:09:07AM +1100, Michael Ellerman wrote:
> > Qian Cai writes:
> > > The call to rcu_cpu_starting() in start_secondary() is not early enough
> > > in the CPU-hotplug onlining process, which results in lockdep splats as
> > > follows:
> >
> > Since when?
> > What kernel version?
> >
> > I haven't seen this running CPU hotplug tests with PROVE_LOCKING=y on
> > v5.10-rc1. Am I missing a CONFIG?
>
> My guess would be that adding CONFIG_PROVE_RAW_LOCK_NESTING=y will
> get you some splats.

Well, I don't have that set, so it should be CONFIG_PROVE_RCU_LIST=y. Anyway,
this is .config to reproduce on Power9 NV:

https://cailca.coding.net/public/linux/mm/git/files/master/powerpc.config