[PATCH 2/2] drivers core: remove assert_held_device_hotplug()
The last caller of assert_held_device_hotplug() is gone, so remove it again. Cc: Dan Williams Cc: Michal Hocko Cc: "Rafael J. Wysocki" Cc: Vladimir Davydov Cc: Ben Hutchings Cc: Gerald Schaefer Cc: Martin Schwidefsky Cc: Sebastian Ott Signed-off-by: Heiko Carstens --- drivers/base/core.c| 5 - include/linux/device.h | 1 - 2 files changed, 6 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 684bda4d14a1..6bb60fb6a30b 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -639,11 +639,6 @@ int lock_device_hotplug_sysfs(void) return restart_syscall(); } -void assert_held_device_hotplug(void) -{ - lockdep_assert_held(_hotplug_lock); -} - #ifdef CONFIG_BLOCK static inline int device_is_not_partition(struct device *dev) { diff --git a/include/linux/device.h b/include/linux/device.h index 30c4570e928d..9ef518af5515 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1140,7 +1140,6 @@ static inline bool device_supports_offline(struct device *dev) extern void lock_device_hotplug(void); extern void unlock_device_hotplug(void); extern int lock_device_hotplug_sysfs(void); -void assert_held_device_hotplug(void); extern int device_offline(struct device *dev); extern int device_online(struct device *dev); extern void set_primary_fwnode(struct device *dev, struct fwnode_handle *fwnode); -- 2.8.4
[PATCH 0/2] mm: add private lock to serialize memory hotplug operations
These two patches are supposed to hopefully fix a memory hotplug problem reported by Sebastian Ott. Heiko Carstens (2): mm: add private lock to serialize memory hotplug operations drivers core: remove assert_held_device_hotplug() drivers/base/core.c| 5 - include/linux/device.h | 1 - kernel/memremap.c | 4 mm/memory_hotplug.c| 6 +- 4 files changed, 5 insertions(+), 11 deletions(-) -- 2.8.4
[PATCH 0/2] mm: add private lock to serialize memory hotplug operations
These two patches are supposed to hopefully fix a memory hotplug problem reported by Sebastian Ott. Heiko Carstens (2): mm: add private lock to serialize memory hotplug operations drivers core: remove assert_held_device_hotplug() drivers/base/core.c| 5 - include/linux/device.h | 1 - kernel/memremap.c | 4 mm/memory_hotplug.c| 6 +- 4 files changed, 5 insertions(+), 11 deletions(-) -- 2.8.4
Re: [PATCH 1/4] s390: get rid of superfluous __GFP_REPEAT
On Wed, Mar 08, 2017 at 03:11:10PM +0100, Michal Hocko wrote: > On Wed 08-03-17 09:23:40, Heiko Carstens wrote: > > On Tue, Mar 07, 2017 at 04:48:40PM +0100, Michal Hocko wrote: > > > From: Michal Hocko <mho...@suse.com> > > > > > > __GFP_REPEAT has a rather weak semantic but since it has been introduced > > > around 2.6.12 it has been ignored for low order allocations. > > > > > > page_table_alloc then uses the flag for a single page allocation. This > > > means that this flag has never been actually useful here because it has > > > always been used only for PAGE_ALLOC_COSTLY requests. > > > > > > An earlier attempt to remove the flag 10d58bf297e2 ("s390: get rid of > > > superfluous __GFP_REPEAT") has missed this one but the situation is very > > > same here. > > > > > > Cc: Heiko Carstens <heiko.carst...@de.ibm.com> > > > Signed-off-by: Michal Hocko <mho...@suse.com> > > > --- > > > arch/s390/mm/pgalloc.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > FWIW: > > Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com> > > Thanks > > > If you want, this can be routed via the s390 tree, whatever you prefer. > > Yes, that would be great. I suspect the rest will take longer to get > merged or land to a conclusion. Ok, applied. Thanks! :)
Re: [PATCH 1/4] s390: get rid of superfluous __GFP_REPEAT
On Wed, Mar 08, 2017 at 03:11:10PM +0100, Michal Hocko wrote: > On Wed 08-03-17 09:23:40, Heiko Carstens wrote: > > On Tue, Mar 07, 2017 at 04:48:40PM +0100, Michal Hocko wrote: > > > From: Michal Hocko > > > > > > __GFP_REPEAT has a rather weak semantic but since it has been introduced > > > around 2.6.12 it has been ignored for low order allocations. > > > > > > page_table_alloc then uses the flag for a single page allocation. This > > > means that this flag has never been actually useful here because it has > > > always been used only for PAGE_ALLOC_COSTLY requests. > > > > > > An earlier attempt to remove the flag 10d58bf297e2 ("s390: get rid of > > > superfluous __GFP_REPEAT") has missed this one but the situation is very > > > same here. > > > > > > Cc: Heiko Carstens > > > Signed-off-by: Michal Hocko > > > --- > > > arch/s390/mm/pgalloc.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > FWIW: > > Acked-by: Heiko Carstens > > Thanks > > > If you want, this can be routed via the s390 tree, whatever you prefer. > > Yes, that would be great. I suspect the rest will take longer to get > merged or land to a conclusion. Ok, applied. Thanks! :)
Re: [PATCH 1/4] s390: get rid of superfluous __GFP_REPEAT
On Tue, Mar 07, 2017 at 04:48:40PM +0100, Michal Hocko wrote: > From: Michal Hocko <mho...@suse.com> > > __GFP_REPEAT has a rather weak semantic but since it has been introduced > around 2.6.12 it has been ignored for low order allocations. > > page_table_alloc then uses the flag for a single page allocation. This > means that this flag has never been actually useful here because it has > always been used only for PAGE_ALLOC_COSTLY requests. > > An earlier attempt to remove the flag 10d58bf297e2 ("s390: get rid of > superfluous __GFP_REPEAT") has missed this one but the situation is very > same here. > > Cc: Heiko Carstens <heiko.carst...@de.ibm.com> > Signed-off-by: Michal Hocko <mho...@suse.com> > --- > arch/s390/mm/pgalloc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) FWIW: Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com> If you want, this can be routed via the s390 tree, whatever you prefer.
Re: [PATCH 1/4] s390: get rid of superfluous __GFP_REPEAT
On Tue, Mar 07, 2017 at 04:48:40PM +0100, Michal Hocko wrote: > From: Michal Hocko > > __GFP_REPEAT has a rather weak semantic but since it has been introduced > around 2.6.12 it has been ignored for low order allocations. > > page_table_alloc then uses the flag for a single page allocation. This > means that this flag has never been actually useful here because it has > always been used only for PAGE_ALLOC_COSTLY requests. > > An earlier attempt to remove the flag 10d58bf297e2 ("s390: get rid of > superfluous __GFP_REPEAT") has missed this one but the situation is very > same here. > > Cc: Heiko Carstens > Signed-off-by: Michal Hocko > --- > arch/s390/mm/pgalloc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) FWIW: Acked-by: Heiko Carstens If you want, this can be routed via the s390 tree, whatever you prefer.
Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
Hello Dan, > > If you look at commit 5e33bc4165f3 ("driver core / ACPI: Avoid device hot > > remove locking issues") then lock_device_hotplug_sysfs() was introduced to > > avoid a different subtle deadlock, but it also sleeps uninterruptible, but > > not for more than 5ms ;) > > > > However I'm not sure if the device hotplug lock should also be used to fix > > an unrelated bug that was introduced with the get_online_mems() / > > put_online_mems() interface. Should it? > > No, I don't think it should. > > I like your proposed direction of creating a new lock internal to > mem_hotplug_begin() to protect active_writer, and stop relying on > lock_device_hotplug to serve this purpose. > > > If so, we need to sprinkle around a couple of lock_device_hotplug() calls > > near mem_hotplug_begin() calls, like Sebastian already started, and give it > > additional semantics (protecting mem_hotplug.active_writer), and hope it > > doesn't lead to deadlocks anywhere. > > I'll put your proposed patch through some testing. On s390 it _seems_ to work. Did it pass your testing too? If so I would send a patch with proper patch description for inclusion. Thanks, Heiko
Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
Hello Dan, > > If you look at commit 5e33bc4165f3 ("driver core / ACPI: Avoid device hot > > remove locking issues") then lock_device_hotplug_sysfs() was introduced to > > avoid a different subtle deadlock, but it also sleeps uninterruptible, but > > not for more than 5ms ;) > > > > However I'm not sure if the device hotplug lock should also be used to fix > > an unrelated bug that was introduced with the get_online_mems() / > > put_online_mems() interface. Should it? > > No, I don't think it should. > > I like your proposed direction of creating a new lock internal to > mem_hotplug_begin() to protect active_writer, and stop relying on > lock_device_hotplug to serve this purpose. > > > If so, we need to sprinkle around a couple of lock_device_hotplug() calls > > near mem_hotplug_begin() calls, like Sebastian already started, and give it > > additional semantics (protecting mem_hotplug.active_writer), and hope it > > doesn't lead to deadlocks anywhere. > > I'll put your proposed patch through some testing. On s390 it _seems_ to work. Did it pass your testing too? If so I would send a patch with proper patch description for inclusion. Thanks, Heiko
Re: [PATCH 1/3] futex: remove duplicated code
On Fri, Mar 03, 2017 at 01:27:10PM +0100, Jiri Slaby wrote: > There is code duplicated over all architecture's headers for > futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr, > and comparison of the result. > > Remove this duplication and leave up to the arches only the needed > assembly which is now in arch_futex_atomic_op_inuser. > > Note that s390 removed access_ok check in d12a29703 ("s390/uaccess: > remove pointless access_ok() checks") as access_ok there returns true. > We introduce it back to the helper for the sake of simplicity (it gets > optimized away anyway). > > Signed-off-by: Jiri Slaby <jsl...@suse.cz> > --- > arch/s390/include/asm/futex.h | 23 - > include/asm-generic/futex.h | 50 > +++-- > kernel/futex.c | 36 ++ Looks good to me and still boots on s390. Therefore for the s390 bits: Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com> Thanks!
Re: [PATCH 1/3] futex: remove duplicated code
On Fri, Mar 03, 2017 at 01:27:10PM +0100, Jiri Slaby wrote: > There is code duplicated over all architecture's headers for > futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr, > and comparison of the result. > > Remove this duplication and leave up to the arches only the needed > assembly which is now in arch_futex_atomic_op_inuser. > > Note that s390 removed access_ok check in d12a29703 ("s390/uaccess: > remove pointless access_ok() checks") as access_ok there returns true. > We introduce it back to the helper for the sake of simplicity (it gets > optimized away anyway). > > Signed-off-by: Jiri Slaby > --- > arch/s390/include/asm/futex.h | 23 - > include/asm-generic/futex.h | 50 > +++-- > kernel/futex.c | 36 ++ Looks good to me and still boots on s390. Therefore for the s390 bits: Acked-by: Heiko Carstens Thanks!
[BUG 4.9/4.10] crash in __d_lookup() due to corrupted dentry_hashtable
Hello Al, Gustavo reported the crash below within __d_lookup() on s390. I'm wondering if you can make any sense of it: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: f000 TEID: f803 Fault in home space mode while using kernel ASCE. AS:00ec8007 R3:0003dc5f4007 S:0020 Oops: 0038 ilc:3 [#1] SMP Modules linked in: dm_mirror dm_region_hash dm_log raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_service_time paes_s390 pkey zcrypt rng_core xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter aes_s390 des_s390 des_generic sha512_s390 sha256_s390 vmur sha1_s390 sha_common vhost_net nfsd tun vhost macvtap auth_rpcgss macvlan nfs_acl lockd kvm sch_fq_codel grace sunrpc dm_multipath dm_mod ip_tables x_tables autofs4 CPU: 1 PID: 31708 Comm: fio Tainted: GW 4.10.0-20170228.0.1e014e7.d661408.fc24.s390xperformance #1 Hardware name: IBM 2964 N96 704 (z/VM) task: f7e55d00 task.stack: 00023124c000 Krnl PSW : 0704e0018000 0033539e (__d_lookup+0x6e/0x168) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 002f 03e0806248f8 82313080 00d2abe0 00023124fc00 00023124fbfc 003b08a8 0003dafac220 82313080 00023124fdc8 00026248fcce fffe 0094b420 00023124faf0 00023124faa0 Krnl Code: 00335392: b9040082lgr %r8,%r2 00335396: a798lhi %r9,0 #0033539a: a7f40008brc 15,3353aa >0033539e: e3c0c004lg %r12,0(%r12) 003353a4: ecc8001d007ccgij%r12,0,8,3353de 003353aa: 59b0c01cc %r11,28(%r12) 003353ae: a774fff8brc 7,33539e 003353b2: 4120c050la %r2,80(%r12) Call Trace: ([<82313080>] 0x82313080) [<00327026>] lookup_fast+0x19e/0x340 [<00327530>] walk_component+0x48/0x358 [<00327980>] link_path_walk+0x140/0x508 [<00328806>] path_openat+0xae/0x1320 [<0032af26>] do_filp_open+0x86/0x108 [<00315b5c>] do_sys_open+0x174/0x250 [<00922d5c>] system_call+0xc4/0x264 Last Breaking-Event-Address: [<003353ae>] __d_lookup+0x7e/0x168 Kernel panic - not syncing: Fatal exception: panic_on_oops Looking at the relevant part of __d_lookup: struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name) { unsigned int hash = name->hash; struct hlist_bl_head *b = d_hash(hash); <--- points to corrupted entry struct hlist_bl_node *node; struct dentry *found = NULL; struct dentry *dentry; rcu_read_lock(); hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) { if (dentry->d_name.hash != hash) continue; ... The contents of *b within the dump is: > struct hlist_bl_head 03e0806248f8 struct hlist_bl_head { first = 0x } Note that 0x03e0806248f8 is a valid address within the dentry_hashtable. In addition all other entries look ok, as far as I can tell. This is the only entry that contains a -1UL value. We also have a second dump with a similar crash with a 4.9 kernel. In that case there are in total three entries spread within the dentry_hashtable with a -1UL value, while all other entries seem to look ok. So there seems to be a pattern. Note: these kernels do contain addon patches that are not mainline, but I don't believe that any of those can explain these corruptions.
[BUG 4.9/4.10] crash in __d_lookup() due to corrupted dentry_hashtable
Hello Al, Gustavo reported the crash below within __d_lookup() on s390. I'm wondering if you can make any sense of it: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: f000 TEID: f803 Fault in home space mode while using kernel ASCE. AS:00ec8007 R3:0003dc5f4007 S:0020 Oops: 0038 ilc:3 [#1] SMP Modules linked in: dm_mirror dm_region_hash dm_log raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_service_time paes_s390 pkey zcrypt rng_core xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter aes_s390 des_s390 des_generic sha512_s390 sha256_s390 vmur sha1_s390 sha_common vhost_net nfsd tun vhost macvtap auth_rpcgss macvlan nfs_acl lockd kvm sch_fq_codel grace sunrpc dm_multipath dm_mod ip_tables x_tables autofs4 CPU: 1 PID: 31708 Comm: fio Tainted: GW 4.10.0-20170228.0.1e014e7.d661408.fc24.s390xperformance #1 Hardware name: IBM 2964 N96 704 (z/VM) task: f7e55d00 task.stack: 00023124c000 Krnl PSW : 0704e0018000 0033539e (__d_lookup+0x6e/0x168) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 002f 03e0806248f8 82313080 00d2abe0 00023124fc00 00023124fbfc 003b08a8 0003dafac220 82313080 00023124fdc8 00026248fcce fffe 0094b420 00023124faf0 00023124faa0 Krnl Code: 00335392: b9040082lgr %r8,%r2 00335396: a798lhi %r9,0 #0033539a: a7f40008brc 15,3353aa >0033539e: e3c0c004lg %r12,0(%r12) 003353a4: ecc8001d007ccgij%r12,0,8,3353de 003353aa: 59b0c01cc %r11,28(%r12) 003353ae: a774fff8brc 7,33539e 003353b2: 4120c050la %r2,80(%r12) Call Trace: ([<82313080>] 0x82313080) [<00327026>] lookup_fast+0x19e/0x340 [<00327530>] walk_component+0x48/0x358 [<00327980>] link_path_walk+0x140/0x508 [<00328806>] path_openat+0xae/0x1320 [<0032af26>] do_filp_open+0x86/0x108 [<00315b5c>] do_sys_open+0x174/0x250 [<00922d5c>] system_call+0xc4/0x264 Last Breaking-Event-Address: [<003353ae>] __d_lookup+0x7e/0x168 Kernel panic - not syncing: Fatal exception: panic_on_oops Looking at the relevant part of __d_lookup: struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name) { unsigned int hash = name->hash; struct hlist_bl_head *b = d_hash(hash); <--- points to corrupted entry struct hlist_bl_node *node; struct dentry *found = NULL; struct dentry *dentry; rcu_read_lock(); hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) { if (dentry->d_name.hash != hash) continue; ... The contents of *b within the dump is: > struct hlist_bl_head 03e0806248f8 struct hlist_bl_head { first = 0x } Note that 0x03e0806248f8 is a valid address within the dentry_hashtable. In addition all other entries look ok, as far as I can tell. This is the only entry that contains a -1UL value. We also have a second dump with a similar crash with a 4.9 kernel. In that case there are in total three entries spread within the dentry_hashtable with a -1UL value, while all other entries seem to look ok. So there seems to be a pattern. Note: these kernels do contain addon patches that are not mainline, but I don't believe that any of those can explain these corruptions.
Re: [PATCHv2 00/14] set_memory_* functions header refactor
On Wed, Mar 01, 2017 at 04:14:52PM -0800, Laura Abbott wrote: > Hi, > > This is v2 of my proposal to move set_memory_* function prototypes out of > cacheflush.h and into their own header file. This came out of a comment > Russell made while reviewing RODATA test cases > http://lists.infradead.org/pipermail/linux-arm-kernel/2017-January/480855.html > While the final result of that series was the rodata code was refactored into > its own header file, the set_memory_* APIs are still out of place. > > This version refactored the common set_memory_* functions into an asm-generic > header. s390x added some features so it can't just use the asm-generic header. > I debated how much more to try and shove into the asm-generic version (e.g. > stub prototypes for the ARM nommu case, set_kernel_text_*) but decided to > stick > with just the basics for this version. > > I split out the cacheflush.h -> set_memory.h conversions into separate patches > to hopefully make merging easier. Worst case, the final patch to completely > separate the two can be delayed if there are more problems found. I'd like > for this to eventually go through the -mm tree so I'd like Acks where > appropriate. > > As always, feedback appreciated. For the s390 bits: Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com>
Re: [PATCHv2 00/14] set_memory_* functions header refactor
On Wed, Mar 01, 2017 at 04:14:52PM -0800, Laura Abbott wrote: > Hi, > > This is v2 of my proposal to move set_memory_* function prototypes out of > cacheflush.h and into their own header file. This came out of a comment > Russell made while reviewing RODATA test cases > http://lists.infradead.org/pipermail/linux-arm-kernel/2017-January/480855.html > While the final result of that series was the rodata code was refactored into > its own header file, the set_memory_* APIs are still out of place. > > This version refactored the common set_memory_* functions into an asm-generic > header. s390x added some features so it can't just use the asm-generic header. > I debated how much more to try and shove into the asm-generic version (e.g. > stub prototypes for the ARM nommu case, set_kernel_text_*) but decided to > stick > with just the basics for this version. > > I split out the cacheflush.h -> set_memory.h conversions into separate patches > to hopefully make merging easier. Worst case, the final patch to completely > separate the two can be delayed if there are more problems found. I'd like > for this to eventually go through the -mm tree so I'd like Acks where > appropriate. > > As always, feedback appreciated. For the s390 bits: Acked-by: Heiko Carstens
Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
On Wed, Mar 01, 2017 at 07:52:18AM -0800, Dan Williams wrote: > On Wed, Mar 1, 2017 at 4:51 AM, Heiko Carstens > <heiko.carst...@de.ibm.com> wrote: > > Since it is anything but obvious why Dan wrote in changelog of b5d24fda9c3d > > ("mm, devm_memremap_pages: hold device_hotplug lock over > > mem_hotplug_{begin, done}") that write accesses to > > mem_hotplug.active_writer are coordinated via lock_device_hotplug() I'd > > rather propose a new private memory_add_remove_lock which has similar > > semantics like the cpu_add_remove_lock for cpu hotplug (see patch below). > > > > However instead of sprinkling locking/unlocking of that new lock around all > > calls of mem_hotplug_begin() and mem_hotplug_end() simply include locking > > and unlocking into these two functions. > > > > This still allows get_online_mems() and put_online_mems() to work, while at > > the same time preventing mem_hotplug.active_writer corruption. > > > > Any opinions? > > Sorry, yes, I didn't make it clear that I derived that locking > requirement from store_mem_state() and its usage of > lock_device_hotplug_sysfs(). > > That routine is trying very hard not trip the soft-lockup detector. It > seems like that wants to be an interruptible wait. If you look at commit 5e33bc4165f3 ("driver core / ACPI: Avoid device hot remove locking issues") then lock_device_hotplug_sysfs() was introduced to avoid a different subtle deadlock, but it also sleeps uninterruptible, but not for more than 5ms ;) However I'm not sure if the device hotplug lock should also be used to fix an unrelated bug that was introduced with the get_online_mems() / put_online_mems() interface. Should it? If so, we need to sprinkle around a couple of lock_device_hotplug() calls near mem_hotplug_begin() calls, like Sebastian already started, and give it additional semantics (protecting mem_hotplug.active_writer), and hope it doesn't lead to deadlocks anywhere.
Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
On Wed, Mar 01, 2017 at 07:52:18AM -0800, Dan Williams wrote: > On Wed, Mar 1, 2017 at 4:51 AM, Heiko Carstens > wrote: > > Since it is anything but obvious why Dan wrote in changelog of b5d24fda9c3d > > ("mm, devm_memremap_pages: hold device_hotplug lock over > > mem_hotplug_{begin, done}") that write accesses to > > mem_hotplug.active_writer are coordinated via lock_device_hotplug() I'd > > rather propose a new private memory_add_remove_lock which has similar > > semantics like the cpu_add_remove_lock for cpu hotplug (see patch below). > > > > However instead of sprinkling locking/unlocking of that new lock around all > > calls of mem_hotplug_begin() and mem_hotplug_end() simply include locking > > and unlocking into these two functions. > > > > This still allows get_online_mems() and put_online_mems() to work, while at > > the same time preventing mem_hotplug.active_writer corruption. > > > > Any opinions? > > Sorry, yes, I didn't make it clear that I derived that locking > requirement from store_mem_state() and its usage of > lock_device_hotplug_sysfs(). > > That routine is trying very hard not trip the soft-lockup detector. It > seems like that wants to be an interruptible wait. If you look at commit 5e33bc4165f3 ("driver core / ACPI: Avoid device hot remove locking issues") then lock_device_hotplug_sysfs() was introduced to avoid a different subtle deadlock, but it also sleeps uninterruptible, but not for more than 5ms ;) However I'm not sure if the device hotplug lock should also be used to fix an unrelated bug that was introduced with the get_online_mems() / put_online_mems() interface. Should it? If so, we need to sprinkle around a couple of lock_device_hotplug() calls near mem_hotplug_begin() calls, like Sebastian already started, and give it additional semantics (protecting mem_hotplug.active_writer), and hope it doesn't lead to deadlocks anywhere.
Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
On Tue, Feb 28, 2017 at 12:57:29PM +0100, Heiko Carstens wrote: > On Mon, Feb 27, 2017 at 05:20:31PM +0100, Michal Hocko wrote: > > [CC Rafael] > > > > I've got lost in the acpi indirection (again). I can see > > acpi_device_hotplug calling lock_device_hotplug() but i cannot find a > > path down to add_memory() which might call add_memory_resource. But the > > patch below sounds suspicious to me. Is it possible that this could lead > > to a deadlock. I would suspect that it is the s390 code which needs to > > do the locking. But I would have to double check - it is really easy to > > get lost there. > > To me it rather looks like bfc8c90139eb ("mem-hotplug: implement > get/put_online_mems") introduced quite subtle and probably wrong locking > rules. > > The patch introduced mem_hotplug_begin() in order to have something like > cpu_hotplug_begin() for memory. Note that for cpu hotplug all > cpu_hotplug_begin() calls are serialized by cpu_maps_update_begin(). > > Especially this makes sure that active_writer can only be changed by one > process. (See also Dan's commit which introduced the lock_device_hotplug() > calls: https://marc.info/?l=linux-kernel=148693912419972=2 ) > > If you look at the above commit bfc8c90139eb: there is nothing like > cpu_maps_update_begin() for memory. And therefore it's possible to have > concurrent writers to active_writer. > > It looks like now lock_device_hotplug() is supposed to be the new > cpu_maps_update_begin() for memory. But.. this looks like a mess, unless I > read the code completely wrong ;) [Full quote since I now hopefully use a non-bouncing email address from Vladimir] Since it is anything but obvious why Dan wrote in changelog of b5d24fda9c3d ("mm, devm_memremap_pages: hold device_hotplug lock over mem_hotplug_{begin, done}") that write accesses to mem_hotplug.active_writer are coordinated via lock_device_hotplug() I'd rather propose a new private memory_add_remove_lock which has similar semantics like the cpu_add_remove_lock for cpu hotplug (see patch below). However instead of sprinkling locking/unlocking of that new lock around all calls of mem_hotplug_begin() and mem_hotplug_end() simply include locking and unlocking into these two functions. This still allows get_online_mems() and put_online_mems() to work, while at the same time preventing mem_hotplug.active_writer corruption. Any opinions? --- kernel/memremap.c | 4 mm/memory_hotplug.c | 6 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/memremap.c b/kernel/memremap.c index 06123234f118..07e85e5229da 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -247,11 +247,9 @@ static void devm_memremap_pages_release(struct device *dev, void *data) align_start = res->start & ~(SECTION_SIZE - 1); align_size = ALIGN(resource_size(res), SECTION_SIZE); - lock_device_hotplug(); mem_hotplug_begin(); arch_remove_memory(align_start, align_size); mem_hotplug_done(); - unlock_device_hotplug(); untrack_pfn(NULL, PHYS_PFN(align_start), align_size); pgmap_radix_release(res); @@ -364,11 +362,9 @@ void *devm_memremap_pages(struct device *dev, struct resource *res, if (error) goto err_pfn_remap; - lock_device_hotplug(); mem_hotplug_begin(); error = arch_add_memory(nid, align_start, align_size, true); mem_hotplug_done(); - unlock_device_hotplug(); if (error) goto err_add_memory; diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 1d3ed58f92ab..6ee6e6a17310 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -124,9 +124,12 @@ void put_online_mems(void) } +/* Needed to serialize write accesses to mem_hotplug.active_writer. */ +static DEFINE_MUTEX(memory_add_remove_lock); + void mem_hotplug_begin(void) { - assert_held_device_hotplug(); + mutex_lock(_add_remove_lock); mem_hotplug.active_writer = current; @@ -146,6 +149,7 @@ void mem_hotplug_done(void) mem_hotplug.active_writer = NULL; mutex_unlock(_hotplug.lock); memhp_lock_release(); + mutex_unlock(_add_remove_lock); } /* add this memory to iomem resource */ -- 2.8.4 > > On Sun 26-02-17 12:42:44, Sebastian Ott wrote: > > > With 4.10.0-10265-gc4f3f22 the following warning is triggered on s390: > > > > > > WARNING: CPU: 6 PID: 1 at drivers/base/core.c:643 > > > assert_held_device_hotplug+0x4a/0x58 > > > [5.731214] Call Trace: > > > [5.731219] ([<0067b8b0>] assert_held_device_hotplug+0x40/0x58) > > > [5.731225] [<00337914>] mem_hotplug_begin+0x34/0xc8 > > > [5.731231] [<008b897e>] add_memory_resour
Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
On Tue, Feb 28, 2017 at 12:57:29PM +0100, Heiko Carstens wrote: > On Mon, Feb 27, 2017 at 05:20:31PM +0100, Michal Hocko wrote: > > [CC Rafael] > > > > I've got lost in the acpi indirection (again). I can see > > acpi_device_hotplug calling lock_device_hotplug() but i cannot find a > > path down to add_memory() which might call add_memory_resource. But the > > patch below sounds suspicious to me. Is it possible that this could lead > > to a deadlock. I would suspect that it is the s390 code which needs to > > do the locking. But I would have to double check - it is really easy to > > get lost there. > > To me it rather looks like bfc8c90139eb ("mem-hotplug: implement > get/put_online_mems") introduced quite subtle and probably wrong locking > rules. > > The patch introduced mem_hotplug_begin() in order to have something like > cpu_hotplug_begin() for memory. Note that for cpu hotplug all > cpu_hotplug_begin() calls are serialized by cpu_maps_update_begin(). > > Especially this makes sure that active_writer can only be changed by one > process. (See also Dan's commit which introduced the lock_device_hotplug() > calls: https://marc.info/?l=linux-kernel=148693912419972=2 ) > > If you look at the above commit bfc8c90139eb: there is nothing like > cpu_maps_update_begin() for memory. And therefore it's possible to have > concurrent writers to active_writer. > > It looks like now lock_device_hotplug() is supposed to be the new > cpu_maps_update_begin() for memory. But.. this looks like a mess, unless I > read the code completely wrong ;) [Full quote since I now hopefully use a non-bouncing email address from Vladimir] Since it is anything but obvious why Dan wrote in changelog of b5d24fda9c3d ("mm, devm_memremap_pages: hold device_hotplug lock over mem_hotplug_{begin, done}") that write accesses to mem_hotplug.active_writer are coordinated via lock_device_hotplug() I'd rather propose a new private memory_add_remove_lock which has similar semantics like the cpu_add_remove_lock for cpu hotplug (see patch below). However instead of sprinkling locking/unlocking of that new lock around all calls of mem_hotplug_begin() and mem_hotplug_end() simply include locking and unlocking into these two functions. This still allows get_online_mems() and put_online_mems() to work, while at the same time preventing mem_hotplug.active_writer corruption. Any opinions? --- kernel/memremap.c | 4 mm/memory_hotplug.c | 6 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/memremap.c b/kernel/memremap.c index 06123234f118..07e85e5229da 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -247,11 +247,9 @@ static void devm_memremap_pages_release(struct device *dev, void *data) align_start = res->start & ~(SECTION_SIZE - 1); align_size = ALIGN(resource_size(res), SECTION_SIZE); - lock_device_hotplug(); mem_hotplug_begin(); arch_remove_memory(align_start, align_size); mem_hotplug_done(); - unlock_device_hotplug(); untrack_pfn(NULL, PHYS_PFN(align_start), align_size); pgmap_radix_release(res); @@ -364,11 +362,9 @@ void *devm_memremap_pages(struct device *dev, struct resource *res, if (error) goto err_pfn_remap; - lock_device_hotplug(); mem_hotplug_begin(); error = arch_add_memory(nid, align_start, align_size, true); mem_hotplug_done(); - unlock_device_hotplug(); if (error) goto err_add_memory; diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 1d3ed58f92ab..6ee6e6a17310 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -124,9 +124,12 @@ void put_online_mems(void) } +/* Needed to serialize write accesses to mem_hotplug.active_writer. */ +static DEFINE_MUTEX(memory_add_remove_lock); + void mem_hotplug_begin(void) { - assert_held_device_hotplug(); + mutex_lock(_add_remove_lock); mem_hotplug.active_writer = current; @@ -146,6 +149,7 @@ void mem_hotplug_done(void) mem_hotplug.active_writer = NULL; mutex_unlock(_hotplug.lock); memhp_lock_release(); + mutex_unlock(_add_remove_lock); } /* add this memory to iomem resource */ -- 2.8.4 > > On Sun 26-02-17 12:42:44, Sebastian Ott wrote: > > > With 4.10.0-10265-gc4f3f22 the following warning is triggered on s390: > > > > > > WARNING: CPU: 6 PID: 1 at drivers/base/core.c:643 > > > assert_held_device_hotplug+0x4a/0x58 > > > [5.731214] Call Trace: > > > [5.731219] ([<0067b8b0>] assert_held_device_hotplug+0x40/0x58) > > > [5.731225] [<00337914>] mem_hotplug_begin+0x34/0xc8 > > > [5.731231] [<008b897e>] add_memory_resour
Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
On Mon, Feb 27, 2017 at 05:20:31PM +0100, Michal Hocko wrote: > [CC Rafael] > > I've got lost in the acpi indirection (again). I can see > acpi_device_hotplug calling lock_device_hotplug() but i cannot find a > path down to add_memory() which might call add_memory_resource. But the > patch below sounds suspicious to me. Is it possible that this could lead > to a deadlock. I would suspect that it is the s390 code which needs to > do the locking. But I would have to double check - it is really easy to > get lost there. To me it rather looks like bfc8c90139eb ("mem-hotplug: implement get/put_online_mems") introduced quite subtle and probably wrong locking rules. The patch introduced mem_hotplug_begin() in order to have something like cpu_hotplug_begin() for memory. Note that for cpu hotplug all cpu_hotplug_begin() calls are serialized by cpu_maps_update_begin(). Especially this makes sure that active_writer can only be changed by one process. (See also Dan's commit which introduced the lock_device_hotplug() calls: https://marc.info/?l=linux-kernel=148693912419972=2 ) If you look at the above commit bfc8c90139eb: there is nothing like cpu_maps_update_begin() for memory. And therefore it's possible to have concurrent writers to active_writer. It looks like now lock_device_hotplug() is supposed to be the new cpu_maps_update_begin() for memory. But.. this looks like a mess, unless I read the code completely wrong ;) > On Sun 26-02-17 12:42:44, Sebastian Ott wrote: > > With 4.10.0-10265-gc4f3f22 the following warning is triggered on s390: > > > > WARNING: CPU: 6 PID: 1 at drivers/base/core.c:643 > > assert_held_device_hotplug+0x4a/0x58 > > [5.731214] Call Trace: > > [5.731219] ([<0067b8b0>] assert_held_device_hotplug+0x40/0x58) > > [5.731225] [<00337914>] mem_hotplug_begin+0x34/0xc8 > > [5.731231] [<008b897e>] add_memory_resource+0x7e/0x1f8 > > [5.731236] [<008b8bd2>] add_memory+0xda/0x130 > > [5.731243] [<00d7f0dc>] add_memory_merged+0x15c/0x178 > > [5.731247] [<00d7f3a6>] sclp_detect_standby_memory+0x2ae/0x2f8 > > [5.731252] [<001002ba>] do_one_initcall+0xa2/0x150 > > [5.731258] [<00d3adc0>] kernel_init_freeable+0x228/0x2d8 > > [5.731263] [<008b6572>] kernel_init+0x2a/0x140 > > [5.731267] [<008c3972>] kernel_thread_starter+0x6/0xc > > [5.731272] [<008c396c>] kernel_thread_starter+0x0/0xc > > [5.731276] no locks held by swapper/0/1. > > [5.731280] Last Breaking-Event-Address: > > [5.731285] [<0067b8b6>] assert_held_device_hotplug+0x46/0x58 > > [5.731292] ---[ end trace 46480df21194c96a ]--- > > such an informtion belongs to the changelog > > > ->8 > > mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, > > done} > > > > With commit 3fc219241 ("mm: validate device_hotplug is held for memory > > hotplug") > > a lock assertion was added to mem_hotplug_begin() which led to a warning > > when add_memory() is called. Fix this by acquiring device_hotplug_lock in > > add_memory_resource(). > > > > Signed-off-by: Sebastian Ott> > --- > > mm/memory_hotplug.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > index 1d3ed58..c633bbc 100644 > > --- a/mm/memory_hotplug.c > > +++ b/mm/memory_hotplug.c > > @@ -1361,6 +1361,7 @@ int __ref add_memory_resource(int nid, struct > > resource *res, bool online) > > new_pgdat = !p; > > } > > > > + lock_device_hotplug(); > > mem_hotplug_begin(); > > > > /* > > @@ -1416,6 +1417,7 @@ int __ref add_memory_resource(int nid, struct > > resource *res, bool online) > > > > out: > > mem_hotplug_done(); > > + unlock_device_hotplug(); > > return ret; > > } > > EXPORT_SYMBOL_GPL(add_memory_resource); > > -- > > 2.3.0 > > > > -- > > To unsubscribe, send a message with 'unsubscribe linux-mm' in > > the body to majord...@kvack.org. For more info on Linux MM, > > see: http://www.linux-mm.org/ . > > Don't email: mailto:"d...@kvack.org;> em...@kvack.org > > -- > Michal Hocko > SUSE Labs >
Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}
On Mon, Feb 27, 2017 at 05:20:31PM +0100, Michal Hocko wrote: > [CC Rafael] > > I've got lost in the acpi indirection (again). I can see > acpi_device_hotplug calling lock_device_hotplug() but i cannot find a > path down to add_memory() which might call add_memory_resource. But the > patch below sounds suspicious to me. Is it possible that this could lead > to a deadlock. I would suspect that it is the s390 code which needs to > do the locking. But I would have to double check - it is really easy to > get lost there. To me it rather looks like bfc8c90139eb ("mem-hotplug: implement get/put_online_mems") introduced quite subtle and probably wrong locking rules. The patch introduced mem_hotplug_begin() in order to have something like cpu_hotplug_begin() for memory. Note that for cpu hotplug all cpu_hotplug_begin() calls are serialized by cpu_maps_update_begin(). Especially this makes sure that active_writer can only be changed by one process. (See also Dan's commit which introduced the lock_device_hotplug() calls: https://marc.info/?l=linux-kernel=148693912419972=2 ) If you look at the above commit bfc8c90139eb: there is nothing like cpu_maps_update_begin() for memory. And therefore it's possible to have concurrent writers to active_writer. It looks like now lock_device_hotplug() is supposed to be the new cpu_maps_update_begin() for memory. But.. this looks like a mess, unless I read the code completely wrong ;) > On Sun 26-02-17 12:42:44, Sebastian Ott wrote: > > With 4.10.0-10265-gc4f3f22 the following warning is triggered on s390: > > > > WARNING: CPU: 6 PID: 1 at drivers/base/core.c:643 > > assert_held_device_hotplug+0x4a/0x58 > > [5.731214] Call Trace: > > [5.731219] ([<0067b8b0>] assert_held_device_hotplug+0x40/0x58) > > [5.731225] [<00337914>] mem_hotplug_begin+0x34/0xc8 > > [5.731231] [<008b897e>] add_memory_resource+0x7e/0x1f8 > > [5.731236] [<008b8bd2>] add_memory+0xda/0x130 > > [5.731243] [<00d7f0dc>] add_memory_merged+0x15c/0x178 > > [5.731247] [<00d7f3a6>] sclp_detect_standby_memory+0x2ae/0x2f8 > > [5.731252] [<001002ba>] do_one_initcall+0xa2/0x150 > > [5.731258] [<00d3adc0>] kernel_init_freeable+0x228/0x2d8 > > [5.731263] [<008b6572>] kernel_init+0x2a/0x140 > > [5.731267] [<008c3972>] kernel_thread_starter+0x6/0xc > > [5.731272] [<008c396c>] kernel_thread_starter+0x0/0xc > > [5.731276] no locks held by swapper/0/1. > > [5.731280] Last Breaking-Event-Address: > > [5.731285] [<0067b8b6>] assert_held_device_hotplug+0x46/0x58 > > [5.731292] ---[ end trace 46480df21194c96a ]--- > > such an informtion belongs to the changelog > > > ->8 > > mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, > > done} > > > > With commit 3fc219241 ("mm: validate device_hotplug is held for memory > > hotplug") > > a lock assertion was added to mem_hotplug_begin() which led to a warning > > when add_memory() is called. Fix this by acquiring device_hotplug_lock in > > add_memory_resource(). > > > > Signed-off-by: Sebastian Ott > > --- > > mm/memory_hotplug.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > index 1d3ed58..c633bbc 100644 > > --- a/mm/memory_hotplug.c > > +++ b/mm/memory_hotplug.c > > @@ -1361,6 +1361,7 @@ int __ref add_memory_resource(int nid, struct > > resource *res, bool online) > > new_pgdat = !p; > > } > > > > + lock_device_hotplug(); > > mem_hotplug_begin(); > > > > /* > > @@ -1416,6 +1417,7 @@ int __ref add_memory_resource(int nid, struct > > resource *res, bool online) > > > > out: > > mem_hotplug_done(); > > + unlock_device_hotplug(); > > return ret; > > } > > EXPORT_SYMBOL_GPL(add_memory_resource); > > -- > > 2.3.0 > > > > -- > > To unsubscribe, send a message with 'unsubscribe linux-mm' in > > the body to majord...@kvack.org. For more info on Linux MM, > > see: http://www.linux-mm.org/ . > > Don't email: mailto:"d...@kvack.org;> em...@kvack.org > > -- > Michal Hocko > SUSE Labs >
Re: [RFC PATCH] mm, hotplug: get rid of auto_online_blocks
On Mon, Feb 27, 2017 at 04:43:04PM +0100, Michal Hocko wrote: > On Mon 27-02-17 12:25:10, Heiko Carstens wrote: > > On Mon, Feb 27, 2017 at 11:02:09AM +0100, Vitaly Kuznetsov wrote: > > > A couple of other thoughts: > > > 1) Having all newly added memory online ASAP is probably what people > > > want for all virtual machines. > > > > This is not true for s390. On s390 we have "standby" memory that a guest > > sees and potentially may use if it sets it online. Every guest that sets > > memory offline contributes to the hypervisor's standby memory pool, while > > onlining standby memory takes memory away from the standby pool. > > > > The use-case is that a system administrator in advance knows the maximum > > size a guest will ever have and also defines how much memory should be used > > at boot time. The difference is standby memory. > > > > Auto-onlining of standby memory is the last thing we want. > > > > > Unfortunately, we have additional complexity with memory zones > > > (ZONE_NORMAL, ZONE_MOVABLE) and in some cases manual intervention is > > > required. Especially, when further unplug is expected. > > > > This also is a reason why auto-onlining doesn't seem be the best way. > > Can you imagine any situation when somebody actually might want to have > this knob enabled? From what I understand it doesn't seem to be the > case. I can only speak for s390, and at least here I think auto-online is always wrong, especially if you consider the added complexity that you may want to online memory sometimes to ZONE_NORMAL and sometimes to ZONE_MOVABLE.
Re: [RFC PATCH] mm, hotplug: get rid of auto_online_blocks
On Mon, Feb 27, 2017 at 04:43:04PM +0100, Michal Hocko wrote: > On Mon 27-02-17 12:25:10, Heiko Carstens wrote: > > On Mon, Feb 27, 2017 at 11:02:09AM +0100, Vitaly Kuznetsov wrote: > > > A couple of other thoughts: > > > 1) Having all newly added memory online ASAP is probably what people > > > want for all virtual machines. > > > > This is not true for s390. On s390 we have "standby" memory that a guest > > sees and potentially may use if it sets it online. Every guest that sets > > memory offline contributes to the hypervisor's standby memory pool, while > > onlining standby memory takes memory away from the standby pool. > > > > The use-case is that a system administrator in advance knows the maximum > > size a guest will ever have and also defines how much memory should be used > > at boot time. The difference is standby memory. > > > > Auto-onlining of standby memory is the last thing we want. > > > > > Unfortunately, we have additional complexity with memory zones > > > (ZONE_NORMAL, ZONE_MOVABLE) and in some cases manual intervention is > > > required. Especially, when further unplug is expected. > > > > This also is a reason why auto-onlining doesn't seem be the best way. > > Can you imagine any situation when somebody actually might want to have > this knob enabled? From what I understand it doesn't seem to be the > case. I can only speak for s390, and at least here I think auto-online is always wrong, especially if you consider the added complexity that you may want to online memory sometimes to ZONE_NORMAL and sometimes to ZONE_MOVABLE.
Re: [RFC PATCH] mm, hotplug: get rid of auto_online_blocks
On Mon, Feb 27, 2017 at 11:02:09AM +0100, Vitaly Kuznetsov wrote: > A couple of other thoughts: > 1) Having all newly added memory online ASAP is probably what people > want for all virtual machines. This is not true for s390. On s390 we have "standby" memory that a guest sees and potentially may use if it sets it online. Every guest that sets memory offline contributes to the hypervisor's standby memory pool, while onlining standby memory takes memory away from the standby pool. The use-case is that a system administrator in advance knows the maximum size a guest will ever have and also defines how much memory should be used at boot time. The difference is standby memory. Auto-onlining of standby memory is the last thing we want. > Unfortunately, we have additional complexity with memory zones > (ZONE_NORMAL, ZONE_MOVABLE) and in some cases manual intervention is > required. Especially, when further unplug is expected. This also is a reason why auto-onlining doesn't seem be the best way.
Re: [RFC PATCH] mm, hotplug: get rid of auto_online_blocks
On Mon, Feb 27, 2017 at 11:02:09AM +0100, Vitaly Kuznetsov wrote: > A couple of other thoughts: > 1) Having all newly added memory online ASAP is probably what people > want for all virtual machines. This is not true for s390. On s390 we have "standby" memory that a guest sees and potentially may use if it sets it online. Every guest that sets memory offline contributes to the hypervisor's standby memory pool, while onlining standby memory takes memory away from the standby pool. The use-case is that a system administrator in advance knows the maximum size a guest will ever have and also defines how much memory should be used at boot time. The difference is standby memory. Auto-onlining of standby memory is the last thing we want. > Unfortunately, we have additional complexity with memory zones > (ZONE_NORMAL, ZONE_MOVABLE) and in some cases manual intervention is > required. Especially, when further unplug is expected. This also is a reason why auto-onlining doesn't seem be the best way.
Re: [PATCH 1/4] s390: convert debug_info.ref_count from atomic_t to refcount_t
On Mon, Feb 20, 2017 at 02:24:24PM +0100, Heiko Carstens wrote: > > @@ -361,7 +361,7 @@ debug_info_create(const char *name, int pages_per_area, > > int nr_areas, > > debug_area_last = rc; > > rc->next = NULL; > > > > - debug_info_get(rc); > > + refcount_set(>ref_count, 1); > > This is not wrong, but I will remove this hunk before applying your patch, > since this doesn't look like an obvious correct change at first glance. Actually your version is needed - just looked at refcount_inc(). Sorry for the confusion in my side now.
Re: [PATCH 1/4] s390: convert debug_info.ref_count from atomic_t to refcount_t
On Mon, Feb 20, 2017 at 02:24:24PM +0100, Heiko Carstens wrote: > > @@ -361,7 +361,7 @@ debug_info_create(const char *name, int pages_per_area, > > int nr_areas, > > debug_area_last = rc; > > rc->next = NULL; > > > > - debug_info_get(rc); > > + refcount_set(>ref_count, 1); > > This is not wrong, but I will remove this hunk before applying your patch, > since this doesn't look like an obvious correct change at first glance. Actually your version is needed - just looked at refcount_inc(). Sorry for the confusion in my side now.
Re: [PATCH 1/4] s390: convert debug_info.ref_count from atomic_t to refcount_t
On Mon, Feb 20, 2017 at 01:06:18PM +0200, Elena Reshetova wrote: > refcount_t type and corresponding API should be > used instead of atomic_t when the variable is used as > a reference counter. This allows to avoid accidental > refcounter overflows that might lead to use-after-free > situations. > > Signed-off-by: Elena Reshetova> Signed-off-by: Hans Liljestrand > Signed-off-by: Kees Cook > Signed-off-by: David Windsor > --- > arch/s390/include/asm/debug.h | 3 ++- > arch/s390/kernel/debug.c | 8 > 2 files changed, 6 insertions(+), 5 deletions(-) I can only see a pull request from Ingo a couple of hours ago for Peter's refcount code. So the refcount code is not merged yet. It would have been good if you would have waited until it is really merged to avoid confusion. > @@ -361,7 +361,7 @@ debug_info_create(const char *name, int pages_per_area, > int nr_areas, > debug_area_last = rc; > rc->next = NULL; > > - debug_info_get(rc); > + refcount_set(>ref_count, 1); This is not wrong, but I will remove this hunk before applying your patch, since this doesn't look like an obvious correct change at first glance. Thanks, Heiko
Re: [PATCH 1/4] s390: convert debug_info.ref_count from atomic_t to refcount_t
On Mon, Feb 20, 2017 at 01:06:18PM +0200, Elena Reshetova wrote: > refcount_t type and corresponding API should be > used instead of atomic_t when the variable is used as > a reference counter. This allows to avoid accidental > refcounter overflows that might lead to use-after-free > situations. > > Signed-off-by: Elena Reshetova > Signed-off-by: Hans Liljestrand > Signed-off-by: Kees Cook > Signed-off-by: David Windsor > --- > arch/s390/include/asm/debug.h | 3 ++- > arch/s390/kernel/debug.c | 8 > 2 files changed, 6 insertions(+), 5 deletions(-) I can only see a pull request from Ingo a couple of hours ago for Peter's refcount code. So the refcount code is not merged yet. It would have been good if you would have waited until it is really merged to avoid confusion. > @@ -361,7 +361,7 @@ debug_info_create(const char *name, int pages_per_area, > int nr_areas, > debug_area_last = rc; > rc->next = NULL; > > - debug_info_get(rc); > + refcount_set(>ref_count, 1); This is not wrong, but I will remove this hunk before applying your patch, since this doesn't look like an obvious correct change at first glance. Thanks, Heiko
Re: [PATCH 0/3] s390: audit and remove needless module.h includes
On Thu, Feb 09, 2017 at 03:20:22PM -0500, Paul Gortmaker wrote: > Paul Gortmaker (3): > s390: kernel: Audit and remove any unnecessary uses of module.h > s390: mm: Audit and remove any unnecessary uses of module.h > s390: Audit and remove any remaining unnecessary uses of module.h Applied, thanks!
Re: [PATCH 0/3] s390: audit and remove needless module.h includes
On Thu, Feb 09, 2017 at 03:20:22PM -0500, Paul Gortmaker wrote: > Paul Gortmaker (3): > s390: kernel: Audit and remove any unnecessary uses of module.h > s390: mm: Audit and remove any unnecessary uses of module.h > s390: Audit and remove any remaining unnecessary uses of module.h Applied, thanks!
Re: [RFC][PATCH] treewide: Move set_memory_* functions away from cacheflush.h
On Tue, Feb 07, 2017 at 01:10:16PM -0800, Laura Abbott wrote: > The set_memory_* APIs came out of a desire to have a better way to > change memory attributes. Many of these attributes were linked to cache > functionality so the prototypes were put in cacheflush.h. These days, > the APIs have grown and have a much wider use than just cache APIs. To > support this growth, split off set_memory_* and friends into a separate > header file to avoid growing cacheflush.h for APIs that have nothing to > do with caches. > > Signed-off-by: Laura Abbott <labb...@redhat.com> > --- > This came out of a comment Russell made while reviewing RODATA test cases > http://lists.infradead.org/pipermail/linux-arm-kernel/2017-January/480855.html > While the final result of that series was the rodata code was refactored into > its own header file, the set_memory_* APIs are still out of place. > > This is a simple attempt at moving all the API stubs to their own file. > Another idea I had was throwing set_memory_{x,nx,ro,rw} in an asm-generic > file since those are commonly used for module setting across all arches. > > This is an RFC to see if this is actually beneficial. The diffstat is not > negative unfortunately due to header guards in newer files. > I have patches to convert call sites over to set_memory.h instead of > cacheflush.h if there is sufficient interest. > --- > arch/arm/include/asm/cacheflush.h | 21 +--- > arch/arm/include/asm/set_memory.h | 32 > arch/arm64/include/asm/cacheflush.h | 6 +-- > arch/arm64/include/asm/set_memory.h | 26 ++ > arch/s390/include/asm/cacheflush.h | 6 +-- > arch/s390/include/asm/set_memory.h | 9 > arch/x86/include/asm/cacheflush.h | 96 +- > arch/x86/include/asm/set_memory.h | 100 > > 8 files changed, 171 insertions(+), 125 deletions(-) > create mode 100644 arch/arm/include/asm/set_memory.h > create mode 100644 arch/arm64/include/asm/set_memory.h > create mode 100644 arch/s390/include/asm/set_memory.h > create mode 100644 arch/x86/include/asm/set_memory.h For s390: Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com>
Re: [RFC][PATCH] treewide: Move set_memory_* functions away from cacheflush.h
On Tue, Feb 07, 2017 at 01:10:16PM -0800, Laura Abbott wrote: > The set_memory_* APIs came out of a desire to have a better way to > change memory attributes. Many of these attributes were linked to cache > functionality so the prototypes were put in cacheflush.h. These days, > the APIs have grown and have a much wider use than just cache APIs. To > support this growth, split off set_memory_* and friends into a separate > header file to avoid growing cacheflush.h for APIs that have nothing to > do with caches. > > Signed-off-by: Laura Abbott > --- > This came out of a comment Russell made while reviewing RODATA test cases > http://lists.infradead.org/pipermail/linux-arm-kernel/2017-January/480855.html > While the final result of that series was the rodata code was refactored into > its own header file, the set_memory_* APIs are still out of place. > > This is a simple attempt at moving all the API stubs to their own file. > Another idea I had was throwing set_memory_{x,nx,ro,rw} in an asm-generic > file since those are commonly used for module setting across all arches. > > This is an RFC to see if this is actually beneficial. The diffstat is not > negative unfortunately due to header guards in newer files. > I have patches to convert call sites over to set_memory.h instead of > cacheflush.h if there is sufficient interest. > --- > arch/arm/include/asm/cacheflush.h | 21 +--- > arch/arm/include/asm/set_memory.h | 32 > arch/arm64/include/asm/cacheflush.h | 6 +-- > arch/arm64/include/asm/set_memory.h | 26 ++ > arch/s390/include/asm/cacheflush.h | 6 +-- > arch/s390/include/asm/set_memory.h | 9 > arch/x86/include/asm/cacheflush.h | 96 +- > arch/x86/include/asm/set_memory.h | 100 > > 8 files changed, 171 insertions(+), 125 deletions(-) > create mode 100644 arch/arm/include/asm/set_memory.h > create mode 100644 arch/arm64/include/asm/set_memory.h > create mode 100644 arch/s390/include/asm/set_memory.h > create mode 100644 arch/x86/include/asm/set_memory.h For s390: Acked-by: Heiko Carstens
Re: [PATCHv3 1/2] arch: Move CONFIG_DEBUG_RODATA and CONFIG_SET_MODULE_RONX to be common
On Mon, Feb 06, 2017 at 04:31:57PM -0800, Laura Abbott wrote: > > There are multiple architectures that support CONFIG_DEBUG_RODATA and > CONFIG_SET_MODULE_RONX. These options also now have the ability to be > turned off at runtime. Move these to an architecture independent > location and make these options def_bool y for almost all of those > arches. > > Signed-off-by: Laura Abbott <labb...@redhat.com> > --- > v3: Make these configs selectable for arm. Include some documentation about > how the setup of the optional Kconfigs work as well. Stop spelling 'kenrel' > in prompt text. > --- > Documentation/security/self-protection.txt | 6 ++ > arch/Kconfig | 34 > ++ > arch/arm/Kconfig | 4 > arch/arm/Kconfig.debug | 11 -- > arch/arm/mm/Kconfig| 12 --- > arch/arm64/Kconfig | 5 ++--- > arch/arm64/Kconfig.debug | 11 -- > arch/parisc/Kconfig| 1 + > arch/parisc/Kconfig.debug | 11 -- > arch/s390/Kconfig | 5 ++--- > arch/s390/Kconfig.debug| 3 --- > arch/x86/Kconfig | 5 ++--- > arch/x86/Kconfig.debug | 11 -- > 13 files changed, 51 insertions(+), 68 deletions(-) For the s390 bits: Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com>
Re: [PATCHv3 1/2] arch: Move CONFIG_DEBUG_RODATA and CONFIG_SET_MODULE_RONX to be common
On Mon, Feb 06, 2017 at 04:31:57PM -0800, Laura Abbott wrote: > > There are multiple architectures that support CONFIG_DEBUG_RODATA and > CONFIG_SET_MODULE_RONX. These options also now have the ability to be > turned off at runtime. Move these to an architecture independent > location and make these options def_bool y for almost all of those > arches. > > Signed-off-by: Laura Abbott > --- > v3: Make these configs selectable for arm. Include some documentation about > how the setup of the optional Kconfigs work as well. Stop spelling 'kenrel' > in prompt text. > --- > Documentation/security/self-protection.txt | 6 ++ > arch/Kconfig | 34 > ++ > arch/arm/Kconfig | 4 > arch/arm/Kconfig.debug | 11 -- > arch/arm/mm/Kconfig| 12 --- > arch/arm64/Kconfig | 5 ++--- > arch/arm64/Kconfig.debug | 11 -- > arch/parisc/Kconfig| 1 + > arch/parisc/Kconfig.debug | 11 -- > arch/s390/Kconfig | 5 ++--- > arch/s390/Kconfig.debug| 3 --- > arch/x86/Kconfig | 5 ++--- > arch/x86/Kconfig.debug | 11 -- > 13 files changed, 51 insertions(+), 68 deletions(-) For the s390 bits: Acked-by: Heiko Carstens
[PATCH 1/3] memblock: let memblock_type_name know about physmem type
Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the memblock structure knows about a physical memory list. memblock_type_name() should return "physmem" instead of "unknown" if the name of the physmem memblock_type is being asked for. Signed-off-by: Heiko Carstens <heiko.carst...@de.ibm.com> --- mm/memblock.c | 4 1 file changed, 4 insertions(+) diff --git a/mm/memblock.c b/mm/memblock.c index 7608bc305936..acbfa1d2 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -72,6 +72,10 @@ memblock_type_name(struct memblock_type *type) return "memory"; else if (type == ) return "reserved"; +#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP + else if (type == ) + return "physmem"; +#endif else return "unknown"; } -- 2.8.4
[PATCH 1/3] memblock: let memblock_type_name know about physmem type
Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the memblock structure knows about a physical memory list. memblock_type_name() should return "physmem" instead of "unknown" if the name of the physmem memblock_type is being asked for. Signed-off-by: Heiko Carstens --- mm/memblock.c | 4 1 file changed, 4 insertions(+) diff --git a/mm/memblock.c b/mm/memblock.c index 7608bc305936..acbfa1d2 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -72,6 +72,10 @@ memblock_type_name(struct memblock_type *type) return "memory"; else if (type == ) return "reserved"; +#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP + else if (type == ) + return "physmem"; +#endif else return "unknown"; } -- 2.8.4
[PATCH 3/3] memblock: embed memblock type name within struct memblock_type
Provide the name of each memblock type with struct memblock_type. This allows to get rid of the function memblock_type_name() and duplicating the type names in __memblock_dump_all(). The only memblock_type usage out of mm/memblock.c seems to be arch/s390/kernel/crash_dump.c. While at it, give it a name. Signed-off-by: Heiko Carstens <heiko.carst...@de.ibm.com> --- arch/s390/kernel/crash_dump.c | 1 + include/linux/memblock.h | 1 + mm/memblock.c | 35 +++ 3 files changed, 13 insertions(+), 24 deletions(-) diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c index f9293bfefb7f..9c9440dc253a 100644 --- a/arch/s390/kernel/crash_dump.c +++ b/arch/s390/kernel/crash_dump.c @@ -31,6 +31,7 @@ static struct memblock_type oldmem_type = { .max = 1, .total_size = 0, .regions = _region, + .name = "oldmem", }; struct save_area { diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 5b759c9acf97..8dee5ec80adf 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -42,6 +42,7 @@ struct memblock_type { unsigned long max; /* size of the allocated array */ phys_addr_t total_size; /* size of all regions */ struct memblock_region *regions; + char *name; }; struct memblock { diff --git a/mm/memblock.c b/mm/memblock.c index fbaaf713827c..82d21e598e8c 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -35,15 +35,18 @@ struct memblock memblock __initdata_memblock = { .memory.regions = memblock_memory_init_regions, .memory.cnt = 1,/* empty dummy entry */ .memory.max = INIT_MEMBLOCK_REGIONS, + .memory.name= "memory", .reserved.regions = memblock_reserved_init_regions, .reserved.cnt = 1,/* empty dummy entry */ .reserved.max = INIT_MEMBLOCK_REGIONS, + .reserved.name = "reserved", #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP .physmem.regions= memblock_physmem_init_regions, .physmem.cnt= 1,/* empty dummy entry */ .physmem.max= INIT_PHYSMEM_REGIONS, + .physmem.name = "physmem", #endif .bottom_up = false, @@ -64,22 +67,6 @@ ulong __init_memblock choose_memblock_flags(void) return system_has_some_mirror ? MEMBLOCK_MIRROR : MEMBLOCK_NONE; } -/* inline so we don't get a warning when pr_debug is compiled out */ -static __init_memblock const char * -memblock_type_name(struct memblock_type *type) -{ - if (type == ) - return "memory"; - else if (type == ) - return "reserved"; -#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP - else if (type == ) - return "physmem"; -#endif - else - return "unknown"; -} - /* adjust *@size so that (@base + *@size) doesn't overflow, return new size */ static inline phys_addr_t memblock_cap_size(phys_addr_t base, phys_addr_t *size) { @@ -406,12 +393,12 @@ static int __init_memblock memblock_double_array(struct memblock_type *type, } if (!addr) { pr_err("memblock: Failed to double %s array from %ld to %ld entries !\n", - memblock_type_name(type), type->max, type->max * 2); + type->name, type->max, type->max * 2); return -1; } memblock_dbg("memblock: %s is doubled to %ld at [%#010llx-%#010llx]", - memblock_type_name(type), type->max * 2, (u64)addr, + type->name, type->max * 2, (u64)addr, (u64)addr + new_size - 1); /* @@ -1675,14 +1662,14 @@ phys_addr_t __init_memblock memblock_get_current_limit(void) return memblock.current_limit; } -static void __init_memblock memblock_dump(struct memblock_type *type, char *name) +static void __init_memblock memblock_dump(struct memblock_type *type) { unsigned long long base, size; unsigned long flags; int idx; struct memblock_region *rgn; - pr_info(" %s.cnt = 0x%lx\n", name, type->cnt); + pr_info(" %s.cnt = 0x%lx\n", type->name, type->cnt); for_each_memblock_type(type, rgn) { char nid_buf[32] = ""; @@ -1696,7 +1683,7 @@ static void __init_memblock memblock_dump(struct memblock_type *type, char *name memblock_get_region_node(rgn)); #endif pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s flags: %#lx\n", - name, idx, base, base + size - 1, size, nid_buf, flags); + type->name, idx, base, base + size - 1, size, nid_buf, flags);
[PATCH 2/3] memblock: also dump physmem list within __memblock_dump_all
Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the memblock structure knows about a physical memory list. The physical memory list should also be dumped if memblock_dump_all() is called in case memblock_debug is switched on. This makes debugging a bit easier. Signed-off-by: Heiko Carstens <heiko.carst...@de.ibm.com> --- mm/memblock.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/memblock.c b/mm/memblock.c index acbfa1d2..fbaaf713827c 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1709,6 +1709,9 @@ void __init_memblock __memblock_dump_all(void) memblock_dump(, "memory"); memblock_dump(, "reserved"); +#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP + memblock_dump(, "physmem"); +#endif } void __init memblock_allow_resize(void) -- 2.8.4
[PATCH 0/3] memblock: physical memory list cleanups
Just a couple of trivial memblock patches, which could also be merged into one patch; whatever is preferred. Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the memblock structure knows about a physical memory list. The memblock code should also print a sane name instead of "unknown" if it calls memblock_type_name() to get a name for the physmem memblock type. In addition the physmem list should also be dumped, if present, and memblock_dump_all is called to improve debugability. The last patch embeds the memblock type name into the memblock type structure in order to hopefully make the code a bit more easier and to get rid of a bit of code duplication. Thanks, Heiko Heiko Carstens (3): memblock: let memblock_type_name know about physmem type memblock: also dump physmem list within __memblock_dump_all memblock: embed memblock type name within struct memblock_type arch/s390/kernel/crash_dump.c | 1 + include/linux/memblock.h | 1 + mm/memblock.c | 32 +--- 3 files changed, 15 insertions(+), 19 deletions(-) -- 2.8.4
[PATCH 2/3] memblock: also dump physmem list within __memblock_dump_all
Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the memblock structure knows about a physical memory list. The physical memory list should also be dumped if memblock_dump_all() is called in case memblock_debug is switched on. This makes debugging a bit easier. Signed-off-by: Heiko Carstens --- mm/memblock.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/memblock.c b/mm/memblock.c index acbfa1d2..fbaaf713827c 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1709,6 +1709,9 @@ void __init_memblock __memblock_dump_all(void) memblock_dump(, "memory"); memblock_dump(, "reserved"); +#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP + memblock_dump(, "physmem"); +#endif } void __init memblock_allow_resize(void) -- 2.8.4
[PATCH 0/3] memblock: physical memory list cleanups
Just a couple of trivial memblock patches, which could also be merged into one patch; whatever is preferred. Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the memblock structure knows about a physical memory list. The memblock code should also print a sane name instead of "unknown" if it calls memblock_type_name() to get a name for the physmem memblock type. In addition the physmem list should also be dumped, if present, and memblock_dump_all is called to improve debugability. The last patch embeds the memblock type name into the memblock type structure in order to hopefully make the code a bit more easier and to get rid of a bit of code duplication. Thanks, Heiko Heiko Carstens (3): memblock: let memblock_type_name know about physmem type memblock: also dump physmem list within __memblock_dump_all memblock: embed memblock type name within struct memblock_type arch/s390/kernel/crash_dump.c | 1 + include/linux/memblock.h | 1 + mm/memblock.c | 32 +--- 3 files changed, 15 insertions(+), 19 deletions(-) -- 2.8.4
[PATCH 3/3] memblock: embed memblock type name within struct memblock_type
Provide the name of each memblock type with struct memblock_type. This allows to get rid of the function memblock_type_name() and duplicating the type names in __memblock_dump_all(). The only memblock_type usage out of mm/memblock.c seems to be arch/s390/kernel/crash_dump.c. While at it, give it a name. Signed-off-by: Heiko Carstens --- arch/s390/kernel/crash_dump.c | 1 + include/linux/memblock.h | 1 + mm/memblock.c | 35 +++ 3 files changed, 13 insertions(+), 24 deletions(-) diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c index f9293bfefb7f..9c9440dc253a 100644 --- a/arch/s390/kernel/crash_dump.c +++ b/arch/s390/kernel/crash_dump.c @@ -31,6 +31,7 @@ static struct memblock_type oldmem_type = { .max = 1, .total_size = 0, .regions = _region, + .name = "oldmem", }; struct save_area { diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 5b759c9acf97..8dee5ec80adf 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -42,6 +42,7 @@ struct memblock_type { unsigned long max; /* size of the allocated array */ phys_addr_t total_size; /* size of all regions */ struct memblock_region *regions; + char *name; }; struct memblock { diff --git a/mm/memblock.c b/mm/memblock.c index fbaaf713827c..82d21e598e8c 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -35,15 +35,18 @@ struct memblock memblock __initdata_memblock = { .memory.regions = memblock_memory_init_regions, .memory.cnt = 1,/* empty dummy entry */ .memory.max = INIT_MEMBLOCK_REGIONS, + .memory.name= "memory", .reserved.regions = memblock_reserved_init_regions, .reserved.cnt = 1,/* empty dummy entry */ .reserved.max = INIT_MEMBLOCK_REGIONS, + .reserved.name = "reserved", #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP .physmem.regions= memblock_physmem_init_regions, .physmem.cnt= 1,/* empty dummy entry */ .physmem.max= INIT_PHYSMEM_REGIONS, + .physmem.name = "physmem", #endif .bottom_up = false, @@ -64,22 +67,6 @@ ulong __init_memblock choose_memblock_flags(void) return system_has_some_mirror ? MEMBLOCK_MIRROR : MEMBLOCK_NONE; } -/* inline so we don't get a warning when pr_debug is compiled out */ -static __init_memblock const char * -memblock_type_name(struct memblock_type *type) -{ - if (type == ) - return "memory"; - else if (type == ) - return "reserved"; -#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP - else if (type == ) - return "physmem"; -#endif - else - return "unknown"; -} - /* adjust *@size so that (@base + *@size) doesn't overflow, return new size */ static inline phys_addr_t memblock_cap_size(phys_addr_t base, phys_addr_t *size) { @@ -406,12 +393,12 @@ static int __init_memblock memblock_double_array(struct memblock_type *type, } if (!addr) { pr_err("memblock: Failed to double %s array from %ld to %ld entries !\n", - memblock_type_name(type), type->max, type->max * 2); + type->name, type->max, type->max * 2); return -1; } memblock_dbg("memblock: %s is doubled to %ld at [%#010llx-%#010llx]", - memblock_type_name(type), type->max * 2, (u64)addr, + type->name, type->max * 2, (u64)addr, (u64)addr + new_size - 1); /* @@ -1675,14 +1662,14 @@ phys_addr_t __init_memblock memblock_get_current_limit(void) return memblock.current_limit; } -static void __init_memblock memblock_dump(struct memblock_type *type, char *name) +static void __init_memblock memblock_dump(struct memblock_type *type) { unsigned long long base, size; unsigned long flags; int idx; struct memblock_region *rgn; - pr_info(" %s.cnt = 0x%lx\n", name, type->cnt); + pr_info(" %s.cnt = 0x%lx\n", type->name, type->cnt); for_each_memblock_type(type, rgn) { char nid_buf[32] = ""; @@ -1696,7 +1683,7 @@ static void __init_memblock memblock_dump(struct memblock_type *type, char *name memblock_get_region_node(rgn)); #endif pr_info(" %s[%#x]\t[%#016llx-%#016llx], %#llx bytes%s flags: %#lx\n", - name, idx, base, base + size - 1, size, nid_buf, flags); + type->name, idx, base, base + size - 1, size, nid_buf, flags); } } @@ -1707,10 +1694,10 @@
Re: [PATCH 1/2] security: Change name of CONFIG_DEBUG_RODATA
On Thu, Jan 19, 2017 at 10:56:46AM +, Mark Rutland wrote: > > +config HARDENED_PAGE_MAPPINGS > > + bool "Mark kernel mappings with stricter permissions (RO/W^X)" > > + default y > > + depends on ARCH_HAS_HARDENED_MAPPINGS > > + help > > + If this is set, kernel text and rodata memory will be made > > read-only, > > + and non-text memory will be made non-executable. This provides > > + protection against certain security attacks (e.g. executing the heap > > + or modifying text). > > + > > + Unless your system has known restrictions or performance issues, it > > + is recommended to say Y here. > > It's somewhat unfortunate that this means the feature is no longer > mandatory on arm64 (and s390+x86). We have a boot-time switch to turn > the protections off, so I was hoping we could make this mandatory on all > architectures with support. > > It would be good to see if we could make this mandatory for arm and > parisc, or if it really needs to be optional for either of those. Looks like the config option is a no-op on parisc just like it is on s390. Irrelavant of the config option at least on s390 the page tables for kernel text and rodata will be read-only anyway. This works since ages and I don't see a reason why this should be changed. Also trying to disable this with the "rodata=" command line option does not work at least on s390, and I guess this is true for parisc as well. The only thing implemented with CONFIG_DEBUG_RODATA on both architectures is to emit a message that states memory has been protected (mark_rodata_ro). This just avoids a wrong "Kernel memory protection disabled." message. So yes, I'd really like to keep this option mandatory.
Re: [PATCH 1/2] security: Change name of CONFIG_DEBUG_RODATA
On Thu, Jan 19, 2017 at 10:56:46AM +, Mark Rutland wrote: > > +config HARDENED_PAGE_MAPPINGS > > + bool "Mark kernel mappings with stricter permissions (RO/W^X)" > > + default y > > + depends on ARCH_HAS_HARDENED_MAPPINGS > > + help > > + If this is set, kernel text and rodata memory will be made > > read-only, > > + and non-text memory will be made non-executable. This provides > > + protection against certain security attacks (e.g. executing the heap > > + or modifying text). > > + > > + Unless your system has known restrictions or performance issues, it > > + is recommended to say Y here. > > It's somewhat unfortunate that this means the feature is no longer > mandatory on arm64 (and s390+x86). We have a boot-time switch to turn > the protections off, so I was hoping we could make this mandatory on all > architectures with support. > > It would be good to see if we could make this mandatory for arm and > parisc, or if it really needs to be optional for either of those. Looks like the config option is a no-op on parisc just like it is on s390. Irrelavant of the config option at least on s390 the page tables for kernel text and rodata will be read-only anyway. This works since ages and I don't see a reason why this should be changed. Also trying to disable this with the "rodata=" command line option does not work at least on s390, and I guess this is true for parisc as well. The only thing implemented with CONFIG_DEBUG_RODATA on both architectures is to emit a message that states memory has been protected (mark_rodata_ro). This just avoids a wrong "Kernel memory protection disabled." message. So yes, I'd really like to keep this option mandatory.
Re: [PATCH 2/2] security: Change name of CONFIG_DEBUG_SET_MODULE_RONX
On Thu, Jan 19, 2017 at 11:11:18AM +, Mark Rutland wrote: > > +config HARDENED_MODULE_MAPPINGS > > + bool "Mark module mappings with stricter permissions (RO/W^X)" > > + default y > > + depends on ARCH_HAS_HARDENED_MODULE_MAPPINGS > > + help > > + If this is set, module text and rodata memory will be made read-only, > > + and non-text memory will be made non-executable. This provides > > + protection against certain security vulnerabilities (e.g. modifying > > + code) > > + > > + Unless your system has known restrictions or performance issues, it > > + is recommended to say Y here. > > + > > I was hoping that we'd make this mandatory, as we'd already done for > DEBUG_RODATA. Same for s390: would be good to make this mandatory.
Re: [PATCH 2/2] security: Change name of CONFIG_DEBUG_SET_MODULE_RONX
On Thu, Jan 19, 2017 at 11:11:18AM +, Mark Rutland wrote: > > +config HARDENED_MODULE_MAPPINGS > > + bool "Mark module mappings with stricter permissions (RO/W^X)" > > + default y > > + depends on ARCH_HAS_HARDENED_MODULE_MAPPINGS > > + help > > + If this is set, module text and rodata memory will be made read-only, > > + and non-text memory will be made non-executable. This provides > > + protection against certain security vulnerabilities (e.g. modifying > > + code) > > + > > + Unless your system has known restrictions or performance issues, it > > + is recommended to say Y here. > > + > > I was hoping that we'd make this mandatory, as we'd already done for > DEBUG_RODATA. Same for s390: would be good to make this mandatory.
Re: [RFC 1/4] mm: remove unused TASK_SIZE_OF()
On Fri, Dec 30, 2016 at 06:56:31PM +0300, Dmitry Safonov wrote: > All users of TASK_SIZE_OF(tsk) have migrated to mm->task_size or > TASK_SIZE_MAX since: > commit d696ca016d57 ("x86/fsgsbase/64: Use TASK_SIZE_MAX for > FSBASE/GSBASE upper limits"), > commit a06db751c321 ("pagemap: check permissions and capabilities at > open time"), > > Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> > --- ... > diff --git a/arch/s390/include/asm/processor.h > b/arch/s390/include/asm/processor.h > index 6bca916a5ba0..c53e8e2a51ac 100644 > --- a/arch/s390/include/asm/processor.h > +++ b/arch/s390/include/asm/processor.h > @@ -89,10 +89,9 @@ extern void execve_tail(void); > * User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit. > */ > > -#define TASK_SIZE_OF(tsk)((tsk)->mm->context.asce_limit) > #define TASK_UNMAPPED_BASE (test_thread_flag(TIF_31BIT) ? \ > (1UL << 30) : (1UL << 41)) > -#define TASK_SIZETASK_SIZE_OF(current) > +#define TASK_SIZE(current->mm->context.asce_limit) > #define TASK_MAX_SIZE(1UL << 53) > > #define STACK_TOP(1UL << (test_thread_flag(TIF_31BIT) ? 31:42)) FWIW, for the s390 part: Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com>
Re: [RFC 1/4] mm: remove unused TASK_SIZE_OF()
On Fri, Dec 30, 2016 at 06:56:31PM +0300, Dmitry Safonov wrote: > All users of TASK_SIZE_OF(tsk) have migrated to mm->task_size or > TASK_SIZE_MAX since: > commit d696ca016d57 ("x86/fsgsbase/64: Use TASK_SIZE_MAX for > FSBASE/GSBASE upper limits"), > commit a06db751c321 ("pagemap: check permissions and capabilities at > open time"), > > Signed-off-by: Dmitry Safonov > --- ... > diff --git a/arch/s390/include/asm/processor.h > b/arch/s390/include/asm/processor.h > index 6bca916a5ba0..c53e8e2a51ac 100644 > --- a/arch/s390/include/asm/processor.h > +++ b/arch/s390/include/asm/processor.h > @@ -89,10 +89,9 @@ extern void execve_tail(void); > * User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit. > */ > > -#define TASK_SIZE_OF(tsk)((tsk)->mm->context.asce_limit) > #define TASK_UNMAPPED_BASE (test_thread_flag(TIF_31BIT) ? \ > (1UL << 30) : (1UL << 41)) > -#define TASK_SIZETASK_SIZE_OF(current) > +#define TASK_SIZE(current->mm->context.asce_limit) > #define TASK_MAX_SIZE(1UL << 53) > > #define STACK_TOP(1UL << (test_thread_flag(TIF_31BIT) ? 31:42)) FWIW, for the s390 part: Acked-by: Heiko Carstens
Re: [RFC] minimum gcc version for kernel: raise to gcc-4.3 or 4.6?
On Fri, Dec 16, 2016 at 11:00:27PM +0100, Arnd Bergmann wrote: > On Friday, December 16, 2016 6:00:43 PM CET Sebastian Andrzej Siewior wrote: > > On 2016-12-16 11:56:21 [+0100], Arnd Bergmann wrote: > > > The original gcc-4.3 release was in early 2008. If we decide to still > > > support that, we probably want the first 10 quirks in this series, > > > while gcc-4.6 (released in 2011) requires none of them. > > > > It this min gcc thingy ARM only? > > This is part of the question that I'm trying to figure out myself. > > Clearly having the same minimum version across all architectures simplifies > things a lot, because many of the bugs in old versions are architecture > independent. Then again, some architectures implicitly require a new version > because an old one never existed (e.g. arm64 or risc-v), while some other > architectures may require an old version. FWIW, s390 requires gcc 4.3 or newer since two years already. For older compilers we enforce a compile error (see arch/s390/kernel/asm-offsets.c).
Re: [RFC] minimum gcc version for kernel: raise to gcc-4.3 or 4.6?
On Fri, Dec 16, 2016 at 11:00:27PM +0100, Arnd Bergmann wrote: > On Friday, December 16, 2016 6:00:43 PM CET Sebastian Andrzej Siewior wrote: > > On 2016-12-16 11:56:21 [+0100], Arnd Bergmann wrote: > > > The original gcc-4.3 release was in early 2008. If we decide to still > > > support that, we probably want the first 10 quirks in this series, > > > while gcc-4.6 (released in 2011) requires none of them. > > > > It this min gcc thingy ARM only? > > This is part of the question that I'm trying to figure out myself. > > Clearly having the same minimum version across all architectures simplifies > things a lot, because many of the bugs in old versions are architecture > independent. Then again, some architectures implicitly require a new version > because an old one never existed (e.g. arm64 or risc-v), while some other > architectures may require an old version. FWIW, s390 requires gcc 4.3 or newer since two years already. For older compilers we enforce a compile error (see arch/s390/kernel/asm-offsets.c).
Re: [PATCH 1/2] s390/zcore: Remove unneeded linux/miscdevice.h include
On Thu, Dec 15, 2016 at 03:18:23PM +0100, Corentin Labbe wrote: > drivers/s390/char/zcore.c does not contain any miscdevice so the > inclusion of linux/miscdevice.h is uncessary. > > Signed-off-by: Corentin Labbe> --- > drivers/s390/char/zcore.c | 1 - > 1 file changed, 1 deletion(-) Applied, thanks.
Re: [PATCH 1/2] s390/zcore: Remove unneeded linux/miscdevice.h include
On Thu, Dec 15, 2016 at 03:18:23PM +0100, Corentin Labbe wrote: > drivers/s390/char/zcore.c does not contain any miscdevice so the > inclusion of linux/miscdevice.h is uncessary. > > Signed-off-by: Corentin Labbe > --- > drivers/s390/char/zcore.c | 1 - > 1 file changed, 1 deletion(-) Applied, thanks.
Re: linux-next: Tree for Dec 14
On Wed, Dec 14, 2016 at 11:25:49AM -0500, Paul Gortmaker wrote: > On Tue, Dec 13, 2016 at 11:22 PM, Stephen Rothwell <s...@canb.auug.org.au> > wrote: > > Hi all, > > > > Please do not add any material for v4.11 to your linux-next included > > branches until after v4.10-rc1 has been released. > > > > Changes since 20161213: > > > > The vfs-miklos tree gained a conflict against the ubifs tree. > > > > The akpm-current tree gained a conflict against the jc_docs tree. > > > > Non-merge commits (relative to Linus' tree): 4498 > > 4320 files changed, 189429 insertions(+), 94239 deletions(-) > > New s390 build failure today: > > http://kisskb.ellerman.id.au/kisskb/buildresult/12883922/ > > I reproduced it locally using the older korg s390 toolchain. > > Reverting this and the file compiles fine: > > commit d543a106f96d6f15e4507cf349128912d44356d9 > Author: Heiko Carstens <heiko.carst...@de.ibm.com> > Date: Tue Dec 6 15:52:10 2016 +0100 > > s390: fix initrd corruptions with gcov/kcov instrumented kernels > > Appears old toolchains don't like this commit. > > Paul. Yes, a fix is ready to be merged soon: https://git.kernel.org/cgit/linux/kernel/git/s390/linux.git/commit/?h=features=75a357341e7c9d3893405ea6f9d722036012dd1f Thanks, Heiko
Re: linux-next: Tree for Dec 14
On Wed, Dec 14, 2016 at 11:25:49AM -0500, Paul Gortmaker wrote: > On Tue, Dec 13, 2016 at 11:22 PM, Stephen Rothwell > wrote: > > Hi all, > > > > Please do not add any material for v4.11 to your linux-next included > > branches until after v4.10-rc1 has been released. > > > > Changes since 20161213: > > > > The vfs-miklos tree gained a conflict against the ubifs tree. > > > > The akpm-current tree gained a conflict against the jc_docs tree. > > > > Non-merge commits (relative to Linus' tree): 4498 > > 4320 files changed, 189429 insertions(+), 94239 deletions(-) > > New s390 build failure today: > > http://kisskb.ellerman.id.au/kisskb/buildresult/12883922/ > > I reproduced it locally using the older korg s390 toolchain. > > Reverting this and the file compiles fine: > > commit d543a106f96d6f15e4507cf349128912d44356d9 > Author: Heiko Carstens > Date: Tue Dec 6 15:52:10 2016 +0100 > > s390: fix initrd corruptions with gcov/kcov instrumented kernels > > Appears old toolchains don't like this commit. > > Paul. Yes, a fix is ready to be merged soon: https://git.kernel.org/cgit/linux/kernel/git/s390/linux.git/commit/?h=features=75a357341e7c9d3893405ea6f9d722036012dd1f Thanks, Heiko
Re: linux-next: manual merge of the kvms390 tree with the s390 tree
On Wed, Nov 23, 2016 at 09:24:17AM +0100, Christian Borntraeger wrote: > On 11/23/2016 04:45 AM, Stephen Rothwell wrote: > > Hi all, > > > > Today's linux-next merge of the kvms390 tree got a conflict in: > > > > arch/s390/include/asm/facilities_src.h > > > > between commit: > > > > d1f7e8f85b51 ("s390: squash facilities_src.h into gen_facilities.c") > > > > from the s390 tree and commits: > > > > 09ff894457e9 ("KVM: s390: gaccess: add ESOP2 handling") > > 88abf0b54f51 ("KVM: s390: instruction-execution-protection support") > > > > from the kvms390 tree. > > > > I fixed it up (I deleted the file and then added the following merge fix > > patch) and can carry the fix as necessary. This is now fixed as far as > > linux-next is concerned, but any non trivial conflicts should be mentioned > > to your upstream maintainer when your tree is submitted for merging. > > You may also want to consider cooperating with the maintainer of the > > conflicting tree to minimise any particularly complex conflicts. > > Thanks, looks fine. > > Hmmm, Martin, Heiko, KVM is going to touch that file from time to time. > Can we either have > a: a topic branch for this tool (like I did for the 4.9 merge window) > with just the patches for this tool. This topic branch can then be merged > into kvm/next and s390/features. That's up to Martin. > b: split out the kvm defines into arch/s390/tools/kvm_facilities.h > b would like something like this (cut/paste so whitespace damaged) > > From: Christian Borntraeger> Date: Wed, 23 Nov 2016 09:18:42 +0100 > Subject: [PATCH 1/1] s390:gen_facilites: Move kvm facilities into a separate > file > > Signed-off-by: Christian Borntraeger > --- > MAINTAINERS | 1 + > arch/s390/tools/gen_facilities.c | 25 + > arch/s390/tools/kvm_facilities.c | 26 ++ > 3 files changed, 28 insertions(+), 24 deletions(-) > create mode 100644 arch/s390/tools/kvm_facilities.c No, please don't split this file again. Only kvm will ever touch the kvm facilities, so we won't any conflicts here. This time it's only because the file moved, otherwise there wouldn't be any problem at all.
Re: linux-next: manual merge of the kvms390 tree with the s390 tree
On Wed, Nov 23, 2016 at 09:24:17AM +0100, Christian Borntraeger wrote: > On 11/23/2016 04:45 AM, Stephen Rothwell wrote: > > Hi all, > > > > Today's linux-next merge of the kvms390 tree got a conflict in: > > > > arch/s390/include/asm/facilities_src.h > > > > between commit: > > > > d1f7e8f85b51 ("s390: squash facilities_src.h into gen_facilities.c") > > > > from the s390 tree and commits: > > > > 09ff894457e9 ("KVM: s390: gaccess: add ESOP2 handling") > > 88abf0b54f51 ("KVM: s390: instruction-execution-protection support") > > > > from the kvms390 tree. > > > > I fixed it up (I deleted the file and then added the following merge fix > > patch) and can carry the fix as necessary. This is now fixed as far as > > linux-next is concerned, but any non trivial conflicts should be mentioned > > to your upstream maintainer when your tree is submitted for merging. > > You may also want to consider cooperating with the maintainer of the > > conflicting tree to minimise any particularly complex conflicts. > > Thanks, looks fine. > > Hmmm, Martin, Heiko, KVM is going to touch that file from time to time. > Can we either have > a: a topic branch for this tool (like I did for the 4.9 merge window) > with just the patches for this tool. This topic branch can then be merged > into kvm/next and s390/features. That's up to Martin. > b: split out the kvm defines into arch/s390/tools/kvm_facilities.h > b would like something like this (cut/paste so whitespace damaged) > > From: Christian Borntraeger > Date: Wed, 23 Nov 2016 09:18:42 +0100 > Subject: [PATCH 1/1] s390:gen_facilites: Move kvm facilities into a separate > file > > Signed-off-by: Christian Borntraeger > --- > MAINTAINERS | 1 + > arch/s390/tools/gen_facilities.c | 25 + > arch/s390/tools/kvm_facilities.c | 26 ++ > 3 files changed, 28 insertions(+), 24 deletions(-) > create mode 100644 arch/s390/tools/kvm_facilities.c No, please don't split this file again. Only kvm will ever touch the kvm facilities, so we won't any conflicts here. This time it's only because the file moved, otherwise there wouldn't be any problem at all.
Re: [PATCH v2] tile: handle __ro_after_init like parisc does
On Mon, Nov 14, 2016 at 01:12:05PM -0800, Kees Cook wrote: > At some point here, I want to collect all the arch maintainers and > discuss the options for correctly reflecting the three data > memory-protection needs we have: > > - always read-only > - read-only after init > - read-only except during rare updates > > (The latter one doesn't exist all yet...) > > x86, arm, and arm64 use mark_rodata_ro() after init finishes, so they > don't technically implement "always read-only". parisc, tile, powerpc, > others have "always read-only", but disable read-only-after-init since > they don't use mark_rodata_ro(). I think s390 has recently implemented > both, but I have to double-check... Yes, s390 has both: an early always read-only support, which is effective as soon as paging_init() has set up and enabled page tables. Our mark_rodata_ro() implementation only makes the ro_after_init section read-only.
Re: [PATCH v2] tile: handle __ro_after_init like parisc does
On Mon, Nov 14, 2016 at 01:12:05PM -0800, Kees Cook wrote: > At some point here, I want to collect all the arch maintainers and > discuss the options for correctly reflecting the three data > memory-protection needs we have: > > - always read-only > - read-only after init > - read-only except during rare updates > > (The latter one doesn't exist all yet...) > > x86, arm, and arm64 use mark_rodata_ro() after init finishes, so they > don't technically implement "always read-only". parisc, tile, powerpc, > others have "always read-only", but disable read-only-after-init since > they don't use mark_rodata_ro(). I think s390 has recently implemented > both, but I have to double-check... Yes, s390 has both: an early always read-only support, which is effective as soon as paging_init() has set up and enabled page tables. Our mark_rodata_ro() implementation only makes the ro_after_init section read-only.
[PATCH] mm/pkeys: generate pkey system call code only if ARCH_HAS_PKEYS is selected
Having code for the pkey_mprotect, pkey_alloc and pkey_free system calls makes only sense if ARCH_HAS_PKEYS is selected. If not selected these system calls will always return -ENOSPC or -EINVAL. To simplify things and have less code generate the pkey system call code only if ARCH_HAS_PKEYS is selected. For architectures which have already wired up the system calls, but do not select ARCH_HAS_PKEYS this will result in less generated code and a different return code: the three system calls will now always return -ENOSYS, using the cond_syscall mechanism. For architectures which have not wired up the system calls less unreachable code will be generated. Signed-off-by: Heiko Carstens <heiko.carst...@de.ibm.com> --- mm/mprotect.c | 4 1 file changed, 4 insertions(+) diff --git a/mm/mprotect.c b/mm/mprotect.c index 11936526b08b..a06e91c4de29 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -484,6 +484,8 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len, return do_mprotect_pkey(start, len, prot, -1); } +#ifdef CONFIG_ARCH_HAS_PKEYS + SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len, unsigned long, prot, int, pkey) { @@ -534,3 +536,5 @@ SYSCALL_DEFINE1(pkey_free, int, pkey) */ return ret; } + +#endif /* CONFIG_ARCH_HAS_PKEYS */ -- 2.8.4
[PATCH] mm/pkeys: generate pkey system call code only if ARCH_HAS_PKEYS is selected
Having code for the pkey_mprotect, pkey_alloc and pkey_free system calls makes only sense if ARCH_HAS_PKEYS is selected. If not selected these system calls will always return -ENOSPC or -EINVAL. To simplify things and have less code generate the pkey system call code only if ARCH_HAS_PKEYS is selected. For architectures which have already wired up the system calls, but do not select ARCH_HAS_PKEYS this will result in less generated code and a different return code: the three system calls will now always return -ENOSYS, using the cond_syscall mechanism. For architectures which have not wired up the system calls less unreachable code will be generated. Signed-off-by: Heiko Carstens --- mm/mprotect.c | 4 1 file changed, 4 insertions(+) diff --git a/mm/mprotect.c b/mm/mprotect.c index 11936526b08b..a06e91c4de29 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -484,6 +484,8 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len, return do_mprotect_pkey(start, len, prot, -1); } +#ifdef CONFIG_ARCH_HAS_PKEYS + SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len, unsigned long, prot, int, pkey) { @@ -534,3 +536,5 @@ SYSCALL_DEFINE1(pkey_free, int, pkey) */ return ret; } + +#endif /* CONFIG_ARCH_HAS_PKEYS */ -- 2.8.4
Re: [PATCHv2 0/2] THREAD_INFO_IN_TASK prep work for arm64+s390
Hi Mark, > On Wed, Nov 02, 2016 at 03:56:26PM +, Mark Rutland wrote: > > On Wed, Oct 26, 2016 at 06:43:05PM +0100, Mark Rutland wrote: > > > Heiko and I have been working on THREAD_INFO_IN_TASK for s390 and arm64 > > > respectively, and we're both targetting v4.10. > > > > > > These are the common core changes which we both require. I've put > > > together a > > > branch [1,2] based on v4.9-rc2. I intend to tag this at some point next > > > week so > > > that Heiko and I have a stable base. > > > > As a heads-up, I've collated the tags I've received in the last few days, > > along > > with the comment Andy requested in the v1 thread, and tagged this as > > core-ti-stack-split [1]. > > > > Thanks, > > Mark. > > > > [1] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git > > core-ti-stack-split > > I believe that Catalin's hoping to take my arm64 patches for v4.10, > using the above tag as a base. > > What's the plan for s390? Are you happy to do the same? We intend to merge the s390 variant also for v4.10. Our approach has changed however: we came to the conclusion that having a per-cpu preempt count will generate better code on s390; so Martin implemented that. And while being at it he also moved the rest of the s390 specific stuff out of our struct thread_info. The corresponding patches will be pushed to git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features branch soon. In result we will have a struct thread_info that only contains the flags field like x86 does. So we don't depend an any other patches anymore...
Re: [PATCHv2 0/2] THREAD_INFO_IN_TASK prep work for arm64+s390
Hi Mark, > On Wed, Nov 02, 2016 at 03:56:26PM +, Mark Rutland wrote: > > On Wed, Oct 26, 2016 at 06:43:05PM +0100, Mark Rutland wrote: > > > Heiko and I have been working on THREAD_INFO_IN_TASK for s390 and arm64 > > > respectively, and we're both targetting v4.10. > > > > > > These are the common core changes which we both require. I've put > > > together a > > > branch [1,2] based on v4.9-rc2. I intend to tag this at some point next > > > week so > > > that Heiko and I have a stable base. > > > > As a heads-up, I've collated the tags I've received in the last few days, > > along > > with the comment Andy requested in the v1 thread, and tagged this as > > core-ti-stack-split [1]. > > > > Thanks, > > Mark. > > > > [1] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git > > core-ti-stack-split > > I believe that Catalin's hoping to take my arm64 patches for v4.10, > using the above tag as a base. > > What's the plan for s390? Are you happy to do the same? We intend to merge the s390 variant also for v4.10. Our approach has changed however: we came to the conclusion that having a per-cpu preempt count will generate better code on s390; so Martin implemented that. And while being at it he also moved the rest of the s390 specific stuff out of our struct thread_info. The corresponding patches will be pushed to git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features branch soon. In result we will have a struct thread_info that only contains the flags field like x86 does. So we don't depend an any other patches anymore...
Re: [PATCH] mm: only enable sys_pkey* when ARCH_HAS_PKEYS
On Tue, Nov 08, 2016 at 12:39:28PM +0100, Arnd Bergmann wrote: > On Tuesday, November 8, 2016 10:30:42 AM CET Heiko Carstens wrote: > > Three architectures (parisc, powerpc, s390) decided to ignore the system > > calls completely, but still have the pkey code linked into the kernel > > image. > > Wouldn't it actually make sense to hook this up to the storage keys > in the s390 page tables? We have storage keys per _physical_ page. Not per page within the the table entries. So this doesn't work unfortunately.
Re: [PATCH] mm: only enable sys_pkey* when ARCH_HAS_PKEYS
On Tue, Nov 08, 2016 at 12:39:28PM +0100, Arnd Bergmann wrote: > On Tuesday, November 8, 2016 10:30:42 AM CET Heiko Carstens wrote: > > Three architectures (parisc, powerpc, s390) decided to ignore the system > > calls completely, but still have the pkey code linked into the kernel > > image. > > Wouldn't it actually make sense to hook this up to the storage keys > in the s390 page tables? We have storage keys per _physical_ page. Not per page within the the table entries. So this doesn't work unfortunately.
Re: [PATCH] mm: only enable sys_pkey* when ARCH_HAS_PKEYS
On Tue, Nov 08, 2016 at 10:41:12AM +, Russell King - ARM Linux wrote: > On Tue, Nov 08, 2016 at 10:30:42AM +0100, Heiko Carstens wrote: > > Two architectures (arm, mips) have wired them up and thus allocated system > > call numbers, even though they don't have ARCH_HAS_PKEYS set. Which seems a > > bit pointless. > > I don't think it's pointless at all. First, read the LWN article for > the userspace side of the interface: https://lwn.net/Articles/689395/ > > From reading this, it seems (at least to me) that these pkey syscalls > are going to be the application level API - which means applications > are probably going to want to make these calls. > > Sure, they'll have to go through glibc, and glibc can provide stubs, > but the problem with that is if we do get hardware pkey support (eg, > due to pressure to increase security) then we're going to end up > needing both kernel changes and glibc changes to add the calls. > > Since one of the design goals of pkeys is to allow them to work when > there is no underlying hardware support, I see no reason not to wire > them up in architecture syscall tables today, so that we have a cross- > architecture kernel version where the pkey syscalls become available. > glibc (and other libcs) don't then have to mess around with per- > architecture recording of which kernel version the pkey syscalls were > added. > > Not wiring up the syscalls doesn't really gain anything: the code > present when !ARCH_HAS_PKEYS will still be part of the kernel image, > it just won't be callable. That can be easily solved (see below). > So, on balance, I've decided to wire them up on ARM, even though the > hardware doesn't support them, to avoid unnecessary pain in userspace > from the ARM side of things. > > Obviously what other architectures do is their own business. It would make sense if this would be handled the same across architectures. We could simply ifdef the small pkey block in mprotect.c and rely on the pkey cond_syscalls added with e2753293ac4b. The result would actually be the same what Mark proposed, except with less generated code. Something like this: diff --git a/mm/mprotect.c b/mm/mprotect.c index 11936526b08b..9fb86b107e49 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -484,6 +484,8 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len, return do_mprotect_pkey(start, len, prot, -1); } +#ifdef CONFIG_ARCH_HAS_PKEYS + SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len, unsigned long, prot, int, pkey) { @@ -534,3 +536,4 @@ SYSCALL_DEFINE1(pkey_free, int, pkey) */ return ret; } +#endif /* CONFIG_ARCH_HAS_PKEYS */
Re: [PATCH] mm: only enable sys_pkey* when ARCH_HAS_PKEYS
On Tue, Nov 08, 2016 at 10:41:12AM +, Russell King - ARM Linux wrote: > On Tue, Nov 08, 2016 at 10:30:42AM +0100, Heiko Carstens wrote: > > Two architectures (arm, mips) have wired them up and thus allocated system > > call numbers, even though they don't have ARCH_HAS_PKEYS set. Which seems a > > bit pointless. > > I don't think it's pointless at all. First, read the LWN article for > the userspace side of the interface: https://lwn.net/Articles/689395/ > > From reading this, it seems (at least to me) that these pkey syscalls > are going to be the application level API - which means applications > are probably going to want to make these calls. > > Sure, they'll have to go through glibc, and glibc can provide stubs, > but the problem with that is if we do get hardware pkey support (eg, > due to pressure to increase security) then we're going to end up > needing both kernel changes and glibc changes to add the calls. > > Since one of the design goals of pkeys is to allow them to work when > there is no underlying hardware support, I see no reason not to wire > them up in architecture syscall tables today, so that we have a cross- > architecture kernel version where the pkey syscalls become available. > glibc (and other libcs) don't then have to mess around with per- > architecture recording of which kernel version the pkey syscalls were > added. > > Not wiring up the syscalls doesn't really gain anything: the code > present when !ARCH_HAS_PKEYS will still be part of the kernel image, > it just won't be callable. That can be easily solved (see below). > So, on balance, I've decided to wire them up on ARM, even though the > hardware doesn't support them, to avoid unnecessary pain in userspace > from the ARM side of things. > > Obviously what other architectures do is their own business. It would make sense if this would be handled the same across architectures. We could simply ifdef the small pkey block in mprotect.c and rely on the pkey cond_syscalls added with e2753293ac4b. The result would actually be the same what Mark proposed, except with less generated code. Something like this: diff --git a/mm/mprotect.c b/mm/mprotect.c index 11936526b08b..9fb86b107e49 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -484,6 +484,8 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len, return do_mprotect_pkey(start, len, prot, -1); } +#ifdef CONFIG_ARCH_HAS_PKEYS + SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len, unsigned long, prot, int, pkey) { @@ -534,3 +536,4 @@ SYSCALL_DEFINE1(pkey_free, int, pkey) */ return ret; } +#endif /* CONFIG_ARCH_HAS_PKEYS */
Re: [PATCH] mm: only enable sys_pkey* when ARCH_HAS_PKEYS
On Fri, Nov 04, 2016 at 11:44:59PM +, Mark Rutland wrote: > On Wed, Nov 02, 2016 at 12:15:50PM -0700, Dave Hansen wrote: > > On 10/31/2016 05:08 PM, Mark Rutland wrote: > > > When an architecture does not select CONFIG_ARCH_HAS_PKEYS, the pkey_alloc > > > syscall will return -ENOSPC for all (otherwise well-formed) requests, as > > > the > > > generic implementation of mm_pkey_alloc() returns -1. The other pkey > > > syscalls > > > perform some work before always failing, in a similar fashion. > > > > > > This implies the absence of keys, but otherwise functional pkey support. > > > This > > > is odd, since the architecture provides no such support. Instead, it > > > would be > > > preferable to indicate that the syscall is not implemented, since this is > > > effectively the case. > > > > This makes the behavior of an x86 cpu without pkeys and an arm cpu > > without pkeys differ. Is that what we want? > > My rationale was that we have no idea whether architectures will have pkey > support in future, and if/when they do, we may have to apply additional checks > anyhow. i.e. in cases we'd return -ENOSPC today, we might want to return > another error code. > > Returning -ENOSYS retains the current behaviour, and allows us to handle that > ABI issue when we know what architecture support looks like. > > Other architectures not using the generic syscalls seem to handle this with > -ENOSYS, e.g. parisc with commit 18088db042dd9ae2, so there's differing > behaviour regardless of arm specifically. The three system calls won't return -ENOSYS on architectures which decided to ignore them (like with with above mentioned commit), since they haven't allocated a system call number at all. Right now we have one architecture where these three system calls work if the cpu supports the feature (x86). Two architectures (arm, mips) have wired them up and thus allocated system call numbers, even though they don't have ARCH_HAS_PKEYS set. Which seems a bit pointless. Three architectures (parisc, powerpc, s390) decided to ignore the system calls completely, but still have the pkey code linked into the kernel image. imho the generic pkey code should be ifdef'ed with CONFIG_ARCH_HAS_PKEYS. Otherwise only dead code will be linked and increase the kernel image size for no good reason.
Re: [PATCH] mm: only enable sys_pkey* when ARCH_HAS_PKEYS
On Fri, Nov 04, 2016 at 11:44:59PM +, Mark Rutland wrote: > On Wed, Nov 02, 2016 at 12:15:50PM -0700, Dave Hansen wrote: > > On 10/31/2016 05:08 PM, Mark Rutland wrote: > > > When an architecture does not select CONFIG_ARCH_HAS_PKEYS, the pkey_alloc > > > syscall will return -ENOSPC for all (otherwise well-formed) requests, as > > > the > > > generic implementation of mm_pkey_alloc() returns -1. The other pkey > > > syscalls > > > perform some work before always failing, in a similar fashion. > > > > > > This implies the absence of keys, but otherwise functional pkey support. > > > This > > > is odd, since the architecture provides no such support. Instead, it > > > would be > > > preferable to indicate that the syscall is not implemented, since this is > > > effectively the case. > > > > This makes the behavior of an x86 cpu without pkeys and an arm cpu > > without pkeys differ. Is that what we want? > > My rationale was that we have no idea whether architectures will have pkey > support in future, and if/when they do, we may have to apply additional checks > anyhow. i.e. in cases we'd return -ENOSPC today, we might want to return > another error code. > > Returning -ENOSYS retains the current behaviour, and allows us to handle that > ABI issue when we know what architecture support looks like. > > Other architectures not using the generic syscalls seem to handle this with > -ENOSYS, e.g. parisc with commit 18088db042dd9ae2, so there's differing > behaviour regardless of arm specifically. The three system calls won't return -ENOSYS on architectures which decided to ignore them (like with with above mentioned commit), since they haven't allocated a system call number at all. Right now we have one architecture where these three system calls work if the cpu supports the feature (x86). Two architectures (arm, mips) have wired them up and thus allocated system call numbers, even though they don't have ARCH_HAS_PKEYS set. Which seems a bit pointless. Three architectures (parisc, powerpc, s390) decided to ignore the system calls completely, but still have the pkey code linked into the kernel image. imho the generic pkey code should be ifdef'ed with CONFIG_ARCH_HAS_PKEYS. Otherwise only dead code will be linked and increase the kernel image size for no good reason.
Re: [PATCH] s390: remove unneeded dependency for gen_facilities
On Tue, Nov 08, 2016 at 10:55:47AM +0900, Masahiro Yamada wrote: > The dependency between the object and the source is handled by > scripts/Makefile.host, so only "hostprogs-y += gen_facilities" > is fine. > > Signed-off-by: Masahiro Yamada <yamada.masah...@socionext.com> > --- > > arch/s390/tools/Makefile | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/arch/s390/tools/Makefile b/arch/s390/tools/Makefile > index 6d9814c..4b5e1e4 100644 > --- a/arch/s390/tools/Makefile > +++ b/arch/s390/tools/Makefile > @@ -9,7 +9,5 @@ define filechk_facilities.h > $(obj)/gen_facilities > endef > > -$(obj)/gen_facilities.o: $(srctree)/arch/s390/tools/gen_facilities.c > - > include/generated/facilities.h: $(obj)/gen_facilities FORCE > $(call filechk,facilities.h) Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com> Martin, can you pick this one up too, please?
Re: [PATCH] s390: remove unneeded dependency for gen_facilities
On Tue, Nov 08, 2016 at 10:55:47AM +0900, Masahiro Yamada wrote: > The dependency between the object and the source is handled by > scripts/Makefile.host, so only "hostprogs-y += gen_facilities" > is fine. > > Signed-off-by: Masahiro Yamada > --- > > arch/s390/tools/Makefile | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/arch/s390/tools/Makefile b/arch/s390/tools/Makefile > index 6d9814c..4b5e1e4 100644 > --- a/arch/s390/tools/Makefile > +++ b/arch/s390/tools/Makefile > @@ -9,7 +9,5 @@ define filechk_facilities.h > $(obj)/gen_facilities > endef > > -$(obj)/gen_facilities.o: $(srctree)/arch/s390/tools/gen_facilities.c > - > include/generated/facilities.h: $(obj)/gen_facilities FORCE > $(call filechk,facilities.h) Acked-by: Heiko Carstens Martin, can you pick this one up too, please?
Re: [PATCH] tile: handle RO_AFTER_INIT_DATA
On Mon, Nov 07, 2016 at 02:50:27PM -0500, Chris Metcalf wrote: > This is the minimal change to handle RO_AFTER_INIT_DATA. > The tile architecture already marks RO_DATA as read-only in > the kernel, so grouping RO_AFTER_INIT_DATA with RO_DATA, as is > done by default, means the kernel faults in init when it tries > to write to RO_AFTER_INIT_DATA. For now, just move it past the > end of the RODATA section so it is not specially treated. > --- > This is just to fix 4.9; I will post a more complete fix shortly > targeting 4.10. > > arch/tile/kernel/vmlinux.lds.S | 4 > 1 file changed, 4 insertions(+) > > diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S > index e1baf094fba4..dcd7445c31a2 100644 > --- a/arch/tile/kernel/vmlinux.lds.S > +++ b/arch/tile/kernel/vmlinux.lds.S > @@ -1,3 +1,6 @@ > +/* Handle ro_after_init data on our own. */ > +#define RO_AFTER_INIT_DATA > + > #include > #include > #include > @@ -87,6 +90,7 @@ SECTIONS > >_sdata = .; /* Start of data section */ >RO_DATA_SECTION(PAGE_SIZE) > + RO_AFTER_INIT_DATA >RW_DATA_SECTION(L2_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE) >_edata = .; imho, the minimal fix would be to just #define __ro_after_init __read_mostly within arch/tile/include/asm/cache.h. That's also what parisc currently has.
Re: [PATCH] tile: handle RO_AFTER_INIT_DATA
On Mon, Nov 07, 2016 at 02:50:27PM -0500, Chris Metcalf wrote: > This is the minimal change to handle RO_AFTER_INIT_DATA. > The tile architecture already marks RO_DATA as read-only in > the kernel, so grouping RO_AFTER_INIT_DATA with RO_DATA, as is > done by default, means the kernel faults in init when it tries > to write to RO_AFTER_INIT_DATA. For now, just move it past the > end of the RODATA section so it is not specially treated. > --- > This is just to fix 4.9; I will post a more complete fix shortly > targeting 4.10. > > arch/tile/kernel/vmlinux.lds.S | 4 > 1 file changed, 4 insertions(+) > > diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S > index e1baf094fba4..dcd7445c31a2 100644 > --- a/arch/tile/kernel/vmlinux.lds.S > +++ b/arch/tile/kernel/vmlinux.lds.S > @@ -1,3 +1,6 @@ > +/* Handle ro_after_init data on our own. */ > +#define RO_AFTER_INIT_DATA > + > #include > #include > #include > @@ -87,6 +90,7 @@ SECTIONS > >_sdata = .; /* Start of data section */ >RO_DATA_SECTION(PAGE_SIZE) > + RO_AFTER_INIT_DATA >RW_DATA_SECTION(L2_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE) >_edata = .; imho, the minimal fix would be to just #define __ro_after_init __read_mostly within arch/tile/include/asm/cache.h. That's also what parisc currently has.
Re: [PATCH 2/2] s390: squash facilities_src.h into gen_facilities.c
On Mon, Nov 07, 2016 at 02:13:06PM +0100, Paul Bolle wrote: > It took me some time to figure out that gen_facilities is only used to > generate a small header file (generated/facilities.h). And that header's only > goal is to define FACILITIES_ALS and FACILITIES_KVM. > > Pasted below is an attempt to use asm/facilities.h instead of > generated/facilities.h. That allows to drop arch/s390/tools/ entirely. I don't > actually have an s390 machine at hand to test this, but this does build and > the preprocessed code this generates looks sane. > > (Yes, asm/facilities.h might need another level of preprocessor defines to > become actually readable.) The whole point of the tool is that we can use the bit numbers that are specified within the architecture document, with the additional odd IBM bit numbering scheme. Where the most significant bit has bit number 0(!). > diff --git a/arch/s390/include/asm/facilities.h > b/arch/s390/include/asm/facilities.h > new file mode 100644 > index ..c87f18d29217 > --- /dev/null > +++ b/arch/s390/include/asm/facilities.h > @@ -0,0 +1,43 @@ > +#ifndef __ASM_FACILITIES_H > +#define __ASM_FACILITIES_H > + > +#define FACILITIES_ALS \ > + _AC(IS_BUILTIN(CONFIG_HAVE_MARCH_Z900_FEATURES), UL) << 0 | /* N3 > instructions */ \ So, this is wrong. It should be " << 63" to match the existing code. > +#define FACILITIES_KVM \ > + _BITUL(0) |/* N3 instructions */ \ > + _BITUL(1) |/* z/Arch mode installed */ \ > + _BITUL(2) |/* z/Arch mode active */ \ > + _BITUL(3) |/* DAT-enhancement */ \ > + _BITUL(4) |/* idte segment table */ \ > + _BITUL(5) |/* idte region table */ \ > + _BITUL(6) |/* ASN-and-LX reuse */ \ > + _BITUL(7) |/* stfle */ \ > + _BITUL(8) |/* enhanced-DAT 1 */ \ > + _BITUL(9) |/* sense-running-status */ \ > + _BITUL(10) |/* conditional sske */ \ > + _BITUL(13) |/* ipte-range */ \ > + _BITUL(14) /* nonquiescing key-setting */ \ > + , \ > + _BITUL(9) |/* transactional execution */ \ > + _BITUL(11) |/* access-exception-fetch/store indication */ \ And this is exactly what I want to avoid: start counting from zero again if we cross a double word. It _must_ read 73, 75, otherwise this becomes the unmaintainable and error prone mess we had before. I just want a list with bit numbers and the rest must be created automatically.
Re: [PATCH 2/2] s390: squash facilities_src.h into gen_facilities.c
On Mon, Nov 07, 2016 at 02:13:06PM +0100, Paul Bolle wrote: > It took me some time to figure out that gen_facilities is only used to > generate a small header file (generated/facilities.h). And that header's only > goal is to define FACILITIES_ALS and FACILITIES_KVM. > > Pasted below is an attempt to use asm/facilities.h instead of > generated/facilities.h. That allows to drop arch/s390/tools/ entirely. I don't > actually have an s390 machine at hand to test this, but this does build and > the preprocessed code this generates looks sane. > > (Yes, asm/facilities.h might need another level of preprocessor defines to > become actually readable.) The whole point of the tool is that we can use the bit numbers that are specified within the architecture document, with the additional odd IBM bit numbering scheme. Where the most significant bit has bit number 0(!). > diff --git a/arch/s390/include/asm/facilities.h > b/arch/s390/include/asm/facilities.h > new file mode 100644 > index ..c87f18d29217 > --- /dev/null > +++ b/arch/s390/include/asm/facilities.h > @@ -0,0 +1,43 @@ > +#ifndef __ASM_FACILITIES_H > +#define __ASM_FACILITIES_H > + > +#define FACILITIES_ALS \ > + _AC(IS_BUILTIN(CONFIG_HAVE_MARCH_Z900_FEATURES), UL) << 0 | /* N3 > instructions */ \ So, this is wrong. It should be " << 63" to match the existing code. > +#define FACILITIES_KVM \ > + _BITUL(0) |/* N3 instructions */ \ > + _BITUL(1) |/* z/Arch mode installed */ \ > + _BITUL(2) |/* z/Arch mode active */ \ > + _BITUL(3) |/* DAT-enhancement */ \ > + _BITUL(4) |/* idte segment table */ \ > + _BITUL(5) |/* idte region table */ \ > + _BITUL(6) |/* ASN-and-LX reuse */ \ > + _BITUL(7) |/* stfle */ \ > + _BITUL(8) |/* enhanced-DAT 1 */ \ > + _BITUL(9) |/* sense-running-status */ \ > + _BITUL(10) |/* conditional sske */ \ > + _BITUL(13) |/* ipte-range */ \ > + _BITUL(14) /* nonquiescing key-setting */ \ > + , \ > + _BITUL(9) |/* transactional execution */ \ > + _BITUL(11) |/* access-exception-fetch/store indication */ \ And this is exactly what I want to avoid: start counting from zero again if we cross a double word. It _must_ read 73, 75, otherwise this becomes the unmaintainable and error prone mess we had before. I just want a list with bit numbers and the rest must be created automatically.
Re: [PATCH 2/2] s390: squash facilities_src.h into gen_facilities.c
On Sun, Nov 06, 2016 at 12:45:28PM +0900, Masahiro Yamada wrote: > We generally expect headers in arch/$(ARCH)/include/asm directory > are included from kernel sources, but facilities_src.h is not; > it is included from the arch/s390/tools/gen_facilities.c tool. > > There is no reason to expose this header to the public include path. > Furthermore, facilities_src.h makes sure to be included only from > gen_facilities.c by the following: > > #ifndef S390_GEN_FACILITIES_C > #error "This file can only be included by gen_facilities.c" > #endif > > This check can be removed by merging the two files. > > Signed-off-by: Masahiro Yamada <yamada.masah...@socionext.com> > --- Both patches: Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com> Martin, can you please pick them up?
Re: [PATCH 2/2] s390: squash facilities_src.h into gen_facilities.c
On Sun, Nov 06, 2016 at 12:45:28PM +0900, Masahiro Yamada wrote: > We generally expect headers in arch/$(ARCH)/include/asm directory > are included from kernel sources, but facilities_src.h is not; > it is included from the arch/s390/tools/gen_facilities.c tool. > > There is no reason to expose this header to the public include path. > Furthermore, facilities_src.h makes sure to be included only from > gen_facilities.c by the following: > > #ifndef S390_GEN_FACILITIES_C > #error "This file can only be included by gen_facilities.c" > #endif > > This check can be removed by merging the two files. > > Signed-off-by: Masahiro Yamada > --- Both patches: Acked-by: Heiko Carstens Martin, can you please pick them up?
Re: [PATCH 11/25] s390/smp: Convert to hotplug state machine
On Thu, Nov 03, 2016 at 03:50:07PM +0100, Sebastian Andrzej Siewior wrote: > @@ -1084,20 +1079,13 @@ static int smp_add_present_cpu(int cpu) > rc = sysfs_create_group(>kobj, _common_attr_group); > if (rc) > goto out_cpu; > - if (cpu_online(cpu)) { > - rc = sysfs_create_group(>kobj, _online_attr_group); > - if (rc) > - goto out_online; > - } > + > rc = topology_cpu_init(c); Please don't add whitespace :) If you fix the minor nits both s390 related patches are Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com> Thanks!
Re: [PATCH 11/25] s390/smp: Convert to hotplug state machine
On Thu, Nov 03, 2016 at 03:50:07PM +0100, Sebastian Andrzej Siewior wrote: > @@ -1084,20 +1079,13 @@ static int smp_add_present_cpu(int cpu) > rc = sysfs_create_group(>kobj, _common_attr_group); > if (rc) > goto out_cpu; > - if (cpu_online(cpu)) { > - rc = sysfs_create_group(>kobj, _online_attr_group); > - if (rc) > - goto out_online; > - } > + > rc = topology_cpu_init(c); Please don't add whitespace :) If you fix the minor nits both s390 related patches are Acked-by: Heiko Carstens Thanks!
Re: [PATCH 10/25] s390/smp: Make cpu notifier symetric
On Thu, Nov 03, 2016 at 03:50:06PM +0100, Sebastian Andrzej Siewior wrote: > From: Thomas Gleixner <t...@linutronix.de> > > There is no reason to remove the sysfs cpu files when the CPU is dead, they > can be removed when the cpu is prepared to go down. Doing it at > DOWN_PREPARE allows us to convert it to a symetric hotplug state in the > next step. > > Cc: Martin Schwidefsky <schwidef...@de.ibm.com> > Cc: Heiko Carstens <heiko.carst...@de.ibm.com> > Cc: linux-s...@vger.kernel.org > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de> > --- > arch/s390/kernel/smp.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c > index 35531fe1c5ea..1a21e66c484a 100644 > --- a/arch/s390/kernel/smp.c > +++ b/arch/s390/kernel/smp.c > @@ -1056,9 +1056,10 @@ static int smp_cpu_notify(struct notifier_block *self, > unsigned long action, > > switch (action & ~CPU_TASKS_FROZEN) { > case CPU_ONLINE: > + case CPU_DOWN_FAILED: > err = sysfs_create_group(>kobj, _online_attr_group); > break; > - case CPU_DEAD: > + case CPU_DOWN_PREPARE:: This won't compile... even though it will be removed with the following patch it would be good to fix this to keep this bisectable.
Re: [PATCH 10/25] s390/smp: Make cpu notifier symetric
On Thu, Nov 03, 2016 at 03:50:06PM +0100, Sebastian Andrzej Siewior wrote: > From: Thomas Gleixner > > There is no reason to remove the sysfs cpu files when the CPU is dead, they > can be removed when the cpu is prepared to go down. Doing it at > DOWN_PREPARE allows us to convert it to a symetric hotplug state in the > next step. > > Cc: Martin Schwidefsky > Cc: Heiko Carstens > Cc: linux-s...@vger.kernel.org > Signed-off-by: Thomas Gleixner > Signed-off-by: Sebastian Andrzej Siewior > --- > arch/s390/kernel/smp.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c > index 35531fe1c5ea..1a21e66c484a 100644 > --- a/arch/s390/kernel/smp.c > +++ b/arch/s390/kernel/smp.c > @@ -1056,9 +1056,10 @@ static int smp_cpu_notify(struct notifier_block *self, > unsigned long action, > > switch (action & ~CPU_TASKS_FROZEN) { > case CPU_ONLINE: > + case CPU_DOWN_FAILED: > err = sysfs_create_group(>kobj, _online_attr_group); > break; > - case CPU_DEAD: > + case CPU_DOWN_PREPARE:: This won't compile... even though it will be removed with the following patch it would be good to fix this to keep this bisectable.
[PATCH] percpu: use notrace variant of preempt_disable/preempt_enable
Commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do") added a couple of this_cpu_read calls to the ftrace code. On x86 this is not a problem, since it has single instructions to read percpu data. Other architectures which use the generic variant now have additional preempt_disable and preempt_enable calls in the core ftrace code. This may lead to recursive calls and in result to a dead machine, e.g. if preemption and debugging options are enabled. To fix this use the notrace variant of preempt_disable and preempt_enable within the generic percpu code. Reported-and-bisected-by: Sebastian Ott <seb...@linux.vnet.ibm.com> Tested-by: Sebastian Ott <seb...@linux.vnet.ibm.com> Fixes: 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do") Signed-off-by: Heiko Carstens <heiko.carst...@de.ibm.com> --- include/asm-generic/percpu.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h index 40e887068da2..0504ef8f3aa3 100644 --- a/include/asm-generic/percpu.h +++ b/include/asm-generic/percpu.h @@ -118,9 +118,9 @@ do { \ #define this_cpu_generic_read(pcp) \ ({ \ typeof(pcp) __ret; \ - preempt_disable(); \ + preempt_disable_notrace(); \ __ret = raw_cpu_generic_read(pcp); \ - preempt_enable(); \ + preempt_enable_notrace(); \ __ret; \ }) -- 2.8.4
[PATCH] percpu: use notrace variant of preempt_disable/preempt_enable
Commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do") added a couple of this_cpu_read calls to the ftrace code. On x86 this is not a problem, since it has single instructions to read percpu data. Other architectures which use the generic variant now have additional preempt_disable and preempt_enable calls in the core ftrace code. This may lead to recursive calls and in result to a dead machine, e.g. if preemption and debugging options are enabled. To fix this use the notrace variant of preempt_disable and preempt_enable within the generic percpu code. Reported-and-bisected-by: Sebastian Ott Tested-by: Sebastian Ott Fixes: 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do") Signed-off-by: Heiko Carstens --- include/asm-generic/percpu.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h index 40e887068da2..0504ef8f3aa3 100644 --- a/include/asm-generic/percpu.h +++ b/include/asm-generic/percpu.h @@ -118,9 +118,9 @@ do { \ #define this_cpu_generic_read(pcp) \ ({ \ typeof(pcp) __ret; \ - preempt_disable(); \ + preempt_disable_notrace(); \ __ret = raw_cpu_generic_read(pcp); \ - preempt_enable(); \ + preempt_enable_notrace(); \ __ret; \ }) -- 2.8.4
Re: [PATCH 0/9] s390: remove modular usage from non-modular code
On Sun, Oct 30, 2016 at 04:37:23PM -0400, Paul Gortmaker wrote: > My ongoing audit looking for non-modular code that needlessly uses > modular macros (vs. built-in equivalents) and/or has dead code > relating to module unloading that can never be executed led to the > creation of these s390 related commits. > > For anyone new to the underlying goal of this cleanup, we are trying to > not use module support for code that can never be built as a module since: > > (1) it is easy to accidentally write unused module_exit and remove code > (2) it can be misleading when reading the source, thinking it can be > modular when the Makefile and/or Kconfig prohibit it > (3) it requires the include of the module.h header file which in turn > includes nearly everything else, thus adding to CPP overhead. > (4) it gets copied/replicated into other code and spreads like weeds. > > Build tested on current linux-next (allyes/allno/allmod) to ensure no > silly typos or implicit include issues that would break compilation > crept in. > > Paul Gortmaker (9): > s390: cio: make it explicitly non-modular > s390: char: make zcore explicitly non-modular > s390: char: make con3215 explicitly non-modular > s390: char: make sclp_tty explicitly non-modular > s390: char: make slcp_quiesce explicitly non-modular > s390: hotplug: make pci_hpc explicitly non-modular > s390: hypfs: make inode explicitly non-modular > s390: kernel: make lgr explicitly non-modular > s390: virtio: make ccw explicitly non-modular Whole series applied. Thanks!
Re: [PATCH 0/9] s390: remove modular usage from non-modular code
On Sun, Oct 30, 2016 at 04:37:23PM -0400, Paul Gortmaker wrote: > My ongoing audit looking for non-modular code that needlessly uses > modular macros (vs. built-in equivalents) and/or has dead code > relating to module unloading that can never be executed led to the > creation of these s390 related commits. > > For anyone new to the underlying goal of this cleanup, we are trying to > not use module support for code that can never be built as a module since: > > (1) it is easy to accidentally write unused module_exit and remove code > (2) it can be misleading when reading the source, thinking it can be > modular when the Makefile and/or Kconfig prohibit it > (3) it requires the include of the module.h header file which in turn > includes nearly everything else, thus adding to CPP overhead. > (4) it gets copied/replicated into other code and spreads like weeds. > > Build tested on current linux-next (allyes/allno/allmod) to ensure no > silly typos or implicit include issues that would break compilation > crept in. > > Paul Gortmaker (9): > s390: cio: make it explicitly non-modular > s390: char: make zcore explicitly non-modular > s390: char: make con3215 explicitly non-modular > s390: char: make sclp_tty explicitly non-modular > s390: char: make slcp_quiesce explicitly non-modular > s390: hotplug: make pci_hpc explicitly non-modular > s390: hypfs: make inode explicitly non-modular > s390: kernel: make lgr explicitly non-modular > s390: virtio: make ccw explicitly non-modular Whole series applied. Thanks!
[tip:x86/urgent] sched/core, x86: Make struct thread_info arch specific again
Commit-ID: c8061485a0d7569a865a3cc3c63347b0f42b3765 Gitweb: http://git.kernel.org/tip/c8061485a0d7569a865a3cc3c63347b0f42b3765 Author: Heiko Carstens <heiko.carst...@de.ibm.com> AuthorDate: Wed, 19 Oct 2016 19:28:11 +0100 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Thu, 20 Oct 2016 13:27:47 +0200 sched/core, x86: Make struct thread_info arch specific again The following commit: c65eacbe290b ("sched/core: Allow putting thread_info into task_struct") ... made 'struct thread_info' a generic struct with only a single ::flags member, if CONFIG_THREAD_INFO_IN_TASK_STRUCT=y is selected. This change however seems to be quite x86 centric, since at least the generic preemption code (asm-generic/preempt.h) assumes that struct thread_info also has a preempt_count member, which apparently was not true for x86. We could add a bit more #ifdefs to solve this problem too, but it seems to be much simpler to make struct thread_info arch specific again. This also makes the conversion to THREAD_INFO_IN_TASK_STRUCT a bit easier for architectures that have a couple of arch specific stuff in their thread_info definition. The arch specific stuff _could_ be moved to thread_struct. However keeping them in thread_info makes it easier: accessing thread_info members is simple, since it is at the beginning of the task_struct, while the thread_struct is at the end. At least on s390 the offsets needed to access members of the thread_struct (with task_struct as base) are too large for various asm instructions. This is not a problem when keeping these members within thread_info. Signed-off-by: Heiko Carstens <heiko.carst...@de.ibm.com> Signed-off-by: Mark Rutland <mark.rutl...@arm.com> Acked-by: Thomas Gleixner <t...@linutronix.de> Cc: Andrew Morton <a...@linux-foundation.org> Cc: Andy Lutomirski <l...@kernel.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: keesc...@chromium.org Cc: linux-a...@vger.kernel.org Link: http://lkml.kernel.org/r/1476901693-8492-2-git-send-email-mark.rutl...@arm.com Signed-off-by: Ingo Molnar <mi...@kernel.org> --- arch/x86/include/asm/thread_info.h | 9 + include/linux/thread_info.h| 11 --- 2 files changed, 9 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 2aaca53..ad6f5eb0 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -52,6 +52,15 @@ struct task_struct; #include #include +struct thread_info { + unsigned long flags; /* low level flags */ +}; + +#define INIT_THREAD_INFO(tsk) \ +{ \ + .flags = 0,\ +} + #define init_stack (init_thread_union.stack) #else /* !__ASSEMBLY__ */ diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index 45f004e..2873baf 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -14,17 +14,6 @@ struct timespec; struct compat_timespec; #ifdef CONFIG_THREAD_INFO_IN_TASK -struct thread_info { - unsigned long flags; /* low level flags */ -}; - -#define INIT_THREAD_INFO(tsk) \ -{ \ - .flags = 0,\ -} -#endif - -#ifdef CONFIG_THREAD_INFO_IN_TASK #define current_thread_info() ((struct thread_info *)current) #endif
[tip:x86/urgent] sched/core, x86: Make struct thread_info arch specific again
Commit-ID: c8061485a0d7569a865a3cc3c63347b0f42b3765 Gitweb: http://git.kernel.org/tip/c8061485a0d7569a865a3cc3c63347b0f42b3765 Author: Heiko Carstens AuthorDate: Wed, 19 Oct 2016 19:28:11 +0100 Committer: Ingo Molnar CommitDate: Thu, 20 Oct 2016 13:27:47 +0200 sched/core, x86: Make struct thread_info arch specific again The following commit: c65eacbe290b ("sched/core: Allow putting thread_info into task_struct") ... made 'struct thread_info' a generic struct with only a single ::flags member, if CONFIG_THREAD_INFO_IN_TASK_STRUCT=y is selected. This change however seems to be quite x86 centric, since at least the generic preemption code (asm-generic/preempt.h) assumes that struct thread_info also has a preempt_count member, which apparently was not true for x86. We could add a bit more #ifdefs to solve this problem too, but it seems to be much simpler to make struct thread_info arch specific again. This also makes the conversion to THREAD_INFO_IN_TASK_STRUCT a bit easier for architectures that have a couple of arch specific stuff in their thread_info definition. The arch specific stuff _could_ be moved to thread_struct. However keeping them in thread_info makes it easier: accessing thread_info members is simple, since it is at the beginning of the task_struct, while the thread_struct is at the end. At least on s390 the offsets needed to access members of the thread_struct (with task_struct as base) are too large for various asm instructions. This is not a problem when keeping these members within thread_info. Signed-off-by: Heiko Carstens Signed-off-by: Mark Rutland Acked-by: Thomas Gleixner Cc: Andrew Morton Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Peter Zijlstra Cc: keesc...@chromium.org Cc: linux-a...@vger.kernel.org Link: http://lkml.kernel.org/r/1476901693-8492-2-git-send-email-mark.rutl...@arm.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/thread_info.h | 9 + include/linux/thread_info.h| 11 --- 2 files changed, 9 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 2aaca53..ad6f5eb0 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -52,6 +52,15 @@ struct task_struct; #include #include +struct thread_info { + unsigned long flags; /* low level flags */ +}; + +#define INIT_THREAD_INFO(tsk) \ +{ \ + .flags = 0,\ +} + #define init_stack (init_thread_union.stack) #else /* !__ASSEMBLY__ */ diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index 45f004e..2873baf 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -14,17 +14,6 @@ struct timespec; struct compat_timespec; #ifdef CONFIG_THREAD_INFO_IN_TASK -struct thread_info { - unsigned long flags; /* low level flags */ -}; - -#define INIT_THREAD_INFO(tsk) \ -{ \ - .flags = 0,\ -} -#endif - -#ifdef CONFIG_THREAD_INFO_IN_TASK #define current_thread_info() ((struct thread_info *)current) #endif
Re: [PATCH 3/3] thread_info: include for THREAD_INFO_IN_TASK
On Wed, Oct 19, 2016 at 07:28:13PM +0100, Mark Rutland wrote: > When CONFIG_THREAD_INFO_IN_TASK is selected, the current_thread_info() > macro relies on current having been defined prior to its use. However, > not all users of current_thread_info() include , and thus > current is not guaranteed to be defined. > > When CONFIG_THREAD_INFO_IN_TASK is not selected, it's possible that > get_current() / current are based upon current_thread_info(), and > includes . Thus always including > would result in circular dependences on some platforms. > > To ensure both cases work, this patch includes , but only > when CONFIG_THREAD_INFO_IN_TASK is selected. > > Signed-off-by: Mark Rutland <mark.rutl...@arm.com> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Andy Lutomirski <l...@kernel.org> > Cc: Heiko Carstens <heiko.carst...@de.ibm.com> > Cc: Kees Cook <keesc...@chromium.org> > --- > include/linux/thread_info.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h > index c75c6ab..ef1f4b0 100644 > --- a/include/linux/thread_info.h > +++ b/include/linux/thread_info.h > @@ -12,6 +12,7 @@ > #include > > #ifdef CONFIG_THREAD_INFO_IN_TASK > +#include > #define current_thread_info() ((struct thread_info *)current) > #endif FWIW: Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com>
Re: [PATCH 3/3] thread_info: include for THREAD_INFO_IN_TASK
On Wed, Oct 19, 2016 at 07:28:13PM +0100, Mark Rutland wrote: > When CONFIG_THREAD_INFO_IN_TASK is selected, the current_thread_info() > macro relies on current having been defined prior to its use. However, > not all users of current_thread_info() include , and thus > current is not guaranteed to be defined. > > When CONFIG_THREAD_INFO_IN_TASK is not selected, it's possible that > get_current() / current are based upon current_thread_info(), and > includes . Thus always including > would result in circular dependences on some platforms. > > To ensure both cases work, this patch includes , but only > when CONFIG_THREAD_INFO_IN_TASK is selected. > > Signed-off-by: Mark Rutland > Cc: Andrew Morton > Cc: Andy Lutomirski > Cc: Heiko Carstens > Cc: Kees Cook > --- > include/linux/thread_info.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h > index c75c6ab..ef1f4b0 100644 > --- a/include/linux/thread_info.h > +++ b/include/linux/thread_info.h > @@ -12,6 +12,7 @@ > #include > > #ifdef CONFIG_THREAD_INFO_IN_TASK > +#include > #define current_thread_info() ((struct thread_info *)current) > #endif FWIW: Acked-by: Heiko Carstens
Re: [PATCH v2 1/1] s390/spinlock: Provide vcpu_is_preempted
On Wed, Oct 19, 2016 at 08:56:36AM +0200, Christian Borntraeger wrote: > On 09/29/2016 05:51 PM, Christian Borntraeger wrote: > > this implements the s390 backend for commit > > "kernel/sched: introduce vcpu preempted check interface" > > by reworking the existing smp_vcpu_scheduled into > > arch_vcpu_is_preempted. We can then also get rid of the > > local cpu_is_preempted function by moving the > > CIF_ENABLED_WAIT test into arch_vcpu_is_preempted. > > > > Signed-off-by: Christian Borntraeger> > > Martin, Peter, > > I think we could go with the patch as is. In other words not providing > arch_vcpu_is_preempted for !CONFIG_SMP. > > This will result in compile errors if code does spinning or yielding for > non-SMP kernels - which does not make sense to me, so this might actually > be a nice indicator. > If you prefer the !CONFIG_SMP implementation let me know and I will respin. ...but I do prefer an implementation for !CONFIG_SMP. I'm tired of fixing silly compile errors that only happen on s390.
Re: [PATCH v2 1/1] s390/spinlock: Provide vcpu_is_preempted
On Wed, Oct 19, 2016 at 08:56:36AM +0200, Christian Borntraeger wrote: > On 09/29/2016 05:51 PM, Christian Borntraeger wrote: > > this implements the s390 backend for commit > > "kernel/sched: introduce vcpu preempted check interface" > > by reworking the existing smp_vcpu_scheduled into > > arch_vcpu_is_preempted. We can then also get rid of the > > local cpu_is_preempted function by moving the > > CIF_ENABLED_WAIT test into arch_vcpu_is_preempted. > > > > Signed-off-by: Christian Borntraeger > > > Martin, Peter, > > I think we could go with the patch as is. In other words not providing > arch_vcpu_is_preempted for !CONFIG_SMP. > > This will result in compile errors if code does spinning or yielding for > non-SMP kernels - which does not make sense to me, so this might actually > be a nice indicator. > If you prefer the !CONFIG_SMP implementation let me know and I will respin. ...but I do prefer an implementation for !CONFIG_SMP. I'm tired of fixing silly compile errors that only happen on s390.
Re: [PATCH v3] s390/spinlock: Provide vcpu_is_preempted
On Wed, Oct 19, 2016 at 10:42:04AM +0200, Christian Borntraeger wrote: > this implements the s390 backend for commit > "kernel/sched: introduce vcpu preempted check interface" > by reworking the existing smp_vcpu_scheduled into > arch_vcpu_is_preempted. We can then also get rid of the > local cpu_is_preempted function by moving the > CIF_ENABLED_WAIT test into arch_vcpu_is_preempted. > > Signed-off-by: Christian Borntraeger <borntrae...@de.ibm.com> > --- > arch/s390/include/asm/spinlock.h | 8 > arch/s390/kernel/smp.c | 9 +++-- > arch/s390/lib/spinlock.c | 25 - > 3 files changed, 23 insertions(+), 19 deletions(-) Acked-by: Heiko Carstens <heiko.carst...@de.ibm.com>
Re: [PATCH v3] s390/spinlock: Provide vcpu_is_preempted
On Wed, Oct 19, 2016 at 10:42:04AM +0200, Christian Borntraeger wrote: > this implements the s390 backend for commit > "kernel/sched: introduce vcpu preempted check interface" > by reworking the existing smp_vcpu_scheduled into > arch_vcpu_is_preempted. We can then also get rid of the > local cpu_is_preempted function by moving the > CIF_ENABLED_WAIT test into arch_vcpu_is_preempted. > > Signed-off-by: Christian Borntraeger > --- > arch/s390/include/asm/spinlock.h | 8 > arch/s390/kernel/smp.c | 9 +++-- > arch/s390/lib/spinlock.c | 25 - > 3 files changed, 23 insertions(+), 19 deletions(-) Acked-by: Heiko Carstens
Re: [PATCH] thread_info: include for THREAD_INFO_IN_TASK (WAS: [PATCH 2/3] sched/preempt: include asm/current.h)
On Mon, Oct 17, 2016 at 06:33:15PM +0100, Mark Rutland wrote: > On Mon, Oct 17, 2016 at 03:48:13PM +0100, Mark Rutland wrote: > > On Fri, Oct 14, 2016 at 11:42:25AM +0100, Mark Rutland wrote: > > > When CONFIG_THREAD_INFO_IN_TASK is selected, the current_thread_info() > > > macro relies on current having been defined prior to its use. However, > > > not all users of current_thread_info() include , and thus > > > current is not guaranteed to be defined. > > > > > > When CONFIG_THREAD_INFO_IN_TASK is not selected, it's possible that > > > get_current() / current are based upon current_thread_info(), and > > > includes . Thus always including > > > would result in circular dependences on some platforms. > > > > > > To ensure both cases work, this patch includes , but only > > > when CONFIG_THREAD_INFO_IN_TASK is selected. > > > > > > Signed-off-by: Mark Rutland> > > --- > > > include/linux/thread_info.h | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > As discussed, it would be great if this could go in along with the patch > > > to > > > make thread_info arch-specific again, to make merging the arch-specific > > > parts > > > easier (for arm64 and s390). > > > > Urrgh; I've just recalled that this patch alone is insufficient. The > > h8300 arch code has an unfixed bug [1], and relies on some implicit > > definition ordering that will be broken by this patch. > > > > I have a workaround/cleanup for core code that I'll send with an updated > > version of my arm64 series shortly. > > Looks like I spoke too soon. I have another circular include issue with > raw_smp_processor_id() and task_struct::cpu to solve -- it looks like > s390 doesn't suffer from that per my reading of your headers. That's correct. > In the mean time, I've pushed out a branch [2] with the common patches, > atop of v4.9-rc1. I just verified that your branch works fine for s390 (with and without the THREAD_INFO_IN_TASK conversion). Thanks, Heiko
Re: [PATCH] thread_info: include for THREAD_INFO_IN_TASK (WAS: [PATCH 2/3] sched/preempt: include asm/current.h)
On Mon, Oct 17, 2016 at 06:33:15PM +0100, Mark Rutland wrote: > On Mon, Oct 17, 2016 at 03:48:13PM +0100, Mark Rutland wrote: > > On Fri, Oct 14, 2016 at 11:42:25AM +0100, Mark Rutland wrote: > > > When CONFIG_THREAD_INFO_IN_TASK is selected, the current_thread_info() > > > macro relies on current having been defined prior to its use. However, > > > not all users of current_thread_info() include , and thus > > > current is not guaranteed to be defined. > > > > > > When CONFIG_THREAD_INFO_IN_TASK is not selected, it's possible that > > > get_current() / current are based upon current_thread_info(), and > > > includes . Thus always including > > > would result in circular dependences on some platforms. > > > > > > To ensure both cases work, this patch includes , but only > > > when CONFIG_THREAD_INFO_IN_TASK is selected. > > > > > > Signed-off-by: Mark Rutland > > > --- > > > include/linux/thread_info.h | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > As discussed, it would be great if this could go in along with the patch > > > to > > > make thread_info arch-specific again, to make merging the arch-specific > > > parts > > > easier (for arm64 and s390). > > > > Urrgh; I've just recalled that this patch alone is insufficient. The > > h8300 arch code has an unfixed bug [1], and relies on some implicit > > definition ordering that will be broken by this patch. > > > > I have a workaround/cleanup for core code that I'll send with an updated > > version of my arm64 series shortly. > > Looks like I spoke too soon. I have another circular include issue with > raw_smp_processor_id() and task_struct::cpu to solve -- it looks like > s390 doesn't suffer from that per my reading of your headers. That's correct. > In the mean time, I've pushed out a branch [2] with the common patches, > atop of v4.9-rc1. I just verified that your branch works fine for s390 (with and without the THREAD_INFO_IN_TASK conversion). Thanks, Heiko
Re: [PATCH 2/3] sched/preempt: include asm/current.h
On Fri, Oct 14, 2016 at 12:25:56AM +0100, Mark Rutland wrote: > Hi, > > On Thu, Oct 13, 2016 at 01:57:11PM +0200, Heiko Carstens wrote: > > The generic preempt code needs to include . Otherwise > > compilation fails if THREAD_INFO_IN_TASK is selected and the generic > > preempt code is used: > > > > ./include/linux/thread_info.h:17:54: error: 'current' undeclared (first use > > in this function) > > #define current_thread_info() ((struct thread_info *)current) > > I don't think this is the right fix. Users of current_thread_info() should > only > have to include , as already > does. > > I have a patch [1] which has include the > THREAD_INFO_IN_TASK case (while avoiding circular includes over > > and in the !THREAD_INFO_IN_TASK case). I added that include initially to too, but was afraid of include dependency hell. So I tried the minimal version, and it worked. However with the ifdef within your patch you make sure that nothing breaks by accident; so I think your version is better. > I was planning on posting an updated series with that come -rc1. That could/should also go into 4.9, so architectures could independently convert to THREAD_INFO_IN_TASK for the next merge window.
Re: [PATCH 2/3] sched/preempt: include asm/current.h
On Fri, Oct 14, 2016 at 12:25:56AM +0100, Mark Rutland wrote: > Hi, > > On Thu, Oct 13, 2016 at 01:57:11PM +0200, Heiko Carstens wrote: > > The generic preempt code needs to include . Otherwise > > compilation fails if THREAD_INFO_IN_TASK is selected and the generic > > preempt code is used: > > > > ./include/linux/thread_info.h:17:54: error: 'current' undeclared (first use > > in this function) > > #define current_thread_info() ((struct thread_info *)current) > > I don't think this is the right fix. Users of current_thread_info() should > only > have to include , as already > does. > > I have a patch [1] which has include the > THREAD_INFO_IN_TASK case (while avoiding circular includes over > > and in the !THREAD_INFO_IN_TASK case). I added that include initially to too, but was afraid of include dependency hell. So I tried the minimal version, and it worked. However with the ifdef within your patch you make sure that nothing breaks by accident; so I think your version is better. > I was planning on posting an updated series with that come -rc1. That could/should also go into 4.9, so architectures could independently convert to THREAD_INFO_IN_TASK for the next merge window.
[PATCH 3/3] s390: move thread_info into task_struct
This is the s390 variant of commit 15f4eae70d36 ("x86: Move thread_info into task_struct"). Signed-off-by: Heiko Carstens <heiko.carst...@de.ibm.com> --- arch/s390/Kconfig | 1 + arch/s390/include/asm/lowcore.h | 2 +- arch/s390/include/asm/thread_info.h | 11 arch/s390/kernel/asm-offsets.c | 17 + arch/s390/kernel/entry.S| 51 ++--- arch/s390/kernel/head64.S | 5 ++-- arch/s390/kernel/setup.c| 3 +-- arch/s390/kernel/smp.c | 1 - 8 files changed, 37 insertions(+), 54 deletions(-) diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 426481d4cc86..a159dfc27b76 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -169,6 +169,7 @@ config S390 select OLD_SIGSUSPEND3 select SPARSE_IRQ select SYSCTL_EXCEPTION_TRACE + select THREAD_INFO_IN_TASK select TTY select VIRT_CPU_ACCOUNTING select VIRT_TO_BUS diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h index 7b93b78f423c..27402de81047 100644 --- a/arch/s390/include/asm/lowcore.h +++ b/arch/s390/include/asm/lowcore.h @@ -95,7 +95,7 @@ struct lowcore { /* Current process. */ __u64 current_task; /* 0x0310 */ - __u64 thread_info;/* 0x0318 */ + __u8pad_0x318[0x320-0x318]; /* 0x0318 */ __u64 kernel_stack; /* 0x0320 */ /* Interrupt, panic and restart stack. */ diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h index f15c0398c363..f5fc4efa9bec 100644 --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -30,10 +30,8 @@ * - if the contents of this structure are changed, the assembly constants must also be changed */ struct thread_info { - struct task_struct *task; /* main task structure */ unsigned long flags; /* low level flags */ unsigned long sys_call_table; /* System call table address */ - unsigned intcpu;/* current CPU */ int preempt_count; /* 0 => preemptable, <0 => BUG */ unsigned intsystem_call; __u64 user_timer; @@ -46,21 +44,12 @@ struct thread_info { */ #define INIT_THREAD_INFO(tsk) \ { \ - .task = , \ .flags = 0,\ - .cpu= 0,\ .preempt_count = INIT_PREEMPT_COUNT, \ } -#define init_thread_info (init_thread_union.thread_info) #define init_stack (init_thread_union.stack) -/* how to get the thread information struct from C */ -static inline struct thread_info *current_thread_info(void) -{ - return (struct thread_info *) S390_lowcore.thread_info; -} - void arch_release_task_struct(struct task_struct *tsk); int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c index f3df9e0a5dec..e6969ebd0d44 100644 --- a/arch/s390/kernel/asm-offsets.c +++ b/arch/s390/kernel/asm-offsets.c @@ -25,7 +25,7 @@ int main(void) { /* task struct offsets */ - OFFSET(__TASK_thread_info, task_struct, stack); + OFFSET(__TASK_stack, task_struct, stack); OFFSET(__TASK_thread, task_struct, thread); OFFSET(__TASK_pid, task_struct, pid); BLANK(); @@ -39,14 +39,12 @@ int main(void) OFFSET(__THREAD_trap_tdb, thread_struct, trap_tdb); BLANK(); /* thread info offsets */ - OFFSET(__TI_task, thread_info, task); - OFFSET(__TI_flags, thread_info, flags); - OFFSET(__TI_sysc_table, thread_info, sys_call_table); - OFFSET(__TI_cpu, thread_info, cpu); - OFFSET(__TI_precount, thread_info, preempt_count); - OFFSET(__TI_user_timer, thread_info, user_timer); - OFFSET(__TI_system_timer, thread_info, system_timer); - OFFSET(__TI_last_break, thread_info, last_break); + OFFSET(__TASK_TI_flags, task_struct, thread_info.flags); + OFFSET(__TASK_TI_sysc_table, task_struct, thread_info.sys_call_table); + OFFSET(__TASK_TI_precount, task_struct, thread_info.preempt_count); + OFFSET(__TASK_TI_user_timer, task_struct, thread_info.user_timer); + OFFSET(__TASK_TI_system_timer, task_struct, thread_info.system_timer); + OFFSET(__TASK_TI_last_break, task_struct, thread_info.last_break); BLANK(); /* pt_regs offsets */ OFFSET(__PT_ARGS, pt_regs, args); @@ -159,7 +157,6 @@ int main(void) OFFSET(__LC_INT_CLOCK, lowcore, int_clock); OFFSET(__LC_MCCK_CLOCK, lowcore, mcck_clock); OFFSET(__LC_CURRENT, lowcore, curr
[PATCH 3/3] s390: move thread_info into task_struct
This is the s390 variant of commit 15f4eae70d36 ("x86: Move thread_info into task_struct"). Signed-off-by: Heiko Carstens --- arch/s390/Kconfig | 1 + arch/s390/include/asm/lowcore.h | 2 +- arch/s390/include/asm/thread_info.h | 11 arch/s390/kernel/asm-offsets.c | 17 + arch/s390/kernel/entry.S| 51 ++--- arch/s390/kernel/head64.S | 5 ++-- arch/s390/kernel/setup.c| 3 +-- arch/s390/kernel/smp.c | 1 - 8 files changed, 37 insertions(+), 54 deletions(-) diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 426481d4cc86..a159dfc27b76 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -169,6 +169,7 @@ config S390 select OLD_SIGSUSPEND3 select SPARSE_IRQ select SYSCTL_EXCEPTION_TRACE + select THREAD_INFO_IN_TASK select TTY select VIRT_CPU_ACCOUNTING select VIRT_TO_BUS diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h index 7b93b78f423c..27402de81047 100644 --- a/arch/s390/include/asm/lowcore.h +++ b/arch/s390/include/asm/lowcore.h @@ -95,7 +95,7 @@ struct lowcore { /* Current process. */ __u64 current_task; /* 0x0310 */ - __u64 thread_info;/* 0x0318 */ + __u8pad_0x318[0x320-0x318]; /* 0x0318 */ __u64 kernel_stack; /* 0x0320 */ /* Interrupt, panic and restart stack. */ diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h index f15c0398c363..f5fc4efa9bec 100644 --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -30,10 +30,8 @@ * - if the contents of this structure are changed, the assembly constants must also be changed */ struct thread_info { - struct task_struct *task; /* main task structure */ unsigned long flags; /* low level flags */ unsigned long sys_call_table; /* System call table address */ - unsigned intcpu;/* current CPU */ int preempt_count; /* 0 => preemptable, <0 => BUG */ unsigned intsystem_call; __u64 user_timer; @@ -46,21 +44,12 @@ struct thread_info { */ #define INIT_THREAD_INFO(tsk) \ { \ - .task = , \ .flags = 0,\ - .cpu= 0,\ .preempt_count = INIT_PREEMPT_COUNT, \ } -#define init_thread_info (init_thread_union.thread_info) #define init_stack (init_thread_union.stack) -/* how to get the thread information struct from C */ -static inline struct thread_info *current_thread_info(void) -{ - return (struct thread_info *) S390_lowcore.thread_info; -} - void arch_release_task_struct(struct task_struct *tsk); int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c index f3df9e0a5dec..e6969ebd0d44 100644 --- a/arch/s390/kernel/asm-offsets.c +++ b/arch/s390/kernel/asm-offsets.c @@ -25,7 +25,7 @@ int main(void) { /* task struct offsets */ - OFFSET(__TASK_thread_info, task_struct, stack); + OFFSET(__TASK_stack, task_struct, stack); OFFSET(__TASK_thread, task_struct, thread); OFFSET(__TASK_pid, task_struct, pid); BLANK(); @@ -39,14 +39,12 @@ int main(void) OFFSET(__THREAD_trap_tdb, thread_struct, trap_tdb); BLANK(); /* thread info offsets */ - OFFSET(__TI_task, thread_info, task); - OFFSET(__TI_flags, thread_info, flags); - OFFSET(__TI_sysc_table, thread_info, sys_call_table); - OFFSET(__TI_cpu, thread_info, cpu); - OFFSET(__TI_precount, thread_info, preempt_count); - OFFSET(__TI_user_timer, thread_info, user_timer); - OFFSET(__TI_system_timer, thread_info, system_timer); - OFFSET(__TI_last_break, thread_info, last_break); + OFFSET(__TASK_TI_flags, task_struct, thread_info.flags); + OFFSET(__TASK_TI_sysc_table, task_struct, thread_info.sys_call_table); + OFFSET(__TASK_TI_precount, task_struct, thread_info.preempt_count); + OFFSET(__TASK_TI_user_timer, task_struct, thread_info.user_timer); + OFFSET(__TASK_TI_system_timer, task_struct, thread_info.system_timer); + OFFSET(__TASK_TI_last_break, task_struct, thread_info.last_break); BLANK(); /* pt_regs offsets */ OFFSET(__PT_ARGS, pt_regs, args); @@ -159,7 +157,6 @@ int main(void) OFFSET(__LC_INT_CLOCK, lowcore, int_clock); OFFSET(__LC_MCCK_CLOCK, lowcore, mcck_clock); OFFSET(__LC_CURRENT, lowcore, current_task); - OFFSET(__LC_T
[PATCH 2/3] sched/preempt: include asm/current.h
The generic preempt code needs to include . Otherwise compilation fails if THREAD_INFO_IN_TASK is selected and the generic preempt code is used: ./include/linux/thread_info.h:17:54: error: 'current' undeclared (first use in this function) #define current_thread_info() ((struct thread_info *)current) Signed-off-by: Heiko Carstens <heiko.carst...@de.ibm.com> --- include/asm-generic/preempt.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h index c1cde3577551..66fcd6cd7fc6 100644 --- a/include/asm-generic/preempt.h +++ b/include/asm-generic/preempt.h @@ -2,6 +2,7 @@ #define __ASM_PREEMPT_H #include +#include #define PREEMPT_ENABLED(0) -- 2.8.4