Re: [PATCH v6 28/29] slub: slub-specific propagation changes.
On 11/09/2012 07:37 AM, Sasha Levin wrote:
> On 11/08/2012 01:51 AM, Glauber Costa wrote:
>> On 11/07/2012 04:53 PM, Sasha Levin wrote:
>>> On 11/01/2012 08:07 AM, Glauber Costa wrote:
>>>> SLUB allows us to tune a particular cache's behavior with sysfs-based
>>>> tunables. When creating a new memcg cache copy, we'd like to preserve
>>>> any tunables the parent cache already had.
>>>>
>>>> This can be done by tapping into the store attribute function provided
>>>> by the allocator. We of course don't need to mess with read-only
>>>> fields. Since the attributes can have multiple types and are stored
>>>> internally by sysfs, the best strategy is to issue a ->show() in the
>>>> root cache, and then a ->store() in the memcg cache.
>>>>
>>>> The drawback of that is that sysfs can allocate up to a page of
>>>> buffering for show(), which we are likely not to need, but also can't
>>>> guarantee. To avoid always allocating a page for that, we can update
>>>> the caches at store time with the maximum attribute size ever stored
>>>> to the root cache. We will then get a buffer big enough to hold it.
>>>> The corollary to this is that if no stores happened, nothing will be
>>>> propagated.
>>>>
>>>> It can also happen that a root cache has its tunables updated during
>>>> normal system operation. In this case, we will propagate the change to
>>>> all caches that are already active.
>>>>
>>>> Signed-off-by: Glauber Costa
>>>> CC: Christoph Lameter
>>>> CC: Pekka Enberg
>>>> CC: Michal Hocko
>>>> CC: Kamezawa Hiroyuki
>>>> CC: Johannes Weiner
>>>> CC: Suleiman Souhlal
>>>> CC: Tejun Heo
>>>> ---
>>>
>>> Hi guys,
>>>
>>> This patch is making lockdep angry!
>>> *bark bark*
>>>
>>> [  351.935003] ======================================================
>>> [  351.937693] [ INFO: possible circular locking dependency detected ]
>>> [  351.939720] 3.7.0-rc4-next-20121106-sasha-8-g353b62f #117 Tainted: G        W
>>> [  351.942444] ------------------------------------------------------
>>> [  351.943528] trinity-child13/6961 is trying to acquire lock:
>>> [  351.943528]  (s_active#43){.+}, at: [<812f9e11>] sysfs_addrm_finish+0x31/0x60
>>> [  351.943528]
>>> [  351.943528] but task is already holding lock:
>>> [  351.943528]  (slab_mutex){+.+.+.}, at: [<81228a42>] kmem_cache_destroy+0x22/0xe0
>>> [  351.943528]
>>> [  351.943528] which lock already depends on the new lock.
>>> [  351.943528]
>>> [  351.943528] the existing dependency chain (in reverse order) is:
>>> [  351.943528]
>>> -> #1 (slab_mutex){+.+.+.}:
>>> [  351.960334]        [<8118536a>] lock_acquire+0x1aa/0x240
>>> [  351.960334]        [<83a944d9>] __mutex_lock_common+0x59/0x5a0
>>> [  351.960334]        [<83a94a5f>] mutex_lock_nested+0x3f/0x50
>>> [  351.960334]        [<81256a6e>] slab_attr_store+0xde/0x110
>>> [  351.960334]        [<812f820a>] sysfs_write_file+0xfa/0x150
>>> [  351.960334]        [<8127a220>] vfs_write+0xb0/0x180
>>> [  351.960334]        [<8127a540>] sys_pwrite64+0x60/0xb0
>>> [  351.960334]        [<83a99298>] tracesys+0xe1/0xe6
>>> [  351.960334]
>>> -> #0 (s_active#43){.+}:
>>> [  351.960334]        [<811825af>] __lock_acquire+0x14df/0x1ca0
>>> [  351.960334]        [<8118536a>] lock_acquire+0x1aa/0x240
>>> [  351.960334]        [<812f9272>] sysfs_deactivate+0x122/0x1a0
>>> [  351.960334]        [<812f9e11>] sysfs_addrm_finish+0x31/0x60
>>> [  351.960334]        [<812fa369>] sysfs_remove_dir+0x89/0xd0
>>> [  351.960334]        [<819e1d96>] kobject_del+0x16/0x40
>>> [  351.960334]        [<8125ed40>] __kmem_cache_shutdown+0x40/0x60
>>> [  351.960334]        [<81228a60>] kmem_cache_destroy+0x40/0xe0
>>> [  351.960334]        [<82b21058>] mon_text_release+0x78/0xe0
>>> [  351.960334]        [<8127b3b2>] __fput+0x122/0x2d0
>>> [  351.960334]        [<8127b569>] fput+0x9/0x10
>>> [  351.960334]        [<81131b4e>] task_work_run+0xbe/0x100
>>> [  351.960334]        [<81110742>] do_exit+0x432/0xbd0
>>> [  351.960334]        [<81110fa4>] do_group_exit+0x84/0xd0
>>> [  351.960334]        [<8112431d>] get_signal_to_deliver+0x81d/0x930
>>> [  351.960334]        [<8106d5aa>] do_signal+0x3a/0x950
>>> [  351.960334]        [<8106df1e>] do_notify_resume+0x3e/0x90
>>> [  351.960334]        [<83a993aa>] int_signal+0x12/0x17
>>> [  351.960334]
>>> [  351.960334] other info that might help us debug this:
>>> [  351.960334]
>>> [  351.960334]  Possible unsafe locking scenario:
>>> [  351.960334]
>>> [  351.960334]        CPU0                    CPU1
>>> [  351.960334]
>>> [  351.960334]   lock(slab_mutex);
>>> [  351.960334]                           lock(s_active#43);
>>> [  351.960334]                           lock(slab_mutex);
>>> [  351.960334]   lock(s_active#43);
>>> [  351.960334]
>>> [  351.960334]  *** DEADLOCK ***
>>> [  351.960334]
>>> [  351.960334] 2 locks held by trinity-child13/6961:
>>> [  351.960334]  #0:  (mon_lock){+.+.+.}, at: [<82b21005>] mon_text_release+0x25/0xe0
>>> [  351.960334]  #1:  (slab_mutex){+.+.+.}, at: [<81228a42>] kmem_cache_destroy+0x22/0xe0
>>> [  351.960334]
>>> [  351.960334] stack backtrace:
>>> [  351.960334] Pid: 6961, comm: trinity-child13 Tainted: G        W    3.7.0-rc4-next-20121106-sasha-8-g353b62f #117
>>> [  351.960334] Call Trace:
>>> [  351.960334]  [] print_circular_bug+0x1fb/0x20c
>>> [  351.960334]  [] __lock_acquire+0x14df/0x1ca0

First: Sorry I took so long, I had some problems on my way back from
Spain...

I just managed to reproduce it, by following the call chain. In summary:

1) When we store an attribute, we call sysfs_get_active(), which takes the
   sd->dep_map lock, where 'sd' is the specific dirent.
2) ->store() is called with that held.
3) ->store() takes the slab_mutex.
4) While destroying the cache, with the slab_mutex held, we eventually get
   to kobject_put(), which deep down in the call chain ends up in
   sysfs_addrm_finish(), which can take that lock again.

In summary, creating a kmem-limited memcg, storing an attribute in the
global cache, and then deleting the memcg should trigger this. The funny
thing is that I had a test exactly like this in which it didn't trigger,
and now I know why: I was storing attributes for "dentry", which can stay
around for longer until it completely runs out of objects, which will
depend on the vmscan
Re: [PATCH v6 28/29] slub: slub-specific propagation changes.
On 11/01/2012 08:07 AM, Glauber Costa wrote: SLUB allows us to tune a particular cache behavior with sysfs-based tunables. When creating a new memcg cache copy, we'd like to preserve any tunables the parent cache already had. This can be done by tapping into the store attribute function provided by the allocator. We of course don't need to mess with read-only fields. Since the attributes can have multiple types and are stored internally by sysfs, the best strategy is to issue a -show() in the root cache, and then -store() in the memcg cache. The drawback of that, is that sysfs can allocate up to a page in buffering for show(), that we are likely not to need, but also can't guarantee. To avoid always allocating a page for that, we can update the caches at store time with the maximum attribute size ever stored to the root cache. We will then get a buffer big enough to hold it. The corolary to this, is that if no stores happened, nothing will be propagated. It can also happen that a root cache has its tunables updated during normal system operation. In this case, we will propagate the change to all caches that are already active. Signed-off-by: Glauber Costa glom...@parallels.com CC: Christoph Lameter c...@linux.com CC: Pekka Enberg penb...@cs.helsinki.fi CC: Michal Hocko mho...@suse.cz CC: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com CC: Johannes Weiner han...@cmpxchg.org CC: Suleiman Souhlal sulei...@google.com CC: Tejun Heo t...@kernel.org --- Hi guys, This patch is making lockdep angry! 
*bark bark* [ 351.935003] == [ 351.937693] [ INFO: possible circular locking dependency detected ] [ 351.939720] 3.7.0-rc4-next-20121106-sasha-8-g353b62f #117 Tainted: G W [ 351.942444] --- [ 351.943528] trinity-child13/6961 is trying to acquire lock: [ 351.943528] (s_active#43){.+}, at: [812f9e11] sysfs_addrm_finish+0x31/0x60 [ 351.943528] [ 351.943528] but task is already holding lock: [ 351.943528] (slab_mutex){+.+.+.}, at: [81228a42] kmem_cache_destroy+0x22/0xe0 [ 351.943528] [ 351.943528] which lock already depends on the new lock. [ 351.943528] [ 351.943528] [ 351.943528] the existing dependency chain (in reverse order) is: [ 351.943528] - #1 (slab_mutex){+.+.+.}: [ 351.960334][8118536a] lock_acquire+0x1aa/0x240 [ 351.960334][83a944d9] __mutex_lock_common+0x59/0x5a0 [ 351.960334][83a94a5f] mutex_lock_nested+0x3f/0x50 [ 351.960334][81256a6e] slab_attr_store+0xde/0x110 [ 351.960334][812f820a] sysfs_write_file+0xfa/0x150 [ 351.960334][8127a220] vfs_write+0xb0/0x180 [ 351.960334][8127a540] sys_pwrite64+0x60/0xb0 [ 351.960334][83a99298] tracesys+0xe1/0xe6 [ 351.960334] - #0 (s_active#43){.+}: [ 351.960334][811825af] __lock_acquire+0x14df/0x1ca0 [ 351.960334][8118536a] lock_acquire+0x1aa/0x240 [ 351.960334][812f9272] sysfs_deactivate+0x122/0x1a0 [ 351.960334][812f9e11] sysfs_addrm_finish+0x31/0x60 [ 351.960334][812fa369] sysfs_remove_dir+0x89/0xd0 [ 351.960334][819e1d96] kobject_del+0x16/0x40 [ 351.960334][8125ed40] __kmem_cache_shutdown+0x40/0x60 [ 351.960334][81228a60] kmem_cache_destroy+0x40/0xe0 [ 351.960334][82b21058] mon_text_release+0x78/0xe0 [ 351.960334][8127b3b2] __fput+0x122/0x2d0 [ 351.960334][8127b569] fput+0x9/0x10 [ 351.960334][81131b4e] task_work_run+0xbe/0x100 [ 351.960334][81110742] do_exit+0x432/0xbd0 [ 351.960334][81110fa4] do_group_exit+0x84/0xd0 [ 351.960334][8112431d] get_signal_to_deliver+0x81d/0x930 [ 351.960334][8106d5aa] do_signal+0x3a/0x950 [ 351.960334][8106df1e] do_notify_resume+0x3e/0x90 [ 351.960334][83a993aa] int_signal+0x12/0x17 [ 
351.960334] [ 351.960334] other info that might help us debug this: [ 351.960334] [ 351.960334] Possible unsafe locking scenario: [ 351.960334] [ 351.960334]CPU0CPU1 [ 351.960334] [ 351.960334] lock(slab_mutex); [ 351.960334]lock(s_active#43); [ 351.960334]lock(slab_mutex); [ 351.960334] lock(s_active#43); [ 351.960334] [ 351.960334] *** DEADLOCK *** [ 351.960334] [ 351.960334] 2 locks held by trinity-child13/6961: [ 351.960334] #0: (mon_lock){+.+.+.}, at: [82b21005] mon_text_release+0x25/0xe0 [ 351.960334] #1: (slab_mutex){+.+.+.}, at: [81228a42] kmem_cache_destroy+0x22/0xe0 [ 351.960334] [ 351.960334] stack backtrace: [ 351.960334] Pid: 6961, comm:
Re: [PATCH v6 28/29] slub: slub-specific propagation changes.
On Thu, 1 Nov 2012 16:07:44 +0400 Glauber Costa wrote:

> SLUB allows us to tune a particular cache behavior with sysfs-based
> tunables. When creating a new memcg cache copy, we'd like to preserve
> any tunables the parent cache already had.
>
> This can be done by tapping into the store attribute function provided
> by the allocator. We of course don't need to mess with read-only
> fields. Since the attributes can have multiple types and are stored
> internally by sysfs, the best strategy is to issue a ->show() in the
> root cache, and then ->store() in the memcg cache.
>
> The drawback of that is that sysfs can allocate up to a page in
> buffering for show(), which we are likely not to need, but also can't
> guarantee. To avoid always allocating a page for that, we can update the
> caches at store time with the maximum attribute size ever stored to the
> root cache. We will then get a buffer big enough to hold it. The
> corollary to this is that if no stores happened, nothing will be
> propagated.
>
> It can also happen that a root cache has its tunables updated during
> normal system operation. In this case, we will propagate the change to
> all caches that are already active.
>
> ...
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3955,6 +3956,7 @@ int __kmem_cache_create(struct kmem_cache *s, unsigned long flags)
>  	if (err)
>  		return err;
>  
> +	memcg_propagate_slab_attrs(s);
>  	mutex_unlock(&slab_mutex);
>  	err = sysfs_slab_add(s);
>  	mutex_lock(&slab_mutex);
> @@ -5180,6 +5182,7 @@ static ssize_t slab_attr_store(struct kobject *kobj,
>  	struct slab_attribute *attribute;
>  	struct kmem_cache *s;
>  	int err;
> +	int i __maybe_unused;
>  
>  	attribute = to_slab_attr(attr);
>  	s = to_slab(kobj);
> @@ -5188,10 +5191,81 @@ static ssize_t slab_attr_store(struct kobject *kobj,
>  		return -EIO;
>  
>  	err = attribute->store(s, buf, len);
> +#ifdef CONFIG_MEMCG_KMEM
> +	if (slab_state < FULL)
> +		return err;
> +
> +	if ((err < 0) || !is_root_cache(s))
> +		return err;
> +
> +	mutex_lock(&slab_mutex);
> +	if (s->max_attr_size < len)
> +		s->max_attr_size = len;
> +
> +	for_each_memcg_cache_index(i) {
> +		struct kmem_cache *c = cache_from_memcg(s, i);
> +		if (c)
> +			/* return value determined by the parent cache only */
> +			attribute->store(c, buf, len);
> +	}
> +	mutex_unlock(&slab_mutex);
> +#endif
>  	return err;
>  }

hm, __maybe_unused is an ugly thing.
We can avoid it by tweaking the code a bit:

diff -puN mm/slub.c~slub-slub-specific-propagation-changes-fix mm/slub.c
--- a/mm/slub.c~slub-slub-specific-propagation-changes-fix
+++ a/mm/slub.c
@@ -5175,7 +5175,6 @@ static ssize_t slab_attr_store(struct ko
 	struct slab_attribute *attribute;
 	struct kmem_cache *s;
 	int err;
-	int i __maybe_unused;
 
 	attribute = to_slab_attr(attr);
 	s = to_slab(kobj);
@@ -5185,23 +5184,24 @@ static ssize_t slab_attr_store(struct ko
 
 	err = attribute->store(s, buf, len);
 #ifdef CONFIG_MEMCG_KMEM
-	if (slab_state < FULL)
-		return err;
+	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
+		int i;
 
-	if ((err < 0) || !is_root_cache(s))
-		return err;
-
-	mutex_lock(&slab_mutex);
-	if (s->max_attr_size < len)
-		s->max_attr_size = len;
-
-	for_each_memcg_cache_index(i) {
-		struct kmem_cache *c = cache_from_memcg(s, i);
-		if (c)
-			/* return value determined by the parent cache only */
-			attribute->store(c, buf, len);
+		mutex_lock(&slab_mutex);
+		if (s->max_attr_size < len)
+			s->max_attr_size = len;
+
+		for_each_memcg_cache_index(i) {
+			struct kmem_cache *c = cache_from_memcg(s, i);
+			/*
+			 * This function's return value is determined by the
+			 * parent cache only
+			 */
+			if (c)
+				attribute->store(c, buf, len);
+		}
+		mutex_unlock(&slab_mutex);
 	}
-	mutex_unlock(&slab_mutex);
 #endif
 	return err;
 }

Also, the comment in there tells the reader *what the code does*, not *why it does it*. Why do we ignore the ->store return value for child caches?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[PATCH v6 28/29] slub: slub-specific propagation changes.
SLUB allows us to tune a particular cache behavior with sysfs-based tunables. When creating a new memcg cache copy, we'd like to preserve any tunables the parent cache already had. This can be done by tapping into the store attribute function provided by the allocator. We of course don't need to mess with read-only fields. Since the attributes can have multiple types and are stored internally by sysfs, the best strategy is to issue a ->show() in the root cache, and then ->store() in the memcg cache. The drawback of that is that sysfs can allocate up to a page in buffering for show(), which we are likely not to need, but also can't guarantee. To avoid always allocating a page for that, we can update the caches at store time with the maximum attribute size ever stored to the root cache. We will then get a buffer big enough to hold it. The corollary to this is that if no stores happened, nothing will be propagated. It can also happen that a root cache has its tunables updated during normal system operation. In this case, we will propagate the change to all caches that are already active.
Signed-off-by: Glauber Costa
CC: Christoph Lameter
CC: Pekka Enberg
CC: Michal Hocko
CC: Kamezawa Hiroyuki
CC: Johannes Weiner
CC: Suleiman Souhlal
CC: Tejun Heo
---
 include/linux/slub_def.h |  1 +
 mm/slub.c                | 76 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 364ba6c..9db4825 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -103,6 +103,7 @@ struct kmem_cache {
 #endif
 #ifdef CONFIG_MEMCG_KMEM
 	struct memcg_cache_params *memcg_params;
+	int max_attr_size; /* for propagation, maximum size of a stored attr */
 #endif
 #ifdef CONFIG_NUMA
diff --git a/mm/slub.c b/mm/slub.c
index ffc8ede..6c39af8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -203,13 +203,14 @@ enum track_item { TRACK_ALLOC, TRACK_FREE };
 static int sysfs_slab_add(struct kmem_cache *);
 static int sysfs_slab_alias(struct kmem_cache *, const char *);
 static void sysfs_slab_remove(struct kmem_cache *);
-
+static void memcg_propagate_slab_attrs(struct kmem_cache *s);
 #else
 static inline int sysfs_slab_add(struct kmem_cache *s) { return 0; }
 static inline int sysfs_slab_alias(struct kmem_cache *s, const char *p)
 							{ return 0; }
 static inline void sysfs_slab_remove(struct kmem_cache *s) { }
+static inline void memcg_propagate_slab_attrs(struct kmem_cache *s) { }
 #endif
 
 static inline void stat(const struct kmem_cache *s, enum stat_item si)
@@ -3955,6 +3956,7 @@ int __kmem_cache_create(struct kmem_cache *s, unsigned long flags)
 	if (err)
 		return err;
 
+	memcg_propagate_slab_attrs(s);
 	mutex_unlock(&slab_mutex);
 	err = sysfs_slab_add(s);
 	mutex_lock(&slab_mutex);
@@ -5180,6 +5182,7 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 	struct slab_attribute *attribute;
 	struct kmem_cache *s;
 	int err;
+	int i __maybe_unused;
 
 	attribute = to_slab_attr(attr);
 	s = to_slab(kobj);
@@ -5188,10 +5191,81 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 		return -EIO;
 
 	err = attribute->store(s, buf, len);
+#ifdef CONFIG_MEMCG_KMEM
+	if (slab_state < FULL)
+		return err;
+
+	if ((err < 0) || !is_root_cache(s))
+		return err;
+
+	mutex_lock(&slab_mutex);
+	if (s->max_attr_size < len)
+		s->max_attr_size = len;
+
+	for_each_memcg_cache_index(i) {
+		struct kmem_cache *c = cache_from_memcg(s, i);
+		if (c)
+			/* return value determined by the parent cache only */
+			attribute->store(c, buf, len);
+	}
+	mutex_unlock(&slab_mutex);
+#endif
 	return err;
 }
 
+static void memcg_propagate_slab_attrs(struct kmem_cache *s)
+{
+#ifdef CONFIG_MEMCG_KMEM
+	int i;
+	char *buffer = NULL;
+
+	if (!is_root_cache(s))
+		return;
+
+	/*
+	 * This means this cache had no attribute written. Therefore, no point
+	 * in copying default values around
+	 */
+	if (!s->max_attr_size)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(slab_attrs); i++) {
+		char mbuf[64];
+		char *buf;
+		struct slab_attribute *attr = to_slab_attr(slab_attrs[i]);
+
+		if (!attr || !attr->store || !attr->show)
+			continue;
+
+		/*
+		 * It is really bad that we have to allocate here, so we will
+		 * do it only as a fallback. If we actually allocate, though,
+		 * we can just use the allocated buffer until the end.
+		 *
+		 * Most of the slub attributes will tend to be very small in
+		 *