Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-16 Thread Yang Shi
On Wed, Dec 16, 2020 at 11:14 AM Johannes Weiner  wrote:
>
> On Wed, Dec 16, 2020 at 08:59:38AM +1100, Dave Chinner wrote:
> > On Tue, Dec 15, 2020 at 02:53:48PM +0100, Johannes Weiner wrote:
> > > On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote:
> > > > On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> > > > > Since memcg_shrinker_map_size can only be changed while holding
> > > > > shrinker_rwsem exclusively, the read side can be protected by
> > > > > holding the read lock, so it sounds superfluous to have a
> > > > > dedicated mutex.
> > > >
> > > > I'm not sure this is a good idea. This couples the shrinker
> > > > infrastructure to internal details of how cgroups are initialised
> > > > and managed. Sure, certain operations might be done in certain
> > > > shrinker lock contexts, but that doesn't mean we should share global
> > > > locks across otherwise independent subsystems
> > >
> > > They're not independent subsystems. Most of the memory controller is
> > > an extension of core VM operations that is fairly difficult to
> > > understand outside the context of those operations. Then there are a
> > > limited number of entry points from the cgroup interface. We used to
> > > have our own locks for core VM structures (private page lock e.g.) to
> > > coordinate VM and cgroup, and that was mostly unintelligible.
> >
> > Yes, but OTOH you can CONFIG_MEMCG=n and the shrinker infrastructure
> > and shrinkers all still function correctly.  Ergo, the shrinker
> > infrastructure is independent of memcgs. Yes, it may have functions
> > to iterate and manipulate memcgs, but it is not dependent on memcgs
> > existing for correct behaviour and functionality.
>
> Okay, but now do it the other way round and explain the memcg bits in
> a world where shrinkers don't exist ;-)
>
> Anyway, we seem to be mostly in agreement below.
>
> > > We have since established that those two components coordinate with
> > > native VM locking and lifetime management. If you need to lock the
> > > page, you lock the page - instead of having all VM paths that already
> > > hold the page lock acquire a nested lock to exclude one cgroup path.
> > >
> > > In this case, we have auxiliary shrinker data, subject to shrinker
> > > lifetime and exclusion rules. It's much easier to understand that
> > > cgroup creation needs a stable shrinker list (shrinker_rwsem) to
> > > manage this data, than having an aliased lock that is private to the
> > > memcg callbacks and obscures this real interdependency.
> >
> > Ok, so the way to do this is to move all the stuff that needs to be
> > done under a "subsystem global" lock to the one file, not turn a
> > static lock into a globally visible lock and spray it around random
> > source files.
>
> Sure, that works as well.
>
> > The shrinker map should be generic functionality for all shrinker
> > invocations because even a non-memcg machine can have thousands of
> > registered shrinkers that are mostly idle all the time.
>
> Agreed.
>
> > IOWs, I think the shrinker map management is not really memcg
> > specific - it's just allocation and assignment of a structure, and
> > the only memcg bit is that the map is stored in a memcg structure.
> > Therefore, if we are looking towards tighter integration then we
> > should actually move the map management to the shrinker code, not
> > split the shrinker infrastructure management across different files.
> > There's already a heap of code in vmscan.c under #ifdef
> > CONFIG_MEMCG, like the prealloc_shrinker() code path:
> >
> > prealloc_shrinker()   vmscan.c
> >   if (MEMCG_AWARE)vmscan.c
> > prealloc_memcg_shrinker   vmscan.c
> > #ifdef CONFIG_MEMCG   vmscan.c
> >   down_write(shrinker_rwsem)  vmscan.c
> >   if (id > shrinker_id_max)   vmscan.c
> >   memcg_expand_shrinker_maps  memcontrol.c
> > for_each_memcgmemcontrol.c
> >   reallocate shrinker map memcontrol.c
> >   replace shrinker mapmemcontrol.c
> >   shrinker_id_max = idvmscan.c
> >   up_write(shrinker_rwsem)    vmscan.c
> > #endif
> >
> > And, really, there's very little code in memcg_expand_shrinker_maps()
> > here - the only memcg part is the memcg iteration loop, and we
> > already have them in vmscan.c (e.g. shrink_node_memcgs(),
> > age_active_anon(), drop_slab_node()) so there's precedent for
> > moving this memcg iteration for shrinker map management all into
> > vmscan.c.
> >
> > Doing so would formalise the shrinker maps as first class shrinker
> > infrastructure rather than being tacked on to the side of the memcg
> > infrastructure. At this point it makes total sense to serialise map
> > manipulations under the shrinker_rwsem.
>
> Yes, that's a great idea.
>
> > That is, for the medium term, I think we should be getting rid of
> > the "legacy" non-memcg shrinker path and everything runs under
> > memcgs.

Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-16 Thread Roman Gushchin
On Wed, Dec 16, 2020 at 08:59:38AM +1100, Dave Chinner wrote:
> On Tue, Dec 15, 2020 at 02:53:48PM +0100, Johannes Weiner wrote:
> > On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote:
> > > On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> > > > Since memcg_shrinker_map_size can only be changed while holding
> > > > shrinker_rwsem exclusively, the read side can be protected by
> > > > holding the read lock, so it sounds superfluous to have a
> > > > dedicated mutex.
> > > 
> > > I'm not sure this is a good idea. This couples the shrinker
> > > infrastructure to internal details of how cgroups are initialised
> > > and managed. Sure, certain operations might be done in certain
> > > shrinker lock contexts, but that doesn't mean we should share global
> > > locks across otherwise independent subsystems
> > 
> > They're not independent subsystems. Most of the memory controller is
> > an extension of core VM operations that is fairly difficult to
> > understand outside the context of those operations. Then there are a
> > limited number of entry points from the cgroup interface. We used to
> > have our own locks for core VM structures (private page lock e.g.) to
> > coordinate VM and cgroup, and that was mostly unintelligible.
> 
> Yes, but OTOH you can CONFIG_MEMCG=n and the shrinker infrastructure
> and shrinkers all still function correctly.  Ergo, the shrinker
> infrastructure is independent of memcgs. Yes, it may have functions
> to iterate and manipulate memcgs, but it is not dependent on memcgs
> existing for correct behaviour and functionality.
> 
> Yet.
> 
> > We have since established that those two components coordinate with
> > native VM locking and lifetime management. If you need to lock the
> > page, you lock the page - instead of having all VM paths that already
> > hold the page lock acquire a nested lock to exclude one cgroup path.
> > 
> > In this case, we have auxiliary shrinker data, subject to shrinker
> > lifetime and exclusion rules. It's much easier to understand that
> > cgroup creation needs a stable shrinker list (shrinker_rwsem) to
> > manage this data, than having an aliased lock that is private to the
> > memcg callbacks and obscures this real interdependency.
> 
> Ok, so the way to do this is to move all the stuff that needs to be
> done under a "subsystem global" lock to the one file, not turn a
> static lock into a globally visible lock and spray it around random
> source files. There's already way too many static globals to manage
> separate shrinker and memcg state..
> 
> I certainly agree that shrinkers and memcg need to be more closely
> integrated.  I've only been saying that for ... well, since memcgs
> essentially duplicated the top level shrinker path so the shrinker
> map could be introduced to avoid calling shrinkers that have no work
> to do for memcgs. The shrinker map should be generic functionality
> for all shrinker invocations because even a non-memcg machine can
> have thousands of registered shrinkers that are mostly idle all the
> time.
> 
> IOWs, I think the shrinker map management is not really memcg
> specific - it's just allocation and assignment of a structure, and
> the only memcg bit is that the map is stored in a memcg structure.
> Therefore, if we are looking towards tighter integration then we
> should actually move the map management to the shrinker code, not
> split the shrinker infrastructure management across different files.
> There's already a heap of code in vmscan.c under #ifdef
> CONFIG_MEMCG, like the prealloc_shrinker() code path:
> 
> prealloc_shrinker()   vmscan.c
>   if (MEMCG_AWARE)vmscan.c
> prealloc_memcg_shrinker   vmscan.c
> #ifdef CONFIG_MEMCG   vmscan.c
>   down_write(shrinker_rwsem)  vmscan.c
>   if (id > shrinker_id_max)   vmscan.c
>   memcg_expand_shrinker_maps  memcontrol.c
> for_each_memcgmemcontrol.c
>   reallocate shrinker map memcontrol.c
>   replace shrinker mapmemcontrol.c
>   shrinker_id_max = idvmscan.c
>   up_write(shrinker_rwsem)    vmscan.c
> #endif
> 
> And, really, there's very little code in memcg_expand_shrinker_maps()
> here - the only memcg part is the memcg iteration loop, and we
> already have them in vmscan.c (e.g. shrink_node_memcgs(),
> age_active_anon(), drop_slab_node()) so there's precedent for
> moving this memcg iteration for shrinker map management all into
> vmscan.c.
> 
> Doing so would formalise the shrinker maps as first class shrinker
> infrastructure rather than being tacked on to the side of the memcg
> infrastructure. At this point it makes total sense to serialise map
> manipulations under the shrinker_rwsem.
> 
> IOWs, I'm not disagreeing with the direction this patch takes us in,
> I'm disagreeing with the implementation as published in the patch
> because it doesn't move us closer to a clean, concise single
> shrinker infrastructure implementation.

Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-16 Thread Johannes Weiner
On Wed, Dec 16, 2020 at 08:59:38AM +1100, Dave Chinner wrote:
> On Tue, Dec 15, 2020 at 02:53:48PM +0100, Johannes Weiner wrote:
> > On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote:
> > > On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> > > > Since memcg_shrinker_map_size can only be changed while holding
> > > > shrinker_rwsem exclusively, the read side can be protected by
> > > > holding the read lock, so it sounds superfluous to have a
> > > > dedicated mutex.
> > > 
> > > I'm not sure this is a good idea. This couples the shrinker
> > > infrastructure to internal details of how cgroups are initialised
> > > and managed. Sure, certain operations might be done in certain
> > > shrinker lock contexts, but that doesn't mean we should share global
> > > locks across otherwise independent subsystems
> > 
> > They're not independent subsystems. Most of the memory controller is
> > an extension of core VM operations that is fairly difficult to
> > understand outside the context of those operations. Then there are a
> > limited number of entry points from the cgroup interface. We used to
> > have our own locks for core VM structures (private page lock e.g.) to
> > coordinate VM and cgroup, and that was mostly unintelligible.
> 
> Yes, but OTOH you can CONFIG_MEMCG=n and the shrinker infrastructure
> and shrinkers all still function correctly.  Ergo, the shrinker
> infrastructure is independent of memcgs. Yes, it may have functions
> to iterate and manipulate memcgs, but it is not dependent on memcgs
> existing for correct behaviour and functionality.

Okay, but now do it the other way round and explain the memcg bits in
a world where shrinkers don't exist ;-)

Anyway, we seem to be mostly in agreement below.

> > We have since established that those two components coordinate with
> > native VM locking and lifetime management. If you need to lock the
> > page, you lock the page - instead of having all VM paths that already
> > hold the page lock acquire a nested lock to exclude one cgroup path.
> > 
> > In this case, we have auxiliary shrinker data, subject to shrinker
> > lifetime and exclusion rules. It's much easier to understand that
> > cgroup creation needs a stable shrinker list (shrinker_rwsem) to
> > manage this data, than having an aliased lock that is private to the
> > memcg callbacks and obscures this real interdependency.
> 
> Ok, so the way to do this is to move all the stuff that needs to be
> done under a "subsystem global" lock to the one file, not turn a
> static lock into a globally visible lock and spray it around random
> source files.

Sure, that works as well.

> The shrinker map should be generic functionality for all shrinker
> invocations because even a non-memcg machine can have thousands of
> registered shrinkers that are mostly idle all the time.

Agreed.

> IOWs, I think the shrinker map management is not really memcg
> specific - it's just allocation and assignment of a structure, and
> the only memcg bit is that the map is stored in a memcg structure.
> Therefore, if we are looking towards tighter integration then we
> should actually move the map management to the shrinker code, not
> split the shrinker infrastructure management across different files.
> There's already a heap of code in vmscan.c under #ifdef
> CONFIG_MEMCG, like the prealloc_shrinker() code path:
> 
> prealloc_shrinker()   vmscan.c
>   if (MEMCG_AWARE)vmscan.c
> prealloc_memcg_shrinker   vmscan.c
> #ifdef CONFIG_MEMCG   vmscan.c
>   down_write(shrinker_rwsem)  vmscan.c
>   if (id > shrinker_id_max)   vmscan.c
>   memcg_expand_shrinker_maps  memcontrol.c
> for_each_memcgmemcontrol.c
>   reallocate shrinker map memcontrol.c
>   replace shrinker mapmemcontrol.c
>   shrinker_id_max = idvmscan.c
>   up_write(shrinker_rwsem)    vmscan.c
> #endif
> 
> And, really, there's very little code in memcg_expand_shrinker_maps()
> here - the only memcg part is the memcg iteration loop, and we
> already have them in vmscan.c (e.g. shrink_node_memcgs(),
> age_active_anon(), drop_slab_node()) so there's precedent for
> moving this memcg iteration for shrinker map management all into
> vmscan.c.
>
> Doing so would formalise the shrinker maps as first class shrinker
> infrastructure rather than being tacked on to the side of the memcg
> infrastructure. At this point it makes total sense to serialise map
> manipulations under the shrinker_rwsem.

Yes, that's a great idea.

> That is, for the medium term, I think  we should be getting rid of
> the "legacy" non-memcg shrinker path and everything runs under
> memcgs.  With this patchset moving all the deferred counts to be
> memcg aware, the only reason for keeping the non-memcg path arou

Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-16 Thread Kirill Tkhai
16.12.2020, 00:59, "Dave Chinner" :
> On Tue, Dec 15, 2020 at 02:53:48PM +0100, Johannes Weiner wrote:
>>  On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote:
>>  > On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
>>  > > Since memcg_shrinker_map_size can only be changed while holding
>>  > > shrinker_rwsem exclusively, the read side can be protected by
>>  > > holding the read lock, so it sounds superfluous to have a
>>  > > dedicated mutex.
>>  >
>>  > I'm not sure this is a good idea. This couples the shrinker
>>  > infrastructure to internal details of how cgroups are initialised
>>  > and managed. Sure, certain operations might be done in certain
>>  > shrinker lock contexts, but that doesn't mean we should share global
>>  > locks across otherwise independent subsystems
>>
>>  They're not independent subsystems. Most of the memory controller is
>>  an extension of core VM operations that is fairly difficult to
>>  understand outside the context of those operations. Then there are a
>>  limited number of entry points from the cgroup interface. We used to
>>  have our own locks for core VM structures (private page lock e.g.) to
>>  coordinate VM and cgroup, and that was mostly unintelligible.
>
> Yes, but OTOH you can CONFIG_MEMCG=n and the shrinker infrastructure
> and shrinkers all still function correctly. Ergo, the shrinker
> infrastructure is independent of memcgs. Yes, it may have functions
> to iterate and manipulate memcgs, but it is not dependent on memcgs
> existing for correct behaviour and functionality.
>
> Yet.
>
>>  We have since established that those two components coordinate with
>>  native VM locking and lifetime management. If you need to lock the
>>  page, you lock the page - instead of having all VM paths that already
>>  hold the page lock acquire a nested lock to exclude one cgroup path.
>>
>>  In this case, we have auxiliary shrinker data, subject to shrinker
>>  lifetime and exclusion rules. It's much easier to understand that
>>  cgroup creation needs a stable shrinker list (shrinker_rwsem) to
>>  manage this data, than having an aliased lock that is private to the
>>  memcg callbacks and obscures this real interdependency.
>
> Ok, so the way to do this is to move all the stuff that needs to be
> done under a "subsystem global" lock to the one file, not turn a
> static lock into a globally visible lock and spray it around random
> source files. There's already way too many static globals to manage
> separate shrinker and memcg state..
>
> I certainly agree that shrinkers and memcg need to be more closely
> integrated. I've only been saying that for ... well, since memcgs
> essentially duplicated the top level shrinker path so the shrinker
> map could be introduced to avoid calling shrinkers that have no work
> to do for memcgs. The shrinker map should be generic functionality
> for all shrinker invocations because even a non-memcg machine can
> have thousands of registered shrinkers that are mostly idle all the
> time.
>
> IOWs, I think the shrinker map management is not really memcg
> specific - it's just allocation and assignment of a structure, and
> the only memcg bit is that the map is stored in a memcg structure.
> Therefore, if we are looking towards tighter integration then we
> should actually move the map management to the shrinker code, not
> split the shrinker infrastructure management across different files.
> There's already a heap of code in vmscan.c under #ifdef
> CONFIG_MEMCG, like the prealloc_shrinker() code path:
>
> prealloc_shrinker() vmscan.c
>   if (MEMCG_AWARE) vmscan.c
> prealloc_memcg_shrinker vmscan.c
> #ifdef CONFIG_MEMCG vmscan.c
>   down_write(shrinker_rwsem) vmscan.c
>   if (id > shrinker_id_max) vmscan.c
> memcg_expand_shrinker_maps memcontrol.c
>   for_each_memcg memcontrol.c
> reallocate shrinker map memcontrol.c
> replace shrinker map memcontrol.c
> shrinker_id_max = id vmscan.c
>   up_write(shrinker_rwsem) vmscan.c
> #endif
>
> And, really, there's very little code in memcg_expand_shrinker_maps()
> here - the only memcg part is the memcg iteration loop, and we
> already have them in vmscan.c (e.g. shrink_node_memcgs(),
> age_active_anon(), drop_slab_node()) so there's precedent for
> moving this memcg iteration for shrinker map management all into
> vmscan.c.
>
> Doing so would formalise the shrinker maps as first class shrinker
> infrastructure rather than being tacked on to the side of the memcg
> infrastructure. At this point it makes total sense to serialise map
> manipulations under the shrinker_rwsem.
>
> IOWs, I'm not disagreeing with the direction this patch takes us in,
> I'm disagreeing with the implementation as published in the patch
> because it doesn't move us closer to a clean, concise single
> shrinker infrastructure implementation.
>
> That is, for the medium term, I think we should be getting rid of
> the "legacy" non-memcg shrinker path and everything runs under
> memcgs.

Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-15 Thread Dave Chinner
On Tue, Dec 15, 2020 at 02:53:48PM +0100, Johannes Weiner wrote:
> On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote:
> > On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> > > Since memcg_shrinker_map_size can only be changed while holding
> > > shrinker_rwsem exclusively, the read side can be protected by
> > > holding the read lock, so it sounds superfluous to have a
> > > dedicated mutex.
> > 
> > I'm not sure this is a good idea. This couples the shrinker
> > infrastructure to internal details of how cgroups are initialised
> > and managed. Sure, certain operations might be done in certain
> > shrinker lock contexts, but that doesn't mean we should share global
> > locks across otherwise independent subsystems
> 
> They're not independent subsystems. Most of the memory controller is
> an extension of core VM operations that is fairly difficult to
> understand outside the context of those operations. Then there are a
> limited number of entry points from the cgroup interface. We used to
> have our own locks for core VM structures (private page lock e.g.) to
> coordinate VM and cgroup, and that was mostly unintelligible.

Yes, but OTOH you can CONFIG_MEMCG=n and the shrinker infrastructure
and shrinkers all still function correctly.  Ergo, the shrinker
infrastructure is independent of memcgs. Yes, it may have functions
to iterate and manipulate memcgs, but it is not dependent on memcgs
existing for correct behaviour and functionality.

Yet.

> We have since established that those two components coordinate with
> native VM locking and lifetime management. If you need to lock the
> page, you lock the page - instead of having all VM paths that already
> hold the page lock acquire a nested lock to exclude one cgroup path.
> 
> In this case, we have auxiliary shrinker data, subject to shrinker
> lifetime and exclusion rules. It's much easier to understand that
> cgroup creation needs a stable shrinker list (shrinker_rwsem) to
> manage this data, than having an aliased lock that is private to the
> memcg callbacks and obscures this real interdependency.

Ok, so the way to do this is to move all the stuff that needs to be
done under a "subsystem global" lock to the one file, not turn a
static lock into a globally visible lock and spray it around random
source files. There's already way too many static globals to manage
separate shrinker and memcg state..

I certainly agree that shrinkers and memcg need to be more closely
integrated.  I've only been saying that for ... well, since memcgs
essentially duplicated the top level shrinker path so the shrinker
map could be introduced to avoid calling shrinkers that have no work
to do for memcgs. The shrinker map should be generic functionality
for all shrinker invocations because even a non-memcg machine can
have thousands of registered shrinkers that are mostly idle all the
time.

IOWs, I think the shrinker map management is not really memcg
specific - it's just allocation and assignment of a structure, and
the only memcg bit is that the map is stored in a memcg structure.
Therefore, if we are looking towards tighter integration then we
should actually move the map management to the shrinker code, not
split the shrinker infrastructure management across different files.
There's already a heap of code in vmscan.c under #ifdef
CONFIG_MEMCG, like the prealloc_shrinker() code path:

prealloc_shrinker() vmscan.c
  if (MEMCG_AWARE)  vmscan.c
prealloc_memcg_shrinker vmscan.c
#ifdef CONFIG_MEMCG vmscan.c
  down_write(shrinker_rwsem)vmscan.c
  if (id > shrinker_id_max) vmscan.c
memcg_expand_shrinker_maps  memcontrol.c
  for_each_memcgmemcontrol.c
reallocate shrinker map memcontrol.c
replace shrinker mapmemcontrol.c
shrinker_id_max = idvmscan.c
  up_write(shrinker_rwsem)  vmscan.c
#endif

And, really, there's very little code in memcg_expand_shrinker_maps()
here - the only memcg part is the memcg iteration loop, and we
already have them in vmscan.c (e.g. shrink_node_memcgs(),
age_active_anon(), drop_slab_node()) so there's precedent for
moving this memcg iteration for shrinker map management all into
vmscan.c.

Doing so would formalise the shrinker maps as first class shrinker
infrastructure rather than being tacked on to the side of the memcg
infrastructure. At this point it makes total sense to serialise map
manipulations under the shrinker_rwsem.

IOWs, I'm not disagreeing with the direction this patch takes us in,
I'm disagreeing with the implementation as published in the patch
because it doesn't move us closer to a clean, concise single
shrinker infrastructure implementation.

That is, for the medium term, I think we should be getting rid of
the "legacy" non-memcg shrinker path and everything runs under
memcgs.

Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-15 Thread Yang Shi
On Tue, Dec 15, 2020 at 6:10 AM Johannes Weiner  wrote:
>
> On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> > Since memcg_shrinker_map_size can only be changed while holding
> > shrinker_rwsem exclusively, the read side can be protected by
> > holding the read lock, so it sounds superfluous to have a
> > dedicated mutex.  This should not exacerbate contention on
> > shrinker_rwsem since just one read-side critical section is added.
> >
> > Signed-off-by: Yang Shi 
>
> Acked-by: Johannes Weiner 
>
> Thanks Yang, this is a step in the right direction. It would still be
> nice to also drop memcg_shrinker_map_size and (trivially) derive that
> value from shrinker_nr_max where necessary. It is duplicate state.

Thanks! I will take a further look at it.


Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-15 Thread Johannes Weiner
On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> Since memcg_shrinker_map_size can only be changed while holding shrinker_rwsem
> exclusively, the read side can be protected by holding the read lock, so it
> sounds superfluous to have a dedicated mutex.  This should not exacerbate
> contention on shrinker_rwsem since just one read-side critical section is added.
> 
> Signed-off-by: Yang Shi 

Acked-by: Johannes Weiner 

Thanks Yang, this is a step in the right direction. It would still be
nice to also drop memcg_shrinker_map_size and (trivially) derive that
value from shrinker_nr_max where necessary. It is duplicate state.


Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-15 Thread Johannes Weiner
On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote:
> On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> > Since memcg_shrinker_map_size can only be changed while holding
> > shrinker_rwsem exclusively, the read side can be protected by
> > holding the read lock, so it sounds superfluous to have a
> > dedicated mutex.
> 
> I'm not sure this is a good idea. This couples the shrinker
> infrastructure to internal details of how cgroups are initialised
> and managed. Sure, certain operations might be done in certain
> shrinker lock contexts, but that doesn't mean we should share global
> locks across otherwise independent subsystems

They're not independent subsystems. Most of the memory controller is
an extension of core VM operations that is fairly difficult to
understand outside the context of those operations. Then there are a
limited number of entry points from the cgroup interface. We used to
have our own locks for core VM structures (private page lock e.g.) to
coordinate VM and cgroup, and that was mostly unintelligible.

We have since established that those two components coordinate with
native VM locking and lifetime management. If you need to lock the
page, you lock the page - instead of having all VM paths that already
hold the page lock acquire a nested lock to exclude one cgroup path.

In this case, we have auxiliary shrinker data, subject to shrinker
lifetime and exclusion rules. It's much easier to understand that
cgroup creation needs a stable shrinker list (shrinker_rwsem) to
manage this data, than having an aliased lock that is private to the
memcg callbacks and obscures this real interdependency.


Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-14 Thread Dave Chinner
On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> Since memcg_shrinker_map_size can only be changed while holding shrinker_rwsem
> exclusively, the read side can be protected by holding the read lock, so it
> sounds superfluous to have a dedicated mutex.

I'm not sure this is a good idea. This couples the shrinker
infrastructure to internal details of how cgroups are initialised
and managed. Sure, certain operations might be done in certain
shrinker lock contexts, but that doesn't mean we should share global
locks across otherwise independent subsystems

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-14 Thread Yang Shi
Since memcg_shrinker_map_size can only be changed while holding shrinker_rwsem
exclusively, the read side can be protected by holding the read lock, so it
sounds superfluous to have a dedicated mutex.  This should not exacerbate
contention on shrinker_rwsem since just one read-side critical section is added.

Signed-off-by: Yang Shi 
---
 mm/internal.h   |  1 +
 mm/memcontrol.c | 17 +++--
 mm/vmscan.c |  2 +-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index c43ccdddb0f6..10c79d199aaa 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -108,6 +108,7 @@ extern unsigned long highest_memmap_pfn;
 /*
  * in mm/vmscan.c:
  */
+extern struct rw_semaphore shrinker_rwsem;
 extern int isolate_lru_page(struct page *page);
 extern void putback_lru_page(struct page *page);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 29459a6ce1c7..ed942734235f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -394,8 +394,8 @@ DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key);
 EXPORT_SYMBOL(memcg_kmem_enabled_key);
 #endif
 
+/* It can only be changed while holding shrinker_rwsem exclusively */
 static int memcg_shrinker_map_size;
-static DEFINE_MUTEX(memcg_shrinker_map_mutex);
 
 static void memcg_free_shrinker_map_rcu(struct rcu_head *head)
 {
@@ -408,8 +408,6 @@ static int memcg_expand_one_shrinker_map(struct mem_cgroup *memcg,
struct memcg_shrinker_map *new, *old;
int nid;
 
-   lockdep_assert_held(&memcg_shrinker_map_mutex);
-
for_each_node(nid) {
old = rcu_dereference_protected(
mem_cgroup_nodeinfo(memcg, nid)->shrinker_map, true);
@@ -458,7 +456,7 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup *memcg)
if (mem_cgroup_is_root(memcg))
return 0;
 
-   mutex_lock(&memcg_shrinker_map_mutex);
+   down_read(&shrinker_rwsem);
size = memcg_shrinker_map_size;
for_each_node(nid) {
map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
@@ -469,7 +467,7 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup *memcg)
}
rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_map, map);
}
-   mutex_unlock(&memcg_shrinker_map_mutex);
+   up_read(&shrinker_rwsem);
 
return ret;
 }
@@ -484,9 +482,8 @@ int memcg_expand_shrinker_maps(int new_id)
if (size <= old_size)
return 0;
 
-   mutex_lock(&memcg_shrinker_map_mutex);
if (!root_mem_cgroup)
-   goto unlock;
+   goto out;
 
for_each_mem_cgroup(memcg) {
if (mem_cgroup_is_root(memcg))
@@ -494,13 +491,13 @@ int memcg_expand_shrinker_maps(int new_id)
ret = memcg_expand_one_shrinker_map(memcg, size, old_size);
if (ret) {
mem_cgroup_iter_break(NULL, memcg);
-   goto unlock;
+   goto out;
}
}
-unlock:
+out:
if (!ret)
memcg_shrinker_map_size = size;
-   mutex_unlock(&memcg_shrinker_map_mutex);
+
return ret;
 }
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 48c06c48b97e..912c044301dd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -184,7 +184,7 @@ static void set_task_reclaim_state(struct task_struct *task,
 }
 
 static LIST_HEAD(shrinker_list);
-static DECLARE_RWSEM(shrinker_rwsem);
+DECLARE_RWSEM(shrinker_rwsem);
 
 #ifdef CONFIG_MEMCG
 /*
-- 
2.26.2