Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-11 Thread Michal Hocko
On Mon 09-11-15 21:28:40, Vladimir Davydov wrote:
> On Mon, Nov 09, 2015 at 03:08:32PM +0100, Michal Hocko wrote:
[...]
> > > Vladimir Davydov (5):
> > >   Revert "kernfs: do not account ino_ida allocations to memcg"
> > >   Revert "gfp: add __GFP_NOACCOUNT"
> > 
> > The patch ordering would break the bisectability. I would simply squash
> 
> How's that? AFAICS the kernel should compile after any first N=1..5
> patches of the series applied.

Sorry, forgot to comment on this. I didn't mean it would break
compilation. It would just reintroduce the bug fixed by "kernfs: do not
account ino_ida allocations to memcg". My understanding is that the bug
is quite unlikely and it will results in a pinned memcg which is much
less serious than a crash or other misbehavior.

I will leave whether this is serious enough to you but as the revert is
basically dropping the flag which can be trivially done in the patch
which renames it and changes its semantic I do not think splitting has
any large advantage.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-11 Thread Michal Hocko
On Mon 09-11-15 21:28:40, Vladimir Davydov wrote:
> On Mon, Nov 09, 2015 at 03:08:32PM +0100, Michal Hocko wrote:
[...]
> > > Vladimir Davydov (5):
> > >   Revert "kernfs: do not account ino_ida allocations to memcg"
> > >   Revert "gfp: add __GFP_NOACCOUNT"
> > 
> > The patch ordering would break the bisectability. I would simply squash
> 
> How's that? AFAICS the kernel should compile after any first N=1..5
> patches of the series applied.

Sorry, forgot to comment on this. I didn't mean it would break
compilation. It would just reintroduce the bug fixed by "kernfs: do not
account ino_ida allocations to memcg". My understanding is that the bug
is quite unlikely and it will results in a pinned memcg which is much
less serious than a crash or other misbehavior.

I will leave whether this is serious enough to you but as the revert is
basically dropping the flag which can be trivially done in the patch
which renames it and changes its semantic I do not think splitting has
any large advantage.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Vladimir Davydov
On Mon, Nov 09, 2015 at 03:30:53PM -0500, Tejun Heo wrote:
...
> Hmm can't we simply merge among !SLAB_ACCOUNT and SLAB_ACCOUNT
> kmem_caches within themselves?  I don't think we'd be losing anything
> by restricting merge at that level.  For anything to be tagged
> SLAB_ACCOUNT, it has to have a potential to grow enormous after all.

OK, I'll prepare v2 which will introduce SLAB_ACCOUNT and add it to
SLAB_MERGE_SAME. Let's see what slab maintainers think of it.

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Tejun Heo
Hello, Vladimir.

On Mon, Nov 09, 2015 at 11:12:18PM +0300, Vladimir Davydov wrote:
> Because we won't be able to distinguish kmem_cache_alloc calls that
> should be accounted from those that shouldn't. The problem is if two
> caches
> 
>   A = kmem_cache_create(...)
> 
> and
> 
>   B = kmem_cache_create(...)
> 
> happen to be merged, A and B will point to the same kmem_cache struct.
> As a result, there is no way to distinguish
> 
>   kmem_cache_alloc(A)
> 
> which we want to account from
> 
>   kmem_cache_alloc(B)
> 
> which we don't.

Hmm can't we simply merge among !SLAB_ACCOUNT and SLAB_ACCOUNT
kmem_caches within themselves?  I don't think we'd be losing anything
by restricting merge at that level.  For anything to be tagged
SLAB_ACCOUNT, it has to have a potential to grow enormous after all.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Vladimir Davydov
On Mon, Nov 09, 2015 at 02:32:53PM -0500, Tejun Heo wrote:
> On Mon, Nov 09, 2015 at 10:27:47PM +0300, Vladimir Davydov wrote:
> > Of course, we could rework slab merging so that kmem_cache_create
> > returned a new dummy cache even if it was actually merged. Such a cache
> > would point to the real cache, which would be used for allocations. This
> > wouldn't limit slab merging, but this would add one more dereference to
> > alloc path, which is even worse.
> 
> Hmmm, this could be me not really understanding but why can't we let
> all slabs to be merged regardless of SLAB_ACCOUNT flag for root memcg
> and point to per-memcg slabs (may be merged among them but most likely

Because we won't be able to distinguish kmem_cache_alloc calls that
should be accounted from those that shouldn't. The problem is if two
caches

A = kmem_cache_create(...)

and

B = kmem_cache_create(...)

happen to be merged, A and B will point to the same kmem_cache struct.
As a result, there is no way to distinguish

kmem_cache_alloc(A)

which we want to account from

kmem_cache_alloc(B)

which we don't.

> won't matter) for !root.  We're indirecting once anyway, no?

If kmem accounting is not used, we aren't indirecting. That's why I
don't think we can use dummy kmem_cache struct for merged caches, where
we could store __GFP_ACCOUNT flag.

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Tejun Heo
Hello, Vladmir.

On Mon, Nov 09, 2015 at 10:27:47PM +0300, Vladimir Davydov wrote:
> Of course, we could rework slab merging so that kmem_cache_create
> returned a new dummy cache even if it was actually merged. Such a cache
> would point to the real cache, which would be used for allocations. This
> wouldn't limit slab merging, but this would add one more dereference to
> alloc path, which is even worse.

Hmmm, this could be me not really understanding but why can't we let
all slabs to be merged regardless of SLAB_ACCOUNT flag for root memcg
and point to per-memcg slabs (may be merged among them but most likely
won't matter) for !root.  We're indirecting once anyway, no?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Vladimir Davydov
On Mon, Nov 09, 2015 at 01:54:01PM -0500, Tejun Heo wrote:
> On Mon, Nov 09, 2015 at 09:28:40PM +0300, Vladimir Davydov wrote:
> > > I am _all_ for this semantic I am just not sure what to do with the
> > > legacy kmem controller. Can we change its semantic? If we cannot do that
> > 
> > I think we can. If somebody reports a "bug" caused by this change, i.e.
> > basically notices that something that used to be accounted is not any
> > longer, it will be trivial to fix by adding __GFP_ACCOUNT where
> > appropriate. If it is not, e.g. if accounting of objects of a particular
> > type leads to intense false-sharing, we would end up disabling
> > accounting for it anyway.
> 
> I agree too, if anything is meaningfully broken by the flip, it just
> indicates that the whitelist needs to be expanded; however, I wonder
> whether this would be done better at slab level rather than per
> allocation site.

I'd like to, but this is not as simple as it seems at first glance. The
problem is that slab caches of the same size are actively merged with
each other. If we just added SLAB_ACCOUNT flag, which would be passed to
kmem_cache_create to enable accounting, we'd divide all caches into two
groups that couldn't be merged with each other even if kmem accounting
was not used at all. This would be a show stopper.

Of course, we could rework slab merging so that kmem_cache_create
returned a new dummy cache even if it was actually merged. Such a cache
would point to the real cache, which would be used for allocations. This
wouldn't limit slab merging, but this would add one more dereference to
alloc path, which is even worse.

That's why I decided to go with marking individual allocations.

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Tejun Heo
Hello, Vladmir.

On Mon, Nov 09, 2015 at 09:28:40PM +0300, Vladimir Davydov wrote:
> > I am _all_ for this semantic I am just not sure what to do with the
> > legacy kmem controller. Can we change its semantic? If we cannot do that
> 
> I think we can. If somebody reports a "bug" caused by this change, i.e.
> basically notices that something that used to be accounted is not any
> longer, it will be trivial to fix by adding __GFP_ACCOUNT where
> appropriate. If it is not, e.g. if accounting of objects of a particular
> type leads to intense false-sharing, we would end up disabling
> accounting for it anyway.

I agree too, if anything is meaningfully broken by the flip, it just
indicates that the whitelist needs to be expanded; however, I wonder
whether this would be done better at slab level rather than per
allocation site.

A class of objects which can consume noticeable amount of memory which
can be attributed to userland is likely to be on its own slab already
or separating it out to its own slab is likely to be a good idea.
Marking those slabs as kmemcg accounted seems better suited to the
semantics - it's always about classes of objects - and less
error-prone than marking individual allocation sites.

This also reduces the number of slabs to worry about and more
importantly makes it clear which slabs need to be replicated for
kmemcg accounting from the beginning and the slab part of
implementation can be far simpler / more static.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Vladimir Davydov
On Mon, Nov 09, 2015 at 03:08:32PM +0100, Michal Hocko wrote:
...
> > Therefore this patch switches to the white list policy. Now kmalloc
> > users have to explicitly opt in by passing __GFP_ACCOUNT flag.
> > 
> > Currently, the list of accounted objects is quite limited and only
> > includes those allocations that (1) are known to be easily triggered
> > from userspace and (2) can fail gracefully (for the full list see patch
> > no. 5) and it still misses many object types. However, accounting only
> > those objects should be a satisfactory approximation of the behavior we
> > used to have for most sane workloads.
> 
> I am _all_ for this semantic I am just not sure what to do with the
> legacy kmem controller. Can we change its semantic? If we cannot do that

I think we can. If somebody reports a "bug" caused by this change, i.e.
basically notices that something that used to be accounted is not any
longer, it will be trivial to fix by adding __GFP_ACCOUNT where
appropriate. If it is not, e.g. if accounting of objects of a particular
type leads to intense false-sharing, we would end up disabling
accounting for it anyway.

> we would have to distinguish legacy and unified hierarchies during
> runtime and add the flag automagically for the first one (that would
> however require to keep __GFP_NOACCOUNT as well) which is all as clear
> as mud. But maybe the workloads which are using kmem legacy API can cope
> with that.
> 
> Anyway if we go this way then I think the kmem accounting would be safe
> to be enabled by default with the cgroup2.
> 
> > Thanks,
> > 
> > Vladimir Davydov (5):
> >   Revert "kernfs: do not account ino_ida allocations to memcg"
> >   Revert "gfp: add __GFP_NOACCOUNT"
> 
> The patch ordering would break the bisectability. I would simply squash

How's that? AFAICS the kernel should compile after any first N=1..5
patches of the series applied.

> both places into the patch which replaces the flag.
> 

IMO it is more readable the way it is, but I don't insist.

Thanks,
Vladimir

> >   memcg: only account kmem allocations marked as __GFP_ACCOUNT
> >   vmalloc: allow to account vmalloc to memcg
> >   Account certain kmem allocations to memcg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Johannes Weiner
On Mon, Nov 09, 2015 at 03:08:32PM +0100, Michal Hocko wrote:
> I am _all_ for this semantic I am just not sure what to do with the
> legacy kmem controller. Can we change its semantic? If we cannot do that
> we would have to distinguish legacy and unified hierarchies during
> runtime and add the flag automagically for the first one (that would
> however require to keep __GFP_NOACCOUNT as well) which is all as clear
> as mud. But maybe the workloads which are using kmem legacy API can cope
> with that.

I think we can make that change for the existing kmem accounting too,
simply because the whitelist should be covering all memory consumers
that actually matter for isolation in practice. Yes, there is a risk
for accidents, but we are not actually intending to change semantics.

> Anyway if we go this way then I think the kmem accounting would be safe
> to be enabled by default with the cgroup2.

Cool, I'm happy we're on the same page about this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Michal Hocko
On Sat 07-11-15 23:07:04, Vladimir Davydov wrote:
> Hi,
> 
> Currently, all kmem allocations (namely every kmem_cache_alloc, kmalloc,
> alloc_kmem_pages call) are accounted to memory cgroup automatically.
> Callers have to explicitly opt out if they don't want/need accounting
> for some reason. Such a design decision leads to several problems:
> 
>  - kmalloc users are highly sensitive to failures, many of them
>implicitly rely on the fact that kmalloc never fails, while memcg
>makes failures quite plausible.
> 
>  - A lot of objects are shared among different containers by design.
>Accounting such objects to one of containers is just unfair.
>Moreover, it might lead to pinning a dead memcg along with its kmem
>caches, which aren't tiny, which might result in noticeable increase
>in memory consumption for no apparent reason in the long run.
> 
>  - There are tons of short-lived objects. Accounting them to memcg will
>only result in slight noise and won't change the overall picture, but
>we still have to pay accounting overhead.

Yes, I think we should have gone that path since the very beginning.
Glauber even started with opt-in IIRC (caches were supposed to register
to be accounted). I do not remember what's led to the opt-out switch -
but I guess it has something to do with the user API how to select which
caches to track and also the original version from Google by Suleiman
Souhlal did the opt-out from the very beginning. Also kmem extension was
assumed to be used for "special" workloads.

> For more info, see
> 
>  - https://lkml.org/lkml/2015/11/5/365
>  - https://lkml.org/lkml/2015/11/6/122

Using lkml.org links tend to be quite painful because they quite often
do not work. http://lkml.kernel.org/r/$msg_id tends to work much better
IMO

http://lkml.kernel.org/r/20151105144002.GB15111%40dhcp22.suse.cz
http://lkml.kernel.org/r/20151106090555.GK29259@esperanza

> Therefore this patch switches to the white list policy. Now kmalloc
> users have to explicitly opt in by passing __GFP_ACCOUNT flag.
> 
> Currently, the list of accounted objects is quite limited and only
> includes those allocations that (1) are known to be easily triggered
> from userspace and (2) can fail gracefully (for the full list see patch
> no. 5) and it still misses many object types. However, accounting only
> those objects should be a satisfactory approximation of the behavior we
> used to have for most sane workloads.

I am _all_ for this semantic I am just not sure what to do with the
legacy kmem controller. Can we change its semantic? If we cannot do that
we would have to distinguish legacy and unified hierarchies during
runtime and add the flag automagically for the first one (that would
however require to keep __GFP_NOACCOUNT as well) which is all as clear
as mud. But maybe the workloads which are using kmem legacy API can cope
with that.

Anyway if we go this way then I think the kmem accounting would be safe
to be enabled by default with the cgroup2.

> Thanks,
> 
> Vladimir Davydov (5):
>   Revert "kernfs: do not account ino_ida allocations to memcg"
>   Revert "gfp: add __GFP_NOACCOUNT"

The patch ordering would break the bisectability. I would simply squash
both places into the patch which replaces the flag.

>   memcg: only account kmem allocations marked as __GFP_ACCOUNT
>   vmalloc: allow to account vmalloc to memcg
>   Account certain kmem allocations to memcg

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Michal Hocko
On Sat 07-11-15 23:07:04, Vladimir Davydov wrote:
> Hi,
> 
> Currently, all kmem allocations (namely every kmem_cache_alloc, kmalloc,
> alloc_kmem_pages call) are accounted to memory cgroup automatically.
> Callers have to explicitly opt out if they don't want/need accounting
> for some reason. Such a design decision leads to several problems:
> 
>  - kmalloc users are highly sensitive to failures, many of them
>implicitly rely on the fact that kmalloc never fails, while memcg
>makes failures quite plausible.
> 
>  - A lot of objects are shared among different containers by design.
>Accounting such objects to one of containers is just unfair.
>Moreover, it might lead to pinning a dead memcg along with its kmem
>caches, which aren't tiny, which might result in noticeable increase
>in memory consumption for no apparent reason in the long run.
> 
>  - There are tons of short-lived objects. Accounting them to memcg will
>only result in slight noise and won't change the overall picture, but
>we still have to pay accounting overhead.

Yes, I think we should have gone that path since the very beginning.
Glauber even started with opt-in IIRC (caches were supposed to register
to be accounted). I do not remember what's led to the opt-out switch -
but I guess it has something to do with the user API how to select which
caches to track and also the original version from Google by Suleiman
Souhlal did the opt-out from the very beginning. Also kmem extension was
assumed to be used for "special" workloads.

> For more info, see
> 
>  - https://lkml.org/lkml/2015/11/5/365
>  - https://lkml.org/lkml/2015/11/6/122

Using lkml.org links tend to be quite painful because they quite often
do not work. http://lkml.kernel.org/r/$msg_id tends to work much better
IMO

http://lkml.kernel.org/r/20151105144002.GB15111%40dhcp22.suse.cz
http://lkml.kernel.org/r/20151106090555.GK29259@esperanza

> Therefore this patch switches to the white list policy. Now kmalloc
> users have to explicitly opt in by passing __GFP_ACCOUNT flag.
> 
> Currently, the list of accounted objects is quite limited and only
> includes those allocations that (1) are known to be easily triggered
> from userspace and (2) can fail gracefully (for the full list see patch
> no. 5) and it still misses many object types. However, accounting only
> those objects should be a satisfactory approximation of the behavior we
> used to have for most sane workloads.

I am _all_ for this semantic I am just not sure what to do with the
legacy kmem controller. Can we change its semantic? If we cannot do that
we would have to distinguish legacy and unified hierarchies during
runtime and add the flag automagically for the first one (that would
however require to keep __GFP_NOACCOUNT as well) which is all as clear
as mud. But maybe the workloads which are using kmem legacy API can cope
with that.

Anyway if we go this way then I think the kmem accounting would be safe
to be enabled by default with the cgroup2.

> Thanks,
> 
> Vladimir Davydov (5):
>   Revert "kernfs: do not account ino_ida allocations to memcg"
>   Revert "gfp: add __GFP_NOACCOUNT"

The patch ordering would break the bisectability. I would simply squash
both places into the patch which replaces the flag.

>   memcg: only account kmem allocations marked as __GFP_ACCOUNT
>   vmalloc: allow to account vmalloc to memcg
>   Account certain kmem allocations to memcg

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Tejun Heo
Hello, Vladmir.

On Mon, Nov 09, 2015 at 09:28:40PM +0300, Vladimir Davydov wrote:
> > I am _all_ for this semantic I am just not sure what to do with the
> > legacy kmem controller. Can we change its semantic? If we cannot do that
> 
> I think we can. If somebody reports a "bug" caused by this change, i.e.
> basically notices that something that used to be accounted is not any
> longer, it will be trivial to fix by adding __GFP_ACCOUNT where
> appropriate. If it is not, e.g. if accounting of objects of a particular
> type leads to intense false-sharing, we would end up disabling
> accounting for it anyway.

I agree too, if anything is meaningfully broken by the flip, it just
indicates that the whitelist needs to be expanded; however, I wonder
whether this would be done better at slab level rather than per
allocation site.

A class of objects which can consume noticeable amount of memory which
can be attributed to userland is likely to be on its own slab already
or separating it out to its own slab is likely to be a good idea.
Marking those slabs as kmemcg accounted seems better suited to the
semantics - it's always about classes of objects - and less
error-prone than marking individual allocation sites.

This also reduces the number of slabs to worry about and more
importantly makes it clear which slabs need to be replicated for
kmemcg accounting from the beginning and the slab part of
implementation can be far simpler / more static.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Vladimir Davydov
On Mon, Nov 09, 2015 at 03:08:32PM +0100, Michal Hocko wrote:
...
> > Therefore this patch switches to the white list policy. Now kmalloc
> > users have to explicitly opt in by passing __GFP_ACCOUNT flag.
> > 
> > Currently, the list of accounted objects is quite limited and only
> > includes those allocations that (1) are known to be easily triggered
> > from userspace and (2) can fail gracefully (for the full list see patch
> > no. 5) and it still misses many object types. However, accounting only
> > those objects should be a satisfactory approximation of the behavior we
> > used to have for most sane workloads.
> 
> I am _all_ for this semantic I am just not sure what to do with the
> legacy kmem controller. Can we change its semantic? If we cannot do that

I think we can. If somebody reports a "bug" caused by this change, i.e.
basically notices that something that used to be accounted is not any
longer, it will be trivial to fix by adding __GFP_ACCOUNT where
appropriate. If it is not, e.g. if accounting of objects of a particular
type leads to intense false-sharing, we would end up disabling
accounting for it anyway.

> we would have to distinguish legacy and unified hierarchies during
> runtime and add the flag automagically for the first one (that would
> however require to keep __GFP_NOACCOUNT as well) which is all as clear
> as mud. But maybe the workloads which are using kmem legacy API can cope
> with that.
> 
> Anyway if we go this way then I think the kmem accounting would be safe
> to be enabled by default with the cgroup2.
> 
> > Thanks,
> > 
> > Vladimir Davydov (5):
> >   Revert "kernfs: do not account ino_ida allocations to memcg"
> >   Revert "gfp: add __GFP_NOACCOUNT"
> 
> The patch ordering would break the bisectability. I would simply squash

How's that? AFAICS the kernel should compile after any first N=1..5
patches of the series applied.

> both places into the patch which replaces the flag.
> 

IMO it is more readable the way it is, but I don't insist.

Thanks,
Vladimir

> >   memcg: only account kmem allocations marked as __GFP_ACCOUNT
> >   vmalloc: allow to account vmalloc to memcg
> >   Account certain kmem allocations to memcg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Johannes Weiner
On Mon, Nov 09, 2015 at 03:08:32PM +0100, Michal Hocko wrote:
> I am _all_ for this semantic I am just not sure what to do with the
> legacy kmem controller. Can we change its semantic? If we cannot do that
> we would have to distinguish legacy and unified hierarchies during
> runtime and add the flag automagically for the first one (that would
> however require to keep __GFP_NOACCOUNT as well) which is all as clear
> as mud. But maybe the workloads which are using kmem legacy API can cope
> with that.

I think we can make that change for the existing kmem accounting too,
simply because the whitelist should be covering all memory consumers
that actually matter for isolation in practice. Yes, there is a risk
for accidents, but we are not actually intending to change semantics.

> Anyway if we go this way then I think the kmem accounting would be safe
> to be enabled by default with the cgroup2.

Cool, I'm happy we're on the same page about this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Vladimir Davydov
On Mon, Nov 09, 2015 at 02:32:53PM -0500, Tejun Heo wrote:
> On Mon, Nov 09, 2015 at 10:27:47PM +0300, Vladimir Davydov wrote:
> > Of course, we could rework slab merging so that kmem_cache_create
> > returned a new dummy cache even if it was actually merged. Such a cache
> > would point to the real cache, which would be used for allocations. This
> > wouldn't limit slab merging, but this would add one more dereference to
> > alloc path, which is even worse.
> 
> Hmmm, this could be me not really understanding but why can't we let
> all slabs to be merged regardless of SLAB_ACCOUNT flag for root memcg
> and point to per-memcg slabs (may be merged among them but most likely

Because we won't be able to distinguish kmem_cache_alloc calls that
should be accounted from those that shouldn't. The problem is if two
caches

A = kmem_cache_create(...)

and

B = kmem_cache_create(...)

happen to be merged, A and B will point to the same kmem_cache struct.
As a result, there is no way to distinguish

kmem_cache_alloc(A)

which we want to account from

kmem_cache_alloc(B)

which we don't.

> won't matter) for !root.  We're indirecting once anyway, no?

If kmem accounting is not used, we aren't indirecting. That's why I
don't think we can use dummy kmem_cache struct for merged caches, where
we could store __GFP_ACCOUNT flag.

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Tejun Heo
Hello, Vladimir.

On Mon, Nov 09, 2015 at 11:12:18PM +0300, Vladimir Davydov wrote:
> Because we won't be able to distinguish kmem_cache_alloc calls that
> should be accounted from those that shouldn't. The problem is if two
> caches
> 
>   A = kmem_cache_create(...)
> 
> and
> 
>   B = kmem_cache_create(...)
> 
> happen to be merged, A and B will point to the same kmem_cache struct.
> As a result, there is no way to distinguish
> 
>   kmem_cache_alloc(A)
> 
> which we want to account from
> 
>   kmem_cache_alloc(B)
> 
> which we don't.

Hmm can't we simply merge among !SLAB_ACCOUNT and SLAB_ACCOUNT
kmem_caches within themselves?  I don't think we'd be losing anything
by restricting merge at that level.  For anything to be tagged
SLAB_ACCOUNT, it has to have a potential to grow enormous after all.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Vladimir Davydov
On Mon, Nov 09, 2015 at 01:54:01PM -0500, Tejun Heo wrote:
> On Mon, Nov 09, 2015 at 09:28:40PM +0300, Vladimir Davydov wrote:
> > > I am _all_ for this semantic I am just not sure what to do with the
> > > legacy kmem controller. Can we change its semantic? If we cannot do that
> > 
> > I think we can. If somebody reports a "bug" caused by this change, i.e.
> > basically notices that something that used to be accounted is not any
> > longer, it will be trivial to fix by adding __GFP_ACCOUNT where
> > appropriate. If it is not, e.g. if accounting of objects of a particular
> > type leads to intense false-sharing, we would end up disabling
> > accounting for it anyway.
> 
> I agree too, if anything is meaningfully broken by the flip, it just
> indicates that the whitelist needs to be expanded; however, I wonder
> whether this would be done better at slab level rather than per
> allocation site.

I'd like to, but this is not as simple as it seems at first glance. The
problem is that slab caches of the same size are actively merged with
each other. If we just added SLAB_ACCOUNT flag, which would be passed to
kmem_cache_create to enable accounting, we'd divide all caches into two
groups that couldn't be merged with each other even if kmem accounting
was not used at all. This would be a show stopper.

Of course, we could rework slab merging so that kmem_cache_create
returned a new dummy cache even if it was actually merged. Such a cache
would point to the real cache, which would be used for allocations. This
wouldn't limit slab merging, but this would add one more dereference to
alloc path, which is even worse.

That's why I decided to go with marking individual allocations.

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Tejun Heo
Hello, Vladmir.

On Mon, Nov 09, 2015 at 10:27:47PM +0300, Vladimir Davydov wrote:
> Of course, we could rework slab merging so that kmem_cache_create
> returned a new dummy cache even if it was actually merged. Such a cache
> would point to the real cache, which would be used for allocations. This
> wouldn't limit slab merging, but this would add one more dereference to
> alloc path, which is even worse.

Hmmm, this could be me not really understanding but why can't we let
all slabs to be merged regardless of SLAB_ACCOUNT flag for root memcg
and point to per-memcg slabs (may be merged among them but most likely
won't matter) for !root.  We're indirecting once anyway, no?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/5] memcg/kmem: switch to white list policy

2015-11-09 Thread Vladimir Davydov
On Mon, Nov 09, 2015 at 03:30:53PM -0500, Tejun Heo wrote:
...
> Hmm can't we simply merge among !SLAB_ACCOUNT and SLAB_ACCOUNT
> kmem_caches within themselves?  I don't think we'd be losing anything
> by restricting merge at that level.  For anything to be tagged
> SLAB_ACCOUNT, it has to have a potential to grow enormous after all.

OK, I'll prepare v2 which will introduce SLAB_ACCOUNT and add it to
SLAB_MERGE_SAME. Let's see what slab maintainers think of it.

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/