Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-16 Thread Tejun Heo
On Mon, Jun 16, 2014 at 04:29:15PM +0200, Michal Hocko wrote:
> > They're all in the mainline now.
> 
> git grep CFTYPE_ON_ON_DFL origin/master didn't show me anything.

lol, it should have been CFTYPE_ONLY_ON_DFL.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
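
The flag semantics settled in this exchange can be modelled in a few lines of
C. This is only a user-space sketch based on the descriptions in the thread:
the enum names, bit values, struct, and knob_visible() helper are hypothetical
stand-ins, not the kernel's definitions. It assumes that a knob marked like
CFTYPE_INSANE is hidden on the default (unified) hierarchy, while a knob
marked like CFTYPE_ONLY_ON_DFL is hidden everywhere else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; these are NOT the kernel's actual values. */
enum {
    KNOB_INSANE      = 1 << 0, /* models CFTYPE_INSANE: hidden on the default hierarchy */
    KNOB_ONLY_ON_DFL = 1 << 1, /* models CFTYPE_ONLY_ON_DFL: hidden on old hierarchies */
};

/* Hypothetical stand-in for a control file description. */
struct knob {
    const char *name;
    unsigned int flags;
};

/* Return true if the knob's file would show up on the given hierarchy. */
bool knob_visible(const struct knob *k, bool on_default_hierarchy)
{
    assert(k != NULL);
    if (on_default_hierarchy)
        return !(k->flags & KNOB_INSANE);
    return !(k->flags & KNOB_ONLY_ON_DFL);
}
```

Under these assumptions, a knob flagged KNOB_ONLY_ON_DFL shows up only when
on_default_hierarchy is true, a knob flagged KNOB_INSANE shows up only on the
old hierarchies, and an unflagged knob shows up on both.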


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-16 Thread Michal Hocko
On Mon 16-06-14 10:12:33, Tejun Heo wrote:
> On Mon, Jun 16, 2014 at 04:04:48PM +0200, Michal Hocko wrote:
> > > If, for whatever reason, a user is stuck with thread-level granularity
> > > for controllers which work that way, the user can use the old
> > > hierarchies for them for the time being.
> > 
> > So he can mount memcg with new cgroup API and others with old?
> 
> Yes, you can read Documentation/cgroups/unified-hierarchy.txt for more
> details.  I think I cc'd you when posting unified hierarchy patchset,
> didn't I?

OK, I've obviously pushed that out of my brain, because you are really
clear about it:
"
All controllers which are not bound to other hierarchies are
automatically bound to unified hierarchy and show up at the root of
it. Controllers which are enabled only in the root of unified
hierarchy can be bound to other hierarchies at any time.  This allows
mixing unified hierarchy with the traditional multiple hierarchies in
a fully backward compatible way.
"

This of course sorts out my concerns. Sorry about the noise!

> > > Nope, some changes don't fit that model.  CFTYPE_ON_ON_DFL is the
> > > opposite. 
> > 
> > OK, I wasn't aware of this. On which branch can I find this?
> 
> They're all in the mainline now.

git grep CFTYPE_ON_ON_DFL origin/master didn't show me anything.

Thanks!
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-16 Thread Tejun Heo
On Mon, Jun 16, 2014 at 04:04:48PM +0200, Michal Hocko wrote:
> > If, for whatever reason, a user is stuck with thread-level granularity
> > for controllers which work that way, the user can use the old
> > hierarchies for them for the time being.
> 
> So he can mount memcg with new cgroup API and others with old?

Yes, you can read Documentation/cgroups/unified-hierarchy.txt for more
details.  I think I cc'd you when posting unified hierarchy patchset,
didn't I?

> > Nope, some changes don't fit that model.  CFTYPE_ON_ON_DFL is the
> > opposite. 
> 
> OK, I wasn't aware of this. On which branch can I find this?

They're all in the mainline now.

> > Knobs marked with the flag only appear on the default
> > hierarchy (cgroup core internally calls it the default hierarchy as
> > this is the tree all the controllers are attached to by default).
> 
> I am not sure I understand. So they are visible only in the hierarchy
> mounted with the new cgroup API (sane, or whatever it is called)?

Yeap.

Thanks.

-- 
tejun


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-16 Thread Michal Hocko
On Mon 16-06-14 09:57:41, Tejun Heo wrote:
> Hello, Michal.
> 
> On Mon, Jun 16, 2014 at 02:59:15PM +0200, Michal Hocko wrote:
> > > There sure is a question of how fast userland will move to the new
> > > interface. 
> > 
> > Yeah, I was mostly thinking about those who would need to do bigger
> > changes. AFAIR threads will no longer be distributable between groups.
> 
> Thread-level granularity should go away no matter what, but this is
> completely irrelevant to memcg which can't do per-thread anyway.

Yes, I wasn't afraid about memcg. It was a setup which requires more
controllers that I was worried about.

> If, for whatever reason, a user is stuck with thread-level granularity
> for controllers which work that way, the user can use the old
> hierarchies for them for the time being.

So he can mount memcg with new cgroup API and others with old?

> > > is used but I don't think there's any chance of removing the knob.
> > > There's a reason why we're introducing a new version of the whole
> > > cgroup interface which can co-exist with the existing one after all.
> > > If you wanna version memcg interface separately, maybe that'd work but
> > > it sounds like a lot of extra hassle for not much gain.
> > 
> > No, I didn't mean to version the interface. I just wanted to have
> > gradual transition for potential soft_limit users.
> > 
> > Maybe I am misunderstanding something but I thought that new version of
> > API will contain all knobs which are not marked .flags = CFTYPE_INSANE
> > while the old API will contain all of them.
> 
> Nope, some changes don't fit that model.  CFTYPE_ON_ON_DFL is the
> opposite. 

OK, I wasn't aware of this. On which branch can I find this?

> Knobs marked with the flag only appear on the default
> hierarchy (cgroup core internally calls it the default hierarchy as
> this is the tree all the controllers are attached to by default).

I am not sure I understand. So they are visible only in the hierarchy
mounted with the new cgroup API (sane, or whatever it is called)?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-16 Thread Tejun Heo
Hello, Michal.

On Mon, Jun 16, 2014 at 02:59:15PM +0200, Michal Hocko wrote:
> > There sure is a question of how fast userland will move to the new
> > interface. 
> 
> Yeah, I was mostly thinking about those who would need to do bigger
> changes. AFAIR threads will no longer be distributable between groups.

Thread-level granularity should go away no matter what, but this is
completely irrelevant to memcg which can't do per-thread anyway.  If,
for whatever reason, a user is stuck with thread-level granularity for
controllers which work that way, the user can use the old hierarchies
for them for the time being.

> > is used but I don't think there's any chance of removing the knob.
> > There's a reason why we're introducing a new version of the whole
> > cgroup interface which can co-exist with the existing one after all.
> > If you wanna version memcg interface separately, maybe that'd work but
> > it sounds like a lot of extra hassle for not much gain.
> 
> No, I didn't mean to version the interface. I just wanted to have
> gradual transition for potential soft_limit users.
> 
> Maybe I am misunderstanding something but I thought that new version of
> API will contain all knobs which are not marked .flags = CFTYPE_INSANE
> while the old API will contain all of them.

Nope, some changes don't fit that model.  CFTYPE_ON_ON_DFL is the
opposite.  Knobs marked with the flag only appear on the default
hierarchy (cgroup core internally calls it the default hierarchy as
this is the tree all the controllers are attached to by default).

Thanks.

-- 
tejun


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-16 Thread Michal Hocko
On Thu 12-06-14 12:51:05, Johannes Weiner wrote:
> On Thu, Jun 12, 2014 at 04:22:37PM +0200, Michal Hocko wrote:
> > On Thu 12-06-14 09:56:00, Johannes Weiner wrote:
> > > On Thu, Jun 12, 2014 at 03:22:07PM +0200, Michal Hocko wrote:
> > [...]
> > > > Anyway, the situation now is pretty chaotic. I plan to gather all the
> > > > patches posted so far and repost for the future discussion. I just need
> > > > to finish some internal tasks and will post it soon.
> > > 
> > > That would be great, thanks, it's really hard to follow this stuff
> > > halfway in and halfway outside of -mm.
> > > 
> > > Now that we roughly figured out what knobs and semantics we want, it
> > > would be great to figure out the merging logistics.
> > > 
> > > I would prefer if we could introduce max, high, low, min in unified
> > > hierarchy, and *only* in there, so that we never have to worry about
> > > it coexisting and interacting with the existing hard and soft limit.

Btw. what is the way to introduce a knob _only_ in the new cgroup API?
I am only aware of .flags = CFTYPE_INSANE, which works the other way
around.

> > The primary question would be, whether this is the best transition
> > strategy. I do not know how many users apart from developers are really
> > using unified hierarchy. I would be worried that we merge a feature which
> > will not be used for a long time.
> 
> Unified hierarchy is the next version of the cgroup interface, and
> once the development tag drops I consider the old memcg interface
> deprecated. 

Deprecated in the unified hierarchy mount, right? The old API will
still be around AFAIU. The deprecated knobs will just not be visible
in the new API. So we cannot simply remove all the code after unified
hierarchy drops its DEVEL status, can we?

> It makes very little sense to me to put up additional
> incentives at this point to continue the use of the old interface,
> when we already struggle with manpower to maintain even one of them.
> 
> > Moreover, if somebody wants to transition from soft limit then it would
> > be really hard because switching to unified hierarchy might be a no-go.
> >
> > I think that it is clear that we should deprecate soft_limit ASAP. I
> > also think it won't hurt to have min, low, high in both old and unified
> > API and strongly warn if somebody tries to use soft_limit along with any
> > of the new APIs in the first step. Later we can even forbid any
> > combination by a hard failure.
> 
> Why would somebody NOT be able to convert to unified hierarchy
> eventually?

I've mentioned that in other email. I remember people complaining about
threads not being distributable over groups in the past. Things might
have changed in the mean time, I was too busy to pay closer attention so
I might be completely wrong here.

> How big is the intersection of cases that can't convert to unified
> hierarchy AND are using the soft limit AND want to use the new low
> limit?

I am not talking about intentional usage of soft limit with new knobs.
That would be unsupported of course and I meant to complain about that
in the logs and later even fail on an attempt.

> Merging a different concept with its own naming scheme into an already
> confusing interface, spamming the dmesg if someone gets it wrong,
> potentially introducing more breakage with the hard failure, putting
> up incentives to stick with a deprecated and confusing interface...
> This is a lot of horrible stuff in an attempt to accommodate very few
> usecases - if any - when we are *already versioning the interface* and
> have the opportunity for a clean transition.
> 
> The transition to min, low, high, max is effort in itself.  Conflating
> the two models sounds more detrimental than anything else, with a very
> dubious upside at that.
>
> > > It would also be beneficial to introduce them all close to each other,
> > > develop them together, possibly submit them in the same patch series,
> > > so that we know the requirements and how the code should look like in
> > > the big picture and can offer a fully consistent and documented usage
> > > model in the unified hierarchy.
> > 
> > Min and Low should definitely go together. High sounds like an
> > orthogonal problem (pro-active reclaim vs reclaim protection) so I think
> > it can go its own way and pace. We still have to discuss its semantic
> > and I feel it would be a bit disturbing to have everything in one
> > bundle.
> >
> > I do understand your point about the global picture, though. Do you
> > think that there is a risk that formulating semantic for High limit
> > might change the way how Min and Low would be defined?
> 
> I think one of the biggest hindrances in making forward progress on
> individual limits is that we only had a laundry list of occasionally
> conflicting requirements but never a consistent big picture to design
> around and match full usecases to.  It's much easier and less error
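> prone to develop the concept as a whole, alongside full real-life
> configurations.
> 
> They are symmetrical pieces whose semantics very much depend on each
> other, so I wouldn't like too much lag between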

Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-16 Thread Michal Hocko
On Thu 12-06-14 12:17:33, Tejun Heo wrote:
> Hello, Michal.
> 
> On Thu, Jun 12, 2014 at 04:22:37PM +0200, Michal Hocko wrote:
> > The primary question would be, whether this is the best transition
> > strategy. I do not know how many users apart from developers are really
> > using unified hierarchy. I would be worried that we merge a feature which
> > will not be used for a long time.
> 
> I'm planning to drop __DEVEL__ mask from the unified hierarchy in a
> cycle, at most two. 

OK, I am obviously behind the current cgroup core changes. I thought
that unified hierarchy will be for development only for much more time.

> The biggest hold up at the moment is
> straightening out the interfaces and interaction between memcg and
> blkcg because I think it'd be silly to have to go through another
> round of interface versioning effort right after transitioning to
> unified hierarchy.  I'm not too confident whether it'd be possible to
> get blkcg completely in shape by that time, but, if that takes too
> long, I'll just leave blkcg behind temporarily.  So, at least from
> kernel side, it's not gonna be too long.
> 
> There sure is a question of how fast userland will move to the new
> interface. 

Yeah, I was mostly thinking about those who would need to do bigger
changes. AFAIR threads will no longer be distributable between groups.

> Some are already playing with unified hierarchy and
> planning to migrate as soon as possible but there sure will be others
> who will take more time.  Can't tell for sure, but the thing is that
> migration to min/low/high/max scheme is a significant migration effort
> too, so I'm not sure how much we'd gain by doing that separately.
> It'd be an extra transition step for userland (optional but still),
> more combinations of configuration to handle for memcg, and it's not
> like unified hierarchy is that difficult to transition to.
> 
> > Moreover, if somebody wants to transition from soft limit then it would
> > be really hard because switching to unified hierarchy might be a no-go.
> 
> Why would that be a no-go? 

I remember discussions about per-thread distributions and some other
things missing from the new API.

> Its usage is mostly similar to
> traditional hierarchies and can be used with other hierarchies, so
> while it'd take some adaptation, in most cases gradual transition
> shouldn't be a big problem.

OK

> > I think that it is clear that we should deprecate soft_limit ASAP. I
> > also think it won't hurt to have min, low, high in both old and unified
> > API and strongly warn if somebody tries to use soft_limit along with any
> > of the new APIs in the first step. Later we can even forbid any
> > combination by a hard failure.
> 
> I don't quite understand how you plan to deprecate it.  Sure you can
> fail with -EINVAL or whatnot when the wrong combination

Yes, I was thinking in that direction. First warn and then -EINVAL later.

> is used but I don't think there's any chance of removing the knob.
> There's a reason why we're introducing a new version of the whole
> cgroup interface which can co-exist with the existing one after all.
> If you wanna version memcg interface separately, maybe that'd work but
> it sounds like a lot of extra hassle for not much gain.

No, I didn't mean to version the interface. I just wanted to have
gradual transition for potential soft_limit users.

Maybe I am misunderstanding something but I thought that new version of
API will contain all knobs which are not marked .flags = CFTYPE_INSANE
while the old API will contain all of them.
-- 
Michal Hocko
SUSE Labs
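
The "first warn, then -EINVAL" transition proposed above could look roughly
like the following user-space sketch. Everything in it is hypothetical and
only illustrates the two-step idea: the deprecation_stage enum, the group_cfg
struct, set_soft_limit(), and the warning text are invented for this example
and do not correspond to any memcg code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical rollout stages for the deprecation. */
enum deprecation_stage {
    STAGE_WARN, /* first step: accept the write but complain in the log */
    STAGE_FAIL, /* later step: reject the combination outright */
};

/* Hypothetical per-group state; not a real memcg structure. */
struct group_cfg {
    bool soft_limit_set;
    bool new_limit_set; /* any of min/low/high is configured */
    bool warned;        /* warn only once per group */
};

/* Returns 0 if the soft limit is accepted, -EINVAL if it is rejected. */
int set_soft_limit(struct group_cfg *g, enum deprecation_stage stage)
{
    assert(g != NULL);
    if (g->new_limit_set) {
        if (stage == STAGE_FAIL)
            return -EINVAL;
        if (!g->warned) {
            fprintf(stderr,
                    "soft_limit combined with min/low/high is deprecated\n");
            g->warned = true;
        }
    }
    g->soft_limit_set = true;
    return 0;
}
```

In the warn stage the mixed configuration still takes effect and is reported
once; in the fail stage the same write returns -EINVAL and leaves the group
unchanged. Groups that never touch the new limits are unaffected in either
stage.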


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-16 Thread Tejun Heo
On Mon, Jun 16, 2014 at 04:29:15PM +0200, Michal Hocko wrote:
  They're all in the mainline now.
 
 git grep CFTYPE_ON_ON_DFL origin/master didn't show me anything.

lol, it should have been CFTYPE_ONLY_ON_DFL.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-12 Thread Johannes Weiner
On Thu, Jun 12, 2014 at 04:22:37PM +0200, Michal Hocko wrote:
> On Thu 12-06-14 09:56:00, Johannes Weiner wrote:
> > On Thu, Jun 12, 2014 at 03:22:07PM +0200, Michal Hocko wrote:
> [...]
> > > Anyway, the situation now is pretty chaotic. I plan to gather all the
> > > patches posted so far and repost for the future discussion. I just need
> > > to finish some internal tasks and will post it soon.
> > 
> > That would be great, thanks, it's really hard to follow this stuff
> > halfway in and halfway outside of -mm.
> > 
> > Now that we roughly figured out what knobs and semantics we want, it
> > would be great to figure out the merging logistics.
> > 
> > I would prefer if we could introduce max, high, low, min in unified
> > hierarchy, and *only* in there, so that we never have to worry about
> > it coexisting and interacting with the existing hard and soft limit.
> 
> The primary question would be, whether this is the best transition
> strategy. I do not know how many users apart from developers are really
> using unified hierarchy. I would be worried that we merge a feature which
> will not be used for a long time.

Unified hierarchy is the next version of the cgroup interface, and
once the development tag drops I consider the old memcg interface
deprecated.  It makes very little sense to me to put up additional
incentives at this point to continue the use of the old interface,
when we already struggle with manpower to maintain even one of them.

> Moreover, if somebody wants to transition from soft limit then it would
> be really hard because switching to unified hierarchy might be a no-go.
>
> I think that it is clear that we should deprecate soft_limit ASAP. I
> also think it won't hurt to have min, low, high in both old and unified
> API and strongly warn if somebody tries to use soft_limit along with any
> of the new APIs in the first step. Later we can even forbid any
> combination by a hard failure.

Why would somebody NOT be able to convert to unified hierarchy
eventually?

How big is the intersection of cases that can't convert to unified
hierarchy AND are using the soft limit AND want to use the new low
limit?

Merging a different concept with its own naming scheme into an already
confusing interface, spamming the dmesg if someone gets it wrong,
potentially introducing more breakage with the hard failure, putting
up incentives to stick with a deprecated and confusing interface...
This is a lot of horrible stuff in an attempt to accommodate very few
usecases - if any - when we are *already versioning the interface* and
have the opportunity for a clean transition.

The transition to min, low, high, max is effort in itself.  Conflating
the two models sounds more detrimental than anything else, with a very
dubious upside at that.

> > It would also be beneficial to introduce them all close to each other,
> > develop them together, possibly submit them in the same patch series,
> > so that we know the requirements and how the code should look like in
> > the big picture and can offer a fully consistent and documented usage
> > model in the unified hierarchy.
> 
> Min and Low should definitely go together. High sounds like an
> orthogonal problem (pro-active reclaim vs reclaim protection) so I think
> it can go its own way and pace. We still have to discuss its semantic
> and I feel it would be a bit disturbing to have everything in one
> bundle.
>
> I do understand your point about the global picture, though. Do you
> think that there is a risk that formulating semantic for High limit
> might change the way how Min and Low would be defined?

I think one of the biggest hindrances in making forward progress on
individual limits is that we only had a laundry list of occasionally
conflicting requirements but never a consistent big picture to design
around and match full usecases to.  It's much easier and less error
prone to develop the concept as a whole, alongside full real-life
configurations.

They are symmetrical pieces whose semantics very much depend on each
other, so I wouldn't like too much lag between those.


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-12 Thread Tejun Heo
Hello, Michal.

On Thu, Jun 12, 2014 at 04:22:37PM +0200, Michal Hocko wrote:
> The primary question would be, whether this is the best transition
> strategy. I do not know how many users apart from developers are really
> using unified hierarchy. I would be worried that we merge a feature which
> will not be used for a long time.

I'm planning to drop __DEVEL__ mask from the unified hierarchy in a
cycle, at most two.  The biggest hold up at the moment is
straightening out the interfaces and interaction between memcg and
blkcg because I think it'd be silly to have to go through another
round of interface versioning effort right after transitioning to
unified hierarchy.  I'm not too confident whether it'd be possible to
get blkcg completely in shape by that time, but, if that takes too
long, I'll just leave blkcg behind temporarily.  So, at least from
kernel side, it's not gonna be too long.

There sure is a question of how fast userland will move to the new
interface.  Some are already playing with unified hierarchy and
planning to migrate as soon as possible but there sure will be others
who will take more time.  Can't tell for sure, but the thing is that
migration to min/low/high/max scheme is a significant migration effort
too, so I'm not sure how much we'd gain by doing that separately.
It'd be an extra transition step for userland (optional but still),
more combinations of configuration to handle for memcg, and it's not
like unified hierarchy is that difficult to transition to.

> Moreover, if somebody wants to transition from soft limit then it would
> be really hard because switching to unified hierarchy might be a no-go.

Why would that be a no-go?  Its usage is mostly similar with
traditional hierarchies and can be used with other hierarchies, so
while it'd take some adaptation, in most cases gradual transition
shouldn't be a big problem.

> I think that it is clear that we should deprecate soft_limit ASAP. I
> also think it won't hurt to have min, low, high in both old and unified
> API and strongly warn if somebody tries to use soft_limit along with any
> of the new APIs in the first step. Later we can even forbid any
> combination by a hard failure.

I don't quite understand how you plan to deprecate it.  Sure you can
fail with -EINVAL or whatnot when the wrong combination is used but I
don't think there's any chance of removing the knob.  There's a reason
why we're introducing a new version of the whole cgroup interface
which can co-exist with the existing one after all.  If you wanna
version memcg interface separately, maybe that'd work but it sounds
like a lot of extra hassle for not much gain.

Thanks.

-- 
tejun
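
[Editorial sketch] The two stages discussed above (first warn when
soft_limit is combined with a new knob, later fail with -EINVAL) can be
modelled as follows. This is only a toy under stated assumptions: the
struct, its fields, the function name, and the strict switch are all
hypothetical and do not correspond to any memcg interface.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-group configuration state. */
struct toy_memcg_cfg {
	bool soft_limit_set;   /* legacy soft limit has been configured */
	bool new_knob_set;     /* one of min/low/high has been configured */
	unsigned int warnings; /* stands in for a message in dmesg */
};

/*
 * Try to configure one of the new knobs.
 * strict == false: first stage, the mix is allowed but a warning is counted.
 * strict == true:  later stage, the mix is rejected with -EINVAL.
 */
int toy_set_new_knob(struct toy_memcg_cfg *cfg, bool strict)
{
	if (cfg->soft_limit_set) {
		if (strict)
			return -EINVAL;
		cfg->warnings++;
	}
	cfg->new_knob_set = true;
	return 0;
}
```

Note that in both stages the soft limit knob itself stays in place,
which is the point made above about not being able to remove it.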


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-12 Thread Michal Hocko
On Thu 12-06-14 09:56:00, Johannes Weiner wrote:
> On Thu, Jun 12, 2014 at 03:22:07PM +0200, Michal Hocko wrote:
[...]
> > Anyway, the situation now is pretty chaotic. I plan to gather all the
> > patches posted so far and repost for the future discussion. I just need
> > to finish some internal tasks and will post it soon.
> 
> That would be great, thanks, it's really hard to follow this stuff
> halfway in and halfway outside of -mm.
> 
> Now that we roughly figured out what knobs and semantics we want, it
> would be great to figure out the merging logistics.
> 
> I would prefer if we could introduce max, high, low, min in unified
> hierarchy, and *only* in there, so that we never have to worry about
> it coexisting and interacting with the existing hard and soft limit.

The primary question would be, whether this is the best transition
strategy. I do not know how many users apart from developers are really
using unified hierarchy. I would be worried that we merge a feature which
will not be used for a long time.

Moreover, if somebody wants to transition from soft limit then it would
be really hard because switching to unified hierarchy might be a no-go.

I think that it is clear that we should deprecate soft_limit ASAP. I
also think it won't hurt to have min, low, high in both old and unified
API and strongly warn if somebody tries to use soft_limit along with any
of the new APIs in the first step. Later we can even forbid any
combination by a hard failure.

> It would also be beneficial to introduce them all close to each other,
> develop them together, possibly submit them in the same patch series,
> so that we know the requirements and how the code should look like in
> the big picture and can offer a fully consistent and documented usage
> model in the unified hierarchy.

Min and Low should definitely go together. High sounds like an
orthogonal problem (pro-active reclaim vs reclaim protection) so I think
it can go its own way and pace. We still have to discuss its semantic
and I feel it would be a bit disturbing to have everything in one
bundle. 
I do understand your point about the global picture, though. Do you
think that there is a risk that formulating semantic for High limit
might change the way how Min and Low would be defined?

> Does that make sense?

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-12 Thread Johannes Weiner
On Thu, Jun 12, 2014 at 03:22:07PM +0200, Michal Hocko wrote:
> On Wed 11-06-14 11:36:31, Johannes Weiner wrote:
> [...]
> > This code is truly dreadful.
> > 
> > Don't call it guarantee when it doesn't guarantee anything.  I thought
> > we agreed that min, low, high, max, is reasonable nomenclature, please
> > use it consistently.
> 
> I can certainly change the internal naming. I will use your wmark naming
> suggestion.

Cool, thanks.

> > With my proposed cleanups and scalability fixes in the other mail, the
> > vmscan.c changes to support the min watermark would be something like
> > the following.
> 
> The semantic is, however, much different as pointed out in the other email.
> The following on top of your cleanup will lead to the same deadlock
> described in 1st patch (mm, memcg: allow OOM if no memcg is eligible
> during direct reclaim).

I'm currently reworking shrink_zones() and getting rid of
all_unreclaimable() etc. to remove the code duplication.

> Anyway, the situation now is pretty chaotic. I plan to gather all the
> patches posted so far and repost for the future discussion. I just need
> to finish some internal tasks and will post it soon.

That would be great, thanks, it's really hard to follow this stuff
halfway in and halfway outside of -mm.

Now that we roughly figured out what knobs and semantics we want, it
would be great to figure out the merging logistics.

I would prefer if we could introduce max, high, low, min in unified
hierarchy, and *only* in there, so that we never have to worry about
it coexisting and interacting with the existing hard and soft limit.

It would also be beneficial to introduce them all close to each other,
develop them together, possibly submit them in the same patch series,
so that we know the requirements and how the code should look like in
the big picture and can offer a fully consistent and documented usage
model in the unified hierarchy.

Does that make sense?


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-12 Thread Michal Hocko
On Wed 11-06-14 11:36:31, Johannes Weiner wrote:
[...]
> This code is truly dreadful.
> 
> Don't call it guarantee when it doesn't guarantee anything.  I thought
> we agreed that min, low, high, max, is reasonable nomenclature, please
> use it consistently.

I can certainly change the internal naming. I will use your wmark naming
suggestion.
 
> With my proposed cleanups and scalability fixes in the other mail, the
> vmscan.c changes to support the min watermark would be something like
> the following.

The semantic is, however, much different as pointed out in the other email.
The following on top of your cleanup will lead to the same deadlock
described in 1st patch (mm, memcg: allow OOM if no memcg is eligible
during direct reclaim).

Anyway, the situation now is pretty chaotic. I plan to gather all the
patches posted so far and repost for the future discussion. I just need
to finish some internal tasks and will post it soon.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 687076b7a1a6..cee19b6d04dc 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2259,7 +2259,7 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
>*/
>   if (priority < DEF_PRIORITY - 2)
>   break;
> -
> + case MEMCG_WMARK_MIN:
>   /* XXX: skip the whole subtree */
>   memcg = mem_cgroup_iter(root, memcg, &reclaim);
>   continue;
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-11 Thread Johannes Weiner
On Wed, Jun 11, 2014 at 10:00:24AM +0200, Michal Hocko wrote:
> Some users (e.g. Google) would like to have stronger semantic than low
> limit offers currently. The fallback mode is not desirable and they
> prefer hitting OOM killer rather than ignoring low limit for protected
> groups.
> 
> There are other possible usecases which can benefit from hard
> guarantees. There are loads which will simply start thrashing if the
> memory working set drops under certain level and it is more appropriate
> to simply kill and restart such a load if the required memory cannot
> be provided. Another usecase would be a hard memory isolation for
> containers.
> 
> The min_limit is initialized to 0 and it has precedence over low_limit.
> If the reclaim is not able to find any memcg in the reclaimed hierarchy
> above min_limit then OOM killer is triggered to resolve the situation.
> 
> Signed-off-by: Michal Hocko 
> ---

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 99137aecd95f..8e844bd42c51 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2220,13 +2220,12 @@ static inline bool should_continue_reclaim(struct zone *zone,
>   *
>   * @zone: zone to shrink
>   * @sc: scan control with additional reclaim parameters
> - * @honor_memcg_guarantee: do not reclaim memcgs which are within their memory
> - * guarantee
> + * @soft_guarantee: Use soft guarantee reclaim target for memcg reclaim.
>   *
>   * Returns the number of reclaimed memcgs.
>   */
>  static unsigned __shrink_zone(struct zone *zone, struct scan_control *sc,
> - bool honor_memcg_guarantee)
> + bool soft_guarantee)
>  {
>   unsigned long nr_reclaimed, nr_scanned;
>   unsigned nr_scanned_groups = 0;
> @@ -2245,11 +2244,10 @@ static unsigned __shrink_zone(struct zone *zone, struct scan_control *sc,
>   memcg = mem_cgroup_iter(root, NULL, &reclaim);
>   do {
>   struct lruvec *lruvec;
> - bool within_guarantee;
>  
>   /* Memcg might be protected from the reclaim */
> - within_guarantee = mem_cgroup_within_guarantee(memcg, root);
> - if (honor_memcg_guarantee && within_guarantee) {
> + if (mem_cgroup_within_guarantee(memcg, root,
> + soft_guarantee)) {
>   /*
>* It would be more optimal to skip the memcg
>* subtree now but we do not have a memcg iter
> @@ -2259,8 +2257,8 @@ static unsigned __shrink_zone(struct zone *zone, struct scan_control *sc,
>   continue;
>   }
>  
> - if (within_guarantee)
> - mem_cgroup_guarantee_breached(memcg);
> + if (!soft_guarantee)
> + mem_cgroup_soft_guarantee_breached(memcg);
>  
>   lruvec = mem_cgroup_zone_lruvec(zone, memcg);
>   nr_scanned_groups++;
> @@ -2297,20 +2295,27 @@ static unsigned __shrink_zone(struct zone *zone, 
> struct scan_control *sc,
>  
>  static void shrink_zone(struct zone *zone, struct scan_control *sc)
>  {
> - bool honor_guarantee = true;
> + bool soft_guarantee = true;
>  
> - while (!__shrink_zone(zone, sc, honor_guarantee)) {
> + while (!__shrink_zone(zone, sc, soft_guarantee)) {
>   /*
>* The previous round of reclaim didn't find anything to scan
>* because
> -  * a) the whole reclaimed hierarchy is within guarantee so
> -  *we fallback to ignore the guarantee because other option
> -  *would be the OOM
> +  * a) the whole reclaimed hierarchy is within soft guarantee so
> +  *we are switching to the hard guarantee reclaim target
>* b) multiple reclaimers are racing and so the first round
>*should be retried
>*/
> - if (mem_cgroup_all_within_guarantee(sc->target_mem_cgroup))
> - honor_guarantee = false;
> + if (mem_cgroup_all_within_guarantee(sc->target_mem_cgroup,
> + soft_guarantee)) {
> + /*
> +  * Nothing to reclaim even with hard guarantees so
> +  * we have to OOM
> +  */
> + if (!soft_guarantee)
> + break;
> + soft_guarantee = false;
> + }
>   }
>  }
>  
> @@ -2574,7 +2579,8 @@ out:
>* If the target memcg is not eligible for reclaim then we have no 
> option
>* but OOM
>*/
> - if (!sc->nr_scanned && 
> mem_cgroup_all_within_guarantee(sc->target_mem_cgroup))
> + if (!sc->nr_scanned &&
> + 

[PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-11 Thread Michal Hocko
Some users (e.g. Google) would like to have stronger semantics than the
low limit currently offers. The fallback mode is not desirable and they
prefer hitting the OOM killer rather than ignoring the low limit for
protected groups.

There are other possible use cases which can benefit from hard
guarantees. There are loads which will simply start thrashing if the
memory working set drops under a certain level, and it is more
appropriate to simply kill and restart such a load if the required
memory cannot be provided. Another use case would be hard memory
isolation for containers.

The min_limit is initialized to 0 and it has precedence over low_limit.
If the reclaim is not able to find any memcg in the reclaimed hierarchy
above min_limit then OOM killer is triggered to resolve the situation.
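
The two-stage behaviour described above can be sketched as a small
userspace model. This is only an illustration, not the kernel code:
struct group, within_guarantee(), shrink_pass() and shrink() are
hypothetical names, and the sketch assumes that the soft pass checks
only low_limit and the hard pass checks only min_limit. It also leaves
out the retry for racing reclaimers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; only what the sketch needs. */
struct group {
	const char *name;
	unsigned long usage;     /* current charge, in pages */
	unsigned long low_limit; /* soft guarantee, 0 means none */
	unsigned long min_limit; /* hard guarantee, 0 means none */
};

enum outcome { RECLAIMED_SOFT, RECLAIMED_HARD, NEED_OOM };

/*
 * Assumption: a group is protected in a given pass when its usage is
 * below the limit that pass honours (low_limit for the soft pass,
 * min_limit for the hard pass).
 */
bool within_guarantee(const struct group *g, bool soft_guarantee)
{
	unsigned long limit = soft_guarantee ? g->low_limit : g->min_limit;

	return g->usage < limit;
}

/* Count the groups one pass would be allowed to scan. */
unsigned shrink_pass(const struct group *groups, size_t n,
		     bool soft_guarantee)
{
	unsigned scanned = 0;

	for (size_t i = 0; i < n; i++)
		if (!within_guarantee(&groups[i], soft_guarantee))
			scanned++;
	return scanned;
}

/*
 * First honour the soft guarantee. If nothing is eligible, fall back
 * to the hard guarantee. If still nothing is eligible, the only
 * option left is OOM.
 */
enum outcome shrink(const struct group *groups, size_t n)
{
	if (shrink_pass(groups, n, true))
		return RECLAIMED_SOFT;
	if (shrink_pass(groups, n, false))
		return RECLAIMED_HARD;
	return NEED_OOM;
}
```

With two groups that both have low_limit 50 and min_limit 10, shrink()
reports a soft-pass reclaim as long as one group uses 50 pages or more,
a hard-pass reclaim when both are below 50 but at least one uses 10 or
more, and the OOM outcome when both are below 10.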

Signed-off-by: Michal Hocko 
---
 Documentation/cgroups/memory.txt | 26 ++
 include/linux/memcontrol.h   | 14 --
 include/linux/res_counter.h  | 32 ++--
 mm/memcontrol.c  | 18 +++---
 mm/oom_kill.c|  6 --
 mm/vmscan.c  | 38 ++
 6 files changed, 93 insertions(+), 41 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index bf895d7e1363..6929a06c9e5d 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -61,6 +61,7 @@ Brief summary of control files.
  memory.low_limit_breached  # number of times low_limit has been
 # ignored and the cgroup reclaimed even
 # when it was above the limit
+ memory.min_limit_in_bytes  # set/show min limit for memory reclaim
  memory.memsw.limit_in_bytes# set/show limit of memory+Swap usage
  memory.failcnt # show the number of memory usage hits 
limits
  memory.memsw.failcnt   # show the number of memory+Swap hits limits
@@ -248,14 +249,23 @@ global VM. Cgroups can get reclaimed basically under two 
conditions
to select and kill the bulkiest task in the hiearchy. (See 10. OOM Control
below.)
 
-Groups might be also protected from both global and limit reclaim by
-low_limit_in_bytes knob. If the limit is non-zero the reclaim logic
-doesn't include groups (and their subgroups - see 6. Hierarchy support)
-which are below the low limit if there is other eligible cgroup in the
-reclaimed hierarchy. If all groups which participate reclaim are under
-their low limits then all of them are reclaimed and the low limit is
-ignored. low_limit_breached counter in memory.stat file can be checked
-to see how many times such an event occurred.
+Groups might be also protected from both global and limit reclaim
+by low_limit_in_bytes and min_limit_in_bytes knobs. The first one
+provides an optimistic reclaim protection while the latter one provides
+a hard memory reclaim protection guarantee. Both limits are 0 by default
+and the min limit always takes precedence over the low limit.
+
+If the low limit is non-zero the reclaim logic doesn't include
+groups (and their subgroups - see 6. Hierarchy support) which are
+below low_limit if there is another eligible cgroup in the reclaimed
+hierarchy. If all groups which participate in reclaim are under their low
+limits then all of them are reclaimed and the low limit is ignored.
+low_limit_breached counter in memory.stat file can be checked to see how
+many times such an event occurred.
+
+If, however, all the groups under the reclaimed hierarchy are under their
+min limits then no reclaim is done and the OOM killer is triggered to resolve
+the situation. In other words min_limit is never breached by the reclaim.
 
 Note2: When panic_on_oom is set to "2", the whole system will panic.
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5e2ca2163b12..ddb96729a6b6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -93,10 +93,11 @@ bool task_in_mem_cgroup(struct task_struct *task,
const struct mem_cgroup *memcg);
 
 extern bool mem_cgroup_within_guarantee(struct mem_cgroup *memcg,
-   struct mem_cgroup *root);
+   struct mem_cgroup *root, bool soft_guarantee);
 
-extern void mem_cgroup_guarantee_breached(struct mem_cgroup *memcg);
-extern bool mem_cgroup_all_within_guarantee(struct mem_cgroup *root);
+extern void mem_cgroup_soft_guarantee_breached(struct mem_cgroup *memcg);
+extern bool mem_cgroup_all_within_guarantee(struct mem_cgroup *root,
+   bool soft_guarantee);
 
 extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page);
 extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
@@ -295,14 +296,15 @@ static inline struct lruvec 
*mem_cgroup_page_lruvec(struct page *page,
 }
 
 static inline bool mem_cgroup_within_guarantee(struct mem_cgroup *memcg,
-   struct mem_cgroup *root)
+   

Re: [PATCH 2/2] memcg: Allow guarantee reclaim

2014-06-11 Thread Johannes Weiner
On Wed, Jun 11, 2014 at 10:00:24AM +0200, Michal Hocko wrote:
> Some users (e.g. Google) would like to have stronger semantic than low
> limit offers currently. The fallback mode is not desirable and they
> prefer hitting OOM killer rather than ignoring low limit for protected
> groups.
> 
> There are other possible usecases which can benefit from hard
> guarantees. There are loads which will simply start trashing if the
> memory working set drops under certain level and it is more appropriate
> to simply kill and restart such a load if the required memory cannot
> be provided. Another usecase would be a hard memory isolation for
> containers.
> 
> The min_limit is initialized to 0 and it has precedence over low_limit.
> If the reclaim is not able to find any memcg in the reclaimed hierarchy
> above min_limit then OOM killer is triggered to resolve the situation.
> 
> Signed-off-by: Michal Hocko mho...@suse.cz
> ---

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 99137aecd95f..8e844bd42c51 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2220,13 +2220,12 @@ static inline bool should_continue_reclaim(struct 
> zone *zone,
>   *
>   * @zone: zone to shrink
>   * @sc: scan control with additional reclaim parameters
> - * @honor_memcg_guarantee: do not reclaim memcgs which are within their 
> memory
> - * guarantee
> + * @soft_guarantee: Use soft guarantee reclaim target for memcg reclaim.
>   *
>   * Returns the number of reclaimed memcgs.
>   */
>  static unsigned __shrink_zone(struct zone *zone, struct scan_control *sc,
> - bool honor_memcg_guarantee)
> + bool soft_guarantee)
>  {
>   unsigned long nr_reclaimed, nr_scanned;
>   unsigned nr_scanned_groups = 0;
> @@ -2245,11 +2244,10 @@ static unsigned __shrink_zone(struct zone *zone, 
> struct scan_control *sc,
>   memcg = mem_cgroup_iter(root, NULL, &reclaim);
>   do {
>   struct lruvec *lruvec;
> - bool within_guarantee;
>  
>   /* Memcg might be protected from the reclaim */
> - within_guarantee = mem_cgroup_within_guarantee(memcg, 
> root);
> - if (honor_memcg_guarantee && within_guarantee) {
> + if (mem_cgroup_within_guarantee(memcg, root,
> + soft_guarantee)) {
>   /*
>* It would be more optimal to skip the memcg
>* subtree now but we do not have a memcg iter
> @@ -2259,8 +2257,8 @@ static unsigned __shrink_zone(struct zone *zone, struct 
> scan_control *sc,
>   continue;
>   }
>  
> - if (within_guarantee)
> - mem_cgroup_guarantee_breached(memcg);
> + if (!soft_guarantee)
> + mem_cgroup_soft_guarantee_breached(memcg);
>  
>   lruvec = mem_cgroup_zone_lruvec(zone, memcg);
>   nr_scanned_groups++;
> @@ -2297,20 +2295,27 @@ static unsigned __shrink_zone(struct zone *zone, 
> struct scan_control *sc,
>  
>  static void shrink_zone(struct zone *zone, struct scan_control *sc)
>  {
> - bool honor_guarantee = true;
> + bool soft_guarantee = true;
>  
> - while (!__shrink_zone(zone, sc, honor_guarantee)) {
> + while (!__shrink_zone(zone, sc, soft_guarantee)) {
>   /*
>* The previous round of reclaim didn't find anything to scan
>* because
> -  * a) the whole reclaimed hierarchy is within guarantee so
> -  *we fallback to ignore the guarantee because other option
> -  *would be the OOM
> +  * a) the whole reclaimed hierarchy is within soft guarantee so
> +  *we are switching to the hard guarantee reclaim target
>* b) multiple reclaimers are racing and so the first round
>*should be retried
>*/
> - if (mem_cgroup_all_within_guarantee(sc->target_mem_cgroup))
> - honor_guarantee = false;
> + if (mem_cgroup_all_within_guarantee(sc->target_mem_cgroup,
> + soft_guarantee)) {
> + /*
> +  * Nothing to reclaim even with hard guarantees so
> +  * we have to OOM
> +  */
> + if (!soft_guarantee)
> + break;
> + soft_guarantee = false;
> + }
>   }
>  }
>  
> @@ -2574,7 +2579,8 @@ out:
>* If the target memcg is not eligible for reclaim then we have no 
> option
>* but OOM
>*/
> - if (!sc->nr_scanned && 
> mem_cgroup_all_within_guarantee(sc->target_mem_cgroup))
> + if (!sc->nr_scanned &&
> + mem_cgroup_all_within_guarantee(sc->target_mem_cgroup, 
> false))
>   return 0;

This code is truly dreadful.

Don't call it guarantee