Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-08 Thread Michal Hocko
On Thu 08-11-12 07:29:23, Tejun Heo wrote:
> Hey, Michal.
> 
> On Thu, Nov 08, 2012 at 04:20:39PM +0100, Michal Hocko wrote:
> > > So, in the above example in CPU2, (B->state & FREEZING) test and
> > > freezer_apply_state(C, false) can't be interleaved with the same
> > > inheritance operation from CPU1.  They either happen before or after.
> > 
> > I am not sure I understand what you mean by inheritance operation but
> 
> The operation of looking at one's parent and inherting the state.
> Here, checking parent->state and calling apply_state accordingly.
> 
> > you are right that the race is not possible because spin_lock makes
> > sure that Foo->state is done after the lock(Foo->child) and spin_unlock
> > then serves as a leave barrier so the other CPUs will see the correctly
> > updated value. The rest is handled by the fixed ordered tree walk. So
> > there is really no interleaving going on here.
> 
> I'm worried that this is a bit too subtle.

Dunno, it looks obvious now, I just missed the entry implicit
barriers by spinlocks and again sorry about the confusion.

> This will work fine with a single hierarchy mutex protecting hierarchy
> updates and state propagations through it and that should work for
> most controllers too.

But single mutex is just ugly.

> I want to have at least one example of finer grained locking for
> future reference and cgroup_freezer happened to be the one I started
> working on.

I think this is a good example because it shows how to share the state
without too many implementation details.

> So, it is more complicated (not necessarily in written code but the
> sync rules) than necessary.  I'm still not sure whether to keep it or
> not.

I think the locking is fine and I would keep it this way rather than a
big lock.

> I'll add more documentation about synchronization in
> cgroup_for_each_descendant_pre() either way.

more doc cannot hurt ;)

Thanks
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-08 Thread Tejun Heo
Hey, Michal.

On Thu, Nov 08, 2012 at 04:20:39PM +0100, Michal Hocko wrote:
> > So, in the above example in CPU2, (B->state & FREEZING) test and
> > freezer_apply_state(C, false) can't be interleaved with the same
> > inheritance operation from CPU1.  They either happen before or after.
> 
> I am not sure I understand what you mean by inheritance operation but

The operation of looking at one's parent and inherting the state.
Here, checking parent->state and calling apply_state accordingly.

> you are right that the race is not possible because spin_lock makes
> sure that Foo->state is done after the lock(Foo->child) and spin_unlock
> then serves as a leave barrier so the other CPUs will see the correctly
> updated value. The rest is handled by the fixed ordered tree walk. So
> there is really no interleaving going on here.

I'm worried that this is a bit too subtle.  This will work fine with a
single hierarchy mutex protecting hierarchy updates and state
propagations through it and that should work for most controllers too.
I want to have at least one example of finer grained locking for
future reference and cgroup_freezer happened to be the one I started
working on.  So, it is more complicated (not necessarily in written
code but the sync rules) than necessary.  I'm still not sure whether
to keep it or not.  I'll add more documentation about synchronization
in cgroup_for_each_descendant_pre() either way.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-08 Thread Michal Hocko
On Thu 08-11-12 06:18:48, Tejun Heo wrote:
> Hello, Michal.
> 
> On Thu, Nov 08, 2012 at 03:08:52PM +0100, Michal Hocko wrote:
> > This seems to be racy because parent->state access is not linearized.
> > Say we have parallel freeze and thawing on a tree like the following:
> > A
> > |
> > B
> > |
> > C 
> > 
> > pre_order will visit them in B, C order.
> > CPU1CPU2
> > freezer_apply_state(A, true)
> > A->state & FREEZING == true freezer_apply_state(A, false)
> > A->state & FREEZING == false
> > freezer_apply_state(B, false)
> > B->state & FREEZING == false
> > freezer_apply_state(B, true)
> > 
> > B->state & FREEZING == true
> > freezer_apply_state(C, true)
> > freezer_apply_state(C, false)
> > 
> > So A, C are thawed while B is frozen. Or am I missing something which
> > would prevent from this kind of race?
> 
> The rule is that there will be at least one inheritance operation
> after a parent is updated.  The exact order of propagation doesn't
> matter as long as there's at least one inheritance event after the
> latest update to a parent.  This works because inheritance operations
> are atomic to each other.  If one inheritance operation "saw" an
> update to its parent, the next inheritance operation is guaranteed to
> see at least upto that update.
> 
> So, in the above example in CPU2, (B->state & FREEZING) test and
> freezer_apply_state(C, false) can't be interleaved with the same
> inheritance operation from CPU1.  They either happen before or after.

I am not sure I understand what you mean by inheritance operation but
you are right that the race is not possible because spin_lock makes
sure that Foo->state is done after the lock(Foo->child) and spin_unlock
then serves as a leave barrier so the other CPUs will see the correctly
updated value. The rest is handled by the fixed ordered tree walk. So
there is really no interleaving going on here.

Sorry about the confusion!

Feel free to add my Reviewed-by.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-08 Thread Tejun Heo
Hello, Michal.

On Thu, Nov 08, 2012 at 03:08:52PM +0100, Michal Hocko wrote:
> This seems to be racy because parent->state access is not linearized.
> Say we have parallel freeze and thawing on a tree like the following:
>   A
>   |
>   B
> |
> C 
> 
> pre_order will visit them in B, C order.
>   CPU1CPU2
> freezer_apply_state(A, true)
> A->state & FREEZING == true   freezer_apply_state(A, false)
>   A->state & FREEZING == false
>   freezer_apply_state(B, false)
>   B->state & FREEZING == false
> freezer_apply_state(B, true)
> 
> B->state & FREEZING == true
> freezer_apply_state(C, true)
>   freezer_apply_state(C, false)
> 
> So A, C are thawed while B is frozen. Or am I missing something which
> would prevent from this kind of race?

The rule is that there will be at least one inheritance operation
after a parent is updated.  The exact order of propagation doesn't
matter as long as there's at least one inheritance event after the
latest update to a parent.  This works because inheritance operations
are atomic to each other.  If one inheritance operation "saw" an
update to its parent, the next inheritance operation is guaranteed to
see at least upto that update.

So, in the above example in CPU2, (B->state & FREEZING) test and
freezer_apply_state(C, false) can't be interleaved with the same
inheritance operation from CPU1.  They either happen before or after.

Maybe it's too subtle.  The only reason I didn't use a giant
freezer_mutex was that I wanted to demonstrate how to do state
propagation without depending on single giant lock.  Maybe nobody
wants that and this should use one mutex to protect the hierarchy.  I
don't know.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-08 Thread Michal Hocko
On Wed 07-11-12 08:39:19, Tejun Heo wrote:
[...]
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
[...]
> @@ -320,14 +388,39 @@ static void freezer_apply_state(struct f
>   * @freezer: freezer of interest
>   * @freeze: whether to freeze or thaw
>   *
> - * Freeze or thaw @cgroup according to @freeze.
> + * Freeze or thaw @freezer according to @freeze.  The operations are
> + * recursive - all descendants of @freezer will be affected.
>   */
>  static void freezer_change_state(struct freezer *freezer, bool freeze)
>  {
> + struct cgroup *pos;
> +
>   /* update @freezer */
>   spin_lock_irq(>lock);
>   freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
>   spin_unlock_irq(>lock);
> +
> + /*
> +  * Update all its descendants in pre-order traversal.  Each
> +  * descendant will try to inherit its parent's FREEZING state as
> +  * CGROUP_FREEZING_PARENT.
> +  */
> + rcu_read_lock();
> + cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) {
> + struct freezer *pos_f = cgroup_freezer(pos);
> + struct freezer *parent = parent_freezer(pos_f);
> +
> + /*
> +  * Our update to @parent->state is already visible which is
> +  * all we need.  No need to lock @parent.  For more info on
> +  * synchronization, see freezer_post_create().
> +  */
> + spin_lock_irq(_f->lock);
> + freezer_apply_state(pos_f, parent->state & CGROUP_FREEZING,
> + CGROUP_FREEZING_PARENT);
> + spin_unlock_irq(_f->lock);
> + }
> + rcu_read_unlock();
>  }

This seems to be racy because parent->state access is not linearized.
Say we have parallel freeze and thawing on a tree like the following:
A
|
B
|
C 

pre_order will visit them in B, C order.
CPU1CPU2
freezer_apply_state(A, true)
A->state & FREEZING == true freezer_apply_state(A, false)
A->state & FREEZING == false
freezer_apply_state(B, false)
B->state & FREEZING == false
freezer_apply_state(B, true)

B->state & FREEZING == true
freezer_apply_state(C, true)
freezer_apply_state(C, false)

So A, C are thawed while B is frozen. Or am I missing something which
would prevent from this kind of race?

The easy fix is to move spin_unlock_irq(>lock); after
rcu_read_unlock.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-08 Thread Michal Hocko
On Wed 07-11-12 08:39:19, Tejun Heo wrote:
[...]
 --- a/kernel/cgroup_freezer.c
 +++ b/kernel/cgroup_freezer.c
[...]
 @@ -320,14 +388,39 @@ static void freezer_apply_state(struct f
   * @freezer: freezer of interest
   * @freeze: whether to freeze or thaw
   *
 - * Freeze or thaw @cgroup according to @freeze.
 + * Freeze or thaw @freezer according to @freeze.  The operations are
 + * recursive - all descendants of @freezer will be affected.
   */
  static void freezer_change_state(struct freezer *freezer, bool freeze)
  {
 + struct cgroup *pos;
 +
   /* update @freezer */
   spin_lock_irq(freezer-lock);
   freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
   spin_unlock_irq(freezer-lock);
 +
 + /*
 +  * Update all its descendants in pre-order traversal.  Each
 +  * descendant will try to inherit its parent's FREEZING state as
 +  * CGROUP_FREEZING_PARENT.
 +  */
 + rcu_read_lock();
 + cgroup_for_each_descendant_pre(pos, freezer-css.cgroup) {
 + struct freezer *pos_f = cgroup_freezer(pos);
 + struct freezer *parent = parent_freezer(pos_f);
 +
 + /*
 +  * Our update to @parent-state is already visible which is
 +  * all we need.  No need to lock @parent.  For more info on
 +  * synchronization, see freezer_post_create().
 +  */
 + spin_lock_irq(pos_f-lock);
 + freezer_apply_state(pos_f, parent-state  CGROUP_FREEZING,
 + CGROUP_FREEZING_PARENT);
 + spin_unlock_irq(pos_f-lock);
 + }
 + rcu_read_unlock();
  }

This seems to be racy because parent-state access is not linearized.
Say we have parallel freeze and thawing on a tree like the following:
A
|
B
|
C 

pre_order will visit them in B, C order.
CPU1CPU2
freezer_apply_state(A, true)
A-state  FREEZING == true freezer_apply_state(A, false)
A-state  FREEZING == false
freezer_apply_state(B, false)
B-state  FREEZING == false
freezer_apply_state(B, true)

B-state  FREEZING == true
freezer_apply_state(C, true)
freezer_apply_state(C, false)

So A, C are thawed while B is frozen. Or am I missing something which
would prevent from this kind of race?

The easy fix is to move spin_unlock_irq(freezer-lock); after
rcu_read_unlock.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-08 Thread Tejun Heo
Hello, Michal.

On Thu, Nov 08, 2012 at 03:08:52PM +0100, Michal Hocko wrote:
 This seems to be racy because parent-state access is not linearized.
 Say we have parallel freeze and thawing on a tree like the following:
   A
   |
   B
 |
 C 
 
 pre_order will visit them in B, C order.
   CPU1CPU2
 freezer_apply_state(A, true)
 A-state  FREEZING == true   freezer_apply_state(A, false)
   A-state  FREEZING == false
   freezer_apply_state(B, false)
   B-state  FREEZING == false
 freezer_apply_state(B, true)
 
 B-state  FREEZING == true
 freezer_apply_state(C, true)
   freezer_apply_state(C, false)
 
 So A, C are thawed while B is frozen. Or am I missing something which
 would prevent from this kind of race?

The rule is that there will be at least one inheritance operation
after a parent is updated.  The exact order of propagation doesn't
matter as long as there's at least one inheritance event after the
latest update to a parent.  This works because inheritance operations
are atomic to each other.  If one inheritance operation saw an
update to its parent, the next inheritance operation is guaranteed to
see at least upto that update.

So, in the above example in CPU2, (B-state  FREEZING) test and
freezer_apply_state(C, false) can't be interleaved with the same
inheritance operation from CPU1.  They either happen before or after.

Maybe it's too subtle.  The only reason I didn't use a giant
freezer_mutex was that I wanted to demonstrate how to do state
propagation without depending on single giant lock.  Maybe nobody
wants that and this should use one mutex to protect the hierarchy.  I
don't know.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-08 Thread Michal Hocko
On Thu 08-11-12 06:18:48, Tejun Heo wrote:
 Hello, Michal.
 
 On Thu, Nov 08, 2012 at 03:08:52PM +0100, Michal Hocko wrote:
  This seems to be racy because parent-state access is not linearized.
  Say we have parallel freeze and thawing on a tree like the following:
  A
  |
  B
  |
  C 
  
  pre_order will visit them in B, C order.
  CPU1CPU2
  freezer_apply_state(A, true)
  A-state  FREEZING == true freezer_apply_state(A, false)
  A-state  FREEZING == false
  freezer_apply_state(B, false)
  B-state  FREEZING == false
  freezer_apply_state(B, true)
  
  B-state  FREEZING == true
  freezer_apply_state(C, true)
  freezer_apply_state(C, false)
  
  So A, C are thawed while B is frozen. Or am I missing something which
  would prevent from this kind of race?
 
 The rule is that there will be at least one inheritance operation
 after a parent is updated.  The exact order of propagation doesn't
 matter as long as there's at least one inheritance event after the
 latest update to a parent.  This works because inheritance operations
 are atomic to each other.  If one inheritance operation saw an
 update to its parent, the next inheritance operation is guaranteed to
 see at least upto that update.
 
 So, in the above example in CPU2, (B-state  FREEZING) test and
 freezer_apply_state(C, false) can't be interleaved with the same
 inheritance operation from CPU1.  They either happen before or after.

I am not sure I understand what you mean by inheritance operation but
you are right that the race is not possible because spin_lock makes
sure that Foo-state is done after the lock(Foo-child) and spin_unlock
then serves as a leave barrier so the other CPUs will see the correctly
updated value. The rest is handled by the fixed ordered tree walk. So
there is really no interleaving going on here.

Sorry about the confusion!

Feel free to add my Reviewed-by.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-08 Thread Tejun Heo
Hey, Michal.

On Thu, Nov 08, 2012 at 04:20:39PM +0100, Michal Hocko wrote:
  So, in the above example in CPU2, (B-state  FREEZING) test and
  freezer_apply_state(C, false) can't be interleaved with the same
  inheritance operation from CPU1.  They either happen before or after.
 
 I am not sure I understand what you mean by inheritance operation but

The operation of looking at one's parent and inherting the state.
Here, checking parent-state and calling apply_state accordingly.

 you are right that the race is not possible because spin_lock makes
 sure that Foo-state is done after the lock(Foo-child) and spin_unlock
 then serves as a leave barrier so the other CPUs will see the correctly
 updated value. The rest is handled by the fixed ordered tree walk. So
 there is really no interleaving going on here.

I'm worried that this is a bit too subtle.  This will work fine with a
single hierarchy mutex protecting hierarchy updates and state
propagations through it and that should work for most controllers too.
I want to have at least one example of finer grained locking for
future reference and cgroup_freezer happened to be the one I started
working on.  So, it is more complicated (not necessarily in written
code but the sync rules) than necessary.  I'm still not sure whether
to keep it or not.  I'll add more documentation about synchronization
in cgroup_for_each_descendant_pre() either way.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-08 Thread Michal Hocko
On Thu 08-11-12 07:29:23, Tejun Heo wrote:
 Hey, Michal.
 
 On Thu, Nov 08, 2012 at 04:20:39PM +0100, Michal Hocko wrote:
   So, in the above example in CPU2, (B-state  FREEZING) test and
   freezer_apply_state(C, false) can't be interleaved with the same
   inheritance operation from CPU1.  They either happen before or after.
  
  I am not sure I understand what you mean by inheritance operation but
 
 The operation of looking at one's parent and inherting the state.
 Here, checking parent-state and calling apply_state accordingly.
 
  you are right that the race is not possible because spin_lock makes
  sure that Foo-state is done after the lock(Foo-child) and spin_unlock
  then serves as a leave barrier so the other CPUs will see the correctly
  updated value. The rest is handled by the fixed ordered tree walk. So
  there is really no interleaving going on here.
 
 I'm worried that this is a bit too subtle.

Dunno, it looks obvious now, I just missed the entryleave implicit
barriers by spinlocks and again sorry about the confusion.

 This will work fine with a single hierarchy mutex protecting hierarchy
 updates and state propagations through it and that should work for
 most controllers too.

But single mutex is just ugly.

 I want to have at least one example of finer grained locking for
 future reference and cgroup_freezer happened to be the one I started
 working on.

I think this is a good example because it shows how to share the state
without too many implementation details.

 So, it is more complicated (not necessarily in written code but the
 sync rules) than necessary.  I'm still not sure whether to keep it or
 not.

I think the locking is fine and I would keep it this way rather than a
big lock.

 I'll add more documentation about synchronization in
 cgroup_for_each_descendant_pre() either way.

more doc cannot hurt ;)

Thanks
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-07 Thread Tejun Heo
Up until now, cgroup_freezer didn't implement hierarchy properly.
cgroups could be arranged in hierarchy but it didn't make any
difference in how each cgroup_freezer behaved.  They all operated
separately.

This patch implements proper hierarchy support.  If a cgroup is
frozen, all its descendants are frozen.  A cgroup is thawed iff it and
all its ancestors are THAWED.  freezer.self_freezing shows the current
freezing state for the cgroup itself.  freezer.parent_freezing shows
whether the cgroup is freezing because any of its ancestors is
freezing.

freezer_post_create() locks the parent and new cgroup and inherits the
parent's state and freezer_change_state() applies new state top-down
using cgroup_for_each_descendant_pre() which guarantees that no child
can escape its parent's state.  update_if_frozen() uses
cgroup_for_each_descendant_post() to propagate frozen states
bottom-up.

Synchronization could be coarser and easier by using a single mutex to
protect all hierarchy operations.  Finer grained approach was used
because it wasn't too difficult for cgroup_freezer and I think it's
beneficial to have an example implementation and cgroup_freezer is
rather simple and can serve a good one.

As this makes cgroup_freezer properly hierarchical,
freezer_subsys.broken_hierarchy marking is removed.

Note that this patch changes userland visible behavior - freezing a
cgroup now freezes all its descendants too.  This behavior change is
intended and has been warned via .broken_hierarchy.

v2: Michal spotted a bug in freezer_change_state() - descendants were
inheriting from the wrong ancestor.  Fixed.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup_freezer.c |  161 
 1 file changed, 123 insertions(+), 38 deletions(-)

--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -22,6 +22,13 @@
 #include 
 #include 
 
+/*
+ * A cgroup is freezing if any FREEZING flags are set.  FREEZING_SELF is
+ * set if "FROZEN" is written to freezer.state cgroupfs file, and cleared
+ * for "THAWED".  FREEZING_PARENT is set if the parent freezer is FREEZING
+ * for whatever reason.  IOW, a cgroup has FREEZING_PARENT set if one of
+ * its ancestors has FREEZING_SELF set.
+ */
 enum freezer_state_flags {
CGROUP_FREEZER_ONLINE   = (1 << 0), /* freezer is fully online */
CGROUP_FREEZING_SELF= (1 << 1), /* this freezer is freezing */
@@ -50,6 +57,15 @@ static inline struct freezer *task_freez
struct freezer, css);
 }
 
+static struct freezer *parent_freezer(struct freezer *freezer)
+{
+   struct cgroup *pcg = freezer->css.cgroup->parent;
+
+   if (pcg)
+   return cgroup_freezer(pcg);
+   return NULL;
+}
+
 bool cgroup_freezing(struct task_struct *task)
 {
bool ret;
@@ -74,17 +90,6 @@ static const char *freezer_state_strs(un
return "THAWED";
 };
 
-/*
- * State diagram
- * Transitions are caused by userspace writes to the freezer.state file.
- * The values in parenthesis are state labels. The rest are edge labels.
- *
- * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN)
- *^ ^| |
- *| \___THAWED___/ |
- *\__THAWED/
- */
-
 struct cgroup_subsys freezer_subsys;
 
 static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
@@ -103,15 +108,34 @@ static struct cgroup_subsys_state *freez
  * freezer_post_create - commit creation of a freezer cgroup
  * @cgroup: cgroup being created
  *
- * We're committing to creation of @cgroup.  Mark it online.
+ * We're committing to creation of @cgroup.  Mark it online and inherit
+ * parent's freezing state while holding both parent's and our
+ * freezer->lock.
  */
 static void freezer_post_create(struct cgroup *cgroup)
 {
struct freezer *freezer = cgroup_freezer(cgroup);
+   struct freezer *parent = parent_freezer(freezer);
+
+   /*
+* The following double locking and freezing state inheritance
+* guarantee that @cgroup can never escape ancestors' freezing
+* states.  See cgroup_for_each_descendant_pre() for details.
+*/
+   if (parent)
+   spin_lock_irq(>lock);
+   spin_lock_nested(>lock, SINGLE_DEPTH_NESTING);
 
-   spin_lock_irq(>lock);
freezer->state |= CGROUP_FREEZER_ONLINE;
-   spin_unlock_irq(>lock);
+
+   if (parent && (parent->state & CGROUP_FREEZING)) {
+   freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
+   atomic_inc(_freezing_cnt);
+   }
+
+   spin_unlock(>lock);
+   if (parent)
+   spin_unlock_irq(>lock);
 }
 
 /**
@@ -153,6 +177,7 @@ static void freezer_attach(struct cgroup
 {
struct freezer *freezer = cgroup_freezer(new_cgrp);
struct task_struct *task;
+   bool clear_frozen = false;
 
spin_lock_irq(>lock);
 
@@ -172,10 +197,25 @@ static void 

[PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support

2012-11-07 Thread Tejun Heo
Up until now, cgroup_freezer didn't implement hierarchy properly.
cgroups could be arranged in hierarchy but it didn't make any
difference in how each cgroup_freezer behaved.  They all operated
separately.

This patch implements proper hierarchy support.  If a cgroup is
frozen, all its descendants are frozen.  A cgroup is thawed iff it and
all its ancestors are THAWED.  freezer.self_freezing shows the current
freezing state for the cgroup itself.  freezer.parent_freezing shows
whether the cgroup is freezing because any of its ancestors is
freezing.

freezer_post_create() locks the parent and new cgroup and inherits the
parent's state and freezer_change_state() applies new state top-down
using cgroup_for_each_descendant_pre() which guarantees that no child
can escape its parent's state.  update_if_frozen() uses
cgroup_for_each_descendant_post() to propagate frozen states
bottom-up.

Synchronization could be coarser and easier by using a single mutex to
protect all hierarchy operations.  Finer grained approach was used
because it wasn't too difficult for cgroup_freezer and I think it's
beneficial to have an example implementation and cgroup_freezer is
rather simple and can serve a good one.

As this makes cgroup_freezer properly hierarchical,
freezer_subsys.broken_hierarchy marking is removed.

Note that this patch changes userland visible behavior - freezing a
cgroup now freezes all its descendants too.  This behavior change is
intended and has been warned via .broken_hierarchy.

v2: Michal spotted a bug in freezer_change_state() - descendants were
inheriting from the wrong ancestor.  Fixed.

Signed-off-by: Tejun Heo t...@kernel.org
---
 kernel/cgroup_freezer.c |  161 
 1 file changed, 123 insertions(+), 38 deletions(-)

--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -22,6 +22,13 @@
 #include linux/freezer.h
 #include linux/seq_file.h
 
+/*
+ * A cgroup is freezing if any FREEZING flags are set.  FREEZING_SELF is
+ * set if FROZEN is written to freezer.state cgroupfs file, and cleared
+ * for THAWED.  FREEZING_PARENT is set if the parent freezer is FREEZING
+ * for whatever reason.  IOW, a cgroup has FREEZING_PARENT set if one of
+ * its ancestors has FREEZING_SELF set.
+ */
 enum freezer_state_flags {
CGROUP_FREEZER_ONLINE   = (1  0), /* freezer is fully online */
CGROUP_FREEZING_SELF= (1  1), /* this freezer is freezing */
@@ -50,6 +57,15 @@ static inline struct freezer *task_freez
struct freezer, css);
 }
 
+static struct freezer *parent_freezer(struct freezer *freezer)
+{
+   struct cgroup *pcg = freezer-css.cgroup-parent;
+
+   if (pcg)
+   return cgroup_freezer(pcg);
+   return NULL;
+}
+
 bool cgroup_freezing(struct task_struct *task)
 {
bool ret;
@@ -74,17 +90,6 @@ static const char *freezer_state_strs(un
return THAWED;
 };
 
-/*
- * State diagram
- * Transitions are caused by userspace writes to the freezer.state file.
- * The values in parenthesis are state labels. The rest are edge labels.
- *
- * (THAWED) --FROZEN-- (FREEZING) --FROZEN-- (FROZEN)
- *^ ^| |
- *| \___THAWED___/ |
- *\__THAWED/
- */
-
 struct cgroup_subsys freezer_subsys;
 
 static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
@@ -103,15 +108,34 @@ static struct cgroup_subsys_state *freez
  * freezer_post_create - commit creation of a freezer cgroup
  * @cgroup: cgroup being created
  *
- * We're committing to creation of @cgroup.  Mark it online.
+ * We're committing to creation of @cgroup.  Mark it online and inherit
+ * parent's freezing state while holding both parent's and our
+ * freezer-lock.
  */
 static void freezer_post_create(struct cgroup *cgroup)
 {
struct freezer *freezer = cgroup_freezer(cgroup);
+   struct freezer *parent = parent_freezer(freezer);
+
+   /*
+* The following double locking and freezing state inheritance
+* guarantee that @cgroup can never escape ancestors' freezing
+* states.  See cgroup_for_each_descendant_pre() for details.
+*/
+   if (parent)
+   spin_lock_irq(parent-lock);
+   spin_lock_nested(freezer-lock, SINGLE_DEPTH_NESTING);
 
-   spin_lock_irq(freezer-lock);
freezer-state |= CGROUP_FREEZER_ONLINE;
-   spin_unlock_irq(freezer-lock);
+
+   if (parent  (parent-state  CGROUP_FREEZING)) {
+   freezer-state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
+   atomic_inc(system_freezing_cnt);
+   }
+
+   spin_unlock(freezer-lock);
+   if (parent)
+   spin_unlock_irq(parent-lock);
 }
 
 /**
@@ -153,6 +177,7 @@ static void freezer_attach(struct cgroup
 {
struct freezer *freezer = cgroup_freezer(new_cgrp);
struct task_struct *task;
+   bool clear_frozen = false;