Re: [PATCH RFC - TAKE TWO - 02/12] block, bfq: add full hierarchical scheduling and cgroups support

2014-05-30 Thread Paolo Valente

On 30 May 2014, at 17:37, Tejun Heo wrote:

> On Thu, May 29, 2014 at 11:05:33AM +0200, Paolo Valente wrote:
>> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
>> index 768fe44..cdd2528 100644
>> --- a/include/linux/cgroup_subsys.h
>> +++ b/include/linux/cgroup_subsys.h
>> @@ -39,6 +39,10 @@ SUBSYS(net_cls)
>> SUBSYS(blkio)
>> #endif
>> 
>> +#if IS_ENABLED(CONFIG_CGROUP_BFQIO)
>> +SUBSYS(bfqio)
>> +#endif
> 
> So, ummm, I don't think this is a good idea.  Why aren't you plugging
> into the blkcg infrastructure as cfq does?  Why does it need to be a
> separate controller?
> 

It does not, actually. It is just that when we implemented that part, there was
no blkcg infrastructure. After that, I went on experimenting with the
low-latency heuristics and all the other stuff. Finally, I decided to first
propose this new version of bfq, and then to deal with blkcg integration if
it is well received.

Thanks,
Paolo

> Thanks.
> 
> -- 
> tejun


--
Paolo Valente 
Algogroup
Dipartimento di Fisica, Informatica e Matematica
Via Campi, 213/B
41125 Modena - Italy  
homepage:  http://algogroup.unimore.it/people/paolo/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC - TAKE TWO - 02/12] block, bfq: add full hierarchical scheduling and cgroups support

2014-05-30 Thread Paolo Valente

On 30 May 2014, at 17:39, Tejun Heo wrote:

> On Fri, May 30, 2014 at 11:37:18AM -0400, Tejun Heo wrote:
>> On Thu, May 29, 2014 at 11:05:33AM +0200, Paolo Valente wrote:
>>> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
>>> index 768fe44..cdd2528 100644
>>> --- a/include/linux/cgroup_subsys.h
>>> +++ b/include/linux/cgroup_subsys.h
>>> @@ -39,6 +39,10 @@ SUBSYS(net_cls)
>>> SUBSYS(blkio)
>>> #endif
>>> 
>>> +#if IS_ENABLED(CONFIG_CGROUP_BFQIO)
>>> +SUBSYS(bfqio)
>>> +#endif
>> 
>> So, ummm, I don't think this is a good idea.  Why aren't you plugging
>> into the blkcg infrastructure as cfq does?  Why does it need to be a
>> separate controller?
> 
> If there's something which doesn't work for bfq in blkcg, please let
> me know.  I'd be happy to make it work.
> 

This will probably be very useful for us.

Thanks a lot,
Paolo

> Thanks.
> 
> -- 
> tejun


--
Paolo Valente 
Algogroup
Dipartimento di Fisica, Informatica e Matematica
Via Campi, 213/B
41125 Modena - Italy  
homepage:  http://algogroup.unimore.it/people/paolo/



Re: [PATCH RFC - TAKE TWO - 02/12] block, bfq: add full hierarchical scheduling and cgroups support

2014-05-30 Thread Tejun Heo
On Fri, May 30, 2014 at 11:37:18AM -0400, Tejun Heo wrote:
> On Thu, May 29, 2014 at 11:05:33AM +0200, Paolo Valente wrote:
> > diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> > index 768fe44..cdd2528 100644
> > --- a/include/linux/cgroup_subsys.h
> > +++ b/include/linux/cgroup_subsys.h
> > @@ -39,6 +39,10 @@ SUBSYS(net_cls)
> >  SUBSYS(blkio)
> >  #endif
> >  
> > +#if IS_ENABLED(CONFIG_CGROUP_BFQIO)
> > +SUBSYS(bfqio)
> > +#endif
> 
> So, ummm, I don't think this is a good idea.  Why aren't you plugging
> into the blkcg infrastructure as cfq does?  Why does it need to be a
> separate controller?

If there's something which doesn't work for bfq in blkcg, please let
me know.  I'd be happy to make it work.

Thanks.

-- 
tejun


Re: [PATCH RFC - TAKE TWO - 02/12] block, bfq: add full hierarchical scheduling and cgroups support

2014-05-30 Thread Tejun Heo
On Thu, May 29, 2014 at 11:05:33AM +0200, Paolo Valente wrote:
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index 768fe44..cdd2528 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -39,6 +39,10 @@ SUBSYS(net_cls)
>  SUBSYS(blkio)
>  #endif
>  
> +#if IS_ENABLED(CONFIG_CGROUP_BFQIO)
> +SUBSYS(bfqio)
> +#endif

So, ummm, I don't think this is a good idea.  Why aren't you plugging
into the blkcg infrastructure as cfq does?  Why does it need to be a
separate controller?

Thanks.

-- 
tejun


[PATCH RFC - TAKE TWO - 02/12] block, bfq: add full hierarchical scheduling and cgroups support

2014-05-29 Thread Paolo Valente
From: Fabio Checconi 

Complete support for full hierarchical scheduling, with a cgroups
interface. The name of the new subsystem is bfqio.

Weights can be assigned explicitly to groups and processes through the
cgroups interface, differently from what happens, for single
processes, if the cgroups interface is not used (as explained in the
description of patch 2). In particular, since each node has a full
scheduler, each group can be assigned its own weight.

Signed-off-by: Fabio Checconi 
Signed-off-by: Paolo Valente 
Signed-off-by: Arianna Avanzini 
---
 block/Kconfig.iosched |  13 +-
 block/bfq-cgroup.c| 891 ++
 block/bfq-iosched.c   |  66 ++--
 block/bfq-sched.c |  64 ++-
 block/bfq.h   | 122 +-
 include/linux/cgroup_subsys.h |   4 +
 6 files changed,  insertions(+), 49 deletions(-)
 create mode 100644 block/bfq-cgroup.c
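
The commit message says that each node has a full scheduler and each group
can be given its own weight. A minimal sketch of what that could mean for
bandwidth shares, assuming plain proportional sharing at every level of the
hierarchy. The struct and function names below are hypothetical and are not
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node; this is NOT the kernel's struct bfq_entity. */
struct node {
	const char *name;
	unsigned int weight;           /* weight of this group or process */
	const struct node *parent;     /* NULL for the root */
	unsigned int level_weight_sum; /* this node's weight plus its active siblings' */
};

/*
 * Assumed model: at each level a node receives weight / level_weight_sum of
 * its parent's bandwidth, so its share of the device is the product of those
 * ratios along the path to the root.
 */
static double effective_share(const struct node *n)
{
	double share = 1.0;

	for (; n != NULL && n->parent != NULL; n = n->parent) {
		assert(n->level_weight_sum > 0);
		share *= (double)n->weight / n->level_weight_sum;
	}
	return share;
}
```

For example, if the root has two child groups A (weight 3) and B (weight 1),
and A contains two processes of weight 1, then under this model A gets 0.75
of the device, B gets 0.25, and each process in A gets 0.5 * 0.75 = 0.375.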

diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 8f98cc7..a3675cb 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -46,7 +46,18 @@ config IOSCHED_BFQ
  The BFQ I/O scheduler tries to distribute bandwidth among all
  processes according to their weights.
  It aims at distributing the bandwidth as desired, regardless
- of the disk parameters and with any workload.
+ of the disk parameters and with any workload. If compiled
+ built-in (saying Y here), BFQ can be configured to support
+ hierarchical scheduling.
+
+config CGROUP_BFQIO
+   bool "BFQ hierarchical scheduling support"
+   depends on CGROUPS && IOSCHED_BFQ=y
+   default n
+   ---help---
+ Enable hierarchical scheduling in BFQ, using the cgroups
+ filesystem interface.  The name of the subsystem will be
+ bfqio.
 
 choice
prompt "Default I/O scheduler"
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
new file mode 100644
index 000..00a7a1b
--- /dev/null
+++ b/block/bfq-cgroup.c
@@ -0,0 +1,891 @@
+/*
+ * BFQ: CGROUPS support.
+ *
+ * Based on ideas and code from CFQ:
+ * Copyright (C) 2003 Jens Axboe 
+ *
+ * Copyright (C) 2008 Fabio Checconi 
+ *   Paolo Valente 
+ *
+ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
+ * file.
+ */
+
+#ifdef CONFIG_CGROUP_BFQIO
+
+static DEFINE_MUTEX(bfqio_mutex);
+
+static bool bfqio_is_removed(struct bfqio_cgroup *bgrp)
+{
+   return bgrp ? !bgrp->online : false;
+}
+
+static struct bfqio_cgroup bfqio_root_cgroup = {
+   .weight = BFQ_DEFAULT_GRP_WEIGHT,
+   .ioprio = BFQ_DEFAULT_GRP_IOPRIO,
+   .ioprio_class = BFQ_DEFAULT_GRP_CLASS,
+};
+
+static inline void bfq_init_entity(struct bfq_entity *entity,
+  struct bfq_group *bfqg)
+{
+   entity->weight = entity->new_weight;
+   entity->orig_weight = entity->new_weight;
+   entity->ioprio = entity->new_ioprio;
+   entity->ioprio_class = entity->new_ioprio_class;
+   entity->parent = bfqg->my_entity;
+   entity->sched_data = &bfqg->sched_data;
+}
+
+static struct bfqio_cgroup *css_to_bfqio(struct cgroup_subsys_state *css)
+{
+   return css ? container_of(css, struct bfqio_cgroup, css) : NULL;
+}
+
+/*
+ * Search the bfq_group for bfqd into the hash table (by now only a list)
+ * of bgrp.  Must be called under rcu_read_lock().
+ */
+static struct bfq_group *bfqio_lookup_group(struct bfqio_cgroup *bgrp,
+   struct bfq_data *bfqd)
+{
+   struct bfq_group *bfqg;
+   void *key;
+
+   hlist_for_each_entry_rcu(bfqg, &bgrp->group_data, group_node) {
+   key = rcu_dereference(bfqg->bfqd);
+   if (key == bfqd)
+   return bfqg;
+   }
+
+   return NULL;
+}
+
+static inline void bfq_group_init_entity(struct bfqio_cgroup *bgrp,
+struct bfq_group *bfqg)
+{
+   struct bfq_entity *entity = &bfqg->entity;
+
+   /*
+* If the weight of the entity has never been set via the sysfs
+* interface, then bgrp->weight == 0. In this case we initialize
+* the weight from the current ioprio value. Otherwise, the group
+* weight, if set, has priority over the ioprio value.
+*/
+   if (bgrp->weight == 0) {
+   entity->new_weight = bfq_ioprio_to_weight(bgrp->ioprio);
+   entity->new_ioprio = bgrp->ioprio;
+   } else {
+   entity->new_weight = bgrp->weight;
+   entity->new_ioprio = bfq_weight_to_ioprio(bgrp->weight);
+   }
+   entity->orig_weight = entity->weight = entity->new_weight;
+   entity->ioprio = entity->new_ioprio;
+   entity->ioprio_class = entity->new_ioprio_class = bgrp->ioprio_class;
+   entity->my_sched_data = &bfqg->sched_data;
+}
+
+static inline void bfq_group_set_parent(struct bfq_group *bfqg,
+   struct bfq_group *parent)
+{
+  
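
The fallback in bfq_group_init_entity() above (use the group weight if it was
set, otherwise derive the weight from the ioprio) can be sketched in
isolation as follows. The two conversion helpers are not shown in this
excerpt, so the linear mapping used here is an assumption, and the struct
names are hypothetical stand-ins rather than the kernel types.

```c
#include <assert.h>

#define IOPRIO_BE_NR 8 /* number of best-effort ioprio levels in the kernel */

/* ASSUMED conversion: a higher priority (lower ioprio) gives a larger weight. */
static unsigned short ioprio_to_weight(int ioprio)
{
	assert(ioprio >= 0 && ioprio < IOPRIO_BE_NR);
	return IOPRIO_BE_NR - ioprio;
}

/* ASSUMED inverse, clamped to 0 for weights above IOPRIO_BE_NR. */
static unsigned short weight_to_ioprio(int weight)
{
	return IOPRIO_BE_NR - weight < 0 ? 0 : IOPRIO_BE_NR - weight;
}

/* Hypothetical stand-ins for the bfqio_cgroup and bfq_entity fields involved. */
struct group_cfg {
	unsigned short weight; /* 0 means "never set explicitly" */
	unsigned short ioprio;
};

struct entity_cfg {
	unsigned short weight;
	unsigned short ioprio;
};

/* Mirrors the if/else in the patch: an explicit weight wins over the ioprio. */
static struct entity_cfg init_from_group(const struct group_cfg *g)
{
	struct entity_cfg e;

	if (g->weight == 0) {
		e.weight = ioprio_to_weight(g->ioprio);
		e.ioprio = g->ioprio;
	} else {
		e.weight = g->weight;
		e.ioprio = weight_to_ioprio(g->weight);
	}
	return e;
}
```

Under these assumed conversions, a group with no explicit weight and ioprio 4
ends up with weight 4, while a group with weight 100 keeps that weight and
its ioprio is clamped to 0.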
