Re: [Xen-devel] [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues

2018-04-24 Thread Praveen Kumar

On Tuesday 24 April 2018 04:33 PM, Dario Faggioli wrote:
> On Tue, 2018-04-24 at 14:30 +0530, Praveen Kumar wrote:
> > Hi Dario,
> > 
> Hi!
> 
> > On Tuesday 17 April 2018 04:16 PM, Dario Faggioli wrote:
> > > On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
> > > > 
> > > > +if ( svc->credit < entry->credit )
> > > > +node = &parent->rb_left;
> > > > +else
> > > > +node = &parent->rb_right;
> > > > +
> > > > +(*pos)++;
> > > > +}
> > > > +rb_link_node(&svc->runq_elem, parent, node);
> > > > +rb_insert_color(&svc->runq_elem, root);
> > > > +}
> > > > 
> > > Wait, where's the part where we cache which element is the one
> > > with the highest credits? (And the same applies to the
> > > tree-removal function, of course.)
> > > 
> > > In fact, we need a field for storing such a cache in the
> > > runqueue data structure as well, and we need to keep it updated.
> > 
> > I thought of caching the leftmost node as done in rb_tree, but I
> > took the easy way to get things working first, and deferred
> > porting the Linux rb_tree caching variant to a next patch or so.
> > 
> That is fine, as long as the fact that you are not doing it right
> now, but planning to do it in another patch, is stated somewhere
> (e.g., in the cover letter or a changelog).
> 
> > > I would suggest we do not try to use the rb_*_cached()
> > > functions, and cache rightmost explicitly in runqueue_data.
> > 
> > Ok, that sounds better. I will introduce an entry for the
> > rightmost element to be cached in runqueue_data.
> > Also, let's port the Linux rb_tree cache variant as well (probably
> > in future we may use that).
> > 
> I'm not sure about this last part. I mean, I can see other uses of
> rb-trees, TBH, and ones where such caching would be useful. Still,
> I'll do the port when we actually decide to use the new
> functionality (or when, e.g., we run into issues retro-fitting a
> Linux fix, etc.).
> 

Sure, I am fine with that too.

> > > Err... is it? Isn't the leftmost element the one with the
> > > _least_ credits? It looks to me that we want rb_last().
> > > 
> > > And IAC, we don't want to have to traverse the tree to get the
> > > runnable vcpu with the highest credit, we want it available in
> > > O(1) time.
> > > 
> > > That's why we want to cache it.
> > 
> > Yes, it looks like an error. Will update the patch in v2.
> > 
> Right. So, I think the main problem with this patch was this one,
> i.e., the fact that the runqueue was sorted in the wrong order.
> 
> Then there is the lack of caching, for O(1) access to the head of
> the runqueue itself. As said, it is ok for that to come in its own
> patch of this series. It is also ok if this comes as a later patch,
> as long as that is clearly stated.
> 

Ok. Working on the same.

> > Sure, let me have a series of 3 patches: first, the Linux porting;
> > second, the rb_tree changes without caching; and third, caching of
> > the rightmost node.
> > 
> I'd actually skip doing the porting right now... Although, in this
> case, it's not really my call, and others may have a different
> opinion.
> 
Sure, I am fine with not including the porting work as part of this
patch. Probably we can take this up as a separate activity, as you
mentioned above.

Regards,

~Praveen.


Re: [Xen-devel] [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues

2018-04-24 Thread Dario Faggioli
On Tue, 2018-04-24 at 14:30 +0530, Praveen Kumar wrote:
> Hi Dario,
> 
Hi!

> On Tuesday 17 April 2018 04:16 PM, Dario Faggioli wrote:
> > On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
> > > 
> > > +if ( svc->credit < entry->credit )
> > > +node = &parent->rb_left;
> > > +else
> > > +node = &parent->rb_right;
> > > +
> > > +(*pos)++;
> > > +}
> > > +rb_link_node(&svc->runq_elem, parent, node);
> > > +rb_insert_color(&svc->runq_elem, root);
> > > +}
> > > 
> > Wait, where's the part where we cache which element is the one
> > with the highest credits? (And the same applies to the
> > tree-removal function, of course.)
> > 
> > In fact, we need a field for storing such a cache in the runqueue
> > data structure as well, and we need to keep it updated.
> 
> I thought of caching the leftmost node as done in rb_tree, but I
> took the easy way to get things working first, and deferred porting
> the Linux rb_tree caching variant to a next patch or so.
>
That is fine, as long as the fact that you are not doing it right now,
but planning to do it in another patch, is stated somewhere (e.g., in
the cover letter or a changelog).

> > I would suggest we do not try to use the rb_*_cached() functions,
> > and cache rightmost explicitly in runqueue_data.
> 
> Ok, that sounds better. I will introduce an entry for the rightmost
> element to be cached in runqueue_data.
> Also, let's port the Linux rb_tree cache variant as well (probably
> in future we may use that).
>
I'm not sure about this last part. I mean, I can see other uses of rb-
trees, TBH, and ones where such caching would be useful. Still, I'll do
the port when we actually decide to use the new functionality (or
when, e.g., we run into issues retro-fitting a Linux fix, etc.).

> > Err... is it? Isn't the leftmost element the one with the _least_
> > credits? It looks to me that we want rb_last().
> > 
> > And IAC, we don't want to have to traverse the tree to get the
> > runnable vcpu with the highest credit, we want it available in
> > O(1) time.
> > 
> > That's why we want to cache it.
> 
> Yes, it looks like an error. Will update the patch in v2.
>
Right. So, I think the main problem with this patch was this one, i.e.,
the fact that the runqueue was sorted in the wrong order.

Then there is the lack of caching, for O(1) access to the head of the
runqueue itself. As said, it is ok for that to come in its own patch of
this series. It is also ok if this comes as a later patch, as long as
that is clearly stated.

> Sure, let me have a series of 3 patches: first, the Linux porting;
> second, the rb_tree changes without caching; and third, caching of
> the rightmost node.
> 
I'd actually skip doing the porting right now... Although, in this
case, it's not really my call, and others may have a different opinion.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


Re: [Xen-devel] [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues

2018-04-24 Thread Praveen Kumar

Hi Dario,

On Tuesday 17 April 2018 04:16 PM, Dario Faggioli wrote:
> On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
> > The patch optimizes the sorted credit2 runq, moving it from a
> > simple linked list to an rb-tree implementation. This way we gain
> > performance and scalability when the number of vCPUs is huge.
> > 
> > Signed-off-by: Praveen Kumar 
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > @@ -600,6 +601,29 @@ static inline bool has_cap(const struct
> > csched2_vcpu *svc)
> >  return svc->budget != STIME_MAX;
> >  }
> >  
> > +static void runq_insert_rb(struct rb_root *root,
> > +   struct csched2_vcpu *svc,
> > +   int *pos)
> > 
> I'd call this rb_insert() or rb_runq_insert(), but that's taste (and
> it's certainly a minor thing).
> 

Sure, I would prefer rb_runq_insert().

> > +{
> > +struct csched2_vcpu *entry = NULL;
> > +struct rb_node **node = &root->rb_node;
> > +struct rb_node *parent = NULL;
> > +
> > +while (*node) {
> > +parent = *node;
> > +entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
> > +// Check if we are maintaining the sorted
> > 
> Pointless comment. I'd leave the line blank (but that's taste again).
> 

Will remove this.

> > +if ( svc->credit < entry->credit )
> > +node = &parent->rb_left;
> > +else
> > +node = &parent->rb_right;
> > +
> > +(*pos)++;
> > +}
> > +rb_link_node(&svc->runq_elem, parent, node);
> > +rb_insert_color(&svc->runq_elem, root);
> > +}
> > 
> Wait, where's the part where we cache which element is the one with
> the highest credits? (And the same applies to the tree-removal
> function, of course.)
> 
> In fact, we need a field for storing such a cache in the runqueue
> data structure as well, and we need to keep it updated.
> 
> Linux (recently) added an rb-tree variant that does this internally,
> see cd9e61ed1eebb "rbtree: cache leftmost node internally", and how
> such a variant is used, e.g., 2161573ecd693 "sched/deadline: replace
> earliest dl and rq leftmost caching".
> 

I thought of caching the leftmost node as done in rb_tree, but I took
the easy way to get things working first, and deferred porting the
Linux rb_tree caching variant to a next patch or so.

> So, I'd say that we either import that from there, or do the caching
> of the leftmost element explicitly where needed. Note, however, that
> Linux's variant caches the leftmost element because they always use
> rb-trees for structures where the head of the queue is the element
> with the _smallest_ key.
> 
> In our case here, we want the queue ordered in descending credit
> order, so we must think carefully about whether we could use the new
> variant (maybe "just" inverting the binary search tree relationship),
> or we'd need another one that caches the rightmost element.
> 
> I would suggest we do not try to use the rb_*_cached() functions, and
> cache rightmost explicitly in runqueue_data.
> 

Ok, that sounds better. I will introduce an entry for the rightmost
element to be cached in runqueue_data.
Also, let's port the Linux rb_tree cache variant as well (probably in
future we may use that).

> > @@ -3201,17 +3225,21 @@ csched2_runtime(const struct scheduler
> > *ops, int cpu,
> >   * 2) If there's someone waiting whose credit is positive,
> >   *run until your credit ~= his.
> >   */
> > -if ( ! list_empty(runq) )
> > +if ( ! RB_EMPTY_ROOT(runq) )
> >  {
> > -struct csched2_vcpu *swait = runq_elem(runq->next);
> > +// Find the left most element, which is the most probable
> > candidate
> > +struct rb_node *node = rb_first(runq);
> > 
> Err... is it? Isn't the leftmost element the one with the _least_
> credits? It looks to me that we want rb_last().
> 
> And IAC, we don't want to have to traverse the tree to get the
> runnable vcpu with the highest credit, we want it available in O(1)
> time.
> 
> That's why we want to cache it.
> 

Yes, it looks like an error. Will update the patch in v2.

> > +// TODO Can we take rb_next ?
> > +//struct rb_node *node = &rb_next(root->rb_node);
> > +
> > 
> What do you mean here?
> 

This won't matter if we do caching.

> > +struct csched2_vcpu *swait = runq_elem(node);
> >  if ( ! is_idle_vcpu(swait->vcpu)
> >   && swait->credit > 0 )
> >  {
> >  rt_credit = snext->credit - swait->credit;
> >  }
> >  }
> > 
> > @@ -3345,9 +3373,8 @@ runq_candidate(struct csched2_runqueue_data
> > *rqd,
> >  snext = csched2_vcpu(idle_vcpu[cpu]);
> >  
> >   check_runq:
> > -list_for_each_safe( iter, temp, &rqd->runq )
> > -{
> > -struct csched2_vcpu * svc = list_entry(iter, struct
> > csched2_vcpu, runq_elem);
> > +for (iter = rb_first(&rqd->runq); iter != NULL; iter =
> > rb_next(iter)) {
> > +struct csched2_vcpu * svc = rb_entry(iter, struct
> > csched2_vcpu, runq_elem);
> > 
> Same as above. I don't think this is where we want to start from.
> And no matter whether we want to start from rb_first() or rb_last(),
> we certainly don't want to have to traverse the tree to get there.
> 
> > @@ -3761,8 +3789,8 @@ csched2_dump(const struct scheduler *ops)
> >  dump_pcpu(ops, j);
> >  
> >  printk("RUNQ:\n");

Re: [Xen-devel] [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues

2018-04-17 Thread Dario Faggioli
On Tue, 2018-04-03 at 22:25 +0530, Praveen Kumar wrote:
> The patch optimizes the sorted credit2 runq, moving it from a simple
> linked list to an rb-tree implementation. This way we gain
> performance and scalability when the number of vCPUs is huge.
> 
> Signed-off-by: Praveen Kumar 

> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -600,6 +601,29 @@ static inline bool has_cap(const struct
> csched2_vcpu *svc)
>  return svc->budget != STIME_MAX;
>  }
>  
> +static void runq_insert_rb(struct rb_root *root,
> +   struct csched2_vcpu *svc,
> +   int *pos)
>
I'd call this rb_insert() or rb_runq_insert(), but that's taste (and
it's certainly a minor thing).

> +{
> +struct csched2_vcpu *entry = NULL;
> +struct rb_node **node = &root->rb_node;
> +struct rb_node *parent = NULL;
> +
> +while (*node) {
> +parent = *node;
> +entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
> +// Check if we are maintaining the sorted
>
Pointless comment. I'd leave the line blank (but that's taste again).

> +if ( svc->credit < entry->credit )
> +node = &parent->rb_left;
> +else
> +node = &parent->rb_right;
> +
> +(*pos)++;
> +}
> +rb_link_node(&svc->runq_elem, parent, node);
> +rb_insert_color(&svc->runq_elem, root);
> +}
>
Wait, where's the part where we cache which element is the one with the
highest credits? (And the same applies to the tree-removal function, of
course.)

In fact, we need a field for storing such a cache in the runqueue data
structure as well, and we need to keep it updated.

Linux (recently) added an rb-tree variant that does this internally, see
cd9e61ed1eebb "rbtree: cache leftmost node internally", and how such a
variant is used, e.g., 2161573ecd693 "sched/deadline: replace earliest
dl and rq leftmost caching".
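
For reference, the interface those commits introduce looks more or less
like this (a sketch from memory, so double-check the actual Linux
sources):

  struct rb_root_cached queue = RB_ROOT_CACHED;

  /* On insertion, the caller says whether the new node became leftmost */
  rb_insert_color_cached(&svc->runq_elem, &queue, leftmost);

  /* Removal updates the cached pointer internally */
  rb_erase_cached(&svc->runq_elem, &queue);

  /* O(1) access to the node with the smallest key */
  struct rb_node *first = rb_first_cached(&queue);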

So, I'd say that we either import that from there, or do the caching of
the leftmost element explicitly where needed. Note, however, that
Linux's variant caches the leftmost element because they always use
rb-trees for structures where the head of the queue is the element with
the _smallest_ key.

In our case here, we want the queue ordered in descending credit order,
so we must think carefully about whether we could use the new variant
(maybe "just" inverting the binary search tree relationship), or we'd
need another one that caches the rightmost element.

I would suggest we do not try to use the rb_*_cached() functions, and
cache rightmost explicitly in runqueue_data.
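
Something along the lines of the below is what I have in mind (just a
sketch, and the runq_rightmost field name is made up, of course):

  struct csched2_runqueue_data {
      ...
      struct rb_root runq;            /* Runqueue is an rbtree        */
      struct rb_node *runq_rightmost; /* Cached highest credit elem.  */
      ...
  };

  static void rb_runq_insert(struct csched2_runqueue_data *rqd,
                             struct csched2_vcpu *svc, int *pos)
  {
      struct rb_node **node = &rqd->runq.rb_node;
      struct rb_node *parent = NULL;
      bool rightmost = true;

      while ( *node )
      {
          struct csched2_vcpu *entry;

          parent = *node;
          entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
          if ( svc->credit < entry->credit )
          {
              node = &parent->rb_left;
              /* We went left at least once: not the new rightmost. */
              rightmost = false;
          }
          else
              node = &parent->rb_right;
          (*pos)++;
      }
      rb_link_node(&svc->runq_elem, parent, node);
      rb_insert_color(&svc->runq_elem, &rqd->runq);
      if ( rightmost )
          rqd->runq_rightmost = &svc->runq_elem;
  }

  static inline void runq_remove(struct csched2_vcpu *svc)
  {
      ASSERT(vcpu_on_runq(svc));
      /* If we are removing the cached rightmost, its predecessor
       * becomes the new rightmost (NULL if the tree goes empty). */
      if ( svc->rqd->runq_rightmost == &svc->runq_elem )
          svc->rqd->runq_rightmost = rb_prev(&svc->runq_elem);
      rb_erase(&svc->runq_elem, &svc->rqd->runq);
      RB_CLEAR_NODE(&svc->runq_elem);
  }

Tracking rightmost at insertion time works because rb_insert_color()
only rebalances, it does not change the ordering: if we never went left
while descending, the new node is the rightmost one.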

> @@ -3201,17 +3225,21 @@ csched2_runtime(const struct scheduler *ops,
> int cpu,
>   * 2) If there's someone waiting whose credit is positive,
>   *run until your credit ~= his.
>   */
> -if ( ! list_empty(runq) )
> +if ( ! RB_EMPTY_ROOT(runq) )
>  {
> -struct csched2_vcpu *swait = runq_elem(runq->next);
> +// Find the left most element, which is the most probable
> candidate
> +struct rb_node *node = rb_first(runq);
> 
Err... is it? Isn't the leftmost element the one with the _least_
credits? It looks to me that we want rb_last().

And IAC, we don't want to have to traverse the tree to get the runnable
vcpu with the highest credit, we want it available in O(1) time.

That's why we want to cache it.
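
IOW, with the rightmost element cached in the runqueue data (as per the
sketch above; runq_rightmost is a made up name, and I'm assuming the
rqd pointer is at hand here), this hunk would reduce to something like:

  if ( !RB_EMPTY_ROOT(runq) )
  {
      /* Highest credit vcpu is the cached rightmost node: O(1). */
      struct csched2_vcpu *swait = runq_elem(rqd->runq_rightmost);

      if ( !is_idle_vcpu(swait->vcpu) && swait->credit > 0 )
          rt_credit = snext->credit - swait->credit;
  }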

> +// TODO Can we take rb_next ?
> +//struct rb_node *node = &rb_next(root->rb_node);
> +
What do you mean here?

> +struct csched2_vcpu *swait = runq_elem(node);
>  if ( ! is_idle_vcpu(swait->vcpu)
>   && swait->credit > 0 )
>  {
>  rt_credit = snext->credit - swait->credit;
>  }
>  }

> @@ -3345,9 +3373,8 @@ runq_candidate(struct csched2_runqueue_data
> *rqd,
>  snext = csched2_vcpu(idle_vcpu[cpu]);
>  
>   check_runq:
> -list_for_each_safe( iter, temp, &rqd->runq )
> -{
> -struct csched2_vcpu * svc = list_entry(iter, struct
> csched2_vcpu, runq_elem);
> +for (iter = rb_first(&rqd->runq); iter != NULL; iter =
> rb_next(iter)) {
> +struct csched2_vcpu * svc = rb_entry(iter, struct
> csched2_vcpu, runq_elem);
>
Same as above. I don't think this is where we want to start from. And
no matter whether we want to start from rb_first() or rb_last(), we
certainly don't want to have to traverse the tree to get there.

> @@ -3761,8 +3789,8 @@ csched2_dump(const struct scheduler *ops)
>  dump_pcpu(ops, j);
>  
>  printk("RUNQ:\n");
> -list_for_each( iter, runq )
> -{
> +
> +for (iter = rb_first(runq); iter != NULL; iter =
> rb_next(iter)) {
>
And the same applies here as well. I think that, if we want the
runqueue printed in the proper order, we need to start from the
rightmost element, and use rb_prev().
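
I.e., something like this (sketch):

  for ( iter = rb_last(runq); iter != NULL; iter = rb_prev(iter) )
  {
      struct csched2_vcpu *svc = runq_elem(iter);

      /* dump svc, as the list based loop used to do */
  }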

Oh, and about caching, I'd say it is okay if you want to turn this into
a series, the first patch of which does not have it, and you introduce
it in the second patch.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[Xen-devel] [RFC PATCH 1/1] xen: credit2: rb-tree for runqueues

2018-04-03 Thread Praveen Kumar
The patch optimizes the sorted credit2 runq, moving it from a simple
linked list to an rb-tree implementation. This way we gain performance
and scalability when the number of vCPUs is huge.

Signed-off-by: Praveen Kumar 
---
 xen/common/sched_credit2.c | 94 ++
 1 file changed, 61 insertions(+), 33 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 9a3e71f1c8..3802c2888f 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include <xen/rbtree.h>
 
 /* Meant only for helping developers during debugging. */
 /* #define d2printk printk */
@@ -471,7 +472,7 @@ custom_param("credit2_runqueue", parse_credit2_runqueue);
 struct csched2_runqueue_data {
 spinlock_t lock;   /* Lock for this runqueue */
 
-struct list_head runq; /* Ordered list of runnable vms   */
+struct rb_root runq;   /* Runqueue is an rbtree  */
 int id;/* ID of this runqueue (-1 if invalid)*/
 
 int load;  /* Instantaneous load (num of non-idle vcpus) */
@@ -536,7 +537,7 @@ struct csched2_vcpu {
 s_time_t load_last_update; /* Last time average was updated       */
 s_time_t avgload;          /* Decaying queue load                 */
 
-struct list_head runq_elem;    /* On the runqueue (rqd->runq)         */
+struct rb_node runq_elem;      /* On the runqueue (rqd->runq)         */
 struct list_head parked_elem;  /* On the parked_vcpus list            */
 struct list_head rqd_elem;     /* On csched2_runqueue_data's svc list */
 struct csched2_runqueue_data *migrate_rqd; /* Pre-determined migr. target */
@@ -600,6 +601,29 @@ static inline bool has_cap(const struct csched2_vcpu *svc)
 return svc->budget != STIME_MAX;
 }
 
+static void runq_insert_rb(struct rb_root *root,
+   struct csched2_vcpu *svc,
+   int *pos)
+{
+struct csched2_vcpu *entry = NULL;
+struct rb_node **node = &root->rb_node;
+struct rb_node *parent = NULL;
+
+while (*node) {
+parent = *node;
+entry = rb_entry(parent, struct csched2_vcpu, runq_elem);
+// Check if we are maintaining the sorted
+if ( svc->credit < entry->credit )
+node = &parent->rb_left;
+else
+node = &parent->rb_right;
+
+(*pos)++;
+}
+rb_link_node(&svc->runq_elem, parent, node);
+rb_insert_color(&svc->runq_elem, root);
+}
+
 /*
  * Hyperthreading (SMT) support.
  *
@@ -793,12 +817,12 @@ static s_time_t c2t(struct csched2_runqueue_data *rqd, s_time_t credit, struct csched2_vcpu *svc)
 
 static inline int vcpu_on_runq(struct csched2_vcpu *svc)
 {
-return !list_empty(&svc->runq_elem);
+return !RB_EMPTY_NODE(&svc->runq_elem);
 }
 
-static inline struct csched2_vcpu * runq_elem(struct list_head *elem)
+static inline struct csched2_vcpu * runq_elem(struct rb_node *elem)
 {
-return list_entry(elem, struct csched2_vcpu, runq_elem);
+return rb_entry(elem, struct csched2_vcpu, runq_elem);
 }
 
 static void activate_runqueue(struct csched2_private *prv, int rqi)
@@ -812,7 +836,7 @@ static void activate_runqueue(struct csched2_private *prv, int rqi)
 rqd->max_weight = 1;
 rqd->id = rqi;
 INIT_LIST_HEAD(&rqd->svc);
-INIT_LIST_HEAD(&rqd->runq);
+rqd->runq = RB_ROOT;
 spin_lock_init(&rqd->lock);
 
 __cpumask_set_cpu(rqi, &prv->active_queues);
@@ -1272,9 +1296,8 @@ update_load(const struct scheduler *ops,
 static void
 runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
 {
-struct list_head *iter;
 unsigned int cpu = svc->vcpu->processor;
-struct list_head * runq = &c2rqd(ops, cpu)->runq;
+struct rb_root *runq = &c2rqd(ops, cpu)->runq;
 int pos = 0;
 
 ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
@@ -1287,16 +1310,7 @@ runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
 ASSERT(!svc->vcpu->is_running);
 ASSERT(!(svc->flags & CSFLAG_scheduled));
 
-list_for_each( iter, runq )
-{
-struct csched2_vcpu * iter_svc = runq_elem(iter);
-
-if ( svc->credit > iter_svc->credit )
-break;
-
-pos++;
-}
-list_add_tail(&svc->runq_elem, iter);
+runq_insert_rb(runq, svc, &pos);
 
 if ( unlikely(tb_init_done) )
 {
@@ -1315,8 +1329,11 @@ runq_insert(const struct scheduler *ops, struct csched2_vcpu *svc)
 
 static inline void runq_remove(struct csched2_vcpu *svc)
 {
+// TODO This assert might not be required, as we always have a check before
+// calling this API
 ASSERT(vcpu_on_runq(svc));
-list_del_init(&svc->runq_elem);
+rb_erase(&svc->runq_elem, &svc->rqd->runq);
+RB_CLEAR_NODE(&svc->runq_elem);
 }
 
 void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *,