Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator

2008-01-22 Thread jamal
On Tue, 2008-22-01 at 08:21 +0100, Jarek Poplawski wrote:
 On 22-01-2008 01:29, David Miller wrote:
 ...
  Fix this right, make a structure like:
  
  struct kernel_gnet_stats_rate_est {
  struct gnet_stats_rate_est  est;
  void*gen_estimator;
  }
  
  And update all the code as needed.
 
 Thanks!
  I'll try this...

Jarek,

That looks different from the suggestion from Dave.

May i throw in another bone? Theoretically i can see why it would be a
really bad idea to walk 50K estimators every time you delete one - which
is horrible if you are trying to destroy the say 50K of them and gets
worse as the number of schedulers with 50K classes goes up.

But i am wondering why a simpler list couldnt be walked, meaning:

In gen_kill_estimator(), instead of:

for (idx=0; idx = EST_MAX_INTERVAL; idx++) {

Would deriving a better initial index be a big improvement?
for (e = elist[est-interval].list; e; e = e-next) {


cheers,
jamal

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator

2008-01-22 Thread Jarek Poplawski
On Tue, Jan 22, 2008 at 06:42:07AM -0500, jamal wrote:
...
 Jarek,
 
 That looks different from the suggestion from Dave.

Hmm..., I'm not sure you mean my or your suggestion here, but you
are right anyway...

 May i throw in another bone? Theoretically i can see why it would be a
 really bad idea to walk 50K estimators every time you delete one - which
 is horrible if you are trying to destroy the say 50K of them and gets
 worse as the number of schedulers with 50K classes goes up.
 
 But i am wondering why a simpler list couldnt be walked, meaning:
 
 In gen_kill_estimator(), instead of:
 
 for (idx=0; idx = EST_MAX_INTERVAL; idx++) {
 
 Would deriving a better initial index be a big improvement?
 for (e = elist[est-interval].list; e; e = e-next) {

Maybe I miss something, but there still could be a lot of this walking
and IMHO any such longer waiting with BHs disabled is hard to accept
with current memory sizes and low-latencies prices. And currently time
seems to be even more precious here: RCU can't even free any
gen_estimator memory during such large qdisc with classes deletion.

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator

2008-01-22 Thread jamal
On Tue, 2008-22-01 at 13:29 +0100, Jarek Poplawski wrote:
 On Tue, Jan 22, 2008 at 06:42:07AM -0500, jamal wrote:
 ...
  Jarek,
  
  That looks different from the suggestion from Dave.
 
 Hmm..., I'm not sure you mean my or your suggestion here, but you
 are right anyway...

Your idea to grab a pointer to the estimator so you can quickly find it
upon destruction is a good one.
The challenge was not to break the ABI to user space.
Dave suggested to use a different struct for the kernel side and
maintain the user one as is[1]. Your patch didnt do this, hence my
statement;-

 Maybe I miss something, but there still could be a lot of this walking

Indeed, that is possible in the case of many estimators configured with
the same interval - because they will all fall in the same table bucket
and the idx is not that useful to begin with.

I was wrong given the nature of interval - the majority of the users
will have an estimator interval of say 1 sec which will put everything
in one bucket still. 
We could introduce a proper index that will allow proper distribution
and have that stored by the class. I am not sure i made sense.
But you are coding - and your idea sounds better.

cheers,
jamal


[1] This is _not uncommon_ (note the usage of double negation here for
emphasis;-) technique actually; ones that go further for example can be
found all over the net/sched code (struct tcf_police vs tc_police) etc.



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator

2008-01-22 Thread Jarek Poplawski
On Tue, Jan 22, 2008 at 08:54:28AM -0500, jamal wrote:
 On Tue, 2008-22-01 at 13:29 +0100, Jarek Poplawski wrote:
  On Tue, Jan 22, 2008 at 06:42:07AM -0500, jamal wrote:
  ...
   Jarek,
   
   That looks different from the suggestion from Dave.
  
  Hmm..., I'm not sure you mean my or your suggestion here, but you
  are right anyway...
 
 Your idea to grab a pointer to the estimator so you can quickly find it
 upon destruction is a good one.

Let's say it's quite common in the kernel type of idea...

 The challenge was not to break the ABI to user space.
 Dave suggested to use a different struct for the kernel side and
 maintain the user one as is[1]. Your patch didnt do this, hence my
 statement;-

Sure! David's idea is very good, but before reading his message I
decided to try something 'safer'. I simply don't know these structures
like you or David, and after these mistakes at the beginning, I tried
to avoid the next. But now I see it shouldn't be so hard, and I'll do
this David's way, I hope!

  Maybe I miss something, but there still could be a lot of this walking
 
 Indeed, that is possible in the case of many estimators configured with
 the same interval - because they will all fall in the same table bucket
 and the idx is not that useful to begin with.
 
 I was wrong given the nature of interval - the majority of the users
 will have an estimator interval of say 1 sec which will put everything
 in one bucket still. 
 We could introduce a proper index that will allow proper distribution
 and have that stored by the class. I am not sure i made sense.
 But you are coding - and your idea sounds better.

As a matter of fact, I've considered yesterday some additional hash
table too, but wasn't sure it's worth of the savings. It seems there
should be considered optimization of these estimator structures yet.

But since there are now a few similar reports about misterious
problems during deletion of large qdisc, it would be nice to remove
main suspects around this at any price...

Many thanks for your concern with this! (And, if miss something again,
don't be afraid to make it straight  clear; I really prefer to know
the truth about my mistakes, than polite silence.)

Regards,
Jarek P.

 [1] This is _not uncommon_ (note the usage of double negation here for
 emphasis;-) technique actually; ones that go further for example can be
 found all over the net/sched code (struct tcf_police vs tc_police) etc.

I think David's points were very clear and of course right!; it seems now
that only my 'usual' way of friendly joking about this could be more
challenge than any double-negation and turns friendly fire insted...
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator

2008-01-21 Thread Jarek Poplawski

So, let's try something easy first: #ifdef __KERNEL__. (I know there
are many esthetes around, but since this subject looks quite dirty...)

Alternatively we could change an api, and as a matter of fact there was
such a try some time ago, but is it really worth of such a mess?

Regards,
Jarek P.
--- (take 2)

gen_kill_estimator() is called during qdisc_destroy() with BHs disabled,
and each time does searching for each list member. This can block soft
interrupts for quite a long time when many classes are used. This patch 
changes this by storing pointers to internal gen_estimator structures
with gnet_stats_rate_est structures used by clients of gen_estimator.

This method removes currently possible registering in gen_estimator the
same structures more than once, but it isn't used after all. (There is
added a warning if gen_new_estimator() is called with structures being
used by gen_estimator already.) Thanks to David Miller for pointing an
error in the first version of this patch.
 
Reported-by: Badalian Vyacheslav [EMAIL PROTECTED]
Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
 
[needs more testing]
 
---

 include/linux/gen_stats.h |4 
 net/core/gen_estimator.c  |   36 +---
 2 files changed, 37 insertions(+), 3 deletions(-)

diff -Nurp 2.6.24-rc8-mm1-/include/linux/gen_stats.h 
2.6.24-rc8-mm1+/include/linux/gen_stats.h
--- 2.6.24-rc8-mm1-/include/linux/gen_stats.h   2007-10-09 22:31:38.0 
+0200
+++ 2.6.24-rc8-mm1+/include/linux/gen_stats.h   2008-01-21 22:53:47.0 
+0100
@@ -28,11 +28,15 @@ struct gnet_stats_basic
  * struct gnet_stats_rate_est - rate estimator
  * @bps: current byte rate
  * @pps: current packet rate
+ * @gen_estimator: internal data
  */
 struct gnet_stats_rate_est
 {
__u32   bps;
__u32   pps;
+#ifdef __KERNEL__
+   unsigned long   gen_estimator;
+#endif
 };
 
 /**
diff -Nurp 2.6.24-rc8-mm1-/net/core/gen_estimator.c 
2.6.24-rc8-mm1+/net/core/gen_estimator.c
--- 2.6.24-rc8-mm1-/net/core/gen_estimator.c2008-01-19 17:54:45.0 
+0100
+++ 2.6.24-rc8-mm1+/net/core/gen_estimator.c2008-01-20 20:58:35.0 
+0100
@@ -139,6 +139,9 @@ skip:
rcu_read_unlock();
 }
 
+static void gen_kill_estimator_find(struct gnet_stats_basic *bstats,
+   struct gnet_stats_rate_est *rate_est);
+
 /**
  * gen_new_estimator - create a new rate estimator
  * @bstats: basic statistics
@@ -171,6 +174,10 @@ int gen_new_estimator(struct gnet_stats_
if (parm-interval  -2 || parm-interval  3)
return -EINVAL;
 
+   if (rate_est-gen_estimator)
+   /* not sure: not zeroed in alloc or reused */
+   gen_kill_estimator_find(bstats, rate_est);
+
est = kzalloc(sizeof(*est), GFP_KERNEL);
if (est == NULL)
return -ENOBUFS;
@@ -184,6 +191,7 @@ int gen_new_estimator(struct gnet_stats_
est-avbps = rate_est-bps5;
est-last_packets = bstats-packets;
est-avpps = rate_est-pps10;
+   rate_est-gen_estimator = (unsigned long)est;
 
if (!elist[idx].timer.function) {
INIT_LIST_HEAD(elist[idx].list);
@@ -209,13 +217,30 @@ static void __gen_kill_estimator(struct 
  * @bstats: basic statistics
  * @rate_est: rate estimator statistics
  *
- * Removes the rate estimator specified by bstats and rate_est
- * and deletes the timer.
+ * Removes the rate estimator specified by bstats and rate_est.
  *
  * NOTE: Called under rtnl_mutex
  */
 void gen_kill_estimator(struct gnet_stats_basic *bstats,
-   struct gnet_stats_rate_est *rate_est)
+   struct gnet_stats_rate_est *rate_est)
+{
+   if (rate_est  rate_est-gen_estimator) {
+   struct gen_estimator *e;
+   
+   e = (struct gen_estimator *)rate_est-gen_estimator;
+
+   rate_est-gen_estimator = 0;
+   write_lock_bh(est_lock);
+   e-bstats = NULL;
+   write_unlock_bh(est_lock);
+
+   list_del_rcu(e-list);
+   call_rcu(e-e_rcu, __gen_kill_estimator);
+   }
+}
+
+static void gen_kill_estimator_find(struct gnet_stats_basic *bstats,
+   struct gnet_stats_rate_est *rate_est)
 {
int idx;
struct gen_estimator *e, *n;
@@ -236,6 +261,11 @@ void gen_kill_estimator(struct gnet_stat
 
list_del_rcu(e-list);
call_rcu(e-e_rcu, __gen_kill_estimator);
+
+   WARN_ON_ONCE(1); /* gen_new_estimator() repeated? */
+   rate_est-gen_estimator = 0;
+
+   return;
}
}
 }
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator

2008-01-21 Thread Jarek Poplawski
On Mon, Jan 21, 2008 at 11:31:37PM +0100, Jarek Poplawski wrote:
 
 So, let's try something easy first: #ifdef __KERNEL__. (I know there
...

SORRY!!! Of course this is still wrong, I withdraw this patch.

Jarek P.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator

2008-01-21 Thread David Miller
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Mon, 21 Jan 2008 23:31:37 +0100

 
 So, let's try something easy first: #ifdef __KERNEL__. (I know there
 are many esthetes around, but since this subject looks quite dirty...)

You can't do this, the attribute is copied to the user netlink SKB
using sizeof(struct gnet_stats_rate_est) as the size (or more
specifically sizeof(*p) where p is a pointer to this structure).

Therefore this new patch has the same exact problem.

Fix this right, make a structure like:

struct kernel_gnet_stats_rate_est {
struct gnet_stats_rate_est  est;
void*gen_estimator;
}

And update all the code as needed.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator

2008-01-21 Thread Jarek Poplawski
On 22-01-2008 01:29, David Miller wrote:
...
 Fix this right, make a structure like:
 
 struct kernel_gnet_stats_rate_est {
   struct gnet_stats_rate_est  est;
   void*gen_estimator;
 }
 
 And update all the code as needed.

Thanks! I'll try this...

...But, as a matter of fact I've thought about something similar (of
course much worse), and I was afraid of doing quite a lot of changes
at once, maybe again skip something like here. So, maybe one more
tiny RFC here...

I've tried this from the other side: to make alternative, new api of
gen_estimator functions, and then the rest of changes without hurry.
This looks not very nice, but IMHO should be safer (especially
considering my 'knowledge' of this code and current changes). There
is this not very nice additional parameter used e.g. in
ngen_new_estimator(), but it seems, after all changes, this should
be more visible how and where this could be optimized. (And, after
all, this new pointer shouldn't be used very often, so could sit a
bit further.)

Anyway, if you don't like this idea, let me know and I'll try yours.
It will only need more time for this.

This is done against 2.6.24-rc8-mm1 with this 3/3 cosmetic patch. 
I'll send soon part 2: htb patch to use this.

Regards,
Jarek P.

---

 include/net/gen_stats.h  |   11 +
 net/core/gen_estimator.c |   99 ++-
 2 files changed, 101 insertions(+), 9 deletions(-)


diff -Nurp 2.6.24-rc8-mm1-p3-/include/net/gen_stats.h 
2.6.24-rc8-mm1-p3+/include/net/gen_stats.h
--- 2.6.24-rc8-mm1-p3-/include/net/gen_stats.h  2007-10-09 22:31:38.0 
+0200
+++ 2.6.24-rc8-mm1-p3+/include/net/gen_stats.h  2008-01-22 00:13:48.0 
+0100
@@ -46,4 +46,15 @@ extern int gen_replace_estimator(struct 
 struct gnet_stats_rate_est *rate_est,
 spinlock_t *stats_lock, struct rtattr *opt);
 
+extern int ngen_new_estimator(struct gnet_stats_basic *bstats,
+ struct gnet_stats_rate_est *rate_est,
+ spinlock_t *stats_lock, struct rtattr *opt,
+ unsigned long *pgen_estimator);
+extern void ngen_kill_estimator(unsigned long *pgen_estimator);
+
+extern int ngen_replace_estimator(struct gnet_stats_basic *bstats,
+ struct gnet_stats_rate_est *rate_est,
+ spinlock_t *stats_lock, struct rtattr *opt,
+ unsigned long *pgen_estimator);
+
 #endif
diff -Nurp 2.6.24-rc8-mm1-p3-/net/core/gen_estimator.c 
2.6.24-rc8-mm1-p3+/net/core/gen_estimator.c
--- 2.6.24-rc8-mm1-p3-/net/core/gen_estimator.c 2008-01-22 00:01:30.0 
+0100
+++ 2.6.24-rc8-mm1-p3+/net/core/gen_estimator.c 2008-01-22 00:22:37.0 
+0100
@@ -140,26 +140,30 @@ skip:
 }
 
 /**
- * gen_new_estimator - create a new rate estimator
+ * ngen_new_estimator - create a new rate estimator (new version)
  * @bstats: basic statistics
  * @rate_est: rate estimator statistics
  * @stats_lock: statistics lock
  * @opt: rate estimator configuration TLV
+ * @pgen_estimator: pointer to return ngen_new_estimator data
  *
  * Creates a new rate estimator with bstats as source and rate_est
  * as destination. A new timer with the interval specified in the
  * configuration TLV is created. Upon each interval, the latest statistics
  * will be read from bstats and the estimated rate will be stored in
  * rate_est with the statistics lock grabed during this period.
+ * Called directly for pgen_estimator and possibility of fast kill
+ * or indirectly by gen_new_estimator.
  *
- * Returns 0 on success or a negative error code.
+ * Returns 0 and data pointed by pgen_estimator on success
+ * or a negative error code.
  *
  * NOTE: Called under rtnl_mutex
  */
-int gen_new_estimator(struct gnet_stats_basic *bstats,
- struct gnet_stats_rate_est *rate_est,
- spinlock_t *stats_lock,
- struct rtattr *opt)
+int ngen_new_estimator(struct gnet_stats_basic *bstats,
+  struct gnet_stats_rate_est *rate_est,
+  spinlock_t *stats_lock, struct rtattr *opt,
+  unsigned long *pgen_estimator)
 {
struct gen_estimator *est;
struct gnet_estimator *parm = RTA_DATA(opt);
@@ -184,6 +188,7 @@ int gen_new_estimator(struct gnet_stats_
est-avbps = rate_est-bps5;
est-last_packets = bstats-packets;
est-avpps = rate_est-pps10;
+   *pgen_estimator = (unsigned long)est;
 
if (!elist[idx].timer.function) {
INIT_LIST_HEAD(elist[idx].list);
@@ -197,6 +202,32 @@ int gen_new_estimator(struct gnet_stats_
return 0;
 }
 
+/**
+ * gen_new_estimator - create a new rate estimator 
+ * @bstats: basic statistics
+ * @rate_est: rate estimator statistics
+ * @stats_lock: statistics lock
+ * @opt: rate estimator configuration 

Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator

2008-01-21 Thread Jarek Poplawski
On Tue, Jan 22, 2008 at 08:21:52AM +0100, Jarek Poplawski wrote:
...

Part 2 of mini RFC:

HTB changes to use new, faster gen_estimator functions._

This is done against 2.6.24-rc8-mm1. 

Thanks,
Jarek P.

---

diff -Nurp 2.6.24-rc8-mm1-/net/sched/sch_htb.c 
2.6.24-rc8-mm1+/net/sched/sch_htb.c
--- 2.6.24-rc8-mm1-/net/sched/sch_htb.c 2008-01-19 17:54:49.0 +0100
+++ 2.6.24-rc8-mm1+/net/sched/sch_htb.c 2008-01-22 00:00:31.0 +0100
@@ -127,6 +127,7 @@ struct htb_class {
int prio;   /* For parent to leaf return possible here */
int quantum;/* we do backup. Finally full replacement  */
/* of un.leaf originals should be done. */
+   unsigned long   gen_estimator;  /* ngen_new_estimator() data */
 };
 
 static inline long L2T(struct htb_class *cl, struct qdisc_rate_table *rate,
@@ -1195,7 +1196,7 @@ static void htb_destroy_class(struct Qdi
BUG_TRAP(cl-un.leaf.q);
qdisc_destroy(cl-un.leaf.q);
}
-   gen_kill_estimator(cl-bstats, cl-rate_est);
+   ngen_kill_estimator(cl-gen_estimator);
qdisc_put_rtab(cl-rate);
qdisc_put_rtab(cl-ceil);
 
@@ -1348,9 +1349,10 @@ static int htb_change_class(struct Qdisc
if ((cl = kzalloc(sizeof(*cl), GFP_KERNEL)) == NULL)
goto failure;
 
-   gen_new_estimator(cl-bstats, cl-rate_est,
- sch-dev-queue_lock,
- tca[TCA_RATE-1] ? : est.rta);
+   ngen_new_estimator(cl-bstats, cl-rate_est,
+  sch-dev-queue_lock,
+  tca[TCA_RATE-1] ? : est.rta,
+  cl-gen_estimator);
cl-refcnt = 1;
INIT_LIST_HEAD(cl-sibling);
INIT_HLIST_NODE(cl-hlist);
@@ -1404,9 +1406,10 @@ static int htb_change_class(struct Qdisc
  parent ? parent-children : q-root);
} else {
if (tca[TCA_RATE-1])
-   gen_replace_estimator(cl-bstats, cl-rate_est,
- sch-dev-queue_lock,
- tca[TCA_RATE-1]);
+   ngen_replace_estimator(cl-bstats, cl-rate_est,
+  sch-dev-queue_lock,
+  tca[TCA_RATE-1],
+  cl-gen_estimator);
sch_tree_lock(sch);
}
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html