Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
On Tue, 2008-22-01 at 08:21 +0100, Jarek Poplawski wrote: On 22-01-2008 01:29, David Miller wrote: ... Fix this right, make a structure like: struct kernel_gnet_stats_rate_est { struct gnet_stats_rate_est est; void*gen_estimator; } And update all the code as needed. Thanks! I'll try this... Jarek, That looks different from the suggestion from Dave. May i throw in another bone? Theoretically i can see why it would be a really bad idea to walk 50K estimators every time you delete one - which is horrible if you are trying to destroy the say 50K of them and gets worse as the number of schedulers with 50K classes goes up. But i am wondering why a simpler list couldnt be walked, meaning: In gen_kill_estimator(), instead of: for (idx=0; idx = EST_MAX_INTERVAL; idx++) { Would deriving a better initial index be a big improvement? for (e = elist[est-interval].list; e; e = e-next) { cheers, jamal -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
On Tue, Jan 22, 2008 at 06:42:07AM -0500, jamal wrote: ... Jarek, That looks different from the suggestion from Dave. Hmm..., I'm not sure you mean my or your suggestion here, but you are right anyway... May i throw in another bone? Theoretically i can see why it would be a really bad idea to walk 50K estimators every time you delete one - which is horrible if you are trying to destroy the say 50K of them and gets worse as the number of schedulers with 50K classes goes up. But i am wondering why a simpler list couldnt be walked, meaning: In gen_kill_estimator(), instead of: for (idx=0; idx = EST_MAX_INTERVAL; idx++) { Would deriving a better initial index be a big improvement? for (e = elist[est-interval].list; e; e = e-next) { Maybe I miss something, but there still could be a lot of this walking and IMHO any such longer waiting with BHs disabled is hard to accept with current memory sizes and low-latencies prices. And currently time seems to be even more precious here: RCU can't even free any gen_estimator memory during such large qdisc with classes deletion. Thanks, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
On Tue, 2008-22-01 at 13:29 +0100, Jarek Poplawski wrote: On Tue, Jan 22, 2008 at 06:42:07AM -0500, jamal wrote: ... Jarek, That looks different from the suggestion from Dave. Hmm..., I'm not sure you mean my or your suggestion here, but you are right anyway... Your idea to grab a pointer to the estimator so you can quickly find it upon destruction is a good one. The challenge was not to break the ABI to user space. Dave suggested to use a different struct for the kernel side and maintain the user one as is[1]. Your patch didnt do this, hence my statement;- Maybe I miss something, but there still could be a lot of this walking Indeed, that is possible in the case of many estimators configured with the same interval - because they will all fall in the same table bucket and the idx is not that useful to begin with. I was wrong given the nature of interval - the majority of the users will have an estimator interval of say 1 sec which will put everything in one bucket still. We could introduce a proper index that will allow proper distribution and have that stored by the class. I am not sure i made sense. But you are coding - and your idea sounds better. cheers, jamal [1] This is _not uncommon_ (note the usage of double negation here for emphasis;-) technique actually; ones that go further for example can be found all over the net/sched code (struct tcf_police vs tc_police) etc. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
On Tue, Jan 22, 2008 at 08:54:28AM -0500, jamal wrote: On Tue, 2008-22-01 at 13:29 +0100, Jarek Poplawski wrote: On Tue, Jan 22, 2008 at 06:42:07AM -0500, jamal wrote: ... Jarek, That looks different from the suggestion from Dave. Hmm..., I'm not sure you mean my or your suggestion here, but you are right anyway... Your idea to grab a pointer to the estimator so you can quickly find it upon destruction is a good one. Let's say it's quite common in the kernel type of idea... The challenge was not to break the ABI to user space. Dave suggested to use a different struct for the kernel side and maintain the user one as is[1]. Your patch didnt do this, hence my statement;- Sure! David's idea is very good, but before reading his message I decided to try something 'safer'. I simply don't know these structures like you or David, and after these mistakes at the beginning, I tried to avoid the next. But now I see it shouldn't be so hard, and I'll do this David's way, I hope! Maybe I miss something, but there still could be a lot of this walking Indeed, that is possible in the case of many estimators configured with the same interval - because they will all fall in the same table bucket and the idx is not that useful to begin with. I was wrong given the nature of interval - the majority of the users will have an estimator interval of say 1 sec which will put everything in one bucket still. We could introduce a proper index that will allow proper distribution and have that stored by the class. I am not sure i made sense. But you are coding - and your idea sounds better. As a matter of fact, I've considered yesterday some additional hash table too, but wasn't sure it's worth of the savings. It seems there should be considered optimization of these estimator structures yet. But since there are now a few similar reports about misterious problems during deletion of large qdisc, it would be nice to remove main suspects around this at any price... Many thanks for your concern with this! (And, if miss something again, don't be afraid to make it straight clear; I really prefer to know the truth about my mistakes, than polite silence.) Regards, Jarek P. [1] This is _not uncommon_ (note the usage of double negation here for emphasis;-) technique actually; ones that go further for example can be found all over the net/sched code (struct tcf_police vs tc_police) etc. I think David's points were very clear and of course right!; it seems now that only my 'usual' way of friendly joking about this could be more challenge than any double-negation and turns friendly fire insted... -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
So, let's try something easy first: #ifdef __KERNEL__. (I know there are many esthetes around, but since this subject looks quite dirty...) Alternatively we could change an api, and as a matter of fact there was such a try some time ago, but is it really worth of such a mess? Regards, Jarek P. --- (take 2) gen_kill_estimator() is called during qdisc_destroy() with BHs disabled, and each time does searching for each list member. This can block soft interrupts for quite a long time when many classes are used. This patch changes this by storing pointers to internal gen_estimator structures with gnet_stats_rate_est structures used by clients of gen_estimator. This method removes currently possible registering in gen_estimator the same structures more than once, but it isn't used after all. (There is added a warning if gen_new_estimator() is called with structures being used by gen_estimator already.) Thanks to David Miller for pointing an error in the first version of this patch. Reported-by: Badalian Vyacheslav [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] [needs more testing] --- include/linux/gen_stats.h |4 net/core/gen_estimator.c | 36 +--- 2 files changed, 37 insertions(+), 3 deletions(-) diff -Nurp 2.6.24-rc8-mm1-/include/linux/gen_stats.h 2.6.24-rc8-mm1+/include/linux/gen_stats.h --- 2.6.24-rc8-mm1-/include/linux/gen_stats.h 2007-10-09 22:31:38.0 +0200 +++ 2.6.24-rc8-mm1+/include/linux/gen_stats.h 2008-01-21 22:53:47.0 +0100 @@ -28,11 +28,15 @@ struct gnet_stats_basic * struct gnet_stats_rate_est - rate estimator * @bps: current byte rate * @pps: current packet rate + * @gen_estimator: internal data */ struct gnet_stats_rate_est { __u32 bps; __u32 pps; +#ifdef __KERNEL__ + unsigned long gen_estimator; +#endif }; /** diff -Nurp 2.6.24-rc8-mm1-/net/core/gen_estimator.c 2.6.24-rc8-mm1+/net/core/gen_estimator.c --- 2.6.24-rc8-mm1-/net/core/gen_estimator.c2008-01-19 17:54:45.0 +0100 +++ 2.6.24-rc8-mm1+/net/core/gen_estimator.c2008-01-20 20:58:35.0 +0100 @@ -139,6 +139,9 @@ skip: rcu_read_unlock(); } +static void gen_kill_estimator_find(struct gnet_stats_basic *bstats, + struct gnet_stats_rate_est *rate_est); + /** * gen_new_estimator - create a new rate estimator * @bstats: basic statistics @@ -171,6 +174,10 @@ int gen_new_estimator(struct gnet_stats_ if (parm-interval -2 || parm-interval 3) return -EINVAL; + if (rate_est-gen_estimator) + /* not sure: not zeroed in alloc or reused */ + gen_kill_estimator_find(bstats, rate_est); + est = kzalloc(sizeof(*est), GFP_KERNEL); if (est == NULL) return -ENOBUFS; @@ -184,6 +191,7 @@ int gen_new_estimator(struct gnet_stats_ est-avbps = rate_est-bps5; est-last_packets = bstats-packets; est-avpps = rate_est-pps10; + rate_est-gen_estimator = (unsigned long)est; if (!elist[idx].timer.function) { INIT_LIST_HEAD(elist[idx].list); @@ -209,13 +217,30 @@ static void __gen_kill_estimator(struct * @bstats: basic statistics * @rate_est: rate estimator statistics * - * Removes the rate estimator specified by bstats and rate_est - * and deletes the timer. + * Removes the rate estimator specified by bstats and rate_est. * * NOTE: Called under rtnl_mutex */ void gen_kill_estimator(struct gnet_stats_basic *bstats, - struct gnet_stats_rate_est *rate_est) + struct gnet_stats_rate_est *rate_est) +{ + if (rate_est rate_est-gen_estimator) { + struct gen_estimator *e; + + e = (struct gen_estimator *)rate_est-gen_estimator; + + rate_est-gen_estimator = 0; + write_lock_bh(est_lock); + e-bstats = NULL; + write_unlock_bh(est_lock); + + list_del_rcu(e-list); + call_rcu(e-e_rcu, __gen_kill_estimator); + } +} + +static void gen_kill_estimator_find(struct gnet_stats_basic *bstats, + struct gnet_stats_rate_est *rate_est) { int idx; struct gen_estimator *e, *n; @@ -236,6 +261,11 @@ void gen_kill_estimator(struct gnet_stat list_del_rcu(e-list); call_rcu(e-e_rcu, __gen_kill_estimator); + + WARN_ON_ONCE(1); /* gen_new_estimator() repeated? */ + rate_est-gen_estimator = 0; + + return; } } } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
On Mon, Jan 21, 2008 at 11:31:37PM +0100, Jarek Poplawski wrote: So, let's try something easy first: #ifdef __KERNEL__. (I know there ... SORRY!!! Of course this is still wrong, I withdraw this patch. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
From: Jarek Poplawski [EMAIL PROTECTED] Date: Mon, 21 Jan 2008 23:31:37 +0100 So, let's try something easy first: #ifdef __KERNEL__. (I know there are many esthetes around, but since this subject looks quite dirty...) You can't do this, the attribute is copied to the user netlink SKB using sizeof(struct gnet_stats_rate_est) as the size (or more specifically sizeof(*p) where p is a pointer to this structure). Therefore this new patch has the same exact problem. Fix this right, make a structure like: struct kernel_gnet_stats_rate_est { struct gnet_stats_rate_est est; void*gen_estimator; } And update all the code as needed. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
On 22-01-2008 01:29, David Miller wrote: ... Fix this right, make a structure like: struct kernel_gnet_stats_rate_est { struct gnet_stats_rate_est est; void*gen_estimator; } And update all the code as needed. Thanks! I'll try this... ...But, as a matter of fact I've thought about something similar (of course much worse), and I was afraid of doing quite a lot of changes at once, maybe again skip something like here. So, maybe one more tiny RFC here... I've tried this from the other side: to make alternative, new api of gen_estimator functions, and then the rest of changes without hurry. This looks not very nice, but IMHO should be safer (especially considering my 'knowledge' of this code and current changes). There is this not very nice additional parameter used e.g. in ngen_new_estimator(), but it seems, after all changes, this should be more visible how and where this could be optimized. (And, after all, this new pointer shouldn't be used very often, so could sit a bit further.) Anyway, if you don't like this idea, let me know and I'll try yours. It will only need more time for this. This is done against 2.6.24-rc8-mm1 with this 3/3 cosmetic patch. I'll send soon part 2: htb patch to use this. Regards, Jarek P. --- include/net/gen_stats.h | 11 + net/core/gen_estimator.c | 99 ++- 2 files changed, 101 insertions(+), 9 deletions(-) diff -Nurp 2.6.24-rc8-mm1-p3-/include/net/gen_stats.h 2.6.24-rc8-mm1-p3+/include/net/gen_stats.h --- 2.6.24-rc8-mm1-p3-/include/net/gen_stats.h 2007-10-09 22:31:38.0 +0200 +++ 2.6.24-rc8-mm1-p3+/include/net/gen_stats.h 2008-01-22 00:13:48.0 +0100 @@ -46,4 +46,15 @@ extern int gen_replace_estimator(struct struct gnet_stats_rate_est *rate_est, spinlock_t *stats_lock, struct rtattr *opt); +extern int ngen_new_estimator(struct gnet_stats_basic *bstats, + struct gnet_stats_rate_est *rate_est, + spinlock_t *stats_lock, struct rtattr *opt, + unsigned long *pgen_estimator); +extern void ngen_kill_estimator(unsigned long *pgen_estimator); + +extern int ngen_replace_estimator(struct gnet_stats_basic *bstats, + struct gnet_stats_rate_est *rate_est, + spinlock_t *stats_lock, struct rtattr *opt, + unsigned long *pgen_estimator); + #endif diff -Nurp 2.6.24-rc8-mm1-p3-/net/core/gen_estimator.c 2.6.24-rc8-mm1-p3+/net/core/gen_estimator.c --- 2.6.24-rc8-mm1-p3-/net/core/gen_estimator.c 2008-01-22 00:01:30.0 +0100 +++ 2.6.24-rc8-mm1-p3+/net/core/gen_estimator.c 2008-01-22 00:22:37.0 +0100 @@ -140,26 +140,30 @@ skip: } /** - * gen_new_estimator - create a new rate estimator + * ngen_new_estimator - create a new rate estimator (new version) * @bstats: basic statistics * @rate_est: rate estimator statistics * @stats_lock: statistics lock * @opt: rate estimator configuration TLV + * @pgen_estimator: pointer to return ngen_new_estimator data * * Creates a new rate estimator with bstats as source and rate_est * as destination. A new timer with the interval specified in the * configuration TLV is created. Upon each interval, the latest statistics * will be read from bstats and the estimated rate will be stored in * rate_est with the statistics lock grabed during this period. + * Called directly for pgen_estimator and possibility of fast kill + * or indirectly by gen_new_estimator. * - * Returns 0 on success or a negative error code. + * Returns 0 and data pointed by pgen_estimator on success + * or a negative error code. * * NOTE: Called under rtnl_mutex */ -int gen_new_estimator(struct gnet_stats_basic *bstats, - struct gnet_stats_rate_est *rate_est, - spinlock_t *stats_lock, - struct rtattr *opt) +int ngen_new_estimator(struct gnet_stats_basic *bstats, + struct gnet_stats_rate_est *rate_est, + spinlock_t *stats_lock, struct rtattr *opt, + unsigned long *pgen_estimator) { struct gen_estimator *est; struct gnet_estimator *parm = RTA_DATA(opt); @@ -184,6 +188,7 @@ int gen_new_estimator(struct gnet_stats_ est-avbps = rate_est-bps5; est-last_packets = bstats-packets; est-avpps = rate_est-pps10; + *pgen_estimator = (unsigned long)est; if (!elist[idx].timer.function) { INIT_LIST_HEAD(elist[idx].list); @@ -197,6 +202,32 @@ int gen_new_estimator(struct gnet_stats_ return 0; } +/** + * gen_new_estimator - create a new rate estimator + * @bstats: basic statistics + * @rate_est: rate estimator statistics + * @stats_lock: statistics lock + * @opt: rate estimator configuration
Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
On Tue, Jan 22, 2008 at 08:21:52AM +0100, Jarek Poplawski wrote: ... Part 2 of mini RFC: HTB changes to use new, faster gen_estimator functions._ This is done against 2.6.24-rc8-mm1. Thanks, Jarek P. --- diff -Nurp 2.6.24-rc8-mm1-/net/sched/sch_htb.c 2.6.24-rc8-mm1+/net/sched/sch_htb.c --- 2.6.24-rc8-mm1-/net/sched/sch_htb.c 2008-01-19 17:54:49.0 +0100 +++ 2.6.24-rc8-mm1+/net/sched/sch_htb.c 2008-01-22 00:00:31.0 +0100 @@ -127,6 +127,7 @@ struct htb_class { int prio; /* For parent to leaf return possible here */ int quantum;/* we do backup. Finally full replacement */ /* of un.leaf originals should be done. */ + unsigned long gen_estimator; /* ngen_new_estimator() data */ }; static inline long L2T(struct htb_class *cl, struct qdisc_rate_table *rate, @@ -1195,7 +1196,7 @@ static void htb_destroy_class(struct Qdi BUG_TRAP(cl-un.leaf.q); qdisc_destroy(cl-un.leaf.q); } - gen_kill_estimator(cl-bstats, cl-rate_est); + ngen_kill_estimator(cl-gen_estimator); qdisc_put_rtab(cl-rate); qdisc_put_rtab(cl-ceil); @@ -1348,9 +1349,10 @@ static int htb_change_class(struct Qdisc if ((cl = kzalloc(sizeof(*cl), GFP_KERNEL)) == NULL) goto failure; - gen_new_estimator(cl-bstats, cl-rate_est, - sch-dev-queue_lock, - tca[TCA_RATE-1] ? : est.rta); + ngen_new_estimator(cl-bstats, cl-rate_est, + sch-dev-queue_lock, + tca[TCA_RATE-1] ? : est.rta, + cl-gen_estimator); cl-refcnt = 1; INIT_LIST_HEAD(cl-sibling); INIT_HLIST_NODE(cl-hlist); @@ -1404,9 +1406,10 @@ static int htb_change_class(struct Qdisc parent ? parent-children : q-root); } else { if (tca[TCA_RATE-1]) - gen_replace_estimator(cl-bstats, cl-rate_est, - sch-dev-queue_lock, - tca[TCA_RATE-1]); + ngen_replace_estimator(cl-bstats, cl-rate_est, + sch-dev-queue_lock, + tca[TCA_RATE-1], + cl-gen_estimator); sch_tree_lock(sch); } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html