Re: [NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-29 Thread Jarek Poplawski
On Thu, Jun 28, 2007 at 02:55:51PM +0200, Patrick McHardy wrote:
 Jarek Poplawski wrote:
  On Thu, Jun 28, 2007 at 02:23:36PM +0200, Patrick McHardy wrote:
  
 Jarek Poplawski wrote:
 
 @@ -202,7 +201,6 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
   struct gen_estimator *est, **pest;
 
   for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
 - int killed = 0;
   pest = &elist[idx].list;
   while ((est=*pest) != NULL) {
 
...
 It's overkill in that case. The concurrent additions and removals
 can't happen.

BTW, if we talk about overkill: is there any reason to run these
for & while loops until the end? I can't see why anybody should add
the same *bstats & *rate_est more than once (or at most twice, if we
let them be added, changed & removed independently). With a large
number of classes this could matter.
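
To illustrate the question, here is a minimal userspace sketch of the same pointer-to-pointer walk, with an option to stop at the first match instead of scanning to the end of the list. It is not the kernel code: `struct est_node`, `est_list_push()`, `est_list_kill()` and the `stop_at_first` flag are made-up names, and stopping early is only correct under the assumption that a given (bstats, rate_est) pair is registered at most once, which is exactly what is being asked here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct gen_estimator: only what the walk needs. */
struct est_node {
	const void *bstats;
	const void *rate_est;
	struct est_node *next;
};

/* Push a new node at the head of the list; 0 on success, -1 if malloc fails. */
static int est_list_push(struct est_node **head, const void *bstats,
			 const void *rate_est)
{
	struct est_node *n;

	assert(head != NULL);
	n = malloc(sizeof(*n));
	if (n == NULL)
		return -1;
	n->bstats = bstats;
	n->rate_est = rate_est;
	n->next = *head;
	*head = n;
	return 0;
}

/*
 * Unlink and free the nodes matching (bstats, rate_est).
 * If stop_at_first is non-zero, return right after the first match.
 * Returns the number of nodes examined, so both strategies can be compared.
 */
static int est_list_kill(struct est_node **head, const void *bstats,
			 const void *rate_est, int stop_at_first)
{
	struct est_node **pest = head, *est;
	int visited = 0;

	assert(head != NULL);
	while ((est = *pest) != NULL) {
		visited++;
		if (est->rate_est != rate_est || est->bstats != bstats) {
			pest = &est->next;
			continue;
		}
		*pest = est->next;	/* unlink; *pest is now the successor */
		free(est);
		if (stop_at_first)
			break;
	}
	return visited;
}
```

With the match at the head of a long list, the early-stop variant examines one node while the full walk examines all of them; repeated over several interval lists and many classes, that is the cost in question.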

Regards,
Jarek P.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-29 Thread Jarek Poplawski
On Fri, Jun 29, 2007 at 09:02:41AM +0200, Jarek Poplawski wrote:
...
 same *bstats & *rate_est more than once (or at most twice, if we
 let them be added, changed & removed independently).

...but this doesn't look sensible at all!

So, maybe only if we needed something counted with two intervals...
But nobody seems to use that possibility anyway.

Jarek P.


Re: [NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-28 Thread Jarek Poplawski
On Wed, Jun 27, 2007 at 05:25:45PM +0200, Patrick McHardy wrote:
 Patrick McHardy wrote:
  [NET]: gen_estimator: fix locking and timer related bugs
  
 
 
 That one still left a race: we could be reinitializing the timer
 while it is still running. This patch additionally makes sure
 each timer is only initialized once.
 

 [NET]: gen_estimator: fix locking and timer related bugs
 
 As noticed by Jarek Poplawski [EMAIL PROTECTED], the timer removal in
 gen_kill_estimator races with the timer function rearming the timer.
 
 Additionally there are a few more related problems that seem to be
 relicts from the timer when the estimator was qdisc specific and

- relicts from the timer when the estimator was qdisc specific and
+ relicts from the time when the estimator was qdisc specific and

 could rely on the rtnl or dev->qdisc_lock:

I've lost some time thinking about this rtnl and checking where
these gen_ functions are used, and wondering how foolish it would be
to ask about it here. So it seems there should be some policy
about commenting the required locking in networking: after
reading e.g. sch_generic.c you could wrongly conclude that no comment
means no locking is required. (And it would probably be better and
easier for the more experienced people to add such comments, if you
know what I mean...)

 
 - the check whether the list is empty and a timer needs to be started
   when adding a new estimator doesn't take the lock, so it races
   against concurrent additions, which can result in the timer being
   added twice or getting reinitialized after being added.
 
 - the new estimator's next pointer is also set without holding the
   lock, again racing against concurrent additions with possible
   list corruption as a result.
 
 - the timer deletion when killing an estimator is also not under
   the lock and races against timer arming when adding a new estimator.
 
 Fix by holding the lock around the entire list addition and initial
 timer arming. Removal is not done explicitly anymore, instead the
 timer function only rearms the timer when there are still estimators
 present.
 
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
 
 ---
 commit b6a0c468c258d96c6f132fc71ca74225235bc223
 tree 6f61004cf4810a4826aa5c7477e4d455ae3a5698
 parent 48d8d7ee5dd17c64833e0343ab4ae8ef01cc2648
 author Patrick McHardy [EMAIL PROTECTED] Wed, 27 Jun 2007 17:06:02 +0200
 committer Patrick McHardy [EMAIL PROTECTED] Wed, 27 Jun 2007 17:24:13 +0200
 
  net/core/gen_estimator.c |   27 +++
  1 files changed, 11 insertions(+), 16 deletions(-)
 
 diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
 index 17daf4c..88a7805 100644
 --- a/net/core/gen_estimator.c
 +++ b/net/core/gen_estimator.c
 @@ -127,8 +127,8 @@ static void est_timer(unsigned long arg)
   e->rate_est->pps = (e->avpps+0x1FF)>>10;
   spin_unlock(e->stats_lock);
   }
 -
 - mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
 + if (elist[idx].list != NULL)
 + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
   read_unlock(&est_lock);
  }
  
 @@ -152,6 +152,7 @@ int gen_new_estimator(struct gnet_stats_basic *bstats,
  {
   struct gen_estimator *est;
   struct gnet_estimator *parm = RTA_DATA(opt);
 + int idx;
  
   if (RTA_PAYLOAD(opt) < sizeof(*parm))
   return -EINVAL;
 @@ -163,7 +164,7 @@ int gen_new_estimator(struct gnet_stats_basic *bstats,
   if (est == NULL)
   return -ENOBUFS;
  
 - est->interval = parm->interval + 2;
 + est->interval = idx = parm->interval + 2;
   est->bstats = bstats;
   est->rate_est = rate_est;
   est->stats_lock = stats_lock;
 @@ -173,16 +174,14 @@ int gen_new_estimator(struct gnet_stats_basic *bstats,
   est->last_packets = bstats->packets;
   est->avpps = rate_est->pps<<10;
  
 - est->next = elist[est->interval].list;
 - if (est->next == NULL) {
 - init_timer(&elist[est->interval].timer);
 - elist[est->interval].timer.data = est->interval;
 - elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
 - elist[est->interval].timer.function = est_timer;
 - add_timer(&elist[est->interval].timer);
 - }
   write_lock_bh(&est_lock);
 - elist[est->interval].list = est;
 + if (!elist[idx].timer.function)

I think the code could be more consistent here about using ! versus == NULL.

 + setup_timer(&elist[idx].timer, est_timer, est->interval);

...and about using idx instead of est->interval.

 + if (elist[est->interval].list == NULL)

idx?

 + mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
 +
 + est->next = elist[idx].list;
 + elist[idx].list = est;
   write_unlock_bh(&est_lock);
   return 0;
  }
 @@ -202,7 +201,6 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
   struct gen_estimator *est, **pest;
  
   for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
 - int killed = 0;
   pest = 

Re: [NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-28 Thread Jarek Poplawski
On Wed, Jun 27, 2007 at 05:25:45PM +0200, Patrick McHardy wrote:
...
 Additionally there are a few more related problems that seem to be
 relicts from the timer when the estimator was qdisc specific and
 could rely on the rtnl or dev->qdisc_lock:
 
 - the check whether the list is empty and a timer needs to be started
   when adding a new estimator doesn't take the lock, so it races
   against concurrent additions, which can result in the timer being
   added twice or getting reinitialized after being added.
 
 - the new estimator's next pointer is also set without holding the
   lock, again racing against concurrent additions with possible
   list corruption as a result.
 
 - the timer deletion when killing an estimator is also not under
   the lock and races against timer arming when adding a new estimator.
 
 Fix by holding the lock around the entire list addition and initial
 timer arming. Removal is not done explicitly anymore, instead the
 timer function only rearms the timer when there are still estimators
 present.
...
 @@ -202,7 +201,6 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
   struct gen_estimator *est, **pest;
  
   for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
 - int killed = 0;
   pest = &elist[idx].list;
   while ((est=*pest) != NULL) {

So, maybe this list walking here needs some locking too?

Jarek P.


Re: [NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-28 Thread Jarek Poplawski
On Thu, Jun 28, 2007 at 08:54:48AM +0200, Jarek Poplawski wrote:
...
  @@ -215,10 +213,7 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
  write_unlock_bh(&est_lock);
   
  kfree(est);
  -   killed++;
  }
  -   if (killed && elist[idx].list == NULL)
  -   del_timer(&elist[idx].timer);
 
 I think this is needed. The old timer could be pending, while
 the gen_new_estimator() is run just after this e.g. in
 gen_replace_estimator().

Sorry! I've forgotten there is mod_timer now, so, it's OK!
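
For anyone following along: the reason mod_timer() makes this safe is that it may be called on a timer that is still pending. It then only moves the expiry instead of queueing the timer a second time, and its return value says whether the timer was pending. Below is a toy userspace model of that contract and of the "rearm only while the list is non-empty" rule from the patch. The `toy_*` names are invented for illustration, and this is not how the kernel implements timers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a timer: just a pending flag and an expiry time. */
struct toy_timer {
	int pending;
	unsigned long expires;
};

/*
 * Modeled on the mod_timer() contract: (re)arm the timer and return
 * 1 if it was already pending, 0 if it was inactive.
 */
static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	int was_pending;

	assert(t != NULL);
	was_pending = t->pending;
	t->expires = expires;
	t->pending = 1;
	return was_pending;
}

/*
 * Model of the timer firing: the handler rearms the timer only while
 * the estimator list is non-empty, as est_timer() does after the patch.
 */
static void toy_timer_fire(struct toy_timer *t, int list_nonempty,
			   unsigned long now, unsigned long period)
{
	assert(t != NULL);
	t->pending = 0;
	if (list_nonempty)
		toy_mod_timer(t, now + period);
}
```

In the gen_replace_estimator() case, the old timer may still be pending when the new estimator is added; in this model the second toy_mod_timer() call simply returns 1 and updates the expiry, and a later expiry with an empty list leaves the timer inactive.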

Jarek P.


Re: [NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-28 Thread Patrick McHardy

Jarek Poplawski wrote:

@@ -202,7 +201,6 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
struct gen_estimator *est, **pest;
 
 	for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {

-   int killed = 0;
pest = &elist[idx].list;
while ((est=*pest) != NULL) {


So, maybe this list walking here needs some locking too?


It depends on whether estimators should be able to rely on
the rtnl in the future or be completely responsible for their
own locking. My patch yesterday was made under the assumption
that they shouldn't rely on external locking, which seemed to
be the right thing for a generic implementation. OTOH it's
still specific to networking, so relying on the rtnl doesn't
sound too unreasonable either. I'm beginning to think I made
the wrong choice with my patch.

I'm busy right now, would you mind looking into a patch that
only deals with the timer races, but still relies on the
rtnl?




Re: [NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-28 Thread Jarek Poplawski
On Thu, Jun 28, 2007 at 02:23:36PM +0200, Patrick McHardy wrote:
 Jarek Poplawski wrote:
 @@ -202,7 +201,6 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
 struct gen_estimator *est, **pest;
  
 for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
 -   int killed = 0;
 pest = &elist[idx].list;
 while ((est=*pest) != NULL) {
 
 So, maybe this list walking here needs some locking too?
 
 It depends on whether estimators should be able to rely on
 the rtnl in the future or be completely responsible for their
 own locking. My patch yesterday was made under the assumption
 that they shouldn't rely on external locking, which seemed to
 be the right thing for a generic implementation. OTOH it's
 still specific to networking, so relying on the rtnl doesn't
 sound too unreasonable either. I'm beginning to think I made
 the wrong choice with my patch.
 
 I'm busy right now, would you mind looking into a patch that
 only deals with the timer races, but still relies on the
 rtnl?

In that case this patch looks OK enough.

My earlier proposals are of cosmetic value only.

Jarek P.


Re: [NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-28 Thread Patrick McHardy
Jarek Poplawski wrote:
 On Thu, Jun 28, 2007 at 02:23:36PM +0200, Patrick McHardy wrote:
 
Jarek Poplawski wrote:

@@ -202,7 +201,6 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
struct gen_estimator *est, **pest;

for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
-   int killed = 0;
pest = &elist[idx].list;
while ((est=*pest) != NULL) {

So, maybe this list walking here needs some locking too?

It depends on whether estimators should be able to rely on
the rtnl in the future or be completely responsible for their
own locking. My patch yesterday was made under the assumption
that they shouldn't rely on external locking, which seemed to
be the right thing for a generic implementation. OTOH it's
still specific to networking, so relying on the rtnl doesn't
sound too unreasonable either. I'm beginning to think I made
the wrong choice with my patch.

I'm busy right now, would you mind looking into a patch that
only deals with the timer races, but still relies on the
rtnl?
 
 
 In that case this patch looks OK enough.


It's overkill in that case. The concurrent additions and removals
can't happen.


Re: [NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-28 Thread Jarek Poplawski
On Thu, Jun 28, 2007 at 02:55:51PM +0200, Patrick McHardy wrote:
...
 It's overkill in that case. The concurrent additions and removals
 can't happen.
 

Then the changelog needs one more change. Plus, BTW, maybe
one line about this at the beginning of the file?

Jarek P.


[NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-27 Thread Patrick McHardy

Patrick McHardy wrote:

Jarek Poplawski wrote:

I looked at this just now, and maybe it's enough for asking,
but definitely not enough for a patch. I'll try to check this
more in the evening, so I could send something tomorrow.

So if it's not only about kindness, feel free to do it:
it will be sooner and, I have no doubt, better.


I can take care of it, no problem. 



OK, this patch should fix the race Jarek noticed (and a few more).
It does not fix the original HTB problem though.



[NET]: gen_estimator: fix locking and timer related bugs

As noticed by Jarek Poplawski [EMAIL PROTECTED], the timer removal in
gen_kill_estimator races with the timer function rearming the timer.

Additionally there are a few more related problems that seem to be
relicts from the time when the estimator was qdisc specific and
could rely on the rtnl or dev->qdisc_lock:

- the check whether the list is empty and a timer needs to be started
  when adding a new estimator doesn't take the lock, so it races
  against concurrent additions, which can result in the timer getting
  added twice or getting reinitialized after getting added.

- the new estimator's next pointer is also set without holding the
  lock, again racing against concurrent additions with possible
  list corruption as a result.

- the timer deletion when killing an estimator is also not under
  the lock and races against timer arming when adding a new estimator.

Fix by holding the lock around the entire list addition and initial
timer arming. Timer removal is not done explicitly anymore, instead
the timer function only rearms the timer when there are still
estimators present.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 59b5997f78c3cf3366886969ac8e6b38100b30e9
tree 9b43ce1af21ad1c8817f1f2bc8291c0f60457a4e
parent 48d8d7ee5dd17c64833e0343ab4ae8ef01cc2648
author Patrick McHardy [EMAIL PROTECTED] Wed, 27 Jun 2007 17:06:02 +0200
committer Patrick McHardy [EMAIL PROTECTED] Wed, 27 Jun 2007 17:06:02 +0200

 net/core/gen_estimator.c |   10 +++---
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 17daf4c..49a0bd3 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -127,8 +127,8 @@ static void est_timer(unsigned long arg)
 		e->rate_est->pps = (e->avpps+0x1FF)>>10;
 		spin_unlock(e->stats_lock);
 	}
-
-	mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+	if (elist[idx].list != NULL)
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
 	read_unlock(&est_lock);
 }
 
@@ -173,6 +173,7 @@ int gen_new_estimator(struct gnet_stats_basic *bstats,
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
+	write_lock_bh(&est_lock);
 	est->next = elist[est->interval].list;
 	if (est->next == NULL) {
 		init_timer(&elist[est->interval].timer);
@@ -181,7 +182,6 @@ int gen_new_estimator(struct gnet_stats_basic *bstats,
 		elist[est->interval].timer.function = est_timer;
 		add_timer(&elist[est->interval].timer);
 	}
-	write_lock_bh(&est_lock);
 	elist[est->interval].list = est;
 	write_unlock_bh(&est_lock);
 	return 0;
@@ -202,7 +202,6 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
 	struct gen_estimator *est, **pest;
 
 	for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
-		int killed = 0;
 		pest = &elist[idx].list;
 		while ((est=*pest) != NULL) {
 			if (est->rate_est != rate_est || est->bstats != bstats) {
@@ -215,10 +214,7 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
 			write_unlock_bh(&est_lock);
 
 			kfree(est);
-			killed++;
 		}
-		if (killed && elist[idx].list == NULL)
-			del_timer(&elist[idx].timer);
 	}
 }
 


Re: [NET]: gen_estimator: fix locking and timer related bugs [Re: [Bugme-new] [Bug 8668] New: HTB Deadlock]

2007-06-27 Thread Patrick McHardy
Patrick McHardy wrote:
 [NET]: gen_estimator: fix locking and timer related bugs
 


That one still left a race: we could be reinitializing the timer
while it is still running. This patch additionally makes sure
each timer is only initialized once.
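
As a rough illustration of the idea (a userspace sketch with invented `toy_*` names, not the patch itself): the timer is set up only while its function pointer is still unset, it is armed only when the list was empty, and both checks happen together with the list insertion inside one critical section. The `locked` flag merely stands in for est_lock so the sketch stays single-threaded and self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct gen_estimator. */
struct toy_est {
	struct toy_est *next;
};

/* Hypothetical stand-in for one elist[] slot plus its timer. */
struct toy_slot {
	struct toy_est *list;
	void (*timer_fn)(unsigned long);	/* NULL until set up */
	int timer_pending;
	int setup_count;	/* how often the timer was initialised */
	int locked;		/* stands in for est_lock */
};

static void toy_est_timer(unsigned long arg)
{
	(void)arg;	/* the real handler would walk the list here */
}

/* Add an estimator: setup-once, arm-if-empty and insertion under one lock. */
static void toy_add_estimator(struct toy_slot *s, struct toy_est *est)
{
	assert(s != NULL && est != NULL);
	assert(!s->locked);
	s->locked = 1;			/* write_lock_bh() in the patch */

	if (s->timer_fn == NULL) {	/* initialise the timer only once */
		s->timer_fn = toy_est_timer;
		s->setup_count++;
	}
	if (s->list == NULL)		/* first estimator: arm the timer */
		s->timer_pending = 1;

	est->next = s->list;
	s->list = est;

	s->locked = 0;			/* write_unlock_bh() in the patch */
}
```

Even if the list drains and is refilled later, `setup_count` stays at 1 in this model, which is the property the second version of the patch is after.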

[NET]: gen_estimator: fix locking and timer related bugs

As noticed by Jarek Poplawski [EMAIL PROTECTED], the timer removal in
gen_kill_estimator races with the timer function rearming the timer.

Additionally there are a few more related problems that seem to be
relicts from the timer when the estimator was qdisc specific and
could rely on the rtnl or dev->qdisc_lock:

- the check whether the list is empty and a timer needs to be started
  when adding a new estimator doesn't take the lock, so it races
  against concurrent additions, which can result in the timer being
  added twice or getting reinitialized after being added.

- the new estimator's next pointer is also set without holding the
  lock, again racing against concurrent additions with possible
  list corruption as a result.

- the timer deletion when killing an estimator is also not under
  the lock and races against timer arming when adding a new estimator.

Fix by holding the lock around the entire list addition and initial
timer arming. Removal is not done explicitly anymore, instead the
timer function only rearms the timer when there are still estimators
present.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit b6a0c468c258d96c6f132fc71ca74225235bc223
tree 6f61004cf4810a4826aa5c7477e4d455ae3a5698
parent 48d8d7ee5dd17c64833e0343ab4ae8ef01cc2648
author Patrick McHardy [EMAIL PROTECTED] Wed, 27 Jun 2007 17:06:02 +0200
committer Patrick McHardy [EMAIL PROTECTED] Wed, 27 Jun 2007 17:24:13 +0200

 net/core/gen_estimator.c |   27 +++
 1 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 17daf4c..88a7805 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -127,8 +127,8 @@ static void est_timer(unsigned long arg)
 		e->rate_est->pps = (e->avpps+0x1FF)>>10;
 		spin_unlock(e->stats_lock);
 	}
-
-	mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+	if (elist[idx].list != NULL)
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
 	read_unlock(&est_lock);
 }
 
@@ -152,6 +152,7 @@ int gen_new_estimator(struct gnet_stats_basic *bstats,
 {
 	struct gen_estimator *est;
 	struct gnet_estimator *parm = RTA_DATA(opt);
+	int idx;
 
 	if (RTA_PAYLOAD(opt) < sizeof(*parm))
 		return -EINVAL;
@@ -163,7 +164,7 @@ int gen_new_estimator(struct gnet_stats_basic *bstats,
 	if (est == NULL)
 		return -ENOBUFS;
 
-	est->interval = parm->interval + 2;
+	est->interval = idx = parm->interval + 2;
 	est->bstats = bstats;
 	est->rate_est = rate_est;
 	est->stats_lock = stats_lock;
@@ -173,16 +174,14 @@ int gen_new_estimator(struct gnet_stats_basic *bstats,
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
-	est->next = elist[est->interval].list;
-	if (est->next == NULL) {
-		init_timer(&elist[est->interval].timer);
-		elist[est->interval].timer.data = est->interval;
-		elist[est->interval].timer.expires = jiffies + ((HZ<<est->interval)/4);
-		elist[est->interval].timer.function = est_timer;
-		add_timer(&elist[est->interval].timer);
-	}
 	write_lock_bh(&est_lock);
-	elist[est->interval].list = est;
+	if (!elist[idx].timer.function)
+		setup_timer(&elist[idx].timer, est_timer, est->interval);
+	if (elist[est->interval].list == NULL)
+		mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
+
+	est->next = elist[idx].list;
+	elist[idx].list = est;
 	write_unlock_bh(&est_lock);
 	return 0;
 }
@@ -202,7 +201,6 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
 	struct gen_estimator *est, **pest;
 
 	for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
-		int killed = 0;
 		pest = &elist[idx].list;
 		while ((est=*pest) != NULL) {
 			if (est->rate_est != rate_est || est->bstats != bstats) {
@@ -215,10 +213,7 @@ void gen_kill_estimator(struct gnet_stats_basic *bstats,
 			write_unlock_bh(&est_lock);
 
 			kfree(est);
-			killed++;
 		}
-		if (killed && elist[idx].list == NULL)
-			del_timer(&elist[idx].timer);
 	}
 }