Re: [PATCH] bonding: replace system timer with work queue

2007-03-01 Thread Jay Vosburgh
Andrew Morton <[EMAIL PROTECTED]> wrote:

>On Wed, 28 Feb 2007 10:12:01 +0100 (CET) Jaroslav Kysela <[EMAIL PROTECTED]> wrote:
>> ==========
>> bonding: replace system timer with work queue
>> 
>> This patch replaces the system timers in the monitor functions with
>> work queues. The reason for this change is that the bonding handlers
>> call various sleeping functions from the timer handler, which is not
>> allowed.
>
>Which sleeping functions?  I'd have expected the kernel to spew runtime
>warnings when this happens, but I don't recall any such reports.

This affects one specific mode (balance-alb) in one specific
case (moving MAC addresses around, which happens during failover or
initialization), and a full fix is more complicated than just a switch
to work queues, although that is part of the full fix.  There are three
things going on: calls to sleeping functions with locks held, the same
calls from the timer context, and rtnl hold issues.

The actual functions affected are various things called by the
NETDEV_CHANGEADDR notifier callbacks triggered by dev_set_mac_address(),
as well as some of the driver-level set_mac_address functions that may
sleep.

Andy Gospodarek <[EMAIL PROTECTED]> and I have been working
jointly on a two-phase fix for these problems: he's working up the
short-term fix, which includes the changeover to workqueues, and I've
been working on the long-term fix, which involves refactoring the
bonding link monitoring and failover system.  Jaroslav's patch looks to
be a subset of the patch Andy is working on.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] bonding: replace system timer with work queue

2007-03-01 Thread Stephen Hemminger

Andrew Morton wrote:

> On Wed, 28 Feb 2007 10:12:01 +0100 (CET) Jaroslav Kysela <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>>   please review and apply to the mm tree for further testing. The patch
>> is also available at
>> ftp://ftp.alsa-project.org/pub/kernel-patches/bonding-workqueue.patch .
>
> Please cc netdev@vger.kernel.org on net-related patches, thanks.
>
>>   Thank you,
>>   Jaroslav
>>
>> ==
>> bonding: replace system timer with work queue
>>
>> This patch replaces the system timers in the monitor functions with
>> work queues. The reason for this change is that the bonding handlers
>> call various sleeping functions from the timer handler, which is not
>> allowed.
>
> Which sleeping functions?  I'd have expected the kernel to spew runtime
> warnings when this happens, but I don't recall any such reports.
>
>> Because we cannot share the main workqueue threads (rtnl_lock is also
>> used in linkwatch_event), a new bond workqueue thread is created.
>>
>> Signed-off-by: Jaroslav Kysela <[EMAIL PROTECTED]>
>>
>> diff -rupN linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c linux-2.6.20/drivers/net/bonding/bond_3ad.c
>> --- linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c	2007-02-04 19:44:54.0 +0100
>> +++ linux-2.6.20/drivers/net/bonding/bond_3ad.c	2007-02-28 09:19:43.831369202 +0100
>> @@ -2097,8 +2097,10 @@ void bond_3ad_unbind_slave(struct slave
>>   * times out, and it selects an aggregator for the ports that are yet not
>>   * related to any aggregator, and selects the active aggregator for a bond.
>>   */
>> -void bond_3ad_state_machine_handler(struct bonding *bond)
>> +void bond_3ad_state_machine_handler(struct work_struct *work)
>>  {
>> +   struct ad_bond_info *ad_info = container_of(work, struct ad_bond_info, ad_work.work);
>> +   struct bonding *bond = (struct bonding *)((char *)ad_info - offsetof(struct bonding, ad_info));
>
> We can use container_of here too?
>
>> -void bond_alb_monitor(struct bonding *bond)
>> +void bond_alb_monitor(struct work_struct *work)
>>  {
>> -   struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>> +   struct alb_bond_info *bond_info = container_of(work, struct alb_bond_info, alb_work.work);
>> +   struct bonding *bond = (struct bonding *)((char *)bond_info - offsetof(struct bonding, alb_info));
>
> And here.
>
>> +   cancel_rearming_delayed_workqueue(bond_wq, &(BOND_AD_INFO(bond).ad_work));


As I mentioned earlier, this call to cancel_rearming_delayed_workqueue
can deadlock with netlink_watch. This happens if:

dev_close
   rtnl_lock                       carrier lost on device
   bond_close                      netlink-related workqueue event
                                   waiting for rtnl
      cancel_workqueue
         spinning, waiting for workq to drain

The agreed-upon semantics are never to do any operation that waits for
a workqueue to drain with RTNL held.
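A minimal user-space sketch of that rule, assuming a pthread mutex as a stand-in for RTNL and a thread as a stand-in for the monitor work item (fake_rtnl, monitor_thread and close_device are hypothetical names, not kernel APIs). The teardown path changes state under the lock, then releases it before waiting for the monitor to finish. Calling pthread_join() with the lock still held would reproduce the cycle above whenever the monitor is blocked in pthread_mutex_lock().

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for RTNL; not the kernel's rtnl_lock(). */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool stop_requested;
static atomic_int monitor_runs;

/* Stand-in for a periodic monitor that needs the lock on every pass. */
static void *monitor_thread(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        pthread_mutex_lock(&fake_rtnl);      /* may block here */
        atomic_fetch_add(&monitor_runs, 1);
        pthread_mutex_unlock(&fake_rtnl);
        sched_yield();                       /* "re-arm" for the next pass */
    }
    return NULL;
}

/*
 * Teardown following the rule: update state under the lock, then drop
 * the lock BEFORE waiting for the monitor to drain.
 */
static void close_device(pthread_t monitor)
{
    pthread_mutex_lock(&fake_rtnl);
    atomic_store(&stop_requested, true);
    pthread_mutex_unlock(&fake_rtnl);

    int rc = pthread_join(monitor, NULL);    /* wait with no lock held */
    assert(rc == 0);
    (void)rc;
}
```

The monitor checks its stop flag between passes, so once close_device() has released the lock the monitor can finish its current pass and exit, and the join completes.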


Re: [PATCH] bonding: replace system timer with work queue

2007-02-28 Thread Andrew Morton
On Wed, 28 Feb 2007 10:12:01 +0100 (CET) Jaroslav Kysela <[EMAIL PROTECTED]> wrote:

> Hi,
> 
>   please review and apply to the mm tree for further testing. The patch
> is also available at 
> ftp://ftp.alsa-project.org/pub/kernel-patches/bonding-workqueue.patch .

Please cc netdev@vger.kernel.org on net-related patches, thanks.

>   Thank you,
>   Jaroslav
> 
> ======
> bonding: replace system timer with work queue
> 
> This patch replaces the system timers in the monitor functions with
> work queues. The reason for this change is that the bonding handlers
> call various sleeping functions from the timer handler, which is not
> allowed.

Which sleeping functions?  I'd have expected the kernel to spew runtime
warnings when this happens, but I don't recall any such reports.


> Because we cannot share the main workqueue threads (rtnl_lock is also
> used in linkwatch_event), a new bond workqueue thread is created.
> 
> Signed-off-by: Jaroslav Kysela <[EMAIL PROTECTED]>
> 
> diff -rupN linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c linux-2.6.20/drivers/net/bonding/bond_3ad.c
> --- linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c	2007-02-04 19:44:54.0 +0100
> +++ linux-2.6.20/drivers/net/bonding/bond_3ad.c	2007-02-28 09:19:43.831369202 +0100
> @@ -2097,8 +2097,10 @@ void bond_3ad_unbind_slave(struct slave
>   * times out, and it selects an aggregator for the ports that are yet not
>   * related to any aggregator, and selects the active aggregator for a bond.
>   */
> -void bond_3ad_state_machine_handler(struct bonding *bond)
> +void bond_3ad_state_machine_handler(struct work_struct *work)
>  {
> + struct ad_bond_info *ad_info = container_of(work, struct ad_bond_info, ad_work.work);
> + struct bonding *bond = (struct bonding *)((char *)ad_info - offsetof(struct bonding, ad_info));

We can use container_of here too?

> -void bond_alb_monitor(struct bonding *bond)
> +void bond_alb_monitor(struct work_struct *work)
>  {
> - struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
> + struct alb_bond_info *bond_info = container_of(work, struct alb_bond_info, alb_work.work);
> + struct bonding *bond = (struct bonding *)((char *)bond_info - offsetof(struct bonding, alb_info));

And here.
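The container_of suggestion can be illustrated in user space. The sketch below assumes a simplified container_of() without the kernel macro's type checking, and trimmed, hypothetical stand-in structures (only the member names ad_info, ad_work and work follow the patch). It shows that the hand-written offsetof() arithmetic in the patch computes the same pointer as container_of(), and that offsetof() also accepts a nested member, so the struct bonding pointer can be recovered in one step.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): same pointer arithmetic as the kernel macro,
 * minus its compile-time type check. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed, hypothetical stand-ins for the structures in the patch. */
struct work_struct { int pending; };
struct delayed_work { struct work_struct work; long expires; };
struct ad_bond_info { int lacp_fast; struct delayed_work ad_work; };
struct bonding { int mode; struct ad_bond_info ad_info; };

/* As in the patch: container_of() inside, manual offsetof() outside. */
static struct bonding *bond_from_work_patch(struct work_struct *work)
{
    struct ad_bond_info *ad_info =
        container_of(work, struct ad_bond_info, ad_work.work);
    return (struct bonding *)((char *)ad_info -
                              offsetof(struct bonding, ad_info));
}

/* Suggested form: one container_of() with a nested member designator. */
static struct bonding *bond_from_work(struct work_struct *work)
{
    return container_of(work, struct bonding, ad_info.ad_work.work);
}

/* Both routes must land on the enclosing struct bonding. */
static void check_roundtrip(struct bonding *bond)
{
    struct work_struct *work = &bond->ad_info.ad_work.work;

    assert(bond_from_work_patch(work) == bond);
    assert(bond_from_work(work) == bond);
}
```

The same reasoning applies to bond_alb_monitor() with alb_info.alb_work.work as the nested member.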

> + cancel_rearming_delayed_workqueue(bond_wq, &(BOND_AD_INFO(bond).ad_work));
>   break;
>   case BOND_MODE_TLB:
>   case BOND_MODE_ALB:
> - del_timer_sync(&(BOND_ALB_INFO(bond).alb_timer));
> + cancel_rearming_delayed_workqueue(bond_wq, &(BOND_ALB_INFO(bond).alb_work));
>   break;
>   default:
>   break;
> @@ -4289,6 +4272,14 @@ static int bond_init(struct net_device *
>   rwlock_init(&bond->lock);
>   rwlock_init(&bond->curr_slave_lock);
>  
> + /* initialize work */
> + INIT_DELAYED_WORK(&bond->mii_work, (void *)&bond_mii_monitor);
> + if (params->mode == BOND_MODE_ACTIVEBACKUP) {
> + INIT_DELAYED_WORK(&bond->arp_work, (void *)&bond_activebackup_arp_mon);
> + } else {
> + INIT_DELAYED_WORK(&bond->arp_work, (void *)&bond_loadbalance_arp_mon);
> + }

Can we lose the unneeded braces, the unneeded typecasts and fit the code
into 80 cols?

does all that

yup.
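For reference, the shape being asked for (no braces, no casts, under 80 columns) can be checked in user space. The sketch below uses hypothetical stand-ins for struct work_struct, struct delayed_work and INIT_DELAYED_WORK (this is not <linux/workqueue.h>) and empty handler stubs that borrow the patch's function names. Because each handler already has the void (*)(struct work_struct *) signature, no (void *) cast is needed, and leaving the cast out lets the compiler reject a handler whose signature is wrong.

```c
#include <assert.h>

/* Hypothetical stand-ins, not <linux/workqueue.h>. */
struct work_struct;
typedef void (*work_func_t)(struct work_struct *work);
struct work_struct { work_func_t func; };
struct delayed_work { struct work_struct work; };
#define INIT_DELAYED_WORK(dw, fn) ((dw)->work.func = (fn))

#define BOND_MODE_ROUNDROBIN   0
#define BOND_MODE_ACTIVEBACKUP 1

struct toy_bond {
    int mode;
    struct delayed_work mii_work;
    struct delayed_work arp_work;
};

/* Empty stubs with the work-handler signature. */
static void bond_mii_monitor(struct work_struct *work) { (void)work; }
static void bond_activebackup_arp_mon(struct work_struct *work) { (void)work; }
static void bond_loadbalance_arp_mon(struct work_struct *work) { (void)work; }

/* No braces, no casts, every line under 80 columns. */
static void toy_bond_init_work(struct toy_bond *bond)
{
    INIT_DELAYED_WORK(&bond->mii_work, bond_mii_monitor);
    if (bond->mode == BOND_MODE_ACTIVEBACKUP)
        INIT_DELAYED_WORK(&bond->arp_work, bond_activebackup_arp_mon);
    else
        INIT_DELAYED_WORK(&bond->arp_work, bond_loadbalance_arp_mon);
}
```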

>   bond->params = *params; /* copy params struct */
>  
>   /* Initialize pointers */
> @@ -4782,6 +4773,12 @@ static int __init bonding_init(void)
>   goto err;
>   }
>  
> + bond_wq = create_singlethread_workqueue("bond");
> + if (bond_wq == NULL) {
> + res = -ENOMEM;
> + goto err;
> + }
> +
>   res = bond_create_sysfs();
>   if (res)
>   goto err;
> @@ -4807,6 +4804,7 @@ static void __exit bonding_exit(void)
>  
>   rtnl_lock();
>   bond_free_all();
> + destroy_workqueue(bond_wq);
>   bond_destroy_sysfs();
>   rtnl_unlock();

Are you sure that all pending delayed works have been cancelled when we
destroy this workqueue?
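The ordering in question can be modelled with a toy queue. In the sketch below (toy_wq, toy_dwork and the toy_* functions are hypothetical, not the kernel workqueue API), destroy refuses to run while any item is still pending, so the teardown path has to cancel every delayed work item first. In the driver that would mean every bond's mii, arp, ad and alb work is cancelled before destroy_workqueue(bond_wq) runs; whether bond_free_all() already guarantees that is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a queue only tracks how many items are still pending. */
struct toy_wq { int pending; bool destroyed; };
struct toy_dwork { struct toy_wq *wq; bool queued; };

static void toy_queue_delayed(struct toy_dwork *w)
{
    if (!w->queued) {            /* re-queueing a pending item is a no-op */
        w->queued = true;
        w->wq->pending++;
    }
}

static void toy_cancel(struct toy_dwork *w)
{
    if (w->queued) {
        w->queued = false;
        w->wq->pending--;
    }
}

/* Refuse to tear down a queue that still has pending items. */
static bool toy_destroy(struct toy_wq *wq)
{
    if (wq->pending != 0)
        return false;
    wq->destroyed = true;
    return true;
}

/* Safe teardown order: cancel every item, then destroy the queue. */
static void toy_teardown(struct toy_wq *wq, struct toy_dwork *items, int n)
{
    for (int i = 0; i < n; i++)
        toy_cancel(&items[i]);
    bool ok = toy_destroy(wq);
    assert(ok);
    (void)ok;
}
```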




Re: [PATCH] bonding: replace system timer with work queue

2007-02-28 Thread Stephen Hemminger
On Wed, 28 Feb 2007 10:12:01 +0100 (CET)
Jaroslav Kysela <[EMAIL PROTECTED]> wrote:

> Hi,
> 
>   please review and apply to the mm tree for further testing. The patch
> is also available at 
> ftp://ftp.alsa-project.org/pub/kernel-patches/bonding-workqueue.patch .
> 
>   Thank you,
>   Jaroslav
> 

You should submit network patches to the addresses in the relevant MAINTAINERS file entry:

BONDING DRIVER
P:  Chad Tindel
M:  [EMAIL PROTECTED]
P:  Jay Vosburgh
M:  [EMAIL PROTECTED]
L:  [EMAIL PROTECTED]
W:  http://sourceforge.net/projects/bonding/
S:  Supported


> @@ -3569,20 +3552,20 @@ static int bond_close(struct net_device 
>    */
>  
>   if (bond->params.miimon) {  /* link check interval, in milliseconds. */
> - del_timer_sync(&bond->mii_timer);
> + cancel_rearming_delayed_workqueue(bond_wq, &bond->mii_work);
>   }
>  
>   if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
> - del_timer_sync(&bond->arp_timer);
> + cancel_rearming_delayed_workqueue(bond_wq, &bond->arp_work);
>   }
>  
>   switch (bond->params.mode) {
>   case BOND_MODE_8023AD:
> - del_timer_sync(&(BOND_AD_INFO(bond).ad_timer));
> + cancel_rearming_delayed_workqueue(bond_wq, &(BOND_AD_INFO(bond).ad_work));
>   break;
>   case BOND_MODE_TLB:
>   case BOND_MODE_ALB:
> - del_timer_sync(&(BOND_ALB_INFO(bond).alb_timer));
> + cancel_rearming_delayed_workqueue(bond_wq, &(BOND_ALB_INFO(bond).alb_work));
>   break;
>   default:
>   break;


This part will deadlock, since it is not safe to cancel a workqueue
entry with the RTNL mutex held. The cancel operation has to wait for the
work item to run, and the entry being run may be stuck waiting for RTNL.
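One possible way to break that cycle, sketched in user space under stand-in assumptions (a pthread mutex for RTNL, a thread for the work item; fake_rtnl, trylock_monitor and close_with_lock_held are hypothetical names, not kernel code): the work item never blocks on the lock. If pthread_mutex_trylock() fails, it skips that pass and retries later, so a closer that waits for it while holding the lock still makes progress. Whether this suits the real driver depends on how its cancel path behaves; the sketch only models the lock ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for RTNL; not the kernel's rtnl_lock(). */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool monitor_stop;
static atomic_int busy_retries;

/* Monitor pass that never sleeps on the lock. */
static void *trylock_monitor(void *arg)
{
    (void)arg;
    while (!atomic_load(&monitor_stop)) {
        if (pthread_mutex_trylock(&fake_rtnl) != 0) {
            /* Lock busy: skip this pass and retry later. */
            atomic_fetch_add(&busy_retries, 1);
            sched_yield();
            continue;
        }
        /* ... work that needs the lock ... */
        pthread_mutex_unlock(&fake_rtnl);
        sched_yield();
    }
    return NULL;
}

/*
 * Waits for the monitor while HOLDING the lock. This terminates only
 * because trylock_monitor() never blocks on fake_rtnl.
 */
static void close_with_lock_held(pthread_t monitor)
{
    pthread_mutex_lock(&fake_rtnl);
    atomic_store(&monitor_stop, true);
    int rc = pthread_join(monitor, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_mutex_unlock(&fake_rtnl);
}
```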



-- 
Stephen Hemminger <[EMAIL PROTECTED]>


[PATCH] bonding: replace system timer with work queue

2007-02-28 Thread Jaroslav Kysela
Hi,

please review and apply to the mm tree for further testing. The patch
is also available at 
ftp://ftp.alsa-project.org/pub/kernel-patches/bonding-workqueue.patch .

Thank you,
Jaroslav

==
bonding: replace system timer with work queue

This patch replaces the system timers in the monitor functions with
work queues. The reason for this change is that the bonding handlers
call various sleeping functions from the timer handler, which is not
allowed. Because we cannot share the main workqueue threads (rtnl_lock
is also used in linkwatch_event), a new bond workqueue thread is created.
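To make the conversion concrete: each monitor handler runs, does its checks and re-queues itself at its re_arm: label, so queue_delayed_work() takes over the job mod_timer() did. Below is a tick-driven toy model of that self-re-arming pattern. The names toy_work, toy_queue_delayed, toy_run_until and MONITOR_DELAY are hypothetical, not kernel APIs, and the real handler runs in a workqueue thread where it may sleep, which this single-threaded model does not show.

```c
#include <assert.h>

/* Hypothetical stand-ins: a delayed item is a handler plus a due tick. */
struct toy_work;
typedef void (*toy_work_fn)(struct toy_work *work);

struct toy_work {
    toy_work_fn fn;
    long due;                 /* tick at which to run, -1 when idle */
};

static long now;              /* simulated jiffies */

static void toy_queue_delayed(struct toy_work *w, long delay)
{
    assert(delay > 0);
    w->due = now + delay;
}

/* Advance the clock one tick at a time, running the item when due. */
static void toy_run_until(struct toy_work *w, long end)
{
    for (; now <= end; now++) {
        if (w->due == now) {
            w->due = -1;      /* not pending while it runs */
            w->fn(w);         /* handler may re-arm itself */
        }
    }
}

#define MONITOR_DELAY 10      /* hypothetical interval, in ticks */
static int monitor_runs;

/* Periodic monitor: do the checks, then re-queue (cf. "re_arm:"). */
static void toy_monitor(struct toy_work *w)
{
    monitor_runs++;
    toy_queue_delayed(w, MONITOR_DELAY);
}
```

Queued at tick 0 and run through tick 35, the monitor fires at ticks 10, 20 and 30 and is left pending for tick 40.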

Signed-off-by: Jaroslav Kysela <[EMAIL PROTECTED]>

diff -rupN linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c linux-2.6.20/drivers/net/bonding/bond_3ad.c
--- linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c	2007-02-04 19:44:54.0 +0100
+++ linux-2.6.20/drivers/net/bonding/bond_3ad.c	2007-02-28 09:19:43.831369202 +0100
@@ -2097,8 +2097,10 @@ void bond_3ad_unbind_slave(struct slave
  * times out, and it selects an aggregator for the ports that are yet not
  * related to any aggregator, and selects the active aggregator for a bond.
  */
-void bond_3ad_state_machine_handler(struct bonding *bond)
+void bond_3ad_state_machine_handler(struct work_struct *work)
 {
+   struct ad_bond_info *ad_info = container_of(work, struct ad_bond_info, ad_work.work);
+   struct bonding *bond = (struct bonding *)((char *)ad_info - offsetof(struct bonding, ad_info));
    struct port *port;
    struct aggregator *aggregator;
 
@@ -2149,7 +2151,7 @@ void bond_3ad_state_machine_handler(stru
    }
 
 re_arm:
-   mod_timer(&(BOND_AD_INFO(bond).ad_timer), jiffies + ad_delta_in_ticks);
+   queue_delayed_work(bond_wq, &(BOND_AD_INFO(bond).ad_work), ad_delta_in_ticks);
 out:
    read_unlock(&bond->lock);
 }
diff -rupN linux-2.6.20.orig/drivers/net/bonding/bond_3ad.h linux-2.6.20/drivers/net/bonding/bond_3ad.h
--- linux-2.6.20.orig/drivers/net/bonding/bond_3ad.h	2007-02-04 19:44:54.0 +0100
+++ linux-2.6.20/drivers/net/bonding/bond_3ad.h	2007-02-28 09:25:21.921287093 +0100
@@ -261,7 +261,7 @@ struct ad_bond_info {
    int lacp_fast;  /* whether fast periodic tx should be
                     * requested
                     */
-   struct timer_list ad_timer;
+   struct delayed_work ad_work;
    struct packet_type ad_pkt_type;
 };
 
@@ -276,7 +276,7 @@ struct ad_slave_info {
 void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fast);
 int  bond_3ad_bind_slave(struct slave *slave);
 void bond_3ad_unbind_slave(struct slave *slave);
-void bond_3ad_state_machine_handler(struct bonding *bond);
+void bond_3ad_state_machine_handler(struct work_struct *work);
 void bond_3ad_adapter_speed_changed(struct slave *slave);
 void bond_3ad_adapter_duplex_changed(struct slave *slave);
 void bond_3ad_handle_link_change(struct slave *slave, char link);
diff -rupN linux-2.6.20.orig/drivers/net/bonding/bond_alb.c linux-2.6.20/drivers/net/bonding/bond_alb.c
--- linux-2.6.20.orig/drivers/net/bonding/bond_alb.c	2007-02-04 19:44:54.0 +0100
+++ linux-2.6.20/drivers/net/bonding/bond_alb.c	2007-02-28 09:26:10.857038588 +0100
@@ -28,7 +28,7 @@
 #include <linux/pkt_sched.h>
 #include <linux/spinlock.h>
 #include <linux/slab.h>
-#include <linux/timer.h>
+#include <linux/workqueue.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/if_arp.h>
@@ -1367,9 +1367,10 @@ out:
    return 0;
 }
 
-void bond_alb_monitor(struct bonding *bond)
+void bond_alb_monitor(struct work_struct *work)
 {
-   struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
+   struct alb_bond_info *bond_info = container_of(work, struct alb_bond_info, alb_work.work);
+   struct bonding *bond = (struct bonding *)((char *)bond_info - offsetof(struct bonding, alb_info));
    struct slave *slave;
    int i;
 
@@ -1471,7 +1472,7 @@ void bond_alb_monitor(struct bonding *bo
    }
 
 re_arm:
-   mod_timer(&(bond_info->alb_timer), jiffies + alb_delta_in_ticks);
+   queue_delayed_work(bond_wq, &(bond_info->alb_work), alb_delta_in_ticks);
 out:
    read_unlock(&bond->lock);
 }
diff -rupN linux-2.6.20.orig/drivers/net/bonding/bond_alb.h linux-2.6.20/drivers/net/bonding/bond_alb.h
--- linux-2.6.20.orig/drivers/net/bonding/bond_alb.h	2007-02-04 19:44:54.0 +0100
+++ linux-2.6.20/drivers/net/bonding/bond_alb.h	2007-02-28 09:25:50.607486221 +0100
@@ -84,7 +84,7 @@ struct tlb_slave_info {
 };
 
 struct alb_bond_info {
-   struct timer_list   alb_timer;
+   struct delayed_work alb_work;
    struct tlb_client_info  *tx_hashtbl; /* Dynamically allocated */
    spinlock_t  tx_hashtbl_lock;
    u32 unbalanced_load;
@@ -125,7 +125,7 @@ void bond_alb_deinit_slave(struct bondin
 void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char link);
 voi
