Re: [RFC] wrr (weighted round-robin) bonding

2006-10-20 Thread Dawid Ciezarkiewicz
On Thursday, 19 October 2006 21:04, Andy Gospodarek wrote:
 It would seem to me that extending an existing mode would be more
 desirable than adding yet another mode to worry about.  I don't even
 like the fact that there are as many as there are, but I understand why
 they are there.  

Ack. I will probably update wrr bonding patch to replace rr mode.

 I recently extended rr mode to allow an additional parameter called that
 rr_repeat that would allow someone to send more than a single frame out
 of each device before moving to the next one.  It seemed this could be
 helpful when dealing with switches that constantly re-learned source MAC
 addresses.  Network performance would suffer whenever rr_repeat was 1,
 but box performance might be better if there weren't so many locks
 taken.

Thanks. I'll consider adding such functionality.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] wrr (weighted round-robin) bonding

2006-10-20 Thread Jay Vosburgh
Dawid Ciezarkiewicz [EMAIL PROTECTED] wrote:

On Thursday, 19 October 2006 21:04, Andy Gospodarek wrote:
 It would seem to me that extending an existing mode would be more
 desirable than adding yet another mode to worry about.  I don't even
 like the fact that there are as many as there are, but I understand why
 they are there.  

Ack. I will probably update wrr bonding patch to replace rr mode.

Also, if acceptance into the mainline is your ultimate goal, the
ioctl control interface will be a very difficult sell.  You'd want to
look into some other control mechanism, most likely an additional sysfs
entry.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] wrr (weighted round-robin) bonding

2006-10-20 Thread Dawid Ciezarkiewicz
On Friday, 20 October 2006 21:53, Jay Vosburgh wrote:
 Dawid Ciezarkiewicz [EMAIL PROTECTED] wrote:
 
 On Thursday, 19 October 2006 21:04, Andy Gospodarek wrote:
  It would seem to me that extending an existing mode would be more
  desirable than adding yet another mode to worry about.  I don't even
  like the fact that there are as many as there are, but I understand why
  they are there.  
 
 Ack. I will probably update wrr bonding patch to replace rr mode.
 
   Also, if acceptance into the mainline is your ultimate goal, the
 ioctl control interface will be a very difficult sell.  You'd want to
 look into some other control mechanism, most likely an additional sysfs
 entry.

Oh. I'm quite puzzled here. What is current policy? I'd like sysfs interfaces 
better than ioctl - they are much cleaner etc. - but I thought ioctl will be 
better here because current bonding control uses ioctl and extending it is 
much simpler.

More generally: should I use sysfs always when adding anything to kernel and 
use it even when extending older functionality?

If wrr bonding have chances for inclusion into mainline I'll do any necessary 
changes to let it meet requirements. I'd like to know if functionality from 
Andy's patch is desired to be a part of such round-robin bonding extension. I 
think I could easily integrate it with wrr.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] wrr (weighted round-robin) bonding

2006-10-20 Thread Andy Gospodarek
On Fri, Oct 20, 2006 at 10:52:00PM +0200, Dawid Ciezarkiewicz wrote:
 
 Oh. I'm quite puzzled here. What is current policy? I'd like sysfs interfaces 
 better than ioctl - they are much cleaner etc. - but I thought ioctl will be 
 better here because current bonding control uses ioctl and extending it is 
 much simpler.

I can't tell you for sure if there is a 'policy' about this, but I would
encourage you to focus energy on sysfs rather than the ioctl interface.
In the future I would guess more will begin to use sysfs rather than
ifenslave.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] wrr (weighted round-robin) bonding

2006-10-19 Thread Andy Gospodarek
On Tue, Oct 17, 2006 at 10:16:21AM +0200, Dawid Ciezarkiewicz wrote:
 
 In fact - as default weight is being set to 1, without changing it wrr 
 bonding 
 mode works like plain round-robin one. But it have little more overhead 
 (recharging tokens), and code is a bit more complicated. I was not sure if 
 some tools could assume that in mode 0 all interfaces work with same weights 
 and because of that behave strange with this patch in use. 
 
 It was written as a solution for some problem, and I'm still not sure if such 
 change will always be patch to linux kernel or may some day go into mainline. 
 For compatibility I've decided to have those modes separated.
 
 Because of that I haven't replaced mode 0. If this patch will be considered 
 useful, and my concerns are not a problem - I'd like to replace 0 mode if 
 possible.
 -

It would seem to me that extending an existing mode would be more
desirable than adding yet another mode to worry about.  I don't even
like the fact that there are as many as there are, but I understand why
they are there.  

I recently extended rr mode to allow an additional parameter called that
rr_repeat that would allow someone to send more than a single frame out
of each device before moving to the next one.  It seemed this could be
helpful when dealing with switches that constantly re-learned source MAC
addresses.  Network performance would suffer whenever rr_repeat was 1,
but box performance might be better if there weren't so many locks
taken.

This patch is pretty bad (in-fact even the math could be done better to
avoid the expensive modulo), but since I did did it as a proof of
concept I wasn't too worried about it at the time.  The functionality
might be interesting to add to your weighted rr concept.  It's also
against an older kernel, but it should apply to an upstream one with
minimal, if any, porting.


--- linux/drivers/net/bonding/bond_main.c.orig  2006-10-11 10:41:07.611562000 
-0400
+++ linux/drivers/net/bonding/bond_main.c   2006-10-11 13:40:54.767425000 
-0400
@@ -543,6 +543,7 @@
 /* monitor all links that often (in milliseconds). =0 disables monitoring */
 #define BOND_LINK_MON_INTERV   0
 #define BOND_LINK_ARP_INTERV   0
+#define BOND_RR_REPEAT 1
 
 static int max_bonds   = BOND_DEFAULT_MAX_BONDS;
 static int miimon  = BOND_LINK_MON_INTERV;
@@ -555,6 +556,8 @@ static char *lacp_rate  = NULL;
 static char *xmit_hash_policy = NULL;
 static int arp_interval = BOND_LINK_ARP_INTERV;
 static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, };
+static int rr_repeat   = BOND_RR_REPEAT;
+static int rr_repeat_count = 0;
 
 MODULE_PARM(max_bonds, i);
 MODULE_PARM_DESC(max_bonds, Max number of bonded devices);
@@ -578,6 +581,8 @@ MODULE_PARM(arp_interval, i);
 MODULE_PARM_DESC(arp_interval, arp interval in milliseconds);
 MODULE_PARM(arp_ip_target, 1- __MODULE_STRING(BOND_MAX_ARP_TARGETS) s);
 MODULE_PARM_DESC(arp_ip_target, arp targets in n.n.n.n form);
+MODULE_PARM(rr_repeat, i);
+MODULE_PARM_DESC(rr_repeat, number of frames to send on round-robin bonds 
before switching interfaces);
 
 /*- Global variables */
 
@@ -4390,21 +4395,27 @@ static int bond_xmit_roundrobin(struct s
goto out;
}
 
-   bond_for_each_slave_from(bond, slave, i, start_at) {
-   if (IS_UP(slave-dev) 
-   (slave-link == BOND_LINK_UP) 
-   (slave-state == BOND_STATE_ACTIVE)) {
-   res = bond_dev_queue_xmit(bond, skb, slave-dev);
+   /* just xmit if we haven't hit the repeat val */
+   if (!(++rr_repeat_count % rr_repeat)) {
 
-   write_lock(bond-curr_slave_lock);
-   bond-curr_active_slave = slave-next;
-   write_unlock(bond-curr_slave_lock);
+   rr_repeat_count = 0;
+   bond_for_each_slave_from(bond, slave, i, start_at) {
+   if (IS_UP(slave-dev) 
+   (slave-link == BOND_LINK_UP) 
+   (slave-state == BOND_STATE_ACTIVE)) {
+   res = bond_dev_queue_xmit(bond, skb, 
slave-dev);
 
-   break;
+   write_lock(bond-curr_slave_lock);
+   bond-curr_active_slave = slave-next;
+   write_unlock(bond-curr_slave_lock);
+
+   break;
+   }
}
+   } else {
+   res = bond_dev_queue_xmit(bond, skb, slave-dev);
}
 
-
 out:
if (res) {
/* no suitable interface, frame not sent */
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] wrr (weighted round-robin) bonding

2006-10-17 Thread Dawid Ciezarkiewicz
On Monday, 16 October 2006 23:30, Andy Gospodarek wrote:
 On Mon, Oct 16, 2006 at 09:07:57PM +0200, Dawid Ciezarkiewicz wrote:
   
 Before getting into the technical bits of the patch, what's the
   reason for wanting to do this, and why is this rather complex manual
   weight assignment better than an automatic system based on, e.g., link
   speed of the slaves?
  
  In short:
  It was designed as a solution for wireless links bonding - where link 
quality 
  can change rather quickly in time. By using wrr bonding, userspace tools 
can 
  measure current bandwidth and change bonding slave weights in realtime.
 
 Since this is so similar to mode 0, it would seem there would be a way
 to extend it rather than creating yet another mode that is so similar.
 What would be the reason not to enhance that mode?

In fact - as default weight is being set to 1, without changing it wrr bonding 
mode works like plain round-robin one. But it have little more overhead 
(recharging tokens), and code is a bit more complicated. I was not sure if 
some tools could assume that in mode 0 all interfaces work with same weights 
and because of that behave strange with this patch in use. 

It was written as a solution for some problem, and I'm still not sure if such 
change will always be patch to linux kernel or may some day go into mainline. 
For compatibility I've decided to have those modes separated.

Because of that I haven't replaced mode 0. If this patch will be considered 
useful, and my concerns are not a problem - I'd like to replace 0 mode if 
possible.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] wrr (weighted round-robin) bonding

2006-10-16 Thread Dawid Ciezarkiewicz
This patch is little thinner then the previous one.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] wrr (weighted round-robin) bonding

2006-10-16 Thread Dawid Ciezarkiewicz
On Monday, 16 October 2006 20:21, Dawid Ciezarkiewicz wrote:
 This patch is little thinner then the previous one.

I'm sorry for that. I've just ... nevermind. Here goes the patch.

Should I post patch for ifenslave here, too?



diff -Nur linux-2.6.17.orig/Documentation/networking/bonding.txt 
linux-2.6.17/Documentation/networking/bonding.txt
--- linux-2.6.17.orig/Documentation/networking/bonding.txt  2006-06-18 
03:49:35.0 +0200
+++ linux-2.6.17/Documentation/networking/bonding.txt   2006-07-28 
15:47:55.0 +0200
@@ -398,6 +398,19 @@
swapped with the new curr_active_slave that was
chosen.
 
+   weighted-rr or 7
+
+   Weighted round-robin bonding. In this mode bonding
+   interface will use weights assigned to it's slaves.
+
+   Each slave can have weight assigned via ioctl (ifenslave).
+   These values will be used at the start of each cycle.
+   Each slave will have token counter restored to it's weight.
+   Then using round-robin mechanism those tokens are used
+   to pay for emitted frames. When all token counters are
+   zeroed - new cycle begins.
+   
+
 primary
 
A string (eth0, eth2, etc) specifying which slave is the
diff -Nur linux-2.6.17.orig/drivers/net/bonding/bond_main.c 
linux-2.6.17/drivers/net/bonding/bond_main.c
--- linux-2.6.17.orig/drivers/net/bonding/bond_main.c   2006-06-18 
03:49:35.0 +0200
+++ linux-2.6.17/drivers/net/bonding/bond_main.c2006-07-28 
15:31:44.0 +0200
@@ -115,7 +115,7 @@
 MODULE_PARM_DESC(mode, Mode of operation : 0 for balance-rr, 
   1 for active-backup, 2 for balance-xor, 
   3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, 
-  6 for balance-alb);
+  6 for balance-alb, 7 for weighted-rr);
 module_param(primary, charp, 0);
 MODULE_PARM_DESC(primary, Primary network device to use);
 module_param(lacp_rate, charp, 0);
@@ -162,6 +162,7 @@
 {  802.3ad,  BOND_MODE_8023AD},
 {  balance-tlb,  BOND_MODE_TLB},
 {  balance-alb,  BOND_MODE_ALB},
+{  weighted-rr,  BOND_MODE_WEIGHTED_RR},
 {  NULL,   -1},
 };
 
@@ -194,6 +195,8 @@
return transmit load balancing;
case BOND_MODE_ALB:
return adaptive load balancing;
+   case BOND_MODE_WEIGHTED_RR:
+   return weighted round robin (weighted-rr);
default:
return unknown;
}
@@ -1198,6 +1201,24 @@
return 0;
 }
 
+int bond_set_weight(struct net_device *bond_dev, struct net_device *slave_dev,
+   u16 weight)
+{
+   struct slave* slave;
+   slave = bond_get_slave_by_dev(bond_dev-priv, slave_dev);
+   if (!slave) {
+   return -EINVAL;
+   }
+
+   slave-weight = weight;
+
+   if (weight) {
+   slave-link = BOND_LINK_UP;
+   slave-state = BOND_STATE_ACTIVE;
+   }
+   return 0;
+}
+
 #define BOND_INTERSECT_FEATURES \
(NETIF_F_SG|NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM|\
NETIF_F_TSO|NETIF_F_UFO)
@@ -1336,6 +1352,9 @@
 */
new_slave-original_flags = slave_dev-flags;
 
+   /* slave default weight = 1 */
+   new_slave-weight = 1;
+
/*
 * Save slave's original (permanent) mac address for modes
 * that need it, and for restoring it upon release, and then
@@ -3601,7 +3620,10 @@
}
 
down_write((bonding_rwsem));
-   slave_dev = dev_get_by_name(ifr-ifr_slave);
+   if (cmd != SIOCBONDSETWEIGHT)
+   slave_dev = dev_get_by_name(ifr-ifr_slave);
+   else
+   slave_dev = dev_get_by_name(ifr-ifr_weight_slave);
 
dprintk(slave_dev=%p: \n, slave_dev);
 
@@ -3626,6 +3648,9 @@
case SIOCBONDCHANGEACTIVE:
res = bond_ioctl_change_active(bond_dev, slave_dev);
break;
+   case SIOCBONDSETWEIGHT:
+   res = bond_set_weight(bond_dev, slave_dev, 
ifr-ifr_weight_weight);
+   break;
default:
res = -EOPNOTSUPP;
}
@@ -3881,6 +3906,67 @@
return 0;
 }
 
+static int bond_xmit_weighted_rr(struct sk_buff *skb, struct net_device 
*bond_dev)
+{
+   struct bonding *bond = bond_dev-priv;
+   struct slave *slave, *start_at;
+   int i;
+   int res = 1;
+   int were_weight_tokens_recharged = 0;
+
+   read_lock(bond-lock);
+
+   if (!BOND_IS_OK(bond)) {
+   goto out;
+   }
+
+   read_lock(bond-curr_slave_lock);
+   slave = start_at = bond-curr_active_slave;
+   read_unlock(bond-curr_slave_lock);
+
+   if (!slave) {
+   goto out;
+   }
+
+try_send:
+   bond_for_each_slave_from(bond, slave, i, 

Re: [RFC] wrr (weighted round-robin) bonding

2006-10-16 Thread Jay Vosburgh

Dawid Ciezarkiewicz [EMAIL PROTECTED] wrote:
[...]
+  weighted-rr or 7
+
+  Weighted round-robin bonding. In this mode bonding
+  interface will use weights assigned to it's slaves.
+
+  Each slave can have weight assigned via ioctl (ifenslave).
+  These values will be used at the start of each cycle.
+  Each slave will have token counter restored to it's weight.
+  Then using round-robin mechanism those tokens are used
+  to pay for emitted frames. When all token counters are
+  zeroed - new cycle begins.

Before getting into the technical bits of the patch, what's the
reason for wanting to do this, and why is this rather complex manual
weight assignment better than an automatic system based on, e.g., link
speed of the slaves?

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] wrr (weighted round-robin) bonding

2006-10-16 Thread Dawid Ciezarkiewicz
On Monday, 16 October 2006 20:50, you wrote:
 
 Dawid Ciezarkiewicz [EMAIL PROTECTED] wrote:
 [...]
 +weighted-rr or 7
 +
 +Weighted round-robin bonding. In this mode bonding
 +interface will use weights assigned to it's slaves.
 +
 +Each slave can have weight assigned via ioctl (ifenslave).
 +These values will be used at the start of each cycle.
 +Each slave will have token counter restored to it's weight.
 +Then using round-robin mechanism those tokens are used
 +to pay for emitted frames. When all token counters are
 +zeroed - new cycle begins.
 
   Before getting into the technical bits of the patch, what's the
 reason for wanting to do this, and why is this rather complex manual
 weight assignment better than an automatic system based on, e.g., link
 speed of the slaves?

In short:
It was designed as a solution for wireless links bonding - where link quality 
can change rather quickly in time. By using wrr bonding, userspace tools can 
measure current bandwidth and change bonding slave weights in realtime.

It was written for Lintrack, and you can read about it's usage here:
http://lintrack.org/index.php/about/advantage
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] wrr (weighted round-robin) bonding

2006-10-16 Thread Andy Gospodarek
On Mon, Oct 16, 2006 at 09:07:57PM +0200, Dawid Ciezarkiewicz wrote:
  
  Before getting into the technical bits of the patch, what's the
  reason for wanting to do this, and why is this rather complex manual
  weight assignment better than an automatic system based on, e.g., link
  speed of the slaves?
 
 In short:
 It was designed as a solution for wireless links bonding - where link quality 
 can change rather quickly in time. By using wrr bonding, userspace tools can 
 measure current bandwidth and change bonding slave weights in realtime.

Since this is so similar to mode 0, it would seem there would be a way
to extend it rather than creating yet another mode that is so similar.
What would be the reason not to enhance that mode?

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html