Re: [PATCH 3/3] infiniband: rxe: fix 32-bit build warnings
thanks. will be applied to next series. On Mon, Jun 13, 2016 at 3:54 PM, Arnd Bergmannwrote: > The new rxe infinband driver passes around pointers that have been > converted to 64-bit integers. This is valid, but causes compile-time > warnings on all 32-bit architectures: > > infiniband/hw/rxe/rxe_dma.c: In function 'rxe_dma_map_single': > infiniband/hw/rxe/rxe_dma.c:49:9: error: cast from pointer to integer of > different size [-Werror=pointer-to-int-cast] > return (u64)cpu_addr; > ^ > infiniband/hw/rxe/rxe_dma.c: In function 'rxe_dma_map_page': > infiniband/hw/rxe/rxe_dma.c:73:9: error: cast from pointer to integer of > different size [-Werror=pointer-to-int-cast] > infiniband/hw/rxe/rxe_dma.c: In function 'rxe_map_sg': > infiniband/hw/rxe/rxe_dma.c:99:10: error: cast from pointer to integer of > different size [-Werror=pointer-to-int-cast] > infiniband/hw/rxe/rxe_dma.c: In function 'rxe_dma_alloc_coherent': > infiniband/hw/rxe/rxe_dma.c:143:17: error: cast from pointer to integer of > different size [-Werror=pointer-to-int-cast] > > This changes the cast to use 'uintptr_t', which can always be > cast to and from pointer, and can be assigned to and from 64-bit > integers. > > Signed-off-by: Arnd Bergmann > --- > drivers/infiniband/hw/rxe/rxe_dma.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/infiniband/hw/rxe/rxe_dma.c > b/drivers/infiniband/hw/rxe/rxe_dma.c > index f080bc5bda43..7634c1a81b2b 100644 > --- a/drivers/infiniband/hw/rxe/rxe_dma.c > +++ b/drivers/infiniband/hw/rxe/rxe_dma.c > @@ -46,7 +46,7 @@ static u64 rxe_dma_map_single(struct ib_device *dev, > enum dma_data_direction direction) > { > WARN_ON(!valid_dma_direction(direction)); > - return (u64)cpu_addr; > + return (uintptr_t)cpu_addr; > } > > static void rxe_dma_unmap_single(struct ib_device *dev, > @@ -70,7 +70,7 @@ static u64 rxe_dma_map_page(struct ib_device *dev, > goto done; > } > > - addr = (u64)page_address(page); > + addr = (uintptr_t)page_address(page); > if (addr) > addr += offset; > > @@ -96,7 +96,7 @@ static int rxe_map_sg(struct ib_device *dev, struct > scatterlist *sgl, > WARN_ON(!valid_dma_direction(direction)); > > for_each_sg(sgl, sg, nents, i) { > - addr = (u64)page_address(sg_page(sg)); > + addr = (uintptr_t)page_address(sg_page(sg)); > if (!addr) { > ret = 0; > break; > @@ -140,7 +140,7 @@ static void *rxe_dma_alloc_coherent(struct ib_device > *dev, size_t size, > addr = page_address(p); > > if (dma_handle) > - *dma_handle = (u64)addr; > + *dma_handle = (uintptr_t)addr; > > return addr; > } > -- > 2.7.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] infiniband: rxe: avoid 64-bit division
thanks. will be applied to next series On Mon, Jun 13, 2016 at 4:18 PM, Leon Romanovskywrote: > On Mon, Jun 13, 2016 at 02:54:53PM +0200, Arnd Bergmann wrote: >> The rxe driver fails to build on 32-bit because of a 64-bit division: >> >> In function `rxe_qp_from_attr': >> :(.text+0x53158): undefined reference to `__aeabi_uldivmod' >> >> We can easily avoid this division by converting the nanosecond value >> into jiffies directly rather than converting to microseconds first. >> >> Signed-off-by: Arnd Bergmann > > Thanks Arnd, > All three patches are applied on topic/rxe now.
Re: [PATCH 2/3] infiniband: rxe: add UDP_TUNNEL dependency
thanks. will be applied to next series On Mon, Jun 13, 2016 at 3:54 PM, Arnd Bergmannwrote: > The newly added rxe driver links against the UDP tunneling code, > which causes build errors when CONFIG_UDP_TUNNEL is disabled: > > ERROR: "setup_udp_tunnel_sock" [drivers/infiniband/hw/rxe/ib_rxe.ko] > undefined! > ERROR: "udp_tunnel_sock_release" [drivers/infiniband/hw/rxe/ib_rxe.ko] > undefined! > ERROR: "udp_sock_create4" [drivers/infiniband/hw/rxe/ib_rxe.ko] undefined! > > This adds a Kconfig dependency to prevent the invalid configuration. > > Signed-off-by: Arnd Bergmann > --- > drivers/infiniband/hw/rxe/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/infiniband/hw/rxe/Kconfig > b/drivers/infiniband/hw/rxe/Kconfig > index 649b7be11eb8..a199d0df31d0 100644 > --- a/drivers/infiniband/hw/rxe/Kconfig > +++ b/drivers/infiniband/hw/rxe/Kconfig > @@ -1,6 +1,7 @@ > config INFINIBAND_RXE > tristate "Software RDMA over Ethernet (RoCE) driver" > depends on INET && PCI && INFINIBAND > + depends on NET_UDP_TUNNEL > ---help--- > This driver implements the InfiniBand RDMA transport over > the Linux network stack. It enables a system with a > -- > 2.7.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Possible bug in bonding
Hi, As reported last week by Or Gerlitz and confirmed by me, there is a kernel crash when trying to unenslave all ib slaves (which leads to bonding master destruction). I also found that it happens with Ethernet slaves if following the steps: 1. unenslaving all slaves via sysfs 2. deleting the bonding master via sysfs I see that in commit 6603a6f25e4bca922a7dfbf0bf03072d98850176 the order of the bonding master destruction sequence was changed in 2 places (a part of the commitdiff is below). Although I might be missing something it looks like a bug to me to dereference the dev pointer after unregistering it. Am I right? I also see that the kernel crash doesn't happen when reverting from this (part of) the patch. I'd like to have your opinion please. thanks MoniS - commit 6603a6f25e4bca922a7dfbf0bf03072d98850176 Author: Jay Vosburgh [EMAIL PROTECTED] Date: Wed Oct 17 17:37:50 2007 -0700 bonding: Convert more locks to _bh, acquire rtnl, for new locking Convert more lock acquisitions to _bh flavor to avoid deadlock with workqueue activity and add acquisition of RTNL in appropriate places. Affects ALB mode, as well as core bonding functions and sysfs. Signed-off-by: Andy Gospodarek [EMAIL PROTECTED] Signed-off-by: Jay Vosburgh [EMAIL PROTECTED] Signed-off-by: Jeff Garzik [EMAIL PROTECTED] --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1846,9 +1846,9 @@ int bond_release(struct net_device *bond */ void bond_destroy(struct bonding *bond) { + unregister_netdevice(bond-dev); bond_deinit(bond-dev); bond_destroy_sysfs_entry(bond); - unregister_netdevice(bond-dev); } . . . @@ -4473,8 +4473,8 @@ static void bond_free_all(void) bond_mc_list_destroy(bond); /* Release the bonded slaves */ bond_release_all(bond_dev); - bond_deinit(bond_dev); unregister_netdevice(bond_dev); + bond_deinit(bond_dev); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] bonding: fix rtnl locking merge error
Jay Vosburgh wrote: Looks like I incorrectly merged one of the rtnl lock changes, so that one function, bonding_show_active_slave, held rtnl but didn't release it, and another, bonding_store_active_slave, never held rtnl but did release it. Fixed so the first function doesn't mess with rtnl, and the second correctly acquires and releases rtnl. Bug reported by Moni Shoua [EMAIL PROTECTED] I ran some shallow tests and it seems that the patch fixes the problem. thanks - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bonding / 2.6.24-rc1 issues
Or Gerlitz wrote: Jay, Moni I did some tests with 2.6.24-rc1 and the first patch to bonding that Jay sent last night to netdev. Basic operation and fail over work fine. However, I see some crashes which are somehow related to destroying the bond when the slaves are ipoib ones, I don't see similar crashes when enslaving ethernet devices (Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)), my compressed dot config is attached. The first type of oops is when I just do modprobe -r bonding after enslavement of the ipoib devices: the second type of oops is when I modprobe -r ib_ipoib after enslavement. I was not able to test this one with ethernet as the tg3 code is built into my kernel I couldn't reproduce the first oops. However, after applying Jay's fixes for 2.6.24-rc1 I managed to reproduce the second oops. I will try to look into it. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: problems with ib-bonding of 2.6.24-rc1
Same thing happens with Ethernet slave - Intel 82546GB Gigabit Ethernet Controller (rev 03) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: problems with ib-bonding of 2.6.24-rc1
Moni Shoua wrote: Basically, what I see is that after a while commands like ifconfig or ip stucks. I only use sysfs to configure bonding (which also stucks after a while). Maybe some extra information might help This is what happens in the kernel while the command 'ip' is stucked in user level. bond0 S 8057eb80 0 4799 2 8100060e7ee0 0046 0286 810004259820 81000160d760 810004259a28 00020019 80244f94 Call Trace: [80244f94] queue_delayed_work_on+0xaf/0xbe [880597c7] :bonding:bond_mii_monitor+0x0/0x9a [80244fe8] worker_thread+0x0/0xe4 [80245089] worker_thread+0xa1/0xe4 [8024833b] autoremove_wake_function+0x0/0x2e [80248213] kthread+0x47/0x74 [8020c458] child_rip+0xa/0x12 [802481cc] kthread+0x0/0x74 [8020c44e] child_rip+0x0/0x12 ipD 8057eb80 0 4842 4841 81000615bb88 0082 80289b5c 810006123140 8100016a77e0 810006123348 0003045c45a8 Call Trace: [80289b5c] __link_path_walk+0xb97/0xcec [80563a64] __mutex_lock_slowpath+0x68/0xa3 [805638f6] mutex_lock+0x1a/0x1e [804ecffb] rtnetlink_rcv+0x9/0x1e [804f30e6] netlink_unicast+0x1c3/0x236 [804f3a69] netlink_sendmsg+0x296/0x2a9 [80266235] zone_statistics+0x3f/0x60 [804d9806] sock_sendmsg+0xcb/0xe3 [8024833b] autoremove_wake_function+0x0/0x2e [80267960] __do_fault+0x35e/0x398 [804d8d09] move_addr_to_kernel+0x25/0x36 [804d9bd0] sys_sendto+0x128/0x151 [804da686] sys_getsockname+0x72/0xa2 [8020b63e] system_call+0x7e/0x83 I also attach my .config # # Automatically generated make config: don't edit # Linux kernel version: 2.6.24-rc1 # Tue Oct 30 09:49:25 2007 # CONFIG_X86_64=y CONFIG_64BIT=y CONFIG_X86=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_ZONE_DMA32=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_RWSEM_GENERIC_SPINLOCK=y CONFIG_GENERIC_HWEIGHT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_CMPXCHG=y CONFIG_EARLY_PRINTK=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_ARCH_POPULATES_NODE_MAP=y CONFIG_DMI=y CONFIG_AUDIT_ARCH=y CONFIG_GENERIC_BUG=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION= CONFIG_LOCALVERSION_AUTO=y CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # CONFIG_USER_NS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=18 # CONFIG_CGROUPS is not set CONFIG_FAIR_GROUP_SCHED=y CONFIG_FAIR_USER_SCHED=y # CONFIG_FAIR_CGROUP_SCHED is not set CONFIG_SYSFS_DEPRECATED=y # CONFIG_RELAY is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE= CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLAB=y # CONFIG_SLUB is not set # CONFIG_SLOB is not set CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set # CONFIG_KMOD is not set CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_BLK_DEV_BSG is not set CONFIG_BLOCK_COMPAT=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y # CONFIG_IOSCHED_AS is not set CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED=cfq # # Processor type and features # # CONFIG_TICK_ONESHOT is not set # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_X86_PC=y # CONFIG_X86_VSMP is not set # CONFIG_MK8 is not set # CONFIG_MPSC is not set CONFIG_MCORE2=y # CONFIG_GENERIC_CPU is not set CONFIG_X86_L1_CACHE_BYTES=64 CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_INTERNODE_CACHE_BYTES=64 CONFIG_X86_TSC=y CONFIG_X86_GOOD_APIC
Re: problems with ib-bonding of 2.6.24-rc1
Jay Vosburgh wrote: Moni Shoua [EMAIL PROTECTED] wrote: Basically, what I see is that after a while commands like ifconfig or ip stucks. I only use sysfs to configure bonding (which also stucks after a while). I've fooled with setting various things in bonding in the current linux-2.6 git kernel, and I'm not seeing the failure you describe. Can you provide some step by step instructions, including the type of system, bonding mode, options, number and type of slaves, etc, to induce the failure? -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED] Machine --- ARCH: x86_64 OS: Redhat EL Server 5 bonding --- mod. options: none mode: 1 (active-backup) miimon: 100 slaves: IP over InfiniBand Below is a scenario that ends up with what I described [EMAIL PROTECTED] root]# ifconfig -a eth0 Link encap:Ethernet HWaddr 00:04:23:B3:25:C4 inet addr:172.30.3.234 Bcast:172.30.255.255 Mask:255.255.0.0 inet6 addr: fe80::204:23ff:feb3:25c4/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:333219 errors:0 dropped:0 overruns:0 frame:0 TX packets:130880 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 RX bytes:248023834 (236.5 MiB) TX bytes:35230500 (33.5 MiB) Base address:0xdc00 Memory:fcea-fcec eth1 Link encap:Ethernet HWaddr 00:04:23:B3:25:C5 BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) Base address:0xdc80 Memory:fcee-fcf0 ib0 Link encap:InfiniBand HWaddr 00:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 BROADCAST MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) ib1 Link encap:InfiniBand HWaddr 00:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 BROADCAST MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) loLink encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:44 errors:0 dropped:0 overruns:0 frame:0 TX packets:44 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:7145 (6.9 KiB) TX bytes:7145 (6.9 KiB) sit0 Link encap:IPv6-in-IPv4 NOARP MTU:1480 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) [EMAIL PROTECTED] root]# modprobe bonding [EMAIL PROTECTED] root]# echo 1 /sys/class/net/bond0/bonding/mode [EMAIL PROTECTED] root]# echo 100 /sys/class/net/bond0/bonding/miimon [EMAIL PROTECTED] root]# echo +ib0 /sys/class/net/bond0/bonding/slaves [EMAIL PROTECTED] root]# echo +ib1 /sys/class/net/bond0/bonding/slaves [EMAIL PROTECTED] root]# ifconfig bond0 192.168.3.234 [EMAIL PROTECTED] root]# ip a s 1: lo: LOOPBACK,UP,LOWER_UP mtu 16436 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast qlen 100 link/ether 00:04:23:b3:25:c4 brd ff:ff:ff:ff:ff:ff inet 172.30.3.234/16 brd 172.30.255.255 scope global eth0 inet6 fe80::204:23ff:feb3:25c4/64 scope link valid_lft forever preferred_lft forever 3: eth1: BROADCAST,MULTICAST mtu 1500 qdisc noop qlen 1000 link/ether 00:04:23:b3:25:c5 brd ff:ff:ff:ff:ff:ff 4: sit0: NOARP mtu 1480 qdisc noop link/sit 0.0.0.0 brd 0.0.0.0 5: ib0: BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP mtu 1500 qdisc pfifo_fast master bond0 qlen 128 link/infiniband 00:00:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:e8:a9 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff 6: ib1: BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP mtu 1500 qdisc pfifo_fast master bond0 qlen 128 link/infiniband 00:00:04:05:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:e8:aa brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff 8: bond0: BROADCAST,MULTICAST,MASTER,UP,LOWER_UP mtu 1500 qdisc noqueue link/infiniband 00:00:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:e8:a9 brd 00:ff:ff:ff:ff:12:40
problems with ib-bonding of 2.6.24-rc1
Hi, I've been doing some tests for bonding of 2.6.24-rc1 and noticed some problems. My first goal was to see how bonding works with IPoIB slaves but I also tried it with Ethernet. Basically, what I see is that after a while commands like ifconfig or ip stucks. I only use sysfs to configure bonding (which also stucks after a while). After stripping the list of commits below from the code I see no problems. Does anybody else have the same problem? thanks MoniS commit d0e81b7e2246a41d068ecaf15aac9de570816d63 Author: Jay Vosburgh [EMAIL PROTECTED] Date: Wed Oct 17 17:37:51 2007 -0700 bonding: Acquire correct locks in alb for promisc change -- commit 6603a6f25e4bca922a7dfbf0bf03072d98850176 Author: Jay Vosburgh [EMAIL PROTECTED] Date: Wed Oct 17 17:37:50 2007 -0700 bonding: Convert more locks to _bh, acquire rtnl, for new locking -- commit 059fe7a578fba5bbb0fdc0365bfcf6218fa25eb0 Author: Jay Vosburgh [EMAIL PROTECTED] Date: Wed Oct 17 17:37:49 2007 -0700 bonding: Convert locks to _bh, rework alb locking for new locking -- commit 0b0eef66419e9abe6fd62bc958ab7cd0a18f858e Author: Jay Vosburgh [EMAIL PROTECTED] Date: Wed Oct 17 17:37:48 2007 -0700 bonding: Convert miimon to new locking -- commit cf5f9044934658dd3ffc628a60cd37c70f8168b1 Author: Jay Vosburgh [EMAIL PROTECTED] Date: Wed Oct 17 17:37:47 2007 -0700 bonding: Convert balance-rr transmit to new locking -- commit 1b76b31693d4a6088dec104ff6a6ead54081a3c2 Author: Jay Vosburgh [EMAIL PROTECTED] Date: Wed Oct 17 17:37:45 2007 -0700 Convert bonding timers to workqueues -- commit 3a4fa0a25da81600ea0bcd75692ae8ca6050d165 Author: Robert P. J. Day [EMAIL PROTECTED] Date: Fri Oct 19 23:10:43 2007 +0200 Fix misspellings of system, controller, interrupt and necessary. -- commit 1c3f0b8e07de78a86f2dce911f5e245845ce40a8 Author: Mathieu Desnoyers [EMAIL PROTECTED] Date: Thu Oct 18 23:41:04 2007 -0700 Change struct marker users - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH linux-2.6] bonding: two small fixes for IPoIB support
Jay Vosburgh wrote: Two small fixes to IPoIB support for bonding: 1- copy header_ops from slave to bonding for IPoIB slaves 2- move release and destroy logic to UNREGISTER from GOING_DOWN notifier to avoid double release Set bonding to version 3.2.1. Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Jay Vosburgh [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 11 +-- drivers/net/bonding/bonding.h |4 ++-- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index db80f24..6f85cc3 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1263,6 +1263,7 @@ static void bond_setup_by_slave(struct net_device *bond_dev, struct bonding *bond = bond_dev-priv; bond_dev-neigh_setup = slave_dev-neigh_setup; + bond_dev-header_ops= slave_dev-header_ops; bond_dev-type = slave_dev-type; bond_dev-hard_header_len = slave_dev-hard_header_len; @@ -3351,7 +3352,10 @@ static int bond_slave_netdev_event(unsigned long event, struct net_device *slave switch (event) { case NETDEV_UNREGISTER: if (bond_dev) { - bond_release(bond_dev, slave_dev); + if (bond-setup_by_slave) + bond_release_and_destroy(bond_dev, slave_dev); + else + bond_release(bond_dev, slave_dev); } break; case NETDEV_CHANGE: @@ -3366,11 +3370,6 @@ static int bond_slave_netdev_event(unsigned long event, struct net_device *slave * ... Or is it this? */ break; - case NETDEV_GOING_DOWN: - dprintk(slave %s is going down\n, slave_dev-name); - if (bond-setup_by_slave) - bond_release_and_destroy(bond_dev, slave_dev); - break; case NETDEV_CHANGEMTU: /* * TODO: Should slaves be allowed to diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index a8bbd56..b818060 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -22,8 +22,8 @@ #include bond_3ad.h #include bond_alb.h -#define DRV_VERSION 3.2.0 -#define DRV_RELDATE September 13, 2007 +#define DRV_VERSION 3.2.1 +#define DRV_RELDATE October 15, 2007 #define DRV_NAME bonding #define DRV_DESCRIPTION Ethernet Channel Bonding Driver Jay, Thanks for this work. Jeff, Thanks for applying. I noticed that this patch isn't applied. It includes important fixes. Can you please apply it also? MoniS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V7 0/8] net/bonding: ADD IPoIB support for the bonding driver
This is the 7th version of this patch series. See link to V6 below. Changes from the previous version - * Some patches required modifications to remove offsets so they can be applied with git-apply * Patch #3 was first modified by Jay and later by me to make it work with header_ops * patch #8 was changed to fix the problem that caused 'ifconfig down' to stuck (dev_close was called twice) Jay, I removed the Acked-by lines from patches 3 8. Can you please add them back after you approve? thanks MoniS Link to V6: http://lists.openfabrics.org/pipermail/general/2007-September/041139.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V7 1/8] IB/ipoib: Bound the net device to the ipoib_neigh structue
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb-dst-neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb-dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n-dev is an ipoib device and hence netdev_priv(n-dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n-dev-flags has the IFF_MASTER bit set. Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com Acked-by: Roland Dreier [EMAIL PROTECTED] --- drivers/infiniband/ulp/ipoib/ipoib.h |4 +++- drivers/infiniband/ulp/ipoib/ipoib_main.c | 24 +++- drivers/infiniband/ulp/ipoib/ipoib_multicast.c |3 ++- 3 files changed, 20 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 6545fa7..1b3327a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -349,6 +349,7 @@ #endif struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_headlist; }; @@ -365,7 +366,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index e072f3c..cae026c 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -517,7 +517,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb-dst-neighbour); + neigh = ipoib_neigh_alloc(skb-dst-neighbour, skb-dev); if (!neigh) { ++dev-stats.tx_dropped; dev_kfree_skb_any(skb); @@ -817,6 +817,13 @@ static void ipoib_neigh_cleanup(struct n unsigned long flags; struct ipoib_ah *ah = NULL; + neigh = *to_ipoib_neigh(n); + if (neigh) { + priv = netdev_priv(neigh-dev); + ipoib_dbg(priv, neigh_destructor for bonding device: %s\n, + n-dev-name); + } else + return; ipoib_dbg(priv, neigh_cleanup for %06x IPOIB_GID_FMT \n, IPOIB_QPN(n-ha), @@ -824,13 +831,10 @@ static void ipoib_neigh_cleanup(struct n spin_lock_irqsave(priv-lock, flags); - neigh = *to_ipoib_neigh(n); - if (neigh) { - if (neigh-ah) - ah = neigh-ah; - list_del(neigh-list); - ipoib_neigh_free(n-dev, neigh); - } + if (neigh-ah) + ah = neigh-ah; + list_del(neigh-list); + ipoib_neigh_free(n-dev, neigh); spin_unlock_irqrestore(priv-lock, flags); @@ -838,7 +842,8 @@ static void ipoib_neigh_cleanup(struct n ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) { struct ipoib_neigh *neigh; @@ -847,6 +852,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh-neighbour = neighbour; + neigh-dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(neigh-queue); ipoib_cm_set(neigh, NULL); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 827820e..9bcfc7a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -705,7 +705,8 @@ out: if (skb-dst skb-dst-neighbour !*to_ipoib_neigh(skb-dst-neighbour)) { - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb-dst-neighbour); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb-dst-neighbour
Re: [ofa-general] [PATCH V7 0/8] net/bonding: ADD IPoIB support for the bonding driver
Moni Shoua wrote: This is the 7th version of this patch series. See link to V6 below. I forgot to mention that the patches are relative to jgarzik/netdev-2.6.git#master. I couldn't compile the 2.6.24 or the upstream branches so I used master branch to test the fixes. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V7 2/8] IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces that slave with a new one. In between the times when the carrier of an IPoIB device goes down and ipoib_neigh is destroyed, it is possible that the bonding driver will send a packet on a new slave that uses an old ipoib_neigh. This patch detects and prevents this from happenning. Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com Acked-by: Roland Dreier [EMAIL PROTECTED] --- drivers/infiniband/ulp/ipoib/ipoib_main.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index cae026c..362610d 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -692,9 +692,10 @@ static int ipoib_start_xmit(struct sk_bu goto out; } } else if (neigh-ah) { - if (unlikely(memcmp(neigh-dgid.raw, + if (unlikely((memcmp(neigh-dgid.raw, skb-dst-neighbour-ha + 4, - sizeof(union ib_gid { + sizeof(union ib_gid))) || +(neigh-dev != dev))) { spin_lock(priv-lock); /* * It's safe to call ipoib_put_ah() inside - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V7 5/8] net/bonding: Enable IP multicast for bonding IPoIB devices
Allow to enslave devices when the bonding device is not up. Over the discussion held at the previous post this seemed to be the most clean way to go, where it is not expected to cause instabilities. Normally, the bonding driver is UP before any enslavement takes place. Once a netdevice is UP, the network stack acts to have it join some multicast groups (eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called where for multicast joins taking place after the enslavement another ip_xxx_mc_map() is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND) Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com Acked-by: Jay Vosburgh [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c |5 +++-- drivers/net/bonding/bond_sysfs.c |6 ++ 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 32dc75e..d7e43ba 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1281,8 +1281,9 @@ int bond_enslave(struct net_device *bond /* bond must be initialized by bond_open() before enslaving */ if (!(bond_dev-flags IFF_UP)) { - dprintk(Error, master_dev is not up\n); - return -EPERM; + printk(KERN_WARNING DRV_NAME +%s: master_dev is not up in bond_enslave\n, + bond_dev-name); } /* already enslaved */ diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c index 6f49ca7..ca4e429 100644 --- a/drivers/net/bonding/bond_sysfs.c +++ b/drivers/net/bonding/bond_sysfs.c @@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru /* Quick sanity check -- is the bond interface up? */ if (!(bond-dev-flags IFF_UP)) { - printk(KERN_ERR DRV_NAME - : %s: Unable to update slaves because interface is down.\n, + printk(KERN_WARNING DRV_NAME + : %s: doing slave updates when interface is down.\n, bond-dev-name); - ret = -EPERM; - goto out; } /* Note: We can't hold bond-lock here, as bond_create grabs it. */ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V7 6/8] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device
bonding sometimes uses Ethernet constants (such as MTU and address length) which are not good when it enslaves non Ethernet devices (such as InfiniBand). Signed-off-by: Moni Shoua monis at voltaire.com Acked-by: Jay Vosburgh [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c |3 ++- drivers/net/bonding/bond_sysfs.c | 10 -- drivers/net/bonding/bonding.h|1 + 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index d7e43ba..3f082dc 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1225,7 +1225,8 @@ static int bond_compute_features(struct struct slave *slave; struct net_device *bond_dev = bond-dev; unsigned long features = bond_dev-features; - unsigned short max_hard_header_len = ETH_HLEN; + unsigned short max_hard_header_len = max((u16)ETH_HLEN, + bond_dev-hard_header_len); int i; features = ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES); diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c index ca4e429..583c568 100644 --- a/drivers/net/bonding/bond_sysfs.c +++ b/drivers/net/bonding/bond_sysfs.c @@ -260,6 +260,7 @@ static ssize_t bonding_store_slaves(stru char command[IFNAMSIZ + 1] = { 0, }; char *ifname; int i, res, found, ret = count; + u32 original_mtu; struct slave *slave; struct net_device *dev = NULL; struct bonding *bond = to_bond(d); @@ -325,6 +326,7 @@ static ssize_t bonding_store_slaves(stru } /* Set the slave's MTU to match the bond */ + original_mtu = dev-mtu; if (dev-mtu != bond-dev-mtu) { if (dev-change_mtu) { res = dev-change_mtu(dev, @@ -339,6 +341,9 @@ static ssize_t bonding_store_slaves(stru } rtnl_lock(); res = bond_enslave(bond-dev, dev); + bond_for_each_slave(bond, slave, i) + if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) + slave-original_mtu = original_mtu; rtnl_unlock(); if (res) { ret = res; @@ -351,6 +356,7 @@ static ssize_t bonding_store_slaves(stru bond_for_each_slave(bond, slave, i) if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) { dev = slave-dev; + original_mtu = slave-original_mtu; break; } if (dev) { @@ -365,9 +371,9 @@ static ssize_t bonding_store_slaves(stru } /* set the slave MTU to the default */ if (dev-change_mtu) { - dev-change_mtu(dev, 1500); + dev-change_mtu(dev, original_mtu); } else { - dev-mtu = 1500; + dev-mtu = original_mtu; } } else { diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index 5011ba9..ad9c632 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -156,6 +156,7 @@ struct slave { s8 link;/* one of BOND_LINK_ */ s8 state; /* one of BOND_STATE_ */ u32original_flags; + u32original_mtu; u32link_failure_count; u16speed; u8 duplex; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
PATCH V6 7/8] net/bonding: Delay sending of gratuitous ARP to avoid failure
Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit in dev-state field is on. This improves the chances for the arp packet to be transmitted. Signed-off-by: Moni Shoua monis at voltaire.com Acked-by: Jay Vosburgh [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 24 +--- drivers/net/bonding/bonding.h |1 + 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 3f082dc..c017042 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1103,8 +1103,14 @@ void bond_change_active_slave(struct bon if (new_active !bond-do_set_mac_addr) memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, new_active-dev-addr_len); - - bond_send_gratuitous_arp(bond); + if (bond-curr_active_slave + test_bit(__LINK_STATE_LINKWATCH_PENDING, + bond-curr_active_slave-dev-state)) { + dprintk(delaying gratuitous arp on %s\n, + bond-curr_active_slave-dev-name); + bond-send_grat_arp = 1; + } else + bond_send_gratuitous_arp(bond); } } @@ -2074,6 +2080,17 @@ void bond_mii_monitor(struct net_device * program could monitor the link itself if needed. */ + if (bond-send_grat_arp) { + if (bond-curr_active_slave test_bit(__LINK_STATE_LINKWATCH_PENDING, + bond-curr_active_slave-dev-state)) + dprintk(Needs to send gratuitous arp but not yet\n); + else { + dprintk(sending delayed gratuitous arp on on %s\n, + bond-curr_active_slave-dev-name); + bond_send_gratuitous_arp(bond); + bond-send_grat_arp = 0; + } + } read_lock(bond-curr_slave_lock); oldcurrent = bond-curr_active_slave; read_unlock(bond-curr_slave_lock); @@ -2475,7 +2492,7 @@ static void bond_send_gratuitous_arp(str if (bond-master_ip) { bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip, - bond-master_ip, 0); + bond-master_ip, 0); } list_for_each_entry(vlan, bond-vlan_list, vlan_list) { @@ -4281,6 +4298,7 @@ static int bond_init(struct net_device * bond-current_arp_slave = NULL; bond-primary_slave = NULL; bond-dev = bond_dev; + bond-send_grat_arp = 0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index ad9c632..e0e06a8 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -187,6 +187,7 @@ struct bonding { struct timer_list arp_timer; s8 kill_timers; s8 do_set_mac_addr; + s8 send_grat_arp; struct net_device_stats stats; #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_entry; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V7 8/8] net/bonding: Destroy bonding master when last slave is gone
When bonding enslaves non Ethernet devices it takes pointers to functions in the module that owns the slaves. In this case it becomes unsafe to keep the bonding master registered after last slave was unenslaved because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero ensures that these functions be used anymore. Signed-off-by: Moni Shoua monis at voltaire.com --- drivers/net/bonding/bond_main.c | 37 - drivers/net/bonding/bond_sysfs.c |9 + drivers/net/bonding/bonding.h|3 +++ 3 files changed, 44 insertions(+), 5 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index c017042..23edf18 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1257,6 +1257,7 @@ static int bond_compute_features(struct static void bond_setup_by_slave(struct net_device *bond_dev, struct net_device *slave_dev) { + struct bonding *bond = bond_dev-priv; bond_dev-neigh_setup = slave_dev-neigh_setup; bond_dev-type = slave_dev-type; @@ -1266,6 +1267,7 @@ static void bond_setup_by_slave(struct n memcpy(bond_dev-broadcast, slave_dev-broadcast, slave_dev-addr_len); + bond-setup_by_slave = 1; } /* enslave device slave to bond device master */ @@ -1829,6 +1831,35 @@ int bond_release(struct net_device *bond } /* +* Destroy a bonding device. +* Must be under rtnl_lock when this function is called. +*/ +void bond_destroy(struct bonding *bond) +{ + bond_deinit(bond-dev); + bond_destroy_sysfs_entry(bond); + unregister_netdevice(bond-dev); +} + +/* +* First release a slave and than destroy the bond if no more slaves iare left. +* Must be under rtnl_lock when this function is called. +*/ +int bond_release_and_destroy(struct net_device *bond_dev, struct net_device *slave_dev) +{ + struct bonding *bond = bond_dev-priv; + int ret; + + ret = bond_release(bond_dev, slave_dev); + if ((ret == 0) (bond-slave_cnt == 0)) { + printk(KERN_INFO DRV_NAME : %s: destroying bond %s.\n, + bond_dev-name, bond_dev-name); + bond_destroy(bond); + } + return ret; +} + +/* * This function releases all slaves. */ static int bond_release_all(struct net_device *bond_dev) @@ -3311,7 +3342,10 @@ static int bond_slave_netdev_event(unsig switch (event) { case NETDEV_UNREGISTER: if (bond_dev) { - bond_release(bond_dev, slave_dev); + if (bond-setup_by_slave) + bond_release_and_destroy(bond_dev, slave_dev); + else + bond_release(bond_dev, slave_dev); } break; case NETDEV_CHANGE: @@ -4299,6 +4333,7 @@ static int bond_init(struct net_device * bond-primary_slave = NULL; bond-dev = bond_dev; bond-send_grat_arp = 0; + bond-setup_by_slave = 0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c index 583c568..b5d2a13 100644 --- a/drivers/net/bonding/bond_sysfs.c +++ b/drivers/net/bonding/bond_sysfs.c @@ -164,9 +164,7 @@ static ssize_t bonding_store_bonds(struc printk(KERN_INFO DRV_NAME : %s is being deleted...\n, bond-dev-name); - bond_deinit(bond-dev); - bond_destroy_sysfs_entry(bond); - unregister_netdevice(bond-dev); + bond_destroy(bond); rtnl_unlock(); goto out; } @@ -363,7 +361,10 @@ static ssize_t bonding_store_slaves(stru printk(KERN_INFO DRV_NAME : %s: Removing slave %s\n, bond-dev-name, dev-name); rtnl_lock(); - res = bond_release(bond-dev, dev); + if (bond-setup_by_slave) + res = bond_release_and_destroy(bond-dev, dev); + else + res = bond_release(bond-dev, dev); rtnl_unlock(); if (res) { ret = res; diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index e0e06a8..85e579b 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -188,6 +188,7 @@ struct bonding { s8 kill_timers; s8 do_set_mac_addr; s8 send_grat_arp; + s8 setup_by_slave
Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
Roland Dreier wrote: It happens only when ib interfaces are slaves of a bonding device. I thought before that the stuck is in napi_disable() but it's almost right. I put prints before and after call to napi_disable and see that it is called twice. I'll try to investigate in this direction. ib0: stopping interface ib0: before napi_disable ib0: after napi_disable ib0: downing ib_dev ib0: All sends and receives done. ib0: stopping interface ib0: before napi_disable Yes, two napi_disable()s in a row without a matching napi_enable() will deadlock. I guess the question is why the ipoib interface is being stopped twice. If you just take the net-2.6.24 tree (without bonding patches), does bonding for ethernet interfaces work OK, or is there a similar problem with double napi_disable()? How about bonding of ethernet after this batch of bonding patches? - R. Ok, I think I know what happens here. When bonding gets an NETDEV_GOING_DONW event it releases the slave and by the way closes the slave device (this is a new code). ifconfig on the other hand closes the deivice one more time and this is why we see 2 napi_disable() in a row. The fix in my opinion is in bonding - it should react to NETDEV_UNREGISTER and not to NETDEV_GOING_DONW. I want to test this point and if it's good I'll submit new patches. thanks MoniS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
I will be near my lab only tomorrow... I will check this and let you know. On 10/11/07, Roland Dreier [EMAIL PROTECTED] wrote: It happens only when ib interfaces are slaves of a bonding device. I thought before that the stuck is in napi_disable() but it's almost right. I put prints before and after call to napi_disable and see that it is called twice. I'll try to investigate in this direction. ib0: stopping interface ib0: before napi_disable ib0: after napi_disable ib0: downing ib_dev ib0: All sends and receives done. ib0: stopping interface ib0: before napi_disable Yes, two napi_disable()s in a row without a matching napi_enable() will deadlock. I guess the question is why the ipoib interface is being stopped twice. If you just take the net-2.6.24 tree (without bonding patches), does bonding for ethernet interfaces work OK, or is there a similar problem with double napi_disable()? How about bonding of ethernet after this batch of bonding patches? - R. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
Roland Dreier wrote: I also ran a test for the code in the branch of 2.6.24 and found a problem. I see that ifconfig down doesn't return (for IPoIB interfaces) and it's stuck in napi_disable() in the kernel (any idea why?) For what it's worth, I took the upstream 2.6.23 git tree and merged in Dave's latest net-2.6.24 tree and my latest for-2.6.24 tree and tried that. I brought up an IPoIB interface, sent a few pings, and did ifconfig down, and it worked fine. Can you try the same thing without the bonding patches to see if your setup works OK too? Also can you give more details about what you do to get ifconfig down stuck? - R. Without bonding ifconfig down works fine. It happens only when ib interfaces are slaves of a bonding device. I thought before that the stuck is in napi_disable() but it's almost right. I put prints before and after call to napi_disable and see that it is called twice. I'll try to investigate in this direction. ib0: stopping interface ib0: before napi_disable ib0: after napi_disable ib0: downing ib_dev ib0: All sends and receives done. ib0: stopping interface ib0: before napi_disable There is also a dump of the kernel log after 'echo t /proc/sysrq-trigger' (for ifconfig) SysRq : Show State ifconfig S 0 6311 6099 810034f49d18 0086 810037e747c0 810037e747c0 00013481e000 81003a851a78 81003a851840 3b0c8c00 802358ee Call Trace: [8023cc89] lock_timer_base+0x24/0x49 [80403754] schedule_timeout+0x8a/0xad [8023d241] process_timeout+0x0/0x5 [8023d6ec] msleep_interruptible+0x11/0x39 [884081a7] :ib_ipoib:ipoib_stop+0x64/0x12c [8039fc07] dev_close+0x3e/0x56 [803a1c31] dev_change_flags+0xa7/0x15f [803e5bee] devinet_ioctl+0x293/0x5ed [803e775b] inet_ioctl+0x7f/0x9d [80395b2e] sock_ioctl+0x0/0x1fe [80395d08] sock_ioctl+0x1da/0x1fe [802947d9] do_ioctl+0x29/0x6f [80294a75] vfs_ioctl+0x256/0x267 [80294adf] sys_ioctl+0x59/0x7a [8020bc0e] system_call+0x7e/0x83 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
Jay Vosburgh wrote: David Miller [EMAIL PROTECTED] wrote: From: Jeff Garzik [EMAIL PROTECTED] Date: Tue, 09 Oct 2007 20:56:35 -0400 Jeff Garzik wrote: applied patches 1-9 the only thing that was a hiccup during submission is that your email subject lines did not contain a notion of ordering [PATCH 1/9] But other than that, the git-send-email went flawlessly. unfortunately it does not seem to build flawlessly: Yeah it doesn't handle Stephen Hemmingers headerops change in net-2.6.24 Gaah. I'll sort it out and repost. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED] Hi Jay, Jeff Thanks for the help with making the patch work compile under 2.6.24. However, patch #3 has a missing line in bond_setup_by_slave that should look like this bond_dev-header_ops= slave_dev-header_ops; I rewrote the patch and also fixed patch #8 that became broken. I would send the new patches now but there is more I also ran a test for the code in the branch of 2.6.24 and found a problem. I see that ifconfig down doesn't return (for IPoIB interfaces) and it's stuck in napi_disable() in the kernel (any idea why?) I am trying to solve it now so I'd like to wait a short time before applying these patches. I guess that I'll need to add something. thanks MoniS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH V6 0/9] net/bonding: ADD IPoIB support for the bonding driver
Jay Vosburgh wrote: Jeff Garzik [EMAIL PROTECTED] wrote: Moni Shoua wrote: Jay Vosburgh wrote: ACK patches 3 - 9. Roland, are you comfortable with the IB changes in patches 1 and 2? Jeff, when Roland acks patches 1 and 2, please apply all 9. -J Hi Jeff, Roland acked the IPoIB patches. If you haven't done so already can you please apply them? I'm not sure when 2.6.24 is going to open and I'm afraid to miss it. hrm, I don't see them in my inbox for some reason. can someone bounce them to me? or give me a git tree to pull from? Moni, can you repost the patch series to Jeff, and put the appropriate Acked-by lines in for myself (patches 3 - 8) and Roland (patches 1 and 2)? You can probably leave off the netdev and openfabrics lists, but cc me. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED] Hi Jeff, I don't commits of the patches in http://git.kernel.org/?p=linux/kernel/git/jgarzik/netdev-2.6.git;a=summary (I hope that I'm looking in the right place). Did you get them? thanks MoniS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH V6 0/9] net/bonding: ADD IPoIB support for the bonding driver
Jay Vosburgh wrote: ACK patches 3 - 9. Roland, are you comfortable with the IB changes in patches 1 and 2? Jeff, when Roland acks patches 1 and 2, please apply all 9. -J Hi Jeff, Roland acked the IPoIB patches. If you haven't done so already can you please apply them? I'm not sure when 2.6.24 is going to open and I'm afraid to miss it. thanks - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] [PATCH V6 0/9] net/bonding: ADD IPoIB support for the bonding driver
Jay, I think that all comments to the patches were discussed and handled. If you agree, can you please push then to the networking tree so they will be merged into 2.6.24? This includes the IPoIB patches (agreed with Roland). Note that there are *no* patches to net/core (like in V5). thanks MoniS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH V5 2/11] IB/ipoib: Notify the world before doing unregister
Roland Dreier wrote: The action in bonding to a detach of slave is to unregister the master (see patch 10). This can't be done from the context of unregister_netdevice itself (it is protected by rtnl_lock). I'm confused. Your patch has: + ipoib_slave_detach(cpriv-dev); unregister_netdev(cpriv-dev); And ipoib_slave_detach() is: +static inline void ipoib_slave_detach(struct net_device *dev) +{ + rtnl_lock(); + netdev_slave_detach(dev); + rtnl_unlock(); +} so you are calling netdev_slave_detach() with the rtnl lock held. Why can't you make the same call from the start of unregister_netdevice()? Anyway, if the rtnl lock is a problem, can you just add the call to netdev_slave_detach() to unregister_netdev() before it takes the rtnl lock? - R. Your comment made me do a little rethinking. In bonding, device is released by calling unregister_netdevice() that doesn't take the rtnl_lock (unlike unregister_netdev() that does). I guess that this made me confused to think that this is not possible. So, I guess I could put the detach notification in unregister_netedev() and the reaction to the notification in the bonding driver would not block. However, I looked one more time at the code of unregister_netdevice() and found out that nothing prevents from calling unregister_netdevice() again when the notification NETDEV_GOING_DOWN is sent. I tried that and it works. I have a new set of patches without sending a slave detach and I will send it soon. Thanks for the comment Roland. It makes this patch simpler. I'd also like to give a credit to Jay for the idea of using NETDEV_GOING_DOWN notification instead of NETDEV_CHANGE+IFF_SLAVE_DETACH. He suggested it a while ago but I wrongly thought that it wouldn't work. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V6 0/9] net/bonding: ADD IPoIB support for the bonding driver
This patch series is the sixth version (see below link to V5) of the suggested changes to the bonding driver so it would be able to support non ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode. Patches 1-8 were originally submitted in V5 and patch 9 is an addition by Jay. Major changes from the previous version: 1. Remove the patches to net/core. Bonding will use the NETDEV_GOING_DOWN notification instead of NETDEV_CHANGE+IFF_SLAVE_DETACH. This reduces the amount of patches from 11 to 9. Links to earlier discussion: 1. A discussion in netdev about bonding support for IPoIB. http://lists.openwall.net/netdev/2006/11/30/46 2. V5 series http://lists.openfabrics.org/pipermail/general/2007-September/040996.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V6 1/9] IB/ipoib: Bound the net device to the ipoib_neigh structue
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb-dst-neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb-dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n-dev is an ipoib device and hence netdev_priv(n-dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n-dev-flags has the IFF_MASTER bit set. Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/infiniband/ulp/ipoib/ipoib.h |4 +++- drivers/infiniband/ulp/ipoib/ipoib_main.c | 24 +++- drivers/infiniband/ulp/ipoib/ipoib_multicast.c |3 ++- 3 files changed, 20 insertions(+), 11 deletions(-) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-09-18 17:08:53.245849217 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h2007-09-18 17:09:26.534874404 +0200 @@ -328,6 +328,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_headlist; }; @@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-09-18 17:08:53.245849217 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-09-18 17:23:54.725744661 +0200 @@ -511,7 +511,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb-dst-neighbour); + neigh = ipoib_neigh_alloc(skb-dst-neighbour, skb-dev); if (!neigh) { ++priv-stats.tx_dropped; dev_kfree_skb_any(skb); @@ -830,6 +830,13 @@ static void ipoib_neigh_cleanup(struct n unsigned long flags; struct ipoib_ah *ah = NULL; + neigh = *to_ipoib_neigh(n); + if (neigh) { + priv = netdev_priv(neigh-dev); + ipoib_dbg(priv, neigh_destructor for bonding device: %s\n, + n-dev-name); + } else + return; ipoib_dbg(priv, neigh_cleanup for %06x IPOIB_GID_FMT \n, IPOIB_QPN(n-ha), @@ -837,13 +844,10 @@ static void ipoib_neigh_cleanup(struct n spin_lock_irqsave(priv-lock, flags); - neigh = *to_ipoib_neigh(n); - if (neigh) { - if (neigh-ah) - ah = neigh-ah; - list_del(neigh-list); - ipoib_neigh_free(n-dev, neigh); - } + if (neigh-ah) + ah = neigh-ah; + list_del(neigh-list); + ipoib_neigh_free(n-dev, neigh); spin_unlock_irqrestore(priv-lock, flags); @@ -851,7 +855,8 @@ static void ipoib_neigh_cleanup(struct n ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) { struct ipoib_neigh *neigh; @@ -860,6 +865,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh-neighbour = neighbour; + neigh-dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(neigh-queue); ipoib_cm_set(neigh, NULL); Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-09-18 17:08:53.245849217 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-09-18 17:09:26.536874045 +0200 @@ -727,7 +727,8 @@ out: if (skb-dst skb-dst-neighbour !*to_ipoib_neigh(skb-dst-neighbour
[PATCH V6 2/9] IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces that slave with a new one. In between the times when the carrier of an IPoIB device goes down and ipoib_neigh is destroyed, it is possible that the bonding driver will send a packet on a new slave that uses an old ipoib_neigh. This patch detects and prevents this from happenning. Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/infiniband/ulp/ipoib/ipoib_main.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-09-18 17:09:26.535874225 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-09-18 17:10:22.375853147 +0200 @@ -686,9 +686,10 @@ static int ipoib_start_xmit(struct sk_bu goto out; } } else if (neigh-ah) { - if (unlikely(memcmp(neigh-dgid.raw, + if (unlikely((memcmp(neigh-dgid.raw, skb-dst-neighbour-ha + 4, - sizeof(union ib_gid { + sizeof(union ib_gid))) || +(neigh-dev != dev))) { spin_lock(priv-lock); /* * It's safe to call ipoib_put_ah() inside - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V6 3/9] net/bonding: Enable bonding to enslave non ARPHRD_ETHER
This patch changes some of the bond netdevice attributes and functions to be that of the active slave for the case of the enslaved device not being of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(), which are netdevice **type** dependent and hence might be not appropriate for devices of other types. It also enforces mutual exclusion on bonding slaves from dissimilar ether types, as was concluded over the v1 discussion. IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this IPoIB device is bounded to. The QP is a resource created by the IB HW and the GID is an identifier burned into the HCA (i have omitted here some details which are not important for the bonding RFC). Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/net/bonding/bond_main.c | 39 +++ 1 files changed, 39 insertions(+) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:08:59.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:13.424688411 +0300 @@ -1237,6 +1237,26 @@ static int bond_compute_features(struct return 0; } + +static void bond_setup_by_slave(struct net_device *bond_dev, + struct net_device *slave_dev) +{ + bond_dev-hard_header = slave_dev-hard_header; + bond_dev-rebuild_header= slave_dev-rebuild_header; + bond_dev-hard_header_cache = slave_dev-hard_header_cache; + bond_dev-header_cache_update = slave_dev-header_cache_update; + bond_dev-hard_header_parse = slave_dev-hard_header_parse; + + bond_dev-neigh_setup = slave_dev-neigh_setup; + + bond_dev-type = slave_dev-type; + bond_dev-hard_header_len = slave_dev-hard_header_len; + bond_dev-addr_len = slave_dev-addr_len; + + memcpy(bond_dev-broadcast, slave_dev-broadcast, + slave_dev-addr_len); +} + /* enslave device slave to bond device master */ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) { @@ -1311,6 +1331,25 @@ int bond_enslave(struct net_device *bond goto err_undo_flags; } + /* set bonding device ether type by slave - bonding netdevices are +* created with ether_setup, so when the slave type is not ARPHRD_ETHER +* there is a need to override some of the type dependent attribs/funcs. +* +* bond ether type mutual exclusion - don't allow slaves of dissimilar +* ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond +*/ + if (bond-slave_cnt == 0) { + if (slave_dev-type != ARPHRD_ETHER) + bond_setup_by_slave(bond_dev, slave_dev); + } else if (bond_dev-type != slave_dev-type) { + printk(KERN_ERR DRV_NAME : %s ether type (%d) is different + from other slaves (%d), can not enslave it.\n, + slave_dev-name, + slave_dev-type, bond_dev-type); + res = -EINVAL; + goto err_undo_flags; + } + if (slave_dev-set_mac_address == NULL) { printk(KERN_ERR DRV_NAME : %s: Error: The slave device you specified does - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V6 4/9] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()
This patch allows for enslaving netdevices which do not support the set_mac_address() function. In that case the bond mac address is the one of the active slave, where remote peers are notified on the mac address (neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs (this is already done by the bonding code). Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/net/bonding/bond_main.c | 87 +++- drivers/net/bonding/bonding.h |1 2 files changed, 60 insertions(+), 28 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:54:13.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:41.971632881 +0300 @@ -1095,6 +1095,14 @@ void bond_change_active_slave(struct bon if (new_active) { bond_set_slave_active_flags(new_active); } + + /* when bonding does not set the slave MAC address, the bond MAC +* address is the one of the active slave. +*/ + if (new_active !bond-do_set_mac_addr) + memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, + new_active-dev-addr_len); + bond_send_gratuitous_arp(bond); } } @@ -1351,13 +1359,22 @@ int bond_enslave(struct net_device *bond } if (slave_dev-set_mac_address == NULL) { - printk(KERN_ERR DRV_NAME - : %s: Error: The slave device you specified does - not support setting the MAC address. - Your kernel likely does not support slave - devices.\n, bond_dev-name); - res = -EOPNOTSUPP; - goto err_undo_flags; + if (bond-slave_cnt == 0) { + printk(KERN_WARNING DRV_NAME + : %s: Warning: The first slave device you + specified does not support setting the MAC + address. This bond MAC address would be that + of the active slave.\n, bond_dev-name); + bond-do_set_mac_addr = 0; + } else if (bond-do_set_mac_addr) { + printk(KERN_ERR DRV_NAME + : %s: Error: The slave device you specified + does not support setting the MAC addres,. + but this bond uses this practice. \n + , bond_dev-name); + res = -EOPNOTSUPP; + goto err_undo_flags; + } } new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL); @@ -1378,16 +1395,18 @@ int bond_enslave(struct net_device *bond */ memcpy(new_slave-perm_hwaddr, slave_dev-dev_addr, ETH_ALEN); - /* -* Set slave to master's mac address. The application already -* set the master's mac address to that of the first slave -*/ - memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); - addr.sa_family = slave_dev-type; - res = dev_set_mac_address(slave_dev, addr); - if (res) { - dprintk(Error %d calling set_mac_address\n, res); - goto err_free; + if (bond-do_set_mac_addr) { + /* +* Set slave to master's mac address. The application already +* set the master's mac address to that of the first slave +*/ + memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); + addr.sa_family = slave_dev-type; + res = dev_set_mac_address(slave_dev, addr); + if (res) { + dprintk(Error %d calling set_mac_address\n, res); + goto err_free; + } } res = netdev_set_master(slave_dev, bond_dev); @@ -1612,9 +1631,11 @@ err_close: dev_close(slave_dev); err_restore_mac: - memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); - addr.sa_family = slave_dev-type; - dev_set_mac_address(slave_dev, addr); + if (bond-do_set_mac_addr) { + memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); + addr.sa_family = slave_dev-type; + dev_set_mac_address(slave_dev, addr); + } err_free: kfree(new_slave); @@ -1792,10 +1813,12 @@ int bond_release(struct net_device *bond /* close slave before restoring its mac address */ dev_close(slave_dev); - /* restore original (permanent) mac address */ - memcpy(addr.sa_data, slave-perm_hwaddr, ETH_ALEN
[PATCH V6 5/9] net/bonding: Enable IP multicast for bonding IPoIB devices
Allow to enslave devices when the bonding device is not up. Over the discussion held at the previous post this seemed to be the most clean way to go, where it is not expected to cause instabilities. Normally, the bonding driver is UP before any enslavement takes place. Once a netdevice is UP, the network stack acts to have it join some multicast groups (eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called where for multicast joins taking place after the enslavement another ip_xxx_mc_map() is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND) Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/net/bonding/bond_main.c |5 +++-- drivers/net/bonding/bond_sysfs.c |6 ++ 2 files changed, 5 insertions(+), 6 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:54:41.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:55:48.431862446 +0300 @@ -1285,8 +1285,9 @@ int bond_enslave(struct net_device *bond /* bond must be initialized by bond_open() before enslaving */ if (!(bond_dev-flags IFF_UP)) { - dprintk(Error, master_dev is not up\n); - return -EPERM; + printk(KERN_WARNING DRV_NAME +%s: master_dev is not up in bond_enslave\n, + bond_dev-name); } /* already enslaved */ Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-08-15 10:08:58.0 +0300 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-08-15 10:55:48.432862269 +0300 @@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru /* Quick sanity check -- is the bond interface up? */ if (!(bond-dev-flags IFF_UP)) { - printk(KERN_ERR DRV_NAME - : %s: Unable to update slaves because interface is down.\n, + printk(KERN_WARNING DRV_NAME + : %s: doing slave updates when interface is down.\n, bond-dev-name); - ret = -EPERM; - goto out; } /* Note: We can't hold bond-lock here, as bond_create grabs it. */ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V6 6/9] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device
bonding sometimes uses Ethernet constants (such as MTU and address length) which are not good when it enslaves non Ethernet devices (such as InfiniBand). Signed-off-by: Moni Shoua monis at voltaire.com --- drivers/net/bonding/bond_main.c |3 ++- drivers/net/bonding/bond_sysfs.c | 10 -- drivers/net/bonding/bonding.h|1 + 3 files changed, 11 insertions(+), 3 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-09-24 12:52:33.0 +0200 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-09-24 12:57:33.411459811 +0200 @@ -1224,7 +1224,8 @@ static int bond_compute_features(struct struct slave *slave; struct net_device *bond_dev = bond-dev; unsigned long features = bond_dev-features; - unsigned short max_hard_header_len = ETH_HLEN; + unsigned short max_hard_header_len = max((u16)ETH_HLEN, + bond_dev-hard_header_len); int i; features = ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES); Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-09-24 12:55:09.0 +0200 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-09-24 13:00:23.752680721 +0200 @@ -260,6 +260,7 @@ static ssize_t bonding_store_slaves(stru char command[IFNAMSIZ + 1] = { 0, }; char *ifname; int i, res, found, ret = count; + u32 original_mtu; struct slave *slave; struct net_device *dev = NULL; struct bonding *bond = to_bond(d); @@ -325,6 +326,7 @@ static ssize_t bonding_store_slaves(stru } /* Set the slave's MTU to match the bond */ + original_mtu = dev-mtu; if (dev-mtu != bond-dev-mtu) { if (dev-change_mtu) { res = dev-change_mtu(dev, @@ -339,6 +341,9 @@ static ssize_t bonding_store_slaves(stru } rtnl_lock(); res = bond_enslave(bond-dev, dev); + bond_for_each_slave(bond, slave, i) + if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) + slave-original_mtu = original_mtu; rtnl_unlock(); if (res) { ret = res; @@ -351,6 +356,7 @@ static ssize_t bonding_store_slaves(stru bond_for_each_slave(bond, slave, i) if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) { dev = slave-dev; + original_mtu = slave-original_mtu; break; } if (dev) { @@ -365,9 +371,9 @@ static ssize_t bonding_store_slaves(stru } /* set the slave MTU to the default */ if (dev-change_mtu) { - dev-change_mtu(dev, 1500); + dev-change_mtu(dev, original_mtu); } else { - dev-mtu = 1500; + dev-mtu = original_mtu; } } else { Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-09-24 12:55:09.0 +0200 +++ net-2.6/drivers/net/bonding/bonding.h 2007-09-24 12:57:33.412459636 +0200 @@ -156,6 +156,7 @@ struct slave { s8 link;/* one of BOND_LINK_ */ s8 state; /* one of BOND_STATE_ */ u32original_flags; + u32original_mtu; u32link_failure_count; u16speed; u8 duplex; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
PATCH V6 7/9] net/bonding: Delay sending of gratuitous ARP to avoid failure
Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit in dev-state field is on. This improves the chances for the arp packet to be transmitted. Signed-off-by: Moni Shoua monis at voltaire.com --- drivers/net/bonding/bond_main.c | 24 +--- drivers/net/bonding/bonding.h |1 + 2 files changed, 22 insertions(+), 3 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:56:33.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 11:04:37.221123652 +0300 @@ -1102,8 +1102,14 @@ void bond_change_active_slave(struct bon if (new_active !bond-do_set_mac_addr) memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, new_active-dev-addr_len); - - bond_send_gratuitous_arp(bond); + if (bond-curr_active_slave + test_bit(__LINK_STATE_LINKWATCH_PENDING, + bond-curr_active_slave-dev-state)) { + dprintk(delaying gratuitous arp on %s\n, + bond-curr_active_slave-dev-name); + bond-send_grat_arp = 1; + } else + bond_send_gratuitous_arp(bond); } } @@ -2083,6 +2089,17 @@ void bond_mii_monitor(struct net_device * program could monitor the link itself if needed. */ + if (bond-send_grat_arp) { + if (bond-curr_active_slave test_bit(__LINK_STATE_LINKWATCH_PENDING, + bond-curr_active_slave-dev-state)) + dprintk(Needs to send gratuitous arp but not yet\n); + else { + dprintk(sending delayed gratuitous arp on on %s\n, + bond-curr_active_slave-dev-name); + bond_send_gratuitous_arp(bond); + bond-send_grat_arp = 0; + } + } read_lock(bond-curr_slave_lock); oldcurrent = bond-curr_active_slave; read_unlock(bond-curr_slave_lock); @@ -2484,7 +2501,7 @@ static void bond_send_gratuitous_arp(str if (bond-master_ip) { bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip, - bond-master_ip, 0); + bond-master_ip, 0); } list_for_each_entry(vlan, bond-vlan_list, vlan_list) { @@ -4293,6 +4310,7 @@ static int bond_init(struct net_device * bond-current_arp_slave = NULL; bond-primary_slave = NULL; bond-dev = bond_dev; + bond-send_grat_arp = 0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-08-15 10:56:33.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-08-15 11:05:41.516451497 +0300 @@ -187,6 +187,7 @@ struct bonding { struct timer_list arp_timer; s8 kill_timers; s8 do_set_mac_addr; + s8 send_grat_arp; struct net_device_stats stats; #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_entry; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V6 8/9] net/bonding: Destroy bonding master when last slave is gone
When bonding enslaves non Ethernet devices it takes pointers to functions in the module that owns the slaves. In this case it becomes unsafe to keep the bonding master registered after last slave was unenslaved because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero ensures that these functions be used anymore. Signed-off-by: Moni Shoua monis at voltaire.com --- drivers/net/bonding/bond_main.c | 37 + drivers/net/bonding/bond_sysfs.c |9 + drivers/net/bonding/bonding.h|3 +++ 3 files changed, 45 insertions(+), 4 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-09-24 14:01:24.055441842 +0200 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-09-24 14:05:05.658979207 +0200 @@ -1256,6 +1256,7 @@ static int bond_compute_features(struct static void bond_setup_by_slave(struct net_device *bond_dev, struct net_device *slave_dev) { + struct bonding *bond = bond_dev-priv; bond_dev-hard_header = slave_dev-hard_header; bond_dev-rebuild_header= slave_dev-rebuild_header; bond_dev-hard_header_cache = slave_dev-hard_header_cache; @@ -1270,6 +1271,7 @@ static void bond_setup_by_slave(struct n memcpy(bond_dev-broadcast, slave_dev-broadcast, slave_dev-addr_len); + bond-setup_by_slave = 1; } /* enslave device slave to bond device master */ @@ -1838,6 +1840,35 @@ int bond_release(struct net_device *bond } /* +* Destroy a bonding device. +* Must be under rtnl_lock when this function is called. +*/ +void bond_destroy(struct bonding *bond) +{ + bond_deinit(bond-dev); + bond_destroy_sysfs_entry(bond); + unregister_netdevice(bond-dev); +} + +/* +* First release a slave and than destroy the bond if no more slaves iare left. +* Must be under rtnl_lock when this function is called. +*/ +int bond_release_and_destroy(struct net_device *bond_dev, struct net_device *slave_dev) +{ + struct bonding *bond = bond_dev-priv; + int ret; + + ret = bond_release(bond_dev, slave_dev); + if ((ret == 0) (bond-slave_cnt == 0)) { + printk(KERN_INFO DRV_NAME %s: destroying bond %s.\n, + bond_dev-name); + bond_destroy(bond); + } + return ret; +} + +/* * This function releases all slaves. */ static int bond_release_all(struct net_device *bond_dev) @@ -3337,6 +3368,11 @@ static int bond_slave_netdev_event(unsig * ... Or is it this? */ break; + case NETDEV_GOING_DOWN: + dprintk(slave %s is going down\n, slave_dev-name); + if (bond-setup_by_slave) + bond_release_and_destroy(bond_dev, slave_dev); + break; case NETDEV_CHANGEMTU: /* * TODO: Should slaves be allowed to @@ -4311,6 +4347,7 @@ static int bond_init(struct net_device * bond-primary_slave = NULL; bond-dev = bond_dev; bond-send_grat_arp = 0; + bond-setup_by_slave = 0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-09-24 14:01:24.055441842 +0200 +++ net-2.6/drivers/net/bonding/bonding.h 2007-09-24 14:01:24.627340013 +0200 @@ -188,6 +188,7 @@ struct bonding { s8 kill_timers; s8 do_set_mac_addr; s8 send_grat_arp; + s8 setup_by_slave; struct net_device_stats stats; #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_entry; @@ -295,6 +296,8 @@ static inline void bond_unset_master_alb struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr); int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev); int bond_create(char *name, struct bond_params *params, struct bonding **newbond); +void bond_destroy(struct bonding *bond); +int bond_release_and_destroy(struct net_device *bond_dev, struct net_device *slave_dev); void bond_deinit(struct net_device *bond_dev); int bond_create_sysfs(void); void bond_destroy_sysfs(void); Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-09-24 14:01:23.523536550 +0200 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-09-24 14:01:24.628339835 +0200 @@ -164,9 +164,7 @@ static ssize_t bonding_store_bonds(struc printk(KERN_INFO DRV_NAME : %s
[PATCH 9/9] bonding: Optionally allow ethernet slaves to keep own MAC
Update the don't change MAC of slaves functionality added in previous changes to be a generic option, rather than something tied to IB devices, as it's occasionally useful for regular ethernet devices as well. Adds fail_over_mac option (which is automatically enabled for IB slaves), applicable only to active-backup mode. Includes documentation update. Updates bonding driver version to 3.2.0. Signed-off-by: Jay Vosburgh [EMAIL PROTECTED] --- Documentation/networking/bonding.txt | 33 +++ drivers/net/bonding/bond_main.c | 57 + drivers/net/bonding/bond_sysfs.c | 49 + drivers/net/bonding/bonding.h|6 ++-- 4 files changed, 121 insertions(+), 24 deletions(-) diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 1da5666..1134062 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -281,6 +281,39 @@ downdelay will be rounded down to the nearest multiple. The default value is 0. +fail_over_mac + + Specifies whether active-backup mode should set all slaves to + the same MAC address (the traditional behavior), or, when + enabled, change the bond's MAC address when changing the + active interface (i.e., fail over the MAC address itself). + + Fail over MAC is useful for devices that cannot ever alter + their MAC address, or for devices that refuse incoming + broadcasts with their own source MAC (which interferes with + the ARP monitor). + + The down side of fail over MAC is that every device on the + network must be updated via gratuitous ARP, vs. just updating + a switch or set of switches (which often takes place for any + traffic, not just ARP traffic, if the switch snoops incoming + traffic to update its tables) for the traditional method. If + the gratuitous ARP is lost, communication may be disrupted. + + When fail over MAC is used in conjuction with the mii monitor, + devices which assert link up prior to being able to actually + transmit and receive are particularly susecptible to loss of + the gratuitous ARP, and an appropriate updelay setting may be + required. + + A value of 0 disables fail over MAC, and is the default. A + value of 1 enables fail over MAC. This option is enabled + automatically if the first slave added cannot change its MAC + address. This option may be modified via sysfs only when no + slaves are present in the bond. + + This option was added in bonding version 3.2.0. + lacp_rate Option specifying the rate in which we'll ask our link partner diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 77caca3..c01ff9d 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -97,6 +97,7 @@ static char *xmit_hash_policy = NULL; static int arp_interval = BOND_LINK_ARP_INTERV; static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, }; static char *arp_validate = NULL; +static int fail_over_mac = 0; struct bond_params bonding_defaults; module_param(max_bonds, int, 0); @@ -130,6 +131,8 @@ module_param_array(arp_ip_target, charp, NULL, 0); MODULE_PARM_DESC(arp_ip_target, arp targets in n.n.n.n form); module_param(arp_validate, charp, 0); MODULE_PARM_DESC(arp_validate, validate src/dst of ARP probes: none (default), active, backup or all); +module_param(fail_over_mac, int, 0); +MODULE_PARM_DESC(fail_over_mac, For active-backup, do not set all slaves to the same MAC. 0 of off (default), 1 for on.); /*- Global variables */ @@ -1099,7 +1102,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active) /* when bonding does not set the slave MAC address, the bond MAC * address is the one of the active slave. */ - if (new_active !bond-do_set_mac_addr) + if (new_active bond-params.fail_over_mac) memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, new_active-dev-addr_len); if (bond-curr_active_slave @@ -1371,16 +1374,16 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) if (slave_dev-set_mac_address == NULL) { if (bond-slave_cnt == 0) { printk(KERN_WARNING DRV_NAME - : %s: Warning: The first slave device you - specified does not support setting the MAC - address. This bond MAC address would be that - of the active slave.\n, bond_dev-name); - bond-do_set_mac_addr = 0; - } else
Re: [ofa-general] Re: [PATCH V5 2/11] IB/ipoib: Notify the world before doing unregister
Roland Dreier wrote: + ipoib_slave_detach(cpriv-dev); unregister_netdev(cpriv-dev); Maybe you already answered this before, but I'm still not clear why this notifier call can't just be added to the start of unregister_netdevice(), so we can avoid having driver needing to know anything about bonding internals? - R. The action in bonding to a detach of slave is to unregister the master (see patch 10). This can't be done from the context of unregister_netdevice itself (it is protected by rtnl_lock). That's why I had to notify the detach before unregister begins. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V5 1/11] net/core: add a netdev notification for slave detach
A slave of a bonding master that wants to send a notification before going down should call netdev_slave_detach(). The handling of this notification will be done outside the context of unregister_netdevice() which is sometimes necessary, as with IPoIB slave for example. Signed-off-by: Moni Shoua monis at voltaire.com --- include/linux/if.h |1 + net/core/dev.c | 20 2 files changed, 21 insertions(+) Index: net-2.6/net/core/dev.c === --- net-2.6.orig/net/core/dev.c 2007-09-20 08:04:47.164051688 +0200 +++ net-2.6/net/core/dev.c 2007-09-20 09:20:21.493060579 +0200 @@ -2588,6 +2588,25 @@ int netdev_set_master(struct net_device return 0; } +/** + * netdev_slave_detach - notify that slave is about to detach from master + * @slave: slave device + * + * Raise a flag that slave is about to detach from master + * and notify the netdev chain. + * The caller must hold the rtnl_mutex. + */ + +int netdev_slave_detach(struct net_device *slave) +{ + int ret = 0; + if (slave-flags IFF_SLAVE) { + slave-priv_flags |= IFF_SLAVE_DETACH; + ret = call_netdevice_notifiers(NETDEV_CHANGE, slave); + } + return ret; +} + static void __dev_set_promiscuity(struct net_device *dev, int inc) { unsigned short old_flags = dev-flags; @@ -4120,6 +4139,7 @@ EXPORT_SYMBOL(dev_set_mac_address); EXPORT_SYMBOL(free_netdev); EXPORT_SYMBOL(netdev_boot_setup_check); EXPORT_SYMBOL(netdev_set_master); +EXPORT_SYMBOL(netdev_slave_detach); EXPORT_SYMBOL(netdev_state_change); EXPORT_SYMBOL(netif_receive_skb); EXPORT_SYMBOL(netif_rx); Index: net-2.6/include/linux/if.h === --- net-2.6.orig/include/linux/if.h 2007-09-20 08:04:47.164051688 +0200 +++ net-2.6/include/linux/if.h 2007-09-20 08:15:29.577729301 +0200 @@ -61,6 +61,7 @@ #define IFF_MASTER_ALB 0x10/* bonding master, balance-alb. */ #define IFF_BONDING0x20/* bonding master or slave */ #define IFF_SLAVE_NEEDARP 0x40 /* need ARPs for validation */ +#define IFF_SLAVE_DETACH 0x80 /* slave is about to unregister */ #define IF_GET_IFACE 0x0001 /* for querying only */ #define IF_GET_PROTO 0x0002 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V5 2/11] IB/ipoib: Notify the world before doing unregister
When the bonding device enslaves IPoIB devices it takes pointers to functions in the ib_ipoib module. This is fine as long as the ib_ipoib nodule remains loaded while the references to its functions exist. So, to help bonding do a cleanup on time, when the IPoIB net device is a slave of a bonding master, let the master know that the IPoIB device is about to unregister (but before calling unregister). Signed-off-by: Moni Shoua monis at voltaire.com --- drivers/infiniband/ulp/ipoib/ipoib.h |7 +++ drivers/infiniband/ulp/ipoib/ipoib_main.c |3 +++ drivers/infiniband/ulp/ipoib/ipoib_vlan.c |1 + 3 files changed, 11 insertions(+) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-09-20 08:35:34.0 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-09-20 14:20:16.495147879 +0200 @@ -48,6 +48,7 @@ #include linux/in.h #include net/dst.h +#include linux/netdevice.h MODULE_AUTHOR(Roland Dreier); MODULE_DESCRIPTION(IP-over-InfiniBand net driver); @@ -921,6 +922,7 @@ void ipoib_dev_cleanup(struct net_device /* Delete any child interfaces first */ list_for_each_entry_safe(cpriv, tcpriv, priv-child_intfs, list) { + ipoib_slave_detach(cpriv-dev); unregister_netdev(cpriv-dev); ipoib_dev_cleanup(cpriv-dev); free_netdev(cpriv-dev); @@ -1208,6 +1210,7 @@ static void ipoib_remove_one(struct ib_d ib_unregister_event_handler(priv-event_handler); flush_scheduled_work(); + ipoib_slave_detach(priv-dev); unregister_netdev(priv-dev); ipoib_dev_cleanup(priv-dev); free_netdev(priv-dev); Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_vlan.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 2007-09-20 09:26:11.0 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 2007-09-20 09:27:20.182709679 +0200 @@ -157,6 +157,7 @@ int ipoib_vlan_delete(struct net_device mutex_lock(ppriv-vlan_mutex); list_for_each_entry_safe(priv, tpriv, ppriv-child_intfs, list) { if (priv-pkey == pkey) { + ipoib_slave_detach(priv-dev); unregister_netdev(priv-dev); ipoib_dev_cleanup(priv-dev); list_del(priv-list); Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-09-20 12:18:56.0 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h2007-09-20 14:21:47.385972207 +0200 @@ -570,6 +570,13 @@ static inline void ipoib_cm_handle_rx_wc #endif +static inline void ipoib_slave_detach(struct net_device *dev) +{ + rtnl_lock(); + netdev_slave_detach(dev); + rtnl_unlock(); +} + #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG void ipoib_create_debug_files(struct net_device *dev); void ipoib_delete_debug_files(struct net_device *dev); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V5 3/11] IB/ipoib: Bound the net device to the ipoib_neigh structue
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb-dst-neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb-dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n-dev is an ipoib device and hence netdev_priv(n-dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n-dev-flags has the IFF_MASTER bit set. Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/infiniband/ulp/ipoib/ipoib.h |4 +++- drivers/infiniband/ulp/ipoib/ipoib_main.c | 24 +++- drivers/infiniband/ulp/ipoib/ipoib_multicast.c |3 ++- 3 files changed, 20 insertions(+), 11 deletions(-) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-09-18 17:08:53.245849217 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h2007-09-18 17:09:26.534874404 +0200 @@ -328,6 +328,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_headlist; }; @@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-09-18 17:08:53.245849217 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-09-18 17:23:54.725744661 +0200 @@ -511,7 +511,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb-dst-neighbour); + neigh = ipoib_neigh_alloc(skb-dst-neighbour, skb-dev); if (!neigh) { ++priv-stats.tx_dropped; dev_kfree_skb_any(skb); @@ -830,6 +830,13 @@ static void ipoib_neigh_cleanup(struct n unsigned long flags; struct ipoib_ah *ah = NULL; + neigh = *to_ipoib_neigh(n); + if (neigh) { + priv = netdev_priv(neigh-dev); + ipoib_dbg(priv, neigh_destructor for bonding device: %s\n, + n-dev-name); + } else + return; ipoib_dbg(priv, neigh_cleanup for %06x IPOIB_GID_FMT \n, IPOIB_QPN(n-ha), @@ -837,13 +844,10 @@ static void ipoib_neigh_cleanup(struct n spin_lock_irqsave(priv-lock, flags); - neigh = *to_ipoib_neigh(n); - if (neigh) { - if (neigh-ah) - ah = neigh-ah; - list_del(neigh-list); - ipoib_neigh_free(n-dev, neigh); - } + if (neigh-ah) + ah = neigh-ah; + list_del(neigh-list); + ipoib_neigh_free(n-dev, neigh); spin_unlock_irqrestore(priv-lock, flags); @@ -851,7 +855,8 @@ static void ipoib_neigh_cleanup(struct n ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) { struct ipoib_neigh *neigh; @@ -860,6 +865,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh-neighbour = neighbour; + neigh-dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(neigh-queue); ipoib_cm_set(neigh, NULL); Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-09-18 17:08:53.245849217 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-09-18 17:09:26.536874045 +0200 @@ -727,7 +727,8 @@ out: if (skb-dst skb-dst-neighbour !*to_ipoib_neigh(skb-dst-neighbour
[PATCH V5 4/11] IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces that slave with a new one. In between the times when the carrier of an IPoIB device goes down and ipoib_neigh is destroyed, it is possible that the bonding driver will send a packet on a new slave that uses an old ipoib_neigh. This patch detects and prevents this from happenning. Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/infiniband/ulp/ipoib/ipoib_main.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-09-18 17:09:26.535874225 +0200 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-09-18 17:10:22.375853147 +0200 @@ -686,9 +686,10 @@ static int ipoib_start_xmit(struct sk_bu goto out; } } else if (neigh-ah) { - if (unlikely(memcmp(neigh-dgid.raw, + if (unlikely((memcmp(neigh-dgid.raw, skb-dst-neighbour-ha + 4, - sizeof(union ib_gid { + sizeof(union ib_gid))) || +(neigh-dev != dev))) { spin_lock(priv-lock); /* * It's safe to call ipoib_put_ah() inside - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V5 5/11] net/bonding: Enable bonding to enslave non ARPHRD_ETHER
This patch changes some of the bond netdevice attributes and functions to be that of the active slave for the case of the enslaved device not being of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(), which are netdevice **type** dependent and hence might be not appropriate for devices of other types. It also enforces mutual exclusion on bonding slaves from dissimilar ether types, as was concluded over the v1 discussion. IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this IPoIB device is bounded to. The QP is a resource created by the IB HW and the GID is an identifier burned into the HCA (i have omitted here some details which are not important for the bonding RFC). Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/net/bonding/bond_main.c | 39 +++ 1 files changed, 39 insertions(+) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:08:59.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:13.424688411 +0300 @@ -1237,6 +1237,26 @@ static int bond_compute_features(struct return 0; } + +static void bond_setup_by_slave(struct net_device *bond_dev, + struct net_device *slave_dev) +{ + bond_dev-hard_header = slave_dev-hard_header; + bond_dev-rebuild_header= slave_dev-rebuild_header; + bond_dev-hard_header_cache = slave_dev-hard_header_cache; + bond_dev-header_cache_update = slave_dev-header_cache_update; + bond_dev-hard_header_parse = slave_dev-hard_header_parse; + + bond_dev-neigh_setup = slave_dev-neigh_setup; + + bond_dev-type = slave_dev-type; + bond_dev-hard_header_len = slave_dev-hard_header_len; + bond_dev-addr_len = slave_dev-addr_len; + + memcpy(bond_dev-broadcast, slave_dev-broadcast, + slave_dev-addr_len); +} + /* enslave device slave to bond device master */ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) { @@ -1311,6 +1331,25 @@ int bond_enslave(struct net_device *bond goto err_undo_flags; } + /* set bonding device ether type by slave - bonding netdevices are +* created with ether_setup, so when the slave type is not ARPHRD_ETHER +* there is a need to override some of the type dependent attribs/funcs. +* +* bond ether type mutual exclusion - don't allow slaves of dissimilar +* ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond +*/ + if (bond-slave_cnt == 0) { + if (slave_dev-type != ARPHRD_ETHER) + bond_setup_by_slave(bond_dev, slave_dev); + } else if (bond_dev-type != slave_dev-type) { + printk(KERN_ERR DRV_NAME : %s ether type (%d) is different + from other slaves (%d), can not enslave it.\n, + slave_dev-name, + slave_dev-type, bond_dev-type); + res = -EINVAL; + goto err_undo_flags; + } + if (slave_dev-set_mac_address == NULL) { printk(KERN_ERR DRV_NAME : %s: Error: The slave device you specified does - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V5 6/11] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()
This patch allows for enslaving netdevices which do not support the set_mac_address() function. In that case the bond mac address is the one of the active slave, where remote peers are notified on the mac address (neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs (this is already done by the bonding code). Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/net/bonding/bond_main.c | 87 +++- drivers/net/bonding/bonding.h |1 2 files changed, 60 insertions(+), 28 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:54:13.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:41.971632881 +0300 @@ -1095,6 +1095,14 @@ void bond_change_active_slave(struct bon if (new_active) { bond_set_slave_active_flags(new_active); } + + /* when bonding does not set the slave MAC address, the bond MAC +* address is the one of the active slave. +*/ + if (new_active !bond-do_set_mac_addr) + memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, + new_active-dev-addr_len); + bond_send_gratuitous_arp(bond); } } @@ -1351,13 +1359,22 @@ int bond_enslave(struct net_device *bond } if (slave_dev-set_mac_address == NULL) { - printk(KERN_ERR DRV_NAME - : %s: Error: The slave device you specified does - not support setting the MAC address. - Your kernel likely does not support slave - devices.\n, bond_dev-name); - res = -EOPNOTSUPP; - goto err_undo_flags; + if (bond-slave_cnt == 0) { + printk(KERN_WARNING DRV_NAME + : %s: Warning: The first slave device you + specified does not support setting the MAC + address. This bond MAC address would be that + of the active slave.\n, bond_dev-name); + bond-do_set_mac_addr = 0; + } else if (bond-do_set_mac_addr) { + printk(KERN_ERR DRV_NAME + : %s: Error: The slave device you specified + does not support setting the MAC addres,. + but this bond uses this practice. \n + , bond_dev-name); + res = -EOPNOTSUPP; + goto err_undo_flags; + } } new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL); @@ -1378,16 +1395,18 @@ int bond_enslave(struct net_device *bond */ memcpy(new_slave-perm_hwaddr, slave_dev-dev_addr, ETH_ALEN); - /* -* Set slave to master's mac address. The application already -* set the master's mac address to that of the first slave -*/ - memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); - addr.sa_family = slave_dev-type; - res = dev_set_mac_address(slave_dev, addr); - if (res) { - dprintk(Error %d calling set_mac_address\n, res); - goto err_free; + if (bond-do_set_mac_addr) { + /* +* Set slave to master's mac address. The application already +* set the master's mac address to that of the first slave +*/ + memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); + addr.sa_family = slave_dev-type; + res = dev_set_mac_address(slave_dev, addr); + if (res) { + dprintk(Error %d calling set_mac_address\n, res); + goto err_free; + } } res = netdev_set_master(slave_dev, bond_dev); @@ -1612,9 +1631,11 @@ err_close: dev_close(slave_dev); err_restore_mac: - memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); - addr.sa_family = slave_dev-type; - dev_set_mac_address(slave_dev, addr); + if (bond-do_set_mac_addr) { + memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); + addr.sa_family = slave_dev-type; + dev_set_mac_address(slave_dev, addr); + } err_free: kfree(new_slave); @@ -1792,10 +1813,12 @@ int bond_release(struct net_device *bond /* close slave before restoring its mac address */ dev_close(slave_dev); - /* restore original (permanent) mac address */ - memcpy(addr.sa_data, slave-perm_hwaddr, ETH_ALEN
[PATCH V5 7/11] net/bonding: Enable IP multicast for bonding IPoIB devices
Allow to enslave devices when the bonding device is not up. Over the discussion held at the previous post this seemed to be the most clean way to go, where it is not expected to cause instabilities. Normally, the bonding driver is UP before any enslavement takes place. Once a netdevice is UP, the network stack acts to have it join some multicast groups (eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called where for multicast joins taking place after the enslavement another ip_xxx_mc_map() is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND) Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/net/bonding/bond_main.c |5 +++-- drivers/net/bonding/bond_sysfs.c |6 ++ 2 files changed, 5 insertions(+), 6 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:54:41.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:55:48.431862446 +0300 @@ -1285,8 +1285,9 @@ int bond_enslave(struct net_device *bond /* bond must be initialized by bond_open() before enslaving */ if (!(bond_dev-flags IFF_UP)) { - dprintk(Error, master_dev is not up\n); - return -EPERM; + printk(KERN_WARNING DRV_NAME +%s: master_dev is not up in bond_enslave\n, + bond_dev-name); } /* already enslaved */ Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-08-15 10:08:58.0 +0300 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-08-15 10:55:48.432862269 +0300 @@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru /* Quick sanity check -- is the bond interface up? */ if (!(bond-dev-flags IFF_UP)) { - printk(KERN_ERR DRV_NAME - : %s: Unable to update slaves because interface is down.\n, + printk(KERN_WARNING DRV_NAME + : %s: doing slave updates when interface is down.\n, bond-dev-name); - ret = -EPERM; - goto out; } /* Note: We can't hold bond-lock here, as bond_create grabs it. */ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V5 0/11] net/bonding: ADD IPoIB support for the bonding driver
This patch series is the fifth version (see below link to V4) of the suggested changes to the bonding driver so it would be able to support non ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode. Patches 1-10 were originally submitted in V4 and patch 11 is an addition by Jay. Jay, The bonding patches you acked remain unchanged while I guess I sitll need to get an official ack by Roland for the IPoIB patches. Is it OK with you to push the entire series to the networking tree? Roland has already agreed to do so. Major changes from the previous version: 1. Style changes 2. IPoIB - notify slave detach on vlan delete 3. Add function to net/core for slave detach instead of having it only in ib/ipoib 4. IPoIB - handle ib device and bonding device the same way in neigh_cleanup function Links to earlier discussion: 1. A discussion in netdev about bonding support for IPoIB. http://lists.openwall.net/netdev/2006/11/30/46 2. V4 series http://lists.openfabrics.org/pipermail/general/2007-August/039825.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V5 8/11] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device
bonding sometimes uses Ethernet constants (such as MTU and address length) which are not good when it enslaves non Ethernet devices (such as InfiniBand). Signed-off-by: Moni Shoua monis at voltaire.com --- drivers/net/bonding/bond_main.c |3 ++- drivers/net/bonding/bond_sysfs.c | 19 +-- drivers/net/bonding/bonding.h|1 + 3 files changed, 16 insertions(+), 7 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:55:48.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-20 14:29:11.911298577 +0300 @@ -1224,7 +1224,8 @@ static int bond_compute_features(struct struct slave *slave; struct net_device *bond_dev = bond-dev; unsigned long features = bond_dev-features; - unsigned short max_hard_header_len = ETH_HLEN; + unsigned short max_hard_header_len = max((u16)ETH_HLEN, + bond_dev-hard_header_len); int i; features = ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES); Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-08-15 10:55:48.0 +0300 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-08-15 12:14:41.152469089 +0300 @@ -164,9 +164,7 @@ static ssize_t bonding_store_bonds(struc printk(KERN_INFO DRV_NAME : %s is being deleted...\n, bond-dev-name); - bond_deinit(bond-dev); - bond_destroy_sysfs_entry(bond); - unregister_netdevice(bond-dev); + bond_destroy(bond); rtnl_unlock(); goto out; } @@ -260,6 +258,7 @@ static ssize_t bonding_store_slaves(stru char command[IFNAMSIZ + 1] = { 0, }; char *ifname; int i, res, found, ret = count; + u32 original_mtu; struct slave *slave; struct net_device *dev = NULL; struct bonding *bond = to_bond(d); @@ -325,6 +324,7 @@ static ssize_t bonding_store_slaves(stru } /* Set the slave's MTU to match the bond */ + original_mtu = dev-mtu; if (dev-mtu != bond-dev-mtu) { if (dev-change_mtu) { res = dev-change_mtu(dev, @@ -339,6 +339,9 @@ static ssize_t bonding_store_slaves(stru } rtnl_lock(); res = bond_enslave(bond-dev, dev); + bond_for_each_slave(bond, slave, i) + if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) + slave-original_mtu = original_mtu; rtnl_unlock(); if (res) { ret = res; @@ -351,13 +354,17 @@ static ssize_t bonding_store_slaves(stru bond_for_each_slave(bond, slave, i) if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) { dev = slave-dev; + original_mtu = slave-original_mtu; break; } if (dev) { printk(KERN_INFO DRV_NAME : %s: Removing slave %s\n, bond-dev-name, dev-name); rtnl_lock(); - res = bond_release(bond-dev, dev); + if (bond-setup_by_slave) + res = bond_release_and_destroy(bond-dev, dev); + else + res = bond_release(bond-dev, dev); rtnl_unlock(); if (res) { ret = res; @@ -365,9 +372,9 @@ static ssize_t bonding_store_slaves(stru } /* set the slave MTU to the default */ if (dev-change_mtu) { - dev-change_mtu(dev, 1500); + dev-change_mtu(dev, original_mtu); } else { - dev-mtu = 1500; + dev-mtu = original_mtu; } } else { Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-08-15 10:55:34.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-08-20 14:29:11.912298402 +0300 @@ -156,6 +156,7 @@ struct slave { s8 link;/* one
PATCH V5 9/11] net/bonding: Delay sending of gratuitous ARP to avoid failure
Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit in dev-state field is on. This improves the chances for the arp packet to be transmitted. Signed-off-by: Moni Shoua monis at voltaire.com --- drivers/net/bonding/bond_main.c | 24 +--- drivers/net/bonding/bonding.h |1 + 2 files changed, 22 insertions(+), 3 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:56:33.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 11:04:37.221123652 +0300 @@ -1102,8 +1102,14 @@ void bond_change_active_slave(struct bon if (new_active !bond-do_set_mac_addr) memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, new_active-dev-addr_len); - - bond_send_gratuitous_arp(bond); + if (bond-curr_active_slave + test_bit(__LINK_STATE_LINKWATCH_PENDING, + bond-curr_active_slave-dev-state)) { + dprintk(delaying gratuitous arp on %s\n, + bond-curr_active_slave-dev-name); + bond-send_grat_arp = 1; + } else + bond_send_gratuitous_arp(bond); } } @@ -2083,6 +2089,17 @@ void bond_mii_monitor(struct net_device * program could monitor the link itself if needed. */ + if (bond-send_grat_arp) { + if (bond-curr_active_slave test_bit(__LINK_STATE_LINKWATCH_PENDING, + bond-curr_active_slave-dev-state)) + dprintk(Needs to send gratuitous arp but not yet\n); + else { + dprintk(sending delayed gratuitous arp on on %s\n, + bond-curr_active_slave-dev-name); + bond_send_gratuitous_arp(bond); + bond-send_grat_arp = 0; + } + } read_lock(bond-curr_slave_lock); oldcurrent = bond-curr_active_slave; read_unlock(bond-curr_slave_lock); @@ -2484,7 +2501,7 @@ static void bond_send_gratuitous_arp(str if (bond-master_ip) { bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip, - bond-master_ip, 0); + bond-master_ip, 0); } list_for_each_entry(vlan, bond-vlan_list, vlan_list) { @@ -4293,6 +4310,7 @@ static int bond_init(struct net_device * bond-current_arp_slave = NULL; bond-primary_slave = NULL; bond-dev = bond_dev; + bond-send_grat_arp = 0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-08-15 10:56:33.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-08-15 11:05:41.516451497 +0300 @@ -187,6 +187,7 @@ struct bonding { struct timer_list arp_timer; s8 kill_timers; s8 do_set_mac_addr; + s8 send_grat_arp; struct net_device_stats stats; #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_entry; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V5 10/11] net/bonding: Destroy bonding master when last slave is gone
When bonding enslaves non Ethernet devices it takes pointers to functions in the module that owns the slaves. In this case it becomes unsafe to keep the bonding master registered after last slave was unenslaved because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero ensures that these functions be used anymore. Signed-off-by: Moni Shoua monis at voltaire.com --- drivers/net/bonding/bond_main.c | 45 +++- drivers/net/bonding/bonding.h |3 ++ 2 files changed, 47 insertions(+), 1 deletion(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-20 14:43:17.123702132 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-20 14:43:17.850571535 +0300 @@ -1256,6 +1256,7 @@ static int bond_compute_features(struct static void bond_setup_by_slave(struct net_device *bond_dev, struct net_device *slave_dev) { + struct bonding *bond = bond_dev-priv; bond_dev-hard_header = slave_dev-hard_header; bond_dev-rebuild_header= slave_dev-rebuild_header; bond_dev-hard_header_cache = slave_dev-hard_header_cache; @@ -1270,6 +1271,7 @@ static void bond_setup_by_slave(struct n memcpy(bond_dev-broadcast, slave_dev-broadcast, slave_dev-addr_len); + bond-setup_by_slave = 1; } /* enslave device slave to bond device master */ @@ -1838,6 +1840,35 @@ int bond_release(struct net_device *bond } /* +* Destroy a bonding device. +* Must be under rtnl_lock when this function is called. +*/ +void bond_destroy(struct bonding *bond) +{ + bond_deinit(bond-dev); + bond_destroy_sysfs_entry(bond); + unregister_netdevice(bond-dev); +} + +/* +* First release a slave and than destroy the bond if no more slaves iare left. +* Must be under rtnl_lock when this function is called. +*/ +int bond_release_and_destroy(struct net_device *bond_dev, struct net_device *slave_dev) +{ + struct bonding *bond = bond_dev-priv; + int ret; + + ret = bond_release(bond_dev, slave_dev); + if ((ret == 0) (bond-slave_cnt == 0)) { + printk(KERN_INFO DRV_NAME %s: destroying bond for.\n, + bond_dev-name); + bond_destroy(bond); + } + return ret; +} + +/* * This function releases all slaves. */ static int bond_release_all(struct net_device *bond_dev) @@ -3322,7 +3353,11 @@ static int bond_slave_netdev_event(unsig switch (event) { case NETDEV_UNREGISTER: if (bond_dev) { - bond_release(bond_dev, slave_dev); + dprintk(slave %s unregisters\n, slave_dev-name); + if (bond-setup_by_slave) + bond_release_and_destroy(bond_dev, slave_dev); + else + bond_release(bond_dev, slave_dev); } break; case NETDEV_CHANGE: @@ -3331,6 +3366,13 @@ static int bond_slave_netdev_event(unsig * sets up a hierarchical bond, then rmmod's * one of the slave bonding devices? */ + if (slave_dev-priv_flags IFF_SLAVE_DETACH) { + dprintk(slave %s detaching\n, slave_dev-name); + if (bond-setup_by_slave) + bond_release_and_destroy(bond_dev, slave_dev); + else + bond_release(bond_dev, slave_dev); + } break; case NETDEV_DOWN: /* @@ -4311,6 +4353,7 @@ static int bond_init(struct net_device * bond-primary_slave = NULL; bond-dev = bond_dev; bond-send_grat_arp = 0; + bond-setup_by_slave = 0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-08-20 14:43:17.123702132 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-08-20 14:47:52.845180870 +0300 @@ -188,6 +188,7 @@ struct bonding { s8 kill_timers; s8 do_set_mac_addr; s8 send_grat_arp; + s8 setup_by_slave; struct net_device_stats stats; #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_entry; @@ -295,6 +296,8 @@ static inline void bond_unset_master_alb struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr); int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev); int bond_create(char *name, struct bond_params *params, struct bonding **newbond); +void
[PATCH V5 5/11] net/bonding: Enable bonding to enslave non ARPHRD_ETHER
This patch changes some of the bond netdevice attributes and functions to be that of the active slave for the case of the enslaved device not being of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(), which are netdevice **type** dependent and hence might be not appropriate for devices of other types. It also enforces mutual exclusion on bonding slaves from dissimilar ether types, as was concluded over the v1 discussion. IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this IPoIB device is bounded to. The QP is a resource created by the IB HW and the GID is an identifier burned into the HCA (i have omitted here some details which are not important for the bonding RFC). Signed-off-by: Moni Shoua monis at voltaire.com Signed-off-by: Or Gerlitz ogerlitz at voltaire.com --- drivers/net/bonding/bond_main.c | 39 +++ 1 files changed, 39 insertions(+) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:08:59.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:13.424688411 +0300 @@ -1237,6 +1237,26 @@ static int bond_compute_features(struct return 0; } + +static void bond_setup_by_slave(struct net_device *bond_dev, + struct net_device *slave_dev) +{ + bond_dev-hard_header = slave_dev-hard_header; + bond_dev-rebuild_header= slave_dev-rebuild_header; + bond_dev-hard_header_cache = slave_dev-hard_header_cache; + bond_dev-header_cache_update = slave_dev-header_cache_update; + bond_dev-hard_header_parse = slave_dev-hard_header_parse; + + bond_dev-neigh_setup = slave_dev-neigh_setup; + + bond_dev-type = slave_dev-type; + bond_dev-hard_header_len = slave_dev-hard_header_len; + bond_dev-addr_len = slave_dev-addr_len; + + memcpy(bond_dev-broadcast, slave_dev-broadcast, + slave_dev-addr_len); +} + /* enslave device slave to bond device master */ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) { @@ -1311,6 +1331,25 @@ int bond_enslave(struct net_device *bond goto err_undo_flags; } + /* set bonding device ether type by slave - bonding netdevices are +* created with ether_setup, so when the slave type is not ARPHRD_ETHER +* there is a need to override some of the type dependent attribs/funcs. +* +* bond ether type mutual exclusion - don't allow slaves of dissimilar +* ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond +*/ + if (bond-slave_cnt == 0) { + if (slave_dev-type != ARPHRD_ETHER) + bond_setup_by_slave(bond_dev, slave_dev); + } else if (bond_dev-type != slave_dev-type) { + printk(KERN_ERR DRV_NAME : %s ether type (%d) is different + from other slaves (%d), can not enslave it.\n, + slave_dev-name, + slave_dev-type, bond_dev-type); + res = -EINVAL; + goto err_undo_flags; + } + if (slave_dev-set_mac_address == NULL) { printk(KERN_ERR DRV_NAME : %s: Error: The slave device you specified does - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/11] bonding: Optionally allow ethernet slaves to keep own MAC
Update the don't change MAC of slaves functionality added in previous changes to be a generic option, rather than something tied to IB devices, as it's occasionally useful for regular ethernet devices as well. Adds fail_over_mac option (which is automatically enabled for IB slaves), applicable only to active-backup mode. Includes documentation update. Updates bonding driver version to 3.2.0. Signed-off-by: Jay Vosburgh [EMAIL PROTECTED] --- Documentation/networking/bonding.txt | 33 +++ drivers/net/bonding/bond_main.c | 57 + drivers/net/bonding/bond_sysfs.c | 49 + drivers/net/bonding/bonding.h|6 ++-- 4 files changed, 121 insertions(+), 24 deletions(-) diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 1da5666..1134062 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -281,6 +281,39 @@ downdelay will be rounded down to the nearest multiple. The default value is 0. +fail_over_mac + + Specifies whether active-backup mode should set all slaves to + the same MAC address (the traditional behavior), or, when + enabled, change the bond's MAC address when changing the + active interface (i.e., fail over the MAC address itself). + + Fail over MAC is useful for devices that cannot ever alter + their MAC address, or for devices that refuse incoming + broadcasts with their own source MAC (which interferes with + the ARP monitor). + + The down side of fail over MAC is that every device on the + network must be updated via gratuitous ARP, vs. just updating + a switch or set of switches (which often takes place for any + traffic, not just ARP traffic, if the switch snoops incoming + traffic to update its tables) for the traditional method. If + the gratuitous ARP is lost, communication may be disrupted. + + When fail over MAC is used in conjuction with the mii monitor, + devices which assert link up prior to being able to actually + transmit and receive are particularly susecptible to loss of + the gratuitous ARP, and an appropriate updelay setting may be + required. + + A value of 0 disables fail over MAC, and is the default. A + value of 1 enables fail over MAC. This option is enabled + automatically if the first slave added cannot change its MAC + address. This option may be modified via sysfs only when no + slaves are present in the bond. + + This option was added in bonding version 3.2.0. + lacp_rate Option specifying the rate in which we'll ask our link partner diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 77caca3..c01ff9d 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -97,6 +97,7 @@ static char *xmit_hash_policy = NULL; static int arp_interval = BOND_LINK_ARP_INTERV; static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, }; static char *arp_validate = NULL; +static int fail_over_mac = 0; struct bond_params bonding_defaults; module_param(max_bonds, int, 0); @@ -130,6 +131,8 @@ module_param_array(arp_ip_target, charp, NULL, 0); MODULE_PARM_DESC(arp_ip_target, arp targets in n.n.n.n form); module_param(arp_validate, charp, 0); MODULE_PARM_DESC(arp_validate, validate src/dst of ARP probes: none (default), active, backup or all); +module_param(fail_over_mac, int, 0); +MODULE_PARM_DESC(fail_over_mac, For active-backup, do not set all slaves to the same MAC. 0 of off (default), 1 for on.); /*- Global variables */ @@ -1099,7 +1102,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active) /* when bonding does not set the slave MAC address, the bond MAC * address is the one of the active slave. */ - if (new_active !bond-do_set_mac_addr) + if (new_active bond-params.fail_over_mac) memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, new_active-dev-addr_len); if (bond-curr_active_slave @@ -1371,16 +1374,16 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) if (slave_dev-set_mac_address == NULL) { if (bond-slave_cnt == 0) { printk(KERN_WARNING DRV_NAME - : %s: Warning: The first slave device you - specified does not support setting the MAC - address. This bond MAC address would be that - of the active slave.\n, bond_dev-name); - bond-do_set_mac_addr = 0; - } else
Re: [ofa-general] Re: [PATCH 02/11] IB/ipoib: Notify the world before doing unregister
Roland, Jay, Thanks a lot for the comments. I'd like to summarize the points raised so far 1. Reduce the indentation in patch #4 (Roland) I will resend 2. Remove the if (n-dev-flags IFF_MASTER) from patch #3 (Roland) I will resend 3. Consider making ipoib_slave_detach() net/core/dev.c (Roland, Jay) I think that this is a good idea. I can make the patch (and necessary changes to the other patches) assuming this is agreed by all. 4. Change header for patch #1 (Roland) I will resend 5. Use NETDEV_GOING_DOWN and not NETDEV_CHANGE + IFF_SLAVE_DETACH (Jay) The NETDEV_GOING_DOWN event is sent in the contex of unregister_netdevice() Since the action in bonding to the event should be unregister the bonding master it is not possible to do so. bonding needs to know about the slave detach earlier. 6. call notifiers from unregister_netdev() See answer to 5. 7. missing call to notifiers in ipoib_vlan_delete() (Roland) It seems like you're right. I will fix and resend. I think that if there are no other comments, I will submit the entire 11 patches again (with changes) to make it easier to merge into the kernel. Since the most of the content in the patch series is in bonding I thought it would be right that Jay will push all the patches to the networking git. Is it OK with you Roland? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] [PATCH V4 0/10] net/bonding: ADD IPoIB support for the bonding driver
Hi all, This patch series is a bit neglected. Since our goal is to have bonding support for IPoIB in kernel 2.6.24 it is very important for us to get comments soon. We would appreciate if you take some time to look at this and help us push this code upstream. thanks MoniS - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH V4 10/10] net/bonding: Destroy bonding master when last slave is gone
Jay Vosburgh wrote: Moni Shoua [EMAIL PROTECTED] wrote: Jay Vosburgh wrote: Moni Shoua [EMAIL PROTECTED] wrote: When bonding enslaves non Ethernet devices it takes pointers to functions in the module that owns the slaves. In this case it becomes unsafe to keep the bonding master registered after last slave was unenslaved because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero ensures that these functions be used anymore. Would it not be simpler to run the bonding master through ether_setup() again when the final slave is released (to reset all of the pointers to their ethernet values)? I'm presuming here the pointers of questionable validity are the ones set in the bond_setup_by_slave() copied from the slave_dev-hard_header, et al. Having the bonding master disappear (but only sometimes) after the last slave is removed is a semantic change I'd rather not introduce if it's not necessary. Thanks for the comments. Having the master disappear is one way I could think of to solve the problem of leaving the bonding module with pointers to illegal addresses. The other way is to increase the usage count, with try_module_get(), of the module which owns of the slave. To do that I have to restore the field owner in structure net_device (it was removed in 2.6). What I was asking above is really whether or not it's feasible to simply reset the affected pointers back to the ethernet values from ether_setup(). I would think this should return the bonding master back to the original state it started in before any slaves were added. Unless I'm missing something; I'm willing to believe there's some IB-specific tidbit I'm unaware of that makes this more complicated than it seems. It's possible to reset the bonding master by calling ether_setup but with one exception: its neighbors. When enslaving IPoIB devices, the bonding master neighbors point to a destructor function in the ib_ipoib module. When ib_ipoib goes down the neighbors of the bonding master still exist and when their turn come to die they will try to access this function and the kernel will crash. This is why I want to destroy the bonding master before ib_ipoib is unloaded (to kill its neighbors). For any other issue (i.e. taken pointer), ether_setup would solve the problem. This presumes that I'm correct in thinking that the pointers you're talking about (as being unsafe after removal of last slave) are the ones copied in your new function bond_setup_by_slave(). I don't think it's desirable to acquire a reference to the slave driver module. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH V4 8/10] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device
Jay Vosburgh wrote: Moni Shoua [EMAIL PROTECTED] wrote: bonding sometimes uses Ethernet constants (such as MTU and address length) which are not good when it enslaves non Ethernet devices (such as InfiniBand). Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c |3 ++- drivers/net/bonding/bond_sysfs.c | 19 +-- drivers/net/bonding/bonding.h|1 + 3 files changed, 16 insertions(+), 7 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c 2007-08-15 10:55:48.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-20 14:29:11.911298577 +0300 @@ -1224,7 +1224,8 @@ static int bond_compute_features(struct struct slave *slave; struct net_device *bond_dev = bond-dev; unsigned long features = bond_dev-features; -unsigned short max_hard_header_len = ETH_HLEN; +unsigned short max_hard_header_len = max((u16)ETH_HLEN, +bond_dev-hard_header_len); Since non-IB bonding masters are run through ether_setup, which sets hard_header_len to ETH_HLEN, the max() is probably unnecessary, and I think this could just be bond_dev-hard_header_len. This is true except for the case where there are no slaves left. In that case max_hard_header_len has equals to the initialization value. bond_for_each_slave(bond, slave, i) { features = netdev_compute_features(features, slave-dev-features); if (slave-dev-hard_header_len max_hard_header_len) max_hard_header_len = slave-dev-hard_header_len; } features |= (bond_dev-features BOND_VLAN_FEATURES); bond_dev-features = features; bond_dev-hard_header_len = max_hard_header_len; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH V4 10/10] net/bonding: Destroy bonding master when last slave is gone
Jay Vosburgh wrote: Moni Shoua [EMAIL PROTECTED] wrote: When bonding enslaves non Ethernet devices it takes pointers to functions in the module that owns the slaves. In this case it becomes unsafe to keep the bonding master registered after last slave was unenslaved because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero ensures that these functions be used anymore. Would it not be simpler to run the bonding master through ether_setup() again when the final slave is released (to reset all of the pointers to their ethernet values)? I'm presuming here the pointers of questionable validity are the ones set in the bond_setup_by_slave() copied from the slave_dev-hard_header, et al. Having the bonding master disappear (but only sometimes) after the last slave is removed is a semantic change I'd rather not introduce if it's not necessary. Thanks for the comments. Having the master disappear is one way I could think of to solve the problem of leaving the bonding module with pointers to illegal addresses. The other way is to increase the usage count, with try_module_get(), of the module which owns of the slave. To do that I have to restore the field owner in structure net_device (it was removed in 2.6). Do you prefer the second approach? I wasn't sure about that. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 0/10] net/bonding: ADD IPoIB support for the bonding driver
This patch series is the fourth version (see below link to V3) of the suggested changes to the bonding driver so it would be able to support non ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode. The motivation is to enable the bonding driver on its HA mode to work with the IP over Infiniband (IPoIB) driver. With these patches I was able to enslave IPoIB netdevices and run TCP, UDP, IP (UDP) Multicast and ICMP traffic with fail-over and fail-back working fine. The working environment was the net-2.6 git. More over, as IPoIB is also the IB ARP provider for the RDMA CM driver which is used by native IB ULPs whose addressing scheme is based on IP (e.g. iSER, SDP, Lustre, NFSoRDMA, RDS), bonding support for IPoIB devices **enables** HA for these ULPs. This holds as when the ULP is informed by the IB HW on the failure of the current IB connection, it just need to reconnect, where the bonding device will now issue the IB ARP over the active IPoIB slave. This series also includes patches to the IPoIB driver that fix some fix some neighboring related issues. Major changes from the previous version: 1) Addressing the issue of safety when unloading the IPoIB module before the bonding module 2) style changes Links to earlier discussion: 1. A discussion in netdev about bonding support for IPoIB. http://lists.openwall.net/netdev/2006/11/30/46 2. A discussion in openfabrics regarding changes in the IPoIB that enable using it as a slave for bonding. http://lists.openfabrics.org/pipermail/general/2007-July/038914.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 1/10] IB/ipoib: Export call to call_netdevice_notifiers and add new private flag
Export the call to raw_notifier_call_chain so modules can send notifications on netdev events to the netdev_chain. Add IFF_SLAVE_DETACH to the list of priv_flags for net_device. This flag is set by a slave that is about to unregisster from the kernel. Both changes are used in bonding slaves that wish to inform the bonding master about coming detachment. Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- include/linux/if.h |1 + net/core/dev.c |1 + 2 files changed, 2 insertions(+) Index: net-2.6/net/core/dev.c === --- net-2.6.orig/net/core/dev.c 2007-08-15 10:09:02.0 +0300 +++ net-2.6/net/core/dev.c 2007-08-15 10:53:00.832543390 +0300 @@ -1148,6 +1148,7 @@ int call_netdevice_notifiers(unsigned lo { return raw_notifier_call_chain(netdev_chain, val, v); } +EXPORT_SYMBOL(call_netdevice_notifiers); /* When 0 there are consumers of rx skb time stamps */ static atomic_t netstamp_needed = ATOMIC_INIT(0); Index: net-2.6/include/linux/if.h === --- net-2.6.orig/include/linux/if.h 2007-08-20 14:30:39.0 +0300 +++ net-2.6/include/linux/if.h 2007-08-20 14:31:06.625174369 +0300 @@ -61,6 +61,7 @@ #define IFF_MASTER_ALB 0x10/* bonding master, balance-alb. */ #define IFF_BONDING0x20/* bonding master or slave */ #define IFF_SLAVE_NEEDARP 0x40 /* need ARPs for validation */ +#define IFF_SLAVE_DETACH 0x80 /* slave is about to unregister */ #define IF_GET_IFACE 0x0001 /* for querying only */ #define IF_GET_PROTO 0x0002 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 2/10] IB/ipoib: Notify the world before doing unregister
When the bonding device enslaves IPoIB devices it takes pointers to functions in the ib_ipoib module. This is fine as long as the ib_ipoib nodule remains loaded while the references to its functions exist. So, to help bonding do a cleanup on time, when the IPoIB net device is a slave of a bonding master, let the master know that the IPoIB device is about to unregister (but before calling unregister). Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- drivers/infiniband/ulp/ipoib/ipoib_main.c | 15 +++ 1 files changed, 15 insertions(+) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-08-20 14:29:29.522209580 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-08-20 14:43:03.432162133 +0300 @@ -48,6 +48,7 @@ #include linux/in.h #include net/dst.h +#include linux/netdevice.h MODULE_AUTHOR(Roland Dreier); MODULE_DESCRIPTION(IP-over-InfiniBand net driver); @@ -772,6 +773,18 @@ static void ipoib_timeout(struct net_dev /* XXX reset QP, etc. */ } +static int ipoib_slave_detach(struct net_device *dev) +{ + int ret = 0; + if (dev-flags IFF_SLAVE) { + dev-priv_flags |= IFF_SLAVE_DETACH; + rtnl_lock(); + ret = call_netdevice_notifiers(NETDEV_CHANGE, dev); + rtnl_unlock(); + } + return ret; +} + static int ipoib_hard_header(struct sk_buff *skb, struct net_device *dev, unsigned short type, @@ -921,6 +934,7 @@ void ipoib_dev_cleanup(struct net_device /* Delete any child interfaces first */ list_for_each_entry_safe(cpriv, tcpriv, priv-child_intfs, list) { + ipoib_slave_detach(cpriv-dev); unregister_netdev(cpriv-dev); ipoib_dev_cleanup(cpriv-dev); free_netdev(cpriv-dev); @@ -1208,6 +1222,7 @@ static void ipoib_remove_one(struct ib_d ib_unregister_event_handler(priv-event_handler); flush_scheduled_work(); + ipoib_slave_detach(priv-dev); unregister_netdev(priv-dev); ipoib_dev_cleanup(priv-dev); free_netdev(priv-dev); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 3/10] IB/ipoib: Bound the net device to the ipoib_neigh structue
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb-dst-neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb-dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n-dev is an ipoib device and hence netdev_priv(n-dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n-dev-flags has the IFF_MASTER bit set. Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/infiniband/ulp/ipoib/ipoib.h |4 +++- drivers/infiniband/ulp/ipoib/ipoib_main.c | 17 +++-- drivers/infiniband/ulp/ipoib/ipoib_multicast.c |3 ++- 3 files changed, 20 insertions(+), 4 deletions(-) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-08-15 10:09:00.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h2007-08-15 10:53:52.756348574 +0300 @@ -328,6 +328,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_headlist; }; @@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-08-15 10:53:28.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-08-15 10:53:52.757348397 +0300 @@ -511,7 +511,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb-dst-neighbour); + neigh = ipoib_neigh_alloc(skb-dst-neighbour, skb-dev); if (!neigh) { ++priv-stats.tx_dropped; dev_kfree_skb_any(skb); @@ -830,6 +830,17 @@ static void ipoib_neigh_cleanup(struct n unsigned long flags; struct ipoib_ah *ah = NULL; + if (n-dev-flags IFF_MASTER) { + /* n-dev is not an IPoIB device and we have + to take priv from elsewhere */ + neigh = *to_ipoib_neigh(n); + if (neigh) { + priv = netdev_priv(neigh-dev); + ipoib_dbg(priv, neigh_destructor for bonding device: %s\n, + n-dev-name); + } else + return; + } ipoib_dbg(priv, neigh_cleanup for %06x IPOIB_GID_FMT \n, IPOIB_QPN(n-ha), @@ -851,7 +862,8 @@ static void ipoib_neigh_cleanup(struct n ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) { struct ipoib_neigh *neigh; @@ -860,6 +872,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh-neighbour = neighbour; + neigh-dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(neigh-queue); ipoib_cm_set(neigh, NULL); Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-08-15 10:09:00.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-08-15 10:53:52.758348220 +0300 @@ -727,7 +727,8 @@ out: if (skb-dst skb-dst-neighbour !*to_ipoib_neigh(skb-dst-neighbour)) { - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb-dst-neighbour); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb-dst-neighbour, + skb-dev); if (neigh
[PATCH V4 4/10] IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces that slave with a new one. In between the times when the carrier of an IPoIB device goes down and ipoib_neigh is destroyed, it is possible that the bonding driver will send a packet on a new slave that uses an old ipoib_neigh. This patch detects and prevents this from happenning. Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/infiniband/ulp/ipoib/ipoib_main.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-08-15 10:53:52.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-08-15 10:54:03.959364640 +0300 @@ -686,9 +686,10 @@ static int ipoib_start_xmit(struct sk_bu goto out; } } else if (neigh-ah) { - if (unlikely(memcmp(neigh-dgid.raw, + if (unlikely((memcmp(neigh-dgid.raw, skb-dst-neighbour-ha + 4, - sizeof(union ib_gid { + sizeof(union ib_gid))) || +(neigh-dev != dev))) { spin_lock(priv-lock); /* * It's safe to call ipoib_put_ah() inside - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 5/10] net/bonding: Enable bonding to enslave non ARPHRD_ETHER
This patch changes some of the bond netdevice attributes and functions to be that of the active slave for the case of the enslaved device not being of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(), which are netdevice **type** dependent and hence might be not appropriate for devices of other types. It also enforces mutual exclusion on bonding slaves from dissimilar ether types, as was concluded over the v1 discussion. IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this IPoIB device is bounded to. The QP is a resource created by the IB HW and the GID is an identifier burned into the HCA (i have omitted here some details which are not important for the bonding RFC). Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 39 +++ 1 files changed, 39 insertions(+) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:08:59.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:13.424688411 +0300 @@ -1237,6 +1237,26 @@ static int bond_compute_features(struct return 0; } + +static void bond_setup_by_slave(struct net_device *bond_dev, + struct net_device *slave_dev) +{ + bond_dev-hard_header = slave_dev-hard_header; + bond_dev-rebuild_header= slave_dev-rebuild_header; + bond_dev-hard_header_cache = slave_dev-hard_header_cache; + bond_dev-header_cache_update = slave_dev-header_cache_update; + bond_dev-hard_header_parse = slave_dev-hard_header_parse; + + bond_dev-neigh_setup = slave_dev-neigh_setup; + + bond_dev-type = slave_dev-type; + bond_dev-hard_header_len = slave_dev-hard_header_len; + bond_dev-addr_len = slave_dev-addr_len; + + memcpy(bond_dev-broadcast, slave_dev-broadcast, + slave_dev-addr_len); +} + /* enslave device slave to bond device master */ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) { @@ -1311,6 +1331,25 @@ int bond_enslave(struct net_device *bond goto err_undo_flags; } + /* set bonding device ether type by slave - bonding netdevices are +* created with ether_setup, so when the slave type is not ARPHRD_ETHER +* there is a need to override some of the type dependent attribs/funcs. +* +* bond ether type mutual exclusion - don't allow slaves of dissimilar +* ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond +*/ + if (bond-slave_cnt == 0) { + if (slave_dev-type != ARPHRD_ETHER) + bond_setup_by_slave(bond_dev, slave_dev); + } else if (bond_dev-type != slave_dev-type) { + printk(KERN_ERR DRV_NAME : %s ether type (%d) is different + from other slaves (%d), can not enslave it.\n, + slave_dev-name, + slave_dev-type, bond_dev-type); + res = -EINVAL; + goto err_undo_flags; + } + if (slave_dev-set_mac_address == NULL) { printk(KERN_ERR DRV_NAME : %s: Error: The slave device you specified does - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 6/10] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()
This patch allows for enslaving netdevices which do not support the set_mac_address() function. In that case the bond mac address is the one of the active slave, where remote peers are notified on the mac address (neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs (this is already done by the bonding code). Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 87 +++- drivers/net/bonding/bonding.h |1 2 files changed, 60 insertions(+), 28 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:54:13.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:41.971632881 +0300 @@ -1095,6 +1095,14 @@ void bond_change_active_slave(struct bon if (new_active) { bond_set_slave_active_flags(new_active); } + + /* when bonding does not set the slave MAC address, the bond MAC +* address is the one of the active slave. +*/ + if (new_active !bond-do_set_mac_addr) + memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, + new_active-dev-addr_len); + bond_send_gratuitous_arp(bond); } } @@ -1351,13 +1359,22 @@ int bond_enslave(struct net_device *bond } if (slave_dev-set_mac_address == NULL) { - printk(KERN_ERR DRV_NAME - : %s: Error: The slave device you specified does - not support setting the MAC address. - Your kernel likely does not support slave - devices.\n, bond_dev-name); - res = -EOPNOTSUPP; - goto err_undo_flags; + if (bond-slave_cnt == 0) { + printk(KERN_WARNING DRV_NAME + : %s: Warning: The first slave device you + specified does not support setting the MAC + address. This bond MAC address would be that + of the active slave.\n, bond_dev-name); + bond-do_set_mac_addr = 0; + } else if (bond-do_set_mac_addr) { + printk(KERN_ERR DRV_NAME + : %s: Error: The slave device you specified + does not support setting the MAC addres,. + but this bond uses this practice. \n + , bond_dev-name); + res = -EOPNOTSUPP; + goto err_undo_flags; + } } new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL); @@ -1378,16 +1395,18 @@ int bond_enslave(struct net_device *bond */ memcpy(new_slave-perm_hwaddr, slave_dev-dev_addr, ETH_ALEN); - /* -* Set slave to master's mac address. The application already -* set the master's mac address to that of the first slave -*/ - memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); - addr.sa_family = slave_dev-type; - res = dev_set_mac_address(slave_dev, addr); - if (res) { - dprintk(Error %d calling set_mac_address\n, res); - goto err_free; + if (bond-do_set_mac_addr) { + /* +* Set slave to master's mac address. The application already +* set the master's mac address to that of the first slave +*/ + memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); + addr.sa_family = slave_dev-type; + res = dev_set_mac_address(slave_dev, addr); + if (res) { + dprintk(Error %d calling set_mac_address\n, res); + goto err_free; + } } res = netdev_set_master(slave_dev, bond_dev); @@ -1612,9 +1631,11 @@ err_close: dev_close(slave_dev); err_restore_mac: - memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); - addr.sa_family = slave_dev-type; - dev_set_mac_address(slave_dev, addr); + if (bond-do_set_mac_addr) { + memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); + addr.sa_family = slave_dev-type; + dev_set_mac_address(slave_dev, addr); + } err_free: kfree(new_slave); @@ -1792,10 +1813,12 @@ int bond_release(struct net_device *bond /* close slave before restoring its mac address */ dev_close(slave_dev); - /* restore original (permanent) mac address */ - memcpy(addr.sa_data, slave-perm_hwaddr, ETH_ALEN); - addr.sa_family
[PATCH V4 7/10] net/bonding: Enable IP multicast for bonding IPoIB devices
Allow to enslave devices when the bonding device is not up. Over the discussion held at the previous post this seemed to be the most clean way to go, where it is not expected to cause instabilities. Normally, the bonding driver is UP before any enslavement takes place. Once a netdevice is UP, the network stack acts to have it join some multicast groups (eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called where for multicast joins taking place after the enslavement another ip_xxx_mc_map() is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND) Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c |5 +++-- drivers/net/bonding/bond_sysfs.c |6 ++ 2 files changed, 5 insertions(+), 6 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:54:41.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:55:48.431862446 +0300 @@ -1285,8 +1285,9 @@ int bond_enslave(struct net_device *bond /* bond must be initialized by bond_open() before enslaving */ if (!(bond_dev-flags IFF_UP)) { - dprintk(Error, master_dev is not up\n); - return -EPERM; + printk(KERN_WARNING DRV_NAME +%s: master_dev is not up in bond_enslave\n, + bond_dev-name); } /* already enslaved */ Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-08-15 10:08:58.0 +0300 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-08-15 10:55:48.432862269 +0300 @@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru /* Quick sanity check -- is the bond interface up? */ if (!(bond-dev-flags IFF_UP)) { - printk(KERN_ERR DRV_NAME - : %s: Unable to update slaves because interface is down.\n, + printk(KERN_WARNING DRV_NAME + : %s: doing slave updates when interface is down.\n, bond-dev-name); - ret = -EPERM; - goto out; } /* Note: We can't hold bond-lock here, as bond_create grabs it. */ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 8/10] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device
bonding sometimes uses Ethernet constants (such as MTU and address length) which are not good when it enslaves non Ethernet devices (such as InfiniBand). Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c |3 ++- drivers/net/bonding/bond_sysfs.c | 19 +-- drivers/net/bonding/bonding.h|1 + 3 files changed, 16 insertions(+), 7 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:55:48.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-20 14:29:11.911298577 +0300 @@ -1224,7 +1224,8 @@ static int bond_compute_features(struct struct slave *slave; struct net_device *bond_dev = bond-dev; unsigned long features = bond_dev-features; - unsigned short max_hard_header_len = ETH_HLEN; + unsigned short max_hard_header_len = max((u16)ETH_HLEN, + bond_dev-hard_header_len); int i; features = ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES); Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-08-15 10:55:48.0 +0300 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-08-15 12:14:41.152469089 +0300 @@ -164,9 +164,7 @@ static ssize_t bonding_store_bonds(struc printk(KERN_INFO DRV_NAME : %s is being deleted...\n, bond-dev-name); - bond_deinit(bond-dev); - bond_destroy_sysfs_entry(bond); - unregister_netdevice(bond-dev); + bond_destroy(bond); rtnl_unlock(); goto out; } @@ -260,6 +258,7 @@ static ssize_t bonding_store_slaves(stru char command[IFNAMSIZ + 1] = { 0, }; char *ifname; int i, res, found, ret = count; + u32 original_mtu; struct slave *slave; struct net_device *dev = NULL; struct bonding *bond = to_bond(d); @@ -325,6 +324,7 @@ static ssize_t bonding_store_slaves(stru } /* Set the slave's MTU to match the bond */ + original_mtu = dev-mtu; if (dev-mtu != bond-dev-mtu) { if (dev-change_mtu) { res = dev-change_mtu(dev, @@ -339,6 +339,9 @@ static ssize_t bonding_store_slaves(stru } rtnl_lock(); res = bond_enslave(bond-dev, dev); + bond_for_each_slave(bond, slave, i) + if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) + slave-original_mtu = original_mtu; rtnl_unlock(); if (res) { ret = res; @@ -351,13 +354,17 @@ static ssize_t bonding_store_slaves(stru bond_for_each_slave(bond, slave, i) if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) { dev = slave-dev; + original_mtu = slave-original_mtu; break; } if (dev) { printk(KERN_INFO DRV_NAME : %s: Removing slave %s\n, bond-dev-name, dev-name); rtnl_lock(); - res = bond_release(bond-dev, dev); + if (bond-setup_by_slave) + res = bond_release_and_destroy(bond-dev, dev); + else + res = bond_release(bond-dev, dev); rtnl_unlock(); if (res) { ret = res; @@ -365,9 +372,9 @@ static ssize_t bonding_store_slaves(stru } /* set the slave MTU to the default */ if (dev-change_mtu) { - dev-change_mtu(dev, 1500); + dev-change_mtu(dev, original_mtu); } else { - dev-mtu = 1500; + dev-mtu = original_mtu; } } else { Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-08-15 10:55:34.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-08-20 14:29:11.912298402 +0300 @@ -156,6 +156,7 @@ struct slave { s8 link;/* one
PATCH V4 9/10] net/bonding: Delay sending of gratuitous ARP to avoid failure
Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit in dev-state field is on. This improves the chances for the arp packet to be transmitted. Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 24 +--- drivers/net/bonding/bonding.h |1 + 2 files changed, 22 insertions(+), 3 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 10:56:33.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 11:04:37.221123652 +0300 @@ -1102,8 +1102,14 @@ void bond_change_active_slave(struct bon if (new_active !bond-do_set_mac_addr) memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, new_active-dev-addr_len); - - bond_send_gratuitous_arp(bond); + if (bond-curr_active_slave + test_bit(__LINK_STATE_LINKWATCH_PENDING, + bond-curr_active_slave-dev-state)) { + dprintk(delaying gratuitous arp on %s\n, + bond-curr_active_slave-dev-name); + bond-send_grat_arp = 1; + } else + bond_send_gratuitous_arp(bond); } } @@ -2083,6 +2089,17 @@ void bond_mii_monitor(struct net_device * program could monitor the link itself if needed. */ + if (bond-send_grat_arp) { + if (bond-curr_active_slave test_bit(__LINK_STATE_LINKWATCH_PENDING, + bond-curr_active_slave-dev-state)) + dprintk(Needs to send gratuitous arp but not yet\n); + else { + dprintk(sending delayed gratuitous arp on on %s\n, + bond-curr_active_slave-dev-name); + bond_send_gratuitous_arp(bond); + bond-send_grat_arp = 0; + } + } read_lock(bond-curr_slave_lock); oldcurrent = bond-curr_active_slave; read_unlock(bond-curr_slave_lock); @@ -2484,7 +2501,7 @@ static void bond_send_gratuitous_arp(str if (bond-master_ip) { bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip, - bond-master_ip, 0); + bond-master_ip, 0); } list_for_each_entry(vlan, bond-vlan_list, vlan_list) { @@ -4293,6 +4310,7 @@ static int bond_init(struct net_device * bond-current_arp_slave = NULL; bond-primary_slave = NULL; bond-dev = bond_dev; + bond-send_grat_arp = 0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-08-15 10:56:33.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-08-15 11:05:41.516451497 +0300 @@ -187,6 +187,7 @@ struct bonding { struct timer_list arp_timer; s8 kill_timers; s8 do_set_mac_addr; + s8 send_grat_arp; struct net_device_stats stats; #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_entry; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 10/10] net/bonding: Destroy bonding master when last slave is gone
When bonding enslaves non Ethernet devices it takes pointers to functions in the module that owns the slaves. In this case it becomes unsafe to keep the bonding master registered after last slave was unenslaved because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero ensures that these functions be used anymore. Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 45 +++- drivers/net/bonding/bonding.h |3 ++ 2 files changed, 47 insertions(+), 1 deletion(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-20 14:43:17.123702132 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-20 14:43:17.850571535 +0300 @@ -1256,6 +1256,7 @@ static int bond_compute_features(struct static void bond_setup_by_slave(struct net_device *bond_dev, struct net_device *slave_dev) { + struct bonding *bond = bond_dev-priv; bond_dev-hard_header = slave_dev-hard_header; bond_dev-rebuild_header= slave_dev-rebuild_header; bond_dev-hard_header_cache = slave_dev-hard_header_cache; @@ -1270,6 +1271,7 @@ static void bond_setup_by_slave(struct n memcpy(bond_dev-broadcast, slave_dev-broadcast, slave_dev-addr_len); + bond-setup_by_slave = 1; } /* enslave device slave to bond device master */ @@ -1838,6 +1840,35 @@ int bond_release(struct net_device *bond } /* +* Destroy a bonding device. +* Must be under rtnl_lock when this function is called. +*/ +void bond_destroy(struct bonding *bond) +{ + bond_deinit(bond-dev); + bond_destroy_sysfs_entry(bond); + unregister_netdevice(bond-dev); +} + +/* +* First release a slave and than destroy the bond if no more slaves iare left. +* Must be under rtnl_lock when this function is called. +*/ +int bond_release_and_destroy(struct net_device *bond_dev, struct net_device *slave_dev) +{ + struct bonding *bond = bond_dev-priv; + int ret; + + ret = bond_release(bond_dev, slave_dev); + if ((ret == 0) (bond-slave_cnt == 0)) { + printk(KERN_INFO DRV_NAME %s: destroying bond for.\n, + bond_dev-name); + bond_destroy(bond); + } + return ret; +} + +/* * This function releases all slaves. */ static int bond_release_all(struct net_device *bond_dev) @@ -3322,7 +3353,11 @@ static int bond_slave_netdev_event(unsig switch (event) { case NETDEV_UNREGISTER: if (bond_dev) { - bond_release(bond_dev, slave_dev); + dprintk(slave %s unregisters\n, slave_dev-name); + if (bond-setup_by_slave) + bond_release_and_destroy(bond_dev, slave_dev); + else + bond_release(bond_dev, slave_dev); } break; case NETDEV_CHANGE: @@ -3331,6 +3366,13 @@ static int bond_slave_netdev_event(unsig * sets up a hierarchical bond, then rmmod's * one of the slave bonding devices? */ + if (slave_dev-priv_flags IFF_SLAVE_DETACH) { + dprintk(slave %s detaching\n, slave_dev-name); + if (bond-setup_by_slave) + bond_release_and_destroy(bond_dev, slave_dev); + else + bond_release(bond_dev, slave_dev); + } break; case NETDEV_DOWN: /* @@ -4311,6 +4353,7 @@ static int bond_init(struct net_device * bond-primary_slave = NULL; bond-dev = bond_dev; bond-send_grat_arp = 0; + bond-setup_by_slave = 0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-08-20 14:43:17.123702132 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-08-20 14:47:52.845180870 +0300 @@ -188,6 +188,7 @@ struct bonding { s8 kill_timers; s8 do_set_mac_addr; s8 send_grat_arp; + s8 setup_by_slave; struct net_device_stats stats; #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_entry; @@ -295,6 +296,8 @@ static inline void bond_unset_master_alb struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr); int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct net_device *slave_dev); int bond_create(char *name, struct bond_params *params, struct bonding **newbond); +void bond_destroy
Re: [ofa-general] Re: Re: Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
It's always wrong to copy symbols from another module without referencing it. Michael, It seems like the preferred approach is to prevent ib_ipoib from being unloaded while bonding is on top it, right? It seems like it would handle all safety issues (not just neigh cleanup). - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
Roland Dreier wrote: 1. When bonding enslaves an IPoIB device the bonding neighbor holds a reference to a cleanup function in the IPoIB drives. This makes it unsafe to unload the IPoIB module if there are bonding neighbors in the air. So, to avoid this race one must unload bonding before unloading IPoIB. I think we really want to resolve this somehow. Getting an oops by doing modprobe -r ipoib isn't that friendly. You are right and we want to resolve that. One way is to clean the neigh destructor function from all IPoIB neighs. The other way is to prevent ipoib unload if device is a slave or is referenced from somewhere else. I guess I would like an advice here. Also, what happened to the problem of having an address handle belonging to the wrong device on bond failover? Did you figure out a way to fix that one? This is what patch 2 handles. - R. ___ general mailing list [EMAIL PROTECTED] http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 1/7] IB/ipoib: Bound the net device to the ipoib_neigh structue
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb-dst-neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb-dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n-dev is an ipoib device and hence netdev_priv(n-dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n-dev-flags has the IFF_MASTER bit set. Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/infiniband/ulp/ipoib/ipoib.h |4 +++- drivers/infiniband/ulp/ipoib/ipoib_main.c | 17 +++-- drivers/infiniband/ulp/ipoib/ipoib_multicast.c |2 +- 3 files changed, 19 insertions(+), 4 deletions(-) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-07-25 14:56:13.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h2007-07-25 14:57:48.095724495 +0300 @@ -328,6 +328,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_headlist; }; @@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 14:56:13.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 15:03:11.291291271 +0300 @@ -510,7 +510,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb-dst-neighbour); + neigh = ipoib_neigh_alloc(skb-dst-neighbour, skb-dev); if (!neigh) { ++priv-stats.tx_dropped; dev_kfree_skb_any(skb); @@ -817,6 +817,16 @@ static void ipoib_neigh_cleanup(struct n unsigned long flags; struct ipoib_ah *ah = NULL; + if (n-dev-flags IFF_MASTER) { + /* n-dev is not an IPoIB device and we have to take priv from elsewhere */ + neigh = *to_ipoib_neigh(n); + if (neigh){ + priv = netdev_priv(neigh-dev); + ipoib_dbg(priv, neigh_destructor for bonding device: %s\n, + n-dev-name); + } else + return; + } ipoib_dbg(priv, neigh_cleanup for %06x IPOIB_GID_FMT \n, IPOIB_QPN(n-ha), @@ -838,7 +848,9 @@ static void ipoib_neigh_cleanup(struct n ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) + { struct ipoib_neigh *neigh; @@ -847,6 +859,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh-neighbour = neighbour; + neigh-dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(neigh-queue); ipoib_cm_set(neigh, NULL); Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-07-25 14:56:13.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-07-25 14:57:48.097724142 +0300 @@ -727,7 +727,7 @@ out: if (skb-dst skb-dst-neighbour !*to_ipoib_neigh(skb-dst-neighbour)) { - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb-dst-neighbour); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb-dst-neighbour, skb-dev); if (neigh) { kref_get(mcast-ah-ref); - To unsubscribe from this list: send the line
[PATCH V3 2/7] IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces that slave with a new one. In between the times when the carrier of an IPoIB device goes down and ipoib_neigh is destroyed, it is possible that the bonding driver will send a packet on a new slave that uses an old ipoib_neigh. This patch detects and prevents this from happenning. Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/infiniband/ulp/ipoib/ipoib_main.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 14:57:48.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 15:02:55.525131034 +0300 @@ -685,9 +685,10 @@ static int ipoib_start_xmit(struct sk_bu goto out; } } else if (neigh-ah) { - if (unlikely(memcmp(neigh-dgid.raw, + if (unlikely((memcmp(neigh-dgid.raw, skb-dst-neighbour-ha + 4, - sizeof(union ib_gid { + sizeof(union ib_gid))) || +(neigh-dev != dev))) { spin_lock(priv-lock); /* * It's safe to call ipoib_put_ah() inside - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 3/7] net/bonding: Enable bonding to enslave non ARPHRD_ETHER
This patch changes some of the bond netdevice attributes and functions to be that of the active slave for the case of the enslaved device not being of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(), which are netdevice **type** dependent and hence might be not appropriate for devices of other types. It also enforces mutual exclusion on bonding slaves from dissimilar ether types, as was concluded over the v1 discussion. IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this IPoIB device is bounded to. The QP is a resource created by the IB HW and the GID is an identifier burned into the HCA (i have omitted here some details which are not important for the bonding RFC). Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 38 ++ 1 files changed, 38 insertions(+) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 15:02:10.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-29 16:24:30.913343981 +0300 @@ -1277,6 +1277,26 @@ static int bond_compute_features(struct return 0; } + +static void bond_setup_by_slave(struct net_device *bond_dev, + struct net_device *slave_dev) +{ + bond_dev-hard_header = slave_dev-hard_header; + bond_dev-rebuild_header= slave_dev-rebuild_header; + bond_dev-hard_header_cache = slave_dev-hard_header_cache; + bond_dev-header_cache_update = slave_dev-header_cache_update; + bond_dev-hard_header_parse = slave_dev-hard_header_parse; + + bond_dev-neigh_setup = slave_dev-neigh_setup; + + bond_dev-type = slave_dev-type; + bond_dev-hard_header_len = slave_dev-hard_header_len; + bond_dev-addr_len = slave_dev-addr_len; + + memcpy(bond_dev-broadcast, slave_dev-broadcast, + slave_dev-addr_len); +} + /* enslave device slave to bond device master */ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) { @@ -1351,6 +1371,24 @@ int bond_enslave(struct net_device *bond goto err_undo_flags; } + /* set bonding device ether type by slave - bonding netdevices are +* created with ether_setup, so when the slave type is not ARPHRD_ETHER +* there is a need to override some of the type dependent attribs/funcs. +* +* bond ether type mutual exclusion - don't allow slaves of dissimilar +* ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond +*/ + if (bond-slave_cnt == 0) { + if (slave_dev-type != ARPHRD_ETHER) + bond_setup_by_slave(bond_dev, slave_dev); + } else if (bond_dev-type != slave_dev-type) { + printk(KERN_ERR DRV_NAME : %s ether type (%d) is different from + other slaves (%d), can not enslave it.\n, slave_dev-name, + slave_dev-type, bond_dev-type); + res = -EINVAL; + goto err_undo_flags; + } + if (slave_dev-set_mac_address == NULL) { printk(KERN_ERR DRV_NAME : %s: Error: The slave device you specified does - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 4/7] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()
This patch allows for enslaving netdevices which do not support the set_mac_address() function. In that case the bond mac address is the one of the active slave, where remote peers are notified on the mac address (neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs (this is already done by the bonding code). Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 88 +++- drivers/net/bonding/bonding.h |1 2 files changed, 61 insertions(+), 28 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-29 16:24:30.913343981 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-29 16:36:53.234602471 +0300 @@ -1127,6 +1127,14 @@ void bond_change_active_slave(struct bon if (new_active) { bond_set_slave_active_flags(new_active); } + + /* when bonding does not set the slave MAC address, the bond MAC +* address is the one of the active slave. +*/ + if (new_active !bond-do_set_mac_addr) + memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, + new_active-dev-addr_len); + bond_send_gratuitous_arp(bond); } } @@ -1390,13 +1398,22 @@ int bond_enslave(struct net_device *bond } if (slave_dev-set_mac_address == NULL) { - printk(KERN_ERR DRV_NAME - : %s: Error: The slave device you specified does - not support setting the MAC address. - Your kernel likely does not support slave - devices.\n, bond_dev-name); - res = -EOPNOTSUPP; - goto err_undo_flags; + if (bond-slave_cnt == 0) { + printk(KERN_WARNING DRV_NAME + : %s: Warning: The first slave device you + specified does not support setting the MAC + address. This bond MAC address would be that + of the active slave.\n, bond_dev-name); + bond-do_set_mac_addr = 0; + } else if (bond-do_set_mac_addr) { + printk(KERN_ERR DRV_NAME + : %s: Error: The slave device you specified + does not support setting the MAC addres,. + but this bond uses this practice. \n + , bond_dev-name); + res = -EOPNOTSUPP; + goto err_undo_flags; + } } new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL); @@ -1417,16 +1434,18 @@ int bond_enslave(struct net_device *bond */ memcpy(new_slave-perm_hwaddr, slave_dev-dev_addr, ETH_ALEN); - /* -* Set slave to master's mac address. The application already -* set the master's mac address to that of the first slave -*/ - memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); - addr.sa_family = slave_dev-type; - res = dev_set_mac_address(slave_dev, addr); - if (res) { - dprintk(Error %d calling set_mac_address\n, res); - goto err_free; + if (bond-do_set_mac_addr) { + /* +* Set slave to master's mac address. The application already +* set the master's mac address to that of the first slave +*/ + memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); + addr.sa_family = slave_dev-type; + res = dev_set_mac_address(slave_dev, addr); + if (res) { + dprintk(Error %d calling set_mac_address\n, res); + goto err_free; + } } res = netdev_set_master(slave_dev, bond_dev); @@ -1651,9 +1670,11 @@ err_close: dev_close(slave_dev); err_restore_mac: - memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); - addr.sa_family = slave_dev-type; - dev_set_mac_address(slave_dev, addr); + if (bond-do_set_mac_addr) { + memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); + addr.sa_family = slave_dev-type; + dev_set_mac_address(slave_dev, addr); + } err_free: kfree(new_slave); @@ -1831,10 +1852,12 @@ int bond_release(struct net_device *bond /* close slave before restoring its mac address */ dev_close(slave_dev); - /* restore original (permanent) mac address */ - memcpy(addr.sa_data, slave-perm_hwaddr, ETH_ALEN); - addr.sa_family
[PATCH V3 5/7] net/bonding: Enable IP multicast for bonding IPoIB devices
Allow to enslave devices when the bonding device is not up. Over the discussion held at the previous post this seemed to be the most clean way to go, where it is not expected to cause instabilities. Normally, the bonding driver is UP before any enslavement takes place. Once a netdevice is UP, the network stack acts to have it join some multicast groups (eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called where for multicast joins taking place after the enslavement another ip_xxx_mc_map() is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND) Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c |4 ++-- drivers/net/bonding/bond_sysfs.c |6 ++ 2 files changed, 4 insertions(+), 6 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 15:04:50.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-25 15:06:17.175820632 +0300 @@ -1325,8 +1325,8 @@ int bond_enslave(struct net_device *bond /* bond must be initialized by bond_open() before enslaving */ if (!(bond_dev-flags IFF_UP)) { - dprintk(Error, master_dev is not up\n); - return -EPERM; + printk(KERN_WARNING DRV_NAME +%s: master_dev is not up in bond_enslave\n, bond_dev-name); } /* already enslaved */ Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-07-25 14:18:12.0 +0300 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-07-25 15:06:17.176820452 +0300 @@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru /* Quick sanity check -- is the bond interface up? */ if (!(bond-dev-flags IFF_UP)) { - printk(KERN_ERR DRV_NAME - : %s: Unable to update slaves because interface is down.\n, + printk(KERN_WARNING DRV_NAME + : %s: doing slave updates when interface is down.\n, bond-dev-name); - ret = -EPERM; - goto out; } /* Note: We can't hold bond-lock here, as bond_create grabs it. */ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
This patch series is the third version (see below link to V2) of the suggested changes to the bonding driver so it would be able to support non ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode. The motivation is to enable the bonding driver on its HA mode to work with the IP over Infiniband (IPoIB) driver. With these patches I was able to enslave IPoIB netdevices and run TCP, UDP, IP (UDP) Multicast and ICMP traffic with fail-over and fail-back working fine. The working environment was the net-2.6 git. More over, as IPoIB is also the IB ARP provider for the RDMA CM driver which is used by native IB ULPs whose addressing scheme is based on IP (e.g. iSER, SDP, Lustre, NFSoRDMA, RDS), bonding support for IPoIB devices **enables** HA for these ULPs. This holds as when the ULP is informed by the IB HW on the failure of the current IB connection, it just need to reconnect, where the bonding device will now issue the IB ARP over the active IPoIB slave. This series also includes patches to the IPoIB driver that fix some fix some neighboring related issues. There are still 2 open issues here: 1. When bonding enslaves an IPoIB device the bonding neighbor holds a reference to a cleanup function in the IPoIB drives. This makes it unsafe to unload the IPoIB module if there are bonding neighbors in the air. So, to avoid this race one must unload bonding before unloading IPoIB. 2. Patch No. 7 is a workaround to a problem where gratuitous were not sent quite often. I didn't find something better that fixes this and I would appreciate advices and comments regarding it. However, this doesn't seem to me as an issue related exclusively to IPoIB. Links to earlier discussion: 1. A discussion in netdev about bonding support for IPoIB. http://lists.openwall.net/netdev/2006/11/30/46 2. A discussion in openfabrics regarding changes in the IPoIB that enable using it as a slave for bonding. http://lists.openfabrics.org/pipermail/general/2007-March/034033.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 6/7] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device
bonding sometimes uses Ethernet constants (such as MTU and address length) which are not good when it enslaves non Ethernet devices (such as InfiniBand). Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c |2 +- drivers/net/bonding/bond_sysfs.c | 10 -- drivers/net/bonding/bonding.h|1 + 3 files changed, 10 insertions(+), 3 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 15:06:17.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-25 15:33:25.012883360 +0300 @@ -1255,7 +1255,7 @@ static int bond_compute_features(struct unsigned long features = BOND_INTERSECT_FEATURES; struct slave *slave; struct net_device *bond_dev = bond-dev; - unsigned short max_hard_header_len = ETH_HLEN; + u16 max_hard_header_len = max((u16)ETH_HLEN, bond_dev-hard_header_len); int i; bond_for_each_slave(bond, slave, i) { Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-07-25 15:06:17.0 +0300 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-07-25 15:20:10.224527636 +0300 @@ -260,6 +260,7 @@ static ssize_t bonding_store_slaves(stru char command[IFNAMSIZ + 1] = { 0, }; char *ifname; int i, res, found, ret = count; + u32 original_mtu; struct slave *slave; struct net_device *dev = NULL; struct bonding *bond = to_bond(d); @@ -325,6 +326,7 @@ static ssize_t bonding_store_slaves(stru } /* Set the slave's MTU to match the bond */ + original_mtu = dev-mtu; if (dev-mtu != bond-dev-mtu) { if (dev-change_mtu) { res = dev-change_mtu(dev, @@ -339,6 +341,9 @@ static ssize_t bonding_store_slaves(stru } rtnl_lock(); res = bond_enslave(bond-dev, dev); + bond_for_each_slave(bond, slave, i) + if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) + slave-original_mtu=original_mtu; rtnl_unlock(); if (res) { ret = res; @@ -351,6 +356,7 @@ static ssize_t bonding_store_slaves(stru bond_for_each_slave(bond, slave, i) if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) { dev = slave-dev; + original_mtu = slave-original_mtu; break; } if (dev) { @@ -365,9 +371,9 @@ static ssize_t bonding_store_slaves(stru } /* set the slave MTU to the default */ if (dev-change_mtu) { - dev-change_mtu(dev, 1500); + dev-change_mtu(dev, original_mtu); } else { - dev-mtu = 1500; + dev-mtu = original_mtu; } } else { Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-25 15:03:32.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-07-25 15:20:10.223527810 +0300 @@ -156,6 +156,7 @@ struct slave { s8 link;/* one of BOND_LINK_ */ s8 state; /* one of BOND_STATE_ */ u32original_flags; + u32original_mtu; u32link_failure_count; u16speed; u8 duplex; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure
Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit in dev-state field is on. This improves the chances for the arp packet to be transmitted. Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 25 + drivers/net/bonding/bonding.h |1 + 2 files changed, 22 insertions(+), 4 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 15:33:25.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-26 18:42:59.296296622 +0300 @@ -1134,8 +1134,13 @@ void bond_change_active_slave(struct bon if (new_active !bond-do_set_mac_addr) memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, new_active-dev-addr_len); - - bond_send_gratuitous_arp(bond); + if (bond-curr_active_slave + test_bit(__LINK_STATE_LINKWATCH_PENDING, bond-curr_active_slave-dev-state)){ + dprintk(delaying gratuitous arp on %s\n,bond-curr_active_slave-dev-name); + bond-send_grat_arp=1; + }else{ + bond_send_gratuitous_arp(bond); + } } } @@ -2120,6 +2125,15 @@ void bond_mii_monitor(struct net_device * program could monitor the link itself if needed. */ + if (bond-send_grat_arp) { + if (bond-curr_active_slave test_bit(__LINK_STATE_LINKWATCH_PENDING, bond-curr_active_slave-dev-state)) + dprintk(Needs to send gratuitous arp but not yet\n,__FUNCTION__); + else { + dprintk(sending delayed gratuitous arp on ond-curr_active_slave-dev-name\n); + bond_send_gratuitous_arp(bond); + bond-send_grat_arp=0; + } + } read_lock(bond-curr_slave_lock); oldcurrent = bond-curr_active_slave; read_unlock(bond-curr_slave_lock); @@ -2513,6 +2527,7 @@ static void bond_send_gratuitous_arp(str struct slave *slave = bond-curr_active_slave; struct vlan_entry *vlan; struct net_device *vlan_dev; + int i; dprintk(bond_send_grat_arp: bond %s slave %s\n, bond-dev-name, slave ? slave-dev-name : NULL); @@ -2520,8 +2535,9 @@ static void bond_send_gratuitous_arp(str return; if (bond-master_ip) { - bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip, - bond-master_ip, 0); + for (i=0;i3;i++) + bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip, + bond-master_ip, 0); } list_for_each_entry(vlan, bond-vlan_list, vlan_list) { @@ -4331,6 +4347,7 @@ static int bond_init(struct net_device * bond-current_arp_slave = NULL; bond-primary_slave = NULL; bond-dev = bond_dev; + bond-send_grat_arp=0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-25 15:20:10.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-07-26 18:42:43.652087660 +0300 @@ -203,6 +203,7 @@ struct bonding { struct vlan_group *vlgrp; struct packet_type arp_mon_pt; s8 do_set_mac_addr; + int send_grat_arp; }; /** - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html