Re: [RFC][BNX2X] .h files rewrite
On Fri, 2007-11-02 at 16:35 -0700, Max Asbock wrote: I built the newest bnx2x code against the net-2.6 kernel and ran a number of stress tests with netperf and pktgen. I did not encounter any errors. Max Thanks, Eliezer - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23: TG3+VLAN: IPv6 router advertisments missed by kernel
The issue shows up reliably when starting the system, though some (re)configuration operations on the network interface make the issue disappear. One way to get the kernel to see the advertisements is to restart the interface with its vlans or (as below) keeping the interface in promiscuous mode. Regards, Bruno On Thursday 01 November 2007 21:45:42 you wrote: I'm seeing unexpected behavior on my laptop since I updated the kernel to 2.6.23.1 from 2.6.22.1. My setup: Cisco Router --- [2 vlans] --- Laptop On the link two VLANs are active, the native vlan is not used. The laptop NIC is: Tigon3 [partno(BCM95751m) rev 4201 PHY(5750)] (PCI Express) 10/100/1000Base-T Ethernet On the laptop I have eth0.500 and eth0.658 as active interfaces (eth0 is just up - no address manually assigned), each with an IPv4 address assigned. IPv6 is only enabled on the router for one of the two vlans (500). When booting with 2.6.23.1 the router advertisements coming from the router (vlan 500) seem to get ignored by the kernel (they are detected by 2.6.22) and only enabling promiscuous mode on eth0 makes the kernel detect the router advertisements. (I'm doing tcpdump icmp6 on the vlan interface.) This looks like it could be caused by changes in regard to handling vlans with the Tigon3 NIC. A different machine (other NIC and no vlans) sees the router advertisements correctly with 2.6.23.1. 
(So I don't expect the cause to be on the IPv6 side) Bruno

Probably relevant .config extract for 2.6.23.1:
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_NET_IPGRE=m
CONFIG_SYN_COOKIES=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG=cubic
CONFIG_IPV6=y
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
CONFIG_NETFILTER=y
CONFIG_NETDEVICES=y
CONFIG_NETDEVICES_MULTIQUEUE=y
# CONFIG_MACVLAN is not set
CONFIG_TUN=m
CONFIG_PHYLIB=m
CONFIG_BROADCOM_PHY=m
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_NET_PCI=y
CONFIG_B44=m
CONFIG_NETDEV_1000=y
CONFIG_TIGON3=m

Same extract for 2.6.22.1:
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_NET_IPGRE=m
CONFIG_SYN_COOKIES=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG=cubic
CONFIG_IPV6=y
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
CONFIG_NETFILTER=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_TUN=m
CONFIG_PHYLIB=m
CONFIG_BROADCOM_PHY=m
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_NET_PCI=y
CONFIG_B44=m
CONFIG_NETDEV_1000=y
CONFIG_TIGON3=m
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
David Miller a écrit : From: Andi Kleen [EMAIL PROTECTED] Date: Sun, 4 Nov 2007 00:18:14 +0100 On Thursday 01 November 2007 11:16:20 Eric Dumazet wrote: Some quick comments:

+#if defined(CONFIG_SMP) || defined(CONFIG_PROVE_LOCKING)
+/*
+ * Instead of using one rwlock for each inet_ehash_bucket, we use a table of locks
+ * The size of this table is a power of two and depends on the number of CPUS.
+ */

This shouldn't be hard coded based on NR_CPUS, but be done at runtime based on num_possible_cpus(). This is better for kernels with a large NR_CPUS, but which typically run on much smaller systems (like distribution kernels). I think this is a good idea. Eric, could you make this change? Yes of course, since using a non-constant value for masking is cheap. But I suspect distribution kernels enable CONFIG_HOTPLUG_CPU so num_possible_cpus() will be NR_CPUS. Also the EHASH_LOCK_SZ == 0 special case is a little strange. Why did you add that? He explained this in another reply: because ifdefs are ugly. This will vanish if done at runtime anyway. And as an unrelated note, have you tried converting the rwlocks into normal spinlocks? Spinlocks should be somewhat cheaper because they have less cache protocol overhead, and with the huge thash tables in Linux the chain walks should be short anyway, so not doing this in parallel is probably not a big issue. At some point I also had a crazy idea of using a special locking scheme that special cases the common case that a hash chain has only one member and doesn't take a lock for that at all. I agree. There was movement at one point to get rid of all rwlocks in the kernel; I personally think they are pointless. Any use that makes sense is a case where the code should be rewritten to decrease the lock hold time or convert to RCU. I agree too, rwlocks are more expensive when contention is low, so let's do this rwlock-to-spinlock change in a next step (separate patch), because it means also changing lhash_lock. 
Thanks to Jarek, I added locks cleanup in dccp_fini()

[PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

As done two years ago on IP route cache table (commit 22c047ccbc68fa8f3fa57f0e8f906479a062c426), we can avoid using one lock per hash bucket for the huge TCP/DCCP hash tables. On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for litle performance differences. (We hit a different cache line for the rwlock, but then the bucket cache line has a better sharing factor among cpus, since we dirty it less often.) For netstat or ss commands that want a full scan of the hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to provide correct SMP concurrency between different buckets, without using too much memory. Sizing of this table depends on num_possible_cpus() and various CONFIG settings.

This patch provides some locking abstraction that may ease a future work using a different model for the TCP/DCCP table.

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
Acked-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]

 include/net/inet_hashtables.h |   71 +---
 net/dccp/proto.c              |    9 +++-
 net/ipv4/inet_diag.c          |    9 ++--
 net/ipv4/inet_hashtables.c    |    7 +--
 net/ipv4/inet_timewait_sock.c |   13 +++--
 net/ipv4/tcp.c                |    4 -
 net/ipv4/tcp_ipv4.c           |   11 ++--
 net/ipv6/inet6_hashtables.c   |   19
 8 files changed, 106 insertions(+), 37 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 4427dcd..8461cda 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -37,7 +37,6 @@
  * I'll experiment with dynamic table growth later.
  */
 struct inet_ehash_bucket {
-	rwlock_t		lock;
 	struct hlist_head	chain;
 	struct hlist_head	twchain;
 };
@@ -100,6 +99,9 @@ struct inet_hashinfo {
	 * TIME_WAIT sockets use a separate chain (twchain).
	 */
	struct inet_ehash_bucket	*ehash;
+	rwlock_t			*ehash_locks;
+	unsigned int			ehash_size;
+	unsigned int			ehash_locks_mask;

	/* Ok, let's try this, I give up, we do need a local binding
	 * TCP hash as well as the others for fast bind/connect.
@@ -107,7 +109,7 @@ struct inet_hashinfo {
	struct inet_bind_hashbucket	*bhash;

	unsigned int			bhash_size;
-	unsigned int			ehash_size;
+	/* Note : 4 bytes padding on 64 bit arches */

	/* All sockets in TCP_LISTEN state will be in here. This is the only
	 * table where wildcard'd TCP sockets can exist. Hash function here
@@ -134,6 +136,62 @@ static inline struct inet_ehash_bucket *inet_ehash_bucket(
	return &hashinfo->ehash[hash & (hashinfo->ehash_size - 1)];
 }

+static inline rwlock_t
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
But I suspect distributions kernels enable CONFIG_HOTPLUG_CPU so num_possible_cpus() will be NR_CPUS. Nope, on x86 num_possible_cpus() is derived from BIOS tables these days. -Andi
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
Andi Kleen a écrit : But I suspect distributions kernels enable CONFIG_HOTPLUG_CPU so num_possible_cpus() will be NR_CPUS. Nope, on x86 num_possible_cpus() is derived from BIOS tables these days. Good to know, thank you Andi for this clarification.
problems with ib-bonding of 2.6.24-rc1
Hi, I've been doing some tests for bonding on 2.6.24-rc1 and noticed some problems. My first goal was to see how bonding works with IPoIB slaves but I also tried it with Ethernet. Basically, what I see is that after a while commands like ifconfig or ip get stuck. I only use sysfs to configure bonding (which also gets stuck after a while). After stripping the list of commits below from the code I see no problems. Does anybody else have the same problem? thanks MoniS

commit d0e81b7e2246a41d068ecaf15aac9de570816d63
Author: Jay Vosburgh [EMAIL PROTECTED]
Date: Wed Oct 17 17:37:51 2007 -0700
bonding: Acquire correct locks in alb for promisc change
--
commit 6603a6f25e4bca922a7dfbf0bf03072d98850176
Author: Jay Vosburgh [EMAIL PROTECTED]
Date: Wed Oct 17 17:37:50 2007 -0700
bonding: Convert more locks to _bh, acquire rtnl, for new locking
--
commit 059fe7a578fba5bbb0fdc0365bfcf6218fa25eb0
Author: Jay Vosburgh [EMAIL PROTECTED]
Date: Wed Oct 17 17:37:49 2007 -0700
bonding: Convert locks to _bh, rework alb locking for new locking
--
commit 0b0eef66419e9abe6fd62bc958ab7cd0a18f858e
Author: Jay Vosburgh [EMAIL PROTECTED]
Date: Wed Oct 17 17:37:48 2007 -0700
bonding: Convert miimon to new locking
--
commit cf5f9044934658dd3ffc628a60cd37c70f8168b1
Author: Jay Vosburgh [EMAIL PROTECTED]
Date: Wed Oct 17 17:37:47 2007 -0700
bonding: Convert balance-rr transmit to new locking
--
commit 1b76b31693d4a6088dec104ff6a6ead54081a3c2
Author: Jay Vosburgh [EMAIL PROTECTED]
Date: Wed Oct 17 17:37:45 2007 -0700
Convert bonding timers to workqueues
--
commit 3a4fa0a25da81600ea0bcd75692ae8ca6050d165
Author: Robert P. J. Day [EMAIL PROTECTED]
Date: Fri Oct 19 23:10:43 2007 +0200
Fix misspellings of system, controller, interrupt and necessary. 
--
commit 1c3f0b8e07de78a86f2dce911f5e245845ce40a8
Author: Mathieu Desnoyers [EMAIL PROTECTED]
Date: Thu Oct 18 23:41:04 2007 -0700
Change struct marker users
[NETLINK]: Fix unicast timeouts
[NETLINK]: Fix unicast timeouts

Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts by moving the schedule_timeout() call to a new function that doesn't propagate the remaining timeout back to the caller. This means on each retry we start with the full timeout again. ipc/mqueue.c seems to actually want to wait indefinitely so this behaviour is retained.

Cc: Manfred Spraul [EMAIL PROTECTED]
Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
---
commit 251299cd3683f06b5b690e6a3bdd14133303ab2a
tree 3fd85bdae19d5f29efe09c328fa2defac9facd6b
parent b4f555081fdd27d13e6ff39d455d5aefae9d2c0c
author Patrick McHardy [EMAIL PROTECTED] Sun, 04 Nov 2007 17:52:19 +0100
committer Patrick McHardy [EMAIL PROTECTED] Sun, 04 Nov 2007 17:52:19 +0100

 include/linux/netlink.h  |    2 +-
 ipc/mqueue.c             |    6 --
 net/netlink/af_netlink.c |   10 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 7c1f3b1..d5bfaba 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -192,7 +192,7 @@ extern int netlink_unregister_notifier(struct notifier_block *nb);
 /* finegrained unicast helpers: */
 struct sock *netlink_getsockbyfilp(struct file *filp);
 int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
-		      long timeo, struct sock *ssk);
+		      long *timeo, struct sock *ssk);
 void netlink_detachskb(struct sock *sk, struct sk_buff *skb);
 int netlink_sendskb(struct sock *sk, struct sk_buff *skb);

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index bfa274b..1e04cd4 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1010,6 +1010,8 @@ asmlinkage long sys_mq_notify(mqd_t mqdes,
 			return -EINVAL;
 		}
 		if (notification.sigev_notify == SIGEV_THREAD) {
+			long timeo;
+
 			/* create the notify skb */
 			nc = alloc_skb(NOTIFY_COOKIE_LEN, GFP_KERNEL);
 			ret = -ENOMEM;
@@ -1038,8 +1040,8 @@ retry:
 			goto out;
 		}
-		ret = netlink_attachskb(sock, nc, 0,
-				MAX_SCHEDULE_TIMEOUT, NULL);
+		timeo = MAX_SCHEDULE_TIMEOUT;
+		ret = netlink_attachskb(sock, nc, 0, &timeo, NULL);
 		if (ret == 1)
 			goto retry;
 		if (ret) {

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2601712..415c972 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -752,7 +752,7 @@ struct sock *netlink_getsockbyfilp(struct file *filp)
  * 1: repeat lookup - reference dropped while waiting for socket memory.
  */
 int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
-		      long timeo, struct sock *ssk)
+		      long *timeo, struct sock *ssk)
 {
 	struct netlink_sock *nlk;
@@ -761,7 +761,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
 	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
 	    test_bit(0, &nlk->state)) {
 		DECLARE_WAITQUEUE(wait, current);
-		if (!timeo) {
+		if (!*timeo) {
 			if (!ssk || netlink_is_kernel(ssk))
 				netlink_overrun(sk);
 			sock_put(sk);
@@ -775,7 +775,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
 		if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
 		     test_bit(0, &nlk->state)) &&
 		    !sock_flag(sk, SOCK_DEAD))
-			timeo = schedule_timeout(timeo);
+			*timeo = schedule_timeout(*timeo);

 		__set_current_state(TASK_RUNNING);
 		remove_wait_queue(&nlk->wait, &wait);
@@ -783,7 +783,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
 		if (signal_pending(current)) {
 			kfree_skb(skb);
-			return sock_intr_errno(timeo);
+			return sock_intr_errno(*timeo);
 		}
 		return 1;
 	}
@@ -877,7 +877,7 @@ retry:
 	if (netlink_is_kernel(sk))
 		return netlink_unicast_kernel(sk, skb);

-	err = netlink_attachskb(sk, skb, nonblock, timeo, ssk);
+	err = netlink_attachskb(sk, skb, nonblock, &timeo, ssk);
 	if (err == 1)
 		goto retry;
 	if (err)
Re: [PATCH] net: Add 405EX support to new EMAC driver
On Sun, Nov 04, 2007 at 02:37:59PM +1100, Benjamin Herrenschmidt wrote: On Fri, 2007-11-02 at 11:03 -0500, Olof Johansson wrote: On Fri, Nov 02, 2007 at 08:14:43AM +0100, Stefan Roese wrote: This patch adds support for the 405EX to the new EMAC driver. Same as on AXON, the 405EX handles the MDIO via the RGMII bridge. Hi, This isn't feedback on your patch as much as on new-emac in general: Isn't this the case where there should really be device tree properties instead? If you had an ibm,emac-has-axon-stacr property in the device node, then you don't have to modify the driver for every new board out there. Same for the other device properties, of course. I thought this was what having the device tree was all about. :( Somewhat, yeah. There are subtle variations here or there we haven't totally identified... It might be a better option in our case here to add has-mdio to the rgmii nodes indeed. Part of the problem with those cells is that the chip folks keep changing things subtly from one rev to another though; it's not even totally clear to me yet whether the RGMII registers are totally compatible between axon and 405ex, which is why I've pretty much stuck to compatible properties to identify the variants. The device-tree can do both. It's still better than no device-tree since at least you know what cell variant is in there. Well, it's better than compile-time ifdefs. Providing what version of the device you have CAN be done without a device tree too. :-) As for the STACR, Axon isn't the first one to have that bit flipped, I think we should name the property differently, something like stacr-oc-inverted. Sure, it was the habit of having to modify the driver for platforms that don't add any new features I was against. I don't really care what the properties are called :-) We can still use properties that way for new things in fact. As for EMAC on cell, well, I can always put some fixup somewhere. Sounds good (with s/can still/should/). 
-Olof
Re: TCP_DEFER_ACCEPT issues
fwiw i also brought the TCP_DEFER_ACCEPT problems up the end of last year: http://www.mail-archive.com/netdev@vger.kernel.org/msg28916.html it's possible the final message in that thread is how we should define the behaviour, i haven't tried the TCP_SYNCNT idea though. -dean
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
Eric Dumazet wrote, On 11/04/2007 12:31 PM: David Miller a écrit : From: Andi Kleen [EMAIL PROTECTED] Date: Sun, 4 Nov 2007 00:18:14 +0100 On Thursday 01 November 2007 11:16:20 Eric Dumazet wrote: ... Also the EHASH_LOCK_SZ == 0 special case is a little strange. Why did you add that? He explained this in another reply, because ifdefs are ugly. But I hope he was only joking, wasn't he? Let's make it clear: ifdefs are in K&R, so they are very nice! Just like all C! (K, &, and R as well.) You know, I can even imagine there are people who have K&R around their beds, instead of some other book, so they could be serious about such things. (But don't worry, it's not me - happily I'm not serious!) This patch looks OK now, but a bit of grumbling shouldn't harm?:

...
[PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

As done two years ago on IP route cache table (commit 22c047ccbc68fa8f3fa57f0e8f906479a062c426), we can avoid using one lock per hash bucket for the huge TCP/DCCP hash tables. On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for litle

- litle
+ little

...

+static inline int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
+{
+	unsigned int i, size = 256;
+#if defined(CONFIG_PROVE_LOCKING)
+	unsigned int nr_pcpus = 2;
+#else
+	unsigned int nr_pcpus = num_possible_cpus();
+#endif
+	if (nr_pcpus >= 4)
+		size = 512;
+	if (nr_pcpus >= 8)
+		size = 1024;
+	if (nr_pcpus >= 16)
+		size = 2048;
+	if (nr_pcpus >= 32)
+		size = 4096;

It seems, maybe in the future this could look a bit nicer with some log type shifting.

+	if (sizeof(rwlock_t) != 0) {
+#ifdef CONFIG_NUMA
+		if (size * sizeof(rwlock_t) > PAGE_SIZE)
+			hashinfo->ehash_locks = vmalloc(size * sizeof(rwlock_t));
+		else
+#endif
+			hashinfo->ehash_locks = kmalloc(size * sizeof(rwlock_t),
+							GFP_KERNEL);
+		if (!hashinfo->ehash_locks)
+			return ENOMEM;

Probably doesn't matter now, but maybe more common?: return -ENOMEM;

+		for (i = 0; i < size; i++)
+			rwlock_init(&hashinfo->ehash_locks[i]);

This looks better now, but still is doubtful to me: even if it's safe with the current rwlock implementation, can't we imagine some new debugging or statistical code added, which would be called from rwlock_init() without using the rwlock_t structure? IMHO, if read_lock() etc. are called in such a case, rwlock_init() should be done as well. Regards, Jarek P.
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
Jarek Poplawski wrote, On 11/04/2007 06:58 PM: Eric Dumazet wrote, On 11/04/2007 12:31 PM: ...

+static inline int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
+{
...
+	if (sizeof(rwlock_t) != 0) {
...
+		for (i = 0; i < size; i++)
+			rwlock_init(&hashinfo->ehash_locks[i]);

This looks better now, but still is doubtful to me: even if it's safe with the current rwlock implementation, can't we imagine some new debugging or statistical code added, which would be called from rwlock_init() without using the rwlock_t structure? IMHO, if read_lock() etc. are called in such a case, rwlock_init() should be done as well. Of course I mean: if sizeof(rwlock_t) == 0. Jarek P.
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
Jarek Poplawski a écrit : Jarek Poplawski wrote, On 11/04/2007 06:58 PM: Eric Dumazet wrote, On 11/04/2007 12:31 PM: ...

+static inline int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
+{
...
+	if (sizeof(rwlock_t) != 0) {
...
+		for (i = 0; i < size; i++)
+			rwlock_init(&hashinfo->ehash_locks[i]);

This looks better now, but still is doubtful to me: even if it's safe with the current rwlock implementation, can't we imagine some new debugging or statistical code added, which would be called from rwlock_init() without using the rwlock_t structure? IMHO, if read_lock() etc. are called in such a case, rwlock_init() should be done as well. Of course I mean: if sizeof(rwlock_t) == 0.

Given those two choices :

#if defined(CONFIG_SMP) || defined(CONFIG_PROVE_LOCKING)
	kmalloc(sizeof(rwlock_t) * size);
#endif

and

	if (sizeof(rwlock_t) != 0) {
		kmalloc(sizeof(rwlock_t) * size);
	}

I prefer the 2nd one. Less error prone, and no need to remember how the gazillions of CONFIG_something we have are spelled.
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
From: Andi Kleen [EMAIL PROTECTED] Date: Sun, 4 Nov 2007 13:26:38 +0100 But I suspect distributions kernels enable CONFIG_HOTPLUG_CPU so num_possible_cpus() will be NR_CPUS. Nope, on x86 num_possible_cpus() is derived from BIOS tables these days. And similarly on SPARC64 it will be set based upon the physical capabilities of the system. This makes a huge difference as we have to set NR_CPUS to 4096 in order to handle the cpu numbering of some UltraSPARC-IV machines.
[PATCH] phylib: Add ID for Marvell 88E1240
Add PHY IDs for Marvell 88E1240. It seems to have close enough programming models to /1112 for basic support at least. Also clean up whitespace in the ID list a bit.

Signed-off-by: Olof Johansson [EMAIL PROTECTED]

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index d2ede5f..035fd41 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -265,7 +265,7 @@ static struct phy_driver marvell_drivers[] = {
 		.read_status = genphy_read_status,
 		.ack_interrupt = marvell_ack_interrupt,
 		.config_intr = marvell_config_intr,
-		.driver = {.owner = THIS_MODULE,},
+		.driver = { .owner = THIS_MODULE },
 	},
 	{
 		.phy_id = 0x01410c90,
@@ -278,7 +278,7 @@ static struct phy_driver marvell_drivers[] = {
 		.read_status = genphy_read_status,
 		.ack_interrupt = marvell_ack_interrupt,
 		.config_intr = marvell_config_intr,
-		.driver = {.owner = THIS_MODULE,},
+		.driver = { .owner = THIS_MODULE },
 	},
 	{
 		.phy_id = 0x01410cc0,
@@ -291,7 +291,7 @@ static struct phy_driver marvell_drivers[] = {
 		.read_status = genphy_read_status,
 		.ack_interrupt = marvell_ack_interrupt,
 		.config_intr = marvell_config_intr,
-		.driver = {.owner = THIS_MODULE,},
+		.driver = { .owner = THIS_MODULE },
 	},
 	{
 		.phy_id = 0x01410cd0,
@@ -304,8 +304,21 @@ static struct phy_driver marvell_drivers[] = {
 		.read_status = genphy_read_status,
 		.ack_interrupt = marvell_ack_interrupt,
 		.config_intr = marvell_config_intr,
-		.driver = {.owner = THIS_MODULE,},
-	}
+		.driver = { .owner = THIS_MODULE },
+	},
+	{
+		.phy_id = 0x01410e30,
+		.phy_id_mask = 0xfff0,
+		.name = "Marvell 88E1240",
+		.features = PHY_GBIT_FEATURES,
+		.flags = PHY_HAS_INTERRUPT,
+		.config_init = m88e_config_init,
+		.config_aneg = marvell_config_aneg,
+		.read_status = genphy_read_status,
+		.ack_interrupt = marvell_ack_interrupt,
+		.config_intr = marvell_config_intr,
+		.driver = { .owner = THIS_MODULE },
+	},
 };

 static int __init marvell_init(void)
[PATCH] phylib: Silence driver registration
It gets quite verbose to see every single PHY driver being registered by default.

Signed-off-by: Olof Johansson [EMAIL PROTECTED]

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index c046121..f6e4848 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -706,7 +706,7 @@ int phy_driver_register(struct phy_driver *new_driver)
 		return retval;
 	}

-	pr_info("%s: Registered new driver\n", new_driver->name);
+	pr_debug("%s: Registered new driver\n", new_driver->name);

 	return 0;
 }
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
Eric Dumazet wrote, On 11/04/2007 10:23 PM: Jarek Poplawski a écrit : Jarek Poplawski wrote, On 11/04/2007 06:58 PM: Eric Dumazet wrote, On 11/04/2007 12:31 PM: ...

+static inline int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
+{
...
+	if (sizeof(rwlock_t) != 0) {
...
+		for (i = 0; i < size; i++)
+			rwlock_init(&hashinfo->ehash_locks[i]);

This looks better now, but still is doubtful to me: even if it's safe with the current rwlock implementation, can't we imagine some new debugging or statistical code added, which would be called from rwlock_init() without using the rwlock_t structure? IMHO, if read_lock() etc. are called in such a case, rwlock_init() should be done as well. Of course I mean: if sizeof(rwlock_t) == 0.

Given those two choices :

#if defined(CONFIG_SMP) || defined(CONFIG_PROVE_LOCKING)
	kmalloc(sizeof(rwlock_t) * size);
#endif

and

	if (sizeof(rwlock_t) != 0) {
		kmalloc(sizeof(rwlock_t) * size);
	}

I prefer the 2nd one. Less error prone, and no need to remember how the gazillions of CONFIG_something we have are spelled.

I've written it's better, too. But this could be improved yet (someday), I hope. Thanks, Jarek P.
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
On Sunday 04 November 2007 22:56:21 David Miller wrote: From: Andi Kleen [EMAIL PROTECTED] This makes a huge difference as we have to set NR_CPUS to 4096 in order to handle the cpu numbering of some UltraSPARC-IV machines. Really? Hopefully you have a large enough stack then. There are various users who put char str[NR_CPUS] on the stack, and a few other data structures also get incredibly big with NR_CPUS arrays. If it's for sparse cpu ids -- x86 handles those with a translation array. -Andi
[PATCH] postfix decrement error in smc911x_reset(); drivers/net/smc911x.c
If timeout causes the loop to end, a postfix decrement causes its value to become 4294967295 (unsigned int), not 0.

Signed-off-by: Roel Kluin [EMAIL PROTECTED]
---
diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
index dd18af0..41f3c8f 100644
--- a/drivers/net/smc911x.c
+++ b/drivers/net/smc911x.c
@@ -243,7 +243,7 @@ static void smc911x_reset(struct net_device *dev)
 		do {
 			udelay(10);
 			reg = SMC_GET_PMT_CTRL() & PMT_CTRL_READY_;
-		} while ( timeout-- && !reg);
+		} while ( --timeout && !reg);
 		if (timeout == 0) {
 			PRINTK("%s: smc911x_reset timeout waiting for PM restore\n", dev->name);
 			return;
Re: [PATCH] postfix decrement error in smc911x_reset(); drivers/net/smc911x.c
And there was another one...
--
If timeout causes the loop to end, a postfix decrement causes its value to become 4294967295 (unsigned int), not 0.

Signed-off-by: Roel Kluin [EMAIL PROTECTED]
---
diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
index dd18af0..fac1d2a 100644
--- a/drivers/net/smc911x.c
+++ b/drivers/net/smc911x.c
@@ -243,7 +243,7 @@ static void smc911x_reset(struct net_device *dev)
 		do {
 			udelay(10);
 			reg = SMC_GET_PMT_CTRL() & PMT_CTRL_READY_;
-		} while ( timeout-- && !reg);
+		} while ( --timeout && !reg);
 		if (timeout == 0) {
 			PRINTK("%s: smc911x_reset timeout waiting for PM restore\n", dev->name);
 			return;
@@ -267,7 +267,7 @@ static void smc911x_reset(struct net_device *dev)
 				resets++;
 				break;
 			}
-		} while ( timeout-- && (reg & HW_CFG_SRST_));
+		} while ( --timeout && (reg & HW_CFG_SRST_));
 	}
 	if (timeout == 0) {
 		PRINTK("%s: smc911x_reset timeout waiting for reset\n", dev->name);
Re: Endianness problem with u32 classifier hash masks
On Sun, 2007-04-11 at 02:17 +0100, Jarek Poplawski wrote: So, even if not full ntohl(), some byte moving seems to be necessary here. I think you were close. I am afraid my brain is congested; even the espresso didn't help my thinking. It could be done with just fshift on the slow path (config time) if one was to think hard ;- I am not too happy with the extra conversion on the fast path, but how about the untested attached patch? cheers, jamal

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 9e98c6e..6dd569b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -93,7 +93,7 @@ static __inline__ unsigned u32_hash_fold(u32 key, struct tc_u32_sel *sel, u8 fshift)
 {
 	unsigned h = (key & sel->hmask) >> fshift;
-	return h;
+	return ntohl(h);
 }

 static int u32_classify(struct sk_buff *skb, struct tcf_proto *tp, struct tcf_result *res)
@@ -615,7 +615,7 @@ static int u32_change(struct tcf_proto *tp, unsigned long base, u32 handle,
 	n->handle = handle;
 {
 	u8 i = 0;
-	u32 mask = s->hmask;
+	u32 mask = ntohl(s->hmask);
 	if (mask) {
 		while (!(mask & 1)) {
 			i++;
Re: [PATCH] postfix decrement error in smc911x_reset(); drivers/net/smc911x.c
Darn, another. Sorry for the noise; I also removed a whitespace in this one
--
If timeout causes the loop to end, a postfix decrement causes its value to become 4294967295 (unsigned int), not 0.

Signed-off-by: Roel Kluin [EMAIL PROTECTED]
---
diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
index dd18af0..6a2d236 100644
--- a/drivers/net/smc911x.c
+++ b/drivers/net/smc911x.c
@@ -243,7 +243,7 @@ static void smc911x_reset(struct net_device *dev)
 		do {
 			udelay(10);
 			reg = SMC_GET_PMT_CTRL() & PMT_CTRL_READY_;
-		} while ( timeout-- && !reg);
+		} while (--timeout && !reg);
 		if (timeout == 0) {
 			PRINTK("%s: smc911x_reset timeout waiting for PM restore\n", dev->name);
 			return;
@@ -267,7 +267,7 @@ static void smc911x_reset(struct net_device *dev)
 				resets++;
 				break;
 			}
-		} while ( timeout-- && (reg & HW_CFG_SRST_));
+		} while (--timeout && (reg & HW_CFG_SRST_));
 	}
 	if (timeout == 0) {
 		PRINTK("%s: smc911x_reset timeout waiting for reset\n", dev->name);
@@ -413,7 +413,7 @@ static inline void smc911x_drop_pkt(struct net_device *dev)
 	do {
 		udelay(10);
 		reg = SMC_GET_RX_DP_CTRL() & RX_DP_CTRL_FFWD_BUSY_;
-	} while ( timeout-- && reg);
+	} while (--timeout && reg);
 	if (timeout == 0) {
 		PRINTK("%s: timeout waiting for RX fast forward\n", dev->name);
 	}
[patch 0/2] ipvs: avoid overcommit on the standby, take III
Two related patches from Rumen G. Bogdanovski to help prevent overcommit on the standby. On the last two attempts I have managed to send somewhat bogus patches, so I started from scratch: I took the original patches, fixed up what scripts/checkpatch.pl didn't like, then compared the output to my previous attempt, which happily showed that the bogus bits I knew about had been fixed.

-- 
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/
[patch 2/2] ipvs: Synchronise Closing of Connections
From: Rumen G. Bogdanovski [EMAIL PROTECTED]

This patch makes the master daemon sync a connection when it is about to close, so that connections on the backup close or time out according to their state. Previously the sync was performed only if the connection was in the ESTABLISHED state, which always made connections time out after a hard-coded 3 minutes. Andy Gospodarek's patch ([IPVS]: use proper timeout instead of fixed value) effectively did nothing more than increase this to 15 minutes (the ESTABLISHED state timeout). This patch makes use of the proper timeouts by syncing connections on status changes to FIN_WAIT (2 min timeout) and CLOSE (10 sec timeout). If the backup misses the CLOSE, hopefully it did not miss the FIN_WAIT; otherwise we just have to wait for the ESTABLISHED state timeout, as we would without this patch. This way the number of hanging connections on the backup is kept to a minimum, and very few of them are left to expire with a long timeout. This is important if we want to make use of the fix for real server overcommit on master/backup fail-over.

Regards,
Rumen Bogdanovski

Signed-off-by: Rumen G. Bogdanovski [EMAIL PROTECTED]
Signed-off-by: Simon Horman [EMAIL PROTECTED]

---
Thu, 01 Nov 2007 18:25:10 +0900, Horms
* Rediffed for net-2.6
* Ran through scripts/checkpatch.pl and fixed up everything that it
  complains about except the use of volatile, as it is in keeping with
  other fields in the structure. If it is wrong, let's fix them all
  together.
WARNING: Use of volatile is usually wrong: see
Documentation/volatile-considered-harmful.txt
#49: FILE: include/net/ip_vs.h:523:
+	volatile __u16 old_state;	/* old state, to be used for

Index: net-2.6/include/net/ip_vs.h
===================================================================
--- net-2.6.orig/include/net/ip_vs.h	2007-11-05 11:37:45.000000000 +0900
+++ net-2.6/include/net/ip_vs.h	2007-11-05 11:37:49.000000000 +0900
@@ -520,6 +520,10 @@ struct ip_vs_conn {
 	spinlock_t              lock;           /* lock for state transition */
 	volatile __u16          flags;          /* status flags */
 	volatile __u16          state;          /* state info */
+	volatile __u16          old_state;      /* old state, to be used for
+						 * state transition triggered
+						 * synchronization
+						 */
 
 	/* Control members */
 	struct ip_vs_conn       *control;       /* Master control connection */

Index: net-2.6/net/ipv4/ipvs/ip_vs_core.c
===================================================================
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_core.c	2007-11-05 11:37:45.000000000 +0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_core.c	2007-11-05 11:37:49.000000000 +0900
@@ -979,15 +979,23 @@ ip_vs_in(unsigned int hooknum, struct sk
 		ret = NF_ACCEPT;
 	}
 
-	/* increase its packet counter and check if it is needed
-	   to be synchronized */
+	/* Increase its packet counter and check if it is needed
+	 * to be synchronized
+	 *
+	 * Sync connection if it is about to close to
+	 * encourage the standby servers to update the connections timeout
+	 */
 	atomic_inc(&cp->in_pkts);
 	if ((ip_vs_sync_state & IP_VS_STATE_MASTER) &&
-	    (cp->protocol != IPPROTO_TCP ||
-	     cp->state == IP_VS_TCP_S_ESTABLISHED) &&
-	    (atomic_read(&cp->in_pkts) % sysctl_ip_vs_sync_threshold[1]
-	     == sysctl_ip_vs_sync_threshold[0]))
+	    (((cp->protocol != IPPROTO_TCP ||
+	       cp->state == IP_VS_TCP_S_ESTABLISHED) &&
+	      (atomic_read(&cp->in_pkts) % sysctl_ip_vs_sync_threshold[1]
+	       == sysctl_ip_vs_sync_threshold[0])) ||
+	     ((cp->protocol == IPPROTO_TCP) && (cp->old_state != cp->state) &&
+	      ((cp->state == IP_VS_TCP_S_FIN_WAIT) ||
+	       (cp->state == IP_VS_TCP_S_CLOSE)))))
 		ip_vs_sync_conn(cp);
+	cp->old_state = cp->state;
 
 	ip_vs_conn_put(cp);
 	return ret;

Index: net-2.6/net/ipv4/ipvs/ip_vs_sync.c
===================================================================
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_sync.c	2007-11-05 11:37:45.000000000 +0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_sync.c	2007-11-05 11:37:49.000000000 +0900
@@ -344,7 +344,6 @@ static void ip_vs_process_message(const
 			if (!dest) {
 				/* it is an unbound entry created by
 				 * synchronization */
-				cp->state = ntohs(s->state);
 				cp->flags = flags | IP_VS_CONN_F_HASHED;
 			} else
 				atomic_dec(&dest->refcnt);
@@ -359,6 +358,7 @@ static void ip_vs_process_message(const
 		p += SIMPLE_CONN_SIZE;
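The new sync condition in ip_vs_in() above can be hard to read in diff form. Here is a reduced model in plain C; the enum values and parameter names are illustrative stand-ins for the kernel's IP_VS_TCP_S_* states and sysctl_ip_vs_sync_threshold[], not the kernel's own identifiers.

```c
#include <assert.h>
#include <stdbool.h>

enum tcp_state { S_ESTABLISHED, S_FIN_WAIT, S_CLOSE, S_OTHER };

/* Sync either on the periodic packet-count threshold (as before) or,
 * with this patch, on a state transition into FIN_WAIT or CLOSE. */
static bool should_sync(bool is_tcp, enum tcp_state state,
			enum tcp_state old_state, int in_pkts,
			int sync_period, int sync_threshold)
{
	bool periodic = (!is_tcp || state == S_ESTABLISHED) &&
			(in_pkts % sync_period == sync_threshold);
	bool closing = is_tcp && state != old_state &&
		       (state == S_FIN_WAIT || state == S_CLOSE);
	return periodic || closing;
}
```

With a period of 50 and a threshold of 3, an established connection still syncs on the periodic schedule, while a transition into CLOSE now syncs immediately instead of waiting for the connection's ESTABLISHED timeout on the backup.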
[patch 1/2] ipvs: Bind connections on standby if the destination exists
From: Rumen G. Bogdanovski [EMAIL PROTECTED]

This patch fixes the problem of node overload on director fail-over. Given the scenario: 2 nodes, each accepting 3 connections at a time, and 2 directors. If director fail-over occurs when the nodes are fully loaded (6 connections to the cluster), the new director will assign another 6 connections to the cluster if the same real servers exist there.

The problem turned out to be that the inherited connections were not bound to the real servers (destinations) on the backup director. Therefore ipvsadm -l reports 0 connections:

[EMAIL PROTECTED]:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          0
  -> node484.local:5999           Route   1000   0          0

while ipvsadm -lnc is right:

[EMAIL PROTECTED]:~# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999
TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999

So the patch I am sending fixes the problem by binding the received connections to the appropriate service on the backup director, if it exists; otherwise the connection is handled the old way. So if the master and the backup directors are synchronized in terms of real services, there will be no problem with server overcommitting, since new connections will not be created on real services that do not exist on the backup. However, if the service is created later on the backup, the binding will be performed when the next connection update is received.
With this patch the inherited connections show as inactive on the backup:

[EMAIL PROTECTED]:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          1
  -> node484.local:5999           Route   1000   0          1

[EMAIL PROTECTED]:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  C0A800DE:176F wlc
  -> C0A80033:176F                Route   1000   0          1
  -> C0A80032:176F                Route   1000   0          1

Regards,
Rumen Bogdanovski

Acked-by: Julian Anastasov [EMAIL PROTECTED]
Signed-off-by: Rumen G. Bogdanovski [EMAIL PROTECTED]
Signed-off-by: Simon Horman [EMAIL PROTECTED]

---
Mon, 05 Nov 2007 11:33:33 +0900
* Various whitespace and indentation changes
* Rediffed against net-2.6
* Ran against ./scripts/checkpatch.pl and fixed everything that it
  complained about

Index: net-2.6/include/net/ip_vs.h
===================================================================
--- net-2.6.orig/include/net/ip_vs.h	2007-11-05 11:23:58.000000000 +0900
+++ net-2.6/include/net/ip_vs.h	2007-11-05 11:25:51.000000000 +0900
@@ -901,6 +901,10 @@ extern int ip_vs_use_count_inc(void);
 extern void ip_vs_use_count_dec(void);
 extern int ip_vs_control_init(void);
 extern void ip_vs_control_cleanup(void);
+extern struct ip_vs_dest *
+ip_vs_find_dest(__be32 daddr, __be16 dport,
+		__be32 vaddr, __be16 vport, __u16 protocol);
+extern struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp);
 
 /*

Index: net-2.6/net/ipv4/ipvs/ip_vs_conn.c
===================================================================
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_conn.c	2007-11-05 11:23:58.000000000 +0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_conn.c	2007-11-05 11:25:51.000000000 +0900
@@ -426,6 +426,25 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, s
 
 /*
+ * Check if there is a destination for the connection, if so
+ * bind the connection to the destination.
+ */
+struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
+{
+	struct ip_vs_dest *dest;
+
+	if ((cp) && (!cp->dest)) {
+		dest = ip_vs_find_dest(cp->daddr, cp->dport,
+				       cp->vaddr, cp->vport, cp->protocol);
+		ip_vs_bind_dest(cp, dest);
+		return dest;
+	} else
+		return NULL;
+}
+EXPORT_SYMBOL(ip_vs_try_bind_dest);
+
+
+/*
  * Unbind a connection entry with its VS destination
  * Called by the ip_vs_conn_expire function.
  */

Index: net-2.6/net/ipv4/ipvs/ip_vs_ctl.c
===================================================================
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_ctl.c	2007-11-05 11:23:58.000000000 +0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_ctl.c	2007-11-05 11:25:51.000000000 +0900
@@ -579,6 +579,32 @@ ip_vs_lookup_dest(struct ip_vs_service *
 	return NULL;
 }
 
+/*
+ * Find destination by {daddr,dport,vaddr,protocol}
+ * Created to be used in ip_vs_process_message() in
+ * the backup
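The binding rule the patch introduces can be modelled outside the kernel. In this sketch the struct definitions and the lookup are illustrative stand-ins for ip_vs_conn, ip_vs_find_dest() and ip_vs_bind_dest(), not the real IPVS types: bind only connections that exist and are not yet bound, and if no destination is found the connection stays unbound so a later connection update can bind it.

```c
#include <assert.h>
#include <stddef.h>

struct dest { int weight; };
struct conn { struct dest *dest; };	/* dest == NULL while unbound */

/* Stand-in for ip_vs_find_dest(): returns the matching real server,
 * or NULL if it does not exist on this director. */
static struct dest *find_dest(struct dest *table, int present)
{
	return present ? table : NULL;
}

/* Stand-in for ip_vs_try_bind_dest(): only an existing, still-unbound
 * connection is considered; binding a NULL dest leaves it unbound. */
static struct dest *try_bind_dest(struct conn *cp, struct dest *table,
				  int present)
{
	if (cp && !cp->dest) {
		struct dest *dest = find_dest(table, present);
		cp->dest = dest;	/* models ip_vs_bind_dest() */
		return dest;
	}
	return NULL;
}
```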
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
From: Andi Kleen [EMAIL PROTECTED]
Date: Mon, 5 Nov 2007 00:01:03 +0100

> On Sunday 04 November 2007 22:56:21 David Miller wrote:
> > From: Andi Kleen [EMAIL PROTECTED]
> >
> > This makes a huge difference as we have to set NR_CPUS to 4096 in
> > order to handle the cpu numbering of some UltraSPARC-IV machines.
>
> Really? Hopefully you have a large enough stack then. There are various
> users who put char str[NR_CPUS] on the stack and a few other data
> structures also get incredibly big with NR_CPUS arrays.

For the stack case there is one debugging case, and that's for sprintf'ing
cpusets. That could be easily eliminated.

> If it's for sparse cpu ids -- x86 handles those with a translation array.

I would rather not do this, so much assembler code indexes straight into
the per-cpu arrays.
Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
From: David Miller [EMAIL PROTECTED]
Date: Sun, 04 Nov 2007 20:24:54 -0800 (PST)

> From: Andi Kleen [EMAIL PROTECTED]
> Date: Mon, 5 Nov 2007 00:01:03 +0100
>
> > If it's for sparse cpu ids -- x86 handles those with a translation array.
>
> I would rather not do this, so much assembler code indexes straight into
> the per-cpu arrays.

Also, at current rates, I'll need to be able to support 4096 cpus for real
not very long from now :-)