Re: [RFC][BNX2X] .h files rewrite

2007-11-04 Thread Eliezer Tamir
On Fri, 2007-11-02 at 16:35 -0700, Max Asbock wrote:

 I built the newest bnx2x code against the net-2.6 kernel and ran a
 number of stress tests with netperf and pktgen. I did not encounter
 any errors.
 
 Max
 
 
Thanks,
Eliezer


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23: TG3+VLAN: IPv6 router advertisments missed by kernel

2007-11-04 Thread Bruno Prémont
The issue shows up reliably when starting the system, though some 
(re)configuration operations on the network interface make the issue 
disappear.
One way to get the kernel to see the advertisements is to restart the interface 
with its vlans or (as below) to keep the interface in promiscuous mode.

Regards,
Bruno

On Thursday 01 November 2007 21:45:42 you wrote:
 I'm seeing unexpected behavior on my laptop since I updated kernel to
 2.6.23.1 from 2.6.22.1.

 My setup:
   Cisco Router --- [2 vlans] - Laptop

 On the link two VLANs are active, native vlan is not used.
 Laptop nic is:
Tigon3 [partno(BCM95751m) rev 4201 PHY(5750)] (PCI Express)
10/100/1000Base-T Ethernet

 On laptop I have eth0.500 and eth0.658 as active interfaces (eth0 is just
 up - no address manually assigned) with IPv4 address assigned. IPv6 is only
 enabled on the router for one of the two vlans (500).

 When booting with 2.6.23.1 the router advertisements coming from the router
 (vlan 500) seem to get ignored by the kernel (they are detected by 2.6.22)
 and only enabling promiscuous mode on eth0 makes the kernel detect the
 router advertisements. (I'm doing tcpdump icmp6 on the vlan interface)

 This looks like it could be caused by changes in regard to handling vlans
 with Tigon3 nic.
 A different machine (other nic and no vlans) sees the router advertisements
 correctly with 2.6.23.1. (So I don't expect the cause to be on IPv6 side)

 Bruno



 Probably relevant .config extract for 2.6.23.1:
   CONFIG_PACKET=y
   CONFIG_UNIX=y
   CONFIG_INET=y
   CONFIG_IP_MULTICAST=y
   CONFIG_IP_ADVANCED_ROUTER=y
   CONFIG_ASK_IP_FIB_HASH=y
   CONFIG_IP_FIB_HASH=y
   CONFIG_IP_MULTIPLE_TABLES=y
   CONFIG_NET_IPGRE=m
   CONFIG_SYN_COOKIES=y
   CONFIG_INET_DIAG=y
   CONFIG_INET_TCP_DIAG=y
   CONFIG_TCP_CONG_CUBIC=y
   CONFIG_DEFAULT_TCP_CONG=cubic
   CONFIG_IPV6=y
   CONFIG_INET6_TUNNEL=m
   CONFIG_IPV6_TUNNEL=m
   CONFIG_IPV6_MULTIPLE_TABLES=y
   CONFIG_IPV6_SUBTREES=y
   CONFIG_NETFILTER=y

   CONFIG_NETDEVICES=y
   CONFIG_NETDEVICES_MULTIQUEUE=y
   # CONFIG_MACVLAN is not set
   CONFIG_TUN=m
   CONFIG_PHYLIB=m
   CONFIG_BROADCOM_PHY=m
   CONFIG_NET_ETHERNET=y
   CONFIG_MII=y
   CONFIG_NET_PCI=y
   CONFIG_B44=m
   CONFIG_NETDEV_1000=y
   CONFIG_TIGON3=m

 Same extract for 2.6.22.1:
   CONFIG_PACKET=y
   CONFIG_UNIX=y
   CONFIG_INET=y
   CONFIG_IP_MULTICAST=y
   CONFIG_IP_ADVANCED_ROUTER=y
   CONFIG_ASK_IP_FIB_HASH=y
   CONFIG_IP_FIB_HASH=y
   CONFIG_IP_MULTIPLE_TABLES=y
   CONFIG_NET_IPGRE=m
   CONFIG_SYN_COOKIES=y
   CONFIG_INET_DIAG=y
   CONFIG_INET_TCP_DIAG=y
   CONFIG_TCP_CONG_CUBIC=y
   CONFIG_DEFAULT_TCP_CONG=cubic
   CONFIG_IPV6=y
   CONFIG_INET6_TUNNEL=m
   CONFIG_IPV6_TUNNEL=m
   CONFIG_IPV6_MULTIPLE_TABLES=y
   CONFIG_IPV6_SUBTREES=y
   CONFIG_NETFILTER=y

   CONFIG_NETDEVICES=y
   CONFIG_DUMMY=m
   CONFIG_TUN=m
   CONFIG_PHYLIB=m
   CONFIG_BROADCOM_PHY=m
   CONFIG_NET_ETHERNET=y
   CONFIG_MII=y
   CONFIG_NET_PCI=y
   CONFIG_B44=m
   CONFIG_NETDEV_1000=y
   CONFIG_TIGON3=m




Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread Eric Dumazet

David Miller wrote:

From: Andi Kleen [EMAIL PROTECTED]
Date: Sun, 4 Nov 2007 00:18:14 +0100


On Thursday 01 November 2007 11:16:20 Eric Dumazet wrote:

Some quick comments:


+#if defined(CONFIG_SMP) || defined(CONFIG_PROVE_LOCKING)
+/*
+ * Instead of using one rwlock for each inet_ehash_bucket, we use a table of locks
+ * The size of this table is a power of two and depends on the number of CPUS.
+ */

This shouldn't be hard coded based on NR_CPUS, but be done at runtime
based on num_possible_cpus(). This is better for kernels with a large
NR_CPUS, but which typically run on much smaller systems (like 
distribution kernels) 


I think this is a good idea.  Eric, could you make this change?


Yes of course, since using a non constant value for masking is cheap.

But I suspect distribution kernels enable CONFIG_HOTPLUG_CPU so 
num_possible_cpus() will be NR_CPUS.





Also the EHASH_LOCK_SZ == 0 special case is a little strange. Why did
you add that?


He explained this in another reply, because ifdefs are ugly.


This will vanish if done at runtime anyway.



And as an unrelated note, have you tried converting the rwlocks 
into normal spinlocks? spinlocks should be somewhat cheaper
because they have less cache protocol overhead and with
the huge thash tables in Linux the chain walks should be short
anyways so not doing this in parallel is probably not a big issue.
At some point I also had a crazy idea of using a special locking
scheme that special cases the common case that a hash chain
has only one member and doesn't take a lock for that at all. 


I agree.

There was movement at one point to get rid of all rwlock's in the
kernel, I personally think they are pointless.  Any use that makes
sense is a case where the code should be rewritten to decrease the
lock hold time or convert to RCU.



I agree too, rwlocks are more expensive when contention is low, so let's do this 
rwlock->spinlock change in a next step (separate patch), because it also means 
changing lhash_lock.


Thanks to Jarek, I added locks cleanup in dccp_fini()

[PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

As done two years ago on IP route cache table (commit 
22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per 
hash bucket for the huge TCP/DCCP hash tables.


On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for litle 
performance differences. (we hit a different cache line for the rwlock, but 
then the bucket cache line has a better sharing factor among cpus, since we 
dirty it less often). For netstat or ss commands that want a full scan of hash 
table, we perform fewer memory accesses.


Using a 'small' table of hashed rwlocks should be more than enough to provide 
correct SMP concurrency between different buckets, without using too much 
memory. Sizing of this table depends on num_possible_cpus() and various CONFIG 
settings.
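
A minimal userspace sketch of the hashed lock table idea described above: the bucket hash is masked down to an index into a small, power-of-two sized table of locks, so many buckets share one lock. The names `demo_lock`, `demo_hashinfo` and `demo_lock_for_hash` are hypothetical stand-ins for illustration only; they are not the kernel definitions and not the helper from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rwlock_t; only used to have something to index. */
struct demo_lock {
	int held;
};

/* Hypothetical stand-in for the lock-related fields of struct inet_hashinfo. */
struct demo_hashinfo {
	struct demo_lock *locks;      /* shared table of locks */
	unsigned int      locks_mask; /* table size - 1; size is a power of two */
};

/*
 * Map a bucket hash to the lock that protects it.  Because the table
 * size is a power of two, masking is equivalent to hash % size, and
 * buckets whose hashes differ by a multiple of the size share a lock.
 */
static struct demo_lock *demo_lock_for_hash(struct demo_hashinfo *hi,
					    unsigned int hash)
{
	return &hi->locks[hash & hi->locks_mask];
}
```

With a 256-entry table, hashes 5 and 261 resolve to the same lock, which is the trade-off the changelog describes: slightly more lock sharing in exchange for a much smaller memory footprint.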


This patch provides some locking abstraction that may ease future work using 
a different model for the TCP/DCCP table.


Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
Acked-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]

 include/net/inet_hashtables.h |   71 +---
 net/dccp/proto.c  |9 +++-
 net/ipv4/inet_diag.c  |9 ++--
 net/ipv4/inet_hashtables.c|7 +--
 net/ipv4/inet_timewait_sock.c |   13 +++--
 net/ipv4/tcp.c|4 -
 net/ipv4/tcp_ipv4.c   |   11 ++--
 net/ipv6/inet6_hashtables.c   |   19 
 8 files changed, 106 insertions(+), 37 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 4427dcd..8461cda 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -37,7 +37,6 @@
  * I'll experiment with dynamic table growth later.
  */
 struct inet_ehash_bucket {
-   rwlock_t  lock;
struct hlist_head chain;
struct hlist_head twchain;
 };
@@ -100,6 +99,9 @@ struct inet_hashinfo {
 * TIME_WAIT sockets use a separate chain (twchain).
 */
struct inet_ehash_bucket *ehash;
+   rwlock_t *ehash_locks;
+   unsigned int ehash_size;
+   unsigned int ehash_locks_mask;
 
/* Ok, let's try this, I give up, we do need a local binding
 * TCP hash as well as the others for fast bind/connect.
@@ -107,7 +109,7 @@ struct inet_hashinfo {
struct inet_bind_hashbucket *bhash;
 
unsigned int bhash_size;
-   unsigned int ehash_size;
+   /* Note : 4 bytes padding on 64 bit arches */
 
/* All sockets in TCP_LISTEN state will be in here.  This is the only
 * table where wildcard'd TCP sockets can exist.  Hash function here
@@ -134,6 +136,62 @@ static inline struct inet_ehash_bucket *inet_ehash_bucket(
return &hashinfo->ehash[hash & (hashinfo->ehash_size - 1)];
 }
 
+static inline rwlock_t 

Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread Andi Kleen

 But I suspect distribution kernels enable CONFIG_HOTPLUG_CPU so 
 num_possible_cpus() will be NR_CPUS.

Nope, on x86 num_possible_cpus() is derived from BIOS tables these days.

-Andi


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread Eric Dumazet

Andi Kleen wrote:
But I suspect distribution kernels enable CONFIG_HOTPLUG_CPU so 
num_possible_cpus() will be NR_CPUS.


Nope, on x86 num_possible_cpus() is derived from BIOS tables these days.


Good to know, thank you Andi for this clarification.



problems with ib-bonding of 2.6.24-rc1

2007-11-04 Thread Moni Shoua
Hi,
I've been doing some tests for bonding of 2.6.24-rc1 and noticed some problems.
My first goal was to see how bonding works with IPoIB slaves but I also tried it
with Ethernet.

Basically, what I see is that after a while commands like ifconfig or ip get stuck.
I only use sysfs to configure bonding (which also gets stuck after a while).

After removing the commits listed below from the code, I see no problems.

Does anybody else have the same problem?

thanks
 MoniS



commit d0e81b7e2246a41d068ecaf15aac9de570816d63
Author: Jay Vosburgh [EMAIL PROTECTED]
Date:   Wed Oct 17 17:37:51 2007 -0700

bonding: Acquire correct locks in alb for promisc change

--
commit 6603a6f25e4bca922a7dfbf0bf03072d98850176
Author: Jay Vosburgh [EMAIL PROTECTED]
Date:   Wed Oct 17 17:37:50 2007 -0700

bonding: Convert more locks to _bh, acquire rtnl, for new locking

--
commit 059fe7a578fba5bbb0fdc0365bfcf6218fa25eb0
Author: Jay Vosburgh [EMAIL PROTECTED]
Date:   Wed Oct 17 17:37:49 2007 -0700

bonding: Convert locks to _bh, rework alb locking for new locking

--
commit 0b0eef66419e9abe6fd62bc958ab7cd0a18f858e
Author: Jay Vosburgh [EMAIL PROTECTED]
Date:   Wed Oct 17 17:37:48 2007 -0700

bonding: Convert miimon to new locking

--
commit cf5f9044934658dd3ffc628a60cd37c70f8168b1
Author: Jay Vosburgh [EMAIL PROTECTED]
Date:   Wed Oct 17 17:37:47 2007 -0700

bonding: Convert balance-rr transmit to new locking

--
commit 1b76b31693d4a6088dec104ff6a6ead54081a3c2
Author: Jay Vosburgh [EMAIL PROTECTED]
Date:   Wed Oct 17 17:37:45 2007 -0700

Convert bonding timers to workqueues

--
commit 3a4fa0a25da81600ea0bcd75692ae8ca6050d165
Author: Robert P. J. Day [EMAIL PROTECTED]
Date:   Fri Oct 19 23:10:43 2007 +0200

Fix misspellings of system, controller, interrupt and necessary.

--
commit 1c3f0b8e07de78a86f2dce911f5e245845ce40a8
Author: Mathieu Desnoyers [EMAIL PROTECTED]
Date:   Thu Oct 18 23:41:04 2007 -0700

Change struct marker users




[NETLINK]: Fix unicast timeouts

2007-11-04 Thread Patrick McHardy
 [NETLINK]: Fix unicast timeouts

Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts by
moving the schedule_timeout() call to a new function that doesn't propagate
the remaining timeout back to the caller. This means on each retry we start
with the full timeout again.

ipc/mqueue.c seems to actually want to wait indefinitely so this behaviour
is retained.

Cc: Manfred Spraul [EMAIL PROTECTED]
Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 251299cd3683f06b5b690e6a3bdd14133303ab2a
tree 3fd85bdae19d5f29efe09c328fa2defac9facd6b
parent b4f555081fdd27d13e6ff39d455d5aefae9d2c0c
author Patrick McHardy [EMAIL PROTECTED] Sun, 04 Nov 2007 17:52:19 +0100
committer Patrick McHardy [EMAIL PROTECTED] Sun, 04 Nov 2007 17:52:19 +0100

 include/linux/netlink.h  |2 +-
 ipc/mqueue.c |6 --
 net/netlink/af_netlink.c |   10 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 7c1f3b1..d5bfaba 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -192,7 +192,7 @@ extern int netlink_unregister_notifier(struct notifier_block *nb);
 /* finegrained unicast helpers: */
 struct sock *netlink_getsockbyfilp(struct file *filp);
 int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
-   long timeo, struct sock *ssk);
+ long *timeo, struct sock *ssk);
 void netlink_detachskb(struct sock *sk, struct sk_buff *skb);
 int netlink_sendskb(struct sock *sk, struct sk_buff *skb);
 
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index bfa274b..1e04cd4 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1010,6 +1010,8 @@ asmlinkage long sys_mq_notify(mqd_t mqdes,
return -EINVAL;
}
if (notification.sigev_notify == SIGEV_THREAD) {
+   long timeo;
+
/* create the notify skb */
nc = alloc_skb(NOTIFY_COOKIE_LEN, GFP_KERNEL);
ret = -ENOMEM;
@@ -1038,8 +1040,8 @@ retry:
goto out;
}
 
-   ret = netlink_attachskb(sock, nc, 0,
-   MAX_SCHEDULE_TIMEOUT, NULL);
+   timeo = MAX_SCHEDULE_TIMEOUT;
+   ret = netlink_attachskb(sock, nc, 0, &timeo, NULL);
if (ret == 1)
goto retry;
if (ret) {
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2601712..415c972 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -752,7 +752,7 @@ struct sock *netlink_getsockbyfilp(struct file *filp)
  * 1: repeat lookup - reference dropped while waiting for socket memory.
  */
 int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
-   long timeo, struct sock *ssk)
+ long *timeo, struct sock *ssk)
 {
struct netlink_sock *nlk;
 
@@ -761,7 +761,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
test_bit(0, &nlk->state)) {
DECLARE_WAITQUEUE(wait, current);
-   if (!timeo) {
+   if (!*timeo) {
if (!ssk || netlink_is_kernel(ssk))
netlink_overrun(sk);
sock_put(sk);
@@ -775,7 +775,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
 test_bit(0, &nlk->state)) &&
!sock_flag(sk, SOCK_DEAD))
-   timeo = schedule_timeout(timeo);
+   *timeo = schedule_timeout(*timeo);
 
__set_current_state(TASK_RUNNING);
remove_wait_queue(&nlk->wait, &wait);
@@ -783,7 +783,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
 
if (signal_pending(current)) {
kfree_skb(skb);
-   return sock_intr_errno(timeo);
+   return sock_intr_errno(*timeo);
}
return 1;
}
@@ -877,7 +877,7 @@ retry:
if (netlink_is_kernel(sk))
return netlink_unicast_kernel(sk, skb);
 
-   err = netlink_attachskb(sk, skb, nonblock, timeo, ssk);
+   err = netlink_attachskb(sk, skb, nonblock, &timeo, ssk);
if (err == 1)
goto retry;
if (err)
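
To illustrate why the remaining timeout has to travel back to the caller, here is a small userspace sketch of the retry pattern described in the changelog. It is only a model under stated assumptions: `fake_wait()` is a hypothetical stand-in for schedule_timeout() that pretends each wait consumes up to 30 ticks, and the `attach_*` and `retries_*` helpers are invented for this example; none of this is the kernel code.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for schedule_timeout(): each call "sleeps"
 * for up to 30 ticks and returns the time that is left.
 */
static long fake_wait(long timeo)
{
	return timeo > 30 ? timeo - 30 : 0;
}

/* By-value variant: the updated timeout is lost when we return. */
static int attach_by_value(long timeo)
{
	if (!timeo)
		return -1;              /* timed out */
	timeo = fake_wait(timeo);       /* only the local copy changes */
	(void)timeo;
	return 1;                       /* ask the caller to retry */
}

/* By-pointer variant: the caller sees the remaining timeout. */
static int attach_by_pointer(long *timeo)
{
	if (!*timeo)
		return -1;              /* timed out */
	*timeo = fake_wait(*timeo);
	return 1;                       /* ask the caller to retry */
}

/* Count retries; 'cap' keeps the by-value loop from running forever. */
static int retries_by_value(long timeo, int cap)
{
	int n = 0;

	while (n < cap && attach_by_value(timeo) == 1)
		n++;
	return n;
}

static int retries_by_pointer(long timeo, int cap)
{
	int n = 0;

	while (n < cap && attach_by_pointer(&timeo) == 1)
		n++;
	return n;
}
```

In this model a 100-tick timeout expires after four retries when passed by pointer, while the by-value variant restarts from the full 100 ticks on every retry and only stops at the artificial cap.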


Re: [PATCH] net: Add 405EX support to new EMAC driver

2007-11-04 Thread Olof Johansson
On Sun, Nov 04, 2007 at 02:37:59PM +1100, Benjamin Herrenschmidt wrote:
 
 On Fri, 2007-11-02 at 11:03 -0500, Olof Johansson wrote:
  On Fri, Nov 02, 2007 at 08:14:43AM +0100, Stefan Roese wrote:
   This patch adds support for the 405EX to the new EMAC driver. Same as on
   AXON, the 405EX handles the MDIO via the RGMII bridge.
  
  Hi,
  
  This isn't feedback on your patch as much as on new-emac in general:
  
  Isn't this the case where there should really be device tree properties
  instead? If you had an ibm,emac-has-axon-stacr property in the device
  node, then you don't have to modify the driver for every new board out
  there. Same for the other device properties, of course.
  
  I thought this was what having the device tree was all about. :(
 
 Somewhat yeah. There are subtle variations here or there we haven't
 totally identified... It might be a better option in our case here to
 add has-mdio to the rgmii nodes indeed.
 
 Part of the problem with those cells is that the chip folks keep
 changing things subtly from one rev to another though, it's not even
 totally clear to me yet whether the RGMII registers are totally
 compatible between axon and 405ex, which is why I've pretty much stuck to
 compatible properties to identify the variants.
 
 The device-tree can do both. It's still better than no device-tree since
 at least you know what cell variant is in there.

Well, it's better than compile-time ifdefs. Providing what version of
the device you have CAN be done without a device tree too. :-)

 As for the STACR, Axon isn't the first one to have that bit flipped, I
 think we should name the property differently, something like
 stacr-oc-inverted.

Sure, it was the habit of having to modify the driver for platforms that
don't add any new features I was against. I don't really care what the
properties are called :-)

 We can still use properties that way for new things in fact. As for EMAC
 on cell, well, I can always put some fixup somewhere.

Sounds good (with s/can still/should/).


-Olof


Re: TCP_DEFER_ACCEPT issues

2007-11-04 Thread dean gaudet
fwiw i also brought the TCP_DEFER_ACCEPT problems up at the end of last year:

http://www.mail-archive.com/netdev@vger.kernel.org/msg28916.html

it's possible the final message in that thread is how we should define the 
behaviour, i haven't tried the TCP_SYNCNT idea though.

-dean


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread Jarek Poplawski
Eric Dumazet wrote, On 11/04/2007 12:31 PM:

 David Miller wrote:
 From: Andi Kleen [EMAIL PROTECTED]
 Date: Sun, 4 Nov 2007 00:18:14 +0100

 On Thursday 01 November 2007 11:16:20 Eric Dumazet wrote:

...

 Also the EHASH_LOCK_SZ == 0 special case is a little strange. Why did
 you add that?
 He explained this in another reply, because ifdefs are ugly.


But I hope he was only joking, didn't he?

Let's make it clear: ifdefs are in K&R, so they are very nice! Just like
all C! (K, &, and R as well.)

You know, I can even imagine, there are people, who have K&R around their
beds, instead of some other book, so they could be serious about such 
things. (But, don't worry, it's not me - happily I'm not serious!)

This patch looks OK now, but a bit of grumbling shouldn't harm?:

...

 [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
 
 As done two years ago on IP route cache table (commit 
 22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per 
 hash bucket for the huge TCP/DCCP hash tables.
 
 On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for litle

- litle
+ little

... 

 +static inline int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 +{
 + unsigned int i, size = 256;
 +#if defined(CONFIG_PROVE_LOCKING)
 + unsigned int nr_pcpus = 2;
 +#else
 + unsigned int nr_pcpus = num_possible_cpus();
 +#endif
 + if (nr_pcpus >= 4)
 + size = 512;
 + if (nr_pcpus >= 8)
 + size = 1024;
 + if (nr_pcpus >= 16)
 + size = 2048;
 + if (nr_pcpus >= 32)
 + size = 4096;


It seems, maybe in the future this could look a bit nicer with some log
type shifting.
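
One possible reading of the "log type shifting" suggestion, as a userspace sketch. It assumes the comparisons in the quoted ladder are `>=` and takes the CPU count as a plain argument; `locks_size_ladder()` and `locks_size_shift()` are hypothetical names for this illustration, not code from the patch.

```c
#include <assert.h>

/* The size ladder from the quoted hunk, with nr_pcpus as an argument. */
static unsigned int locks_size_ladder(unsigned int nr_pcpus)
{
	unsigned int size = 256;

	if (nr_pcpus >= 4)
		size = 512;
	if (nr_pcpus >= 8)
		size = 1024;
	if (nr_pcpus >= 16)
		size = 2048;
	if (nr_pcpus >= 32)
		size = 4096;
	return size;
}

/*
 * Same mapping written as a doubling loop: every time the CPU count
 * can be halved and still be >= 4, the table doubles, up to 4096.
 */
static unsigned int locks_size_shift(unsigned int nr_pcpus)
{
	unsigned int size = 256;

	while (nr_pcpus >= 4 && size < 4096) {
		size <<= 1;
		nr_pcpus >>= 1;
	}
	return size;
}
```

Both functions return the same value for every CPU count; the loop form just makes the "double per power of two" rule explicit and keeps the 4096 cap in one place.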

 + if (sizeof(rwlock_t) != 0) {
 +#ifdef CONFIG_NUMA
 + if (size * sizeof(rwlock_t) > PAGE_SIZE)
 + hashinfo->ehash_locks = vmalloc(size * sizeof(rwlock_t));
 + else
 +#endif
 + hashinfo->ehash_locks = kmalloc(size * sizeof(rwlock_t),
 + GFP_KERNEL);
 + if (!hashinfo->ehash_locks)
 + return ENOMEM;


Probably doesn't matter now, but maybe more common?:
return -ENOMEM;

 + for (i = 0; i < size; i++)
 + rwlock_init(&hashinfo->ehash_locks[i]);


This looks better now, but still is doubtful to me: even if it's safe with
current rwlock implementation, can't we imagine some new debugging or
statistical code added, which would be called from rwlock_init() without
using rwlock_t structure? IMHO, if read_lock() etc. are called in such a
case, rwlock_init() should be done as well.

Regards,
Jarek P.


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread Jarek Poplawski
Jarek Poplawski wrote, On 11/04/2007 06:58 PM:

 Eric Dumazet wrote, On 11/04/2007 12:31 PM:

...

 +static inline int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 +{

...

 +if (sizeof(rwlock_t) != 0) {

...

 +for (i = 0; i < size; i++)
 +rwlock_init(&hashinfo->ehash_locks[i]);
 
 
 This looks better now, but still is doubtful to me: even if it's safe with
 current rwlock implementation, can't we imagine some new debugging or
 statistical code added, which would be called from rwlock_init() without
 using rwlock_t structure? IMHO, if read_lock() etc. are called in such a
 case, rwlock_init() should be done as well.


Of course I mean: if sizeof(rwlock_t) == 0.

 
Jarek P


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread Eric Dumazet

Jarek Poplawski wrote:

Jarek Poplawski wrote, On 11/04/2007 06:58 PM:


Eric Dumazet wrote, On 11/04/2007 12:31 PM:


...


+static inline int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
+{


...


+   if (sizeof(rwlock_t) != 0) {


...


+   for (i = 0; i < size; i++)
+   rwlock_init(&hashinfo->ehash_locks[i]);


This looks better now, but still is doubtful to me: even if it's safe with
current rwlock implementation, can't we imagine some new debugging or
statistical code added, which would be called from rwlock_init() without
using rwlock_t structure? IMHO, if read_lock() etc. are called in such a
case, rwlock_init() should be done as well.



Of course I mean: if sizeof(rwlock_t) == 0.


Given those two choices :

#if defined(CONFIG_SMP) || defined(CONFIG_PROVE__LOCKING)
kmalloc(sizeof(rwlock_t) * size);
#endif

and

   if (sizeof(rwlock_t) != 0) {
   kmalloc(sizeof(rwlock_t) * size);
   }

I prefer the 2nd one. Less error prone, and no need to remember how the 
gazillion CONFIG_something options we have are spelled.





Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread David Miller
From: Andi Kleen [EMAIL PROTECTED]
Date: Sun, 4 Nov 2007 13:26:38 +0100

 
  But I suspect distribution kernels enable CONFIG_HOTPLUG_CPU so 
  num_possible_cpus() will be NR_CPUS.
 
 Nope, on x86 num_possible_cpus() is derived from BIOS tables these days.

And similarly on SPARC64 it will be set based upon the
physical capabilities of the system.

This makes a huge difference as we have to set NR_CPUS to 4096
in order to handle the cpu numbering of some UltraSPARC-IV
machines.


[PATCH] phylib: Add ID for Marvell 88E1240

2007-11-04 Thread Olof Johansson
Add PHY IDs for Marvell 88E1240. It seems to have close enough programming
models to /1112 for basic support at least.

Also clean up whitespace in the ID list a bit.


Signed-off-by: Olof Johansson [EMAIL PROTECTED]

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index d2ede5f..035fd41 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -265,7 +265,7 @@ static struct phy_driver marvell_drivers[] = {
.read_status = genphy_read_status,
.ack_interrupt = marvell_ack_interrupt,
.config_intr = marvell_config_intr,
-   .driver = {.owner = THIS_MODULE,},
+   .driver = { .owner = THIS_MODULE },
},
{
.phy_id = 0x01410c90,
@@ -278,7 +278,7 @@ static struct phy_driver marvell_drivers[] = {
.read_status = genphy_read_status,
.ack_interrupt = marvell_ack_interrupt,
.config_intr = marvell_config_intr,
-   .driver = {.owner = THIS_MODULE,},
+   .driver = { .owner = THIS_MODULE },
},
{
.phy_id = 0x01410cc0,
@@ -291,7 +291,7 @@ static struct phy_driver marvell_drivers[] = {
.read_status = genphy_read_status,
.ack_interrupt = marvell_ack_interrupt,
.config_intr = marvell_config_intr,
-   .driver = {.owner = THIS_MODULE,},
+   .driver = { .owner = THIS_MODULE },
},
{
.phy_id = 0x01410cd0,
@@ -304,8 +304,21 @@ static struct phy_driver marvell_drivers[] = {
.read_status = genphy_read_status,
.ack_interrupt = marvell_ack_interrupt,
.config_intr = marvell_config_intr,
-   .driver = {.owner = THIS_MODULE,},
-   }
+   .driver = { .owner = THIS_MODULE },
+   },
+   {
+   .phy_id = 0x01410e30,
+   .phy_id_mask = 0xfff0,
+   .name = "Marvell 88E1240",
+   .features = PHY_GBIT_FEATURES,
+   .flags = PHY_HAS_INTERRUPT,
+   .config_init = m88e_config_init,
+   .config_aneg = marvell_config_aneg,
+   .read_status = genphy_read_status,
+   .ack_interrupt = marvell_ack_interrupt,
+   .config_intr = marvell_config_intr,
+   .driver = { .owner = THIS_MODULE },
+   },
 };
 
 static int __init marvell_init(void)
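
For readers unfamiliar with how a `.phy_id` / `.phy_id_mask` pair selects a driver, here is a small sketch of masked ID matching. The struct and function names are hypothetical, and the mask value used in the usage note (0xfffffff0, ignoring the low revision nibble) is an assumption for illustration rather than something taken from the patch text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down version of a PHY driver table entry. */
struct demo_phy_driver {
	uint32_t    phy_id;      /* ID the driver is written for */
	uint32_t    phy_id_mask; /* bits of the ID that must match */
	const char *name;
};

/*
 * A driver matches a device when the two IDs agree on every bit
 * selected by the mask; masked-out bits (for example a revision
 * field) are ignored.
 */
static int demo_phy_match(const struct demo_phy_driver *drv, uint32_t hw_id)
{
	return (hw_id & drv->phy_id_mask) == (drv->phy_id & drv->phy_id_mask);
}
```

With an entry of { 0x01410e30, 0xfffffff0, "Marvell 88E1240" }, a device reporting 0x01410e31 still matches, while 0x01410cd0 does not.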


[PATCH] phylib: Silence driver registration

2007-11-04 Thread Olof Johansson
It gets quite verbose to see every single PHY driver being registered
by default.

Signed-off-by: Olof Johansson [EMAIL PROTECTED]

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index c046121..f6e4848 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -706,7 +706,7 @@ int phy_driver_register(struct phy_driver *new_driver)
return retval;
}
 
-   pr_info("%s: Registered new driver\n", new_driver->name);
+   pr_debug("%s: Registered new driver\n", new_driver->name);
 
return 0;
 }


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread Jarek Poplawski
Eric Dumazet wrote, On 11/04/2007 10:23 PM:

 Jarek Poplawski wrote:
 Jarek Poplawski wrote, On 11/04/2007 06:58 PM:

 Eric Dumazet wrote, On 11/04/2007 12:31 PM:
 ...

 +static inline int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 +{
 ...

 +  if (sizeof(rwlock_t) != 0) {
 ...

 +  for (i = 0; i < size; i++)
 +  rwlock_init(&hashinfo->ehash_locks[i]);
 This looks better now, but still is doubtful to me: even if it's safe with
 current rwlock implementation, can't we imagine some new debugging or
 statistical code added, which would be called from rwlock_init() without
 using rwlock_t structure? IMHO, if read_lock() etc. are called in such a
 case, rwlock_init() should be done as well.

 Of course I mean: if sizeof(rwlock_t) == 0.
 
 Given those two choices :
 
 #if defined(CONFIG_SMP) || defined(CONFIG_PROVE__LOCKING)
  kmalloc(sizeof(rwlock_t) * size);
 #endif
 
 and
 
 if (sizeof(rwlock_t) != 0) {
 kmalloc(sizeof(rwlock_t) * size);
 }
 
 I prefer the 2nd one. Less error prone, and no need to remember how the 
 gazillion CONFIG_something options we have are spelled.


I've written it's better, too. But this could be improved yet (someday),
I hope.

Thanks,
Jarek P.


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread Andi Kleen
On Sunday 04 November 2007 22:56:21 David Miller wrote:
 From: Andi Kleen [EMAIL PROTECTED]

 This makes a huge difference as we have to set NR_CPUS to 4096
 in order to handle the cpu numbering of some UltraSPARC-IV
 machines.

Really? Hopefully you have a large enough stack then. There 
are various users who put char str[NR_CPUS] on the stack
and a few other data structures also get incredibly big with
NR_CPUS arrays.

If it's for sparse cpu ids -- x86 handles those with a 
translation array.

-Andi


[PATCH] postfix decrement error in smc911x_reset(); drivers/net/smc911x.c

2007-11-04 Thread Roel Kluin
If timeout causes the loop to end, a postfix decrement causes its value to 
become 4294967295 (unsigned int), not 0.

Signed-off-by: Roel Kluin [EMAIL PROTECTED]
---
diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
index dd18af0..41f3c8f 100644
--- a/drivers/net/smc911x.c
+++ b/drivers/net/smc911x.c
@@ -243,7 +243,7 @@ static void smc911x_reset(struct net_device *dev)
do {
udelay(10);
reg = SMC_GET_PMT_CTRL() & PMT_CTRL_READY_;
-   } while ( timeout-- && !reg);
+   } while ( --timeout && !reg);
if (timeout == 0) {
PRINTK("%s: smc911x_reset timeout waiting for PM restore\n", dev->name);
return;
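
The wraparound described in the changelog can be reproduced in userspace. The sketch below assumes the worst case, a device that never becomes ready, so that the loop ends only because the counter runs out; `poll_postfix()` and `poll_prefix()` are hypothetical helpers for this illustration, not driver code.

```c
#include <assert.h>
#include <limits.h>

/*
 * Postfix form: the test reads the old value, then decrements.  When
 * the old value is 0 the loop exits, but the decrement still happens,
 * so the unsigned counter wraps to UINT_MAX.
 */
static unsigned int poll_postfix(unsigned int timeout)
{
	int ready = 0;  /* assumption: the device never becomes ready */

	do {
		/* a real driver would delay and read a status register here */
	} while (timeout-- && !ready);
	return timeout;
}

/*
 * Prefix form: decrement first, then test the new value, so the
 * counter is exactly 0 when the loop gives up.  Like the patched
 * loop, this expects a non-zero starting value.
 */
static unsigned int poll_prefix(unsigned int timeout)
{
	int ready = 0;  /* assumption: the device never becomes ready */

	do {
		/* a real driver would delay and read a status register here */
	} while (--timeout && !ready);
	return timeout;
}
```

Only the prefix form leaves the counter at 0 after a timeout, which is what the `if (timeout == 0)` check that follows the loop relies on.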



Re: [PATCH] postfix decrement error in smc911x_reset(); drivers/net/smc911x.c

2007-11-04 Thread Roel Kluin
And there was another one...
--
If timeout causes the loop to end, a postfix decrement causes its value to 
become 4294967295 (unsigned int), not 0.

Signed-off-by: Roel Kluin [EMAIL PROTECTED]
---
diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
index dd18af0..fac1d2a 100644
--- a/drivers/net/smc911x.c
+++ b/drivers/net/smc911x.c
@@ -243,7 +243,7 @@ static void smc911x_reset(struct net_device *dev)
do {
udelay(10);
reg = SMC_GET_PMT_CTRL() & PMT_CTRL_READY_;
-   } while ( timeout-- && !reg);
+   } while ( --timeout && !reg);
if (timeout == 0) {
PRINTK("%s: smc911x_reset timeout waiting for PM restore\n", dev->name);
return;
@@ -267,7 +267,7 @@ static void smc911x_reset(struct net_device *dev)
resets++;
break;
}
-   } while ( timeout-- && (reg & HW_CFG_SRST_));
+   } while ( --timeout && (reg & HW_CFG_SRST_));
}
if (timeout == 0) {
PRINTK("%s: smc911x_reset timeout waiting for reset\n", dev->name);



Re: Endianness problem with u32 classifier hash masks

2007-11-04 Thread jamal
On Sun, 2007-04-11 at 02:17 +0100, Jarek Poplawski wrote:

 So, even if not full ntohl(), some byte moving seems to be
 necessary here.

I think you were close. I am afraid my brain is congested, even the
espresso didn't help my thinking. 
It could be done with just fshift on the slow path (config time) if one
were to think hard ;-> I am not too happy with the extra conversion on the
fast path, but how about the untested attached patch?

cheers,
jamal
 
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 9e98c6e..6dd569b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -93,7 +93,7 @@ static __inline__ unsigned u32_hash_fold(u32 key, struct tc_u32_sel *sel, u8 fsh
 {
 	unsigned h = (key & sel->hmask)>>fshift;
 
-	return h;
+	return ntohl(h);
 }
 
 static int u32_classify(struct sk_buff *skb, struct tcf_proto *tp, struct tcf_result *res)
@@ -615,7 +615,7 @@ static int u32_change(struct tcf_proto *tp, unsigned long base, u32 handle,
 	n->handle = handle;
 {
 	u8 i = 0;
-	u32 mask = s->hmask;
+	u32 mask = ntohl(s->hmask);
 	if (mask) {
 		while (!(mask & 1)) {
 			i++;
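
For readers following the byte-order argument, here is a user-space sketch of
the idea under discussion. It is an assumption-laden illustration, not the
patch above: mask_shift and hash_fold are hypothetical names, and this version
converts the masked key to host order before shifting, which is one possible
ordering and not necessarily what the untested patch does.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Number of trailing zero bits of a host-order mask (0 for an empty mask).
 * This corresponds to the config-time fshift computation. */
static unsigned int mask_shift(uint32_t host_mask)
{
	unsigned int i = 0;

	if (!host_mask)
		return 0;
	while (!(host_mask & 1)) {
		host_mask >>= 1;
		i++;
	}
	return i;
}

/* Fold a network-order key with a network-order mask. The AND is done in
 * network order, the result is converted to host order, and only then
 * shifted, so the outcome does not depend on the host's endianness. */
static uint32_t hash_fold(uint32_t key_be, uint32_t hmask_be)
{
	unsigned int fshift = mask_shift(ntohl(hmask_be));

	return ntohl(key_be & hmask_be) >> fshift;
}
```

For example, hash_fold(htonl(0x12345678), htonl(0x00000ff0)) selects the bits
0x670 and shifts them down by 4, giving 0x67 on both little- and big-endian
hosts.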


Re: [PATCH] postfix decrement error in smc911x_reset(); drivers/net/smc911x.c

2007-11-04 Thread Roel Kluin
Darn, another. Sorry for the noise; I also removed a whitespace in this one
--
If timeout causes the loop to end, a postfix decrement causes its value to 
become 4294967295 (unsigned int), not 0.

Signed-off-by: Roel Kluin [EMAIL PROTECTED]
---
diff --git a/drivers/net/smc911x.c b/drivers/net/smc911x.c
index dd18af0..6a2d236 100644
--- a/drivers/net/smc911x.c
+++ b/drivers/net/smc911x.c
@@ -243,7 +243,7 @@ static void smc911x_reset(struct net_device *dev)
 		do {
 			udelay(10);
 			reg = SMC_GET_PMT_CTRL() & PMT_CTRL_READY_;
-		} while ( timeout-- && !reg);
+		} while (--timeout && !reg);
 		if (timeout == 0) {
 			PRINTK("%s: smc911x_reset timeout waiting for PM restore\n", dev->name);
 			return;
@@ -267,7 +267,7 @@ static void smc911x_reset(struct net_device *dev)
 				resets++;
 				break;
 			}
-		} while ( timeout-- && (reg & HW_CFG_SRST_));
+		} while (--timeout && (reg & HW_CFG_SRST_));
 	}
 	if (timeout == 0) {
 		PRINTK("%s: smc911x_reset timeout waiting for reset\n", dev->name);
@@ -413,7 +413,7 @@ static inline void smc911x_drop_pkt(struct net_device *dev)
 		do {
 			udelay(10);
 			reg = SMC_GET_RX_DP_CTRL() & RX_DP_CTRL_FFWD_BUSY_;
-		} while ( timeout-- && reg);
+		} while (--timeout && reg);
 		if (timeout == 0) {
 			PRINTK("%s: timeout waiting for RX fast forward\n", dev->name);
 		}


[patch 0/2] ipvs: avoid overcommit on the standby, take III

2007-11-04 Thread horms
Two related patches from Rumen G. Bogdanovski
to help prevent overcommit on the standby.


On the last two attempts I managed to send somewhat bogus patches.
So I started from scratch. I took the original patches, fixed
up what scripts/checkpatch.pl didn't like, then compared the output
to my previous attempt, which happily showed that the bogus bits
I know about have been fixed.

-- 

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/



[patch 2/2] ipvs: Synchronise Closing of Connections

2007-11-04 Thread horms
From: Rumen G. Bogdanovski [EMAIL PROTECTED]

This patch makes the master daemon sync a connection when it is about
to close.  This lets the connections on the backup close or time out
according to their state.  Previously the sync was performed only if the
connection was in the ESTABLISHED state, which always made the connections
time out after the hard-coded 3 minutes.  However, Andy Gospodarek's patch
([IPVS]: use proper timeout instead of fixed value) effectively did nothing
more than increase this to 15 minutes (the ESTABLISHED state timeout).  So
this patch makes use of the proper timeout, since it syncs the connections
on status changes to FIN_WAIT (2 min timeout) and CLOSE (10 sec timeout).
If the backup misses CLOSE, hopefully it did not miss FIN_WAIT.
Otherwise we will just have to wait for the ESTABLISHED state timeout, as
is the case without this patch.  This way the number of hanging connections
on the backup is kept to a minimum, and very few of them will be left to
expire with a long timeout.

This is important if we want to make use of the fix for the real server
overcommit on master/backup fail-over.

Regards,
Rumen Bogdanovski

Signed-off-by: Rumen G. Bogdanovski [EMAIL PROTECTED]
Signed-off-by: Simon Horman [EMAIL PROTECTED]

--- 
Thu, 01 Nov 2007 18:25:10 +0900, Horms
* Rediffed for net-2.6
* Ran through scripts/checkpatch.pl and fixed up everything
  that it complains about except the use of volatile, as
  it's in keeping with other fields in the structure.
  If it's wrong, let's fix them all together.

WARNING: Use of volatile is usually wrong: see
Documentation/volatile-considered-harmful.txt
#49: FILE: include/net/ip_vs.h:523:
+   volatile __u16  old_state;  /* old state, to be used for

Index: net-2.6/include/net/ip_vs.h
===
--- net-2.6.orig/include/net/ip_vs.h2007-11-05 11:37:45.0 +0900
+++ net-2.6/include/net/ip_vs.h 2007-11-05 11:37:49.0 +0900
@@ -520,6 +520,10 @@ struct ip_vs_conn {
spinlock_t  lock;   /* lock for state transition */
volatile __u16  flags;  /* status flags */
volatile __u16  state;  /* state info */
+   volatile __u16  old_state;  /* old state, to be used for
+* state transition triggerd
+* synchronization
+*/
 
/* Control members */
struct ip_vs_conn   *control;   /* Master control connection */
Index: net-2.6/net/ipv4/ipvs/ip_vs_core.c
===
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_core.c 2007-11-05 11:37:45.0 
+0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_core.c  2007-11-05 11:37:49.0 +0900
@@ -979,15 +979,23 @@ ip_vs_in(unsigned int hooknum, struct sk
 		ret = NF_ACCEPT;
 	}
 
-	/* increase its packet counter and check if it is needed
-	   to be synchronized */
+	/* Increase its packet counter and check if it is needed
+	 * to be synchronized
+	 *
+	 * Sync connection if it is about to close to
+	 * encourage the standby servers to update the connections timeout
+	 */
 	atomic_inc(&cp->in_pkts);
 	if ((ip_vs_sync_state & IP_VS_STATE_MASTER) &&
-	    (cp->protocol != IPPROTO_TCP ||
-	     cp->state == IP_VS_TCP_S_ESTABLISHED) &&
-	    (atomic_read(&cp->in_pkts) % sysctl_ip_vs_sync_threshold[1]
-	     == sysctl_ip_vs_sync_threshold[0]))
+	    (((cp->protocol != IPPROTO_TCP ||
+	       cp->state == IP_VS_TCP_S_ESTABLISHED) &&
+	      (atomic_read(&cp->in_pkts) % sysctl_ip_vs_sync_threshold[1]
+	       == sysctl_ip_vs_sync_threshold[0])) ||
+	     ((cp->protocol == IPPROTO_TCP) && (cp->old_state != cp->state) &&
+	      ((cp->state == IP_VS_TCP_S_FIN_WAIT) ||
+	       (cp->state == IP_VS_TCP_S_CLOSE)))))
 		ip_vs_sync_conn(cp);
+	cp->old_state = cp->state;
 
 	ip_vs_conn_put(cp);
 	return ret;
Index: net-2.6/net/ipv4/ipvs/ip_vs_sync.c
===
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_sync.c 2007-11-05 11:37:45.0 
+0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_sync.c  2007-11-05 11:37:49.0 +0900
@@ -344,7 +344,6 @@ static void ip_vs_process_message(const 
if (!dest) {
/* it is an unbound entry created by
 * synchronization */
-   cp->state = ntohs(s->state);
cp->flags = flags | IP_VS_CONN_F_HASHED;
} else
atomic_dec(&dest->refcnt);
@@ -359,6 +358,7 @@ static void ip_vs_process_message(const 
p += SIMPLE_CONN_SIZE;
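
The sync decision described in this patch can be restated as a small
predicate. This is a user-space sketch under stated assumptions, not the
kernel code: the enums, struct sync_conn and should_sync are hypothetical
stand-ins, and threshold/period play the role of
sysctl_ip_vs_sync_threshold[0] and [1] (period must be non-zero).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's protocol and TCP state values. */
enum sync_proto { SYNC_PROTO_TCP, SYNC_PROTO_OTHER };
enum sync_state { ST_ESTABLISHED, ST_FIN_WAIT, ST_CLOSE, ST_OTHER };

struct sync_conn {
	enum sync_proto protocol;
	enum sync_state state;		/* current state */
	enum sync_state old_state;	/* state seen at the previous packet */
	unsigned int in_pkts;		/* packets seen so far */
};

/* Return true if the master should sync this connection to the backup. */
static bool should_sync(const struct sync_conn *cp,
			unsigned int threshold, unsigned int period)
{
	/* Existing rule: periodic sync of non-TCP or ESTABLISHED connections. */
	bool periodic = (cp->protocol != SYNC_PROTO_TCP ||
			 cp->state == ST_ESTABLISHED) &&
			(cp->in_pkts % period == threshold);
	/* New rule: sync once when a TCP connection enters FIN_WAIT or CLOSE. */
	bool closing = cp->protocol == SYNC_PROTO_TCP &&
		       cp->old_state != cp->state &&
		       (cp->state == ST_FIN_WAIT || cp->state == ST_CLOSE);

	return periodic || closing;
}
```

After each packet the caller would copy state into old_state, as the patch
does, so that the closing rule triggers only on the transition itself.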
 
 

[patch 1/2] ipvs: Bind connections on standby if the destination exists

2007-11-04 Thread horms
From: Rumen G. Bogdanovski [EMAIL PROTECTED]

This patch fixes the problem with node overload on director fail-over.
Consider this scenario: 2 nodes, each accepting 3 connections at a time,
and 2 directors. If director fail-over occurs when the nodes are fully
loaded (6 connections to the cluster), the new director will assign
another 6 connections to the cluster, if the same real servers exist
there.

The problem turned out to be that the inherited connections were not
bound to the real servers (destinations) on the backup director.
Therefore ipvsadm -l reports 0 connections:
[EMAIL PROTECTED]:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          0
  -> node484.local:5999           Route   1000   0          0

while ipvsadm -lnc is right:
[EMAIL PROTECTED]:~# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999
TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999

So the patch I am sending fixes the problem by binding the received
connections to the appropriate service on the backup director, if it
exists, else the connection will be handled the old way. So if the
master and the backup directors are synchronized in terms of real
services there will be no problem with server over-committing since
new connections will not be created on the nonexistent real services
on the backup. However if the service is created later on the backup,
the binding will be performed when the next connection update is
received. With this patch the inherited connections will show as
inactive on the backup:

[EMAIL PROTECTED]:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          1
  -> node484.local:5999           Route   1000   0          1

[EMAIL PROTECTED]:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A800DE:176F wlc
  -> C0A80033:176F      Route   1000   0          1
  -> C0A80032:176F      Route   1000   0          1
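
The bind-if-it-exists behaviour described above can be sketched in user
space roughly as follows. Everything here is a simplified assumption: the
bk_ names are hypothetical, the lookup uses only the destination address and
port rather than the full {daddr,dport,vaddr,vport,protocol} key, and every
bound connection is simply counted as inactive to mirror the listings above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much reduced versions of a real server and a connection. */
struct bk_dest {
	uint32_t addr;
	uint16_t port;
	unsigned int inactconns;	/* inherited connections bound here */
};

struct bk_conn {
	uint32_t daddr;
	uint16_t dport;
	struct bk_dest *dest;		/* NULL while unbound */
};

/* Linear lookup of a destination by address and port. */
static struct bk_dest *bk_find_dest(struct bk_dest *tbl, size_t n,
				    uint32_t addr, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].addr == addr && tbl[i].port == port)
			return &tbl[i];
	return NULL;
}

/* Bind an unbound connection if its destination exists on this director.
 * Returns the destination, or NULL if nothing was bound. */
static struct bk_dest *bk_try_bind(struct bk_conn *cp,
				   struct bk_dest *tbl, size_t n)
{
	struct bk_dest *dest;

	if (!cp || cp->dest)
		return NULL;
	dest = bk_find_dest(tbl, n, cp->daddr, cp->dport);
	if (dest) {
		cp->dest = dest;
		dest->inactconns++;
	}
	return dest;
}
```

A connection whose destination is not configured yet stays unbound, and a
later sync update can retry the bind once the destination has been added.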


Regards,
Rumen Bogdanovski

Acked-by: Julian Anastasov [EMAIL PROTECTED]
Signed-off-by: Rumen G. Bogdanovski [EMAIL PROTECTED]
Signed-off-by: Simon Horman [EMAIL PROTECTED]

--- 
Mon, 05 Nov 2007 11:33:33 +0900
* Various whitespace and indentation changes
* Rediffed against net-2.6
* Ran against ./scripts/checkpatch.pl and fixed everything that
  it complained about

Index: net-2.6/include/net/ip_vs.h
===
--- net-2.6.orig/include/net/ip_vs.h2007-11-05 11:23:58.0 +0900
+++ net-2.6/include/net/ip_vs.h 2007-11-05 11:25:51.0 +0900
@@ -901,6 +901,10 @@ extern int ip_vs_use_count_inc(void);
 extern void ip_vs_use_count_dec(void);
 extern int ip_vs_control_init(void);
 extern void ip_vs_control_cleanup(void);
+extern struct ip_vs_dest *
+ip_vs_find_dest(__be32 daddr, __be16 dport,
+__be32 vaddr, __be16 vport, __u16 protocol);
+extern struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp);
 
 
 /*
Index: net-2.6/net/ipv4/ipvs/ip_vs_conn.c
===
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_conn.c 2007-11-05 11:23:58.0 
+0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_conn.c  2007-11-05 11:25:51.0 +0900
@@ -426,6 +426,25 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, s
 
 
 /*
+ * Check if there is a destination for the connection, if so
+ * bind the connection to the destination.
+ */
+struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
+{
+	struct ip_vs_dest *dest;
+
+	if ((cp) && (!cp->dest)) {
+		dest = ip_vs_find_dest(cp->daddr, cp->dport,
+				       cp->vaddr, cp->vport, cp->protocol);
+		ip_vs_bind_dest(cp, dest);
+		return dest;
+	} else
+		return NULL;
+}
+EXPORT_SYMBOL(ip_vs_try_bind_dest);
+
+
+/*
  * Unbind a connection entry with its VS destination
  * Called by the ip_vs_conn_expire function.
  */
Index: net-2.6/net/ipv4/ipvs/ip_vs_ctl.c
===
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_ctl.c  2007-11-05 11:23:58.0 
+0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_ctl.c   2007-11-05 11:25:51.0 +0900
@@ -579,6 +579,32 @@ ip_vs_lookup_dest(struct ip_vs_service *
return NULL;
 }
 
+/*
+ * Find destination by {daddr,dport,vaddr,protocol}
+ * Created to be used in ip_vs_process_message() in
+ * the backup 

Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread David Miller
From: Andi Kleen [EMAIL PROTECTED]
Date: Mon, 5 Nov 2007 00:01:03 +0100

 On Sunday 04 November 2007 22:56:21 David Miller wrote:
  From: Andi Kleen [EMAIL PROTECTED]
 
  This makes a huge difference as we have to set NR_CPUS to 4096
  in order to handle the cpu numbering of some UltraSPARC-IV
  machines.
 
 Really? Hopefully you have a large enough stack then. There 
 are various users who put char str[NR_CPUS] on the stack
 and a few other data structures also get incredibly big with
 NR_CPUS arrays.

For the stack case there is one debugging case, and that's for
sprintf'ing cpusets.  That could be easily eliminated.

 If it's for sparse cpu ids -- x86 handles those with a
 translation array.

I would rather not do this, so much assembler code indexes straight
into the per-cpu arrays.
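
For context, a translation array for sparse cpu ids could look roughly like
the sketch below. It is a generic illustration, not x86 or sparc64 code:
struct cpu_map, cpu_map_add, cpu_map_logical, MAX_LOGICAL_CPUS and NO_CPU are
all hypothetical names.

```c
#include <stddef.h>

#define MAX_LOGICAL_CPUS 8	/* hypothetical count of CPUs actually present */
#define NO_CPU (-1)

struct cpu_map {
	int hw_id[MAX_LOGICAL_CPUS];	/* logical index -> hardware id */
	size_t count;			/* logical indices in use */
};

/* Register a hardware id; returns its dense logical index,
 * or NO_CPU if the map is full. */
static int cpu_map_add(struct cpu_map *m, int hw_id)
{
	if (m->count >= MAX_LOGICAL_CPUS)
		return NO_CPU;
	m->hw_id[m->count] = hw_id;
	return (int)m->count++;
}

/* Translate a hardware id to its logical index by linear search. */
static int cpu_map_logical(const struct cpu_map *m, int hw_id)
{
	for (size_t i = 0; i < m->count; i++)
		if (m->hw_id[i] == hw_id)
			return (int)i;
	return NO_CPU;
}
```

Per-cpu arrays can then be sized by MAX_LOGICAL_CPUS instead of the largest
hardware id. The cost is the one raised above: code that indexes per-cpu
arrays directly by hardware id would need the extra translation step.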


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-04 Thread David Miller
From: David Miller [EMAIL PROTECTED]
Date: Sun, 04 Nov 2007 20:24:54 -0800 (PST)

 From: Andi Kleen [EMAIL PROTECTED]
 Date: Mon, 5 Nov 2007 00:01:03 +0100
 
  If it's for sparse cpu ids -- x86 handles those with a
  translation array.
 
 I would rather not do this, so much assembler code indexes straight
 into the per-cpu arrays.

Also, at current rates, I'll need to be able to support
4096 cpus for real not very long from now :-)