date:20070306

[Patch 0/3] ucc_geth updates for 2.6.21-rc2

2007-03-06 Thread Li Yang


Jeff,

Please apply.

Thanks

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Patch 1/3] ucc_geth: Fix BD processing

2007-03-06 Thread Li Yang


Fix broken BD processing code.

Signed-off-by: Michael Barkowski [EMAIL PROTECTED]
Signed-off-by: Li Yang [EMAIL PROTECTED]

---
drivers/net/ucc_geth.c |   14 +++++++++-----
1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 885e73d..639e1e6 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -3598,9 +3598,9 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)

/* Move to next BD in the ring */
if (!(bd_status & T_W))
-   ugeth->txBd[txQ] = bd + sizeof(struct qe_bd);
+   bd += sizeof(struct qe_bd);
else
-   ugeth->txBd[txQ] = ugeth->p_tx_bd_ring[txQ];
+   bd = ugeth->p_tx_bd_ring[txQ];

/* If the next BD still needs to be cleaned up, then the bds
   are full.  We need to tell the kernel to stop sending us stuff. */
@@ -3609,6 +3609,8 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
netif_stop_queue(dev);
}

+   ugeth->txBd[txQ] = bd;
+
if (ugeth->p_scheduler) {
ugeth->cpucount[txQ]++;
/* Indicate to QE that there are more Tx bds ready for
@@ -3722,7 +3724,7 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
/* Handle the transmitted buffer and release */
/* the BD to be used with the current frame  */

-   if ((bd = ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
+   if ((bd == ugeth->txBd[txQ]) && (netif_queue_stopped(dev) == 0))
break;

ugeth->stats.tx_packets++;
@@ -3741,10 +3743,12 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)

/* Advance the confirmation BD pointer */
if (!(bd_status & T_W))
-   ugeth->confBd[txQ] += sizeof(struct qe_bd);
+   bd += sizeof(struct qe_bd);
else
-   ugeth->confBd[txQ] = ugeth->p_tx_bd_ring[txQ];
+   bd = ugeth->p_tx_bd_ring[txQ];
+   bd_status = in_be32((u32 *)bd);
}
+   ugeth->confBd[txQ] = bd;
return 0;
}
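The descriptor-ring walk that this patch fixes can be modelled in a few lines of userspace C. This is a sketch under simplifying assumptions, not the driver's code: struct bd, BD_WRAP (standing in for T_W, with an arbitrary value), next_bd() and advance() are hypothetical names, and the driver steps a u8 pointer by sizeof(struct qe_bd) where this sketch uses typed pointer arithmetic. What it illustrates is the shape of the fix: the walk happens on a local pointer, and the shared producer pointer is written once at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified buffer descriptor holding only a status word.
 * BD_WRAP stands in for the driver's T_W bit; its value here is assumed. */
#define BD_WRAP 0x2000u

struct bd {
	uint32_t status;
};

/* Return the descriptor after 'bd' in the ring starting at 'ring':
 * the next slot, or the ring start if 'bd' has the wrap bit set. */
static struct bd *next_bd(struct bd *ring, struct bd *bd)
{
	if (!(bd->status & BD_WRAP))
		return bd + 1;
	return ring;
}

/* Advance a producer pointer 'steps' times. The walk uses a local copy,
 * and *shared is stored exactly once afterwards, mirroring the patch's
 * single "ugeth->txBd[txQ] = bd;" after the ring-full check. */
static void advance(struct bd *ring, struct bd **shared, int steps)
{
	struct bd *bd = *shared;

	while (steps-- > 0)
		bd = next_bd(ring, bd);
	*shared = bd;
}
```

With a four-entry ring whose last entry carries BD_WRAP, advancing from slot 0 five times lands on slot 1. The ucc_geth_tx() hunk fixes a separate bug: the old condition `(bd = ugeth->txBd[txQ])` assigned instead of comparing, overwriting bd and testing only that txBd was non-NULL.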


[Patch 2/3] ucc_geth: returns NETDEV_TX_BUSY when BD ring is full

2007-03-06 Thread Li Yang


Return NETDEV_TX_BUSY when the BD ring is full.

Signed-off-by: Li Yang [EMAIL PROTECTED]
---
drivers/net/ucc_geth.c |    3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 639e1e6..dab88b9 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -3607,6 +3607,7 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)
if (bd == ugeth->confBd[txQ]) {
if (!netif_queue_stopped(dev))
netif_stop_queue(dev);
+   return NETDEV_TX_BUSY;
}

ugeth->txBd[txQ] = bd;
@@ -3622,7 +3623,7 @@ static int ucc_geth_start_xmit(struct sk_buff *skb, struct net_device *dev)

spin_unlock_irq(&ugeth->lock);

-   return 0;
+   return NETDEV_TX_OK;
}

static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit)
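A minimal model of the ring-full test follows. It is a simplified sketch, not the driver: XMIT_OK and XMIT_BUSY are hypothetical stand-ins for NETDEV_TX_OK and NETDEV_TX_BUSY, struct ring and xmit() are invented for illustration, and it checks the next slot before committing the producer position rather than reproducing the driver's exact ordering. One slot is always left unused so that a full ring (next tx == conf) can be told apart from an empty one (tx == conf).

```c
#include <assert.h>

/* Hypothetical stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY. */
enum xmit_ret { XMIT_OK, XMIT_BUSY };

/* Minimal ring state: 'tx' is the next slot the producer will fill,
 * 'conf' is the oldest slot still awaiting Tx confirmation. */
struct ring {
	unsigned int tx;
	unsigned int conf;
	unsigned int len;
	int stopped;	/* models netif_queue_stopped() */
};

/* Queue one frame. If advancing 'tx' would land on 'conf', every usable
 * slot is still owned by hardware: stop the queue and report busy
 * without moving the producer position. */
static enum xmit_ret xmit(struct ring *r)
{
	unsigned int next = (r->tx + 1) % r->len;

	if (next == r->conf) {
		r->stopped = 1;
		return XMIT_BUSY;
	}
	r->tx = next;
	return XMIT_OK;
}
```

With len = 4, three frames are accepted; the fourth call reports busy and marks the queue stopped.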


[Patch 3/3] ucc_geth: suppress compile warnings

2007-03-06 Thread Li Yang


Add casts to suppress warnings introduced by the kmalloc cast cleanup patch.

Signed-off-by: Li Yang [EMAIL PROTECTED]
---
drivers/net/ucc_geth.c |    6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index dab88b9..c2db49b 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -2804,7 +2804,8 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
if (UCC_GETH_TX_BD_RING_ALIGNMENT > 4)
align = UCC_GETH_TX_BD_RING_ALIGNMENT;
ugeth->tx_bd_ring_offset[j] =
-   kmalloc((u32) (length + align), GFP_KERNEL);
+   (u32)kmalloc((u32) (length + align),
+   GFP_KERNEL);

if (ugeth->tx_bd_ring_offset[j] != 0)
ugeth->p_tx_bd_ring[j] =
@@ -2840,7 +2841,8 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
if (UCC_GETH_RX_BD_RING_ALIGNMENT > 4)
align = UCC_GETH_RX_BD_RING_ALIGNMENT;
ugeth->rx_bd_ring_offset[j] =
-   kmalloc((u32) (length + align), GFP_KERNEL);
+   (u32)kmalloc((u32) (length + align),
+   GFP_KERNEL);
if (ugeth->rx_bd_ring_offset[j] != 0)
ugeth->p_rx_bd_ring[j] =
(void*)((ugeth->rx_bd_ring_offset[j] +
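
Casting a pointer to u32, as the hunks above do, preserves it only where pointers are 32 bits wide. For comparison, here is a portable sketch of the same "allocate length + align bytes, then round up" idea using uintptr_t. align_up() and alloc_aligned() are hypothetical helpers, malloc() stands in for kmalloc(), and align is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round 'p' up to the next multiple of 'align' (a power of two).
 * uintptr_t can hold any object pointer, unlike a fixed 32-bit
 * integer on 64-bit targets. */
static void *align_up(void *p, uintptr_t align)
{
	uintptr_t v = (uintptr_t)p;

	return (void *)((v + align - 1) & ~(align - 1));
}

/* Allocate 'length' usable bytes aligned to 'align'. The raw pointer is
 * returned through *raw because that, not the aligned one, must be
 * passed to free() later. */
static void *alloc_aligned(size_t length, uintptr_t align, void **raw)
{
	*raw = malloc(length + align);
	if (*raw == NULL)
		return NULL;
	return align_up(*raw, align);
}
```

The caller keeps the raw pointer for free(); the aligned pointer is valid only for the length bytes requested.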


Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread David Miller

From: Eric Dumazet [EMAIL PROTECTED]
Date: Tue, 06 Mar 2007 08:58:45 +0100

 Yes, but on bootup you have an appropriate NUMA active policy. (Well... we 
 hope so, but it broke several times in the past)
 I am not sure what kind of mm policy is active for scheduled works.

Good point, that definitely needs investigation.

Thanks for all of your comments so far Eric, very useful :)

Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Nick Piggin

On Mon, Mar 05, 2007 at 08:26:32PM -0800, David Miller wrote:
 
 This is essentially a port of Nick Piggin's dcache hash table
 patches to the routing cache.  It solves the locking issues
 during table grow/shrink that I couldn't handle properly last
 time I tried to code up a patch like this.
 
 But one of the core issues of this kind of change still remains.
 There is a conflict between the desire of routing cache garbage
 collection to reach a state of equilibrium and the hash table
 grow code's desire to match the table size to the current state
 of affairs.
 
 Actually, more accurately, the conflict exists in how this GC
 logic is implemented.  The core issue is that hash table size
 guides the GC processing, and hash table growth therefore
 modifies those GC goals.  So with the patch below we'll just
 keep growing the hash table instead of giving GC some time to
 try to keep the working set in equilibrium before doing the
 hash grow.
 
 One idea is to put the hash grow check in the garbage collector,
 and put the hash shrink check in rt_del().
 
 In fact, it would be a good time to perhaps hack up some entirely
 new passive GC logic for the routing cache.
 
 BTW, another thing that plays into this is that Robert's TRASH work
 could make this patch not necessary :-)
 
 Finally, I know (due to some of Nick's helpful comments the
 other day) that I'm missing some rcu_assign_pointer()'s in here.
 Fixes in this area are most welcome.

Cool! I have some fixes for the rcu barrier issues, with some C-style
comments and questions :)

I was going to send you a fix first for the rcu barriers, then a
second to convert the read-side to a barrier-less one that I described,
however considering that your patch is a WIP in progress anyway, I
won't worry too much about the normal protocol.

I _think_ my reasoning regarding the rcu barriers and grace periods
is correct. I'll keep thinking about it though. (Paul cc'ed).

I'm not so familiar with this code, so I have sprinkled around a lot
of comments that could be pure crap ;) They are mainly just to help
you ensure that you cover all bases... compile tested only at this
stage.
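For readers less familiar with the barrier pairing under discussion: in userspace C11, the nearest analogue of rcu_assign_pointer() / rcu_dereference() for publishing a new table is a release store paired with an acquire load (acquire being a stronger stand-in for the dependency ordering rcu_dereference() relies on). The sketch below is that analogy only. struct table, cur_table, publish_table() and bucket_for() are hypothetical, and it does not model grace periods, so freeing an old table would still need something like synchronize_rcu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical hash table header. */
struct table {
	size_t mask;
};

/* Currently published table; readers load it, the resizer replaces it. */
static _Atomic(struct table *) cur_table;

/* Writer: fully initialize the new table, then publish it with a release
 * store, so a reader that sees the pointer also sees the initialized mask. */
static void publish_table(struct table *t, size_t size)
{
	t->mask = size - 1;
	atomic_store_explicit(&cur_table, t, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above. */
static size_t bucket_for(unsigned int hash)
{
	struct table *t = atomic_load_explicit(&cur_table, memory_order_acquire);

	return hash & t->mask;
}
```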

--
Index: linux-2.6/net/ipv4/route.c
===
--- linux-2.6.orig/net/ipv4/route.c
+++ linux-2.6/net/ipv4/route.c
@@ -311,6 +311,8 @@ static void rthash_free(struct rt_hash_b
 static unsigned int rt_hash_code(struct rt_hash *hashtable,
 u32 daddr, u32 saddr)
 {
+   /* BUG_ON(!rcu_read_protected()) */
+
return (jhash_2words(daddr, saddr, rt_hash_rnd)
 	& hashtable->mask);
 }
@@ -343,11 +345,16 @@ static void rt_hash_resize_work(struct w
 
old_rt_hash = rt_hash;
/*
-* ensure that if the reader sees the new dentry_hash,
-* then they will also see the old_dentry_hash assignment,
-* above.
+* ensure that if the reader sees the new rt_hash, then they will also
+* see the old_rt_hash assignment, above. synchronize_rcu() is used
+* rather than smp_wmb(), in order to avoid the smp_rmb() in the
+* read-side. However synchronize_rcu() also implies a smp_wmb(), so
+* that also means we can skip rcu_assign_pointer().
+*
+* The readers can then also skip rcu_dereference, because a grace
+* period implies that all readers have performed memory barriers.
 */
-   smp_wmb();
+   synchronize_rcu();
rt_hash = new_hash;
synchronize_rcu();
 
@@ -1100,6 +1107,8 @@ static int rt_intern_hash(struct rt_hash
int chain_length;
int attempts = !in_softirq();
 
+   /* BUG_ON(!rcu_read_protected()) */
+
 restart:
chain_length = 0;
min_score = ~(u32)0;
@@ -1286,6 +1295,8 @@ static void rt_del(struct rt_hash *h, un
 {
struct rtable **rthp;
 
+   /* BUG_ON(!rcu_read_protected()) */
+
spin_lock_bh(rt_hash_lock_addr(hash));
ip_rt_put(rt);
for (rthp = &h->table[hash].chain; *rthp;
@@ -1328,12 +1339,24 @@ void ip_rt_redirect(__be32 old_gw, __be3
 
for (i = 0; i < 2; i++) {
for (k = 0; k < 2; k++) {
-   struct rt_hash *h = rt_hash;
-   unsigned hash = rt_hashfn(h, daddr, skeys[i], ikeys[k]);
+   struct rt_hash *h;
+   unsigned hash;
+
+   /*
+* rcu_read_lock() must cover the load of rt_hash, in
+* order to satisfy our RCU protected dynamic hash
+* sizing scheme; and it must also cover the hash list
+* traversal, to satisfy our RCU protected lockless
+* hash entry lookups.
+*
+* This note applies throughout the file.
+*/
+   rcu_read_lock();
+   h =

Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread David Miller

From: Nick Piggin [EMAIL PROTECTED]
Date: Tue, 6 Mar 2007 10:11:12 +0100

 @@ -1449,6 +1472,12 @@ static struct dst_entry *ipv4_negative_a
 "%u.%u.%u.%u/%02x dropped\n",
   NIPQUAD(rt->rt_dst), rt->fl.fl4_tos);
  #endif
 + /* XXX:
 +  * What if rt does not exist in rt_hash, but is in
 +  * old_rt_hash? Don't we have to also check there?
 +  * Similar questions for a couple of other places that
 +  * look at rt_hash, but not old_rt_hash.
 +  */
   rt_del(h, hash, rt);
   ret = NULL;
   }

For the cases like ip_rt_redirect() I made the decision that we'll
just not add the complexity of having to look in the old_rt_hash
table.

In these kinds of cases it's OK to miss events, they will just happen
again.

It's one of the nice things about the routing cache, if you lose
information it's OK because we'll just cook up a new entry from
the persistent backing store that is the real routing tables.
And for events like redirects, if we miss it, we'll just send the
packet to the wrong next-hop again and receive another redirect
message which we'll (hopefully) propagate to a routing cache
entry.

Thanks for your feedback patch Nick, I'll process it tomorrow
hopefully after getting some sleep.

Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Nick Piggin

On Tue, Mar 06, 2007 at 01:17:06AM -0800, David Miller wrote:
 From: Nick Piggin [EMAIL PROTECTED]
 Date: Tue, 6 Mar 2007 10:11:12 +0100

  @@ -1449,6 +1472,12 @@ static struct dst_entry *ipv4_negative_a
   "%u.%u.%u.%u/%02x dropped\n",
  NIPQUAD(rt->rt_dst), rt->fl.fl4_tos);
   #endif
  +   /* XXX:
  +* What if rt does not exist in rt_hash, but is in
  +* old_rt_hash? Don't we have to also check there?
  +* Similar questions for a couple of other places that
  +* look at rt_hash, but not old_rt_hash.
  +*/
  rt_del(h, hash, rt);
  ret = NULL;
  }

 For the cases like ip_rt_redirect() I made the decision that we'll
 just not add the complexity of having to look in the old_rt_hash
 table.

 In these kinds of cases it's OK to miss events, they will just happen
 again.

 It's one of the nice things about the routing cache, if you lose
 information it's OK because we'll just cook up a new entry from
 the persistent backing store that is the real routing tables.
 And for events like redirects, if we miss it, we'll just send the
 packet to the wrong next-hop again and receive another redirect
 message which we'll (hopefully) propagate to a routing cache
 entry.

Ah, that's a very neat trick. OK so with that question out of the
way, there _may_ just be a few other places where you're working
with an rt_hash table outside an rcu read critical section.

I tried to fix up a couple of obvious ones, and I've just put in
the BUG_ON assertions for you to verify the rest.

 Thanks for your feedback patch Nick, I'll process it tomorrow
 hopefully after getting some sleep.

No problem. Thanks.

Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Eric Dumazet

On Tuesday 06 March 2007 10:11, Nick Piggin wrote:

 Cool! I have some fixes for the rcu barrier issues, with some C-style
 comments and questions :)

 I was going to send you a fix first for the rcu barriers, then a
 second to convert the read-side to a barrier-less one that I described,
 however considering that your patch is a WIP in progress anyway, I
 won't worry too much about the normal protocol.

 I _think_ my reasoning regarding the rcu barriers and grace periods
 is correct. I'll keep thinking about it though. (Paul cc'ed).

 I'm not so familiar with this code, so I have sprinkled around a lot
 of comments that could be pure crap ;) They are mainly just to help
 you ensure that you cover all bases... compile tested only at this
 stage.

I think we missed :

+static void rt_hash_resize_work(struct work_struct *work)

+
+   *head = rth->u.dst.rt_next;
+
+   hash = rt_hashfn(rt_hash,
+rth->fl.fl4_dst,
+rth->fl.fl4_src,
+iface);
+   rth->u.dst.rt_next = rt_hash->table[hash].chain;
+   rt_hash->table[hash].chain = rth;

This really needs some ..._del_rcu()/..._add_rcu()_ ... primitives, no ?
Or else a reader might be very confused...
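To make Eric's concern concrete, the single-threaded sketch below moves entries between tables with the same head-unlink / head-insert shape as the quoted hunk. struct entry, rehash() and the modulo hash are hypothetical. Consider a concurrent lockless reader: if it is standing on an entry when that entry's next pointer is rewritten to point into a new chain, it continues down the new chain and can miss the rest of the old one. The sketch does not solve that; it only shows which pointer stores are involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal hash entry: a key and a chain link. */
struct entry {
	unsigned int key;
	struct entry *next;
};

/* Move every entry from old_tab (old_size chains) into new_tab
 * (new_size chains): pop from the head of each old chain, push onto the
 * head of the target new chain. Single-threaded only; with lockless
 * readers each of these stores needs the ordering discussed here. */
static void rehash(struct entry **old_tab, size_t old_size,
		   struct entry **new_tab, size_t new_size)
{
	for (size_t i = 0; i < old_size; i++) {
		struct entry **head = &old_tab[i];

		while (*head) {
			struct entry *e = *head;
			size_t h = e->key % new_size;	/* stand-in hash */

			*head = e->next;		/* unlink from old chain */
			e->next = new_tab[h];		/* link at new chain head */
			new_tab[h] = e;
		}
	}
}
```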


Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Nick Piggin

On Tue, Mar 06, 2007 at 10:23:44AM +0100, Eric Dumazet wrote:
 On Tuesday 06 March 2007 10:11, Nick Piggin wrote:
 
  Cool! I have some fixes for the rcu barrier issues, with some C-style
  comments and questions :)
 
  I was going to send you a fix first for the rcu barriers, then a
  second to convert the read-side to a barrier-less one that I described,
  however considering that your patch is a WIP in progress anyway, I
  won't worry too much about the normal protocol.
 
  I _think_ my reasoning regarding the rcu barriers and grace periods
  is correct. I'll keep thinking about it though. (Paul cc'ed).
 
  I'm not so familiar with this code, so I have sprinkled around a lot
  of comments that could be pure crap ;) They are mainly just to help
  you ensure that you cover all bases... compile tested only at this
  stage.
 
 I think we missed :
 
 +static void rt_hash_resize_work(struct work_struct *work)
 
 +
 + *head = rth->u.dst.rt_next;
 +
 + hash = rt_hashfn(rt_hash,
 +  rth->fl.fl4_dst,
 +  rth->fl.fl4_src,
 +  iface);
 + rth->u.dst.rt_next = rt_hash->table[hash].chain;
 + rt_hash->table[hash].chain = rth;
 
 This really needs some ..._del_rcu()/..._add_rcu()_ ... primitives, no ?
 Or else a reader might be very confused...

I'm not sure... this code really depends on the hash table management,
rather than the management of the hash tables, if you understand me ;)

From what I can _see_, this is similar to how rt_intern_hash does it.
I don't know exactly why rt_intern_hash can get away without using
rcu_assign_pointer in some cases, however:

Note that we don't need an rcu_assign_pointer for this, because the
memory operations that initialized the entry have already been ordered
when it was first inserted.


Re: TCP 2MSL on loopback

2007-03-06 Thread Howard Chu


Eric Dumazet wrote:

On Monday 05 March 2007 12:20, Howard Chu wrote:

Why is the Maximum Segment Lifetime a global parameter? Surely the
maximum possible lifetime of a particular TCP segment depends on the
actual connection. At the very least, it would be useful to be able to
set it on a per-interface basis. E.g., in the case of the loopback
interface, it would be useful to be able to set it to a very small
duration.


Hi Howard

I think you should address these questions on netdev instead of linux-kernel.


OK, I just subscribed to netdev...


As I note in this draft
http://www.ietf.org/internet-drafts/draft-chu-ldap-ldapi-00.txt
when doing a connection soak test of OpenLDAP using clients connected
through localhost, the entire port range is exhausted in well under a
second, at which point the test stalls until a port comes out of
TIME_WAIT state so the next connection can be opened.

These days it's not uncommon for an OpenLDAP slapd server to handle tens
of thousands of connections per second in real use (e.g., at Google, or
at various telcos). While the LDAP server is fast enough to saturate
even 10gbit ethernet using contemporary CPUs, we have to resort to
multiple virtual interfaces just to make sure we have enough port
numbers available.


I don't understand... doesn't the slapd server listen for connections on a given 
port, like http? Or does it open connections like an ftp server?


No, you're right, it listens on a single port. There is a standard port 
(389) though of course you can use any port you want.


Of course, if you want to open more than 60.000 concurrent connections, using 
127.0.0.1 address, you might have a problem...


This is probably not something that happens in real world deployments. I 
But it's not 60,000 concurrent connections, it's 60,000 within a 2 
minute span.


I'm not saying this is a high priority problem, I only encountered it in 
a test scenario where I was deliberately trying to max out the server.



Ideally the 2MSL parameter would be dynamically adjusted based on the
route to the destination and the weights associated with those routes.
In the simplest case, connections between machines on the same subnet
(i.e., no router hops involved) should have a much smaller default value
than connections that traverse any routers. I'd settle for a two-level
setting - with no router hops, use the small value; with any router hops
use the large value.


Well, is it really a MSL problem ?


I did a small test (linux-2.6.21-rc1) and was able to get 1.000.000 
connections on localhost on my dual proc machine in one minute, without an 
error.


It's a combination of 2MSL and /proc/sys/net/ipv4/ip_local_port_range - 
on my system the default port range is 32768-61000. That means if I use 
up 28232 ports in less than 2MSL then everything stops. netstat will 
show that all the available port numbers are in TIME_WAIT state. And 
this is particularly bad because while waiting for the timeout, I can't 
initiate any new outbound connections of any kind at all - telnet, ssh, 
whatever, you have to wait for at least one port to free up. 
(Interesting denial of service there)
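The arithmetic behind the stall can be sketched as follows. Counting both ends of the inclusive range 32768-61000 gives 28233 local ports. If each closed connection holds its port for the TIME_WAIT interval (fixed at 60 seconds on Linux via TCP_TIMEWAIT_LEN, rather than derived from a tunable MSL), the sustained rate of new connections to a single destination cannot exceed ports / interval. max_conn_rate() is a hypothetical helper for illustration.

```c
#include <assert.h>

/* Upper bound on the sustained rate of new outbound connections to one
 * destination (address, port) when every closed connection parks its
 * local port in TIME_WAIT for 'timewait_secs' seconds.
 * The port range is inclusive, as ip_local_port_range is. */
static double max_conn_rate(unsigned int low, unsigned int high,
			    double timewait_secs)
{
	unsigned int ports = high - low + 1;

	return ports / timewait_secs;
}
```

With the default range and a 60-second interval this gives roughly 470 new connections per second to one destination, far below the rates described above.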


Granted, I was running my test on 2.6.18, perhaps 2.6.21 behaves 
differently.


--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc
  Chief Architect, OpenLDAP http://www.openldap.org/project/

[PATCH] NET : Optimizes inet_getpeer()

2007-03-06 Thread Eric Dumazet

Hi David

Please find this patch against net-2.6.22

Thank you

[PATCH] NET : Optimizes inet_getpeer()

1) Some sysctl vars are declared __read_mostly

2) We can avoid updating stack[] when doing an AVL lookup only.

The lookup() macro is extended to take a second parameter, which may be NULL
for a pure lookup (no need to save the AVL path). This removes unnecessary
instructions, because the compiler knows at compile time whether the _stack
parameter is NULL.

The text size of net/ipv4/inetpeer.o on x86_64 drops from 2107 to 2063 bytes.
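The effect of the optional _stack argument can be reproduced with an ordinary function. In this sketch, struct node and lookup() are hypothetical stand-ins: NULL plays the role of peer_avl_empty, and the caller must size path for the tree height plus one, as PEER_MAXDEPTH does in the patch. Because the function is static inline, a call with a literal NULL path typically lets the compiler discard the bookkeeping after inlining, which is the optimization the patch obtains with its macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary search tree node standing in for struct inet_peer. */
struct node {
	unsigned int key;
	struct node *left, *right;
};

/* Look up 'key' below *root. If 'path' is non-NULL, record the address
 * of every link followed (what the insert/delete code needs to
 * rebalance) and report the count through *depth when depth is non-NULL.
 * Returns the matching node, or NULL if the key is absent. */
static inline struct node *lookup(struct node **root, unsigned int key,
				  struct node ***path, int *depth)
{
	struct node **v = root;
	int n = 0;

	if (path)
		path[n++] = root;
	while (*v) {
		struct node *u = *v;

		if (key == u->key)
			break;
		v = key < u->key ? &u->left : &u->right;
		if (path)
			path[n++] = v;
	}
	if (depth)
		*depth = n;
	return *v;
}
```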

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index db3ef96..2f44e61 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -87,10 +87,12 @@ #define PEER_MAXDEPTH 40 /* sufficient f
 
 static int peer_total;
 /* Exported for sysctl_net_ipv4.  */
-int inet_peer_threshold = 65536 + 128; /* start to throw entries more
+int inet_peer_threshold __read_mostly = 65536 + 128;   /* start to throw entries more
 * aggressively at this stage */
-int inet_peer_minttl = 120 * HZ;   /* TTL under high load: 120 sec */
-int inet_peer_maxttl = 10 * 60 * HZ;   /* usual time to live: 10 min */
+int inet_peer_minttl __read_mostly = 120 * HZ; /* TTL under high load: 120 sec */
+int inet_peer_maxttl __read_mostly = 10 * 60 * HZ; /* usual time to live: 10 min */
+int inet_peer_gc_mintime __read_mostly = 10 * HZ;
+int inet_peer_gc_maxtime __read_mostly = 120 * HZ;
 
 static struct inet_peer *inet_peer_unused_head;
 static struct inet_peer **inet_peer_unused_tailp = &inet_peer_unused_head;
@@ -99,9 +101,6 @@ static DEFINE_SPINLOCK(inet_peer_unused_
 static void peer_check_expire(unsigned long dummy);
 static DEFINE_TIMER(peer_periodic_timer, peer_check_expire, 0, 0);
 
-/* Exported for sysctl_net_ipv4.  */
-int inet_peer_gc_mintime = 10 * HZ,
-inet_peer_gc_maxtime = 120 * HZ;
 
 /* Called from ip_output.c:ip_init  */
 void __init inet_initpeers(void)
@@ -151,20 +150,27 @@ static void unlink_from_unused(struct in
spin_unlock_bh(&inet_peer_unused_lock);
 }
 
-/* Called with local BH disabled and the pool lock held. */
-#define lookup(daddr)  \
+/*
+ * Called with local BH disabled and the pool lock held.
+ * _stack is known to be NULL or not at compile time,
+ * so compiler will optimize the if (_stack) tests.
+ */
+#define lookup(_daddr,_stack)  \
 ({ \
struct inet_peer *u, **v;   \
-   stackptr = stack;   \
-   *stackptr++ = &peer_root;   \
+   if (_stack) {   \
+   stackptr = _stack;  \
+   *stackptr++ = &peer_root;   \
+   }   \
for (u = peer_root; u != peer_avl_empty; ) {\
-   if (daddr == u->v4daddr)\
+   if (_daddr == u->v4daddr)   \
break;  \
-   if ((__force __u32)daddr < (__force __u32)u->v4daddr)   \
+   if ((__force __u32)_daddr < (__force __u32)u->v4daddr)  \
v = &u->avl_left;   \
else\
v = &u->avl_right;  \
-   *stackptr++ = v;\
+   if (_stack) \
+   *stackptr++ = v;\
u = *v; \
}   \
u;  \
@@ -288,7 +294,7 @@ static void unlink_from_pool(struct inet
if (atomic_read(&p->refcnt) == 1) {
struct inet_peer **stack[PEER_MAXDEPTH];
struct inet_peer ***stackptr, ***delp;
-   if (lookup(p->v4daddr) != p)
+   if (lookup(p->v4daddr, stack) != p)
BUG();
delp = stackptr - 1; /* *delp[0] == p */
if (p->avl_left == peer_avl_empty) {
@@ -373,7 +379,7 @@ struct inet_peer *inet_getpeer(__be32 da
 
/* Look up for the address quickly. */
read_lock_bh(&peer_pool_lock);
-   p = lookup(daddr);
+   p = lookup(daddr, NULL);
if (p != peer_avl_empty)
atomic_inc(&p->refcnt);
read_unlock_bh(&peer_pool_lock);
@@ -400,7 +406,7 @@ struct inet_peer *inet_getpeer(__be32 da
 
write_lock_bh(&peer_pool_lock);
/* Check if an entry has suddenly appeared. */
-   p = lookup(daddr);
+   p = lookup(daddr, stack);

[patch 01/19] git-netdev-all: ipw2200 fix

2007-03-06 Thread akpm

From: Andrew Morton [EMAIL PROTECTED]

drivers/net/wireless/ipw2200.c: In function 'show_channels':
drivers/net/wireless/ipw2200.c:1855: warning: implicit declaration of function 'ipw_get_geo'
drivers/net/wireless/ipw2200.c:1855: warning: initialization makes pointer from integer without a cast


Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/wireless/ipw2200.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/wireless/ipw2200.c~git-netdev-all-ipw2200-fix drivers/net/wireless/ipw2200.c
--- a/drivers/net/wireless/ipw2200.c~git-netdev-all-ipw2200-fix
+++ a/drivers/net/wireless/ipw2200.c
@@ -1852,7 +1852,7 @@ static ssize_t show_channels(struct devi
 char *buf)
 {
struct ipw_priv *priv = dev_get_drvdata(d);
-   const struct ieee80211_geo *geo = ipw_get_geo(priv->ieee);
+   const struct ieee80211_geo *geo = ieee80211_get_geo(priv->ieee);
int len = 0, i;
 
len = sprintf(&buf[len],
_

[patch 07/19] dmfe: fix two bugs

2007-03-06 Thread akpm

From: Maxim Levitsky [EMAIL PROTECTED]

Fix an oops on module removal caused by freeing memory before unregistering
the driver.
Fix a NULL pointer dereference when dev_alloc_skb() fails.

Signed-off-by: Maxim Levitsky [EMAIL PROTECTED]
Cc: Valerie Henson [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/tulip/dmfe.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff -puN drivers/net/tulip/dmfe.c~dmfe-fix-two-bugs drivers/net/tulip/dmfe.c
--- a/drivers/net/tulip/dmfe.c~dmfe-fix-two-bugs
+++ a/drivers/net/tulip/dmfe.c
@@ -501,14 +501,17 @@ static void __devexit dmfe_remove_one (s
DMFE_DBUG(0, "dmfe_remove_one()", 0);
 
if (dev) {
+
+   unregister_netdev(dev);
+
pci_free_consistent(db->pdev, sizeof(struct tx_desc) *
DESC_ALL_CNT + 0x20, db->desc_pool_ptr,
db->desc_pool_dma_ptr);
pci_free_consistent(db->pdev, TX_BUF_ALLOC * TX_DESC_CNT + 4,
db->buf_pool_ptr, db->buf_pool_dma_ptr);
-   unregister_netdev(dev);
pci_release_regions(pdev);
free_netdev(dev);   /* free board information */
+
pci_set_drvdata(pdev, NULL);
}
 
@@ -927,7 +930,7 @@ static inline u32 cal_CRC(unsigned char 
 static void dmfe_rx_packet(struct DEVICE *dev, struct dmfe_board_info * db)
 {
struct rx_desc *rxptr;
-   struct sk_buff *skb;
+   struct sk_buff *skb, *newskb;
int rxlen;
u32 rdes0;
 
@@ -980,9 +983,11 @@ static void dmfe_rx_packet(struct DEVICE
} else {
/* Good packet, send to upper layer */
/* Shorst packet used new SKB */
-   if ( (rxlen < RX_COPY_SIZE) &&
-   ( (skb = dev_alloc_skb(rxlen + 2) )
-   != NULL) ) {
+   if ((rxlen < RX_COPY_SIZE) &&
+   ((newskb = dev_alloc_skb(rxlen + 2))
+   != NULL)) {
+
+   skb = newskb;
/* size less than COPY_SIZE, allocate a rxlen SKB */
skb->dev = dev;
skb_reserve(skb, 2); /* 16byte align */
_
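The second dmfe bug is easy to reproduce outside the driver. In the sketch below, struct buf, try_alloc(), pick_buggy() and pick_fixed() are hypothetical stand-ins for the sk_buff handling in dmfe_rx_packet(). The buggy version assigns the allocation result straight into the variable holding the received buffer, so a failed allocation leaves it NULL for the fallback path; the fixed version allocates into a separate pointer, as the patch does with newskb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer type and allocator standing in for sk_buff and
 * dev_alloc_skb(); 'fail' forces the allocation to fail. */
struct buf {
	int id;
};

static struct buf spare = { 2 };

static struct buf *try_alloc(int fail)
{
	return fail ? NULL : &spare;
}

/* Buggy shape: on allocation failure this stores NULL into b, losing the
 * received buffer that the fallback path still needs. */
static struct buf *pick_buggy(struct buf *b, int small, int fail)
{
	if (small && (b = try_alloc(fail)) != NULL)
		b->id = -1;	/* placeholder for copying the frame */
	return b;
}

/* Fixed shape: allocate into a separate pointer and replace b only when
 * the allocation succeeded. */
static struct buf *pick_fixed(struct buf *b, int small, int fail)
{
	struct buf *nb;

	if (small && (nb = try_alloc(fail)) != NULL) {
		nb->id = b->id;	/* stands in for copying the frame */
		b = nb;
	}
	return b;
}
```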

Re: TCP 2MSL on loopback

2007-03-06 Thread Eric Dumazet

On Tuesday 06 March 2007 10:22, Howard Chu wrote:


 It's a combination of 2MSL and /proc/sys/net/ipv4/ip_local_port_range -
 on my system the default port range is 32768-61000. That means if I use
 up 28232 ports in less than 2MSL then everything stops. netstat will
 show that all the available port numbers are in TIME_WAIT state. And
 this is particularly bad because while waiting for the timeout, I can't
 initiate any new outbound connections of any kind at all - telnet, ssh,
 whatever, you have to wait for at least one port to free up.
 (Interesting denial of service there)

 Granted, I was running my test on 2.6.18, perhaps 2.6.21 behaves
 differently.

Could you try the attached program and tell me what happens?

$ gcc -O2 -o socktest socktest.c -lpthread
$ time ./socktest -n 10
nb_conn=9 nb_accp=9

real    0m5.058s
user    0m0.212s
sys     0m4.844s

(on my small machine, dell d610 :) )

/*
  Copyright (C) 2007  Eric Dumazet

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/poll.h>
#include <sys/sendfile.h>
#include <sys/epoll.h>

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <time.h>
#include <ctype.h>
#include <netdb.h>

int fd __attribute__((aligned(64)));
int port = ;
unsigned long nb_acc __attribute__((aligned(64)));
unsigned long nb_conn1 __attribute__((aligned(64)));
unsigned long nb_conn2 __attribute__((aligned(64)));
unsigned long nb_conn3 __attribute__((aligned(64)));
int limit = 1/3;

void *do_accept(void *arg)
{
	int s;
	struct sockaddr_in sa;
	socklen_t addrlen;
	int flags;
	char buffer[1024];

	while (1) {
		addrlen = sizeof(sa);
		s = accept(fd, (struct sockaddr *)&sa, &addrlen);
		if (s == -1) continue;
		flags = 0;
		recv(s, buffer, 1024, 0);
		send(s, "Answer\r\n", 8, 0);
		close(s);
		nb_acc++;
	}
}

void *do_conn(void *arg)
{
	int i;
	int on = 1;
	struct sockaddr_in sa;
	unsigned long *cpt = (unsigned long *)arg;
	for (i = 0 ; i < limit ; i++) {
		int s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
		int res;
		setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, 4);
		memset(&sa, 0, sizeof(sa));
		sa.sin_addr.s_addr = htonl(0x7f000001);	/* 127.0.0.1 */
		sa.sin_port = htons(port);
		sa.sin_family = AF_INET;
		res = connect(s, (struct sockaddr *)&sa, sizeof(sa));
		if (res == 0) {
			char buffer[1024];
			send(s, "question\r\n", 10, 0);
			recv(s, buffer, sizeof(buffer), 0);
			(*cpt)++;
		}
		else {
			static int errcnt = 0;
			if (errcnt++ < 10) printf("connect error %d\n", errno);
		}
		close(s);
	}
	return NULL;
}

int main(int argc, char *argv[])
{
	int on = 1;
	struct sockaddr_in sa;
	pthread_t tid, tid1, tid2, tid3;
	int i;
	void *res;

	while ((i = getopt(argc, argv, "Vn:")) != EOF) {
		if (i == 'n')
			limit = atoi(optarg) / 3;
	}
	fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, 4);
	memset(&sa, 0, sizeof(sa));
	sa.sin_port = htons(port);
	sa.sin_family = AF_INET;
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1) {
		perror("bind");
		return 1;
	}
	listen(fd, 3);
	pthread_create(&tid, NULL, do_accept, NULL);
	pthread_create(&tid1, NULL, do_conn, &nb_conn1);
	pthread_create(&tid2, NULL, do_conn, &nb_conn2);
	pthread_create(&tid3, NULL, do_conn, &nb_conn3);
	pthread_join(tid1, &res);
	pthread_join(tid2, &res);
	pthread_join(tid3, &res);
	printf("nb_conn=%lu nb_accp=%lu\n", nb_conn1 + nb_conn2 + nb_conn3, nb_acc);
	return 0;
}

[patch 03/19] user of the jiffies rounding code: e1000

2007-03-06 Thread akpm

From: Arjan van de Ven [EMAIL PROTECTED]

Use the round_jiffies() function in e1000.

These timers all were of the "about once a second" or "about once every X
seconds" variety, and several showed up in the "what wakes the cpu up" profiles
that the tickless patches provide.  Some timers are highly dynamic based on
network load, but even on low-activity systems they still show up, so the
rounding is done only in cases of low activity, allowing higher-frequency
timers in the high-activity case.

The various hardware watchdogs are an obvious case; they run every 2 seconds
but aren't otherwise specific about exactly when they need to run.

Signed-off-by: Arjan van de Ven [EMAIL PROTECTED]
Acked-by: Auke Kok [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/e1000/e1000_main.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff -puN drivers/net/e1000/e1000_main.c~user-of-the-jiffies-rounding-code-e1000 drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c~user-of-the-jiffies-rounding-code-e1000
+++ a/drivers/net/e1000/e1000_main.c
@@ -2652,7 +2652,7 @@ e1000_watchdog(unsigned long data)
 
netif_carrier_on(netdev);
netif_wake_queue(netdev);
-   mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ));
adapter->smartspeed = 0;
} else {
/* make sure the receive unit is started */
@@ -2669,7 +2669,7 @@ e1000_watchdog(unsigned long data)
DPRINTK(LINK, INFO, "NIC Link is Down\n");
netif_carrier_off(netdev);
netif_stop_queue(netdev);
-   mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ));
 
/* 80003ES2LAN workaround--
 * For packet buffer work-around on link down event;
@@ -2721,7 +2721,7 @@ e1000_watchdog(unsigned long data)
e1000_rar_set(&adapter->hw, adapter->hw.mac_addr, 0);
 
/* Reset the timer */
-   mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ));
 }
 
 enum latency_range {
_
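
round_jiffies() lets a timer that only needs "roughly every 2 seconds" accuracy expire on a whole-second boundary, so that many such timers tend to fire on the same tick and the CPU wakes up less often. The userspace sketch below models that rounding. It is an approximation based on our reading of the kernel's __round_jiffies(), not a copy of it: `hz`, `now` and `cpu` stand in for HZ, jiffies and the CPU number, the 3-tick per-CPU skew and the "round down only within the first quarter second" rule should be checked against kernel/timer.c, and jiffies wraparound is ignored.

```c
#include <assert.h>

/*
 * Simplified model of __round_jiffies(): round the absolute expiry time j
 * to a multiple of hz (a whole second), skewed by 3 ticks per CPU so that
 * all CPUs do not fire on exactly the same tick. Wraparound is ignored.
 */
static unsigned long model_round_jiffies(unsigned long j, unsigned long now,
					 unsigned long hz, int cpu)
{
	unsigned long original = j;
	unsigned long rem;

	assert(hz > 0);
	j += (unsigned long)cpu * 3;
	rem = j % hz;
	if (rem < hz / 4)
		j -= rem;              /* just past a second: round down */
	else
		j = j - rem + hz;      /* otherwise round up to the next second */
	j -= (unsigned long)cpu * 3;

	/* If rounding pulled the expiry into the past, keep the original. */
	return (j <= now) ? original : j;
}
```

For example, with hz = 250 and now = 1000, a timer requested for 1510 is pulled back to 1500, while one requested for 1600 is pushed out to 1750.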

[patch 11/19] sis900 warning fixes

2007-03-06 Thread akpm

From: Andrew Morton [EMAIL PROTECTED]

drivers/net/sis900.c: In function 'sis900_reset_phy':
drivers/net/sis900.c:972: warning: 'status' may be used uninitialized in this function
drivers/net/sis900.c: In function 'sis900_check_mode':
drivers/net/sis900.c:1431: warning: 'status' may be used uninitialized in this function
drivers/net/sis900.c: In function 'sis900_timer':
drivers/net/sis900.c:1467: warning: 'status' may be used uninitialized in this function


Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/sis900.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff -puN drivers/net/sis900.c~sis900-warning-fixes drivers/net/sis900.c
--- a/drivers/net/sis900.c~sis900-warning-fixes
+++ a/drivers/net/sis900.c
@@ -968,10 +968,10 @@ static void mdio_write(struct net_device
 
 static u16 sis900_reset_phy(struct net_device *net_dev, int phy_addr)
 {
-   int i = 0;
+   int i;
u16 status;
 
-   while (i++ < 2)
+   for (i = 0; i < 2; i++)
status = mdio_read(net_dev, phy_addr, MII_STATUS);
 
mdio_write( net_dev, phy_addr, MII_CONTROL, MII_CNTL_RESET );
@@ -1430,7 +1430,7 @@ static void sis900_auto_negotiate(struct
int i = 0;
u32 status;
 
-   while (i++ < 2)
+   for (i = 0; i < 2; i++)
status = mdio_read(net_dev, phy_addr, MII_STATUS);
 
if (!(status & MII_STAT_LINK)){
@@ -1466,9 +1466,9 @@ static void sis900_read_mode(struct net_
int phy_addr = sis_priv->cur_phy;
u32 status;
u16 autoadv, autorec;
-   int i = 0;
+   int i;
 
-   while (i++ < 2)
+   for (i = 0; i < 2; i++)
status = mdio_read(net_dev, phy_addr, MII_STATUS);
 
if (!(status & MII_STAT_LINK))
_

[patch 08/19] dmfe: Fix link detection

2007-03-06 Thread akpm

From: Maxim Levitsky [EMAIL PROTECTED]

Add link detection

Signed-off-by: Maxim Levitsky [EMAIL PROTECTED]
Cc: Valerie Henson [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/tulip/dmfe.c |   58 +
 1 file changed, 40 insertions(+), 18 deletions(-)

diff -puN drivers/net/tulip/dmfe.c~dmfe-fix-link-detection drivers/net/tulip/dmfe.c
--- a/drivers/net/tulip/dmfe.c~dmfe-fix-link-detection
+++ a/drivers/net/tulip/dmfe.c
@@ -248,7 +248,6 @@ struct dmfe_board_info {
u8 media_mode;  /* user specify media mode */
u8 op_mode; /* real work media mode */
u8 phy_addr;
-   u8 link_failed; /* Ever link failed */
u8 wait_reset;  /* Hardware failed, need to reset */
u8 dm910x_chk_mode; /* Operating mode check */
u8 first_in_callback;   /* Flag to record state */
@@ -447,6 +446,7 @@ static int __devinit dmfe_init_one (stru
dev->poll_controller = poll_dmfe;
 #endif
dev->ethtool_ops = &netdev_ethtool_ops;
+   netif_carrier_off(dev);
spin_lock_init(&db->lock);
 
pci_read_config_dword(pdev, 0x50, &pci_pmr);
@@ -541,7 +541,6 @@ static int dmfe_open(struct DEVICE *dev)
db->tx_packet_cnt = 0;
db->tx_queue_cnt = 0;
db->rx_avail_cnt = 0;
-   db->link_failed = 1;
db->wait_reset = 0;
 
db->first_in_callback = 0;
@@ -1082,6 +1081,7 @@ static void netdev_get_drvinfo(struct ne
 
 static const struct ethtool_ops netdev_ethtool_ops = {
.get_drvinfo    = netdev_get_drvinfo,
+   .get_link   = ethtool_op_get_link,
 };
 
 /*
@@ -1097,6 +1097,8 @@ static void dmfe_timer(unsigned long dat
struct dmfe_board_info *db = netdev_priv(dev);
unsigned long flags;
 
+   int link_ok, link_ok_phy;
+
DMFE_DBUG(0, "dmfe_timer()", 0);
spin_lock_irqsave(&db->lock, flags);
 
@@ -1168,15 +1170,35 @@ static void dmfe_timer(unsigned long dat
(db->chip_revision == 0x0210)) ) {
/* DM9102A Chip */
if (tmp_cr12 & 2)
-   tmp_cr12 = 0x0; /* Link failed */
+   link_ok = 0;
else
-   tmp_cr12 = 0x3; /* Link OK */
+   link_ok = 1;
}
+   else
+   /*0x43 is used instead of 0x3 because bit 6 should represent
+   link status of external PHY */
+   link_ok = (tmp_cr12 & 0x43) ? 1 : 0;
+
+
+   /* If chip reports that link is failed it could be because external
+   PHY link status pin is not conected correctly to chip
+   To be sure ask PHY too.
+   */
+
+   /* need a dummy read because of PHY's register latch*/
+   phy_read (db->ioaddr, db->phy_addr, 1, db->chip_id);
+   link_ok_phy = (phy_read (db->ioaddr,
+  db->phy_addr, 1, db->chip_id) & 0x4) ? 1 : 0;
+
+   if (link_ok_phy != link_ok) {
+   DMFE_DBUG (0, "PHY and chip report different link status", 0);
+   link_ok = link_ok | link_ok_phy;
+   }
 
-   if ( !(tmp_cr12 & 0x3) && !db->link_failed ) {
+   if ( !link_ok && netif_carrier_ok(dev)) {
/* Link Failed */
DMFE_DBUG(0, "Link Failed", tmp_cr12);
-   db->link_failed = 1;
+   netif_carrier_off(dev);
 
/* For Force 10/100M Half/Full mode: Enable Auto-Nego mode */
/* AUTO or force 1M Homerun/Longrun don't need */
@@ -1191,19 +1213,19 @@ static void dmfe_timer(unsigned long dat
db->cr6_data&=~0x0200;  /* bit9=0, HD mode */
update_cr6(db->cr6_data, db->ioaddr);
}
-   } else
-   if ((tmp_cr12 & 0x3) && db->link_failed) {
-   DMFE_DBUG(0, "Link link OK", tmp_cr12);
-   db->link_failed = 0;
-
-   /* Auto Sense Speed */
-   if ( (db->media_mode & DMFE_AUTO) &&
-   dmfe_sense_speed(db) )
-   db->link_failed = 1;
-   dmfe_process_mode(db);
-   /* SHOW_MEDIA_TYPE(db->op_mode); */
+   } else if (!netif_carrier_ok(dev)) {
+
+   DMFE_DBUG(0, "Link link OK", tmp_cr12);
+
+   /* Auto Sense Speed */
+   if ( !(db->media_mode & DMFE_AUTO) || !dmfe_sense_speed(db)) {
+   netif_carrier_on(dev);
+   SHOW_MEDIA_TYPE(db->op_mode);
}
 
+   dmfe_process_mode(db);
+   }
+
/* HPNA remote command check */
if (db->HPNA_command & 0xf00) {
db->HPNA_timer--;
@@ -1248,7 +1270,7 @@ static void dmfe_dynamic_reset(struct DE
db->tx_packet_cnt = 0;
db->tx_queue_cnt = 0;

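The new dmfe_timer() code treats the link as up if either the chip or the PHY says so, and it reads the PHY status register twice because the link bit is latched. The userspace model below illustrates both points for the non-DM9102A path (the 0x43 CR12 mask). The fake PHY (`struct fake_phy`, `fake_phy_read_bmsr()`) is hypothetical and assumes the usual MII behaviour in which the BMSR link-status bit (0x4) stays low after a link drop until the register has been read once.

```c
#include <assert.h>

#define BMSR_LINK 0x4   /* link-status bit tested by the patch */

/* Hypothetical PHY model with a latched-low link bit. */
struct fake_phy {
	int link_now;       /* current physical link state */
	int link_dropped;   /* link went down since the last read (latch) */
};

static unsigned int fake_phy_read_bmsr(struct fake_phy *phy)
{
	unsigned int val = (phy->link_now && !phy->link_dropped) ? BMSR_LINK : 0;

	phy->link_dropped = 0;  /* reading the register clears the latch */
	return val;
}

/* Mirrors the patch: chip status OR fresh PHY status. */
static int dmfe_model_link_ok(unsigned int tmp_cr12, struct fake_phy *phy)
{
	int link_ok = (tmp_cr12 & 0x43) ? 1 : 0;
	int link_ok_phy;

	fake_phy_read_bmsr(phy);   /* dummy read flushes the latched value */
	link_ok_phy = (fake_phy_read_bmsr(phy) & BMSR_LINK) ? 1 : 0;

	return link_ok | link_ok_phy;
}
```

Without the dummy read, a PHY whose link has already recovered would still report "down" once, which is why the patch reads register 1 twice.
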
[patch 10/19] dmfe: add support for Wake on lan

2007-03-06 Thread akpm

From: Maxim Levitsky [EMAIL PROTECTED]

Add support for WOL on Magic Packet and on link change

Signed-off-by: Maxim Levitsky [EMAIL PROTECTED]
Cc: Valerie Henson [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/tulip/dmfe.c |   66 +++--
 1 file changed, 64 insertions(+), 2 deletions(-)

diff -puN drivers/net/tulip/dmfe.c~dmfe-add-support-for-wake-on-lan drivers/net/tulip/dmfe.c
--- a/drivers/net/tulip/dmfe.c~dmfe-add-support-for-wake-on-lan
+++ a/drivers/net/tulip/dmfe.c
@@ -122,6 +122,11 @@
 #define DM9801_NOISE_FLOOR 8
 #define DM9802_NOISE_FLOOR 5
 
+#define DMFE_WOL_LINKCHANGE    0x2000
+#define DMFE_WOL_SAMPLEPACKET  0x1000
+#define DMFE_WOL_MAGICPACKET   0x0800
+
+
 #define DMFE_10MHF  0
 #define DMFE_100MHF 1
 #define DMFE_10MFD  4
@@ -248,6 +253,7 @@ struct dmfe_board_info {
u8 wait_reset;  /* Hardware failed, need to reset */
u8 dm910x_chk_mode; /* Operating mode check */
u8 first_in_callback;   /* Flag to record state */
+   u8 wol_mode;        /* user WOL settings */
struct timer_list timer;
 
/* System defined statistic counter */
@@ -428,6 +434,7 @@ static int __devinit dmfe_init_one (stru
db->chip_id = ent->driver_data;
db->ioaddr = pci_resource_start(pdev, 0);
db->chip_revision = dev_rev;
+   db->wol_mode = 0;
 
db->pdev = pdev;
 
@@ -1062,7 +1069,11 @@ static void dmfe_set_filter_mode(struct 
spin_unlock_irqrestore(&db->lock, flags);
 }
 
-static void netdev_get_drvinfo(struct net_device *dev,
+/*
+ * Ethtool interace
+ */
+
+static void dmfe_ethtool_get_drvinfo(struct net_device *dev,
   struct ethtool_drvinfo *info)
 {
struct dmfe_board_info *np = netdev_priv(dev);
@@ -1076,9 +1087,35 @@ static void netdev_get_drvinfo(struct ne
dev->base_addr, dev->irq);
 }
 
+static int dmfe_ethtool_set_wol(struct net_device *dev,
+   struct ethtool_wolinfo *wolinfo)
+{
+   struct dmfe_board_info *db = netdev_priv(dev);
+
+   if (wolinfo->wolopts & (WAKE_UCAST | WAKE_MCAST | WAKE_BCAST |
+   WAKE_ARP | WAKE_MAGICSECURE))
+  return -EOPNOTSUPP;
+
+   db->wol_mode = wolinfo->wolopts;
+   return 0;
+}
+
+static void dmfe_ethtool_get_wol(struct net_device *dev,
+struct ethtool_wolinfo *wolinfo)
+{
+   struct dmfe_board_info *db = netdev_priv(dev);
+
+   wolinfo->supported = WAKE_PHY | WAKE_MAGIC;
+   wolinfo->wolopts = db->wol_mode;
+   return;
+}
+
+
 static const struct ethtool_ops netdev_ethtool_ops = {
-   .get_drvinfo    = netdev_get_drvinfo,
+   .get_drvinfo    = dmfe_ethtool_get_drvinfo,
.get_link   = ethtool_op_get_link,
+   .set_wol        = dmfe_ethtool_set_wol,
+   .get_wol        = dmfe_ethtool_get_wol,
 };
 
 /*
@@ -2052,6 +2089,7 @@ static int dmfe_suspend(struct pci_dev *
 {
struct net_device *dev = pci_get_drvdata(pci_dev);
struct dmfe_board_info *db = netdev_priv(dev);
+   u32 tmp;
 
/* Disable upper layer interface */
netif_device_detach(dev);
@@ -2067,6 +2105,20 @@ static int dmfe_suspend(struct pci_dev *
/* Fre RX buffers */
dmfe_free_rxbuffer(db);
 
+   /* Enable WOL */
+   pci_read_config_dword(pci_dev, 0x40, &tmp);
+   tmp &= ~(DMFE_WOL_LINKCHANGE|DMFE_WOL_MAGICPACKET);
+
+   if (db->wol_mode & WAKE_PHY)
+   tmp |= DMFE_WOL_LINKCHANGE;
+   if (db->wol_mode & WAKE_MAGIC)
+   tmp |= DMFE_WOL_MAGICPACKET;
+
+   pci_write_config_dword(pci_dev, 0x40, tmp);
+
+   pci_enable_wake(pci_dev, PCI_D3hot, 1);
+   pci_enable_wake(pci_dev, PCI_D3cold, 1);
+
/* Power down device*/
pci_set_power_state(pci_dev, pci_choose_state (pci_dev,state));
pci_save_state(pci_dev);
@@ -2077,6 +2129,7 @@ static int dmfe_suspend(struct pci_dev *
 static int dmfe_resume(struct pci_dev *pci_dev)
 {
struct net_device *dev = pci_get_drvdata(pci_dev);
+   u32 tmp;
 
pci_restore_state(pci_dev);
pci_set_power_state(pci_dev, PCI_D0);
@@ -2084,6 +2137,15 @@ static int dmfe_resume(struct pci_dev *p
/* Re-initilize DM910X board */
dmfe_init_dm910x(dev);
 
+   /* Disable WOL */
+   pci_read_config_dword(pci_dev, 0x40, &tmp);
+
+   tmp &= ~(DMFE_WOL_LINKCHANGE | DMFE_WOL_MAGICPACKET);
+   pci_write_config_dword(pci_dev, 0x40, tmp);
+
+   pci_enable_wake(pci_dev, PCI_D3hot, 0);
+   pci_enable_wake(pci_dev, PCI_D3cold, 0);
+
/* Restart upper layer interface */
netif_device_attach(dev);
 
_
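
The suspend path clears both wake bits in PCI config dword 0x40 and then sets one bit per requested ethtool mode. The sketch below restates that mapping, together with the validation done in dmfe_ethtool_set_wol(), as plain functions. The WAKE_* values are copied from <linux/ethtool.h> as we understand it, the helper names are hypothetical, and no real config-space access is performed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* ethtool wake-on-LAN flags (values as in <linux/ethtool.h>). */
#define WAKE_PHY         (1 << 0)
#define WAKE_UCAST       (1 << 1)
#define WAKE_MCAST       (1 << 2)
#define WAKE_BCAST       (1 << 3)
#define WAKE_ARP         (1 << 4)
#define WAKE_MAGIC       (1 << 5)
#define WAKE_MAGICSECURE (1 << 6)

/* DM910x config-space (offset 0x40) wake bits, from the patch. */
#define DMFE_WOL_LINKCHANGE   0x2000
#define DMFE_WOL_MAGICPACKET  0x0800

/* Like dmfe_ethtool_set_wol(): reject modes the chip cannot do. */
static int model_check_wolopts(uint32_t wolopts)
{
	if (wolopts & (WAKE_UCAST | WAKE_MCAST | WAKE_BCAST |
		       WAKE_ARP | WAKE_MAGICSECURE))
		return -EOPNOTSUPP;
	return 0;
}

/* Like dmfe_suspend(): compute the new value of config dword 0x40. */
static uint32_t model_wol_config(uint32_t cfg, uint32_t wol_mode)
{
	cfg &= ~(uint32_t)(DMFE_WOL_LINKCHANGE | DMFE_WOL_MAGICPACKET);
	if (wol_mode & WAKE_PHY)
		cfg |= DMFE_WOL_LINKCHANGE;
	if (wol_mode & WAKE_MAGIC)
		cfg |= DMFE_WOL_MAGICPACKET;
	return cfg;
}
```

Clearing both bits first means a mode that was enabled for a previous suspend does not stay armed after the user turns it off; the other bits of the dword are preserved.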

[patch 04/19] phy layer: add kernel-doc + DocBook

2007-03-06 Thread akpm

From: Randy Dunlap [EMAIL PROTECTED]

Convert function documentation in drivers/net/phy/ to kernel-doc
and add it to DocBook.

Signed-off-by: Randy Dunlap [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 Documentation/DocBook/kernel-api.tmpl |6 
 drivers/net/phy/mdio_bus.c|   19 +-
 drivers/net/phy/phy.c |  192 +---
 drivers/net/phy/phy_device.c  |  114 +-
 4 files changed, 234 insertions(+), 97 deletions(-)

diff -puN Documentation/DocBook/kernel-api.tmpl~phy-layer-add-kernel-doc-docbook Documentation/DocBook/kernel-api.tmpl
--- a/Documentation/DocBook/kernel-api.tmpl~phy-layer-add-kernel-doc-docbook
+++ a/Documentation/DocBook/kernel-api.tmpl
@@ -236,6 +236,12 @@ X!Ilib/string.c
 !Enet/core/dev.c
 !Enet/ethernet/eth.c
 !Iinclude/linux/etherdevice.h
+!Edrivers/net/phy/phy.c
+!Idrivers/net/phy/phy.c
+!Edrivers/net/phy/phy_device.c
+!Idrivers/net/phy/phy_device.c
+!Edrivers/net/phy/mdio_bus.c
+!Idrivers/net/phy/mdio_bus.c
 <!-- FIXME: Removed for now since no structured comments in source
 X!Enet/core/wireless.c
 -->
diff -puN drivers/net/phy/mdio_bus.c~phy-layer-add-kernel-doc-docbook drivers/net/phy/mdio_bus.c
--- a/drivers/net/phy/mdio_bus.c~phy-layer-add-kernel-doc-docbook
+++ a/drivers/net/phy/mdio_bus.c
@@ -35,10 +35,14 @@
 #include <asm/irq.h>
 #include <asm/uaccess.h>
 
-/* mdiobus_register 
+/**
+ * mdiobus_register - bring up all the PHYs on a given bus and attach them to bus
+ * @bus: target mii_bus
  *
- * description: Called by a bus driver to bring up all the PHYs
- *   on a given bus, and attach them to the bus
+ * Description: Called by a bus driver to bring up all the PHYs
+ *   on a given bus, and attach them to the bus.
+ *
+ * Returns 0 on success or < 0 on error.
  */
 int mdiobus_register(struct mii_bus *bus)
 {
@@ -114,10 +118,13 @@ void mdiobus_unregister(struct mii_bus *
 }
 EXPORT_SYMBOL(mdiobus_unregister);
 
-/* mdio_bus_match
+/**
+ * mdio_bus_match - determine if given PHY driver supports the given PHY device
+ * @dev: target PHY device
+ * @drv: given PHY driver
  *
- * description: Given a PHY device, and a PHY driver, return 1 if
- *   the driver supports the device.  Otherwise, return 0
+ * Description: Given a PHY device, and a PHY driver, return 1 if
+ *   the driver supports the device.  Otherwise, return 0.
  */
 static int mdio_bus_match(struct device *dev, struct device_driver *drv)
 {
diff -puN drivers/net/phy/phy.c~phy-layer-add-kernel-doc-docbook drivers/net/phy/phy.c
--- a/drivers/net/phy/phy.c~phy-layer-add-kernel-doc-docbook
+++ a/drivers/net/phy/phy.c
@@ -39,7 +39,9 @@
 #include <asm/irq.h>
 #include <asm/uaccess.h>
 
-/* Convenience function to print out the current phy status
+/**
+ * phy_print_status - Convenience function to print out the current phy status
+ * @phydev: the phy_device struct
  */
 void phy_print_status(struct phy_device *phydev)
 {
@@ -55,10 +57,15 @@ void phy_print_status(struct phy_device 
 EXPORT_SYMBOL(phy_print_status);
 
 
-/* Convenience functions for reading/writing a given PHY
- * register. They MUST NOT be called from interrupt context,
+/**
+ * phy_read - Convenience function for reading a given PHY register
+ * @phydev: the phy_device struct
+ * @regnum: register number to read
+ *
+ * NOTE: MUST NOT be called from interrupt context,
  * because the bus read/write functions may wait for an interrupt
- * to conclude the operation. */
+ * to conclude the operation.
+ */
 int phy_read(struct phy_device *phydev, u16 regnum)
 {
int retval;
@@ -72,6 +79,16 @@ int phy_read(struct phy_device *phydev, 
 }
 EXPORT_SYMBOL(phy_read);
 
+/**
+ * phy_write - Convenience function for writing a given PHY register
+ * @phydev: the phy_device struct
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
 int phy_write(struct phy_device *phydev, u16 regnum, u16 val)
 {
int err;
@@ -85,7 +102,15 @@ int phy_write(struct phy_device *phydev,
 }
 EXPORT_SYMBOL(phy_write);
 
-
+/**
+ * phy_clear_interrupt - Ack the phy device's interrupt
+ * @phydev: the phy_device struct
+ *
+ * If the @phydev driver has an ack_interrupt function, call it to
+ * ack and clear the phy device's interrupt.
+ *
+ * Returns 0 on success on < 0 on error.
+ */
 int phy_clear_interrupt(struct phy_device *phydev)
 {
int err = 0;
@@ -96,7 +121,13 @@ int phy_clear_interrupt(struct phy_devic
return err;
 }
 
-
+/**
+ * phy_config_interrupt - configure the PHY device for the requested interrupts
+ * @phydev: the phy_device struct
+ * @interrupts: interrupt flags to configure for this @phydev
+ *
+ * Returns 0 on success on < 0 on error.
+ */
 int phy_config_interrupt(struct phy_device *phydev, u32 interrupts)
 {
int err = 0;
@@ -109,9 +140,11 @@ int phy_config_interrupt(struct

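For readers unfamiliar with the format, the kernel-doc layout the patch converts to looks like this on a small, hypothetical helper. `example_bmsr_link_up()` and `EXAMPLE_BMSR_LSTATUS` are not part of the PHY layer; only the comment structure (name and one-line summary, `@param` lines, Description, Returns) mirrors the patch.

```c
#include <assert.h>

#define EXAMPLE_BMSR_LSTATUS 0x0004  /* MII BMSR link-status bit */

/**
 * example_bmsr_link_up - test the link bit of a BMSR value
 * @bmsr: value previously read from the PHY's basic mode status register
 *
 * Description: Pure helper with no bus access, so unlike phy_read()
 *   it may be called from any context.
 *
 * Returns 1 if the link-status bit is set, 0 otherwise.
 */
static int example_bmsr_link_up(unsigned short bmsr)
{
	return (bmsr & EXAMPLE_BMSR_LSTATUS) ? 1 : 0;
}
```
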
[patch 06/19] dmfe: trivial/spelling fixes

2007-03-06 Thread akpm

From: Maxim Levitsky [EMAIL PROTECTED]

Fix a typo, wrap lines at the 80th column, and change KERN_ERR to KERN_INFO
for the link status message.

Signed-off-by: Maxim Levitsky [EMAIL PROTECTED]
Cc: Valerie Henson [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/tulip/dmfe.c |  126 ++---
 1 file changed, 89 insertions(+), 37 deletions(-)

diff -puN drivers/net/tulip/dmfe.c~dmfe-trivial-spelling-fixes drivers/net/tulip/dmfe.c
--- a/drivers/net/tulip/dmfe.c~dmfe-trivial-spelling-fixes
+++ a/drivers/net/tulip/dmfe.c
@@ -143,9 +143,16 @@
 #define DMFE_TX_TIMEOUT ((3*HZ)/2) /* tx packet time-out time 1.5 s */
 #define DMFE_TX_KICK   (HZ/2)  /* tx packet Kick-out time 0.5 s */
 
-#define DMFE_DBUG(dbug_now, msg, value) if (dmfe_debug || (dbug_now)) printk(KERN_ERR DRV_NAME ": %s %lx\n", (msg), (long) (value))
-
-#define SHOW_MEDIA_TYPE(mode) printk(KERN_ERR DRV_NAME ": Change Speed to %sMhz %s duplex\n",mode & 1 ?"100":"10", mode & 4 ? "full":"half");
+#define DMFE_DBUG(dbug_now, msg, value) \
+   do { \
+   if (dmfe_debug || (dbug_now)) \
+   printk(KERN_ERR DRV_NAME ": %s %lx\n",\
+   (msg), (long) (value)); \
+   } while (0)
+
+#define SHOW_MEDIA_TYPE(mode) \
+   printk (KERN_INFO DRV_NAME ": Change Speed to %sMhz %s duplex\n" , \
(mode & 1) ? "100":"10", (mode & 4) ? "full":"half");
 
 
 /* CR9 definition: SROM/MII */
@@ -163,10 +170,20 @@
 
 #define SROM_V41_CODE   0x14
 
-#define SROM_CLK_WRITE(data, ioaddr) outl(data|CR9_SROM_READ|CR9_SRCS,ioaddr);udelay(5);outl(data|CR9_SROM_READ|CR9_SRCS|CR9_SRCLK,ioaddr);udelay(5);outl(data|CR9_SROM_READ|CR9_SRCS,ioaddr);udelay(5);
+#define SROM_CLK_WRITE(data, ioaddr) \
+   outl(data|CR9_SROM_READ|CR9_SRCS,ioaddr); \
+   udelay(5); \
+   outl(data|CR9_SROM_READ|CR9_SRCS|CR9_SRCLK,ioaddr); \
+   udelay(5); \
+   outl(data|CR9_SROM_READ|CR9_SRCS,ioaddr); \
+   udelay(5);
+
+#define __CHK_IO_SIZE(pci_id, dev_rev) \
+ (( ((pci_id)==PCI_DM9132_ID) || ((dev_rev) >= 0x0230) ) ? \
+   DM9102A_IO_SIZE: DM9102_IO_SIZE)
 
-#define __CHK_IO_SIZE(pci_id, dev_rev) ( ((pci_id)==PCI_DM9132_ID) || ((dev_rev) >= 0x0230) ) ? DM9102A_IO_SIZE: DM9102_IO_SIZE
-#define CHK_IO_SIZE(pci_dev, dev_rev) __CHK_IO_SIZE(((pci_dev)->device << 16) | (pci_dev)->vendor, dev_rev)
+#define CHK_IO_SIZE(pci_dev, dev_rev) \
(__CHK_IO_SIZE(((pci_dev)->device << 16) | (pci_dev)->vendor, dev_rev))
 
 /* Sten Check */
 #define DEVICE net_device
@@ -329,7 +346,7 @@ static void dmfe_program_DM9802(struct d
 static void dmfe_HPNA_remote_cmd_chk(struct dmfe_board_info * );
 static void dmfe_set_phyxcer(struct dmfe_board_info *);
 
-/* DM910X network baord routine  */
+/* DM910X network board routine  */
 
 /*
  * Search DM910X board ,allocate space and register it
@@ -356,7 +373,8 @@ static int __devinit dmfe_init_one (stru
SET_NETDEV_DEV(dev, &pdev->dev);
 
if (pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
-   printk(KERN_WARNING DRV_NAME ": 32-bit PCI DMA not available.\n");
+   printk(KERN_WARNING DRV_NAME
": 32-bit PCI DMA not available.\n");
err = -ENODEV;
goto err_out_free;
}
@@ -400,8 +418,11 @@ static int __devinit dmfe_init_one (stru
db = netdev_priv(dev);
 
/* Allocate Tx/Rx descriptor memory */
-   db->desc_pool_ptr = pci_alloc_consistent(pdev, sizeof(struct tx_desc) * DESC_ALL_CNT + 0x20, &db->desc_pool_dma_ptr);
-   db->buf_pool_ptr = pci_alloc_consistent(pdev, TX_BUF_ALLOC * TX_DESC_CNT + 4, &db->buf_pool_dma_ptr);
+   db->desc_pool_ptr = pci_alloc_consistent(pdev, sizeof(struct tx_desc) *
DESC_ALL_CNT + 0x20, &db->desc_pool_dma_ptr);
+
+   db->buf_pool_ptr = pci_alloc_consistent(pdev, TX_BUF_ALLOC *
TX_DESC_CNT + 4, &db->buf_pool_dma_ptr);
 
db->first_tx_desc = (struct tx_desc *) db->desc_pool_ptr;
db->first_tx_desc_dma = db->desc_pool_dma_ptr;
@@ -437,7 +458,8 @@ static int __devinit dmfe_init_one (stru
 
/* read 64 word srom data */
for (i = 0; i < 64; i++)
-   ((u16 *) db->srom)[i] = cpu_to_le16(read_srom_word(db->ioaddr, i));
+   ((u16 *) db->srom)[i] =
cpu_to_le16(read_srom_word(db->ioaddr, i));
 
/* Set Node address */
for (i = 0; i < 6; i++)
@@ -506,7 +528,8 @@ static int dmfe_open(struct DEVICE *dev)
 
DMFE_DBUG(0, "dmfe_open", 0);
 
-   ret = request_irq(dev->irq, dmfe_interrupt, IRQF_SHARED, dev->name, dev);
+   ret = request_irq(dev->irq, dmfe_interrupt,
+ IRQF_SHARED, dev->name, dev);
if (ret)
return ret;
 
@@ -647,7 +670,8 @@ static int dmfe_start_xmit(struct sk_buf
/* No Tx resource check, it never happen nromally */

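The patch also turns DMFE_DBUG from a bare if statement into a do { } while (0) block. The commit message does not say why, but one standard reason for the idiom is the dangling-else hazard, sketched below in userspace with hypothetical DBG_BARE/DBG_SAFE macros and a counter in place of printk. Compilers typically warn about the bare version as an ambiguous else, which is part of the point.

```c
#include <assert.h>

static int demo_debug;   /* hypothetical stand-in for dmfe_debug */
static int demo_logged;  /* counts "printk" calls instead of printing */

/* Old DMFE_DBUG shape: a bare if statement. */
#define DBG_BARE(now, msg) if (demo_debug || (now)) demo_logged++
/* New DMFE_DBUG shape: exactly one statement, via do { } while (0). */
#define DBG_SAFE(now, msg) \
	do { \
		if (demo_debug || (now)) \
			demo_logged++; \
	} while (0)

static int status_bare(int link_ok)
{
	int result = 0;

	if (link_ok)
		DBG_BARE(0, "link ok");
	else
		result = -1;   /* silently binds to the macro's inner if */
	return result;
}

static int status_safe(int link_ok)
{
	int result = 0;

	if (link_ok)
		DBG_SAFE(0, "link ok");
	else
		result = -1;   /* binds to "if (link_ok)" as intended */
	return result;
}
```

With debugging off, status_bare(1) returns -1 even though the link is fine, because the else was captured by the macro's inner if; status_safe(1) returns 0 as intended.
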
[patch 09/19] dmfe: add support for suspend/resume

2007-03-06 Thread akpm

From: Maxim Levitsky [EMAIL PROTECTED]

This adds support for suspend/resume.

[EMAIL PROTECTED]: fix CONFIG_PM=n, coding style]
Signed-off-by: Maxim Levitsky [EMAIL PROTECTED]
Cc: Valerie Henson [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/tulip/dmfe.c |   52 ++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff -puN drivers/net/tulip/dmfe.c~dmfe-add-support-for-suspend-resume drivers/net/tulip/dmfe.c
--- a/drivers/net/tulip/dmfe.c~dmfe-add-support-for-suspend-resume
+++ a/drivers/net/tulip/dmfe.c
@@ -55,9 +55,6 @@
 
 TODO
 
-Implement pci_driver::suspend() and pci_driver::resume()
-power management methods.
-
 Check on 64 bit boxes.
 Check and fix on big endian boxes.
 
@@ -2050,11 +2047,60 @@ static struct pci_device_id dmfe_pci_tbl
 MODULE_DEVICE_TABLE(pci, dmfe_pci_tbl);
 
 
+#ifdef CONFIG_PM
+static int dmfe_suspend(struct pci_dev *pci_dev, pm_message_t state)
+{
+   struct net_device *dev = pci_get_drvdata(pci_dev);
+   struct dmfe_board_info *db = netdev_priv(dev);
+
+   /* Disable upper layer interface */
+   netif_device_detach(dev);
+
+   /* Disable Tx/Rx */
+   db->cr6_data &= ~(CR6_RXSC | CR6_TXSC);
+   update_cr6(db->cr6_data, dev->base_addr);
+
+   /* Disable Interrupt */
+   outl(0, dev->base_addr + DCR7);
+   outl(inl (dev->base_addr + DCR5), dev->base_addr + DCR5);
+
+   /* Fre RX buffers */
+   dmfe_free_rxbuffer(db);
+
+   /* Power down device*/
+   pci_set_power_state(pci_dev, pci_choose_state (pci_dev,state));
+   pci_save_state(pci_dev);
+
+   return 0;
+}
+
+static int dmfe_resume(struct pci_dev *pci_dev)
+{
+   struct net_device *dev = pci_get_drvdata(pci_dev);
+
+   pci_restore_state(pci_dev);
+   pci_set_power_state(pci_dev, PCI_D0);
+
+   /* Re-initilize DM910X board */
+   dmfe_init_dm910x(dev);
+
+   /* Restart upper layer interface */
+   netif_device_attach(dev);
+
+   return 0;
+}
+#else
+#define dmfe_suspend NULL
+#define dmfe_resume NULL
+#endif
+
 static struct pci_driver dmfe_driver = {
.name   = "dmfe",
.id_table   = dmfe_pci_tbl,
.probe  = dmfe_init_one,
.remove = __devexit_p(dmfe_remove_one),
+   .suspend    = dmfe_suspend,
+   .resume = dmfe_resume
 };
 
MODULE_AUTHOR("Sten Wang, [EMAIL PROTECTED]");
_
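
The #ifdef CONFIG_PM block lets the same pci_driver initializer compile with power management disabled: the hook names simply expand to NULL. The sketch below reproduces the pattern with a hypothetical `struct demo_pci_driver` and a `DEMO_CONFIG_PM` switch, plus a caller that treats a NULL hook as "nothing to do"; that caller is our assumption about how a core would use the hooks, not the PCI core's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the PM hooks of struct pci_driver. */
struct demo_pci_driver {
	const char *name;
	int (*suspend)(int state);
	int (*resume)(void);
};

#ifdef DEMO_CONFIG_PM  /* hypothetical; plays the role of CONFIG_PM */
static int demo_suspend(int state) { (void)state; return 0; }
static int demo_resume(void) { return 0; }
#else
#define demo_suspend NULL
#define demo_resume NULL
#endif

/* The initializer is identical in both configurations. */
static struct demo_pci_driver demo_driver = {
	.name    = "demo",
	.suspend = demo_suspend,
	.resume  = demo_resume,
};

/* Caller side: a NULL hook means "no driver-specific work". */
static int demo_core_suspend(const struct demo_pci_driver *drv, int state)
{
	return drv->suspend ? drv->suspend(state) : 0;
}
```

Defining the names to NULL instead of wrapping the two struct members in #ifdef keeps the initializer free of conditional compilation.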

[patch 17/19] sk98lin: handle pci_enable_device() return value in skge_resume()

2007-03-06 Thread akpm

From: Dmitriy Monakhov [EMAIL PROTECTED]

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/sk98lin/skge.c |   20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff -puN drivers/net/sk98lin/skge.c~sk98lin-handle-pci_enable_device-return-value-in-skge_resume drivers/net/sk98lin/skge.c
--- a/drivers/net/sk98lin/skge.c~sk98lin-handle-pci_enable_device-return-value-in-skge_resume
+++ a/drivers/net/sk98lin/skge.c
@@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *p
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   ret = pci_enable_device(pdev);
+   if (ret) {
+   printk(KERN_WARNING "sk98lin: unable to enable device %s "
"in resume\n", dev->name);
+   goto err_out;
+   }
pci_set_master(pdev);
if (pAC->GIni.GIMacsFound == 2)
ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);
@@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *p
ret = request_irq(dev->irq, SkGeIsrOnePort, IRQF_SHARED, "sk98lin", dev);
if (ret) {
printk(KERN_WARNING "sk98lin: unable to acquire IRQ %d\n", dev->irq);
-   pAC->AllocFlag &= ~SK_ALLOC_IRQ;
-   dev->irq = 0;
-   pci_disable_device(pdev);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto err_out_disable_pdev;
}
 
netif_device_attach(dev);
@@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *p
}
 
return 0;
+
+err_out_disable_pdev:
+   pci_disable_device(pdev);
+err_out:
+   pAC->AllocFlag &= ~SK_ALLOC_IRQ;
+   dev->irq = 0;
+   return ret;
 }
 #else
 #define skge_suspend NULL
_
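
The reworked skge_resume() uses the usual goto-based unwinding: each failure jumps to a label that undoes only what has already been set up, and the labels fall through into the shared cleanup. The userspace sketch below shows the same structure; `struct model_dev` and its helpers are hypothetical stand-ins for pci_enable_device(), request_irq() and pci_disable_device().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device state; the fail_* fields force errors for testing. */
struct model_dev {
	int enabled;      /* stands in for the PCI device being enabled */
	int irq;          /* nonzero while an IRQ number is assigned */
	int fail_enable;  /* make model_enable() fail */
	int fail_irq;     /* make model_request_irq() fail */
};

static int model_enable(struct model_dev *d)
{
	if (d->fail_enable)
		return -EIO;
	d->enabled = 1;
	return 0;
}

static int model_request_irq(struct model_dev *d)
{
	return d->fail_irq ? -EBUSY : 0;
}

static int model_resume(struct model_dev *d)
{
	int ret;

	ret = model_enable(d);
	if (ret)
		goto err_out;          /* nothing to undo yet */

	ret = model_request_irq(d);
	if (ret) {
		ret = -EBUSY;          /* the patch reports IRQ failure as -EBUSY */
		goto err_out_disable;  /* undo model_enable() first */
	}

	return 0;

err_out_disable:
	d->enabled = 0;            /* like pci_disable_device() */
err_out:
	d->irq = 0;                /* shared cleanup, like "dev->irq = 0" */
	return ret;
}
```

If enabling the device fails, the function returns that error code rather than -EBUSY, which matches the patch's use of the pci_enable_device() return value.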

[patch 14/19] drivers/net/vioc/vioc_driver.c: replace pci_module_init with pci_register_driver

2007-03-06 Thread akpm

From: Richard Knutsson [EMAIL PROTECTED]

Replace pci_module_init with pci_register_driver

Signed-off-by: Richard Knutson [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/vioc/vioc_driver.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN drivers/net/vioc/vioc_driver.c~drivers-net-vioc-vioc_driverc-replace-pci_module_init-with-pci_register_driver drivers/net/vioc/vioc_driver.c
--- a/drivers/net/vioc/vioc_driver.c~drivers-net-vioc-vioc_driverc-replace-pci_module_init-with-pci_register_driver
+++ a/drivers/net/vioc/vioc_driver.c
@@ -841,9 +841,9 @@ static int __init vioc_module_init(void)
vioc_irq_init();
spp_init();
 
-   ret = pci_module_init(&vioc_driver);
+   ret = pci_register_driver(&vioc_driver);
if (ret) {
-   printk(KERN_ERR "%s: pci_module_init() - %d\n", __FUNCTION__,
+   printk(KERN_ERR "%s: pci_register_driver() - %d\n", __FUNCTION__,
   ret);
vioc_irq_exit();
return ret;
_

[patch 19/19] network: add the missing phy_device speed information to phy_mii_ioctl

2007-03-06 Thread akpm

From: Shan Lu [EMAIL PROTECTED]

Function `phy_mii_ioctl' returns or updates the physical device's
information based on user requests.  When handling the basic mode control
register (BMCR), the original implementation only records the physical
device's duplex information and omits the speed information, which is wrong
because the BMCR register holds both duplex and speed information.

The patch checks the BMCR value against speed-related flags and fills the
return structure's speed field accordingly.

Signed-off-by: Shan [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/phy/phy.c |6 ++
 1 file changed, 6 insertions(+)

diff -puN drivers/net/phy/phy.c~network-add-the-missing-phy_device-speed-information-to-phy_mii_ioctl drivers/net/phy/phy.c
--- a/drivers/net/phy/phy.c~network-add-the-missing-phy_device-speed-information-to-phy_mii_ioctl
+++ a/drivers/net/phy/phy.c
@@ -382,6 +382,12 @@ int phy_mii_ioctl(struct phy_device *phy
phydev->duplex = DUPLEX_FULL;
else
phydev->duplex = DUPLEX_HALF;
+   if ((!phydev->autoneg) &&
+   (val & BMCR_SPEED1000))
+   phydev->speed = SPEED_1000;
+   else if ((!phydev->autoneg) &&
+   (val & BMCR_SPEED100))
+   phydev->speed = SPEED_100;
break;
case MII_ADVERTISE:
phydev->advertising = val;
_
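
The added lines decode the forced speed from the BMCR value when autonegotiation is off. The function below restates that logic outside the kernel. The bit values are the standard MII ones from <linux/mii.h> (to our knowledge), `model_bmcr_speed()` is a hypothetical name, and, like the patch, it leaves the speed unchanged when neither speed bit is set rather than forcing 10 Mbps.

```c
#include <assert.h>

/* Standard MII BMCR speed-select bits (values as in <linux/mii.h>). */
#define BMCR_SPEED1000 0x0040
#define BMCR_SPEED100  0x2000

#define SPEED_10   10
#define SPEED_100  100
#define SPEED_1000 1000

/*
 * Model of the patched MII_BMCR case: with autonegotiation off, take the
 * forced speed from the BMCR bits; otherwise keep the current speed.
 */
static int model_bmcr_speed(unsigned int val, int autoneg, int cur_speed)
{
	if (!autoneg && (val & BMCR_SPEED1000))
		return SPEED_1000;
	else if (!autoneg && (val & BMCR_SPEED100))
		return SPEED_100;
	return cur_speed;
}
```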

[patch 12/19] fix irq problem with NAPI + NETPOLL

2007-03-06 Thread akpm

From: Atsushi Nemoto [EMAIL PROTECTED]

It seems netif_receive_skb() was designed not to be called from irq context,
but NAPI + NETPOLL break this rule.  If netif_receive_skb() is called from
irq context, redirect the skb to netif_rx() instead of processing it in that
context.

Signed-off-by: Atsushi Nemoto [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Cc: Francois Romieu [EMAIL PROTECTED]
Cc: David S. Miller [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 net/core/dev.c |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff -puN net/core/dev.c~8139too-fix-irq-problem-with-napi-netpoll net/core/dev.c
--- a/net/core/dev.c~8139too-fix-irq-problem-with-napi-netpoll
+++ a/net/core/dev.c
@@ -1769,8 +1769,15 @@ int netif_receive_skb(struct sk_buff *sk
__be16 type;
 
/* if we've gotten here through NAPI, check netpoll */
-   if (skb->dev->poll && netpoll_rx(skb))
-   return NET_RX_DROP;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+   if (skb->dev->poll && skb->dev->poll_controller) {
+   /* NAPI poll might be called in irq context on NETPOLL */
+   if (in_irq() || irqs_disabled())
+   return netif_rx(skb);
+   if (netpoll_rx(skb))
+   return NET_RX_DROP;
+   }
+#endif
 
if (!skb->tstamp.off_sec)
net_timestamp(skb);
_
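
The decision the patch adds at the top of netif_receive_skb() can be summarized as a small pure function. In the sketch below the integer parameters are hypothetical stand-ins for `skb->dev->poll`, `skb->dev->poll_controller`, `in_irq() || irqs_disabled()` and the result of netpoll_rx(); it models only the CONFIG_NET_POLL_CONTROLLER=y case, since without that option the whole check is compiled out.

```c
#include <assert.h>

/* Hypothetical names for the three outcomes of the patched check. */
enum rx_path { RX_DEFER_NETIF_RX, RX_DROP_NETPOLL, RX_PROCESS };

/* Model of the CONFIG_NET_POLL_CONTROLLER branch added by the patch. */
static enum rx_path model_rx_path(int has_poll, int has_poll_controller,
				  int irq_context, int netpoll_consumed)
{
	if (has_poll && has_poll_controller) {
		if (irq_context)
			return RX_DEFER_NETIF_RX;  /* hand off via netif_rx() */
		if (netpoll_consumed)
			return RX_DROP_NETPOLL;    /* return NET_RX_DROP */
	}
	return RX_PROCESS;                         /* normal receive path */
}
```

Note the ordering: the irq-context test comes first, so in that context the skb is queued through netif_rx() and netpoll_rx() is not called at all.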

[patch 16/19] devinit devexit cleanups for de2104x driver

2007-03-06 Thread akpm

From: Prarit Bhargava [EMAIL PROTECTED]

Fixes MODPOST warnings similar to:

WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to
.init.text:de_init_one from .data.rel.local after 'de_driver' (at offset 0x20)
WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to
.exit.text:de_remove_one from .data.rel.local after 'de_driver' (at offset 0x28)

Signed-off-by: Prarit Bhargava [EMAIL PROTECTED]
Cc: Valerie Henson [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/tulip/de2104x.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff -puN drivers/net/tulip/de2104x.c~__devinit-__devexit-cleanups-for-de2104x-driver drivers/net/tulip/de2104x.c
--- a/drivers/net/tulip/de2104x.c~__devinit-__devexit-cleanups-for-de2104x-driver
+++ a/drivers/net/tulip/de2104x.c
@@ -1685,7 +1685,7 @@ static const struct ethtool_ops de_ethto
.get_regs   = de_get_regs,
 };
 
-static void __init de21040_get_mac_address (struct de_private *de)
+static void __devinit de21040_get_mac_address (struct de_private *de)
 {
unsigned i;
 
@@ -1703,7 +1703,7 @@ static void __init de21040_get_mac_addre
}
 }
 
-static void __init de21040_get_media_info(struct de_private *de)
+static void __devinit de21040_get_media_info(struct de_private *de)
 {
unsigned int i;
 
@@ -1765,7 +1765,7 @@ static unsigned __devinit tulip_read_eep
return retval;
 }
 
-static void __init de21041_get_srom_info (struct de_private *de)
+static void __devinit de21041_get_srom_info (struct de_private *de)
 {
unsigned i, sa_offset = 0, ofs;
u8 ee_data[DE_EEPROM_SIZE + 6] = {};
_

[patch 1/2] 8139too: force media setting cleanup

2007-03-06 Thread akpm

From: Bernard Lee [EMAIL PROTECTED]

Setting bits 4 & 5 alone in the 8139too module media option does not really
force 100Mbps full-duplex mode.  When media option bits 0-3 are cleared, the
8139too module does not force the media setting.  Therefore, bits 0-3 must
be set for bits 4 & 5 to take effect.  This hidden bit 0-3 requirement is
not stated in the module description.

It can be fixed by widening the default_port bitfield in the rtl8139_private
structure from 4 bits to 6 bits.

Besides, module media bit 9 is a duplicate of bit 4 (full-duplex), so it is
suggested that bit 9 be freed.  A remark is added to the module description
that bit 0 can be used to force the setting.  This helps to clarify how to
select 10Mbps half-duplex mode.

Signed-off-by: Bernard Lee [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/8139too.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff -puN drivers/net/8139too.c~8139too-force-media-setting-fix drivers/net/8139too.c
--- a/drivers/net/8139too.c~8139too-force-media-setting-fix
+++ a/drivers/net/8139too.c
@@ -586,7 +586,7 @@ struct rtl8139_private {
signed char phys[4];/* MII device addresses. */
char twistie, twist_row, twist_col; /* Twister tune state. */
unsigned int watchdog_fired : 1;
-   unsigned int default_port : 4;  /* Last dev->if_port value. */
+   unsigned int default_port : 6;  /* Last dev->if_port value. */
unsigned int have_thread : 1;
spinlock_t lock;
spinlock_t rx_lock;
@@ -612,7 +612,7 @@ module_param_array(full_duplex, int, NUL
 module_param(debug, int, 0);
 MODULE_PARM_DESC (debug, "8139too bitmapped message enable number");
 MODULE_PARM_DESC (multicast_filter_limit, "8139too maximum number of filtered multicast addresses");
-MODULE_PARM_DESC (media, "8139too: Bits 4+9: force full duplex, bit 5: 100Mbps");
+MODULE_PARM_DESC (media, "8139too: bit 0: force setting, bit 4: full duplex, bit 5: 100Mbps");
 MODULE_PARM_DESC (full_duplex, "8139too: Force full duplex for board(s) (1)");
 
 static int read_eeprom (void __iomem *ioaddr, int location, int addr_len);
@@ -1068,8 +1068,8 @@ static int __devinit rtl8139_init_one (s
/* The lower four bits are the media type. */
option = (board_idx >= MAX_UNITS) ? 0 : media[board_idx];
if (option > 0) {
-   tp->mii.full_duplex = (option & 0x210) ? 1 : 0;
-   tp->default_port = option & 0xFF;
+   tp->mii.full_duplex = (option & 0x10) ? 1 : 0;
+   tp->default_port = option & 0x3F;
if (tp->default_port)
tp->mii.force_media = 1;
}
_
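The bit-field truncation described in this patch can be reproduced in plain C. This is a minimal user-space sketch, not driver code: `old_priv`, `new_priv` and the two helper functions are hypothetical and model only the `default_port` field and the masks from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rtl8139_private before and after the
 * patch; only the default_port bit-field is modelled. */
struct old_priv { unsigned int default_port : 4; };
struct new_priv { unsigned int default_port : 6; };

/* Old logic: mask with 0xFF, but a 4-bit unsigned field keeps only the
 * value modulo 16, so bits 4 (full duplex) and 5 (100Mbps) are lost. */
static unsigned int old_default_port(int option)
{
	struct old_priv p;

	p.default_port = option & 0xFF;
	return p.default_port;
}

/* Patched logic: bit 4 selects full duplex and the low six bits (0x3F)
 * fit in the wider field, so a non-zero value still forces the media. */
static unsigned int new_default_port(int option, int *full_duplex)
{
	struct new_priv p;

	assert(full_duplex != NULL);
	*full_duplex = (option & 0x10) ? 1 : 0;
	p.default_port = option & 0x3F;
	return p.default_port;
}
```

For example, `media=0x30` (bits 4 and 5 only) leaves 0 in the 4-bit field, so `force_media` would never be set, whereas the 6-bit field keeps 0x30.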

[patch 13/19] cxgb3: add SW LRO support

2007-03-06 Thread akpm

From: Divy Le Ray [EMAIL PROTECTED]

Add all-in-sw lro support.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/cxgb3/adapter.h |   21 +
 drivers/net/cxgb3/common.h  |1 
 drivers/net/cxgb3/cxgb3_ioctl.h |1 
 drivers/net/cxgb3/cxgb3_main.c  |   16 +
 drivers/net/cxgb3/sge.c |  341 +-
 drivers/net/cxgb3/t3_cpl.h  |   10 
 6 files changed, 384 insertions(+), 6 deletions(-)

diff -puN drivers/net/cxgb3/adapter.h~cxgb3-add-sw-lro-support drivers/net/cxgb3/adapter.h
--- a/drivers/net/cxgb3/adapter.h~cxgb3-add-sw-lro-support
+++ a/drivers/net/cxgb3/adapter.h
@@ -95,6 +95,23 @@ struct sge_fl {  /* SGE per free-buffer
unsigned long alloc_failed; /* # of times buffer allocation failed */
 };
 
+/* Max active LRO sessions per queue set */
+#define MAX_LRO_PER_QSET 8
+
+struct sge_lro_session {
+   struct sk_buff *skb;
+   struct sk_buff *skb_last_frag;
+   u32 seq;
+   u16 iplen;
+};
+
+struct sge_lro {
+   unsigned int enabled;
+   unsigned int num_active;
+   struct sge_lro_session *last_s;
+   struct sge_lro_session s[MAX_LRO_PER_QSET];
+};
+
 /*
  * Bundle size for grouping offload RX packets for delivery to the stack.
  * Don't make this too big as we do prefetch on each packet in a bundle.
@@ -164,6 +181,9 @@ enum {  /* per port SGE statistics */
SGE_PSTAT_TX_CSUM,  /* # of TX checksum offloads */
SGE_PSTAT_VLANEX,   /* # of VLAN tag extractions */
SGE_PSTAT_VLANINS,  /* # of VLAN tag insertions */
+   SGE_PSTATS_LRO_QUEUED,  /* # of LRO appended packets */
+   SGE_PSTATS_LRO_FLUSHED, /* # of LRO flushed packets */
+   SGE_PSTATS_LRO_X_STREAMS,   /* # of exceeded LRO contexts */
 
SGE_PSTAT_MAX   /* must be last */
 };
@@ -171,6 +191,7 @@ enum {  /* per port SGE statistics */
 struct sge_qset {  /* an SGE queue set */
struct sge_rspq rspq;
struct sge_fl fl[SGE_RXQ_PER_SET];
+   struct sge_lro lro;
struct sge_txq txq[SGE_TXQ_PER_SET];
struct net_device *netdev;  /* associated net device */
unsigned long txq_stopped;  /* which Tx queues are stopped */
diff -puN drivers/net/cxgb3/common.h~cxgb3-add-sw-lro-support drivers/net/cxgb3/common.h
--- a/drivers/net/cxgb3/common.h~cxgb3-add-sw-lro-support
+++ a/drivers/net/cxgb3/common.h
@@ -322,6 +322,7 @@ struct tp_params {
 
 struct qset_params {   /* SGE queue set parameters */
unsigned int polling;   /* polling/interrupt service for rspq */
+   unsigned int lro;   /* large receive offload */
unsigned int coalesce_usecs;/* irq coalescing timer */
unsigned int rspq_size; /* # of entries in response queue */
unsigned int fl_size;   /* # of entries in regular free list */
diff -puN drivers/net/cxgb3/cxgb3_ioctl.h~cxgb3-add-sw-lro-support drivers/net/cxgb3/cxgb3_ioctl.h
--- a/drivers/net/cxgb3/cxgb3_ioctl.h~cxgb3-add-sw-lro-support
+++ a/drivers/net/cxgb3/cxgb3_ioctl.h
@@ -90,6 +90,7 @@ struct ch_qset_params {
int32_t fl_size[2];
int32_t intr_lat;
int32_t polling;
+   int32_t lro;
int32_t cong_thres;
 };
 
diff -puN drivers/net/cxgb3/cxgb3_main.c~cxgb3-add-sw-lro-support drivers/net/cxgb3/cxgb3_main.c
--- a/drivers/net/cxgb3/cxgb3_main.c~cxgb3-add-sw-lro-support
+++ a/drivers/net/cxgb3/cxgb3_main.c
@@ -1031,7 +1031,11 @@ static char stats_strings[][ETH_GSTRING_
"VLANinsertions ",
"TxCsumOffload  ",
"RxCsumGood ",
-   "RxDrops"
+   "RxDrops",
+
+   "LroQueued  ",
+   "LroFlushed ",
+   "LroExceededSessions"
 };
 
 static int get_stats_count(struct net_device *dev)
@@ -1145,6 +1149,9 @@ static void get_stats(struct net_device 
*data++ = collect_sge_port_stats(adapter, pi, SGE_PSTAT_TX_CSUM);
*data++ = collect_sge_port_stats(adapter, pi, SGE_PSTAT_RX_CSUM_GOOD);
*data++ = s->rx_cong_drops;
+   *data++ = collect_sge_port_stats(adapter, pi, SGE_PSTATS_LRO_QUEUED);
+   *data++ = collect_sge_port_stats(adapter, pi, SGE_PSTATS_LRO_FLUSHED);
+   *data++ = collect_sge_port_stats(adapter, pi, SGE_PSTATS_LRO_X_STREAMS);
 }
 
 static inline void reg_block_dump(struct adapter *ap, void *buf,
@@ -1624,6 +1631,12 @@ static int cxgb_extension_ioctl(struct n
}
}
}
+   if (t.lro >= 0) {
+   struct sge_qset *qs = &adapter->sge.qs[t.qset_idx];
+
+   q->lro = t.lro;
+   qs->lro.enabled = t.lro;
+   }
break;
}
case CHELSIO_GET_QSET_PARAMS:{
@@ -1643,6 +1656,7 @@ static int cxgb_extension_ioctl(struct n
t.fl_size[0] = q->fl_size;

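The cxgb3 diff above is truncated before the sge.c changes, so the aggregation logic itself is not shown. As a rough illustration of what the `sge_lro` session table is for, here is a minimal user-space model of software LRO. It assumes a hypothetical `lro_receive()` keyed by a flow id and a TCP sequence number: in-order segments of a tracked flow are merged, an out-of-order segment flushes its session and restarts it, and a new flow that finds all eight slots (mirroring `MAX_LRO_PER_QSET`) busy bypasses LRO, the case the patch counts in `SGE_PSTATS_LRO_X_STREAMS`. It is not the cxgb3 implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LRO_SESSIONS 8          /* mirrors MAX_LRO_PER_QSET in the patch */

/* Hypothetical, simplified session: the real sge_lro_session holds skbs. */
struct lro_session {
	int active;
	uint32_t flow;      /* stand-in for the TCP/IP 4-tuple */
	uint32_t next_seq;  /* sequence number expected next */
	uint32_t bytes;     /* payload bytes aggregated so far */
};

struct lro_state {
	struct lro_session s[MAX_LRO_SESSIONS];
	unsigned long queued, flushed, exceeded;  /* like the new LRO stats */
};

static void lro_flush(struct lro_state *st, struct lro_session *ses)
{
	/* A real driver would hand the merged packet to the stack here. */
	ses->active = 0;
	st->flushed++;
}

/* Returns 1 if the segment was aggregated, 0 if it must bypass LRO. */
static int lro_receive(struct lro_state *st, uint32_t flow, uint32_t seq,
		       uint32_t len)
{
	struct lro_session *free_slot = NULL;
	int i;

	assert(st != NULL);
	for (i = 0; i < MAX_LRO_SESSIONS; i++) {
		struct lro_session *ses = &st->s[i];

		if (!ses->active) {
			if (!free_slot)
				free_slot = ses;
			continue;
		}
		if (ses->flow != flow)
			continue;
		if (seq == ses->next_seq) {      /* in order: append */
			ses->next_seq += len;
			ses->bytes += len;
			st->queued++;
			return 1;
		}
		lro_flush(st, ses);              /* out of order: flush, restart */
		free_slot = ses;
		break;
	}
	if (!free_slot) {                        /* every session is busy */
		st->exceeded++;
		return 0;
	}
	free_slot->active = 1;
	free_slot->flow = flow;
	free_slot->next_seq = seq + len;
	free_slot->bytes = len;
	st->queued++;
	return 1;
}
```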
[patch 15/19] 3c59x: Handle pci_enable_device() failure while resuming

2007-03-06 Thread akpm

From: Dmitriy Monakhov [EMAIL PROTECTED]

Handle pci_enable_device() failure while resuming; we can safely exit here.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/3c59x.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff -puN drivers/net/3c59x.c~3c59x-handle-pci_enable_device-failure-while-resuming drivers/net/3c59x.c
--- a/drivers/net/3c59x.c~3c59x-handle-pci_enable_device-failure-while-resuming
+++ a/drivers/net/3c59x.c
@@ -822,11 +822,17 @@ static int vortex_resume(struct pci_dev 
 {
struct net_device *dev = pci_get_drvdata(pdev);
struct vortex_private *vp = netdev_priv(dev);
+   int err;
 
if (dev && vp) {
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   err = pci_enable_device(pdev);
+   if (err) {
+   printk(KERN_WARNING "%s: Could not enable device \n",
+   dev->name);
+   return err;
+   }
pci_set_master(pdev);
if (request_irq(dev->irq, vp->full_bus_master_rx ?
boomerang_interrupt : vortex_interrupt, 
IRQF_SHARED, dev->name, dev)) {
_

[patch 05/19] revert drivers/net/tulip/dmfe: support basic carrier detection

2007-03-06 Thread akpm

From: Andrew Morton [EMAIL PROTECTED]

Revert 7628b0a8c01a02966d2228bdf741ddedb128e8f8.  Thomas Bachler
reports:

  Commit 7628b0a8c01a02966d2228bdf741ddedb128e8f8 (drivers/net/tulip/dmfe:
  support basic carrier detection) breaks networking on my Davicom DM9009. 
  ethtool always reports there is no link.  tcpdump shows incoming packets,
  but TX is disabled.  Reverting the above patch fixes the problem.


Cc: Samuel Thibault [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Cc: Valerie Henson [EMAIL PROTECTED]
Cc: Thomas Bachler [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/tulip/dmfe.c |9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff -puN drivers/net/tulip/dmfe.c~revert-drivers-net-tulip-dmfe-support-basic-carrier-detection drivers/net/tulip/dmfe.c
--- a/drivers/net/tulip/dmfe.c~revert-drivers-net-tulip-dmfe-support-basic-carrier-detection
+++ a/drivers/net/tulip/dmfe.c
@@ -187,7 +187,7 @@ struct rx_desc {
 struct dmfe_board_info {
u32 chip_id;/* Chip vendor/Device ID */
u32 chip_revision;  /* Chip revision */
-   struct DEVICE *dev; /* net device */
+   struct DEVICE *next_dev;/* next device */
struct pci_dev *pdev;   /* PCI device */
spinlock_t lock;
 
@@ -399,8 +399,6 @@ static int __devinit dmfe_init_one (stru
/* Init system  device */
db = netdev_priv(dev);
 
-   db->dev = dev;
-
/* Allocate Tx/Rx descriptor memory */
db->desc_pool_ptr = pci_alloc_consistent(pdev, sizeof(struct tx_desc) * DESC_ALL_CNT + 0x20, &db->desc_pool_dma_ptr);
db->buf_pool_ptr = pci_alloc_consistent(pdev, TX_BUF_ALLOC * TX_DESC_CNT + 4, &db->buf_pool_dma_ptr);
@@ -428,7 +426,6 @@ static int __devinit dmfe_init_one (stru
dev->poll_controller = poll_dmfe;
 #endif
dev->ethtool_ops = &netdev_ethtool_ops;
-   netif_carrier_off(db->dev);
spin_lock_init(&db->lock);
 
pci_read_config_dword(pdev, 0x50, &pci_pmr);
@@ -1053,7 +1050,6 @@ static void netdev_get_drvinfo(struct ne
 
 static const struct ethtool_ops netdev_ethtool_ops = {
.get_drvinfo= netdev_get_drvinfo,
-   .get_link   = ethtool_op_get_link,
 };
 
 /*
@@ -1148,7 +1144,6 @@ static void dmfe_timer(unsigned long dat
/* Link Failed */
DMFE_DBUG(0, "Link Failed", tmp_cr12);
db->link_failed = 1;
-   netif_carrier_off(db->dev);
 
/* For Force 10/100M Half/Full mode: Enable Auto-Nego mode */
/* AUTO or force 1M Homerun/Longrun don't need */
@@ -1171,8 +1166,6 @@ static void dmfe_timer(unsigned long dat
if ( (db->media_mode & DMFE_AUTO) &&
dmfe_sense_speed(db) )
db->link_failed = 1;
-   else
-   netif_carrier_on(db->dev);
dmfe_process_mode(db);
/* SHOW_MEDIA_TYPE(db->op_mode); */
}
_

[patch 18/19] mv643xx ethernet driver IRQ registration fix

2007-03-06 Thread akpm

From: Giridhar Pemmasani [EMAIL PROTECTED]

During initialization, mv643xx driver registers IRQ before setting up tx/rx
rings.  This causes kernel oops because mv643xx_poll, which gets called
right after registering IRQ, calls netif_rx_complete, which accesses the rx
ring (I don't have the oops message anymore; I just remember this sequence
of calls).  Attached (tested) patch first initializes the rx/tx rings and
then registers the IRQ.

Signed-off-by: Giridhar Pemmasani [EMAIL PROTECTED]
Cc: Dale Farnsworth [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/mv643xx_eth.c |   40 +++-
 1 file changed, 22 insertions(+), 18 deletions(-)

diff -puN drivers/net/mv643xx_eth.c~mv643xx-ethernet-driver-irq-registration-fix drivers/net/mv643xx_eth.c
--- a/drivers/net/mv643xx_eth.c~mv643xx-ethernet-driver-irq-registration-fix
+++ a/drivers/net/mv643xx_eth.c
@@ -787,14 +787,6 @@ static int mv643xx_eth_open(struct net_d
unsigned int size;
int err;
 
-   err = request_irq(dev->irq, mv643xx_eth_int_handler,
-   IRQF_SHARED | IRQF_SAMPLE_RANDOM, dev->name, dev);
-   if (err) {
-   printk(KERN_ERR "Can not assign IRQ number to MV643XX_eth%d\n",
-   port_num);
-   return -EAGAIN;
-   }
-
eth_port_init(mp);
 
memset(&mp->timeout, 0, sizeof(struct timer_list));
@@ -806,8 +798,7 @@ static int mv643xx_eth_open(struct net_d
GFP_KERNEL);
if (!mp->rx_skb) {
printk(KERN_ERR "%s: Cannot allocate Rx skb ring\n", dev->name);
-   err = -ENOMEM;
-   goto out_free_irq;
+   return -ENOMEM;
}
mp->tx_skb = kmalloc(sizeof(*mp->tx_skb) * mp->tx_ring_size,
GFP_KERNEL);
@@ -861,13 +852,8 @@ static int mv643xx_eth_open(struct net_d
dev->name, size);
printk(KERN_ERR "%s: Freeing previously allocated TX queues...",
dev->name);
-   if (mp->rx_sram_size)
-   iounmap(mp->p_tx_desc_area);
-   else
-   dma_free_coherent(NULL, mp->tx_desc_area_size,
-   mp->p_tx_desc_area, mp->tx_desc_dma);
err = -ENOMEM;
-   goto out_free_tx_skb;
+   goto out_free_tx_ring;
}
memset((void *)mp->p_rx_desc_area, 0, size);
 
@@ -875,6 +861,14 @@ static int mv643xx_eth_open(struct net_d
 
mv643xx_eth_rx_refill_descs(dev);   /* Fill RX ring with skb's */
 
+   err = request_irq(dev->irq, mv643xx_eth_int_handler,
+   IRQF_SHARED | IRQF_SAMPLE_RANDOM, dev->name, dev);
+   if (err) {
+   printk(KERN_ERR "Can not assign IRQ number to MV643XX_eth%d\n",
+   port_num);
+   goto out_free_rx_ring;
+   }
+
/* Clear any pending ethernet port interrupts */
mv_write(MV643XX_ETH_INTERRUPT_CAUSE_REG(port_num), 0);
mv_write(MV643XX_ETH_INTERRUPT_CAUSE_EXTEND_REG(port_num), 0);
@@ -900,12 +894,22 @@ static int mv643xx_eth_open(struct net_d
 
return 0;
 
+out_free_rx_ring:
+   if (mp->rx_sram_size)
+   iounmap(mp->p_rx_desc_area);
+   else
+   dma_free_coherent(NULL, mp->rx_desc_area_size,
+   mp->p_rx_desc_area, mp->rx_desc_dma);
+out_free_tx_ring:
+   if (mp->tx_sram_size)
+   iounmap(mp->p_tx_desc_area);
+   else
+   dma_free_coherent(NULL, mp->tx_desc_area_size,
+   mp->p_tx_desc_area, mp->tx_desc_dma);
 out_free_tx_skb:
kfree(mp->tx_skb);
 out_free_rx_skb:
kfree(mp->rx_skb);
-out_free_irq:
-   free_irq(dev->irq, dev);
 
return err;
 }
_
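The reordering in this patch follows a common kernel pattern: set up everything the interrupt handler touches before calling `request_irq()`, and on failure unwind through goto labels in the reverse order of acquisition. Below is a minimal user-space sketch of that pattern. It assumes hypothetical `acquire()`/`release()` helpers standing in for the real allocation and `request_irq()` calls, with `fail_at` injecting a failure at one step.

```c
#include <assert.h>

/* Hypothetical resource steps, in the order mv643xx_eth_open() now
 * acquires them after the patch: the IRQ comes last. */
enum step { RX_SKB = 1, TX_SKB, TX_RING, RX_RING, IRQ };

static int held[IRQ + 1];           /* 1 while a resource is held */

static int acquire(enum step s, int fail_at)
{
	if ((int)s == fail_at)
		return -1;              /* simulated allocation failure */
	held[s] = 1;
	return 0;
}

static void release(enum step s)
{
	assert(held[s]);                /* releasing an unheld resource is a bug */
	held[s] = 0;
}

/* Each label undoes exactly the steps that completed, newest first. */
static int open_sketch(int fail_at)
{
	int err;

	if ((err = acquire(RX_SKB, fail_at)))
		return err;
	if ((err = acquire(TX_SKB, fail_at)))
		goto out_free_rx_skb;
	if ((err = acquire(TX_RING, fail_at)))
		goto out_free_tx_skb;
	if ((err = acquire(RX_RING, fail_at)))
		goto out_free_tx_ring;
	if ((err = acquire(IRQ, fail_at)))   /* handler may run from here on */
		goto out_free_rx_ring;
	return 0;

out_free_rx_ring:
	release(RX_RING);
out_free_tx_ring:
	release(TX_RING);
out_free_tx_skb:
	release(TX_SKB);
out_free_rx_skb:
	release(RX_SKB);
	return err;
}

static int held_count(void)
{
	int i, n = 0;

	for (i = RX_SKB; i <= IRQ; i++)
		n += held[i];
	return n;
}
```

Whichever step fails, every resource acquired before it is released, and the IRQ is never live while a ring is still missing.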

[patch 2/2] div64_64: common code

2007-03-06 Thread akpm

From: Stephen Hemminger [EMAIL PROTECTED]

Implement div64_64(): 64-bit by 64-bit division.  Needed by networking (at
least).

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
Cc: Russell King [EMAIL PROTECTED]
Cc: Geert Uytterhoeven [EMAIL PROTECTED]
Cc: Roman Zippel [EMAIL PROTECTED]
Cc: Ralf Baechle [EMAIL PROTECTED]
Cc: Chris Zankel [EMAIL PROTECTED]
Cc: Jeff Dike [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 include/asm-arm/div64.h  |3 +++
 include/asm-generic/div64.h  |7 +++
 include/asm-i386/div64.h |4 
 include/asm-m68k/div64.h |3 +++
 include/asm-mips/div64.h |4 
 include/asm-um/div64.h   |1 +
 include/asm-xtensa/div64.h   |6 ++
 lib/Makefile |5 +++--
 lib/div64.c  |   22 ++
 net/ipv4/tcp_cubic.c |   21 -
 net/netfilter/xt_connbytes.c |   16 
 11 files changed, 53 insertions(+), 39 deletions(-)

diff -puN include/asm-arm/div64.h~div64_64-common-code include/asm-arm/div64.h
--- a/include/asm-arm/div64.h~div64_64-common-code
+++ a/include/asm-arm/div64.h
@@ -2,6 +2,7 @@
 #define __ASM_ARM_DIV64
 
 #include <asm/system.h>
+#include <linux/types.h>
 
 /*
  * The semantics of do_div() are:
@@ -223,4 +224,6 @@
 
 #endif
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
+
 #endif
diff -puN include/asm-generic/div64.h~div64_64-common-code include/asm-generic/div64.h
--- a/include/asm-generic/div64.h~div64_64-common-code
+++ a/include/asm-generic/div64.h
@@ -30,6 +30,11 @@
__rem;  \
  })
 
+static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
+{
+   return dividend / divisor;
+}
+
 #elif BITS_PER_LONG == 32
 
 extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
@@ -49,6 +54,8 @@ extern uint32_t __div64_32(uint64_t *div
__rem;  \
  })
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
+
 #else /* BITS_PER_LONG == ?? */
 
 # error do_div() does not yet support the C64
diff -puN include/asm-i386/div64.h~div64_64-common-code include/asm-i386/div64.h
--- a/include/asm-i386/div64.h~div64_64-common-code
+++ a/include/asm-i386/div64.h
@@ -1,6 +1,8 @@
 #ifndef __I386_DIV64
 #define __I386_DIV64
 
+#include <linux/types.h>
+
 /*
  * do_div() is NOT a C function. It wants to return
  * two values (the quotient and the remainder), but
@@ -45,4 +47,6 @@ div_ll_X_l_rem(long long divs, long div,
return dum2;
 
 }
+
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif
diff -puN include/asm-m68k/div64.h~div64_64-common-code include/asm-m68k/div64.h
--- a/include/asm-m68k/div64.h~div64_64-common-code
+++ a/include/asm-m68k/div64.h
@@ -1,6 +1,8 @@
 #ifndef _M68K_DIV64_H
 #define _M68K_DIV64_H
 
+#include <linux/types.h>
+
 /* n = n / base; return rem; */
 
 #define do_div(n, base) ({ \
@@ -23,4 +25,5 @@
__rem;  \
 })
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif /* _M68K_DIV64_H */
diff -puN include/asm-mips/div64.h~div64_64-common-code include/asm-mips/div64.h
--- a/include/asm-mips/div64.h~div64_64-common-code
+++ a/include/asm-mips/div64.h
@@ -9,6 +9,8 @@
 #ifndef _ASM_DIV64_H
 #define _ASM_DIV64_H
 
+#include <linux/types.h>
+
 #if (_MIPS_SZLONG == 32)
 
 #include <asm/compiler.h>
@@ -78,6 +80,8 @@
__quot = __quot << 32 | __low; \
(n) = __quot; \
__mod; })
+
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif /* (_MIPS_SZLONG == 32) */
 
 #if (_MIPS_SZLONG == 64)
diff -puN include/asm-um/div64.h~div64_64-common-code include/asm-um/div64.h
--- a/include/asm-um/div64.h~div64_64-common-code
+++ a/include/asm-um/div64.h
@@ -3,4 +3,5 @@
 
 #include <asm/arch/div64.h>
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif
diff -puN include/asm-xtensa/div64.h~div64_64-common-code include/asm-xtensa/div64.h
--- a/include/asm-xtensa/div64.h~div64_64-common-code
+++ a/include/asm-xtensa/div64.h
@@ -11,9 +11,15 @@
 #ifndef _XTENSA_DIV64_H
 #define _XTENSA_DIV64_H
 
+#include <linux/types.h>
+
 #define do_div(n,base) ({ \
int __res = n % ((unsigned int) base); \
n /= (unsigned int) base; \
__res; })
 
+static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
+{
+   return dividend / divisor;
+}
 #endif
diff -puN lib/Makefile~div64_64-common-code lib/Makefile
--- a/lib/Makefile~div64_64-common-code
+++ a/lib/Makefile
@@ -4,7 +4,7 @@
 
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 rbtree.o radix-tree.o dump_stack.o \
-idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
+idr.o int_sqrt.o bitmap.o extable.o prio_tree.o \
 sha1.o irq_regs.o reciprocal_div.o
 
 lib-$(CONFIG_MMU) += ioremap.o
@@ -12,7 +12,8 @@
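On 64-bit targets the patch simply uses `dividend / divisor`; the out-of-line lib/div64.c version used on 32-bit targets is cut off above. For illustration only, here is a minimal sketch of a full-precision 64-by-64-bit division built from shifts, compares and subtracts (restoring binary long division). The name `div64_64_sketch` is hypothetical, and this is not claimed to be the algorithm in lib/div64.c.

```c
#include <assert.h>
#include <stdint.h>

/* Restoring long division: bring down one dividend bit at a time and
 * subtract the divisor whenever the running remainder is large enough. */
static uint64_t div64_64_sketch(uint64_t dividend, uint64_t divisor)
{
	uint64_t quot = 0, rem = 0;
	int bit;

	assert(divisor != 0);
	for (bit = 63; bit >= 0; bit--) {
		/* rem holds at most 63 bits here, so the shift cannot overflow */
		rem = (rem << 1) | ((dividend >> bit) & 1);
		if (rem >= divisor) {
			rem -= divisor;
			quot |= (uint64_t)1 << bit;
		}
	}
	return quot;
}
```

The loop always runs 64 iterations, so it is slow compared with a hardware divide. Its only purpose is to show that the full 64-bit quotient can be computed without a native 64/64 division instruction.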

linux 2.6 Ipv4 routing enhancement (fwd)

2007-03-06 Thread Robert Olsson


Richard Kojedzinszky writes:

  traffic, and also update the routing table (from BGP), the route cache
  seemed to be the bottleneck, as upon every fib update the whole route
  cache is flushed, and sometimes this took so many CPU cycles that some
  packets were dropped. Meanwhile, I knew that *BSD systems do not use such
  a cache, and of course without it a router can provide constant
  performance, independent of the number of different IP flows, and
  updating the fib does not take such a long time.

 Hmm, I think there is a cache in *BSD* too.

 Anyway, you're correct that the GC and insertion/deletion of routes
 flush the cache and can cause packet drops when all flows have
 to be recreated. Yes, it's something that needs to be addressed, but
 it's not that common for people to use dynamic routing protocols.

 Anyway, Dave and Alexey started to look into this some time ago, and I got
 involved later; there were some ideas on how to deal with this. This work
 never came to an end. So if you want to contribute, I think we would all be
 happy.

  To solve this, I have played with IPv4 routing in the Linux kernel a
  bit. I have done two separate things:
  - developed a new fib algorithm to replace fib_trie for IPv4
  - rewrote the kernel not to use its dst cache

 Just for routing?

  The fib algorithm is like Cisco's CEF (at least if my knowledge is correct),
  but first I use a 16-branching tree to look up the address in 4-bit steps,
  and each node in this tree contains a simple sub-tree which is a radix tree,
  of course with maximum possible height 4. I think this is very simple, and it
  is nearly 3 times faster than fib_trie. For now it has a missing feature: it
  does not export the fib in /proc/net/route.

 Full semantic match... . 
 
 The LC-trie scales tree branching automatically, so looking into a Linux
 router running a full BGP feed with 204300 prefixes, we see:
   
 1: 27567  2: 10127  3: 8149  4: 3630  5: 1529  6: 558  7: 197  8: 53  16: 1

 The root node is 16-bit too, and the average depth is 2.60.
 So 3 times faster than fib_trie, that's a real sensation. How do you test?

  The second thing I have done to minimize the CPU cycles during the
  forwarding phase is rewriting ip_input.c, route.c and some others into
  lef.c, with minimal functionality. I mean, for example, when a packet goes
  through the lef functions, ipsec policies are not checked.

  It would be nice to see a profile with and without your patch.

  And to be more efficient, I attached a neighbour pointer to each fib entry,
  and using this the lookup + forwarding code is very fast.

  Of course, the route cache needs very little time to forward packets when
  there are a small number of different IP flows, but when dealing with
  traffic in an ISP at core level, this does not hold.

  So I have done tests with LEF, and compared them to the original Linux
  kernel's performance.
  In the worst case, LEF achieved nearly 90% of the performance of the Linux
  kernel in its most optimal case. Of course, original Linux performs poorly
  in the worst case.

 Send them, with profiles if possible...

 Cheers.

--ro
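The 16-branching tree Richard describes is a multibit trie with a 4-bit stride. As a rough illustration only (not LEF, fib_trie, or CEF code), here is a minimal longest-prefix-match sketch with hypothetical `fib_insert()`/`fib_lookup()` helpers. For simplicity it swaps the per-node radix sub-tree for plain prefix expansion: a prefix whose length is not a multiple of 4 is copied into every slot it covers within one node. A lookup visits at most eight nodes, one per 4-bit chunk of the address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct node;

/* One of 16 slots: the longest prefix ending here (plen 0 = none) and
 * the child node that consumes the next 4 bits of the address. */
struct slot {
	int plen;
	int nexthop;
	struct node *child;
};

struct node { struct slot e[16]; };

struct fib {
	struct node root;
	int has_default;
	int default_nh;
};

/* 4-bit chunk of addr used at trie level 0..7, most significant first. */
static unsigned int nibble(uint32_t addr, int level)
{
	return (addr >> (28 - 4 * level)) & 0xFu;
}

/* Insert prefix/len -> nexthop. Returns 0, or -1 if allocation fails. */
static int fib_insert(struct fib *f, uint32_t prefix, int len, int nexthop)
{
	struct node *n = &f->root;
	int level, depth, r;
	unsigned int base, i, span;

	assert(len >= 0 && len <= 32);
	if (len == 0) {                       /* default route */
		f->has_default = 1;
		f->default_nh = nexthop;
		return 0;
	}
	depth = (len - 1) / 4;                /* level holding the last bits */
	r = len - 4 * depth;                  /* 1..4 significant bits there */
	for (level = 0; level < depth; level++) {
		struct slot *s = &n->e[nibble(prefix, level)];

		if (!s->child && !(s->child = calloc(1, sizeof(struct node))))
			return -1;
		n = s->child;
	}
	/* Prefix expansion: the prefix covers 2^(4-r) consecutive slots. */
	span = 1u << (4 - r);
	base = nibble(prefix, depth) & ~(span - 1);
	for (i = base; i < base + span; i++) {
		if (n->e[i].plen <= len) {        /* never shadow a longer prefix */
			n->e[i].plen = len;
			n->e[i].nexthop = nexthop;
		}
	}
	return 0;
}

/* Longest-prefix match: deeper levels hold longer prefixes, so the last
 * hit on the way down wins. Returns -1 if nothing matches. */
static int fib_lookup(const struct fib *f, uint32_t addr)
{
	const struct node *n = &f->root;
	int best = f->has_default ? f->default_nh : -1;
	int level;

	for (level = 0; level < 8 && n != NULL; level++) {
		const struct slot *s = &n->e[nibble(addr, level)];

		if (s->plen > 0)
			best = s->nexthop;
		n = s->child;
	}
	return best;
}

/* Free every node below n (the root itself is embedded in struct fib). */
static void node_free(struct node *n)
{
	int i;

	for (i = 0; i < 16; i++) {
		if (n->e[i].child) {
			node_free(n->e[i].child);
			free(n->e[i].child);
		}
	}
}
```

With 10.0.0.0/8, 10.1.0.0/16 and 10.1.2.0/23 installed, a lookup of 10.1.3.5 returns the /23 next hop, while 10.1.4.1 falls back to the /16.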

Re: [patch 12/19] fix irq problem with NAPI + NETPOLL

2007-03-06 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

From: Atsushi Nemoto [EMAIL PROTECTED]

It seems netif_receive_skb() was designed not to be called from irq context, but
NAPI + NETPOLL break this rule.  If netif_receive_skb() was called from irq
context, redirect to netif_rx() instead of processing the skb in that
context.

Signed-off-by: Atsushi Nemoto [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Cc: Francois Romieu [EMAIL PROTECTED]
Cc: David S. Miller [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 net/core/dev.c |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)


net/* stuff, I'll let DaveM make the call and apply it...



Re: [PATCH 1/3] bonding: fix double dev_add_pack

2007-03-06 Thread Jeff Garzik


Jay Vosburgh wrote:

Bonding can erroneously register the same packet_type to receive
ARPs (for use by ARP validation): once at device open time, and once via
sysfs.  Since sysfs can change the validate setting (and thus register
or unregister) at any time, a flag is needed to synchronize with device
open in order to avoid double registrations, and the simplest place is
within the packet_type structure itself.  Double unregister is not an
issue.

Bug reported by Ulrich Oelmann [EMAIL PROTECTED].

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]


applied 1-3
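The guard described in the bonding patch can be modelled in a few lines. This sketch assumes hypothetical `hook_register()`/`hook_unregister()` functions and a `registered` flag standing in for the flag the patch keeps in the packet_type; the `handlers` counter stands in for `dev_add_pack()`/`dev_remove_pack()`. It also assumes callers are already serialised (the kernel code has to rely on its own locking), so it says nothing about concurrency.

```c
#include <assert.h>

/* Hypothetical model of the ARP receive hook used for ARP validation. */
struct arp_hook {
	int registered;   /* plays the role of the flag in the packet_type */
	int handlers;     /* how many times the hook is currently installed */
};

static void hook_register(struct arp_hook *h)
{
	if (h->registered)          /* already added (e.g. at device open) */
		return;
	h->handlers++;              /* stands in for dev_add_pack() */
	h->registered = 1;
	assert(h->handlers == 1);   /* invariant: never installed twice */
}

static void hook_unregister(struct arp_hook *h)
{
	if (!h->registered)         /* a second unregister is a no-op */
		return;
	h->handlers--;              /* stands in for dev_remove_pack() */
	h->registered = 0;
}
```

With the flag in place, a device open followed by a sysfs write to the validate setting installs the handler only once.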



Re: [PATCH] natsemi: netpoll fixes

2007-03-06 Thread Jeff Garzik


Sergei Shtylyov wrote:

Fix two issues in this driver's netpoll path: one usual, with spin_unlock_irq()
enabling interrupts which nobody asks it to do (that has been fixed recently in
a number of drivers) and one unusual, with poll_controller() method possibly
causing loss of interrupts due to the interrupt status register being cleared
by a simple read and the interrupt handler simply storing it, not accumulating.

Signed-off-by: Sergei Shtylyov [EMAIL PROTECTED]

---
 drivers/net/natsemi.c |   24 +++-
 1 files changed, 19 insertions(+), 5 deletions(-)


applied
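The second natsemi problem can be reproduced in a small model. This sketch assumes a hypothetical read-to-clear status register `hw_isr` and a driver field `intr_status`; these names were chosen for illustration and are not taken from natsemi.c. Storing the value on each read loses events consumed by an earlier read, while OR-accumulating them does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical read-to-clear status register: reading returns the
 * pending bits and clears them, as the mail describes. */
static uint32_t hw_isr;

static uint32_t read_isr(void)
{
	uint32_t v = hw_isr;

	hw_isr = 0;
	return v;
}

/* Driver-side copy of events not yet handled by the poll routine. */
struct priv { uint32_t intr_status; };

/* Buggy pattern: a second read overwrites bits the first read consumed. */
static void isr_store(struct priv *p)
{
	p->intr_status = read_isr();
}

/* Fixed pattern: OR the new bits into whatever is still pending. */
static void isr_accumulate(struct priv *p)
{
	p->intr_status |= read_isr();
}
```

For example, if an RX event (bit 0x1) is read once and a second read (say from the poll_controller path) then returns 0x4, storing leaves only 0x4, while accumulating keeps 0x5.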



Re: [Patch 1/3] ucc_geth: Fix BD processing

2007-03-06 Thread Jeff Garzik


Li Yang wrote:

Fix broken BD processing code.

Signed-off-by: Michael Barkowski [EMAIL PROTECTED]
Signed-off-by: Li Yang [EMAIL PROTECTED]

---
drivers/net/ucc_geth.c |   14 +-
1 files changed, 9 insertions(+), 5 deletions(-)


applied 1-2 of 3



Re: [patch 05/19] revert drivers/net/tulip/dmfe: support basic carrier detection

2007-03-06 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

From: Andrew Morton [EMAIL PROTECTED]

Revert 7628b0a8c01a02966d2228bdf741ddedb128e8f8.  Thomas Bachler
reports:

  Commit 7628b0a8c01a02966d2228bdf741ddedb128e8f8 (drivers/net/tulip/dmfe:
  support basic carrier detection) breaks networking on my Davicom DM9009. 
  ethtool always reports there is no link.  tcpdump shows incoming packets,
  but TX is disabled.  Reverting the above patch fixes the problem.


Cc: Samuel Thibault [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Cc: Valerie Henson [EMAIL PROTECTED]
Cc: Thomas Bachler [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/tulip/dmfe.c |9 +
 1 file changed, 1 insertion(+), 8 deletions(-)


applied patches 5-8



Re: [patch 15/19] 3c59x: Handle pci_enable_device() failure while resuming

2007-03-06 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

From: Dmitriy Monakhov [EMAIL PROTECTED]

Handle pci_enable_device() failure while resuming; we can safely exit here.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/3c59x.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)


applied patches 15-16



Re: [PATCH] pcnet32: only allocate init_block dma consistent

2007-03-06 Thread Jeff Garzik


Don Fry wrote:

The patch below moves the init_block out of the private struct and
only allocates init block with pci_alloc_consistent.

This has two effects:

1. Performance increase for non-cache-coherent machines, because the
   CPU-only data in the private struct are now cached

2. Locks now work on platforms that need to have locks
   in cached memory

Also use netdev_priv() instead of dev->priv

Signed-off-by: Thomas Bogendoerfer [EMAIL PROTECTED]


please separate the netdev_priv() change into a separate, precursor 
patch, and resend both




RE: [Patch 1/3] ucc_geth: Fix BD processing

2007-03-06 Thread Li Yang-r58472


 -Original Message-
 From: Jeff Garzik [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, March 06, 2007 7:11 PM
 To: Li Yang-r58472
 Cc: netdev@vger.kernel.org
 Subject: Re: [Patch 1/3] ucc_geth: Fix BD processing
 
 Li Yang wrote:
  Fix broken BD processing code.
 
  Signed-off-by: Michael Barkowski [EMAIL PROTECTED]
  Signed-off-by: Li Yang [EMAIL PROTECTED]
 
  ---
  drivers/net/ucc_geth.c |   14 +-
  1 files changed, 9 insertions(+), 5 deletions(-)
 
 applied 1-2 of 3

Thanks.  What's your comment about 3 of 3?

- Leo

Re: [patch 12/19] fix irq problem with NAPI + NETPOLL

2007-03-06 Thread Andrew Morton

On Tue, 06 Mar 2007 06:06:11 -0500 Jeff Garzik [EMAIL PROTECTED] wrote:

 [EMAIL PROTECTED] wrote:
  From: Atsushi Nemoto [EMAIL PROTECTED]
  
  It seems netif_receive_skb() was designed not to be called from irq context, but
  NAPI + NETPOLL break this rule.  If netif_receive_skb() was called from irq
  context, redirect to netif_rx() instead of processing the skb in that
  context.
  
  Signed-off-by: Atsushi Nemoto [EMAIL PROTECTED]
  Cc: Jeff Garzik [EMAIL PROTECTED]
  Cc: Francois Romieu [EMAIL PROTECTED]
  Cc: David S. Miller [EMAIL PROTECTED]
  Signed-off-by: Andrew Morton [EMAIL PROTECTED]
  ---
  
   net/core/dev.c |   11 +--
   1 file changed, 9 insertions(+), 2 deletions(-)
 
 net/* stuff, I'll let DaveM make the call and apply it...

argh, sorry, this patch (which is mysteriously called
8139too-fix-irq-problem-with-napi-netpoll.patch) was supposed to go to
davem and you were supposed to receive
8139too-force-media-setting-fix.patch (which you were cc'ed on).


Re: [PATCH ] pcnet32: Fix PCnet32 performance bug on non-coherent architectures

2007-03-06 Thread Jeff Garzik


Don Fry wrote:

The PCnet32 driver always passed the size of the largest possible packet
to pci_dma_sync_single_for_cpu() and pci_dma_sync_single_for_device().
This results in fairly large collateral damage in the caches and makes
the flush operation itself much slower.  On a system with a 40MHz CPU this
patch increases network bandwidth by about 12%.

Signed-off-by: Ralf Baechle [EMAIL PROTECTED]
Acked-by: Don Fry [EMAIL PROTECTED]


applied



Re: [patch 11/19] sis900 warning fixes

2007-03-06 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

diff -puN drivers/net/sis900.c~sis900-warning-fixes drivers/net/sis900.c
--- a/drivers/net/sis900.c~sis900-warning-fixes
+++ a/drivers/net/sis900.c
@@ -1430,7 +1430,7 @@ static void sis900_auto_negotiate(struct
int i = 0;
u32 status;
 
-	while (i++ < 2)
+   for (i = 0; i < 2; i++)
status = mdio_read(net_dev, phy_addr, MII_STATUS);
 
 	if (!(status & MII_STAT_LINK)){


applied, though, you missed killing a redundant initialization



Re: [patch 18/19] mv643xx ethernet driver IRQ registration fix

2007-03-06 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

From: Giridhar Pemmasani [EMAIL PROTECTED]

During initialization, mv643xx driver registers IRQ before setting up tx/rx
rings.  This causes kernel oops because mv643xx_poll, which gets called
right after registering IRQ, calls netif_rx_complete, which accesses the rx
ring (I don't have the oops message anymore; I just remember this sequence
of calls).  Attached (tested) patch first initializes the rx/tx rings and
then registers the IRQ.

Signed-off-by: Giridhar Pemmasani [EMAIL PROTECTED]
Cc: Dale Farnsworth [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/mv643xx_eth.c |   40 +++-
 1 file changed, 22 insertions(+), 18 deletions(-)


seems sane enough to me, but I would like to get this via Dale, who has 
been a fairly active maintainer so far
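A toy userspace model of the ordering problem described in the patch: if an interrupt is already pending when the handler is registered, the handler can run at once and touch a ring that does not exist yet. All names are hypothetical; this is not the mv643xx driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_dev {
	int *rx_ring;    /* NULL until the ring is set up */
	int irq_pending; /* interrupt asserted before open() */
	int handled;     /* work done by the handler */
	int oops;        /* handler ran with no ring */
};

static void toy_irq_handler(struct toy_dev *d)
{
	if (!d->rx_ring) { /* the real driver would dereference NULL here */
		d->oops = 1;
		return;
	}
	d->handled += d->rx_ring[0];
}

/* Registering while an interrupt is pending runs the handler at once. */
static void toy_request_irq(struct toy_dev *d)
{
	if (d->irq_pending) {
		d->irq_pending = 0;
		toy_irq_handler(d);
	}
}

static int toy_setup_rings(struct toy_dev *d)
{
	d->rx_ring = calloc(16, sizeof(*d->rx_ring));
	if (!d->rx_ring)
		return -1;
	d->rx_ring[0] = 1;
	return 0;
}

/* Order used by the patch: rings first, then the IRQ. */
static int toy_open(struct toy_dev *d)
{
	if (toy_setup_rings(d))
		return -1;
	toy_request_irq(d);
	return 0;
}

/* Original order: the handler may run before the rings exist. */
static int toy_open_irq_first(struct toy_dev *d)
{
	toy_request_irq(d);
	return toy_setup_rings(d);
}
```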




Re: [PATCH] tc35815 driver update (take 2)

2007-03-06 Thread Jeff Garzik


Atsushi Nemoto wrote:

The current tc35815 driver is very obsolete and has been poorly maintained
for a long time.  Replace it with a new driver based on the one from the CELF
patch archive.

Major advantages of the CELF version (version 1.23, for kernel 2.6.10) are:

* Independent of JMR3927.
  (Actually independent of MIPS, but AFAIK the chip is used only on
   MIPS platforms)
* TX4938 support.
* 64-bit proof.
* Asynchronous and on-demand auto negotiation.
* High performance on non-coherent architecture.
* ethtool support.
* Many bugfixes and cleanups.

Improvements since version 1.23 are:

* TX4939 support.
* NETPOLL support.
* NAPI support. (disabled by default)
* Reduced memcpy on receive.
* PM support.
* Many cleanups and bugfixes.

Signed-off-by: Atsushi Nemoto [EMAIL PROTECTED]

 drivers/net/Kconfig |3 
 drivers/net/tc35815.c   | 2587 ++
 include/linux/pci_ids.h |2 
 3 files changed, 1917 insertions(+), 675 deletions(-)


applied to #upstream, let's give it a good review while it hangs out in 
libata-dev.git#ALL and -mm




Re: [PATCH 1/3] NetXen: Make driver use multiple PCI functions

2007-03-06 Thread Jeff Garzik


Linsys Contractor Mithlesh Thukral wrote:

NetXen: Make driver use multiple PCI functions.
This patch makes the NetXen driver work with multiple PCI functions. This
makes the usage of memory resources as well as interrupts more independent
among the different functions, which results in better throughput. This change
builds on the multiport-capable firmware changes that are already in the
NetXen driver in Linux.

Signed-off by: Mithlesh Thukral [EMAIL PROTECTED]

---

 drivers/net/netxen/netxen_nic.h  |  126 ++---
 drivers/net/netxen/netxen_nic_ethtool.c  |   80 +--
 drivers/net/netxen/netxen_nic_hdr.h  |8 
 drivers/net/netxen/netxen_nic_hw.c   |  213 ++--
 drivers/net/netxen/netxen_nic_hw.h   |   18 
 drivers/net/netxen/netxen_nic_init.c |  115 +---
 drivers/net/netxen/netxen_nic_isr.c  |   80 +--
 drivers/net/netxen/netxen_nic_main.c |  523 ++---
 drivers/net/netxen/netxen_nic_niu.c  |   27 -
 drivers/net/netxen/netxen_nic_phan_reg.h |  125 -
 10 files changed, 631 insertions(+), 684 deletions(-)


applied 1-3 to #upstream (2.6.22)



Re: [patch 03/19] user of the jiffies rounding code: e1000

2007-03-06 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

From: Arjan van de Ven [EMAIL PROTECTED]

Use the round_jiffies() function in e1000.

These timers were all of the "about once a second" or "about once every X
seconds" variety, and several showed up in the "what wakes the CPU up"
profiles that the tickless patches provide.  Some timers are highly dynamic
based on network load, but even on low-activity systems they still show up,
so the rounding is done only in cases of low activity, allowing
higher-frequency timers in the high-activity case.

The various hardware watchdogs are an obvious case; they run every 2 seconds
but aren't otherwise specific about exactly when they need to run.

Signed-off-by: Arjan van de Ven [EMAIL PROTECTED]
Acked-by: Auke Kok [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/e1000/e1000_main.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


applied 3-4 to #upstream (2.6.22)
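A simplified sketch of what rounding a timeout to a whole second does, loosely modelled on round_jiffies(): round down if the remainder is under a quarter second, otherwise round up, and keep the original value if rounding would put the expiry in the past. The function name and parameters are hypothetical, and the per-CPU skew the kernel applies is left out.

```c
#include <assert.h>

/* j: requested expiry in ticks, now: current tick count, hz: ticks
 * per second. Returns j moved to a whole-second boundary. */
static unsigned long round_to_second(unsigned long j, unsigned long now,
				     unsigned long hz)
{
	unsigned long rem = j % hz;
	unsigned long rounded = (rem < hz / 4) ? j - rem : j - rem + hz;

	/* Rounding must not make the timer expire in the past
	 * (wraparound-safe comparison in the style of time_after()). */
	if ((long)(rounded - now) <= 0)
		return j;
	return rounded;
}
```

With hz = 1000, a watchdog due at tick 5100 moves to 5000 and one due at 5600 moves to 6000, so timers with loose deadlines tend to share expiry ticks.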



Re: [patch 09/19] dmfe: add support for suspend/resume

2007-03-06 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

From: Maxim Levitsky [EMAIL PROTECTED]

This adds suspend/resume support.

[EMAIL PROTECTED]: fix CONFIG_PM=n, coding style]
Signed-off-by: Maxim Levitsky [EMAIL PROTECTED]
Cc: Valerie Henson [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/tulip/dmfe.c |   52 ++---
 1 file changed, 49 insertions(+), 3 deletions(-)


applied 9-10 to #upstream



Re: [patch 17/19] sk98lin: handle pci_enable_device() return value in skge_resume()

2007-03-06 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

From: Dmitriy Monakhov [EMAIL PROTECTED]

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/sk98lin/skge.c |   20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)


applied



Re: [patch 19/19] network: add the missing phy_device speed information to phy_mii_ioctl

2007-03-06 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

From: Shan Lu [EMAIL PROTECTED]

Function `phy_mii_ioctl' returns the physical device's information based on
user requests.  When requested to return the basic mode control register
(BMCR) information, the original implementation only returns the physical
device's duplex information and forgets to return the speed information,
which is wrong because the BMCR register holds both duplex and speed
information.

The patch checks the BMCR value against speed-related flags and fills the
return structure's speed field accordingly.

Signed-off-by: Shan [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/phy/phy.c |6 ++
 1 file changed, 6 insertions(+)


applied
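A minimal sketch of the decoding the patch adds, written as a standalone function. The BMCR bit values match <linux/mii.h>; struct link_cfg and bmcr_decode() are hypothetical stand-ins for the phy_device fields and the SPEED_* constants.

```c
#include <assert.h>
#include <stdint.h>

/* BMCR bit values as defined in <linux/mii.h>. */
#define BMCR_SPEED1000 0x0040
#define BMCR_FULLDPLX  0x0100
#define BMCR_SPEED100  0x2000

struct link_cfg {
	int speed;       /* Mb/s */
	int full_duplex; /* 1 = full, 0 = half */
};

/* Decode both the duplex bit and the speed selection bits; the bug
 * described above was that only the duplex bit was honoured. */
static struct link_cfg bmcr_decode(uint16_t bmcr)
{
	struct link_cfg cfg;

	cfg.full_duplex = (bmcr & BMCR_FULLDPLX) != 0;
	if (bmcr & BMCR_SPEED1000)
		cfg.speed = 1000;
	else if (bmcr & BMCR_SPEED100)
		cfg.speed = 100;
	else
		cfg.speed = 10;
	return cfg;
}
```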



Re: [patch 14/19] drivers/net/vioc/vioc_driver.c: replace pci_module_init with pci_register_driver

2007-03-06 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

From: Richard Knutsson [EMAIL PROTECTED]

Replace pci_module_init with pci_register_driver

Signed-off-by: Richard Knutson [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/net/vioc/vioc_driver.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


applied



[RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions

2007-03-06 Thread Ilpo Järvinen

Complete rewrite of update_scoreboard and mark_head_lost. A couple
of hints became unnecessary because of this change. The !TCPCB_TAGBITS
check from the original is changed to !(S|L), but it shouldn't
make a difference, and if there is ever an R-only skb, TCP will
mark it as LOST too. The algorithm uses some ideas presented by
David Miller and Baruch Even.

Seqno lookups must be fast; this is provided by the
RB-tree patch (+abstraction) from DaveM.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---

Sorry about the poorly chunked diff. Is it possible to force git to
produce better (large-block) diffs when a complete function is rewritten
from scratch in the patch? The git-diff-files manpage hints at -B, but it
did not work; perhaps it affects whole-file rewrites only.

This probably conflicts with the other patches in DaveM's rbtree patchset
(the first two are required), because I tested this one (at least the
non-timedout part worked) and didn't want random breakage
from the other patches (such breakage was reported).
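The patch records tp->highest_sack using after(), which compares 32-bit sequence numbers modulo 2^32. A standalone sketch of that comparison and of the update, with hypothetical names (seq_before(), seq_after(), update_highest_sack()) in place of the kernel helpers and a plain variable in place of the tcp_sock field:

```c
#include <assert.h>
#include <stdint.h>

/* seq1 is "before" seq2 if the signed 32-bit difference is negative;
 * this relies on two's-complement conversion, like the kernel's cast. */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

static int seq_after(uint32_t seq1, uint32_t seq2)
{
	return seq_before(seq2, seq1);
}

/* Remember the start sequence of the highest SACKed segment. */
static void update_highest_sack(uint32_t *highest_sack, uint32_t seg_seq)
{
	if (seq_after(seg_seq, *highest_sack))
		*highest_sack = seg_seq;
}
```

Because the comparison is modular, a segment whose sequence number has wrapped past zero still counts as "higher" than one just below 2^32.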

 include/linux/tcp.h  |6 -
 include/net/tcp.h|6 +
 net/ipv4/tcp_input.c |  194 +-
 net/ipv4/tcp_minisocks.c |1 
 4 files changed, 130 insertions(+), 77 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index b73687a..ccb9645 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -320,16 +320,14 @@ #endif
 
struct tcp_sack_block_wire recv_sack_cache[4];
 
-   /* from STCP, retrans queue hinting */
-   struct sk_buff* lost_skb_hint;
+   u32 highest_sack;   /* Start seq of globally highest revd SACK */
 
-   struct sk_buff *scoreboard_skb_hint;
+   /* from STCP, retrans queue hinting */
struct sk_buff *retransmit_skb_hint;
struct sk_buff *forward_skb_hint;
struct sk_buff *fastpath_skb_hint;
 
int fastpath_cnt_hint;
-   int lost_cnt_hint;
int retransmit_cnt_hint;
int forward_cnt_hint;
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index ceb95ec..02929bb 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1047,8 +1047,6 @@ static inline void tcp_mib_init(void)
 
 /*from STCP */
 static inline void clear_all_retrans_hints(struct tcp_sock *tp){
-   tp->lost_skb_hint = NULL;
-   tp->scoreboard_skb_hint = NULL;
tp->retransmit_skb_hint = NULL;
tp->forward_skb_hint = NULL;
tp->fastpath_skb_hint = NULL;
@@ -1203,6 +1201,10 @@ #define tcp_for_write_queue_from(skb, sk
for (; (skb != (struct sk_buff *)&(sk)->sk_write_queue);\
 skb = skb->next)
 
+#define tcp_for_write_queue_backwards_from(skb, sk)\
+   for (; (skb != (struct sk_buff *)&(sk)->sk_write_queue);\
+skb = skb->prev)
+
 static inline struct sk_buff *tcp_send_head(struct sock *sk)
 {
return sk->sk_send_head;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 22d0bb0..eb474eb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1188,6 +1188,10 @@ tcp_sacktag_write_queue(struct sock *sk,
 
if (fack_count > tp->fackets_out)
tp->fackets_out = fack_count;
+
+   if (after(TCP_SKB_CB(skb)->seq,
+   tp->highest_sack))
+   tp->highest_sack = TCP_SKB_CB(skb)->seq;
} else {
if (dup_sack && (sacked&TCPCB_RETRANS))
reord = min(fack_count, reord);
@@ -1726,96 +1730,144 @@ static inline void tcp_reset_reno_sack(s
tp->left_out = tp->lost_out;
 }
 
-/* Mark head of queue up as lost. */
-static void tcp_mark_head_lost(struct sock *sk, struct tcp_sock *tp,
-  int packets, u32 high_seq)
+/* Forward walk marking LOST until non-timedout is encountered */
+static void tcp_timedout_mark_forward(struct sock *sk, struct sk_buff *skb)
 {
-   struct sk_buff *skb;
-   int cnt;
-
-   BUG_TRAP(packets <= tp->packets_out);
-   if (tp->lost_skb_hint) {
-   skb = tp->lost_skb_hint;
-   cnt = tp->lost_cnt_hint;
-   } else {
-   skb = tcp_write_queue_head(sk);
-   cnt = 0;
-   }
+   struct tcp_sock *tp = tcp_sk(sk);
 
+   /* ...continue timed out work if necessary */
tcp_for_write_queue_from(skb, sk) {
-   if (skb == tcp_send_head(sk))
-   break;
-   /* TODO: do this better */
-   /* this is not the most efficient way to do this... */
-   tp->lost_skb_hint = skb;
-   tp->lost_cnt_hint = cnt;
-   cnt += tcp_skb_pcount(skb);
-   if (cnt > packets || after(TCP_SKB_CB(skb)->end_seq, high_seq))
+   if (skb == tcp_send_head(sk) ||
+   !tcp_skb_timedout(sk, skb))
break;
-

Re: [RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions

2007-03-06 Thread Ilpo Järvinen

On Tue, 6 Mar 2007, Ilpo Järvinen wrote:

 because I tested this one (at least the non-timedout part worked) 

...meant that the timedout part wasn't tested that thoroughly with the simple
testcase I used (only FACK was tested).

--
 i.

compat_sock_common_getsockopt typo?

2007-03-06 Thread Johannes Berg

The function reads as follows:

int compat_sock_common_getsockopt(struct socket *sock, int level, int optname,
  char __user *optval, int __user *optlen)
{
struct sock *sk = sock->sk;

if (sk->sk_prot->compat_setsockopt != NULL)
^^^

return sk->sk_prot->compat_getsockopt(sk, level, optname,
   ^^^
  optval, optlen);
return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);
}
EXPORT_SYMBOL(compat_sock_common_getsockopt);

Is that intentional to make protocol writers assign both if they want
compat_setsockopt? :P

johannes
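A standalone sketch of the dispatch with the check corrected so that it tests the same hook it is about to call. struct toy_proto is a hypothetical reduction of struct proto to three function pointers, and the two hook implementations are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of struct proto to the relevant hooks. */
struct toy_proto {
	int (*getsockopt)(int optname);
	int (*compat_getsockopt)(int optname);
	int (*compat_setsockopt)(int optname);
};

/* Corrected dispatch: check compat_getsockopt before calling it,
 * otherwise fall back to the native getsockopt. */
static int toy_compat_getsockopt(const struct toy_proto *prot, int optname)
{
	if (prot->compat_getsockopt != NULL)
		return prot->compat_getsockopt(optname);
	return prot->getsockopt(optname);
}

static int native_get(int optname) { return optname; }
static int compat_set(int optname) { return -optname; }
```

With the original check, a protocol that sets compat_setsockopt but not compat_getsockopt (as in the test) would call a NULL pointer.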



wireless extensions vs. 64-bit architectures

2007-03-06 Thread Johannes Berg

Hi,

Wtf! After struggling with some strange problems with zd1211rw (see some
other mail) I decided to think again about what could possibly cause all
the other problems I'm having with it. The kernel seems fine, but iw*
userspace continually segfaults! And it also doesn't seem to be
reproducible for most other people; I asked on IRC a while back.

Well. Some thinking and stracing and thinking later it occurred to me...
Hell! wext is ioctls and includes this gem:

struct  iw_point
{
  void __user   *pointer;   /* Pointer to the data  (in user space) */
  __u16 length; /* number of fields or size in bytes */
  __u16 flags;  /* Optional params */
};

Of course nobody ever tells you this, but it's used in a shitload of
places.

Btw, did I mention that I'm running a stock Debian powerpc 32-bit
userspace on my 64-bit machine? Oh, and of course wext doesn't have any
32-in-64 compat code.
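A sketch of the layout mismatch, assuming a 32-bit userspace passes a 4-byte pointer: its struct is 8 bytes, while in the native struct on a 64-bit kernel `length` starts at offset 8. The struct names and iw_point_from32() are hypothetical; in-kernel compat code would use compat_ptr() to widen the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a 32-bit userspace passes: 4-byte pointer + two u16 fields. */
struct iw_point32 {
	uint32_t pointer;
	uint16_t length;
	uint16_t flags;
};

/* Native layout: on a 64-bit kernel the pointer is 8 bytes, so length
 * and flags live at different offsets than userspace expects. */
struct iw_point_native {
	void *pointer;
	uint16_t length;
	uint16_t flags;
};

/* Hypothetical compat conversion: widen the pointer, copy the fields. */
static struct iw_point_native iw_point_from32(const struct iw_point32 *p32)
{
	struct iw_point_native n;

	n.pointer = (void *)(uintptr_t)p32->pointer;
	n.length = p32->length;
	n.flags = p32->flags;
	return n;
}
```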

/me laughs maniacally about wext.

And don't tell me the fix is to use the netlink interface to wext.
Actually, I think it may have the same bug, it seems to be operating
with iw_point (or at least its size) too but I can't really tell, the
code's just too clear, I always just see right through it... Oh and I
still insist on removing the whole pile of junk, netlink interface
first.

Isn't there any possibility that we can kill userspace interfaces that
are terminally broken without keeping them for years to come?

Sorry. This is just too frustrating.

johannes
-- 
Now playing: Nightwish (Century Child) - End Of All Hope




[RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Robert Olsson


David Miller writes:
 
 Interesting.
 
  Actually, more accurately, the conflict exists in how this GC
  logic is implemented.  The core issue is that hash table size
  guides the GC processing, and hash table growth therefore
  modifies those GC goals.  So with the patch below we'll just
  keep growing the hash table instead of giving GC some time to
  try to keep the working set in equilibrium before doing the
  hash grow.
 
 AFAIK the equilibrium is a resizing function as well, but using a fixed
 hash table. So can we do without equilibrium resizing if the tables
 are dynamic?  I think so.

 With the hash data structure we could monitor the average chain
 length, or just the size, and resize the hash based on that.

  One idea is to put the hash grow check in the garbage collector,
  and put the hash shrink check in rt_del().
  
  In fact, it would be a good time to perhaps hack up some entirely
  new passive GC logic for the routing cache.

 Could be. Remember the GC in the hash chain as well, which was added later;
 although it doesn't decrease the number of entries, it gives
 an upper limit. Also, the gc-goal must be picked so that it does not force
 unwanted resizing.

  BTW, another thing that plays into this is that Robert's TRASH work
  could make this patch not necessary :-)

 It has built-in resize and chain control, and the gc-goal is chosen so
 as not to resize the root node unnecessarily.

 Cheers.
--ro
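A sketch of the "monitor the average chain length and resize" idea as a pure policy function. The watermarks, minimum size and function name are illustrative assumptions, not values from the patch or from TRASH.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative thresholds on entries per bucket. */
#define GROW_AVG_CHAIN   4
#define SHRINK_AVG_CHAIN 1
#define MIN_BUCKETS      16

/* Return the new power-of-two bucket count: double when chains get
 * long on average, halve when the table is mostly empty. */
static size_t next_bucket_count(size_t entries, size_t buckets)
{
	if (entries > buckets * GROW_AVG_CHAIN)
		return buckets * 2;
	if (buckets > MIN_BUCKETS && entries < buckets * SHRINK_AVG_CHAIN)
		return buckets / 2;
	return buckets;
}
```

The gap between the two watermarks gives some hysteresis, so a table sitting near one threshold does not flip between sizes on every check. As noted above, the gc-goal would also have to be chosen so it does not push the table across these thresholds by itself.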

Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Robert Olsson


Eric Dumazet writes:

  Well, maybe... but after looking robert's trash, I discovered its model is 
  essentially a big (2^18 slots) root node (our hash table), and very few 
  order:1,2,3 nodes.
 
 It's getting hash-like, yes. I guess all effective algorithms today do
 some sort of index lookup, and for a large number of entries we cannot expect
 to find the next node in the same cache line, so the tree depth becomes a
 crucial performance factor. IMO nothing can beat a perfectly distributed and
 perfectly sized hash. The trash work is an effort to get close to that with a
 dynamic data structure.

 Cheers
--ro

[NET]: Please revert disallowing zero listen queues

2007-03-06 Thread Gerrit Renker

Please can you reconsider the patch regarding the accept_queue

http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.22.git;a=commit;h=8488df894d05d6fa41c2bd298c335f944bb0e401

It disallows setting the `backlog' argument of listen(2) to zero. Using
a zero backlog is common (e.g. ttcp), and disallowing it will break many
applications; I had to recode several applications which rely on this
convention.

The problem also spreads from TCP to DCCP (same behaviour).
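A plain-integer sketch of the two comparisons in the patch below, assuming sk_ack_backlog counts queued connections and sk_max_ack_backlog holds the listen(2) backlog. The function names are hypothetical.

```c
#include <assert.h>

/* Version introduced by the change being reverted. */
static int acceptq_full_ge(unsigned ack_backlog, unsigned max_backlog)
{
	return ack_backlog >= max_backlog;
}

/* Version restored by the revert. */
static int acceptq_full_gt(unsigned ack_backlog, unsigned max_backlog)
{
	return ack_backlog > max_backlog;
}
```

With a backlog of 0, the `>=` form reports the queue as full before anything is queued, while the `>` form still admits one connection, which is what applications using a zero backlog rely on.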

Below is a patch to revert this change.

Thank you 
Gerrit

diff --git a/include/net/sock.h b/include/net/sock.h
index 849c7df..2c7d60c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -426,7 +426,7 @@ static inline void sk_acceptq_added(stru
 
 static inline int sk_acceptq_is_full(struct sock *sk)
 {
-   return sk->sk_ack_backlog >= sk->sk_max_ack_backlog;
+   return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
 }
 
 /*

GMRP support ?

2007-03-06 Thread xerces8

(reposted from linux-net)

Hi!

Is there any work on IEEE 802.1D GARP/GMRP support in linux?

(GMRP tells switches about multicast group membership, read:
no more broadcasting of multicast traffic, but sending it only
to interested parties.)

I have no idea of the support level in switches; I guess it is
a chicken-and-egg problem.

Regards,
David

PS: Or is that Cisco standard (can't recall its name right now)
supported? Multicast is mandatory in IPv6, so it can't be ignored any more ;)



Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Eric Dumazet

On Tuesday 06 March 2007 14:42, Robert Olsson wrote:
 Eric Dumazet writes:
   Well, maybe... but after looking robert's trash, I discovered its model
   is essentially a big (2^18 slots) root node (our hash table), and very
   few order:1,2,3 nodes.

  It's getting hashlike yes. I guess all effective algorithms today is
 doing some sort of index lookup and for large number of entries we cannot
 expect to find the next node in the same cache line so the tree depth
 becomes a crucial performance factor. IMO nothing can beat a prefect
 distributed and perfect sized hash. The trash work is an effort to get
 close with dynamic data structure.

Indeed. It would be nice to see how it performs with, say, 2^20 elements...
Given your data, I wonder whether the extra complexity of the trash is worth
it (since most lookups will only hit the hash and get the answer
without intermediate nodes).

Re: wireless extensions vs. 64-bit architectures

2007-03-06 Thread Johannes Berg

On Tue, 2007-03-06 at 02:27 +0100, Johannes Berg wrote:
 Actually, I think it may have the same bug, it seems to be operating
 with iw_point (or at least its size) too

I'm told that the code that uses it is only internal and the size isn't
part of the userspace interface, so that part of my complaint was wrong. But
we still have many programs relying on the ioctls and none relying on the
wext/nl interface, so that isn't really a concern anyway.

johannes



Re: linux 2.6 Ipv4 routing enhancement (fwd)

2007-03-06 Thread Richard Kojedzinszky


Dear Robert,

Sorry for sending the tgz with .svn included, and without instructions.

To do a test with fib_trie, issue
$ make clean all ROUTE_ALG=TRIE && ./try a
with fib_radix:
$ make clean all ROUTE_ALG=RADIX && ./try a
with fib_lef:
$ make clean all ROUTE_ALG=LEF SBBITS=4 && ./try a

The last one uses 4 bits per main tree node. This could be chosen
arbitrarily, but 4 seemed to be the best choice.
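For readers unfamiliar with the technique, here is a minimal stride-4 (16-way) multibit trie with longest-prefix match. This is a generic sketch, not the LEF code: prefix lengths that are not a multiple of 4 are expanded into every slot they cover (controlled prefix expansion), whereas LEF, as Richard describes it, keeps a small radix subtree in each node. IPv4 addresses in host byte order and integer next-hop ids are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define STRIDE 4
#define FANOUT (1 << STRIDE) /* 16-way branching */

struct tnode {
	struct tnode *child[FANOUT];
	int nexthop[FANOUT]; /* -1 if no route stored in this slot */
	int plen[FANOUT];    /* length of the prefix that set nexthop[] */
};

static struct tnode *tnode_new(void)
{
	struct tnode *n = calloc(1, sizeof(*n));
	int i;

	if (!n)
		abort();
	for (i = 0; i < FANOUT; i++) {
		n->nexthop[i] = -1;
		n->plen[i] = -1;
	}
	return n;
}

/* 4-bit chunk of addr used at the given depth (depth 0 = top nibble). */
static unsigned nibble(uint32_t addr, int depth)
{
	return (addr >> (32 - STRIDE * (depth + 1))) & (FANOUT - 1);
}

/* Insert prefix/len; a length that is not a multiple of 4 is expanded
 * into the 2^(4 - rem) slots it covers at its last level. */
static void trie_insert(struct tnode *root, uint32_t prefix, int len,
			int nexthop)
{
	struct tnode *n = root;
	int depth = 0, rem, k, span;
	unsigned base;

	assert(len >= 0 && len <= 32);
	while (len - STRIDE * depth > STRIDE) {
		unsigned i = nibble(prefix, depth);

		if (!n->child[i])
			n->child[i] = tnode_new();
		n = n->child[i];
		depth++;
	}
	rem = len - STRIDE * depth; /* 0..4 significant bits left */
	span = 1 << (STRIDE - rem);
	base = nibble(prefix, depth) & ~(unsigned)(span - 1);
	for (k = 0; k < span; k++) {
		/* keep a longer prefix that already owns this slot */
		if (n->plen[base + k] <= len) {
			n->nexthop[base + k] = nexthop;
			n->plen[base + k] = len;
		}
	}
}

/* Longest-prefix match: deeper slots always hold longer prefixes, so
 * the last route seen while descending wins. Returns -1 if none. */
static int trie_lookup(const struct tnode *root, uint32_t addr)
{
	const struct tnode *n = root;
	int best = -1, depth;

	for (depth = 0; n && depth < 32 / STRIDE; depth++) {
		unsigned i = nibble(addr, depth);

		if (n->nexthop[i] >= 0)
			best = n->nexthop[i];
		n = n->child[i];
	}
	return best;
}
```

A lookup visits at most 8 nodes for IPv4, one per nibble, which is the kind of bounded depth the 4-bit choice above trades against node size.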


Regards,
Richard Kojedzinszky

On Tue, 6 Mar 2007, Robert Olsson wrote:



Richard Kojedzinszky writes:

 traffic, and also update the routing table (from BGP), the route cache
 seemed to be the bottleneck, as upon every fib update the whole route
 cache is flushed, and sometimes it took as many cpu cycles to let some
 packets being dropped. Meanwhile i knew that *BSD systems do not use such
 a cache, and of course without it a router can provide a constant
 performance, not depending on the number of different ip flows, and
 updating the fib does not take such a long time.

Hmm, I think there is a cache in *BSD* too.

Anyway, you're correct that the GC and the insertion/deletion of routes
flush the cache and can cause packet drops when all flows have
to be recreated. Yes, it's something that needs to be addressed, but
it's not that common for people to use dynamic routing protocols.

Dave and Alexey started to look into this some time ago, and I got
involved later; there were some ideas on how to deal with this, but that
work never came to an end. So if you want to contribute, I think we'll all
be happy.

 For this to be solved, i have played with ipv4 routing in linux kernel a
 bit. I have done two separate things:
 - developed a new fib algorithm in fib_trie's place for ipv4
 - rewrote the kernel not to use it's dst cache

Just for routing?

 The fib algorithm is like cisco's CEF (at least if my knowledge is correct),
 but first I use a 16-branching tree, to look up the address by 4 bit steps, and
 each node in this tree contains a simple sub-tree which is a radix tree, of
 course with maximum possible height 4. I think this is very simple, and is
 nearly 3 times faster than fib_trie. Now it has a missing feature: it does not
 export the fib in /proc/net/route.

Full semantic match... .

The LC-trie scales tree branching automatically, so looking at a Linux
router running a full BGP feed with 204300 prefixes we see:

1: 27567  2: 10127  3: 8149  4: 3630  5: 1529  6: 558  7: 197  8: 53  16: 1

The root node is 16-bit too, and the average depth (Aver depth) is 2.60.
So 3 times faster than fib_trie would be a full sensation. How do you test?

 The second thing i have done to minimize the cpu cycles during the forwarding
 phase, rewriting ip_input.c, route.c and some others to lef.c, and having a
 minimal functionality. I mean, for example, when a packet gets through the lef
 functions, ipsec policies are not checked.

 It would be nice to see a profile before and with your patch

 And to be more efficient, I attached a neighbour pointer to each fib entry, and
 using this the lookup + forwarding code is very fast.

 Of course, the route cache needs very little time to forward packets when there
 are a small number of different ip flows, but when dealing with traffic in an
 ISP at core level, this cannot be stated.

 So I have done tests with LEF, and compared them to the original linux kernel's
 performance.
 With the worst case, LEF performed nearly 90% of the linux kernel with the most
 optimal case. Of course original linux performs poorly with the worst case.

Send them, with profiles if possible...

Cheers.

--ro



Re: compat_sock_common_getsockopt typo?

2007-03-06 Thread James Morris

On Tue, 6 Mar 2007, Johannes Berg wrote:

 The function reads as follows:
 
 int compat_sock_common_getsockopt(struct socket *sock, int level, int optname,
   char __user *optval, int __user *optlen)
 {
 struct sock *sk = sock->sk;
 
 if (sk->sk_prot->compat_setsockopt != NULL)
 ^^^
 
 return sk->sk_prot->compat_getsockopt(sk, level, optname,
^^^
   optval, optlen);
 return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);
 }
 EXPORT_SYMBOL(compat_sock_common_getsockopt);
 
 Is that intentional to make protocol writers assign both if they want
 compat_setsockopt? :P

It's a bug.



-- 
James Morris
[EMAIL PROTECTED]

[PATCH] fix compat_sock_common_getsockopt typo

2007-03-06 Thread Johannes Berg

This patch fixes a typo in compat_sock_common_getsockopt.

Signed-off-by: Johannes Berg [EMAIL PROTECTED]

--- wireless-dev.orig/net/core/sock.c   2007-03-06 15:44:15.618565674 +0100
+++ wireless-dev/net/core/sock.c2007-03-06 15:44:25.948565674 +0100
@@ -1597,7 +1597,7 @@ int compat_sock_common_getsockopt(struct
 {
struct sock *sk = sock->sk;
 
-   if (sk->sk_prot->compat_setsockopt != NULL)
+   if (sk->sk_prot->compat_getsockopt != NULL)
return sk->sk_prot->compat_getsockopt(sk, level, optname,
  optval, optlen);
return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);



Re: [PATCH] tc35815 driver update (take 2)

2007-03-06 Thread Atsushi Nemoto

On Tue, 06 Mar 2007 06:20:04 -0500, Jeff Garzik [EMAIL PROTECTED] wrote:
 applied to #upstream, let's give it a good review while it hangs out in 
 libata-dev.git#ALL and -mm

Thank you.  I believe you mean netdev-2.6.git :)

---
Atsushi Nemoto

Re: [PATCH] fix compat_sock_common_getsockopt typo

2007-03-06 Thread James Morris

On Tue, 6 Mar 2007, Johannes Berg wrote:

 This patch fixes a typo in compat_sock_common_getsockopt.
 
 Signed-off-by: Johannes Berg [EMAIL PROTECTED]
 
 --- wireless-dev.orig/net/core/sock.c 2007-03-06 15:44:15.618565674 +0100
 +++ wireless-dev/net/core/sock.c  2007-03-06 15:44:25.948565674 +0100
 @@ -1597,7 +1597,7 @@ int compat_sock_common_getsockopt(struct
  {
   struct sock *sk = sock->sk;
  
 - if (sk->sk_prot->compat_setsockopt != NULL)
 + if (sk->sk_prot->compat_getsockopt != NULL)
   return sk->sk_prot->compat_getsockopt(sk, level, optname,
 optval, optlen);
   return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);


Acked-by: James Morris [EMAIL PROTECTED]




-- 
James Morris
[EMAIL PROTECTED]

Re: [patch 18/19] mv643xx ethernet driver IRQ registration fix

2007-03-06 Thread Dale Farnsworth

On Tue, Mar 06, 2007 at 02:42:02AM -0800, [EMAIL PROTECTED] wrote:
 From: Giridhar Pemmasani [EMAIL PROTECTED]
 
 During initialization, mv643xx driver registers IRQ before setting up tx/rx
 rings.  This causes kernel oops because mv643xx_poll, which gets called
 right after registering IRQ, calls netif_rx_complete, which accesses the rx
 ring (I don't have the oops message anymore; I just remember this sequence
 of calls).  Attached (tested) patch first initializes the rx/tx rings and
 then registers the IRQ.

I believe a better fix is to disable any pending interrupt sources
before calling request_irq().  I sent the patch below to Giri for
confirmation, but haven't heard back.  I should've copied Andrew.

Giri, have you had a chance to test this alternative patch?

Thanks,
-Dale
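A toy model of Dale's alternative: acknowledge any stale interrupt causes, read a register back (mirroring the mv_read() that waits for the write to complete), and only then register the handler. The struct, fields and functions are hypothetical; the two cause fields stand in for MV643XX_ETH_INTERRUPT_CAUSE_REG and MV643XX_ETH_INTERRUPT_CAUSE_EXTEND_REG.

```c
#include <assert.h>
#include <stdint.h>

struct toy_port {
	volatile uint32_t cause;     /* stands in for the cause register */
	volatile uint32_t cause_ext; /* stands in for the extend register */
	int rings_ready;
	int early_irq;               /* handler ran before rings were ready */
};

static void toy_handler(struct toy_port *p)
{
	if (!p->rings_ready)
		p->early_irq = 1;
}

/* Registering runs the handler at once if any cause bit is still set. */
static void toy_request_irq(struct toy_port *p)
{
	if (p->cause || p->cause_ext)
		toy_handler(p);
}

/* Dale's ordering: clear stale causes, read back, then register. */
static void toy_open(struct toy_port *p)
{
	p->cause = 0;
	p->cause_ext = 0;
	(void)p->cause_ext; /* read-back; flushes posted writes on real HW */
	toy_request_irq(p);
	p->rings_ready = 1;
}

/* Without clearing, a stale cause fires the handler too early. */
static void toy_open_no_clear(struct toy_port *p)
{
	toy_request_irq(p);
	p->rings_ready = 1;
}
```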

 Date: Fri, 2 Mar 2007 17:03:53 -0700
 To: Giridhar Pemmasani [EMAIL PROTECTED]
 Message-ID: [EMAIL PROTECTED]
 In-Reply-To: [EMAIL PROTECTED]
 
 On Fri, Mar 02, 2007 at 04:52:06AM +, Giridhar Pemmasani wrote:
  During initialization, mv643xx driver registers IRQ before setting up tx/rx
  rings. This causes kernel oops because mv643xx_poll, which gets called
  right after registering IRQ, calls netif_rx_complete, which accesses the rx
  ring (I don't have the oops message anymore; I just remember this sequence
  of calls). Attached (tested) patch first initializes the rx/tx rings and
  then registers the IRQ.
  
  Giri
  
  Signed-off-by: Giridhar Pemmasani [EMAIL PROTECTED]
 
 Giridhar, does the following patch fix your problem, in place of the
 patch you supplied?
 
 -Dale

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 3e045a6..192b390 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -787,6 +787,12 @@ static int mv643xx_eth_open(struct net_device *dev)
unsigned int size;
int err;
 
+   /* Clear any pending ethernet port interrupts */
+   mv_write(MV643XX_ETH_INTERRUPT_CAUSE_REG(port_num), 0);
+   mv_write(MV643XX_ETH_INTERRUPT_CAUSE_EXTEND_REG(port_num), 0);
+   /* wait for previous write to complete */
+   mv_read (MV643XX_ETH_INTERRUPT_CAUSE_EXTEND_REG(port_num));
+
err = request_irq(dev->irq, mv643xx_eth_int_handler,
IRQF_SHARED | IRQF_SAMPLE_RANDOM, dev->name, dev);
if (err) {
@@ -875,10 +881,6 @@ static int mv643xx_eth_open(struct net_device *dev)
 
mv643xx_eth_rx_refill_descs(dev);   /* Fill RX ring with skb's */
 
-   /* Clear any pending ethernet port interrupts */
-   mv_write(MV643XX_ETH_INTERRUPT_CAUSE_REG(port_num), 0);
-   mv_write(MV643XX_ETH_INTERRUPT_CAUSE_EXTEND_REG(port_num), 0);
-
eth_port_start(dev);
 
/* Interrupt Coalescing */


[PATCH 3/3] e1000: list e1000-devel mailing list in MAINTAINERS

2007-03-06 Thread Auke Kok

From: Auke Kok [EMAIL PROTECTED]

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 MAINTAINERS |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1dfba85..51efc71 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1797,6 +1797,7 @@ P:Jeff Kirsher
 M: [EMAIL PROTECTED]
 P: Auke Kok
 M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
 W: http://sourceforge.net/projects/e1000/
 S: Supported
 
@@ -1811,6 +1812,7 @@ P:Jeff Kirsher
 M: [EMAIL PROTECTED]
 P: Auke Kok
 M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
 W: http://sourceforge.net/projects/e1000/
 S: Supported
 
@@ -1825,6 +1827,7 @@ P:Jesse Brandeburg
 M: [EMAIL PROTECTED]
 P: Auke Kok
 M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
 W: http://sourceforge.net/projects/e1000/
 S: Supported
 

[PATCH 1/3] e1000: Use ARRAY_SIZE macro when appropriate

2007-03-06 Thread Auke Kok

From: Ahmed S. Darwish [EMAIL PROTECTED]

Use the ARRAY_SIZE macro already defined in kernel.h.

Signed-off-by: Ahmed S. Darwish [EMAIL PROTECTED]
Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/e1000/e1000_ethtool.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index 6777887..a094288 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -742,7 +742,7 @@ err_setup:
uint32_t pat, value;   \
uint32_t test[] =  \
{0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF};  \
-   for (pat = 0; pat < sizeof(test)/sizeof(test[0]); pat++) {  \
+   for (pat = 0; pat < ARRAY_SIZE(test); pat++) {  \
E1000_WRITE_REG(&adapter->hw, R, (test[pat] & W)); \
value = E1000_READ_REG(&adapter->hw, R);   \
if (value != (test[pat] & W & M)) { \
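A self-contained userspace sketch of what the macro above does, assuming a plain variable in place of the E1000_WRITE_REG/E1000_READ_REG accessors (fake_reg and reg_pattern_test are illustrative names, not e1000 code). ARRAY_SIZE here uses the classic sizeof(arr) / sizeof((arr)[0]) definition, so it only gives a meaningful answer for real arrays, not pointers.

```c
#include <stdint.h>

/* Classic element-count helper; only valid on true arrays. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical stand-in for a hardware register. */
static uint32_t fake_reg;

/* Sketch of the REG_PATTERN_TEST loop: write each pattern masked by W,
 * read it back and compare with the bits M that should stick.
 * Returns the index of the first failing pattern, or -1 if all pass. */
static int reg_pattern_test(uint32_t M, uint32_t W)
{
    static const uint32_t test[] = {
        0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF
    };

    for (uint32_t pat = 0; pat < ARRAY_SIZE(test); pat++) {
        fake_reg = test[pat] & W;          /* "write" the register */
        uint32_t value = fake_reg;         /* "read" it back */
        if (value != (test[pat] & W & M))
            return (int)pat;
    }
    return -1;
}
```

Because the loop bound comes from ARRAY_SIZE(test), adding a fifth pattern to the table needs no other change.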

[PATCH 2/3] e1000: Use kcalloc()

2007-03-06 Thread Auke Kok

From: Yan Burman [EMAIL PROTECTED]

Replace kmalloc+memset with kcalloc throughout the driver. Slightly modified by Auke Kok.

Signed-off-by: Yan Burman [EMAIL PROTECTED]
Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/e1000/e1000_ethtool.c |   26 --
 drivers/net/e1000/e1000_main.c    |   29 -
 2 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index a094288..2881da1 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -654,14 +654,11 @@ e1000_set_ringparam(struct net_device *netdev,
e1000_mac_type mac_type = adapter->hw.mac_type;
struct e1000_tx_ring *txdr, *tx_old;
struct e1000_rx_ring *rxdr, *rx_old;
-   int i, err, tx_ring_size, rx_ring_size;
+   int i, err;
 
if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
return -EINVAL;
 
-   tx_ring_size = sizeof(struct e1000_tx_ring) * adapter->num_tx_queues;
-   rx_ring_size = sizeof(struct e1000_rx_ring) * adapter->num_rx_queues;
-
while (test_and_set_bit(__E1000_RESETTING, &adapter->flags))
msleep(1);
 
@@ -672,11 +669,11 @@ e1000_set_ringparam(struct net_device *netdev,
rx_old = adapter->rx_ring;
 
err = -ENOMEM;
-   txdr = kzalloc(tx_ring_size, GFP_KERNEL);
+   txdr = kcalloc(adapter->num_tx_queues, sizeof(struct e1000_tx_ring), GFP_KERNEL);
if (!txdr)
goto err_alloc_tx;
 
-   rxdr = kzalloc(rx_ring_size, GFP_KERNEL);
+   rxdr = kcalloc(adapter->num_rx_queues, sizeof(struct e1000_rx_ring), GFP_KERNEL);
if (!rxdr)
goto err_alloc_rx;
 
@@ -1053,23 +1050,24 @@ e1000_setup_desc_rings(struct e1000_adapter *adapter)
struct e1000_rx_ring *rxdr = &adapter->test_rx_ring;
struct pci_dev *pdev = adapter->pdev;
uint32_t rctl;
-   int size, i, ret_val;
+   int i, ret_val;
 
/* Setup Tx descriptor ring and Tx buffers */
 
if (!txdr->count)
txdr->count = E1000_DEFAULT_TXD;
 
-   size = txdr->count * sizeof(struct e1000_buffer);
-   if (!(txdr->buffer_info = kmalloc(size, GFP_KERNEL))) {
+   if (!(txdr->buffer_info = kcalloc(txdr->count,
+ sizeof(struct e1000_buffer),
+ GFP_KERNEL))) {
ret_val = 1;
goto err_nomem;
}
-   memset(txdr->buffer_info, 0, size);
 
txdr->size = txdr->count * sizeof(struct e1000_tx_desc);
E1000_ROUNDUP(txdr->size, 4096);
-   if (!(txdr->desc = pci_alloc_consistent(pdev, txdr->size, &txdr->dma))) {
+   if (!(txdr->desc = pci_alloc_consistent(pdev, txdr->size,
+   &txdr->dma))) {
ret_val = 2;
goto err_nomem;
}
@@ -1116,12 +1114,12 @@ e1000_setup_desc_rings(struct e1000_adapter *adapter)
if (!rxdr->count)
rxdr->count = E1000_DEFAULT_RXD;
 
-   size = rxdr->count * sizeof(struct e1000_buffer);
-   if (!(rxdr->buffer_info = kmalloc(size, GFP_KERNEL))) {
+   if (!(rxdr->buffer_info = kcalloc(rxdr->count,
+ sizeof(struct e1000_buffer),
+ GFP_KERNEL))) {
ret_val = 4;
goto err_nomem;
}
-   memset(rxdr->buffer_info, 0, size);
 
rxdr->size = rxdr->count * sizeof(struct e1000_rx_desc);
if (!(rxdr->desc = pci_alloc_consistent(pdev, rxdr->size, &rxdr->dma))) {
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 7bbefca..530d5d7 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1354,31 +1354,27 @@ e1000_sw_init(struct e1000_adapter *adapter)
 static int __devinit
 e1000_alloc_queues(struct e1000_adapter *adapter)
 {
-   int size;
-
-   size = sizeof(struct e1000_tx_ring) * adapter->num_tx_queues;
-   adapter->tx_ring = kmalloc(size, GFP_KERNEL);
+   adapter->tx_ring = kcalloc(adapter->num_tx_queues,
+  sizeof(struct e1000_tx_ring), GFP_KERNEL);
if (!adapter->tx_ring)
return -ENOMEM;
-   memset(adapter->tx_ring, 0, size);
 
-   size = sizeof(struct e1000_rx_ring) * adapter->num_rx_queues;
-   adapter->rx_ring = kmalloc(size, GFP_KERNEL);
+   adapter->rx_ring = kcalloc(adapter->num_rx_queues,
+  sizeof(struct e1000_rx_ring), GFP_KERNEL);
if (!adapter->rx_ring) {
kfree(adapter->tx_ring);
return -ENOMEM;
}
-   memset(adapter->rx_ring, 0, size);
 
 #ifdef CONFIG_E1000_NAPI
-   size = sizeof(struct net_device) * adapter->num_rx_queues;
-   adapter->polling_netdev = kmalloc(size, GFP_KERNEL);
+   adapter->polling_netdev = kcalloc(adapter->num_rx_queues,
+
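As a rough userspace analogue of this conversion: kcalloc(n, size, flags) returns zeroed memory for n elements and returns NULL, instead of under-allocating, when n * size would overflow; the open-coded size = count * sizeof(...) followed by kmalloc+memset performs no such check. The sketch below assumes calloc() as the zeroing allocator; alloc_zeroed_array is a hypothetical name, not a kernel function.

```c
#include <stdint.h>
#include <stdlib.h>

/* kcalloc()-style helper: refuse an element count whose total size would
 * overflow, otherwise return zero-filled memory (calloc zeroes it). */
static void *alloc_zeroed_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;             /* n * size would wrap around */
    return calloc(n, size);
}
```

The design point is that the multiplication is checked once inside the helper, so callers such as the ring allocations above no longer compute the byte count themselves.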

Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Robert Olsson


Eric Dumazet writes:

  Indeed. It would be nice to see how it performs with say 2^20 elements...
  Because with your data, I wonder if the extra complexity of the trash is
  worth it (since most lookups are going to only hit the hash and give the
  answer without intermediate nodes)

 I don't know if I understand you fully. Yes, in most cases the first lookup via
 the hash header will take us directly to the correct leaf. If there are
 collisions, we have to sort them out by adding intermediate nodes.

 Something like a resizable hash where each bucket in turn is a
 resizable hash, etc.

 Cheers
--ro

Re: wireless extensions vs. 64-bit architectures

2007-03-06 Thread Jean Tourrilhes

On Tue, Mar 06, 2007 at 02:27:26AM +0100, Johannes Berg wrote:
 Hi,
 
 Wtf! After struggling with some strange problems with zd1211rw (see some
 other mail) I decided to think again about what could possibly cause all
 the other problems I'm having with it. The kernel seems fine, but iw*
 userspace continually segfaults! And it also doesn't seem to be
 reproducible for most other people; I'd asked on IRC a while ago.
 
 Well. Some thinking and stracing and thinking later it occurred to me...
 Hell! wext is ioctls and includes this gem:
 
 struct  iw_point
 {
   void __user   *pointer;   /* Pointer to the data  (in user space) */
   __u16 length; /* number of fields or size in bytes */
   __u16 flags;  /* Optional params */
 };
 
 Of course nobody ever tells you this, but it's used in a shitload of
 places.

Yep, and it's even in fs/compat_ioctl.c. Hint, hint ;-)

 Btw, did I mention that I'm running a stock debian powerpc 32-bit
 userspace on my 64-bit machine. Oh and of course wext doesn't have any
 32-in-64 compat code.

Please check again, it does.

 /me laughs manically about wext.
 
 And don't tell me the fix is to use the netlink interface to wext.
 Actually, I think it may have the same bug, it seems to be operating
 with iw_point (or at least its size) too but I can't really tell, the
 code's just too clear, I always just see right through it... Oh and I
 still insist on removing the whole pile of junk, netlink interface
 first.

Well, why don't you go and check it. For example, check
where IW_EV_POINT_OFF is used.

 Isn't there any possibility that we can kill userspace interfaces that
 are terminally broken without keeping them for years to come?

Well, is there a possibility that people check the facts
before making bold assumptions?

 Sorry. This is just too frustrating.

Yes, you are perfectly right. This continuous bashing of wext
for no good reason is too frustrating.

 johannes

Now, back to the problem. You seem to have an intermittent
crash. If the stuff above was broken, it would systematically crash,
because it would always get stuff at an offset.
The fact that the crash is not systematic leads me to believe
that something else is at play, such as a compiler optimisation gone
bad, some memory condition, or a driver returning corrupted data to
wext and iwconfig not checking bad data properly.
If you were to give me a proper bug report, there is a chance
that we might make progress.

Have fun...

Jean
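For reference, the layout difference at issue can be seen in userspace: a 32-bit process passes iw_point with a 4-byte pointer, so `length` sits at offset 4, while a 64-bit (LP64) kernel's native struct has it at offset 8. In the sketch below, iw_point32 and iw_point_from32 are hypothetical names, not the kernel's compat types, and the widening shown is only the kind of translation a compat wrapper has to perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the structure quoted above (__user dropped). */
struct iw_point {
    void     *pointer;   /* pointer to the data (in user space) */
    uint16_t  length;    /* number of fields or size in bytes */
    uint16_t  flags;     /* optional params */
};

/* Hypothetical view of what a 32-bit process hands to a 64-bit kernel:
 * the same fields, but the pointer is only 32 bits wide. */
struct iw_point32 {
    uint32_t  pointer;
    uint16_t  length;
    uint16_t  flags;
};

/* Widen a 32-bit request so native code reads length and flags at the
 * offsets it expects. */
static struct iw_point iw_point_from32(const struct iw_point32 *in)
{
    struct iw_point out;

    out.pointer = (void *)(uintptr_t)in->pointer;
    out.length = in->length;
    out.flags = in->flags;
    return out;
}
```

Without such a translation, native code would read the 32-bit caller's length and flags as the upper half of the pointer, which is the failure Johannes suspected.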




Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Eric Dumazet

On Tuesday 06 March 2007 18:05, Robert Olsson wrote:
 Eric Dumazet writes:
   Indeed. It would be nice to see how it performs with say 2^20
   elements... Because with your data, I wonder if the extra complexity of
   the trash is worth it (since most lookups are going to only hit the hash
   and give the answer without intermediate nodes)

  I don't know if I understand you fully. Yes, in most cases the first lookup
 via the hash header will take us directly to the correct leaf. If there are
 collisions, we have to sort them out by adding intermediate nodes.

With 2^20 entries, your current limit of 2^19 entries in the root node will 
probably show us quite different numbers for order-1,2,3,4... tnodes


  Something like a resizable hash where each bucket in turn
 is a resizable hash, etc.

Yes, the numbers you gave us basically showed a big root node, mostly leaves 
and very few tnodes.

I would be interested to see the distribution when the root-node limit is hit 
and we load a *lot* of entries into the table.
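To get a feel for the question above, here is a hypothetical experiment (not fib_trie and not the trash patch): spread n distinct keys over a root node of 2^bits slots using the top bits of a multiplicative hash, then count slots holding a single key (a leaf reached directly) versus two or more (which would need an intermediate node). With 2^20 keys and a 2^19-slot root, collisions are guaranteed by the pigeonhole principle; the exact counts depend on the hash, so none are quoted here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical experiment, not fib_trie code. */
struct occupancy {
    uint32_t empty;     /* slots holding no key */
    uint32_t leaf;      /* slots holding exactly one key: a direct leaf */
    uint32_t collided;  /* slots holding 2+ keys: would need a child node */
};

/* Spread keys 0..nkeys-1 over 2^bits root slots (1 <= bits <= 31) using the
 * top bits of a golden-ratio multiplicative hash. Returns all-zero counts
 * if the scratch array cannot be allocated. */
static struct occupancy root_occupancy(uint32_t nkeys, unsigned bits)
{
    struct occupancy occ = {0, 0, 0};
    uint32_t nslots = 1u << bits;
    uint32_t *count = calloc(nslots, sizeof(*count));

    if (!count)
        return occ;
    for (uint32_t i = 0; i < nkeys; i++) {
        uint32_t h = i * 2654435761u;   /* wraps mod 2^32 */
        count[h >> (32 - bits)]++;      /* top 'bits' bits pick the slot */
    }
    for (uint32_t s = 0; s < nslots; s++) {
        if (count[s] == 0)
            occ.empty++;
        else if (count[s] == 1)
            occ.leaf++;
        else
            occ.collided++;
    }
    free(count);
    return occ;
}
```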


Re: when having to acquire an SA, ipsec drops the packet

2007-03-06 Thread Joy Latten

On Mon, 2007-03-05 at 22:21 -0500, James Morris wrote:
 On Mon, 5 Mar 2007, Joy Latten wrote:
 
  5. Around the time the set of SAs for OUT direction are to be
 inserted into SAD, I see another ACQUIRE happening.
 
 I have not yet figured out where this second ACQUIRE comes from
 and why it happens. As long as the minimal SA or set of valid outgoing
 SAs exist in SAD, an ACQUIRE should not happen.
 
 I saw something similar to this some time ago when testing various 
 failure modes, and discussed it with Herbert.
 
 IIRC, there's a larval SA which is not torn down properly by Racoon once 
 the full SA is established, and the larval SA keeps resending until it 
 times out.
 
Ok, good to know. 
I thought a bit more about this last night but am not
sure of the best way to fix it. Perhaps there is a way to keep the larval
SA around until all SAs resulting from xfrm_vec[xfrm_nr]
are established... oh well, just thinking out loud... :-) 


Joy

Re: [patch 18/19] mv643xx ethernet driver IRQ registration fix

2007-03-06 Thread Giridhar Pemmasani


--- Dale Farnsworth [EMAIL PROTECTED] wrote:

 On Tue, Mar 06, 2007 at 02:42:02AM -0800, [EMAIL PROTECTED] wrote:
  From: Giridhar Pemmasani [EMAIL PROTECTED]
  
  During initialization, the mv643xx driver registers its IRQ before setting
  up the tx/rx rings.  This causes a kernel oops because mv643xx_poll, which
  gets called right after registering the IRQ, calls netif_rx_complete,
  which accesses the rx ring (I don't have the oops message anymore; I just
  remember this sequence of calls).  The attached (tested) patch first
  initializes the rx/tx rings and then registers the IRQ.
 
 I believe a better fix is to disable any pending interrupt sources
 before calling request_irq().  I sent the patch below to Giri for
 confirmation, but haven't heard back.  I should've copied Andrew.
 
 Giri, have you had a chance to test this alternative patch?

I will get my hands on the box this Friday. As soon as I can, I will test and
give feedback.

FWIW, I had a patch earlier that was similar (but not identical) to what you
suggest. That patch was based on the changes from 2.6.15 to 2.6.16, when it
broke. In 2.6.15, interrupts as well as rx/tx are disabled until the rx/tx
rings are set up.
Anyway, I will test this and let you know.

Giri
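A toy userspace model of the ordering problem discussed in this thread may help. Everything below is hypothetical (fake_port, fake_request_irq and the open_* functions are not mv643xx code): fake_request_irq() runs the handler immediately if a stale cause bit is already set, standing in for an interrupt that fires as soon as the handler is installed. In this model both proposed fixes, setting up the rings first or clearing pending causes first, keep the handler from seeing a NULL rx ring; the model does not cover new events that arrive after the cause register has been cleared.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port state, not driver code. */
struct fake_port {
    uint32_t irq_cause;  /* stale cause bits left over from earlier */
    int *rx_ring;        /* NULL until the rx ring has been set up */
    int handler_runs;    /* how often the handler was invoked */
    int bad_accesses;    /* handler ran while rx_ring was still NULL */
};

/* Stand-in for the interrupt handler: touches the rx ring, then
 * acknowledges the cause bits. */
static void fake_irq_handler(struct fake_port *p)
{
    p->handler_runs++;
    if (p->rx_ring == NULL)
        p->bad_accesses++;  /* in the real driver this was the oops */
    p->irq_cause = 0;
}

/* Stand-in for request_irq(): a pending cause fires at once. */
static void fake_request_irq(struct fake_port *p)
{
    if (p->irq_cause != 0)
        fake_irq_handler(p);
}

/* Ordering that oopsed: register first, set up rings afterwards. */
static void open_register_first(struct fake_port *p, int *ring)
{
    fake_request_irq(p);
    p->rx_ring = ring;
}

/* Giri's ordering: set up the rings, then register the handler. */
static void open_rings_first(struct fake_port *p, int *ring)
{
    p->rx_ring = ring;
    fake_request_irq(p);
}

/* Dale's ordering: clear pending causes, then register the handler. */
static void open_clear_first(struct fake_port *p, int *ring)
{
    p->irq_cause = 0;
    fake_request_irq(p);
    p->rx_ring = ring;
}
```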


 


Re: TCP 2MSL on loopback

2007-03-06 Thread David Miller

From: Howard Chu [EMAIL PROTECTED]
Date: Tue, 06 Mar 2007 01:22:18 -0800

 OK, I just subscribed to netdev...

Unlike other mailing lists you don't have to subscribe
to netdev in order to post to it and ask questions :-)

Re: [patch 2/2] div64_64: common code

2007-03-06 Thread David Miller

From: [EMAIL PROTECTED]
Date: Tue, 06 Mar 2007 02:42:28 -0800

 From: Stephen Hemminger [EMAIL PROTECTED]

 Implement div64_64(): 64-bit by 64-bit division.  Needed by networking (at
 least).

This patch, with your types.h fixes, is already in my
net-2.6.22 GIT tree, if you'd like to start pulling from there,
Andrew.
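Why a dedicated helper is needed: on 32-bit architectures gcc turns a plain `/` between two 64-bit operands into a call to a libgcc routine (__udivdi3) that the kernel does not link against. The sketch below is a generic shift-and-subtract long division that uses only shifts, compares and subtraction; it illustrates the technique and is not the div64_64() implementation from this patch.

```c
#include <stdint.h>

/* Restoring (shift-and-subtract) division of two 64-bit values.
 * Illustration only; divisor must be non-zero. */
static uint64_t div64_64_sketch(uint64_t dividend, uint64_t divisor)
{
    uint64_t quotient = 0, remainder = 0;

    for (int bit = 63; bit >= 0; bit--) {
        uint64_t carry = remainder >> 63;   /* bit about to be shifted out */
        remainder = (remainder << 1) | ((dividend >> bit) & 1);
        if (carry || remainder >= divisor) {
            remainder -= divisor;           /* correct modulo 2^64 when carry is set */
            quotient |= (uint64_t)1 << bit;
        }
    }
    return quotient;
}
```

The carry check matters only for divisors above 2^63, where the running remainder can briefly exceed 64 bits.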

Re: many sockets, slow sendto

2007-03-06 Thread Baruch Even

* Zaccomer Lajos [EMAIL PROTECTED] [070306 17:39]:
 Hi,
 
 
 
 I'm playing around with a simulation, in which many thousands of IP
 
 addresses (on interface aliases) are used to send/receive TCP/UDP
 
 packets. I noticed that the time of send/sendto increased linearly
 
 with the number of file descriptors, and I found it rather strange.

To better understand the reason for this problem you should first use
oprofile to profile the kernel. This will give you the hot spots of the
kernel, where the kernel (or userspace) spends most of its time.

Baruch

Re: many sockets, slow sendto

2007-03-06 Thread Andi Kleen

Zaccomer Lajos [EMAIL PROTECTED] writes:

 I'm playing around with a simulation, in which many thousands of IP
 
 addresses (on interface aliases) are used to send/receive TCP/UDP

Something seems to be wrong with your emailer. It adds an empty
line between each real line.

 
 packets. I noticed that the time of send/sendto increased linearly
 
 with the number of file descriptors, and I found it rather strange.

Yes that is strange. I would suggest you use oprofile to identify which
parts of the kernel use the CPU time with many descriptors.

-Andi

Re: [PATCH] pcnet32: only allocate init_block dma consistent

2007-03-06 Thread Don Fry

The change to use netdev_priv can only be done after moving the init
block out of the private structure.  It will break the driver if done
first, which is why they were sent together.

I will separate the changes and resend them.
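A hypothetical userspace model of the ordering constraint described above (names ending in _model are stand-ins, and the 32-byte alignment is an assumption): netdev_priv() returns the area allocated directly behind struct net_device. Before patch 1/2, pcnet32 called alloc_etherdev(0) and pointed dev->priv at a separately allocated lp, so netdev_priv(dev) would not have returned lp. Once lp comes from alloc_etherdev(sizeof(*lp)), the two agree. The model also assumes that the allocator points dev->priv at that area when a non-zero private size is requested.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model, not kernel code. */
#define NETDEV_ALIGN_MODEL 32

struct net_device_model {
    char name[16];
    void *priv;   /* what drivers read as dev->priv */
};

/* Offset of the private area: the device struct rounded up to the alignment. */
static size_t priv_offset(void)
{
    return (sizeof(struct net_device_model) + NETDEV_ALIGN_MODEL - 1) &
           ~(size_t)(NETDEV_ALIGN_MODEL - 1);
}

/* netdev_priv() analogue: private data lives right behind the device. */
static void *netdev_priv_model(struct net_device_model *dev)
{
    return (char *)dev + priv_offset();
}

/* alloc_etherdev(sizeof_priv) analogue: one allocation holds the device and
 * its private area; priv points there only when sizeof_priv is non-zero. */
static struct net_device_model *alloc_etherdev_model(size_t sizeof_priv)
{
    struct net_device_model *dev = calloc(1, priv_offset() + sizeof_priv);

    if (dev && sizeof_priv)
        dev->priv = netdev_priv_model(dev);
    return dev;
}
```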

On Tue, Mar 06, 2007 at 06:13:14AM -0500, Jeff Garzik wrote:
 Don Fry wrote:
 The patch below moves the init_block out of the private struct and
 allocates only the init block with pci_alloc_consistent.
 
 This has two effects:
 
 1. Performance increase for non-cache-coherent machines, because the
CPU-only data in the private struct are now cached
 
 2. Locks now work on platforms that need to have locks
in cached memory
 
 Also use netdev_priv() instead of dev->priv
 
 Signed-off-by: Thomas Bogendoerfer [EMAIL PROTECTED]
 
 please separate the netdev_priv() change into a separate, precursor 
 patch, and resend both
 
 

Re: [NET]: Please revert disallowing zero listen queues

2007-03-06 Thread David Miller

From: Gerrit Renker [EMAIL PROTECTED]
Date: Tue, 6 Mar 2007 13:32:09 +

 Please can you reconsider the patch regarding the accept_queue

 http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.22.git;a=commit;h=8488df894d05d6fa41c2bd298c335f944bb0e401

 It disallows setting the `backlog' argument of listen(2) to zero. Using
 a zero backlog is often done (e.g. ttcp), and disallowing a zero 
 backlog will break many applications. I had to recode several applications
 which rely on this convention.

 The problem further spreads from TCP to DCCP (same behaviour).

 Below is a patch to revert this change.

Everything I've ever seen clearly states that a backlog of
zero means that zero connections are allowed.

So we're not disallowing a backlog argument of zero to
listen().  We'll accept that just fine, the only thing that
happens is that you'll get what you ask for, that being
no connections :-)
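The distinction drawn here can be checked from userspace: listen() with a backlog of 0 is accepted and returns 0. How many connections the kernel then queues is exactly what this thread is about, and the sketch below does not test that. listen_with_backlog is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to 127.0.0.1 on a kernel-chosen port and call
 * listen() with the given backlog. Returns listen()'s result (0 on success)
 * or -1 if socket() or bind() failed. */
static int listen_with_backlog(int backlog)
{
    struct sockaddr_in sin;
    int ret = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;                  /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0)
        ret = listen(fd, backlog);
    close(fd);
    return ret;
}
```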

[PATCH 1/2] pcnet32: only allocate init_block dma consistent

2007-03-06 Thread Don Fry

The patch below moves the init_block out of the private struct and
allocates only the init block with pci_alloc_consistent. 

This has two effects:

1. Performance increase for non-cache-coherent machines, because the
   CPU-only data in the private struct are now cached

2. Locks now work on platforms that need to have locks
   in cached memory

Signed-off-by: Thomas Bogendoerfer [EMAIL PROTECTED]
Acked-by: Don Fry [EMAIL PROTECTED]
---
 drivers/net/pcnet32.c |   77 ++---
 1 files changed, 34 insertions(+), 43 deletions(-)

diff --git a/drivers/net/pcnet32.c b/drivers/net/pcnet32.c
index 36f9d98..04b0c44 100644
--- a/drivers/net/pcnet32.c
+++ b/drivers/net/pcnet32.c
@@ -253,12 +253,12 @@ struct pcnet32_access {
  * so the structure should be allocated using pci_alloc_consistent().
  */
 struct pcnet32_private {
-   struct pcnet32_init_block init_block;
+   struct pcnet32_init_block *init_block;
/* The Tx and Rx ring entries must be aligned on 16-byte boundaries in 32bit mode. */
struct pcnet32_rx_head  *rx_ring;
struct pcnet32_tx_head  *tx_ring;
-   dma_addr_t  dma_addr;/* DMA address of beginning of this
-  object, returned by pci_alloc_consistent */
+   dma_addr_t  init_dma_addr;/* DMA address of beginning of the init block,
+  returned by pci_alloc_consistent */
struct pci_dev  *pci_dev;
const char  *name;
/* The saved address of a sent-in-place packet/buffer, for skfree(). */
@@ -1593,7 +1593,6 @@ static int __devinit
 pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev)
 {
struct pcnet32_private *lp;
-   dma_addr_t lp_dma_addr;
int i, media;
int fdx, mii, fset, dxsuflo;
int chip_version;
@@ -1715,7 +1714,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev)
dxsuflo = 1;
}
 
-   dev = alloc_etherdev(0);
+   dev = alloc_etherdev(sizeof(*lp));
if (!dev) {
if (pcnet32_debug & NETIF_MSG_PROBE)
printk(KERN_ERR PFX "Memory allocation failed.\n");
@@ -1806,25 +1805,22 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev)
}
 
dev->base_addr = ioaddr;
+   lp = dev->priv;
/* pci_alloc_consistent returns page-aligned memory, so we do not have to check the alignment */
-   if ((lp =
-pci_alloc_consistent(pdev, sizeof(*lp), &lp_dma_addr)) == NULL) {
+   if ((lp->init_block =
+pci_alloc_consistent(pdev, sizeof(*lp->init_block), &lp->init_dma_addr)) == NULL) {
if (pcnet32_debug & NETIF_MSG_PROBE)
printk(KERN_ERR PFX
   "Consistent memory allocation failed.\n");
ret = -ENOMEM;
goto err_free_netdev;
}
-
-   memset(lp, 0, sizeof(*lp));
-   lp->dma_addr = lp_dma_addr;
lp->pci_dev = pdev;
 
spin_lock_init(&lp->lock);
 
SET_MODULE_OWNER(dev);
SET_NETDEV_DEV(dev, &pdev->dev);
-   dev->priv = lp;
lp->name = chipname;
lp->shared_irq = shared;
lp->tx_ring_size = TX_RING_SIZE;/* default tx ring size */
@@ -1871,23 +1867,21 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct pci_dev *pdev)
 dev->dev_addr[2] == 0x75)
lp->options = PCNET32_PORT_FD | PCNET32_PORT_GPSI;
 
-   lp->init_block.mode = le16_to_cpu(0x0003);  /* Disable Rx and Tx. */
-   lp->init_block.tlen_rlen =
+   lp->init_block->mode = le16_to_cpu(0x0003); /* Disable Rx and Tx. */
+   lp->init_block->tlen_rlen =
le16_to_cpu(lp->tx_len_bits | lp->rx_len_bits);
for (i = 0; i < 6; i++)
-   lp->init_block.phys_addr[i] = dev->dev_addr[i];
-   lp->init_block.filter[0] = 0x00000000;
-   lp->init_block.filter[1] = 0x00000000;
-   lp->init_block.rx_ring = (u32) le32_to_cpu(lp->rx_ring_dma_addr);
-   lp->init_block.tx_ring = (u32) le32_to_cpu(lp->tx_ring_dma_addr);
+   lp->init_block->phys_addr[i] = dev->dev_addr[i];
+   lp->init_block->filter[0] = 0x00000000;
+   lp->init_block->filter[1] = 0x00000000;
+   lp->init_block->rx_ring = (u32) le32_to_cpu(lp->rx_ring_dma_addr);
+   lp->init_block->tx_ring = (u32) le32_to_cpu(lp->tx_ring_dma_addr);
 
/* switch pcnet32 to 32bit mode */
a->write_bcr(ioaddr, 20, 2);
 
-   a->write_csr(ioaddr, 1, (lp->dma_addr + offsetof(struct pcnet32_private,
-init_block)) & 0xffff);
-   a->write_csr(ioaddr, 2, (lp->dma_addr + offsetof(struct pcnet32_private,
-init_block)) >> 16);
+   a->write_csr(ioaddr, 1, (lp->init_dma_addr & 0xffff));
+   a->write_csr(ioaddr, 2, (lp->init_dma_addr >> 16));
 
if
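After this patch, init_dma_addr points straight at the init block, so no offsetof() adjustment is needed before handing the address to the chip, which takes it in two 16-bit halves: CSR1 gets the low word and CSR2 the high word. A small sketch of that split (split_init_addr is a hypothetical name):

```c
#include <stdint.h>

/* Split a 32-bit bus address the way the two write_csr() calls do:
 * CSR1 receives the low 16 bits, CSR2 the high 16 bits. */
static void split_init_addr(uint32_t dma_addr, uint16_t *csr1, uint16_t *csr2)
{
    *csr1 = (uint16_t)(dma_addr & 0xffff);
    *csr2 = (uint16_t)(dma_addr >> 16);
}
```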

Re: TCP 2MSL on loopback

2007-03-06 Thread Howard Chu


Eric Dumazet wrote:

On Tuesday 06 March 2007 10:22, Howard Chu wrote:


It's a combination of 2MSL and /proc/sys/net/ipv4/ip_local_port_range -
on my system the default port range is 32768-61000. That means if I use
up 28232 ports in less than 2MSL then everything stops. netstat will
show that all the available port numbers are in TIME_WAIT state. And
this is particularly bad because while waiting for the timeout, I can't
initiate any new outbound connections of any kind at all - telnet, ssh,
whatever, you have to wait for at least one port to free up.
(Interesting denial of service there)

Granted, I was running my test on 2.6.18, perhaps 2.6.21 behaves
differently.


Could you try the attached program and tell me what happens?

$ gcc -O2 -o socktest socktest.c -lpthread
$ time ./socktest -n 10
nb_conn=9 nb_accp=9

real    0m5.058s
user    0m0.212s
sys     0m4.844s

(on my small machine, dell d610 :) )


On my Asus laptop (2GHz Pentium M) the first time I ran it it completed 
in about 51 seconds, with no errors. I then copied it to another machine 
and started it up there, and got connect errors right away. I then went 
back to my laptop and ran it again, and got errors that time.


This is the laptop run with errors:
viola:~/src> uname -a
Linux viola 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 i686 i686 i386 GNU/Linux

viola:~/src> time ./socktest -n 100
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
nb_conn=993757 nb_accp=993757
1.408u 88.649s 1:42.76 87.6% 0+0k 0+0io 0pf+0w

This is my other system, an AMD X2 3800+ (dual core)
mandolin:~/src> uname -a
Linux mandolin 2.6.18.3SMP #9 SMP Sat Nov 25 10:08:51 PST 2006 x86_64 x86_64 x86_64 GNU/Linux

mandolin:~/src> gcc -O2 -o socktest socktest.c -lpthread
mandolin:~/src> time ./socktest -n 100
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
nb_conn=957088 nb_accp=957088
1.012u 630.991s 5:18.05 198.7%  0+0k 0+0io 0pf+0w
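The port arithmetic quoted earlier in this thread can be spelled out. Assuming a 60-second TIME_WAIT (Linux uses a fixed TCP_TIMEWAIT_LEN rather than computing 2MSL) and every connection going to the same remote address and port, each connection ties up one local port for that long, so the sustainable rate is roughly ports / 60. The repeated "connect error 99" lines above are consistent with running out of ports: on Linux, errno 99 is EADDRNOTAVAIL. The helper names are illustrative.

```c
/* Number of ports in the inclusive range [lo, hi]. */
static unsigned port_count(unsigned lo, unsigned hi)
{
    return hi - lo + 1;
}

/* Highest steady connection rate before every local port sits in
 * TIME_WAIT, assuming one local port per connection. */
static double max_conn_per_sec(unsigned ports, double time_wait_sec)
{
    return ports / time_wait_sec;
}
```

For the default 32768-61000 range this gives 28233 ports (the 28232 quoted above counts the range exclusively) and about 470 new connections per second.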
--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc
  Chief Architect, OpenLDAP http://www.openldap.org/project/

Re: [NET]: Please revert disallowing zero listen queues

2007-03-06 Thread David Miller

From: David Miller [EMAIL PROTECTED]
Date: Tue, 06 Mar 2007 10:37:06 -0800 (PST)

 Everything I've ever seen clearly states that a backlog of
 zero means that zero connections are allowed.

 So we're not disallowing a backlog argument of zero to
 listen().  We'll accept that just fine, the only thing that
 happens is that you'll get what you ask for, that being
 no connections :-)

I'm not ruling out that a backlog of zero might mean "allow one",
in which case we do need to revert the change.  Rather, I'm
trying to clarify what is the real issue here as Gerrit's
email implied that listen() with a zero backlog returns
an error now, which is not true.

Re: wireless extensions vs. 64-bit architectures

2007-03-06 Thread Michael Buesch

On Tuesday 06 March 2007 18:13, Jean Tourrilhes wrote:
 On Tue, Mar 06, 2007 at 02:27:26AM +0100, Johannes Berg wrote:
  Hi,
  
  Wtf! After struggling with some strange problems with zd1211rw (see some
  other mail) I decided to think again about what could possibly cause all
  the other problems I'm having with it. The kernel seems fine, but iw*
  userspace continually segfaults! And it also seems to be not
  reproducible for most other people, I'd asked on IRC once a while.
  
  Well. Some thinking and stracing and thinking later it occurred to me...
  Hell! wext is ioctls and includes this gem:
  
  struct  iw_point
  {
void __user   *pointer;   /* Pointer to the data  (in user space) */
__u16 length; /* number of fields or size in bytes */
__u16 flags;  /* Optional params */
  };
  
  Of course nobody ever tells you this, but it's used in a shitload of
  places.
 
   Yep, and it's even in fs/compat_ioctl.c. Hint, hint ;-)

Ok, it is wrapping the following ioctls:

HANDLE_IOCTL(SIOCGIWRANGE, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWTHRSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWTHRSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWAPLIST, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWSCAN, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWESSID, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWESSID, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWNICKN, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWNICKN, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWENCODE, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWENCODE, do_wireless_ioctl)

What about SIOCSIWSCAN, SIOCSIWENCODEEXT, SIOCGIWENCODEEXT
and some others that also use iw_point?

-- 
Greetings Michael.

Re: linux 2.6 Ipv4 routing enhancement (fwd)

2007-03-06 Thread Robert Olsson


Richard Kojedzinszky writes:

  Sorry for sending the tgz with .svn included. And I did not send 
  instructions.
  To do a test with fib_trie, issue
  $ make clean all ROUTE_ALG=TRIE && ./try a
  with fib_radix:
  $ make clean all ROUTE_ALG=RADIX && ./try a
  with fib_lef:
  $ make clean all ROUTE_ALG=LEF SBBITS=4 && ./try a

 Thanks. I usually do my testing in kernel context, in the forwarding path,
 with full semantic match, so it's not that easy to compare.
 But I'll take a look. BTW, does your test do correct prefix matching?
 
 FYI, some old fib work is on robur.slu.se:

 # Lookup with just hlist
  /pub/Linux/net-development/fib_hlist/

 # 24 bit hash lookup
  /pub/Linux/net-development/fib_hash2/
 
 And some hlist/hash2/trie comparisons in:
 /pub/Linux/tmp/trie-talk-kth.pdf

 Cheers
--ro

Re: TCP 2MSL on loopback

2007-03-06 Thread Rick Jones

This is probably not something that happens in real world deployments. I 
But it's not 60,000 concurrent connections, it's 60,000 within a 2 
minute span.


Sounds like a case of "Doctor! Doctor! It hurts when I do this."



I'm not saying this is a high priority problem, I only encountered it in 
a test scenario where I was deliberately trying to max out the server.



Ideally the 2MSL parameter would be dynamically adjusted based on the
route to the destination and the weights associated with those routes.
In the simplest case, connections between machines on the same subnet
(i.e., no router hops involved) should have a much smaller default value
than connections that traverse any routers. I'd settle for a two-level
setting - with no router hops, use the small value; with any router hops
use the large value.


With transparent bridging, nobody knows how long the datagram may be out 
there.  Admittedly, the chances of a datagram living for a full two 
minutes these days is probably nil, but just being in the same IP subnet 
doesn't really mean anything when it comes to physical locality.


It's a combination of 2MSL and /proc/sys/net/ipv4/ip_local_port_range - 
on my system the default port range is 32768-61000. That means if I use 
up 28232 ports in less than 2MSL then everything stops. netstat will 
show that all the available port numbers are in TIME_WAIT state. And 
this is particularly bad because while waiting for the timeout, I can't 
initiate any new outbound connections of any kind at all - telnet, ssh, 
whatever, you have to wait for at least one port to free up. 
(Interesting denial of service there)


SPECweb benchmarking has had to deal with the issue of attempted 
TIME_WAIT reuse going back to 1997.  It deals with it by not relying on 
the client's configured local/anonymous/ephemeral port number range and 
instead making explicit bind() calls in the (more or less) entire unpriv 
port range (actually it may just be from 5000 to 65535 but still)
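A minimal sketch of that explicit-bind approach, under stated assumptions: the client binds to 127.0.0.1 only to keep the example self-contained, walks the range round-robin via a caller-held counter, and treats EADDRINUSE as "try the next port". bind_in_range is a hypothetical helper, not SPECweb code, and it expects lo <= hi.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Choose the local port explicitly by walking [lo, hi] with bind().
 * *next keeps the search position across calls so ports rotate.
 * Returns the bound port, or 0 if no port in the range could be bound. */
static uint16_t bind_in_range(int fd, uint16_t lo, uint16_t hi, uint32_t *next)
{
    uint32_t span = (uint32_t)hi - lo + 1;

    for (uint32_t tries = 0; tries < span; tries++) {
        uint16_t port = (uint16_t)(lo + (*next % span));
        struct sockaddr_in sin;

        (*next)++;
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        sin.sin_port = htons(port);
        if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0)
            return port;
        if (errno != EADDRINUSE)
            return 0;   /* unexpected error: give up rather than spin */
    }
    return 0;           /* every port in the range is busy */
}
```

After a successful bind, the same descriptor is passed to connect() as usual.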


Now, if it weren't necessary to fully randomize the ISNs, the chances of 
a successful transition from TIME_WAIT to ESTABLISHED might be greater, 
but going back to the good old days of more or less purely clock-driven 
ISNs isn't likely.


rick jones

Re: [NET]: Please revert disallowing zero listen queues

2007-03-06 Thread Rick Jones


So we're not disallowing a backlog argument of zero to
listen().  We'll accept that just fine, the only thing that
happens is that you'll get what you ask for, that being
no connections :-)


I'm not sure where HP-UX inherited the 0 = 1 bit - perhaps from BSD, nor 
am I sure there is official chapter and verse, but:


<excerpt>
backlog is limited to the range of 0 to SOMAXCONN, which is defined in 
<sys/socket.h>.  SOMAXCONN is currently set to 4096.  If any other 
value is specified, the system automatically assigns the closest value 
within the range.  A backlog of 0 specifies only 1 pending 
connection is allowed at any given time.

</excerpt>

I don't have a Solaris, BSD or AIX manpage for listen handy to check 
them but would not be surprised to see they are similar.
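The semantics in that excerpt can be written down as a small function. This is a hypothetical model of the quoted HP-UX text only (HPUX_SOMAXCONN mirrors its 4096), not of Linux behavior: clamp the backlog into [0, SOMAXCONN], then treat 0 as one pending connection.

```c
/* Hypothetical model of the quoted HP-UX listen(2) text, not of Linux. */
#define HPUX_SOMAXCONN 4096

/* Number of pending connections allowed for a given backlog argument. */
static int hpux_pending_limit(int backlog)
{
    if (backlog < 0)
        backlog = 0;                    /* out of range: closest value is 0 */
    else if (backlog > HPUX_SOMAXCONN)
        backlog = HPUX_SOMAXCONN;       /* out of range: closest value is max */
    return backlog == 0 ? 1 : backlog;  /* "0 specifies only 1 pending" */
}
```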


rick jones

[PATCH 2/2] pcnet32: change to use netdev_priv

2007-03-06 Thread Don Fry

use netdev_priv() instead of dev->priv

Signed-off-by: Thomas Bogendoerfer [EMAIL PROTECTED]
Signed-off-by: Don Fry [EMAIL PROTECTED]
---
--- linux-2.6.21-rc2/drivers/net/one.pcnet32.c  2007-03-06 10:48:37.0 -0800
+++ linux-2.6.21-rc2/drivers/net/pcnet32.c  2007-03-05 18:03:32.0 -0800
@@ -653,7 +653,7 @@ static void pcnet32_realloc_rx_ring(stru
 
 static void pcnet32_purge_rx_ring(struct net_device *dev)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
int i;
 
/* free all allocated skbuffs */
@@ -681,7 +681,7 @@ static void pcnet32_poll_controller(stru
 
 static int pcnet32_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
unsigned long flags;
int r = -EOPNOTSUPP;
 
@@ -696,7 +696,7 @@ static int pcnet32_get_settings(struct n
 
 static int pcnet32_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
unsigned long flags;
int r = -EOPNOTSUPP;
 
@@ -711,7 +711,7 @@ static int pcnet32_set_settings(struct n
 static void pcnet32_get_drvinfo(struct net_device *dev,
struct ethtool_drvinfo *info)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
 
strcpy(info->driver, DRV_NAME);
strcpy(info->version, DRV_VERSION);
@@ -723,7 +723,7 @@ static void pcnet32_get_drvinfo(struct n
 
 static u32 pcnet32_get_link(struct net_device *dev)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
unsigned long flags;
int r;
 
@@ -743,19 +743,19 @@ static u32 pcnet32_get_link(struct net_d
 
 static u32 pcnet32_get_msglevel(struct net_device *dev)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
return lp->msg_enable;
 }
 
 static void pcnet32_set_msglevel(struct net_device *dev, u32 value)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
lp->msg_enable = value;
 }
 
 static int pcnet32_nway_reset(struct net_device *dev)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
unsigned long flags;
int r = -EOPNOTSUPP;
 
@@ -770,7 +770,7 @@ static int pcnet32_nway_reset(struct net
 static void pcnet32_get_ringparam(struct net_device *dev,
  struct ethtool_ringparam *ering)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
 
ering->tx_max_pending = TX_MAX_RING_SIZE;
ering->tx_pending = lp->tx_ring_size;
@@ -781,7 +781,7 @@ static void pcnet32_get_ringparam(struct
 static int pcnet32_set_ringparam(struct net_device *dev,
 struct ethtool_ringparam *ering)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
unsigned long flags;
unsigned int size;
ulong ioaddr = dev->base_addr;
@@ -847,7 +847,7 @@ static int pcnet32_self_test_count(struc
 static void pcnet32_ethtool_test(struct net_device *dev,
 struct ethtool_test *test, u64 * data)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
int rc;
 
if (test->flags == ETH_TEST_FL_OFFLINE) {
@@ -868,7 +868,7 @@ static void pcnet32_ethtool_test(struct 
 
 static int pcnet32_loopback_test(struct net_device *dev, uint64_t * data1)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
struct pcnet32_access *a = &lp->a;  /* access to registers */
ulong ioaddr = dev->base_addr;  /* card base I/O address */
struct sk_buff *skb;/* sk buff */
@@ -1047,7 +1047,7 @@ static int pcnet32_loopback_test(struct 
 
 static void pcnet32_led_blink_callback(struct net_device *dev)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
struct pcnet32_access *a = &lp->a;
ulong ioaddr = dev->base_addr;
unsigned long flags;
@@ -1064,7 +1064,7 @@ static void pcnet32_led_blink_callback(s
 
 static int pcnet32_phys_id(struct net_device *dev, u32 data)
 {
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp = netdev_priv(dev);
struct pcnet32_access *a = &lp->a;
ulong ioaddr = dev->base_addr;
unsigned long flags;
@@ -1109,7 +1109,7 @@ static int pcnet32_suspend(struct net_de
int can_sleep)
 {
int csr5;
-   struct pcnet32_private *lp = dev->priv;
+   struct pcnet32_private *lp =

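As background for this patch: in kernels of this era, alloc_netdev() allocates the driver's private area in the same block as struct net_device. That area sits just past the structure, whose size is rounded up to an alignment boundary. netdev_priv() computes the address directly instead of loading dev->priv. The userspace model below mimics that layout. It uses hypothetical types and names (`fake_net_device`, `fake_alloc_netdev`, `fake_netdev_priv`) and an illustrative 32-byte rounding. It is a sketch, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FAKE_NETDEV_ALIGN 32    /* illustrative rounding for the private area */

struct fake_net_device {        /* stand-in for struct net_device */
    char name[16];
    unsigned long base_addr;
};

/* Offset of the private area: the device struct size rounded up to a
 * multiple of FAKE_NETDEV_ALIGN. */
#define FAKE_PRIV_OFFSET \
    ((sizeof(struct fake_net_device) + FAKE_NETDEV_ALIGN - 1) & \
     ~(size_t)(FAKE_NETDEV_ALIGN - 1))

static void *fake_netdev_priv(struct fake_net_device *dev)
{
    /* Private data lives right after the (rounded) device struct. */
    return (char *)dev + FAKE_PRIV_OFFSET;
}

/* One zeroed allocation holds the device plus sizeof_priv bytes of state. */
static struct fake_net_device *fake_alloc_netdev(size_t sizeof_priv)
{
    return calloc(1, FAKE_PRIV_OFFSET + sizeof_priv);
}
```

In this model the private pointer is always a fixed offset from the device pointer, so no separate priv field has to be loaded. That is the substitution the pcnet32 patch makes throughout.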
Re: [RFC PATCH]: Dynamically sized routing cache hash table.

2007-03-06 Thread Robert Olsson


Eric Dumazet writes:

  With 2^20 entries, your actual limit of 2^19 entries in root node will 
  probably show us quite different numbers for order-1,2,3,4... tnodes

 Yep, the trie will get deeper, and lookups will become more costly, as will
 inserts and deletes. The 2^19 limit was there because I was getting a memory
 allocation problem that I never sorted out.

  Yes, numbers you gave us basically showed a big root node, and mainly leaves 
  and very few tnodes.
  
  I was interested to see the distribution in case the root-node limit is hit, 
  and we load into the table a *lot* of entries.

 Maxlength etc... well maybe root-restriction should be removed and just have 
 maxsize instead.

 Cheers
--ro

[PATCH] ixgb: Use ARRAY_SIZE macro when appropriate.

2007-03-06 Thread Auke Kok

From: Ahmed S. Darwish [EMAIL PROTECTED]

Signed-off-by: Ahmed S. Darwish [EMAIL PROTECTED]
Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/ixgb/ixgb_param.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgb/ixgb_param.c b/drivers/net/ixgb/ixgb_param.c
index b27442a..c38ce73 100644
--- a/drivers/net/ixgb/ixgb_param.c
+++ b/drivers/net/ixgb/ixgb_param.c
@@ -245,8 +245,6 @@ ixgb_validate_option(int *value, struct ixgb_option *opt)
return -1;
 }
 
-#define LIST_LEN(l) (sizeof(l) / sizeof(l[0]))
-
 /**
  * ixgb_check_options - Range Checking for Command Line Parameters
  * @adapter: board private structure
@@ -335,7 +333,7 @@ ixgb_check_options(struct ixgb_adapter *adapter)
.name = "Flow Control",
.err  = "reading default settings from EEPROM",
.def  = ixgb_fc_tx_pause,
-   .arg  = { .l = { .nr = LIST_LEN(fc_list),
+   .arg  = { .l = { .nr = ARRAY_SIZE(fc_list),
 .p = fc_list }}
};
 
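The removed LIST_LEN() and the kernel's ARRAY_SIZE() compute the same thing: the total size of an array divided by the size of one element. A userspace equivalent is below. It uses a hypothetical option table in the spirit of ixgb's fc_list. Current kernels extend ARRAY_SIZE() with a compile-time check that rejects pointers (__must_be_array()). This sketch omits that check, so passing a pointer here would silently yield a wrong count.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel macro, without the pointer check. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical option table, loosely modeled on a driver option list. */
struct opt_entry {
    int value;
    const char *desc;
};

static const struct opt_entry demo_fc_list[] = {
    { 0, "off" },
    { 1, "rx pause" },
    { 2, "tx pause" },
    { 3, "full" },
};

/* Computed at compile time from the array definition, so adding an
 * entry to the table automatically updates the count. */
static const size_t demo_fc_count = ARRAY_SIZE(demo_fc_list);
```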

Re: [NET]: Please revert disallowing zero listen queues

2007-03-06 Thread David Miller

From: Rick Jones [EMAIL PROTECTED]
Date: Tue, 06 Mar 2007 10:54:00 -0800

  So we're not disallowing a backlog argument of zero to
  listen().  We'll accept that just fine, the only thing that
  happens is that you'll get what you ask for, that being
  no connections :-)

 I'm not sure where HP-UX inherited the 0 = 1 bit - perhaps from BSD, nor 
 am I sure there is official chapter and verse, but:

 <excerpt>
 backlog is limited to the range of 0 to SOMAXCONN, which is defined in
 <sys/socket.h>.  SOMAXCONN is currently set to 4096.  If any other
 value is specified, the system automatically assigns the closest value
 within the range.  A backlog of 0 specifies only 1 pending
 connection is allowed at any given time.
 </excerpt>

 I don't have a Solaris, BSD or AIX manpage for listen handy to check 
 them but would not be surprised to see they are similar.

Ok, that seals the deal for me, I'll revert the change :)

[PATCH]: Revert accept queue backlog change.

2007-03-06 Thread David Miller


Wei, I have to revert your change, it is incorrect as pointed
out by other people here on netdev.

BSD sockets basically define the 'backlog' parameter to listen()
to mean allow backlog + 1 connections to be queued to the socket.
This allows a backlog parameter of 0 to allow 1 connection, and
there are real applications which do this.
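A small model makes the off-by-one concrete. The struct and function names below are hypothetical and loosely mirror sk_acceptq_is_full() from the diff. With the restored `>` test, a listener admits backlog + 1 connections, so listen(fd, 0) still admits one. The reverted `>=` test admits only backlog connections, which means none for a backlog of 0. The model covers only the counting, not the real locking or SYN handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct sock fields involved. */
struct acceptq {
    unsigned int ack_backlog;      /* connections currently queued */
    unsigned int max_ack_backlog;  /* backlog passed to listen() */
};

/* Restored semantics: full only once queued > backlog. */
static bool acceptq_is_full(const struct acceptq *q)
{
    return q->ack_backlog > q->max_ack_backlog;
}

/* Reverted semantics, kept only for comparison. */
static bool acceptq_is_full_reverted(const struct acceptq *q)
{
    return q->ack_backlog >= q->max_ack_backlog;
}

/* Queue connections until the given check reports "full". */
static unsigned int admitted(unsigned int backlog,
                             bool (*is_full)(const struct acceptq *))
{
    struct acceptq q = { 0, backlog };

    while (!is_full(&q))
        q.ack_backlog++;
    return q.ack_backlog;
}
```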

diff --git a/include/net/sock.h b/include/net/sock.h
index 849c7df..2c7d60c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -426,7 +426,7 @@ static inline void sk_acceptq_added(struct sock *sk)
 
 static inline int sk_acceptq_is_full(struct sock *sk)
 {
-   return sk->sk_ack_backlog >= sk->sk_max_ack_backlog;
+   return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
 }
 
 /*
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 51ca438..6069716 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -934,7 +934,7 @@ static long unix_wait_for_peer(struct sock *other, long timeo)
 
sched = !sock_flag(other, SOCK_DEAD) &&
!(other->sk_shutdown & RCV_SHUTDOWN) &&
-   (skb_queue_len(&other->sk_receive_queue) >=
+   (skb_queue_len(&other->sk_receive_queue) >
 other->sk_max_ack_backlog);
 
unix_state_runlock(other);
@@ -1008,7 +1008,7 @@ restart:
if (other-sk_state != TCP_LISTEN)
goto out_unlock;
 
-   if (skb_queue_len(&other->sk_receive_queue) >=
+   if (skb_queue_len(&other->sk_receive_queue) >
other->sk_max_ack_backlog) {
err = -EAGAIN;
if (!timeo)
@@ -1381,7 +1381,7 @@ restart:
}
 
if (unix_peer(other) != sk &&
-   (skb_queue_len(&other->sk_receive_queue) >=
+   (skb_queue_len(&other->sk_receive_queue) >
 other->sk_max_ack_backlog)) {
if (!timeo) {
err = -EAGAIN;

Re: TCP 2MSL on loopback

2007-03-06 Thread Howard Chu


Rick Jones wrote:
This is probably not something that happens in real world deployments.
But it's not 60,000 concurrent connections, it's 60,000 within a 2
minute span.


Sounds like a case of "Doctor! Doctor! It hurts when I do this."


I guess. In the cases where it matters, we use LDAP over Unix Domain 
Sockets instead of TCP. Smarter clients that do connection pooling would 
help too, but the fact that this even came to our attention is because 
not all clients out there are smart enough.


Since we have an alternative that works, I'm not really worried about 
it. I just thought it was worthwhile to raise the question.


I'm not saying this is a high priority problem, I only encountered it 
in a test scenario where I was deliberately trying to max out the server.



Ideally the 2MSL parameter would be dynamically adjusted based on the
route to the destination and the weights associated with those routes.
In the simplest case, connections between machines on the same subnet
(i.e., no router hops involved) should have a much smaller default value
than connections that traverse any routers. I'd settle for a two-level
setting - with no router hops, use the small value; with any router hops,
use the large value.
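For illustration only, the two-level policy above reduces to a selection function like the one below. This is not an existing kernel setting. `route_has_gateway` is a hypothetical stand-in for whatever route information would mark a peer as off-link. The 2-second on-link value is arbitrary, while 60 seconds matches Linux's fixed TIME_WAIT length.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values in seconds: only the 60 s figure matches Linux's
 * fixed TIME_WAIT length; the on-link value is arbitrary. */
#define TW_OFF_LINK_SECS 60
#define TW_ON_LINK_SECS   2

/* Hypothetical route summary: does the path to the peer use a router? */
struct route_info {
    bool route_has_gateway;
};

/* Two-level policy: short TIME_WAIT with no router hops, long otherwise. */
static unsigned int time_wait_secs(const struct route_info *rt)
{
    return rt->route_has_gateway ? TW_OFF_LINK_SECS : TW_ON_LINK_SECS;
}
```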


With transparent bridging, nobody knows how long the datagram may be out 
there.  Admittedly, the chances of a datagram living for a full two 
minutes these days is probably nil, but just being in the same IP subnet 
doesn't really mean anything when it comes to physical locality.


Bridging isn't necessarily a problem though. The 2MSL timeout is 
designed to prevent problems from delayed packets that got sent through 
multiple paths. In a bridging setup you don't allow multiple paths, 
that's what STP is designed to prevent. If you want to configure a 
network that allows multiple paths, you need to use a router, not a bridge.


SPECweb benchmarking has had to deal with the issue of attempted 
TIME_WAIT reuse going back to 1997.  It deals with it by not relying on 
the client's configured local/anonymous/ephemeral port number range and 
instead making explicit bind() calls in the (more or less) entire unpriv 
port range (actually it may just be from 5000 to 65535 but still)


That still doesn't solve the problem, it only ~doubles the available 
port range. That means it takes 0.6 seconds to trigger the problem 
instead of only 0.3 seconds...


Now, if it weren't necessary to fully randomize the ISNs, the chances of 
a successful transition from TIME_WAIT to ESTABLISHED might be greater, 
but going back to the good old days of more or less purely clock driven 
ISN's isn't likely.


In an environment where connections are opened and closed very quickly 
with only a small amount of data carried per connection, it might make 
sense to remember the last sequence number used on a port and use that 
as the floor of the next randomly generated ISN. Monotonically 
increasing sequence numbers aren't a security risk if there's still a 
randomly determined gap from one connection to the next. But I don't 
think it's necessary to consider this at the moment.
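Howard's suggestion can be sketched as follows, under some stated assumptions. `next_isn_with_floor` and the per-port `last_isn` bookkeeping are hypothetical. rand() is a placeholder only, since real ISN generation needs a cryptographically strong source. Each new ISN is the previous one plus a random, nonzero gap bounded well below 2^31. Under the modulo-2^32 comparison used by the kernel's before()/after() helpers, the new ISN therefore always lands after the old one, even across wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Modulo-2^32 sequence comparison: true if a comes after b. */
static bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Hypothetical: next ISN for a port whose previous connection used
 * last_isn. The gap is random (placeholder PRNG) and lies in 1 .. 2^24,
 * far below 2^31, so the result always compares as after last_isn. */
static uint32_t next_isn_with_floor(uint32_t last_isn)
{
    uint32_t gap = ((uint32_t)rand() & 0xFFFFFFu) + 1;

    return last_isn + gap;    /* unsigned addition wraps modulo 2^32 */
}
```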

--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun  http://highlandsun.com/hyc
  Chief Architect, OpenLDAP http://www.openldap.org/project/

Re: when having to acquire an SA, ipsec drops the packet

2007-03-06 Thread James Morris

On Tue, 6 Mar 2007, Joy Latten wrote:

  I saw something similar to this some time ago when testing various 
  failure modes, and discussed it with Herbert.
  
  IIRC, there's a larval SA which is not torn down properly by Racoon once 
  the full SA is established, and the larval SA keeps resending until it 
  times out.
  
 Ok, good to know. 
 I thought a bit more about this last night but am not
 sure best way to fix it. Perhaps a way to keep larval
 SA around until all SAs resulting from xfrm_vec[xfrm_nr]
 are established... oh well, just thinking out loud... :-) 

I think the solution, if this is actually the problem, is for the userland 
code to maintain the SAs.


- James
-- 
James Morris
[EMAIL PROTECTED]

Re: [RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions

2007-03-06 Thread Baruch Even

* Ilpo Järvinen [EMAIL PROTECTED] [070306 14:52]:
 Complete rewrite for update_scoreboard and mark_head_lost. Couple
 of hints became unnecessary because of this change. Changes
 !TCPCB_TAGBITS check from the original to !(S|L) but it shouldn't
 make a difference, and if there ever is an R only skb TCP will
 mark it as LOST too. The algorithm uses some ideas presented by
 David Miller and Baruch Even.
 
 Seqno lookups require fast lookups that are provided using
 RB-tree patch(+abstraction) from DaveM.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
 ---
 
 I'm sorry about the poorly chunked diff. Is it possible to force git to 
 produce better (large block) diffs when a complete function is rewritten 
 from scratch in the patch? (The manpage of git-diff-files hints at -B, but 
 it did not work; perhaps it affects whole-file rewrites only.)
 
 This probably conflicts with the other patches in the rbtree patchset of 
 DaveM (the first two are required) because I tested this one (at least the 
 non-timedout part worked) and didn't want some random breakage 
 from the other patches (as such was reported).
 
  include/linux/tcp.h  |6 -
  include/net/tcp.h|6 +
  net/ipv4/tcp_input.c |  194 +-
  net/ipv4/tcp_minisocks.c |1 
  4 files changed, 130 insertions(+), 77 deletions(-)
 

[snip]

 + newtp->highest_sack = treq->snt_isn + 1;

That's the only initialization that you have for highest_sack. I think
that you should also initialize it, when a loss is detected, to the
start_seq of the first packet that wasn't acked.

Didn't review the rest, still need to arrange a proper tree with
preliminary patches to apply it on. Could you note the kernel you based
it on and include all patches applied before it?

Baruch

Re: [patch 2/2] div64_64: common code

2007-03-06 Thread Ralf Baechle

On Tue, Mar 06, 2007 at 02:42:28AM -0800, [EMAIL PROTECTED] wrote:

 Implement div64_64(): 64-bit by 64-bit division.  Needed by networking (at
 least).

Your patch only implements div64_64() for 32-bit MIPS.  Below patch adds
the trivial 64-bit bits.

  Ralf

Signed-off-by: Ralf Baechle [EMAIL PROTECTED]

 include/asm-mips/div64.h |    7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Index: linux-mips/include/asm-mips/div64.h
===
--- linux-mips.orig/include/asm-mips/div64.h
+++ linux-mips/include/asm-mips/div64.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) 2000, 2004  Maciej W. Rozycki
- * Copyright (C) 2003 Ralf Baechle
+ * Copyright (C) 2003, 07 Ralf Baechle ([EMAIL PROTECTED])
  *
  * This file is subject to the terms and conditions of the GNU General Public
  * License.  See the file COPYING in the main directory of this archive
@@ -105,6 +105,11 @@ extern uint64_t div64_64(uint64_t divide
(n) = __quot; \
__mod; })
 
+static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
+{
+   return dividend / divisor;
+}
+
 #endif /* (_MIPS_SZLONG == 64) */
 
 #endif /* _ASM_DIV64_H */
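On 32-bit targets, a plain `/` on two uint64_t values is typically compiled into a call to a libgcc helper that the kernel does not always provide. That is why 32-bit MIPS needs a dedicated div64_64(), while the 64-bit case above can simply divide. As an illustration of 64-by-64-bit division without a native 64-bit divide, here is a restoring shift-and-subtract loop. It is not the implementation from the patch series, and the name `div64_64_bitwise` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: unsigned 64-by-64-bit division by
 * shift-and-subtract, producing one quotient bit per iteration, MSB first. */
static uint64_t div64_64_bitwise(uint64_t dividend, uint64_t divisor)
{
    uint64_t quotient = 0, remainder = 0;

    assert(divisor != 0);
    for (int bit = 63; bit >= 0; bit--) {
        /* Bring the next dividend bit down into the partial remainder. */
        remainder = (remainder << 1) | ((dividend >> bit) & 1);
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= (uint64_t)1 << bit;
        }
    }
    return quotient;
}
```

The loop takes 64 iterations regardless of the operands, which is slow but uses only shifts, comparisons and subtraction.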

Re: TCP 2MSL on loopback

2007-03-06 Thread Eric Dumazet


Howard Chu wrote:

Eric Dumazet wrote:

On Tuesday 06 March 2007 10:22, Howard Chu wrote:


It's a combination of 2MSL and /proc/sys/net/ipv4/ip_local_port_range -
on my system the default port range is 32768-61000. That means if I use
up 28232 ports in less than 2MSL then everything stops. netstat will
show that all the available port numbers are in TIME_WAIT state. And
this is particularly bad because while waiting for the timeout, I can't
initiate any new outbound connections of any kind at all - telnet, ssh,
whatever, you have to wait for at least one port to free up.
(Interesting denial of service there)

Granted, I was running my test on 2.6.18, perhaps 2.6.21 behaves
differently.


Could you try the attached program and tell me what happens?

$ gcc -O2 -o socktest socktest.c -lpthread
$ time ./socktest -n 10
nb_conn=9 nb_accp=9

real    0m5.058s
user    0m0.212s
sys     0m4.844s

(on my small machine, dell d610 :) )


On my Asus laptop (2GHz Pentium M) the first time I ran it it completed 
in about 51 seconds, with no errors. I then copied it to another machine 
and started it up there, and got connect errors right away. I then went 
back to my laptop and ran it again, and got errors that time.


This is the laptop run with errors:
viola:~/src> uname -a
Linux viola 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 i686 i686 i386 GNU/Linux

viola:~/src> time ./socktest -n 100
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
nb_conn=993757 nb_accp=993757
1.408u 88.649s 1:42.76 87.6%    0+0k 0+0io 0pf+0w

This is my other system, an AMD X2 3800+ (dual core)
mandolin:~/src> uname -a
Linux mandolin 2.6.18.3SMP #9 SMP Sat Nov 25 10:08:51 PST 2006 x86_64 x86_64 x86_64 GNU/Linux

mandolin:~/src> gcc -O2 -o socktest socktest.c -lpthread
mandolin:~/src> time ./socktest -n 100
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
connect error 99
nb_conn=957088 nb_accp=957088
1.012u 630.991s 5:18.05 198.7%  0+0k 0+0io 0pf+0w


Let me see, any chance you can try the prog on 2.6.20 ?

If not, please send :

grep . /proc/sys/net/ipv4/*

Thank you
