date:20071207

Re: [PATCH 2.6.25] net: move trie_local and trie_main into the proc iterator

2007-12-07 Thread David Miller

From: Denis V. Lunev [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 18:00:12 +0300

 From: Eric W. Biederman [EMAIL PROTECTED]

 We only use these variables when displaying the trie in proc so
 place them into the iterator to make this explicit.  We should
 probably do something smarter to handle the CONFIG_IP_MULTIPLE_TABLES
 case but at least this makes it clear that the silliness is limited
 to the display in /proc.

 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

Applied.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 3/6] ipv6 - make fib6_rules_init to return an error code

2007-12-07 Thread David Miller

From: Daniel Lezcano [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 14:53:32 +0100

 When the fib_rules initialization finishes, no return code is provided,
 so the caller has no way to know whether the initialization succeeded
 or failed. This patch fixes that.

 Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
 Acked-by: Benjamin Thery [EMAIL PROTECTED]

Applied, thanks.

Re: sockets affected by IPsec always block (2.6.23)

2007-12-07 Thread Stefan Rompf

On Friday, 7 December 2007 04:20, David Miller wrote:

 If IPSEC takes a long time to resolve, and we don't block, the
 connect() can hard fail (we will just keep dropping the outgoing SYN
 packet send attempts, eventually hitting the retry limit) in cases
 where if we did block it would not fail (because we wouldn't send
 the first SYN until IPSEC resolved).

David - I'm aware of this; the discussion is about which behaviour is OK.
Let's go back to a real-life example. I've already established that the squid
web proxy has a poll()-based main loop doing non-blocking connects, possibly
with multiple threads.

Situation: One user wants to access a web page that needs IPSEC. The SA takes 
30 seconds to come up.

a) Non-blocking connect is respected: SYN packets during the first 30 seconds 
will be dropped as you said. Connection can be completed on the next SYN 
retry (timeout in linux: 3 minutes). During this time, the 500 other users 
can continue to browse using the proxy.

b) Non-blocking connect is ignored during IPSEC resolving as you advocate it:
Connection for the one user can be completed immediately after IPSEC comes up.
That's the pro. However, until then, the other 500 proxy users CANNOT ACCESS
THE WEB because squid's threads are stuck in connect()s on sockets they
configured not to block. If the IPSEC SA never resolves due to some network
outage, squid will sleep forever, or until an admin configures it not to
connect to the address in question and restarts it.

Don't you realize how broken this behaviour is? Can you give me ONE example of
an application that works better with b) and why this outweighs the problems
it creates for everybody else?

Even the DNS example you posted in
[EMAIL PROTECTED] is wrong because the second
server will never be queried if the kernel puts the process into a coma while
the IPSEC SA to the first server cannot be resolved.
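
To make scenario a) concrete, here is a minimal userspace sketch in C of the pattern a poll()-based main loop relies on. It is an illustration only, not squid's actual code; the helper names set_nonblocking(), start_connect() and finish_connect() are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Start a connect that is expected not to block.  Returns 0 if the
 * connection completed at once, 1 if it is in progress (the caller
 * should poll for POLLOUT), or -1 on immediate failure with errno set.
 */
int start_connect(int fd, const struct sockaddr *sa, socklen_t len)
{
	if (set_nonblocking(fd) < 0)
		return -1;
	if (connect(fd, sa, len) == 0)
		return 0;
	return errno == EINPROGRESS ? 1 : -1;
}

/*
 * Called from the main loop once poll() reports POLLOUT on fd.
 * Returns 0 if no error is pending, otherwise the pending socket
 * error (for example ETIMEDOUT once the SYN retries run out).
 */
int finish_connect(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return errno;
	assert(len == sizeof(err));
	return err;
}
```

The main loop would add the descriptor to its poll() set with POLLOUT after start_connect() returns 1 and call finish_connect() once it becomes writable. The argument above is that connect() itself is expected to return immediately with EINPROGRESS, so the loop stays free to serve other clients.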

Stefan

Re: [RFC] TCP illinois max rtt aging

2007-12-07 Thread Ilpo Järvinen

On Thu, 6 Dec 2007, Lachlan Andrew wrote:
 On 04/12/2007, Ilpo Järvinen [EMAIL PROTECTED] wrote:
  On Mon, 3 Dec 2007, Lachlan Andrew wrote:
  
   When SACK is active, the per-packet processing becomes more involved,
   tracking the list of lost/SACKed packets.  This causes a CPU spike
   just after a loss, which increases the RTTs, at least in my
   experience.
 
  I suspect that as long as old code was able to use hint, it wasn't doing
  that bad. But it was seriously lacking ability to take advantage of sack
  processing hint when e.g., a new hole appeared, or cumulative ACK arrived.
 
  ...Code available in net-2.6.25 might cure those.
 
 We had been using one of your earlier patches, and still had the
 problem.  I think you've cured the problem with SACK itself, but there
 still seems to be something taking a lot of CPU while recovering from
 the loss. 

I guess if you get a large cumulative ACK, the amount of processing is 
still overwhelming (added DaveM if he has some idea how to combat it).

Even a simple scenario (this isn't anything fancy at all, will occur all 
the time): Just one loss => rest skbs grow one by one into a single
very large SACK block (and we do that efficiently for sure) => then the
fast retransmit gets delivered and a cumulative ACK for whole orig_window
arrives => clean_rtx_queue has to do a lot of processing. In this case we
could optimize RB-tree cleanup away (by just blanking it all) but still 
getting rid of all those skbs is going to take a larger moment than I'd 
like to see.

That tree blanking could be extended to cover anything which ACK more than 
half of the tree by just replacing the root (and dealing with potential 
recolorization of the root).

 It is possible that it was to do with  web100  which we
 have also been running, but I cut out most of the statistics from that
 and still had problems.

No idea what it could be doing; I haven't looked at web100 yet, though I was
planning to at some point...

-- 
 i.

[PATCH] AF_RXRPC: Add a missing goto

2007-12-07 Thread David Howells

Add a missing goto to error handling in the RXKAD security module for AF_RXRPC.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 net/rxrpc/rxkad.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index e09a95a..8e69d69 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -1021,6 +1021,7 @@ static int rxkad_verify_response(struct rxrpc_connection *conn,
 
 	abort_code = RXKADINCONSISTENCY;
 	if (version != RXKAD_VERSION)
+		goto protocol_error;
 
 	abort_code = RXKADTICKETLEN;
 	if (ticket_len < 4 || ticket_len > MAXKRB5TICKETLEN)


Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

2007-12-07 Thread Denis V. Lunev

Andrew Morton wrote:
 On Fri, 07 Dec 2007 04:51:37 + David Woodhouse [EMAIL PROTECTED] wrote:
 
 On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
 Well I clearly goofed when I added the initial network namespace support
 for /proc/net.  Currently things work but there are odd details visible
 to user space, even when we have a single network namespace.

 Since we do not cache proc_dir_entry dentries at the moment we can
 just modify ->lookup to return a different directory inode depending
 on the network namespace of the process looking at /proc/net, replacing
 the current technique of using a magic and fragile follow_link method.

 To accomplish that this patch:
 - introduces a shadow_proc method to allow different dentries to
   be returned from proc_lookup.
 - Removes the old /proc/net follow_link magic
 - Fixes a weakness in our not caching of proc generic dentries.

 As shadow_proc uses a task struct to decide which dentry to return we
 can go back later and fix the proc generic caching without modifying any
 code that uses the shadow_proc method.

 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
 ---
  fs/proc/generic.c       |   12 ++-
  fs/proc/proc_net.c      |   86 +++
  include/linux/proc_fs.h |    3 ++
  3 files changed, 19 insertions(+), 82 deletions(-)
 (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)

 This seems to have broken the use of /proc/bus/usb as a mountpoint. It
 always appears empty now, whatever's supposed to be mounted there.

 
 Yes.  Denis and Eric are tossing around competing patches but afaik nobody
 is happy with any of them.  Guys, could we get this sorted soonish please?
 

Andrew, I became too relaxed after receiving
Tested-by: Giacomo Catenazzi [EMAIL PROTECTED]

Eric, I believe that reverting to the original behavior is better than your
new one, because
- you introduce a depth search by calling have_submounts(dentry)
during revalidation for all(!) /proc dentries
- your shadowing behavior will break if something is mounted deep inside
the shadowed tree (this can be done as a DoS attempt)

As a last-minute suggestion, maybe it would be better to pin the network
namespace at mount time, like the pid namespace, to avoid this crap altogether?

Regards,
Den

[patch 1/3][IPV6]: create route6 proc init-fini functions

2007-12-07 Thread Daniel Lezcano

Move the proc creation/destruction into separate functions. That allows us to
remove the #ifdef CONFIG_PROC_FS from the init/fini functions and makes them
more readable.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 net/ipv6/route.c |   58 +--
 1 file changed, 40 insertions(+), 18 deletions(-)

Index: net-2.6.25/net/ipv6/route.c
===
--- net-2.6.25.orig/net/ipv6/route.c
+++ net-2.6.25/net/ipv6/route.c
@@ -2353,6 +2353,40 @@ static const struct file_operations rt6_
 	.llseek	 = seq_lseek,
 	.release = single_release,
 };
+
+static int ipv6_route_proc_init(struct net *net)
+{
+	int ret = -ENOMEM;
+	if (!proc_net_fops_create(net, "ipv6_route",
+				  0, &ipv6_route_proc_fops))
+		goto out;
+
+	if (!proc_net_fops_create(net, "rt6_stats",
+				  S_IRUGO, &rt6_stats_seq_fops))
+		goto out_ipv6_route;
+
+	ret = 0;
+out:
+	return ret;
+out_ipv6_route:
+	proc_net_remove(net, "ipv6_route");
+	goto out;
+}
+
+static void ipv6_route_proc_fini(struct net *net)
+{
+	proc_net_remove(net, "ipv6_route");
+	proc_net_remove(net, "rt6_stats");
+}
+#else
+static inline int ipv6_route_proc_init(struct net *net)
+{
+	return 0;
+}
+static inline void ipv6_route_proc_fini(struct net *net)
+{
+	return ;
+}
 #endif /* CONFIG_PROC_FS */
 
 #ifdef CONFIG_SYSCTL
@@ -2479,21 +2513,14 @@ int __init ip6_route_init(void)
 	if (ret)
 		goto out_kmem_cache;
 
-#ifdef CONFIG_PROC_FS
-	ret = -ENOMEM;
-	if (!proc_net_fops_create(&init_net, "ipv6_route",
-				  0, &ipv6_route_proc_fops))
+	ret = ipv6_route_proc_init(&init_net);
+	if (ret)
 		goto out_fib6_init;
 
-	if (!proc_net_fops_create(&init_net, "rt6_stats",
-				  S_IRUGO, &rt6_stats_seq_fops))
-		goto out_proc_ipv6_route;
-#endif
-
 #ifdef CONFIG_XFRM
 	ret = xfrm6_init();
 	if (ret)
-		goto out_proc_rt6_stats;
+		goto out_proc_init;
 #endif
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 	ret = fib6_rules_init();
@@ -2517,14 +2544,10 @@ xfrm6_init:
 #endif
 #ifdef CONFIG_XFRM
 	xfrm6_fini();
-out_proc_rt6_stats:
 #endif
-#ifdef CONFIG_PROC_FS
-	proc_net_remove(&init_net, "rt6_stats");
-out_proc_ipv6_route:
-	proc_net_remove(&init_net, "ipv6_route");
+out_proc_init:
+	ipv6_route_proc_fini(&init_net);
 out_fib6_init:
-#endif
 	rt6_ifdown(NULL);
 	fib6_gc_cleanup();
 out_kmem_cache:
@@ -2537,8 +2560,7 @@ void ip6_route_cleanup(void)
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 	fib6_rules_cleanup();
 #endif
-	proc_net_remove(&init_net, "ipv6_route");
-	proc_net_remove(&init_net, "rt6_stats");
+	ipv6_route_proc_fini(&init_net);
 #ifdef CONFIG_XFRM
 	xfrm6_fini();
 #endif

-- 

[patch 0/3][IPV6]: remove ifdef in route6 init/fini functions

2007-12-07 Thread Daniel Lezcano

The route6 init function is a little difficult to read because it contains
a lot of ifdefs. The patchset defines the usual static inline stubs for when
the code is disabled by configuration, so we can call the code without
taking care of the config option in the init function.
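
The stub pattern described above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the kernel code: DEMO_CONFIG_PROC stands in for a Kconfig option such as CONFIG_PROC_FS, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical switch standing in for a Kconfig option. */
#define DEMO_CONFIG_PROC

static int demo_proc_registered;

#ifdef DEMO_CONFIG_PROC
int demo_proc_init(void)
{
	demo_proc_registered = 1;
	return 0;
}

void demo_proc_fini(void)
{
	/* fini is only expected to run after a successful init */
	assert(demo_proc_registered);
	demo_proc_registered = 0;
}
#else
/* Feature compiled out: the stubs keep the callers free of #ifdef. */
static inline int demo_proc_init(void)
{
	return 0;
}
static inline void demo_proc_fini(void)
{
}
#endif

/* Stand-in for a later init step that may fail. */
int demo_core_init(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* The init function reads the same with or without the option. */
int demo_route_init(int core_should_fail)
{
	int ret;

	ret = demo_proc_init();
	if (ret)
		goto out;

	ret = demo_core_init(core_should_fail);
	if (ret)
		goto out_proc;

	return 0;

out_proc:
	demo_proc_fini();
out:
	return ret;
}
```

With the option disabled the stubs compile to nothing, so the unwind labels in demo_route_init() need no conditional compilation of their own.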

-- 

[PATCH net-2.6.25 2/3]sysctl: prepare core tables to point to netns variables

2007-12-07 Thread Pavel Emelyanov

Some of the ctl variables are going to live in struct net.
Here's a way to adjust the ->data pointer of each ctl_table
entry to point to the right variable.

Since some pointers still point to global variables,
I keep turning the write bits off for such entries.

This looks likely to become a common procedure for net
sysctls, so parts of this code may later migrate to some
more generic place.
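
The pointer adjustment can be illustrated outside the kernel. The following is a minimal sketch, assuming a template table whose entries point either into a default instance or at a true global; struct demo_net, struct demo_ctl and demo_dup_table() are hypothetical names, and the range check uses uintptr_t instead of raw pointer comparison to stay within portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct net and struct ctl_table. */
struct demo_net { int somaxconn; int other; };
struct demo_ctl { const char *procname; void *data; unsigned int mode; };

struct demo_net demo_init_net = { .somaxconn = 128 };
int demo_global = 7;

const struct demo_ctl demo_template[] = {
	{ "somaxconn", &demo_init_net.somaxconn, 0644 },
	{ "global",    &demo_global,             0644 },
	{ NULL, NULL, 0 }
};

/*
 * Duplicate the template for one namespace.  Entries that point inside
 * demo_init_net are rebased by the same offset into *net; all others
 * still refer to shared globals, so their write bits are cleared.
 */
struct demo_ctl *demo_dup_table(struct demo_net *net)
{
	uintptr_t lo = (uintptr_t)&demo_init_net;
	uintptr_t hi = lo + sizeof(demo_init_net);
	struct demo_ctl *tbl, *tmp;

	assert(net != NULL);
	tbl = malloc(sizeof(demo_template));
	if (tbl == NULL)
		return NULL;
	memcpy(tbl, demo_template, sizeof(demo_template));

	for (tmp = tbl; tmp->procname; tmp++) {
		uintptr_t p = (uintptr_t)tmp->data;

		if (p >= lo && p < hi)
			tmp->data = (char *)net + (p - lo);
		else
			tmp->mode &= ~0222u;
	}
	return tbl;
}
```

The caller owns the returned table and frees it when the namespace goes away.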

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 57a7ead..dc4cf7d 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -167,8 +167,13 @@ static __net_init int sysctl_core_net_init(struct net *net)
 		if (tbl == NULL)
 			goto err_dup;
 
-		for (tmp = tbl; tmp->procname; tmp++)
-			tmp->mode &= ~0222;
+		for (tmp = tbl; tmp->procname; tmp++) {
+			if (tmp->data >= (void *)&init_net &&
+					tmp->data < (void *)(&init_net + 1))
+				tmp->data += (char *)net - (char *)&init_net;
+			else
+				tmp->mode &= ~0222;
+		}
 	}
 
 	net->sysctl_core_hdr = register_net_sysctl_table(net,
-- 
1.5.3.4


Re: [RFC] TCP illinois max rtt aging

2007-12-07 Thread Ilpo Järvinen

On Fri, 7 Dec 2007, David Miller wrote:

 From: Ilpo_Järvinen [EMAIL PROTECTED]
 Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)

  I guess if you get a large cumulative ACK, the amount of processing is 
  still overwhelming (added DaveM if he has some idea how to combat it).

  Even a simple scenario (this isn't anything fancy at all, will occur all 
  the time): Just one loss => rest skbs grow one by one into a single
  very large SACK block (and we do that efficiently for sure) => then the
  fast retransmit gets delivered and a cumulative ACK for whole orig_window
  arrives => clean_rtx_queue has to do a lot of processing. In this case we
  could optimize RB-tree cleanup away (by just blanking it all) but still 
  getting rid of all those skbs is going to take a larger moment than I'd 
  like to see.

  That tree blanking could be extended to cover anything which ACK more than 
  half of the tree by just replacing the root (and dealing with potential 
  recolorization of the root).

 Yes, it's the classic problem.  But it ought to be at least
 partially masked when TSO is in use, because we'll only process
 a handful of SKBs.  The more effectively TSO batches, the
 less work clean_rtx_queue() will do.

No, that's not what is going to happen, TSO won't help at all
because one-by-one SACKs will fragment every single one of them
(see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO
case, or am I missing something?

 Web100 just provides statistics and other kinds of connection data
 to userspace, all the actual algorithm etc. modifications have been
 merged upstream and yanked out of the web100 patch.  I was looking
 at it the other night and it's frankly totally uninteresting these
 days :-)

...Thanks, I'll keep that in my mind when looking... :-)

-- 
 i.

Re: [patch 0/3][IPV6]: remove ifdef in route6 init/fini functions

2007-12-07 Thread YOSHIFUJI Hideaki / 吉藤英明

In article [EMAIL PROTECTED] (at Fri, 07 Dec 2007 14:13:25 +0100), Daniel 
Lezcano [EMAIL PROTECTED] says:

 The route6 init function is a little difficult to read because it contains
 a lot of ifdefs. The patchset defines the usual static inline stubs for when
 the code is disabled by configuration, so we can call the code without
 taking care of the config option in the init function.

Acked-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

--yoshfuji

IPsec replay sequence number overflow behavior? (RFC4303 section 3.3.3)

2007-12-07 Thread Paul Moore

Hello all,

As part of the IPv6 gap analysis that the Linux Foundation is currently 
doing I've been looking at the IPsec auditing requirements as defined in 
RFC4303 and I came across some odd behavior regarding SA sequence number 
overflows ...

RFC4303 states the following:

   3.3.3.  Sequence Number Generation

   The sender's counter is initialized to 0 when an SA is established.
   The sender increments the sequence number (or ESN) counter for this
   SA and inserts the low-order 32 bits of the value into the Sequence
   Number field.  Thus, the first packet sent using a given SA will
   contain a sequence number of 1.

   If anti-replay is enabled (the default), the sender checks to ensure
   that the counter has not cycled before inserting the new value in the
   Sequence Number field.  In other words, the sender MUST NOT send a
   packet on an SA if doing so would cause the sequence number to cycle.
   An attempt to transmit a packet that would result in sequence number
   overflow is an auditable event.  The audit log entry for this event
   SHOULD include the SPI value, current date/time, Source Address,
   Destination Address, and (in IPv6) the cleartext Flow ID.

The related code in net/xfrm/xfrm_output.c:xfrm_output() looks like this:

   if (x->type->flags & XFRM_TYPE_REPLAY_PROT) {
           XFRM_SKB_CB(skb)->seq = ++x->replay.oseq;
           if (xfrm_aevent_is_on())
                   xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
   }

Which doesn't appear to take into account sequence number overflow at all.  
Granted, it does send notifications to userspace but it doesn't do anything 
to prevent the packet from being sent if the sequence number wraps.  I'm 
still a few years behind in my IPsec specifications so I could be missing 
something here (extended sequence numbers spring to mind and the kernel's 
curious mixing of 32bit and 64bit types for SA sequence number counters) but 
at first glance this appears to be a bug ... yes/no?

If it is a bug, I think the basic fix should be pretty simple, changing the 
above xfrm_output() code to the following:

   if (x->type->flags & XFRM_TYPE_REPLAY_PROT) {
           XFRM_SKB_CB(skb)->seq = ++x->replay.oseq;
+          if (x->replay.oseq == 0)
+                  goto error;
           if (xfrm_aevent_is_on())
                   xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
   }

-- 
paul moore
linux security @ hp

[PATCH 2.6.25 1/3] ipv4: no need pass pointer to a default into fib_detect_death

2007-12-07 Thread Denis V. Lunev

ipv4: no need pass pointer to a default into fib_detect_death

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Acked-by: Alexey Kuznetsov [EMAIL PROTECTED]
---
 net/ipv4/fib_hash.c  |4 ++--
 net/ipv4/fib_lookup.h|2 +-
 net/ipv4/fib_semantics.c |6 +++---
 net/ipv4/fib_trie.c  |4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index 30ff657..76bb7fd 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -314,7 +314,7 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 				if (next_fi != res->fi)
 					break;
 			} else if (!fib_detect_death(fi, order, &last_resort,
-						     &last_idx, &fn_hash_last_dflt)) {
+						     &last_idx, fn_hash_last_dflt)) {
 				if (res->fi)
 					fib_info_put(res->fi);
 				res->fi = fi;
@@ -332,7 +332,7 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 		goto out;
 	}
 
-	if (!fib_detect_death(fi, order, &last_resort, &last_idx, &fn_hash_last_dflt)) {
+	if (!fib_detect_death(fi, order, &last_resort, &last_idx, fn_hash_last_dflt)) {
 		if (res->fi)
 			fib_info_put(res->fi);
 		res->fi = fi;
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index eef9eec..6c9dd42 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -36,6 +36,6 @@ extern struct fib_alias *fib_find_alias(struct list_head *fah,
u8 tos, u32 prio);
 extern int fib_detect_death(struct fib_info *fi, int order,
struct fib_info **last_resort,
-   int *last_idx, int *dflt);
+   int *last_idx, int dflt);
 
 #endif /* _FIB_LOOKUP_H */
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index ec9b0dd..bbd4a24 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -346,7 +346,7 @@ struct fib_alias *fib_find_alias(struct list_head *fah, u8 tos, u32 prio)
 }
 
 int fib_detect_death(struct fib_info *fi, int order,
-		     struct fib_info **last_resort, int *last_idx, int *dflt)
+		     struct fib_info **last_resort, int *last_idx, int dflt)
 {
 	struct neighbour *n;
 	int state = NUD_NONE;
@@ -358,10 +358,10 @@ int fib_detect_death(struct fib_info *fi, int order,
 	}
 	if (state==NUD_REACHABLE)
 		return 0;
-	if ((state&NUD_VALID) && order != *dflt)
+	if ((state&NUD_VALID) && order != dflt)
 		return 0;
 	if ((state&NUD_VALID) ||
-	    (*last_idx<0 && order > *dflt)) {
+	    (*last_idx<0 && order > dflt)) {
 		*last_resort = fi;
 		*last_idx = order;
 	}
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 6385cca..914a0d2 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1827,7 +1827,7 @@ fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 				if (next_fi != res->fi)
 					break;
 			} else if (!fib_detect_death(fi, order, &last_resort,
-						     &last_idx, &trie_last_dflt)) {
+						     &last_idx, trie_last_dflt)) {
 				if (res->fi)
 					fib_info_put(res->fi);
 				res->fi = fi;
@@ -1843,7 +1843,7 @@ fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 		goto out;
 	}
 
-	if (!fib_detect_death(fi, order, &last_resort, &last_idx, &trie_last_dflt)) {
+	if (!fib_detect_death(fi, order, &last_resort, &last_idx, trie_last_dflt)) {
 		if (res->fi)
 			fib_info_put(res->fi);
 		res->fi = fi;
-- 
1.5.3.rc5


[PATCH 2.6.25 2/3] ipv4: unify assignment of fi to fib_result

2007-12-07 Thread Denis V. Lunev

ipv4: unify assignment of fi to fib_result

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Acked-by: Alexey Kuznetsov [EMAIL PROTECTED]
---
 net/ipv4/fib_hash.c   |   19 ---
 net/ipv4/fib_lookup.h |   10 ++
 net/ipv4/fib_trie.c   |   19 ---
 3 files changed, 18 insertions(+), 30 deletions(-)

diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index 76bb7fd..a52b570 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -315,10 +315,7 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 					break;
 			} else if (!fib_detect_death(fi, order, &last_resort,
 						     &last_idx, fn_hash_last_dflt)) {
-				if (res->fi)
-					fib_info_put(res->fi);
-				res->fi = fi;
-				atomic_inc(&fi->fib_clntref);
+				fib_result_assign(res, fi);
 				fn_hash_last_dflt = order;
 				goto out;
 			}
@@ -333,21 +330,13 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 	}
 
 	if (!fib_detect_death(fi, order, &last_resort, &last_idx, fn_hash_last_dflt)) {
-		if (res->fi)
-			fib_info_put(res->fi);
-		res->fi = fi;
-		atomic_inc(&fi->fib_clntref);
+		fib_result_assign(res, fi);
 		fn_hash_last_dflt = order;
 		goto out;
 	}
 
-	if (last_idx >= 0) {
-		if (res->fi)
-			fib_info_put(res->fi);
-		res->fi = last_resort;
-		if (last_resort)
-			atomic_inc(&last_resort->fib_clntref);
-	}
+	if (last_idx >= 0)
+		fib_result_assign(res, last_resort);
 	fn_hash_last_dflt = last_idx;
 out:
 	read_unlock(&fib_hash_lock);
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index 6c9dd42..26ee66d 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -38,4 +38,14 @@ extern int fib_detect_death(struct fib_info *fi, int order,
 				struct fib_info **last_resort,
 				int *last_idx, int dflt);
 
+static inline void fib_result_assign(struct fib_result *res,
+				     struct fib_info *fi)
+{
+	if (res->fi != NULL)
+		fib_info_put(res->fi);
+	res->fi = fi;
+	if (fi != NULL)
+		atomic_inc(&fi->fib_clntref);
+}
+
 #endif /* _FIB_LOOKUP_H */
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 914a0d2..29a06af 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1828,10 +1828,7 @@ fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 					break;
 			} else if (!fib_detect_death(fi, order, &last_resort,
 						     &last_idx, trie_last_dflt)) {
-				if (res->fi)
-					fib_info_put(res->fi);
-				res->fi = fi;
-				atomic_inc(&fi->fib_clntref);
+				fib_result_assign(res, fi);
 				trie_last_dflt = order;
 				goto out;
 			}
@@ -1844,20 +1841,12 @@ fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 	}
 
 	if (!fib_detect_death(fi, order, &last_resort, &last_idx, trie_last_dflt)) {
-		if (res->fi)
-			fib_info_put(res->fi);
-		res->fi = fi;
-		atomic_inc(&fi->fib_clntref);
+		fib_result_assign(res, fi);
 		trie_last_dflt = order;
 		goto out;
 	}
-	if (last_idx >= 0) {
-		if (res->fi)
-			fib_info_put(res->fi);
-		res->fi = last_resort;
-		if (last_resort)
-			atomic_inc(&last_resort->fib_clntref);
-	}
+	if (last_idx >= 0)
+		fib_result_assign(res, last_resort);
 	trie_last_dflt = last_idx;
  out:;
 	rcu_read_unlock();
-- 
1.5.3.rc5


Re: [patch 07/22] NET: DM9000: Use msleep() instead of udelay()

2007-12-07 Thread Ben Dooks

On Fri, Nov 23, 2007 at 08:39:45PM -0500, Jeff Garzik wrote:
 are you sure you cannot sleep during suspend?

Yes. This is not the first driver that has had this problem,
see the sm501 as another example.

-- 
Ben ([EMAIL PROTECTED], http://www.fluff.org/)

  'a smiley only costs 4 bytes'

Re: [PATCH] AF_RXRPC: Add a missing goto

2007-12-07 Thread David Miller

From: David Howells [EMAIL PROTECTED]
Date: Fri, 07 Dec 2007 11:23:55 +

 Add a missing goto to error handling in the RXKAD security module for 
 AF_RXRPC.
 
 Signed-off-by: David Howells [EMAIL PROTECTED]

Applied, thanks David.

Re: [patch 6/6] ipv6 - route6/fib6 : dont panic a kmem_cache_create

2007-12-07 Thread David Miller

From: Daniel Lezcano [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 14:53:35 +0100

 If kmem_cache_create fails, the kernel will panic. That is acceptable
 while the system is booting, but if the ipv6 protocol is compiled as a module
 and loaded after the system has booted, do we want to panic instead
 of just failing to initialize the protocol?

 The init function now returns an error, which is checked during
 protocol initialization, so the ipv6 protocol will fail safely.

 Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
 Acked-by: Benjamin Thery [EMAIL PROTECTED]

Also applied.

Re: 2.6.24-rc4-mm1

2007-12-07 Thread Ilpo Järvinen

On Wed, 5 Dec 2007, David Miller wrote:

 From: Reuben Farrelly [EMAIL PROTECTED]
 Date: Thu, 06 Dec 2007 17:59:37 +1100

  On 5/12/2007 4:17 PM, Andrew Morton wrote:
   - Lots of device IDs have been removed from the e1000 driver and moved
     over to e1000e.  So if your e1000 stops working, you forgot to set
     CONFIG_E1000E.

  This non fatal oops which I have just noticed may be related to this change
  then - certainly looks networking related.

  WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
  Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1

  Call Trace:
IRQ  [8046e038] tcp_fastretrans_alert+0x229/0xe63
[80470975] tcp_ack+0xa3f/0x127d
[804747b7] tcp_rcv_established+0x55f/0x7f8
[8047b1aa] tcp_v4_do_rcv+0xdb/0x3a7
[881148a8] :nf_conntrack:nf_ct_deliver_cached_events+0x75/0x99

 No, it's from TCP assertions and changes added by Ilpo to the
 net-2.6.25 tree recently.

Yeah, this is (very likely) due to the new SACK processing (in net-2.6.25).
I'll look at what could go wrong with the fack_count calculations, most likely
that's the reason (earlier I found one out-of-place retransmission
segment in one of my test cases, which already indicated that there's
something incorrect with them, but I didn't have time to debug it yet).

Thanks for the report. Some info about how easily you can reproduce it and a
couple of sentences about the test case might be useful later on when
evaluating the fix.

-- 
 i.

Re: [RFC] TCP illinois max rtt aging

2007-12-07 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)

 I guess if you get a large cumulative ACK, the amount of processing is 
 still overwhelming (added DaveM if he has some idea how to combat it).

 Even a simple scenario (this isn't anything fancy at all, will occur all 
 the time): Just one loss => rest skbs grow one by one into a single
 very large SACK block (and we do that efficiently for sure) => then the
 fast retransmit gets delivered and a cumulative ACK for whole orig_window
 arrives => clean_rtx_queue has to do a lot of processing. In this case we
 could optimize RB-tree cleanup away (by just blanking it all) but still 
 getting rid of all those skbs is going to take a larger moment than I'd 
 like to see.

 That tree blanking could be extended to cover anything which ACK more than 
 half of the tree by just replacing the root (and dealing with potential 
 recolorization of the root).

Yes, it's the classic problem.  But it ought to be at least
partially masked when TSO is in use, because we'll only process
a handful of SKBs.  The more effectively TSO batches, the
less work clean_rtx_queue() will do.

When not doing TSO the behavior is super-stupid, we bump reference
counts on the same page multiple times while running over the SKBs
since consecutive SKBs cover data in different spans of the same
page.

The core issue is that we have a poorly behaving data container,
and therefore that's obviously what we need to change.

Conceptually what we probably need to do is separate the data
maintenance from the SKB objects themselves.  There is a blob
that maintains the paged data state for everything in the
retransmit queue.  SKBs are built and get the page pointers
but don't actually grab references to the pages, the blob
does that and it keeps track of how many SKB references to each
page there are, non-atomically.

The hardest part is dealing with the page lifetime issues.
Unfortunately, when we trim the rtx queue, references to the clones
can still exist in the driver output path.  It's a difficult problem
to overcome in fact, so in the end my suggestion above might not
even be workable.

 No idea what it could be doing; I haven't looked at web100 yet, though I was
 planning to at some point...

Web100 just provides statistics and other kinds of connection data
to userspace, all the actual algorithm etc. modifications have been
merged upstream and yanked out of the web100 patch.  I was looking
at it the other night and it's frankly totally uninteresting these
days :-)

Re: [PATCH 2.6.25] multiple namespaces in the all dst_ifdown routines

2007-12-07 Thread David Miller

From: Denis V. Lunev [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 15:17:46 +0300

 move dst entries to a namespace loopback to catch refcounting leaks.
 
 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

Applied, thanks.

Re: [PATCH][VLAN] Merge tree equal tails in vlan_skb_recv

2007-12-07 Thread Patrick McHardy


Pavel Emelyanov wrote:

There are three paths in it, that set the skb->proto and then
perform common receive manipulations (basically call netif_rx()).

I think, that we can make this code flow easier to understand
by introducing the vlan_set_encap_proto() function (I hope the 
name is good) to setup the skb proto and merge the paths calling 
netif_rx() together.


Surprisingly, gcc detects this and merges these paths
by itself, so this patch doesn't make the vlan module smaller.



I already have something similar queued, but your patch is a nice
cleanup on top. I'll merge it into my tree and send it out after
some testing, hopefully today.

[PATCH net-2.6.25] Cleanup IN_DEV_MFORWARD macro

2007-12-07 Thread Pavel Emelyanov

This is essentially IN_DEV_ANDCONF with proper arguments.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index dd093ea..962a062 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -78,9 +78,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 	(max(IPV4_DEVCONF_ALL(attr), IN_DEV_CONF_GET((in_dev), attr)))
 
 #define IN_DEV_FORWARD(in_dev)		IN_DEV_CONF_GET((in_dev), FORWARDING)
-#define IN_DEV_MFORWARD(in_dev)		(IPV4_DEVCONF_ALL(MC_FORWARDING) && \
-					 IPV4_DEVCONF((in_dev)->cnf, \
-						      MC_FORWARDING))
+#define IN_DEV_MFORWARD(in_dev)		IN_DEV_ANDCONF((in_dev), MC_FORWARDING)
 #define IN_DEV_RPFILTER(in_dev)		IN_DEV_ANDCONF((in_dev), RP_FILTER)
 #define IN_DEV_SOURCE_ROUTE(in_dev)	IN_DEV_ANDCONF((in_dev), \
 						       ACCEPT_SOURCE_ROUTE)
-- 
1.5.3.4
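
To see why the open-coded macro and IN_DEV_ANDCONF are equivalent, here is a minimal userspace sketch. It assumes a toy `ipv4_devconf` with a plain array and a hypothetical global `ipv4_devconf_all`; the real kernel macros use token pasting on `NET_IPV4_CONF_*` indices, which is left out here.

```c
/* Toy stand-ins for the kernel's per-device and global ("all") config. */
enum { FORWARDING, MC_FORWARDING, RP_FILTER, DEVCONF_MAX };

struct ipv4_devconf { int data[DEVCONF_MAX]; };
struct in_device { struct ipv4_devconf cnf; };

static struct ipv4_devconf ipv4_devconf_all;	/* hypothetical global */

#define IPV4_DEVCONF_ALL(attr)		(ipv4_devconf_all.data[attr])
#define IN_DEV_CONF_GET(in_dev, attr)	((in_dev)->cnf.data[attr])

/* True only when both the global and the per-device knob are set. */
#define IN_DEV_ANDCONF(in_dev, attr) \
	(IPV4_DEVCONF_ALL(attr) && IN_DEV_CONF_GET((in_dev), attr))

#define IN_DEV_MFORWARD(in_dev)	IN_DEV_ANDCONF((in_dev), MC_FORWARDING)
```

With this, multicast forwarding is reported as enabled for a device only when both the "all" setting and the device's own setting are non-zero, which is what the removed three-line definition spelled out by hand.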


Re: [patch 1/6] ipv6 - make fib6_init to return an error code

2007-12-07 Thread David Miller

From: Daniel Lezcano [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 14:53:30 +0100

 If there is an error in the initialization function, nothing is propagated
 to the caller. So I add a return value to be set for the init function.

 Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
 Acked-by: Benjamin Thery [EMAIL PROTECTED]

Applied.

Please format your header subject lines as:

[patch N/M] [IPV6]: Blah blah blah.

Since this is what I edit them into anyways.

Thanks.

Re: [PATCH 14/20] net/ipv4/cipso_ipv4.c: use LIST_HEAD instead of LIST_HEAD_INIT

2007-12-07 Thread David Miller

From: Denis Cheng [EMAIL PROTECTED]
Date: Fri,  7 Dec 2007 00:04:36 +0800

 A single list_head variable initialized with LIST_HEAD_INIT can almost
 always be replaced with a LIST_HEAD declaration; this shrinks the code
 and looks better.

 Signed-off-by: Denis Cheng [EMAIL PROTECTED]

Applied.
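
For readers unfamiliar with the two macros, a small userspace sketch of the substitution follows. The macro bodies mirror the ones in the kernel's list.h; the variable names `old_style` and `new_style` and the `list_empty()` helper are only illustrative.

```c
/* Userspace copy of the kernel's circular list head, for illustration. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Before: declaration and initializer spelled out separately. */
static struct list_head old_style = LIST_HEAD_INIT(old_style);

/* After: one macro declares and initializes the empty list. */
static LIST_HEAD(new_style);

/* An empty list is one whose head points back at itself. */
static int list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Both variables end up as identical empty lists; the second form just says so in fewer characters.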

Re: [patch 4/6] ipv6 - make ip6_route_init to return an error code

2007-12-07 Thread David Miller

From: Daniel Lezcano [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 14:53:33 +0100

 The route initialization function does not return any value to notify if
 the initialization is successful or not. This patch checks all calls made
 for the initialization in order to return a value for the caller.

 Unfortunately, proc_net_fops_create will return a NULL pointer if CONFIG_PROC_FS
 is off, so we can not check the return code without an ifdef CONFIG_PROC_FS
 block in the ip6_route_init function.

 Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
 Acked-by: Benjamin Thery [EMAIL PROTECTED]

Applied, thanks.
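
The pattern this series applies to the IPv6 init functions can be sketched in userspace as below. Everything here is hypothetical scaffolding (`step_init()`, `step_fini()`, the `fail_step` knob); it only shows the shape of an init function that returns an error code and unwinds completed steps in reverse order.

```c
#include <errno.h>

/* Hypothetical sub-initializers; each returns 0 or a negative errno. */
static int fail_step;	/* test knob: which step should fail (0 = none) */
static int live;	/* number of sub-systems currently initialized */

static int step_init(int n)
{
	if (fail_step == n)
		return -ENOMEM;
	live++;
	return 0;
}

static void step_fini(void)
{
	live--;
}

/* Returns 0 on success; on failure undoes the completed steps in reverse. */
static int route_init_sketch(void)
{
	int ret;

	ret = step_init(1);
	if (ret)
		goto out;
	ret = step_init(2);
	if (ret)
		goto out_step1;
	ret = step_init(3);
	if (ret)
		goto out_step2;
	return 0;

out_step2:
	step_fini();
out_step1:
	step_fini();
out:
	return ret;
}
```

A caller can now check the return value and abort its own initialization, instead of continuing with a half-initialized subsystem.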

[PATCH net-2.6.25] Cleanup sysctl manipulations in devinet.c

2007-12-07 Thread Pavel Emelyanov

This includes:

 * moving neigh_sysctl_(un)register calls inside
   devinet_sysctl_(un)register ones, as they are always
   called in pairs;
 * making __devinet_sysctl_unregister() unregister
   the ipv4_devconf struct, while the original devinet_sysctl_unregister()
   works with the in_device to handle both the devconf and
   neigh sysctls;
 * making stubs for the CONFIG_SYSCTL=n case to get rid of
   in-code ifdefs.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 0b5f042..872883e 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -99,7 +99,14 @@ static void inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap,
 			 int destroy);
 #ifdef CONFIG_SYSCTL
 static void devinet_sysctl_register(struct in_device *idev);
-static void devinet_sysctl_unregister(struct ipv4_devconf *p);
+static void devinet_sysctl_unregister(struct in_device *idev);
+#else
+static inline void devinet_sysctl_register(struct in_device *idev)
+{
+}
+static inline void devinet_sysctl_unregister(struct in_device *idev)
+{
+}
 #endif
 
 /* Locks all the inet devices. */
@@ -163,17 +170,10 @@ static struct in_device *inetdev_init(struct net_device *dev)
 		goto out_kfree;
 	/* Reference in_dev->dev */
 	dev_hold(dev);
-#ifdef CONFIG_SYSCTL
-	neigh_sysctl_register(dev, in_dev->arp_parms, NET_IPV4,
-			      NET_IPV4_NEIGH, "ipv4", NULL, NULL);
-#endif
-
 	/* Account for reference dev->ip_ptr (below) */
 	in_dev_hold(in_dev);
 
-#ifdef CONFIG_SYSCTL
 	devinet_sysctl_register(in_dev);
-#endif
 	ip_mc_init_dev(in_dev);
 	if (dev->flags & IFF_UP)
 		ip_mc_up(in_dev);
@@ -212,15 +212,9 @@ static void inetdev_destroy(struct in_device *in_dev)
 		inet_free_ifa(ifa);
 	}
 
-#ifdef CONFIG_SYSCTL
-	devinet_sysctl_unregister(&in_dev->cnf);
-#endif
-
 	dev->ip_ptr = NULL;
 
-#ifdef CONFIG_SYSCTL
-	neigh_sysctl_unregister(in_dev->arp_parms);
-#endif
+	devinet_sysctl_unregister(in_dev);
 	neigh_parms_release(&arp_tbl, in_dev->arp_parms);
 	arp_ifdown(dev);
 
@@ -1114,13 +1108,8 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 		 */
 		inetdev_changename(dev, in_dev);
 
-#ifdef CONFIG_SYSCTL
-		devinet_sysctl_unregister(&in_dev->cnf);
-		neigh_sysctl_unregister(in_dev->arp_parms);
-		neigh_sysctl_register(dev, in_dev->arp_parms, NET_IPV4,
-				      NET_IPV4_NEIGH, "ipv4", NULL, NULL);
+		devinet_sysctl_unregister(in_dev);
 		devinet_sysctl_register(in_dev);
-#endif
 		break;
 	}
 out:
@@ -1519,21 +1508,31 @@ out:
 	return;
 }
 
+static void __devinet_sysctl_unregister(struct ipv4_devconf *cnf)
+{
+	struct devinet_sysctl_table *t = cnf->sysctl;
+
+	if (t == NULL)
+		return;
+
+	cnf->sysctl = NULL;
+	unregister_sysctl_table(t->sysctl_header);
+	kfree(t->dev_name);
+	kfree(t);
+}
+
 static void devinet_sysctl_register(struct in_device *idev)
 {
-	return __devinet_sysctl_register(idev->dev->name, idev->dev->ifindex,
+	neigh_sysctl_register(idev->dev, idev->arp_parms, NET_IPV4,
+			NET_IPV4_NEIGH, "ipv4", NULL, NULL);
+	__devinet_sysctl_register(idev->dev->name, idev->dev->ifindex,
 			&idev->cnf);
 }
 
-static void devinet_sysctl_unregister(struct ipv4_devconf *p)
+static void devinet_sysctl_unregister(struct in_device *idev)
 {
-	if (p->sysctl) {
-		struct devinet_sysctl_table *t = p->sysctl;
-		p->sysctl = NULL;
-		unregister_sysctl_table(t->sysctl_header);
-		kfree(t->dev_name);
-		kfree(t);
-	}
+	__devinet_sysctl_unregister(&idev->cnf);
+	neigh_sysctl_unregister(idev->arp_parms);
 }
 #endif
 
 
-- 
1.5.3.4
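
The stub technique used for the CONFIG_SYSCTL=n case can be illustrated outside the kernel as follows. This is only a sketch: `FEATURE_SYSCTL`, `in_device_sketch` and the `sketch_*` functions are made-up names standing in for the config option and the devinet helpers.

```c
/* Define FEATURE_SYSCTL at build time to get the "real" implementation. */
struct in_device_sketch {
	int registered;
};

#ifdef FEATURE_SYSCTL
static void sketch_sysctl_register(struct in_device_sketch *idev)
{
	idev->registered = 1;
}
static void sketch_sysctl_unregister(struct in_device_sketch *idev)
{
	idev->registered = 0;
}
#else
/* Empty stubs: the calls compile to nothing, callers need no #ifdef. */
static inline void sketch_sysctl_register(struct in_device_sketch *idev)
{
	(void)idev;
}
static inline void sketch_sysctl_unregister(struct in_device_sketch *idev)
{
	(void)idev;
}
#endif

/* The caller is identical in both configurations. */
static void sketch_dev_init(struct in_device_sketch *idev)
{
	idev->registered = 0;
	sketch_sysctl_register(idev);
}
```

The conditional compilation lives in one place next to the declarations, so call sites such as `sketch_dev_init()` stay free of ifdefs.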


Re: [patch 2/6] ipv6 - make xfrm6_init to return an error code

2007-12-07 Thread David Miller

From: Daniel Lezcano [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 14:53:31 +0100

 The xfrm initialization function does not return any error code, so
 if there is an error, the caller cannot be advised of it.
 This patch checks the return code of the different called functions
 in order to return a successful or failed initialization.

 Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
 Acked-by: Benjamin Thery [EMAIL PROTECTED]

Applied, thanks.

Re: [patch 5/6] ipv6 - make af_inet6 to check ip6_route_init return value

2007-12-07 Thread David Miller

From: Daniel Lezcano [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 14:53:34 +0100

 The af_inet6 initialization function does not check the return code
 of the route initialization, so if something goes wrong, the protocol
 initialization will continue anyway.
 This patch takes into account the modification made in the different
 route initialization subroutines to check the return value and to
 make the protocol initialization fail.

 Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
 Acked-by: Benjamin Thery [EMAIL PROTECTED]

Applied, thanks!

Re: [PATCH 13/20] net/core/dev.c: use LIST_HEAD instead of LIST_HEAD_INIT

2007-12-07 Thread David Miller

From: Denis Cheng [EMAIL PROTECTED]
Date: Fri,  7 Dec 2007 00:01:26 +0800

 A single list_head variable initialized with LIST_HEAD_INIT can almost
 always be replaced with a LIST_HEAD declaration; this shrinks the code
 and looks better.

 Signed-off-by: Denis Cheng [EMAIL PROTECTED]

Applied.

[patch 2/3][IPV6]: remove ifdef in route6 for xfrm6

2007-12-07 Thread Daniel Lezcano

The following patch creates the usual static inline functions to disable
the xfrm6_init and xfrm6_fini functions when XFRM is off.
That allows removing some ifdefs and makes the code a little clearer.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/net/xfrm.h |   16 +---
 net/ipv6/route.c   |7 +--
 2 files changed, 14 insertions(+), 9 deletions(-)

Index: net-2.6.25/include/net/xfrm.h
===
--- net-2.6.25.orig/include/net/xfrm.h
+++ net-2.6.25/include/net/xfrm.h
@@ -842,7 +842,6 @@ xfrm_state_addr_cmp(struct xfrm_tmpl *tm
 }
 
 #ifdef CONFIG_XFRM
-
 extern int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb, unsigned short family);
 
 static inline int xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, unsigned short family)
@@ -1066,12 +1065,23 @@ struct xfrm6_tunnel {
 
 extern void xfrm_init(void);
 extern void xfrm4_init(void);
-extern int xfrm6_init(void);
-extern void xfrm6_fini(void);
 extern void xfrm_state_init(void);
 extern void xfrm4_state_init(void);
+#ifdef CONFIG_XFRM
+extern int xfrm6_init(void);
+extern void xfrm6_fini(void);
 extern int xfrm6_state_init(void);
 extern void xfrm6_state_fini(void);
+#else
+static inline int xfrm6_init(void)
+{
+   return 0;
+}
+static inline void xfrm6_fini(void)
+{
+   ;
+}
+#endif
 
 extern int xfrm_state_walk(u8 proto, int (*func)(struct xfrm_state *, int, void*), void *);
 extern struct xfrm_state *xfrm_state_alloc(void);
Index: net-2.6.25/net/ipv6/route.c
===
--- net-2.6.25.orig/net/ipv6/route.c
+++ net-2.6.25/net/ipv6/route.c
@@ -2517,11 +2517,10 @@ int __init ip6_route_init(void)
if (ret)
goto out_fib6_init;
 
-#ifdef CONFIG_XFRM
ret = xfrm6_init();
if (ret)
goto out_proc_init;
-#endif
+
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
ret = fib6_rules_init();
if (ret)
@@ -2542,9 +2541,7 @@ fib6_rules_init:
fib6_rules_cleanup();
 xfrm6_init:
 #endif
-#ifdef CONFIG_XFRM
xfrm6_fini();
-#endif
 out_proc_init:
ipv6_route_proc_fini(init_net);
 out_fib6_init:
@@ -2561,9 +2558,7 @@ void ip6_route_cleanup(void)
fib6_rules_cleanup();
 #endif
ipv6_route_proc_fini(init_net);
-#ifdef CONFIG_XFRM
xfrm6_fini();
-#endif
rt6_ifdown(NULL);
fib6_gc_cleanup();
kmem_cache_destroy(ip6_dst_ops.kmem_cachep);

-- 

[PATCH net-2.6.25 1/3]sysctl: make the sys.net.core sysctls per-namespace

2007-12-07 Thread Pavel Emelyanov

Making them per-namespace is required for the following 
two reasons:

 First, some ctl values have a per-namespace meaning.
 Second, making them writable from the sub-namespace
 is an isolation hole.

So I introduce the pernet operations to create these
tables. For init_net I use the existing statically
declared tables, for sub-namespace they are duplicated
and the write bits are removed from the mode.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index f97b2a4..d593611 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -37,6 +37,9 @@ struct net {
 
struct sock *rtnl;  /* rtnetlink socket */
 
+   /* core sysctls */
+   struct ctl_table_header *sysctl_core_hdr;
+
/* List of all packet sockets. */
 	rwlock_t		packet_sklist_lock;
struct hlist_head   packet_sklist;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index e322713..57a7ead 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -151,18 +151,58 @@ static struct ctl_table net_core_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static __initdata struct ctl_path net_core_path[] = {
+static __net_initdata struct ctl_path net_core_path[] = {
 	{ .procname = "net", .ctl_name = CTL_NET, },
 	{ .procname = "core", .ctl_name = NET_CORE, },
 	{ },
 };
 
-static __init int sysctl_core_init(void)
+static __net_init int sysctl_core_net_init(struct net *net)
 {
-	struct ctl_table_header *hdr;
+	struct ctl_table *tbl, *tmp;
+
+	tbl = net_core_table;
+	if (net != &init_net) {
+		tbl = kmemdup(tbl, sizeof(net_core_table), GFP_KERNEL);
+		if (tbl == NULL)
+			goto err_dup;
+
+		for (tmp = tbl; tmp->procname; tmp++)
+			tmp->mode &= ~0222;
+	}
+
+	net->sysctl_core_hdr = register_net_sysctl_table(net,
+			net_core_path, tbl);
+	if (net->sysctl_core_hdr == NULL)
+		goto err_reg;
 
-	hdr = register_sysctl_paths(net_core_path, net_core_table);
-	return hdr == NULL ? -ENOMEM : 0;
+	return 0;
+
+err_reg:
+	if (tbl != net_core_table)
+		kfree(tbl);
+err_dup:
+	return -ENOMEM;
+}
+
+static __net_exit void sysctl_core_net_exit(struct net *net)
+{
+	struct ctl_table *tbl;
+
+	tbl = net->sysctl_core_hdr->ctl_table_arg;
+	unregister_net_sysctl_table(net->sysctl_core_hdr);
+	BUG_ON(tbl == net_core_table);
+	kfree(tbl);
+}
+
+static __net_initdata struct pernet_operations sysctl_core_ops = {
+	.init = sysctl_core_net_init,
+	.exit = sysctl_core_net_exit,
+};
+
+static __init int sysctl_core_init(void)
+{
+	return register_pernet_subsys(&sysctl_core_ops);
 }
 
 __initcall(sysctl_core_init);
-- 
1.5.3.4
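
The "duplicate the table and remove the write bits" step can be tried in userspace with the sketch below. It is not the kernel code: `ctl_entry`, `core_table` and `dup_readonly()` are illustrative names, and `malloc()` plus `memcpy()` stand in for `kmemdup()`.

```c
#include <stdlib.h>
#include <string.h>

struct ctl_entry {
	const char *procname;	/* NULL terminates the table */
	unsigned int mode;	/* octal permission bits */
};

/* Template table, as used by the initial namespace. */
static struct ctl_entry core_table[] = {
	{ "somaxconn", 0644 },
	{ "dev_weight", 0644 },
	{ NULL, 0 },
};

/* Copy the template table and strip all write bits from each entry. */
static struct ctl_entry *dup_readonly(const struct ctl_entry *src, size_t size)
{
	struct ctl_entry *tbl, *tmp;

	tbl = malloc(size);
	if (tbl == NULL)
		return NULL;
	memcpy(tbl, src, size);

	for (tmp = tbl; tmp->procname; tmp++)
		tmp->mode &= ~0222u;
	return tbl;
}
```

Masking with `~0222` turns mode 0644 into 0444, so a sub-namespace sees the same entries but cannot write to them, while the template table stays untouched.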



[patch 3/3][IPV6]: route6 remove ifdef for fib_rules

2007-12-07 Thread Daniel Lezcano

The patch defines the usual static inline functions when the code is
disabled for fib6_rules. That allows removing some ifdefs in the route.c
file and makes the code a little clearer.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/net/ip6_fib.h |   12 +++-
 net/ipv6/route.c  |7 +--
 2 files changed, 12 insertions(+), 7 deletions(-)

Index: net-2.6.25/include/net/ip6_fib.h
===
--- net-2.6.25.orig/include/net/ip6_fib.h
+++ net-2.6.25/include/net/ip6_fib.h
@@ -226,8 +226,18 @@ extern void fib6_gc_cleanup(void);
 
 extern int fib6_init(void);
 
+#ifdef CONFIG_IPV6_MULTIPLE_TABLES
 extern int fib6_rules_init(void);
 extern void fib6_rules_cleanup(void);
-
+#else
+static inline int   fib6_rules_init(void)
+{
+   return 0;
+}
+static inline void  fib6_rules_cleanup(void)
+{
+   return ;
+}
+#endif
 #endif
 #endif
Index: net-2.6.25/net/ipv6/route.c
===
--- net-2.6.25.orig/net/ipv6/route.c
+++ net-2.6.25/net/ipv6/route.c
@@ -2521,11 +2521,10 @@ int __init ip6_route_init(void)
if (ret)
goto out_proc_init;
 
-#ifdef CONFIG_IPV6_MULTIPLE_TABLES
ret = fib6_rules_init();
if (ret)
goto xfrm6_init;
-#endif
+
ret = -ENOBUFS;
if (__rtnl_register(PF_INET6, RTM_NEWROUTE, inet6_rtm_newroute, NULL) ||
__rtnl_register(PF_INET6, RTM_DELROUTE, inet6_rtm_delroute, NULL) ||
@@ -2537,10 +2536,8 @@ out:
return ret;
 
 fib6_rules_init:
-#ifdef CONFIG_IPV6_MULTIPLE_TABLES
fib6_rules_cleanup();
 xfrm6_init:
-#endif
xfrm6_fini();
 out_proc_init:
ipv6_route_proc_fini(init_net);
@@ -2554,9 +2551,7 @@ out_kmem_cache:
 
 void ip6_route_cleanup(void)
 {
-#ifdef CONFIG_IPV6_MULTIPLE_TABLES
fib6_rules_cleanup();
-#endif
ipv6_route_proc_fini(init_net);
xfrm6_fini();
rt6_ifdown(NULL);

-- 

[PATCH net-2.6.25 3/3]sysctl: make sysctl_somaxconn per-namespace

2007-12-07 Thread Pavel Emelyanov

Just move the variable on the struct net and adjust
its usage.

Other sysctls from the sys.net.core table are more
difficult to virtualize (i.e. make them per-namespace),
but I'll look at them as well a bit later.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/linux/socket.h b/include/linux/socket.h
index eb5bdd5..bd2b30a 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -24,7 +24,6 @@ struct __kernel_sockaddr_storage {
 #include <linux/types.h>		/* pid_t			*/
 #include <linux/compiler.h>		/* __user			*/
 
-extern int sysctl_somaxconn;
 #ifdef CONFIG_PROC_FS
 struct seq_file;
 extern void socket_seq_show(struct seq_file *seq);
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index d593611..b62e31f 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -39,6 +39,7 @@ struct net {
 
/* core sysctls */
struct ctl_table_header *sysctl_core_hdr;
+   int sysctl_somaxconn;
 
/* List of all packet sockets. */
 	rwlock_t		packet_sklist_lock;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index dc4cf7d..130338f 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -127,7 +127,7 @@ static struct ctl_table net_core_table[] = {
 	{
 		.ctl_name	= NET_CORE_SOMAXCONN,
 		.procname	= "somaxconn",
-		.data		= &sysctl_somaxconn,
+		.data		= &init_net.sysctl_somaxconn,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
@@ -161,6 +161,8 @@ static __net_init int sysctl_core_net_init(struct net *net)
 {
struct ctl_table *tbl, *tmp;
 
+	net->sysctl_somaxconn = SOMAXCONN;
+
 	tbl = net_core_table;
 	if (net != &init_net) {
 		tbl = kmemdup(tbl, sizeof(net_core_table), GFP_KERNEL);
diff --git a/net/socket.c b/net/socket.c
index 9ebca5c..7651de0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1365,17 +1365,17 @@ asmlinkage long sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen)
  * ready for listening.
  */
 
-int sysctl_somaxconn __read_mostly = SOMAXCONN;
-
 asmlinkage long sys_listen(int fd, int backlog)
 {
 	struct socket *sock;
 	int err, fput_needed;
+	int somaxconn;
 
 	sock = sockfd_lookup_light(fd, &err, &fput_needed);
 	if (sock) {
-		if ((unsigned)backlog > sysctl_somaxconn)
-			backlog = sysctl_somaxconn;
+		somaxconn = sock->sk->sk_net->sysctl_somaxconn;
+		if ((unsigned)backlog > somaxconn)
+			backlog = somaxconn;
 
 		err = security_socket_listen(sock, backlog);
 		if (!err)
-- 
1.5.3.4
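
The effect of the change on listen() can be sketched in userspace like this. `net_sketch` and `clamp_backlog()` are hypothetical names; the comparison mirrors the unsigned cast used in sys_listen().

```c
/* Hypothetical per-namespace state holding the listen backlog limit. */
struct net_sketch {
	int sysctl_somaxconn;
};

/* Clamp a requested backlog to the limit of the socket's namespace. */
static int clamp_backlog(const struct net_sketch *net, int backlog)
{
	int somaxconn = net->sysctl_somaxconn;

	/* The unsigned cast makes negative requests compare as huge values. */
	if ((unsigned)backlog > (unsigned)somaxconn)
		backlog = somaxconn;
	return backlog;
}
```

Two namespaces with different limits now clamp the same request differently, which a single global variable could not express.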


Re: [Patch] net/xfrm/xfrm_policy.c: Some small improvements

2007-12-07 Thread Richard Knutsson

David Miller wrote:

From: Richard Knutsson [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 15:37:46 +0100

David Miller wrote:

But this time I'll just let you know up front that I
don't see much value in this patch.  It is not a clear
improvement to replace int's with bool's in my mind and
the other changes are just whitespace changes.

Is it not an improvement to distinguish booleans from actual values? Do you 
use integers for ASCII characters too? It can also avoid some potential 
bugs like the 'if (i == TRUE)'...
What is wrong with 'size_t' (since it is unsigned, compared to (some) 
'int')?

When you say int found; is there any doubt in your mind that
this integer is going to hold a 1 or a 0 depending upon whether
we found something?

That's the problem I have with these kinds of patches, they do
not increase clarity, it's just pure mindless edits.

But is it not a good thing if the compiler knows as well? Besides, names are 
sometimes not as clear as that one.

In new code, fine, use booleans if you want.

I would even accept that it helps to change to boolean for
arguments to functions that are global in scope.

But not for function local variables in cases like this.

Oh, I see your point now. I believed it to be yet another 'booleans are not 
a C idiom' argument.

Sorry about the noise
Richard Knutsson
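
The 'if (i == TRUE)' hazard mentioned in this thread can be shown with a short sketch. The function names and the `TRUE` definition are made up for illustration; the point is only how an int truth value differs from a C99 bool when compared against a constant.

```c
#include <stdbool.h>

#define TRUE 1

/* int used as a truth value: any non-zero result means "found". */
static int found_as_int(int hits)
{
	return hits;		/* e.g. 2 when two entries matched */
}

/* bool normalizes every non-zero value to exactly 1. */
static bool found_as_bool(int hits)
{
	return hits;
}
```

With two hits, `found_as_int(2) == TRUE` is false even though the value is truthy, while `found_as_bool(2) == TRUE` holds. For a function-local flag that is only ever assigned 0 or 1, as in the case discussed above, the two behave the same.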


Re: [patch 06/22] NET: DM9000: Use kthread to probe MII status when device open

2007-12-07 Thread Ben Dooks

On Fri, Nov 23, 2007 at 08:38:51PM -0500, Jeff Garzik wrote:
 seems like a delayed workqueue would be most appropriate for this.

I like the fact that the use of kthread shows the user how much
cpu time is being used by the execution of monitoring the phy. How
badly do people object to using a kthread?

-- 
Ben ([EMAIL PROTECTED], http://www.fluff.org/)

  'a smiley only costs 4 bytes'

[PATCH 2.6.25 3/3] ipv4: last default route is a fib table property

2007-12-07 Thread Denis V. Lunev

ipv4: last default route is a fib table property

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Acked-by: Alexey Kuznetsov [EMAIL PROTECTED]
---
 include/net/ip_fib.h |1 +
 net/ipv4/fib_hash.c  |   16 
 net/ipv4/fib_trie.c  |   18 +-
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 690fb4d..d70b9b4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -141,6 +141,7 @@ struct fib_table {
 	struct hlist_node tb_hlist;
 	u32		tb_id;
 	unsigned	tb_stamp;
+	int		tb_default;
 	int		(*tb_lookup)(struct fib_table *tb, const struct flowi *flp, struct fib_result *res);
 	int		(*tb_insert)(struct fib_table *, struct fib_config *);
 	int		(*tb_delete)(struct fib_table *, struct fib_config *);
diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index a52b570..481de47 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -272,8 +272,6 @@ out:
return err;
 }
 
-static int fn_hash_last_dflt=-1;
-
 static void
 fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib_result *res)
 {
@@ -314,9 +312,9 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 				if (next_fi != res->fi)
 					break;
 			} else if (!fib_detect_death(fi, order, &last_resort,
-						     &last_idx, fn_hash_last_dflt)) {
+						&last_idx, tb->tb_default)) {
 				fib_result_assign(res, fi);
-				fn_hash_last_dflt = order;
+				tb->tb_default = order;
 				goto out;
 			}
 			fi = next_fi;
@@ -325,19 +323,20 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 	}
 
 	if (order <= 0 || fi == NULL) {
-		fn_hash_last_dflt = -1;
+		tb->tb_default = -1;
 		goto out;
 	}
 
-	if (!fib_detect_death(fi, order, &last_resort, &last_idx, fn_hash_last_dflt)) {
+	if (!fib_detect_death(fi, order, &last_resort, &last_idx,
+				tb->tb_default)) {
 		fib_result_assign(res, fi);
-		fn_hash_last_dflt = order;
+		tb->tb_default = order;
 		goto out;
 	}
 
 	if (last_idx >= 0)
 		fib_result_assign(res, last_resort);
-	fn_hash_last_dflt = last_idx;
+	tb->tb_default = last_idx;
 out:
 	read_unlock(&fib_hash_lock);
 }
@@ -772,6 +771,7 @@ struct fib_table * __init fib_hash_init(u32 id)
 		return NULL;
 
 	tb->tb_id = id;
+	tb->tb_default = -1;
 	tb->tb_lookup = fn_hash_lookup;
 	tb->tb_insert = fn_hash_insert;
 	tb->tb_delete = fn_hash_delete;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 29a06af..850165a 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1779,8 +1779,6 @@ static int fn_trie_flush(struct fib_table *tb)
return found;
 }
 
-static int trie_last_dflt = -1;
-
 static void
 fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib_result *res)
 {
@@ -1827,28 +1825,29 @@ fn_trie_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
 				if (next_fi != res->fi)
 					break;
 			} else if (!fib_detect_death(fi, order, &last_resort,
-						     &last_idx, trie_last_dflt)) {
+						&last_idx, tb->tb_default)) {
 				fib_result_assign(res, fi);
-				trie_last_dflt = order;
+				tb->tb_default = order;
 				goto out;
 			}
 			fi = next_fi;
 			order++;
 		}
 	if (order <= 0 || fi == NULL) {
-		trie_last_dflt = -1;
+		tb->tb_default = -1;
 		goto out;
 	}
 
-	if (!fib_detect_death(fi, order, &last_resort, &last_idx, trie_last_dflt)) {
+	if (!fib_detect_death(fi, order, &last_resort, &last_idx,
+				tb->tb_default)) {
 		fib_result_assign(res, fi);
-		trie_last_dflt = order;
+		tb->tb_default = order;
 		goto out;
 	}
 	if (last_idx >= 0)
 		fib_result_assign(res, last_resort);
-	trie_last_dflt = last_idx;
- out:;
+	tb->tb_default = last_idx;
+out:
 	rcu_read_unlock();
 }
 
@@ -1975,6 +1974,7 @@ struct fib_table * __init fib_hash_init(u32 id)
 		return NULL;
 
 	tb->tb_id = id;
+	tb->tb_default = -1;
 	tb->tb_lookup =

Re: [patch 22/22] NET: DM9000: Show the MAC address source after printing MAC

2007-12-07 Thread Ben Dooks

On Fri, Nov 23, 2007 at 08:43:04PM -0500, Jeff Garzik wrote:
 ACK patches 16-22

Is reposting here ok to get these queued for the next kernel
release, or are there people to CC: for this?

-- 
Ben ([EMAIL PROTECTED], http://www.fluff.org/)

  'a smiley only costs 4 bytes'

[PATCH] Use BUILD_BUG_ON in inet_timewait_sock.c checks

2007-12-07 Thread Pavel Emelyanov

Make the INET_TWDR_TWKILL_SLOTS vs sizeof(twdr->thread_slots)
check nicer.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index a60b99e..d43e787 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -194,16 +194,14 @@ out:
 
 EXPORT_SYMBOL_GPL(inet_twdr_hangman);
 
-extern void twkill_slots_invalid(void);
-
 void inet_twdr_twkill_work(struct work_struct *work)
 {
struct inet_timewait_death_row *twdr =
container_of(work, struct inet_timewait_death_row, twkill_work);
int i;
 
-	if ((INET_TWDR_TWKILL_SLOTS - 1) > (sizeof(twdr->thread_slots) * 8))
-		twkill_slots_invalid();
+	BUILD_BUG_ON((INET_TWDR_TWKILL_SLOTS - 1) >
+			(sizeof(twdr->thread_slots) * 8));
 
 	while (twdr->thread_slots) {
 		spin_lock_bh(&twdr->death_lock);
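
For context, here is a userspace sketch of how a BUILD_BUG_ON-style check works. The macro below is the simplified negative-array-size form; `TWKILL_SLOTS`, `death_row_sketch` and `slot_bits()` are hypothetical stand-ins for the real constant and structure.

```c
/* Simplified form of the classic trick: a negative array size is a
 * compile error, so a true condition stops the build. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

#define TWKILL_SLOTS 8		/* hypothetical slot count */

struct death_row_sketch {
	unsigned int thread_slots;	/* one bit per slot */
};

static unsigned int slot_bits(void)
{
	struct death_row_sketch row = { 0 };

	/* Compiles only while every slot index fits in the bitmask. */
	BUILD_BUG_ON((TWKILL_SLOTS - 1) > (sizeof(row.thread_slots) * 8));
	return (unsigned int)(sizeof(row.thread_slots) * 8);
}
```

Compared with calling an undefined extern function, which only fails at link time and relies on the optimizer removing the dead call, this form fails at compile time and needs no dummy declaration.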


[PATCH] Use BUILD_BUG_ON for tcp_skb_cb size checking

2007-12-07 Thread Pavel Emelyanov

The sizeof(struct tcp_skb_cb) must not be greater than
sizeof(skb->cb). This is checked in net/ipv4/tcp.c, but
the check can be written more gracefully.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8e65182..c8bebd3 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2411,7 +2411,6 @@ void tcp_done(struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(tcp_done);
 
-extern void __skb_cb_too_small_for_tcp(int, int);
 extern struct tcp_congestion_ops tcp_reno;
 
 static __initdata unsigned long thash_entries;
@@ -2430,9 +2429,7 @@ void __init tcp_init(void)
unsigned long limit;
int order, i, max_share;
 
-	if (sizeof(struct tcp_skb_cb) > sizeof(skb->cb))
-		__skb_cb_too_small_for_tcp(sizeof(struct tcp_skb_cb),
-					   sizeof(skb->cb));
+	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
 
 	tcp_hashinfo.bind_bucket_cachep =
 		kmem_cache_create("tcp_bind_bucket",

Re: [PATCH] [IPV6] XFRM: Fix auditing rt6i_flags; use RTF_xxx flags instead of RTCF_xxx.

2007-12-07 Thread David Miller

From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
Date: Fri, 07 Dec 2007 10:41:48 -0800 (PST)

 RTCF_xxx flags (defined in include/linux/in_route.h) are available for
 IPv4 route (rtable) entries only.  Use RTF_xxx flags instead,
 defined in include/linux/ipv6_route.h, for IPv6 route entries (rt6_info).

 Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

Applied, thank you.

2.6.24 net driver mis-patch

2007-12-07 Thread David Miller


Jeff, this belonged in netdev-2.6, Linus's tree doesn't have the BNX2X
driver yet, your 2.6.25 bound tree does.

As a result you added the ZLIB_INFLATE dependency to the TEHUTI driver
in 2.6.24 instead of BNX2X where it belongs, please fix this, thanks
:-)

commit 70eba18b5664f90d7620905e005b89388e5fd94b
Author: Eliezer Tamir [EMAIL PROTECTED]
Date:   Wed Dec 5 16:12:39 2007 +0200

make bnx2x select ZLIB_INFLATE

The bnx2x module depends on the zlib_inflate functions.  The
build will fail if ZLIB_INFLATE has not been selected manually
or by building another module that automatically selects it.

Modify BNX2X config option to 'select ZLIB_INFLATE' like BNX2
and others.  This seems to fix it.

Signed-off-by:  Lee Schermerhorn [EMAIL PROTECTED]
Acked-by: Eliezer Tamir [EMAIL PROTECTED]
Signed-off-by: Jeff Garzik [EMAIL PROTECTED]

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index d9107e5..6cde4ed 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2588,6 +2588,7 @@ config MLX4_DEBUG
 config TEHUTI
 	tristate "Tehuti Networks 10G Ethernet"
 	depends on PCI
+	select ZLIB_INFLATE
 	help
 	  Tehuti Networks 10G Ethernet NIC
 

Re: [PATCH] iproute2: support dotted-quad netmask notation.

2007-12-07 Thread Andreas Henriksson


On tor, 2007-12-06 at 11:53 -0800, Stephen Hemminger wrote:
 On Tue, 4 Dec 2007 14:58:18 +0100
 Andreas Henriksson [EMAIL PROTECTED] wrote:
 
  Suggested patch for allowing netmask to be specified in dotted quad format.
  See http://bugs.debian.org/357172
  
  (Known problem: this will not prevent some invalid syntaxes,
  ie. 255.0.255.0 will be treated as 255.255.255.0)
  
  Comments? Suggestions? Improvements?
 
 Fix the bug you mentioned?
 
 [... snip example code ...]

Updated patch, added your netmask validation code but without the check
that made 0.0.0.0 (default) and 255.255.255.255 (one address) invalid
netmasks as they are permitted in CIDR format. 

Signed-off-by: Andreas Henriksson [EMAIL PROTECTED]

diff --git a/lib/utils.c b/lib/utils.c
index 4c42dfd..b4a6125 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -47,6 +47,41 @@ int get_integer(int *val, const char *arg, int base)
return 0;
 }
 
+/* a valid netmask must be 2^n - 1 (n = 1..31) */
+static int is_valid_netmask(const inet_prefix *addr)
+{
+        uint32_t host;
+
+        if (addr->family != AF_INET)
+                return 0;
+
+        host = ~ntohl(addr->data[0]);
+
+        return (host & (host + 1)) == 0;
+}
+
+static int get_netmask(unsigned *val, const char *arg, int base)
+{
+	inet_prefix addr;
+
+	if (!get_unsigned(val, arg, base))
+		return 0;
+
+	/* try converting dotted quad to CIDR */
+	if (!get_addr_1(&addr, arg, AF_INET)) {
+		u_int32_t mask;
+
+		*val=0;
+		for (mask = addr.data[0]; mask; mask >>= 1)
+			(*val)++;
+
+		if (is_valid_netmask(&addr))
+			return 0;
+	}
+
+	return -1;
+}
+
 int get_unsigned(unsigned *val, const char *arg, int base)
 {
unsigned long res;
@@ -304,7 +339,8 @@ int get_prefix_1(inet_prefix *dst, char *arg, int family)
 			dst->bitlen = 32;
 	}
 	if (slash) {
-		if (get_unsigned(&plen, slash+1, 0) || plen > dst->bitlen) {
+		if (get_netmask(&plen, slash+1, 0)
+				|| plen > dst->bitlen) {
 			err = -1;
 			goto done;
 		}


-- 
Regards,
Andreas Henriksson
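
For readers following the thread, the netmask check discussed in this message can be sketched on its own. This is only an illustration under stated assumptions, not the code from the patch: netmask_to_prefix() is a hypothetical helper, and it uses the standard inet_pton() instead of iproute2's get_addr_1(). Like the updated patch, it accepts 0.0.0.0 and 255.255.255.255 and rejects non-contiguous masks such as 255.0.255.0.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper (not part of iproute2): convert a dotted-quad
 * netmask to a prefix length.  Returns 0..32, or -1 if the string is
 * not an IPv4 address or the mask bits are not contiguous.
 */
int netmask_to_prefix(const char *dotted)
{
	struct in_addr a;
	uint32_t mask, host;
	int len = 0;

	if (inet_pton(AF_INET, dotted, &a) != 1)
		return -1;

	mask = ntohl(a.s_addr);	/* work in host byte order */
	host = ~mask;		/* host bits must look like 0...01...1 */

	/* host + 1 is a power of two (or wraps to 0) only for contiguous masks */
	if (host & (host + 1))
		return -1;

	/* count the leading one bits by shifting them out to the left */
	for (; mask; mask <<= 1)
		len++;

	return len;
}
```

With this sketch, "255.255.255.0" maps to 24 while "255.0.255.0" is rejected, which is the behaviour the earlier version of the patch lacked.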


Re: [PATCH net-2.6.25] Cleanup IN_DEV_MFORWARD macro

2007-12-07 Thread Herbert Xu

On Fri, Dec 07, 2007 at 07:19:38PM +0300, Pavel Emelyanov wrote:
 This is essentially IN_DEV_ANDCONF with proper arguments.
 
 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Acked-by: Herbert Xu [EMAIL PROTECTED]

Thanks Pavel! I must have written that one before writing the
AND macro :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[PATCH iproute2] rlim qdisc support.

2007-12-07 Thread Stephen Hemminger

Setup code for new rlim qdisc. For use by anyone who wants to
test rlim before kernel inclusion.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 include/linux/pkt_sched.h |6 ++
 tc/Makefile   |1 +
 tc/q_rlim.c   |  115 +
 3 files changed, 122 insertions(+), 0 deletions(-)
 create mode 100644 tc/q_rlim.c

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 919af93..7973dc4 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -475,4 +475,10 @@ struct tc_netem_corrupt
 
 #define NETEM_DIST_SCALE   8192
 
+struct tc_rlim_qopt
+{
+   __u32   limit;  /* fifo limit (packets) */
+   __u32   rate;   /* bits per sec */
+};
+
 #endif
diff --git a/tc/Makefile b/tc/Makefile
index a715566..e46954d 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -13,6 +13,7 @@ TCMODULES += q_tbf.o
 TCMODULES += q_cbq.o
 TCMODULES += q_rr.o
 TCMODULES += q_netem.o
+TCMODULES += q_rlim.o
 TCMODULES += f_rsvp.o
 TCMODULES += f_u32.o
 TCMODULES += f_route.o
diff --git a/tc/q_rlim.c b/tc/q_rlim.c
new file mode 100644
index 000..5f634a8
--- /dev/null
+++ b/tc/q_rlim.c
@@ -0,0 +1,115 @@
+/*
+ * q_rlim.cRLIM.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Stephen Hemminger [EMAIL PROTECTED]
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... rlim limit PACKETS rate KBPS\n");
+}
+
+static void explain1(char *arg)
+{
+	fprintf(stderr, "Illegal \"%s\"\n", arg);
+}
+
+
+#define usage() return(-1)
+
+static int rlim_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nlmsghdr *n)
+{
+	struct tc_rlim_qopt opt;
+	unsigned x;
+
+	memset(&opt, 0, sizeof(opt));
+
+	while (argc > 0) {
+		if (matches(*argv, "limit") == 0) {
+			NEXT_ARG();
+			if (opt.limit) {
+				fprintf(stderr, "Double \"limit\" spec\n");
+				return -1;
+			}
+			if (get_size(&opt.limit, *argv)) {
+				explain1("limit");
+				return -1;
+			}
+		} else if (strcmp(*argv, "rate") == 0) {
+			NEXT_ARG();
+			if (opt.rate) {
+				fprintf(stderr, "Double \"rate\" spec\n");
+				return -1;
+			}
+
+			if (get_rate(&x, *argv)) {
+				explain1("rate");
+				return -1;
+			}
+			opt.rate = x;
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	if (opt.rate == 0) {
+		fprintf(stderr, "\"rate\" is required.\n");
+		return -1;
+	}
+
+	if (opt.limit == 0) {
+		fprintf(stderr, "\"limit\" is required.\n");
+		return -1;
+	}
+
+	addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
+	return 0;
+}
+
+static int rlim_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	struct tc_rlim_qopt *qopt;
+	SPRINT_BUF(b1);
+
+	if (opt == NULL)
+		return 0;
+
+	if (RTA_PAYLOAD(opt)  < sizeof(*qopt))
+		return -1;
+	qopt = RTA_DATA(opt);
+	fprintf(f, "limit %up rate %s", qopt->limit, sprint_rate(qopt->rate, b1));
+
+	return 0;
+}
+
+struct qdisc_util rlim_qdisc_util = {
+	.id = "rlim",
+	.parse_qopt = rlim_parse_opt,
+	.print_qopt = rlim_print_opt,
+};
+
-- 
1.5.3.4


[PATCH net-2.6.25] qdisc: new rate limiter

2007-12-07 Thread Stephen Hemminger

This is a time-based rate limiter for use in network testing. When doing
network tests it is often useful to test at reduced bandwidths. The existing
Token Bucket Filter provides rate control, but it produces bursty traffic
whose performance can differ from that of a real-world link. Another
alternative is the PSPacer, but it depends on pause frames, which may also
cause issues.

The qdisc depends on high resolution timers and clocks, so it will probably
use more CPU than other qdiscs, making it a poor choice for QoS traffic
shaping.
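
The deadline idea described above can be sketched in user-space C. This is a rough illustration under stated assumptions, not the qdisc itself: struct pacer, cost_from_rate() and pacer_try_send() are hypothetical names, the way the cost is derived from a bits-per-second rate is assumed, and the deadline is advanced from the current time, which may differ from what the patch does. Only the 1/64 ns scaling (NSEC_SCALE) and the cost * length arithmetic follow the patch text.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_SCALE	6		/* scaled math: 1/64 ns resolution */
#define NSEC_PER_SEC	1000000000ULL

/* Hypothetical state for one paced flow (times in nanoseconds). */
struct pacer {
	uint64_t next_send_ns;	/* earliest time the next packet may leave */
	uint64_t cost;		/* nanoseconds per byte, scaled by 64 */
};

/* ASSUMPTION: cost derived from a rate in bits/sec; rate must be non-zero. */
uint64_t cost_from_rate(uint32_t bits_per_sec)
{
	return ((8ULL * NSEC_PER_SEC) << NSEC_SCALE) / bits_per_sec;
}

/* Transmission time of len bytes, in nanoseconds. */
uint64_t pkt_time_ns(uint64_t cost, uint32_t len)
{
	return (cost * len) >> NSEC_SCALE;
}

/*
 * Returns 1 and pushes the deadline forward if a packet of len bytes
 * may be sent at time now_ns; returns 0 if the sender is throttled
 * (a real qdisc would arm a timer for next_send_ns here).
 */
int pacer_try_send(struct pacer *p, uint64_t now_ns, uint32_t len)
{
	if (now_ns < p->next_send_ns)
		return 0;
	p->next_send_ns = now_ns + pkt_time_ns(p->cost, len);
	return 1;
}
```

At 8 Mbit/s a 1500-byte packet costs 1.5 ms in this model, so a second packet offered before that deadline is held back rather than sent in a burst.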

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/include/linux/pkt_sched.h 2007-10-30 09:18:29.0 -0700
+++ b/include/linux/pkt_sched.h 2007-12-07 13:37:50.0 -0800
@@ -475,4 +475,10 @@ struct tc_netem_corrupt
 
 #define NETEM_DIST_SCALE   8192
 
+struct tc_rlim_qopt
+{
+   __u32   limit;  /* fifo limit (packets) */
+   __u32   rate;   /* bits per sec */
+};
+
 #endif
--- a/net/sched/Kconfig 2007-12-07 13:37:25.0 -0800
+++ b/net/sched/Kconfig 2007-12-07 13:37:50.0 -0800
@@ -196,6 +196,19 @@ config NET_SCH_NETEM
 
  If unsure, say N.
 
+config NET_SCH_RLIM
+	tristate "Network Rate Limiter"
+	---help---
+	  Say Y here if you want to use timer based network rate limiter
+	  algorithm.
+
+	  See the top of <file:net/sched/sch_rlim.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_rlim.
+
+	  If unsure, say N.
+
 config NET_SCH_INGRESS
tristate Ingress Qdisc
---help---
--- a/net/sched/Makefile2007-10-30 09:18:30.0 -0700
+++ b/net/sched/Makefile2007-12-07 13:37:50.0 -0800
@@ -28,6 +28,7 @@ obj-$(CONFIG_NET_SCH_TEQL)+= sch_teql.o
 obj-$(CONFIG_NET_SCH_PRIO) += sch_prio.o
 obj-$(CONFIG_NET_SCH_ATM)  += sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)+= sch_netem.o
+obj-$(CONFIG_NET_SCH_RLIM) += sch_rlim.o
 obj-$(CONFIG_NET_CLS_U32)  += cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)   += cls_route.o
 obj-$(CONFIG_NET_CLS_FW)   += cls_fw.o
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/net/sched/sch_rlim.c  2007-12-07 16:22:10.0 -0800
@@ -0,0 +1,350 @@
+/*
+ * net/sched/sch_rate.cTimer based rate control
+ *
+ * Copyright (c) 2007 Stephen Hemminger [EMAIL PROTECTED]
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <asm/div64.h>
+
+/* Simple Rate control
+
+   Algorithm used in NISTnet and others.
+   Logically similar to Token Bucket, but more real time and less lumpy.
+
+   A packet is not allowed to be dequeued until after the deadline.
+   Each packet dequeued increases the deadline by rate * size.
+
+   If qdisc throttles, it starts a timer, which will wake it up
+   when it is ready to transmit. This scheduler works much better
+   if high resolution timers are available.
+
+   Like classful TBF, limit is just kept for backwards compatibility.
+   It is passed to the default pfifo qdisc - if the inner qdisc is
+   changed the limit is not effective anymore.
+
+*/
+
+/* Use scaled math to get 1/64 ns resolution */
+#define NSEC_SCALE 6
+
+struct rlim_sched_data {
+   ktime_t next_send;  /* next scheduled departure */
+   u64 cost;   /* nsec/byte * 64 */
+   u32 limit;  /* upper bound on fifo (packets) */
+
+   struct Qdisc *qdisc;/* Inner qdisc, default - bfifo queue */
+   struct qdisc_watchdog watchdog;
+};
+
+static int rlim_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct rlim_sched_data *q = qdisc_priv(sch);
+	int ret;
+
+	ret = q->qdisc->enqueue(skb, q->qdisc);
+	if (ret)
+		sch->qstats.drops++;
+	else {
+		sch->q.qlen++;
+		sch->bstats.bytes += skb->len;
+		sch->bstats.packets++;
+	}
+
+	return ret;
+}
+
+
+static u64 pkt_time(const struct rlim_sched_data *q,
+		    const struct sk_buff *skb)
+{
+	return (q->cost * skb->len) >> NSEC_SCALE;
+}
+
+static unsigned int rlim_drop(struct Qdisc *sch)
+{
+	struct rlim_sched_data *q = qdisc_priv(sch);
+	unsigned int len = 0;
+
+	if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) {
+		sch->q.qlen--;
+		sch->qstats.drops++;
+	}
+
+	return len;
+}
+
+static struct sk_buff *rlim_dequeue(struct Qdisc *sch)
+{
+	struct rlim_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	ktime_t now = ktime_get();
+
+	/* if haven't reached the correct time slot, start timer */
+	if (now.tv64 < q->next_send.tv64) {
+		sch->flags |= TCQ_F_THROTTLED;
+		hrtimer_start(&q->watchdog.timer,

Re: [PATCH 2/2] cxgb3 - Parity initialization for T3C adapters

2007-12-07 Thread Divy Le Ray


Jeff Garzik wrote:


Divy Le Ray wrote:
 Jeff Garzik wrote:
 Divy Le Ray wrote:
 From: Divy Le Ray [EMAIL PROTECTED]

 Add parity initialization for T3C adapters.

 Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
 ---

  drivers/net/cxgb3/adapter.h   |1
  drivers/net/cxgb3/cxgb3_main.c|   82 
  drivers/net/cxgb3/cxgb3_offload.c |   15 ++
  drivers/net/cxgb3/regs.h  |  248
 +
  drivers/net/cxgb3/sge.c   |   24 +++-
  drivers/net/cxgb3/t3_hw.c |  131 +---
  6 files changed, 472 insertions(+), 29 deletions(-)

 dropped patches 2-3, did not apply



 Hi Jeff,

 I noticed that you applied the first patch of this three-patch series to
 the #upstream-fixes branch.
 These patches are intended for the #upstream (2.6.25) branch, as they are
 built on top of the last 10 patches committed: 9 from me, plus the
 whitespace cleanup (thanks!).
 Maybe this is why they did not apply.

Ah... you need to tell me these things.  I looked for a kernel version
in your messages but did not see one.

I had put it in the introduction mail, but I should also have added the
kernel version to the patch titles.

I'll do so from now on.



Does the patch #1 need to be reverted for 2.6.24?


No, it can be applied to 2.6.24.
The next 2 patches seem to apply cleanly on #upstream when patch #1 is
popped off the patch stack.




 On this topic, I have a question: how do I get to see all the netdev-2.6
 branches ?


git fetch -f $NETDEV_URL upstream:upstream

copies the latest upstream branch from netdev-2.6.git, and stores it as
your local upstream branch.

You may do the same for #upstream-fixes too.



That did it.
Thanks a lot!

Cheers,
Divy

Re: [PATCH 2/2] cxgb3 - Parity initialization for T3C adapters

2007-12-07 Thread Divy Le Ray


Jeff Garzik wrote:

Divy Le Ray wrote:

From: Divy Le Ray [EMAIL PROTECTED]

Add parity initialization for T3C adapters.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/cxgb3/adapter.h   |1 
 drivers/net/cxgb3/cxgb3_main.c|   82 

 drivers/net/cxgb3/cxgb3_offload.c |   15 ++
 drivers/net/cxgb3/regs.h  |  248 
+

 drivers/net/cxgb3/sge.c   |   24 +++-
 drivers/net/cxgb3/t3_hw.c |  131 +---
 6 files changed, 472 insertions(+), 29 deletions(-)


dropped patches 2-3, did not apply




Hi Jeff,

I noticed that you applied the first patch of this three-patch series to the
#upstream-fixes branch.
These patches are intended for the #upstream (2.6.25) branch, as they are
built on top of the last 10 patches committed: 9 from me, plus the
whitespace cleanup (thanks!).

Maybe this is why they did not apply.

On this topic, I have a question: how do I get to see all the netdev-2.6
branches?
After cloning a fresh netdev-2.6 tree, 'git branch' shows only the
master branch:


bash-3.1$ git --version
git version 1.5.3.rc4.29.g74276-dirty
-bash-3.1$ stg clone git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git netdev-2.6-fresh
Cloning git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git into netdev-2.6-fresh...

Initialized empty Git repository in /opt/sources/netdev-2.6-fresh/.git/
remote: Generating pack...
remote: Counting objects: 620879
Done counting 633562 objects.
remote: Deltifying 633562 objects...
remote:  100% (633562/633562) done
Indexing 633562 objects...
remote: Total 633562 (delta 517968), reused 594305 (delta 478716)
100% (633562/633562) done
Resolving 517968 deltas...
100% (517968/517968) done
Checking 23058 files out...
100% (23058/23058) done
done
-bash-3.1$ cd netdev-2.6-fresh/
-bash-3.1$ git branch
* master

Cheers,
Divy



Re: [PATCH] s2io: fix inconsistent hardware VLAN tagging during driver init

2007-12-07 Thread Jeff Garzik


Ramkrishna Vepa wrote:

Jeff,
This patch looks good. Please accept.

Ram

-Original Message-
From: Andy Gospodarek [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 06, 2007 11:57 AM
To: netdev@vger.kernel.org
Cc: [EMAIL PROTECTED]; Rastapur Santosh; Sivakumar Subramani;
Sreenivasa Honnur
Subject: [PATCH] s2io: fix inconsistent hardware VLAN tagging during
driver init



queued for my next patch run...




Re: [PATCH] XFRM: RFC4303 compliant auditing

2007-12-07 Thread Paul Moore

On Friday 07 December 2007 3:52:31 pm Eric Paris wrote:
 On Fri, 2007-12-07 at 14:57 -0500, Paul Moore wrote:
  NOTE: This really is an RFC patch, it compiles and boots but that is
  pretty much all I can promise at this point.  I'm posting this patch to
  gather feedback from the audit crowd about the continued overloading of
  the AUDIT_MAC_IPSEC_EVENT message type - continue to use it or create a
  new audit message type?  Of course any other comments people may have are
  always welcome.

 I'm all for continuing to use it, but I feel like the op= strings should
 probably all get collected up in one place to ease maintenance in the
 future.  It might not matter, but it's nice to be able to look in only one
 place in the code to find all of the possible op= values.

Agreed.  I punted on doing anything here for two main reasons:

1. It makes sense to do this in the xfrm_audit_start() function which I 
couldn't use here without some overhaul ...
2. ... I didn't want to overhaul anything if I was going to end up using 
separate message types.

If we decide to go with a single audit message type (kinda sounds like it) 
I'll fix this up in the next version.

 The one advantage to multiple messages is the ability to exclude and not
 audit certain things.  How often will these extra messages actually pop
 out of a system?  Enough that people would likely still care about some
 of them but decide they don't want others?  I don't know this stuff, so
 tell me how often would any of these show up?

Bingo, this is the whole reason why I was wondering about a different message 
type.  Currently only SAD and SPD changes are audited, and only because they 
affect the security labels that are assigned to packets as they are 
imported/exported out of the system (look at the LSPP requirements for 
auditing the import and export of data).  These new audit messages apply to 
individual packets and/or a particular SA and have nothing to do with 
security labels; rather, they indicate error conditions found during normal 
IPsec processing.  It would be difficult to think of all of the particular 
cases where these error conditions could occur, but in general I would say 
that these audit messages should not be common.

The only reason for creating a separate audit message type for these packet/SA 
messages would be to meet an RFC requirement that states that the 
implementation MUST allow the administrator to enable and disable ESP 
auditing.  Now, we can probably say we fulfill that requirement regardless, 
but more message types allow us greater granularity and flexibility ...

-- 
paul moore
linux security @ hp

RE: [PATCH] s2io: fix inconsistent hardware VLAN tagging during driver init

2007-12-07 Thread Ramkrishna Vepa

Jeff,
This patch looks good. Please accept.

Ram
 -Original Message-
 From: Andy Gospodarek [mailto:[EMAIL PROTECTED]
 Sent: Thursday, December 06, 2007 11:57 AM
 To: netdev@vger.kernel.org
 Cc: [EMAIL PROTECTED]; Rastapur Santosh; Sivakumar Subramani;
 Sreenivasa Honnur
 Subject: [PATCH] s2io: fix inconsistent hardware VLAN tagging during
 driver init

 The s2io driver keeps a local variable around (vlan_strip_flag) to
keep
 track of the current state of the hardware and whether or not it will
 strip VLAN tags on incoming packets.  It seems as though the hardware
 default is to strip them, but that variable is not set correctly
during
 initialization if the default setup is used.  This check ensures
 vlan_strip_flag and the hardware setting are in sync.

 These variables were introduced by this patch:

 commit 926930b202d56c3dfb6aea0a0c6bfba2b87a8c03
 Author: Sivakumar Subramani [EMAIL PROTECTED]
 Date:   Sat Feb 24 01:59:39 2007 -0500

 so this problem hasn't been around forever.

 Recent patches from Ramkrishna Vepa [EMAIL PROTECTED] removed
this
 variable and would have worked around the problem, but they were not
 accepted.

 Signed-off-by: Andy Gospodarek [EMAIL PROTECTED]

 ---

  s2io.c |5 +
  1 files changed, 5 insertions(+)

 diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
 index 8b9f0ea..08c08de 100644
 --- a/drivers/net/s2io.c
 +++ b/drivers/net/s2io.c
 @@ -2151,6 +2151,11 @@ static int start_nic(struct s2io_nic *nic)
  		val64 &= ~RX_PA_CFG_STRIP_VLAN_TAG;
  		writeq(val64, &bar0->rx_pa_cfg);
  		vlan_strip_flag = 0;
 +	} else {
 +		val64 = readq(&bar0->rx_pa_cfg);
 +		val64 |= RX_PA_CFG_STRIP_VLAN_TAG;
 +		writeq(val64, &bar0->rx_pa_cfg);
 +		vlan_strip_flag = 1;
  	}

   /*

Re: [PATCH] XFRM: RFC4303 compliant auditing

2007-12-07 Thread Eric Paris


On Fri, 2007-12-07 at 14:57 -0500, Paul Moore wrote:
 NOTE: This really is an RFC patch, it compiles and boots but that is pretty
   much all I can promise at this point.  I'm posting this patch to gather
   feedback from the audit crowd about the continued overloading of
   the AUDIT_MAC_IPSEC_EVENT message type - continue to use it or create
   a new audit message type?  Of course any other comments people may have
   are always welcome.

I'm all for continuing to use it, but I feel like the op= strings should
probably all get collected up in one place to ease maintenance in the
future.  It might not matter, but it's nice to be able to look in only one
place in the code to find all of the possible op= values.

The one advantage to multiple messages is the ability to exclude and not
audit certain things.  How often will these extra messages actually pop
out of a system?  Enough that people would likely still care about some
of them but decide they don't want others?  I don't know this stuff, so
tell me how often would any of these show up?

-Eric

 
 This patch adds a number of new IPsec audit events to meet the auditing
 requirements of RFC4303.  This includes audit hooks for the following events:
 
  * Could not find a valid SA [sections 2.1, 3.4.2]
. xfrm_audit_state_notfound()
. xfrm_audit_state_notfound_simple()
 
  * Sequence number overflow [section 3.3.3]
. xfrm_audit_state_replay_overflow()
 
  * Replayed packet [section 3.4.3]
. xfrm_audit_state_replay()
 
  * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
. xfrm_audit_state_icvfail()
 
 While RFC4303 deals only with ESP, most of the changes in this patch apply to
 IPsec in general, i.e. both AH and ESP.  In the one case, integrity check
 failure, where ESP-specific code had to be modified, the same was done to the
 AH code for the sake of consistency.
 ---
 
  include/net/xfrm.h |   14 
  net/ipv4/ah4.c |1 
  net/ipv4/esp4.c|1 
  net/ipv4/xfrm4_input.c |6 +-
  net/ipv6/ah6.c |1 
  net/ipv6/esp6.c|1 
  net/ipv6/xfrm6_input.c |   10 ++-
  net/xfrm/xfrm_output.c |2 +
  net/xfrm/xfrm_state.c  |  155 
 ++--
  9 files changed, 177 insertions(+), 14 deletions(-)
 
 diff --git a/include/net/xfrm.h b/include/net/xfrm.h
 index c02e230..85ce8c1 100644
 --- a/include/net/xfrm.h
 +++ b/include/net/xfrm.h
 @@ -492,11 +492,22 @@ extern void xfrm_audit_state_add(struct xfrm_state *x, 
 int result,
u32 auid, u32 secid);
  extern void xfrm_audit_state_delete(struct xfrm_state *x, int result,
   u32 auid, u32 secid);
 +extern void xfrm_audit_state_replay_overflow(struct xfrm_state *x,
 +  struct sk_buff *skb);
 +extern void xfrm_audit_state_notfound_simple(struct sk_buff *skb, u16 
 family);
 +extern void xfrm_audit_state_notfound(struct sk_buff *skb, u16 family,
 +   __be32 net_spi, __be32 net_seq);
 +extern void xfrm_audit_state_icvfail(struct xfrm_state *x,
 +  struct sk_buff *skb, u8 proto);
  #else
  #define xfrm_audit_policy_add(x, r, a, s)do { ; } while (0)
  #define xfrm_audit_policy_delete(x, r, a, s) do { ; } while (0)
  #define xfrm_audit_state_add(x, r, a, s) do { ; } while (0)
  #define xfrm_audit_state_delete(x, r, a, s)  do { ; } while (0)
 +#define xfrm_audit_state_replay_overflow(x, s)   do { ; } while (0)
 +#define xfrm_audit_state_notfound_simple(s, f)   do { ; } while (0)
 +#define xfrm_audit_state_notfound(s, f, sp, sq)  do { ; } while (0)
 +#define xfrm_audit_state_icvfail(x, s, p)do { ; } while (0)
  #endif /* CONFIG_AUDITSYSCALL */
  
  static inline void xfrm_pol_hold(struct xfrm_policy *policy)
 @@ -1045,7 +1056,8 @@ extern int xfrm_state_delete(struct xfrm_state *x);
  extern int xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info);
  extern void xfrm_sad_getinfo(struct xfrmk_sadinfo *si);
  extern void xfrm_spd_getinfo(struct xfrmk_spdinfo *si);
 -extern int xfrm_replay_check(struct xfrm_state *x, __be32 seq);
 +extern int xfrm_replay_check(struct xfrm_state *x,
 +  struct sk_buff *skb, __be32 seq);
  extern void xfrm_replay_advance(struct xfrm_state *x, __be32 seq);
  extern void xfrm_replay_notify(struct xfrm_state *x, int event);
  extern int xfrm_state_mtu(struct xfrm_state *x, int mtu);
 diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
 index 5fc346d..8eb19c9 100644
 --- a/net/ipv4/ah4.c
 +++ b/net/ipv4/ah4.c
 @@ -180,6 +180,7 @@ static int ah_input(struct xfrm_state *x, struct sk_buff 
 *skb)
   err = -EINVAL;
  		if (memcmp(ahp->work_icv, auth_data, ahp->icv_trunc_len)) {
  			x->stats.integrity_failed++;
 + xfrm_audit_state_icvfail(x, skb, IPPROTO_AH);
   goto out;
   }
   }
 diff --git

[git patches] net driver fixes

2007-12-07 Thread Jeff Garzik

Nothing remarkable.  Mainly bonding fixes and bringing ibm_newemac up to
snuff.


Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git 
upstream-linus

to receive the following updates:

 Documentation/networking/bonding.txt |   29 -
 arch/powerpc/boot/dts/sequoia.dts|5 ++
 drivers/net/Kconfig  |1 +
 drivers/net/bonding/bond_main.c  |  116 +-
 drivers/net/bonding/bond_sysfs.c |   94 +---
 drivers/net/bonding/bonding.h|4 +-
 drivers/net/cxgb3/regs.h |   27 -
 drivers/net/cxgb3/t3_hw.c|6 +-
 drivers/net/cxgb3/xgmac.c|   44 +-
 drivers/net/e100.c   |6 +-
 drivers/net/e1000/e1000_ethtool.c|2 +-
 drivers/net/e1000e/ethtool.c |2 +-
 drivers/net/ibm_newemac/core.c   |   56 +++-
 drivers/net/ibm_newemac/core.h   |   11 +++-
 drivers/net/ibm_newemac/debug.c  |5 ++
 drivers/net/ibm_newemac/debug.h  |5 ++
 drivers/net/ibm_newemac/emac.h   |5 ++
 drivers/net/ibm_newemac/mal.c|5 ++
 drivers/net/ibm_newemac/mal.h|5 ++
 drivers/net/ibm_newemac/phy.c|   81 +++
 drivers/net/ibm_newemac/phy.h|5 ++
 drivers/net/ibm_newemac/rgmii.c  |   25 +---
 drivers/net/ibm_newemac/rgmii.h  |   10 +++-
 drivers/net/ibm_newemac/tah.c|8 ++-
 drivers/net/ibm_newemac/tah.h|5 ++
 drivers/net/ibm_newemac/zmii.c   |9 +++-
 drivers/net/ibm_newemac/zmii.h   |5 ++
 drivers/net/s2io-regs.h  |1 +
 drivers/net/s2io.c   |   16 +-
 include/linux/if_bonding.h   |3 +-
 30 files changed, 423 insertions(+), 173 deletions(-)

Auke Kok (1):
  e100: cleanup unneeded math

Benjamin Herrenschmidt (5):
  ibm_newemac: Fix ZMII refcounting bug
  ibm_newemac: Workaround reset timeout when no link
  ibm_newemac: Cleanup/Fix RGMII MDIO support detection
  ibm_newemac: Cleanup/fix support for STACR register variants
  ibm_newemac: Update file headers copyright notices

David Sterba (1):
  bonding: Fix time comparison

Divy Le Ray (1):
  cxgb3 - T3C support update

Eliezer Tamir (1):
  make bnx2x select ZLIB_INFLATE

Hugh Blemings (1):
  ibm_newemac: Skip EMACs that are marked unused by the firmware

Jay Vosburgh (2):
  bonding: Add new layer2+3 hash for xor/802.3ad modes
  bonding: Fix race at module unload

Roel Kluin (1):
  e1000: fix memcpy in e1000_get_strings

Sreenivasa Honnur (1):
  S2io: Check for register initialization completion before accesing device 
registers

Stefan Roese (2):
  ibm_newemac: Add BCM5248 and Marvell 88E1111 PHY support
  ibm_newemac: Add ET1011c PHY support

Valentine Barshak (3):
  ibm_newemac: Correct opb_bus_freq value
  ibm_newemac: Fix typo reading TAH channel info
  ibm_newemac: Call dev_set_drvdata() before tah_reset()

Wagner Ferenc (5):
  bonding: Remove trailing NULs from sysfs interface.
  bonding: Return nothing for not applicable values
  bonding: Purely cosmetic: rename a local variable
  bonding: Coding style: break line after the if condition
  bonding: Allow setting and querying xmit policy regardless of mode

diff --git a/Documentation/networking/bonding.txt 
b/Documentation/networking/bonding.txt
index 1134062..6cc30e0 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -554,6 +554,30 @@ xmit_hash_policy
 
This algorithm is 802.3ad compliant.
 
+   layer2+3
+
+   This policy uses a combination of layer2 and layer3
+   protocol information to generate the hash.
+
+   Uses XOR of hardware MAC addresses and IP addresses to
+   generate the hash.  The formula is
+
+   (((source IP XOR dest IP) AND 0x) XOR
+   ( source MAC XOR destination MAC ))
+   modulo slave count
+
+   This algorithm will place all traffic to a particular
+   network peer on the same slave.  For non-IP traffic,
+   the formula is the same as for the layer2 transmit
+   hash policy.
+
+   This policy is intended to provide a more balanced
+   distribution of traffic than layer2 alone, especially
+   in environments where a layer3 gateway device is
+   required to reach most destinations.
+
+   This algorithm is 802.3ad complient.
+
layer3+4
 
This policy uses upper layer protocol information,
@@ -589,8 +613,9 @@ xmit_hash_policy
or may not tolerate this noncompliance.
 
The default value is layer2.  This option was added in bonding
-version 2.6.3.  In earlier
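
The layer2+3 formula quoted in the bonding.txt hunk above can be sketched as a small C function. This is an illustration only, not the bonding driver's code: l23_hash() and L3_HASH_MASK are hypothetical names, the mask value is an assumption because it is cut off in the text above, and folding each MAC address down to its last byte is likewise assumed. IP addresses are taken in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: the mask in the documentation text is truncated ("0x");
 * 0xffff is used here purely for illustration. */
#define L3_HASH_MASK 0xffffu

/*
 * Hypothetical layer2+3 transmit hash:
 *   (((src IP XOR dst IP) AND mask) XOR (src MAC XOR dst MAC))
 *       modulo slave count
 * slave_count must be non-zero.
 */
unsigned int l23_hash(uint32_t src_ip, uint32_t dst_ip,
		      const uint8_t src_mac[6], const uint8_t dst_mac[6],
		      unsigned int slave_count)
{
	uint32_t l3 = (src_ip ^ dst_ip) & L3_HASH_MASK;
	/* ASSUMPTION: only the last byte of each MAC takes part */
	uint32_t l2 = (uint32_t)(src_mac[5] ^ dst_mac[5]);

	return (l3 ^ l2) % slave_count;
}
```

Because XOR is symmetric, swapping source and destination yields the same slave, which matches the statement that all traffic to a particular network peer stays on one slave.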

Re: [PATCH 2.6.24 1/1]S2io: Check for register initialization completion before accesing device registers

2007-12-07 Thread Jeff Garzik


Sreenivasa Honnur wrote:

- Making sure register initialisation is complete before proceeding further.
  The driver must wait until initialization is complete before attempting to 
  access any other device registers.


Signed-off-by: Surjit Reang [EMAIL PROTECTED]
Signed-off-by: Sreenivasa Honnur [EMAIL PROTECTED]


applied #upstream-fixes



Re: [PATCH 1/11] ibm_newemac: Add BCM5248 and Marvell 88E1111 PHY support

2007-12-07 Thread Jeff Garzik


Benjamin Herrenschmidt wrote:

From: Stefan Roese [EMAIL PROTECTED]

This patch adds BCM5248 and Marvell 88E1111 PHY support to NEW EMAC driver.
These PHY chips are used on PowerPC 440EPx boards.
The PHY code is based on the previous work by Stefan Roese [EMAIL PROTECTED]

Signed-off-by: Stefan Roese [EMAIL PROTECTED]
Signed-off-by: Valentine Barshak [EMAIL PROTECTED]
Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
---

 drivers/net/ibm_newemac/phy.c |   39 +++
 1 file changed, 39 insertions(+)


applied 1-11 #upstream-fixes



Re: [PATCH 1/2] e1000: fix memcpy in e1000_get_strings

2007-12-07 Thread Jeff Garzik


Auke Kok wrote:

From: Roel Kluin [EMAIL PROTECTED]

drivers/net/e1000/e1000_ethtool.c:113:
#define E1000_TEST_LEN sizeof(e1000_gstrings_test) / ETH_GSTRING_LEN

drivers/net/e1000e/ethtool.c:106:
#define E1000_TEST_LEN sizeof(e1000_gstrings_test) / ETH_GSTRING_LEN

E1000_TEST_LEN*ETH_GSTRING_LEN will expand to
sizeof(e1000_gstrings_test) / (ETH_GSTRING_LEN * ETH_GSTRING_LEN)

A lack of parentheses around defines causes unexpected results due to
operator precedences.
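
The general hazard can be shown with a small stand-alone sketch. The names here (GSTRING_LEN, test_names, TEST_LEN_BAD, TEST_LEN_GOOD) are hypothetical stand-ins, not the driver's identifiers. Whether a particular use of an unparenthesized macro misbehaves depends on the operators next to it; dividing by the macro is one case that clearly does.

```c
#include <assert.h>
#include <stddef.h>

#define GSTRING_LEN 32	/* stand-in for ETH_GSTRING_LEN */

/* four fixed-width strings, 4 * 32 = 128 bytes in total */
static const char test_names[4][GSTRING_LEN] = {
	"Register test", "Eeprom test", "Interrupt test", "Loopback test",
};

/* Unparenthesized: the division leaks into the surrounding expression. */
#define TEST_LEN_BAD	sizeof(test_names) / GSTRING_LEN
/* Parenthesized: always evaluates to the element count (4). */
#define TEST_LEN_GOOD	(sizeof(test_names) / GSTRING_LEN)

/* Expands to total / 128 / 32, not total / 4. */
size_t per_test_bad(size_t total)
{
	return total / TEST_LEN_BAD;
}

/* Expands to total / (128 / 32), i.e. total / 4. */
size_t per_test_good(size_t total)
{
	return total / TEST_LEN_GOOD;
}
```

Wrapping the whole macro body in parentheses makes the result independent of where the macro is used.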

Signed-off-by: Roel Kluin [EMAIL PROTECTED]
Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/e1000/e1000_ethtool.c |2 +-
 drivers/net/e1000e/ethtool.c  |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


applied 1-2 to #upstream-fixes



Re: [PATCH 2/2] cxgb3 - Parity initialization for T3C adapters

2007-12-07 Thread Jeff Garzik


Divy Le Ray wrote:

From: Divy Le Ray [EMAIL PROTECTED]

Add parity initialization for T3C adapters.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/cxgb3/adapter.h       |    1 
 drivers/net/cxgb3/cxgb3_main.c    |   82 
 drivers/net/cxgb3/cxgb3_offload.c |   15 ++
 drivers/net/cxgb3/regs.h          |  248 +
 drivers/net/cxgb3/sge.c           |   24 +++-
 drivers/net/cxgb3/t3_hw.c         |  131 +---
 6 files changed, 472 insertions(+), 29 deletions(-)


dropped patches 2-3, did not apply



Re: [PATCH 1/8] bonding: Remove trailing NULs from sysfs interface.

2007-12-07 Thread Jeff Garzik

Jay Vosburgh wrote:

From: Wagner Ferenc [EMAIL PROTECTED]

From: Wagner Ferenc [EMAIL PROTECTED]

Also remove trailing spaces from multivalued files.

This fixes output such as:

$ od -c /sys/class/net/bond0/bonding/slaves
000   e   t   h   -   l   e   f   t   e   t   h   -   r   i   g
020   h   t  \n  \0
025

It mostly entails deleting '+1'-s after sprintf() calls: the return value
of sprintf is the number of characters printed, without the closing NUL,
i.e. exactly what the sysfs interface requires.  The three multivalue
cases are different, because they also have to swallow back a trailing
space.

Signed-off-by: Ferenc Wagner [EMAIL PROTECTED]
Acked-by: Jay Vosburgh [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_sysfs.c |   66 +
 1 files changed, 30 insertions(+), 36 deletions(-)

applied 1-8 to #upstream-fixes

Your script is duplicating the From: line.
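
The sprintf() point from the quoted commit message can be sketched in plain userspace C. This is a rough illustration, not the bonding driver's code: show_slaves() is a hypothetical stand-in for a sysfs show routine, and the buffer handling is simplified.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for a multivalue sysfs show() routine: write a
 * space-separated list terminated by '\n' into buf and return the number
 * of bytes that make up the file contents.
 */
static int show_slaves(char *buf, const char *const *names, int n)
{
	int res = 0;
	int i;

	for (i = 0; i < n; i++)
		res += sprintf(buf + res, "%s ", names[i]);

	if (res)
		res--;		/* swallow back the trailing space */
	res += sprintf(buf + res, "\n");

	/* sprintf() already excludes the closing NUL, so no "+ 1" here. */
	return res;
}
```

Returning res + 1 instead would hand the trailing NUL to the reader, which is the kind of stray \0 visible in the od output quoted above.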


[PATCH] XFRM: RFC4303 compliant auditing

2007-12-07 Thread Paul Moore

NOTE: This really is an RFC patch, it compiles and boots but that is pretty
  much all I can promise at this point.  I'm posting this patch to gather
  feedback from the audit crowd about the continued overloading of
  the AUDIT_MAC_IPSEC_EVENT message type - continue to use it or create
  a new audit message type?  Of course any other comments people may have
  are always welcome.

This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303.  This includes audit hooks for the following events:

 * Could not find a valid SA [sections 2.1, 3.4.2]
   . xfrm_audit_state_notfound()
   . xfrm_audit_state_notfound_simple()

 * Sequence number overflow [section 3.3.3]
   . xfrm_audit_state_replay_overflow()

 * Replayed packet [section 3.4.3]
   . xfrm_audit_state_replay()

 * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
   . xfrm_audit_state_icvfail()

While RFC4303 deals only with ESP, most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP.  In the one case where ESP specific
code had to be modified, integrity check failure, the same change was made to
the AH code for the sake of consistency.
---

 include/net/xfrm.h |   14 
 net/ipv4/ah4.c |1 
 net/ipv4/esp4.c|1 
 net/ipv4/xfrm4_input.c |6 +-
 net/ipv6/ah6.c |1 
 net/ipv6/esp6.c|1 
 net/ipv6/xfrm6_input.c |   10 ++-
 net/xfrm/xfrm_output.c |2 +
 net/xfrm/xfrm_state.c  |  155 ++--
 9 files changed, 177 insertions(+), 14 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index c02e230..85ce8c1 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -492,11 +492,22 @@ extern void xfrm_audit_state_add(struct xfrm_state *x, int result,
 u32 auid, u32 secid);
 extern void xfrm_audit_state_delete(struct xfrm_state *x, int result,
u32 auid, u32 secid);
+extern void xfrm_audit_state_replay_overflow(struct xfrm_state *x,
+struct sk_buff *skb);
+extern void xfrm_audit_state_notfound_simple(struct sk_buff *skb, u16 family);
+extern void xfrm_audit_state_notfound(struct sk_buff *skb, u16 family,
+ __be32 net_spi, __be32 net_seq);
+extern void xfrm_audit_state_icvfail(struct xfrm_state *x,
+struct sk_buff *skb, u8 proto);
 #else
 #define xfrm_audit_policy_add(x, r, a, s)  do { ; } while (0)
 #define xfrm_audit_policy_delete(x, r, a, s)   do { ; } while (0)
 #define xfrm_audit_state_add(x, r, a, s)   do { ; } while (0)
 #define xfrm_audit_state_delete(x, r, a, s)do { ; } while (0)
+#define xfrm_audit_state_replay_overflow(x, s) do { ; } while (0)
+#define xfrm_audit_state_notfound_simple(s, f) do { ; } while (0)
+#define xfrm_audit_state_notfound(s, f, sp, sq)do { ; } while (0)
+#define xfrm_audit_state_icvfail(x, s, p)  do { ; } while (0)
 #endif /* CONFIG_AUDITSYSCALL */
 
 static inline void xfrm_pol_hold(struct xfrm_policy *policy)
@@ -1045,7 +1056,8 @@ extern int xfrm_state_delete(struct xfrm_state *x);
 extern int xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info);
 extern void xfrm_sad_getinfo(struct xfrmk_sadinfo *si);
 extern void xfrm_spd_getinfo(struct xfrmk_spdinfo *si);
-extern int xfrm_replay_check(struct xfrm_state *x, __be32 seq);
+extern int xfrm_replay_check(struct xfrm_state *x,
+struct sk_buff *skb, __be32 seq);
 extern void xfrm_replay_advance(struct xfrm_state *x, __be32 seq);
 extern void xfrm_replay_notify(struct xfrm_state *x, int event);
 extern int xfrm_state_mtu(struct xfrm_state *x, int mtu);
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 5fc346d..8eb19c9 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -180,6 +180,7 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb)
                err = -EINVAL;
                if (memcmp(ahp->work_icv, auth_data, ahp->icv_trunc_len)) {
                        x->stats.integrity_failed++;
+                       xfrm_audit_state_icvfail(x, skb, IPPROTO_AH);
                        goto out;
                }
        }
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index c31bccb..00ec285 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -183,6 +183,7 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
 
                if (unlikely(memcmp(esp->auth.work_icv, sum, alen))) {
                        x->stats.integrity_failed++;
+                       xfrm_audit_state_icvfail(x, skb, IPPROTO_ESP);
                        goto out;
                }
        }
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 5e95c8a..6d7be5e 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -56,8 +56,10 @@ int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
 
x =

[PATCH net-2.6.25] Remove unused devconf macros

2007-12-07 Thread Pavel Emelyanov

The SNMP_INC_STATS_OFFSET_BH is used only by ICMP6_INC_STATS_OFFSET_BH.
The ICMP6_INC_STATS_OFFSET_BH is unused.

Can we drop them?

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index a84f3f6..38df94b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -143,14 +143,6 @@ DECLARE_SNMP_STAT(struct icmpv6msg_mib, icmpv6msg_statistics);
 #define ICMP6_INC_STATS_BH(idev, field)   _DEVINC(icmpv6, _BH, idev, field)
 #define ICMP6_INC_STATS_USER(idev, field) _DEVINC(icmpv6, _USER, idev, field)
 
-#define ICMP6_INC_STATS_OFFSET_BH(idev, field, offset) ({                  \
-       struct inet6_dev *_idev = idev;                                     \
-       __typeof__(offset) _offset = (offset);                              \
-       if (likely(_idev != NULL))                                          \
-               SNMP_INC_STATS_OFFSET_BH(_idev->stats.icmpv6, field, _offset); \
-       SNMP_INC_STATS_OFFSET_BH(icmpv6_statistics, field, _offset);        \
-})
-
 #define ICMP6MSGOUT_INC_STATS(idev, field) \
_DEVINC(icmpv6msg, , idev, field +256)
 #define ICMP6MSGOUT_INC_STATS_BH(idev, field) \
diff --git a/include/net/snmp.h b/include/net/snmp.h
index ea206bf..9c5793d 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -134,8 +134,6 @@ struct linux_mib {
 
 #define SNMP_INC_STATS_BH(mib, field)  \
        (per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field]++)
-#define SNMP_INC_STATS_OFFSET_BH(mib, field, offset)   \
-       (per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field + (offset)]++)
 #define SNMP_INC_STATS_USER(mib, field) \
        (per_cpu_ptr(mib[1], raw_smp_processor_id())->mibs[field]++)
 #define SNMP_INC_STATS(mib, field) \
-- 
1.5.3.4


[PATCH] XFRM: assorted IPsec fixups

2007-12-07 Thread Paul Moore

This patch fixes a number of small but potentially troublesome things in the
XFRM/IPsec code:

 * Use the 'audit_enabled' variable already in include/linux/audit.h
   Removed the need for extern declarations local to each XFRM audit function

 * Convert 'sid' to 'secid'
   The 'sid' name is specific to SELinux, 'secid' is the common naming
   convention used by the kernel when referring to tokenized LSM labels

 * Convert address display to use standard NIP* macros
   Similar to what was recently done with the SPD audit code, this also
   includes the removal of some unnecessary memcpy() calls

 * Move common code to xfrm_audit_common_stateinfo()
   Code consolidation from the "less is more" book on software development

 * Convert the SPI in audit records to host byte order
   The current SPI values in the audit record are being displayed in
   network byte order, probably not what was intended

 * Proper spacing around commas in function arguments
   Minor style tweak since I was already touching the code
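
The byte-order item in the list above can be illustrated with a small userspace sketch. format_spi() and the "spi=%u(0x%x)" layout are assumptions made for the example and are not taken from the patch; the only point is that an SPI carried in network byte order needs ntohl() before it is printed as a number.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format an SPI for a log record.  spi_net is in network byte order, as
 * it appears on the wire; ntohl() converts it to host order so the value
 * printed is the same on little-endian and big-endian machines.
 */
static int format_spi(char *buf, size_t len, uint32_t spi_net)
{
	unsigned int spi = ntohl(spi_net);

	return snprintf(buf, len, "spi=%u(0x%x)", spi, spi);
}
```

Printing spi_net directly would show 4096 as 1048576 on a little-endian host, which is the sort of value the item above calls "probably not what was intended".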

Signed-off-by: Paul Moore [EMAIL PROTECTED]
---

 include/linux/xfrm.h|2 +
 include/net/xfrm.h  |   18 ++--
 net/xfrm/xfrm_policy.c  |   15 +-
 net/xfrm/xfrm_state.c   |   69 +--
 security/selinux/xfrm.c |   20 +++---
 5 files changed, 58 insertions(+), 66 deletions(-)

diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h
index b58adc5..f75a337 100644
--- a/include/linux/xfrm.h
+++ b/include/linux/xfrm.h
@@ -31,7 +31,7 @@ struct xfrm_sec_ctx {
__u8ctx_doi;
__u8ctx_alg;
__u16   ctx_len;
-   __u32   ctx_sid;
+   __u32   ctx_secid;
charctx_str[0];
 };
 
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 58dfa82..c02e230 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -462,7 +462,7 @@ struct xfrm_audit
 };
 
 #ifdef CONFIG_AUDITSYSCALL
-static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid)
+static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 secid)
 {
struct audit_buffer *audit_buf = NULL;
char *secctx;
@@ -475,8 +475,8 @@ static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid)
 
        audit_log_format(audit_buf, "auid=%u", auid);
 
-       if (sid != 0 &&
-           security_secid_to_secctx(sid, &secctx, &secctx_len) == 0) {
+       if (secid != 0 &&
+           security_secid_to_secctx(secid, &secctx, &secctx_len) == 0) {
                audit_log_format(audit_buf, " subj=%s", secctx);
                security_release_secctx(secctx, secctx_len);
        } else
@@ -485,13 +485,13 @@ static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid)
 }
 
 extern void xfrm_audit_policy_add(struct xfrm_policy *xp, int result,
- u32 auid, u32 sid);
+ u32 auid, u32 secid);
 extern void xfrm_audit_policy_delete(struct xfrm_policy *xp, int result,
- u32 auid, u32 sid);
+ u32 auid, u32 secid);
 extern void xfrm_audit_state_add(struct xfrm_state *x, int result,
-u32 auid, u32 sid);
+u32 auid, u32 secid);
 extern void xfrm_audit_state_delete(struct xfrm_state *x, int result,
-   u32 auid, u32 sid);
+   u32 auid, u32 secid);
 #else
 #define xfrm_audit_policy_add(x, r, a, s)  do { ; } while (0)
 #define xfrm_audit_policy_delete(x, r, a, s)   do { ; } while (0)
@@ -621,13 +621,13 @@ extern int xfrm_selector_match(struct xfrm_selector *sel, struct flowi *fl,
 
 #ifdef CONFIG_SECURITY_NETWORK_XFRM
 /* If neither has a context -- match
- * Otherwise, both must have a context and the sids, doi, alg must match
+ * Otherwise, both must have a context and the secids, doi, alg must match
  */
 static inline int xfrm_sec_ctx_match(struct xfrm_sec_ctx *s1, struct xfrm_sec_ctx *s2)
 {
        return ((!s1 && !s2) ||
                (s1 && s2 &&
-                (s1->ctx_sid == s2->ctx_sid) &&
+                (s1->ctx_secid == s2->ctx_secid) &&
                 (s1->ctx_doi == s2->ctx_doi) &&
                 (s1->ctx_alg == s2->ctx_alg)));
 }
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index b702bd8..75f25c4 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -23,6 +23,7 @@
 #include <linux/netfilter.h>
 #include <linux/module.h>
 #include <linux/cache.h>
+#include <linux/audit.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
 
@@ -2150,15 +2151,14 @@ static inline void xfrm_audit_common_policyinfo(struct xfrm_policy *xp,
}
 }
 
-void
-xfrm_audit_policy_add(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
+void xfrm_audit_policy_add(struct xfrm_policy *xp, int result,
+  u32 auid, u32 secid)
 {
struct audit_buffer *audit_buf;
-   extern int audit_enabled;
 
if (audit_enabled == 0)

Re: TCP event tracking via netlink...

2007-12-07 Thread Ilpo Järvinen

On Thu, 6 Dec 2007, David Miller wrote:

 From: Ilpo_Järvinen [EMAIL PROTECTED]
 Date: Thu, 6 Dec 2007 01:18:28 +0200 (EET)

  On Wed, 5 Dec 2007, David Miller wrote:

   I assume you're using something like carefully crafted printk's,
   kprobes, or even ad-hoc statistic counters.  That's what I used to do
   :-)

  No, that's not at all what I do :-). I usually look at time-seq graphs,
  except for the cases when I just find things out by reading code (or
  by just thinking about it).

 Can you briefly detail what graph tools and command lines
 you are using?

I have a tool called Sealion, but it's behind an NDA (making it open source
has been talked about for a long time, but I have no idea why it hasn't
happened yet). It's mostly Tcl/Tk code and is by no means nice or clean in
design or quality (I'll leave the details of why I think so out of this
discussion :-)). It produces SVGs. Usually I have the things I need in
the standard sent+ACK+SACKs(+win) graph it produces. The result is quite
similar to what tcptrace+xplot produces, but the xplot UI is really horrible,
IMHO.

If I have to deal with tcpdump output only, it takes a considerable amount
of time doing computations with bc to come up with the same understanding
by just reading tcpdumps.

 The last time I did graphing to analyze things, the tools
 were hit-or-miss.

Yeah, this is definitely true. The open source graphing tools I know of are
really not that astonishing :-(. I've tried to look for better tools
as well but with little success.

  Much of the info is available in tcpdump already, it's just hard to read
  without graphing it first because there are so many overlapping things
  to track in two-dimensional space.

  ...But yes, I have to admit that a couple of problems come to my mind
  where having some variable from tcp_sock would have made the problem
  more obvious.

 The most important are the cwnd and ssthresh, which you could guess
 using graphs but it is important to know on a packet to packet
 basis why we might have sent a packet or not because this has
 rippling effects down the rest of the RTT.

A couple of points:

In order to evaluate the validity of some action, one might need more than
one packet from the history.

The answer to why we have sent a packet is rather simple (excluding RTOs):
cwnd > packets_in_flight and data was available. No, it's not at all
complicated. Though I might be too biased toward non-application-limited
cases, which make the formula even simpler because everything is basically
ACK clocked.
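
The send condition above can be written down as a tiny sketch, reading it as "cwnd > packets_in_flight and data was available". The struct and field names are hypothetical and are not the kernel's struct tcp_sock; RTO retransmissions are left out, as in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of sender state for one connection. */
struct snd_state {
	uint32_t cwnd;			/* congestion window, in packets */
	uint32_t packets_in_flight;	/* packets currently in the network */
	uint32_t queued;		/* packets the application has made available */
};

/* True when a new (non-RTO) transmission is allowed. */
static bool may_send(const struct snd_state *s)
{
	return s->cwnd > s->packets_in_flight && s->queued > 0;
}
```

In the non-application-limited case queued is always non-zero, so the decision reduces to the window comparison, which each arriving ACK re-opens.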

To really tell what caused changes in cwnd and/or packets_in_flight,
one usually needs some history or a more fine-grained approach; once per
packet is way too wide a gap. It tells just what happened, not why, unless
you're really familiar with the state machine and can make the right
guess.

  Not sure what the benefit of having it in distributions is, because
  those people hardly ever report problems here anyway; they're just too
  happy with TCP performance unless we print something to their logs,
  which implies that we must set up a *_ON() condition :-(.

 That may be true, but if we could integrate the information with
 tcpdumps, we could gather internal state using tools the user
 already has available.

It would definitely help if we could, but that of course depends on
getting the reports in the first place.

 Imagine if tcpdump printed out:

 02:26:14.865805 IP $SRC > $DEST: . 11226:12686(1460) ack 0 win 108
   ss_thresh: 129 cwnd: 133 packets_out: 132

 or something like that.

How about this:

02:26:14.865805 IP $SRC > $DEST: . ack 11226 win 108 ...sack 1 {15606:18526}
17066:18526 0->S sacktag_one l0 s1 r0 f4 pc1 ...
11226:12686  clean_rtx_queue ...
11226:12686 0->L mark_head_lost l1 s1 r0 f4 pc1 ...
12686:14146 0->L mark_head_lost l2 s1 r0 f4 pc1 ...
11226:12686 L->LRe retransmit_skb l2 s1 r1 f4 pc1 ...

...would make the bug in sack processing relatively obvious (yes, it
has an intentional flaw in it, points for finding it :-))... That would
be something I'd like to have right now.

 But sometimes the algorithms are working as designed, it's just that
 they provide poor pipe utilization and CWND analysis embedded inside
 of a tcpdump would be one way to see that as well as determine the
 flaw in the algorithm.

Fair enough.

 It is untested since I didn't write the userland app yet to see that
 proper things get logged.  Basically you could run a daemon that
 writes per-connection traces into files based upon the incoming
 netlink events.  Later, using the binary pcap file and these traces,
 you can piece together traces like the above using the timestamps
 etc. to match up pcap packets to ones from the TCP logger.

 The userland tools could do analysis and print pre-cooked state diff
 logs, like this ACK raised CWND by one or whatever else you wanted
 to know.

Obviously a collection of useful userland tools seems at least as
important here as the existence of the interface.

 It's nice that an expert

[PATCH] sky2: RX lockup fix

2007-12-07 Thread Stephen Hemminger

I'm using a Marvell 88E8062 on a custom PPC64 blade and ran into RX
lockups while validating the sky2 driver.  The receive MAC FIFO would
become stuck during testing with high traffic.  One port of the 88E8062
would lockup, while the other port remained functional.  Re-inserting
the sky2 module would not fix the problem - only a power cycle would.

I looked over Marvell's most recent sk98lin driver and it looks like
they had a workaround for the Yukon XL that the sky2 doesn't have yet.
The sk98lin driver disables the RX MAC FIFO flush feature for all
revisions of the Yukon XL.

According to skgeinit.c of the sk98lin driver, "Flushing must be enabled
(needed for ASF see dev. #4.29), but the flushing mask should be
disabled (see dev. #4.115)".  Nice. I implemented this same change in
the sky2 driver and verified that the RX lockup I was seeing was
resolved.

Signed-off-by: Peter Tyser [EMAIL PROTECTED]
Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
Original patch reformatted to remove line wrap.

--- a/drivers/net/sky2.c2007-12-06 09:39:12.0 -0800
+++ b/drivers/net/sky2.c2007-12-06 09:54:14.0 -0800
@@ -821,8 +821,13 @@ static void sky2_mac_init(struct sky2_hw
 
sky2_write32(hw, SK_REG(port, RX_GMF_CTRL_T), rx_reg);
 
-   /* Flush Rx MAC FIFO on any flow control or error */
-   sky2_write16(hw, SK_REG(port, RX_GMF_FL_MSK), GMR_FS_ANY_ERR);
+   if (hw->chip_id == CHIP_ID_YUKON_XL) {
+   /* Hardware errata - clear flush mask */
+   sky2_write16(hw, SK_REG(port, RX_GMF_FL_MSK), 0);
+   } else {
+   /* Flush Rx MAC FIFO on any flow control or error */
+   sky2_write16(hw, SK_REG(port, RX_GMF_FL_MSK), GMR_FS_ANY_ERR);
+   }
 
/* Set threshold to 0xa (64 bytes) + 1 to workaround pause bug  */
reg = RX_GMF_FL_THR_DEF + 1;

Re: [PATCH] XFRM: RFC4303 compliant auditing

2007-12-07 Thread Joy Latten

On Fri, 2007-12-07 at 16:06 -0500, Paul Moore wrote:
 On Friday 07 December 2007 3:52:31 pm Eric Paris wrote:
  On Fri, 2007-12-07 at 14:57 -0500, Paul Moore wrote:
   NOTE: This really is an RFC patch, it compiles and boots but that is
   pretty much all I can promise at this point.  I'm posting this patch to
   gather feedback from the audit crowd about the continued overloading of
   the AUDIT_MAC_IPSEC_EVENT message type - continue to use it or create a
   new audit message type?  Of course any other comments people may have are
   always welcome.
 
  I'm all for continuing to use it, but I feel like the op= strings should
  probably all get collected up in one place to ease maintenance in the
  future. It might not matter, but it's nice to be able to look in only one
  place in the code to find all of the possible op= strings.
 
 Agreed.  I punted on doing anything here for two main reasons:
 
 1. It makes sense to do this in the xfrm_audit_start() function which I 
 couldn't use here without some overhaul ...
 2. ... I didn't want to overhaul anything if I was going to end up using 
 separate message types.
 
 If we decide to go with a single audit message type (kinda sounds like it) 
 I'll fix this up in the next version.
 
  The one advantage to multiple messages is the ability to exclude and not
  audit certain things.  How often will these extra messages actually pop
  out of a system?  Enough that people would likely still care about some
  of them but decide they don't want others?  I don't know this stuff, so
  tell me how often would any of these show up?
 
 Bingo, this is the whole reason why I was wondering about a different message
 type.  Currently only SAD and SPD changes are audited and only because they
 affect the security labels that are assigned to packets as they are
 imported/exported out of the system (look at the LSPP requirements for
 auditing the import and export of data).  These new audit messages apply to
 individual packets and/or a particular SA and have nothing to do with
 security labels, rather they indicate error conditions found during normal
 IPsec processing.  It would be difficult to think of all of the particular
 cases where these error conditions occur, but in general I would say that
 these audit messages should not be common.
 

Yes, I agree. They should not happen often. Especially compared to LSPP
requirements of auditing whenever SA or SPD entries were added or
deleted, which are common events.

 The only reason for creating a separate audit message type for these
 packet/SA messages would be to meet an RFC requirement that states that the
 implementation MUST allow the administrator to enable and disable ESP
 auditing.  Now, we can probably say we fulfill that requirement regardless,
 but more message types allow us greater granularity and flexibility ...
 
Also, there is a great possibility of additional messages.
This is for RFC 4303, which is ESP. There are also audit messages
listed for RFC 4301 (IPsec architecture) and RFC 4302 (AH) that may
happen later.

[RFC] [PATCH] easier PBR for dynamic source tables (via multipath)

2007-12-07 Thread Brian S Julin


This is a first swat and not in final form.  I hope folks here will help vet
my thinking on it.

This fills in a missed niche in policy routing support.  It allows multipath
routes to select nexthop based on the source realm, inside the routing
decision step, immediately after RPF is performed.  It moves RPF before
multipath selection.

This would be for people wanting to do policy routing based on a table
injected by a dynamic routing protocol, e.g. quagga, rather than static rules.

The existing methods for achieving this effect are all a bit tacky for
various reasons:

1) iptables -m realm --realm X -j ROUTE in FORWARD,mangle
because ipt_ROUTE is not a well supported iptables target
and has started to get dropped from mainstream distros.  Maybe
for lack of maintenance, but perhaps it is intentionally
deprecated. (?)

2) tc route from ... action mirred egress redirect happens
too late in the packet processing to do much else to the
packet, like say edit the MAC addresses which remain what they
were on the original output dev.  Doing this is really an
abuse of the queueing system and involves setting up qdiscs in
weird ways when one may only want to route.

3) Userspace scripts to glue loading from a kernel routing table
to a pre-routing ipset, iptables -j MARK, then ip rule add fwmark
because the kernel then has to check the source address against two
tables rather than one, and they could get quite large.  Plus it's
hackery.

This patch is a raw proof-of-concept I put together to get
things working just enough to ensure that nothing blows up when
packets are routed this way.  As such it does a couple of distasteful
things and has a couple rough edges:

  Reuses the nh_weight field as the realm
  Does not allow normal load balancing to fully mix in
  ipv4 only
  forward only, no code for local/output route.
  probably will break ifndef CONFIG_NET_CLS_ROUTE

Were this general idea to be deemed worthy, and as long as limiting
sizeof(struct fib_nh) is not a major concern to any Linux routing
application, I could work up a more thorough/cleaner patch allowing
statistical multipath and SAD policy-routing multipath to play nicely
together.

I especially need comments on proper multipath RPF: The mainline code
only checks the selected path and if RPF fails it does not choose a
different one.  From this I assumed it is OK to do RPF on any old nexthop,
and we just assume the user won't or can't put any PR rule in that would gum
that up.  Otherwise both the mainline code and this code would have to
RPF multiple times, defeating the goal of good performance.  (Not to
mention that could get extra confusing when you are using the source
realm to choose.)  Please pay special attention to the spec_dest handling,
which should (?) be OK since this is forward-only.

Also to consider is what this means to multipath caching should that
make a comeback.

I've only tested this code lightly so far, just bouncing things around
to static arp maps on the same if.

After patching iproute2, just substitute weight X with byrealm X to
activate it.  Probably you want to avoid realm 0.  You should be able to
put catch-all nexthops in with weight X alongside the byrealm ones
but they do not interact statistically.  Comments on that syntax
also welcome.
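
To make the intended behaviour easier to discuss, here is one way the byrealm selection described above could look. It is a simplified userspace sketch with made-up types, not the code from the attached patch: struct nexthop stands in for struct fib_nh, NH_F_BYREALM for the RTNH_F_DSAD flag, and the fallback to the first ordinary nexthop is an assumption about how catch-all entries might behave.

```c
#include <stddef.h>
#include <stdint.h>

#define NH_F_BYREALM 0x8	/* stand-in for the RTNH_F_DSAD flag */

/* Hypothetical, simplified nexthop; not the kernel's struct fib_nh. */
struct nexthop {
	unsigned int flags;
	uint32_t weight;	/* holds the source realm when NH_F_BYREALM is set */
};

/*
 * Return the index of the nexthop whose realm equals src_realm, else the
 * first ordinary (weighted) nexthop as a catch-all, else -1.
 */
static int select_by_realm(const struct nexthop *nh, size_t n,
			   uint32_t src_realm)
{
	int fallback = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (nh[i].flags & NH_F_BYREALM) {
			if (nh[i].weight == src_realm)
				return (int)i;
		} else if (fallback < 0) {
			fallback = (int)i;
		}
	}
	return fallback;
}
```

Because the realm reuses the weight field, a byrealm nexthop cannot also carry a load-balancing weight in this sketch, which matches the rough edge listed earlier.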

Sorry about the attachments, no real MUAs available here that won't
corrupt tabs.
diff -r -U2 linux-source-2.6.23-dsc/include/linux/rtnetlink.h linux-source-2.6.23-dsc-dsad/include/linux/rtnetlink.h
--- linux-source-2.6.23-dsc/include/linux/rtnetlink.h   2007-10-09 16:31:38.0 -0400
+++ linux-source-2.6.23-dsc-dsad/include/linux/rtnetlink.h  2007-12-06 20:23:25.0 -0500
@@ -294,4 +294,5 @@
 #define RTNH_F_PERVASIVE   2   /* Do recursive gateway lookup  */
 #define RTNH_F_ONLINK      4   /* Gateway is forced on link    */
+#define RTNH_F_DSAD        8   /* Dynamic PBR (weight = source realm) */
 
 /* Macros to handle hexthops */
diff -r -U2 linux-source-2.6.23-dsc/include/net/ip_fib.h linux-source-2.6.23-dsc-dsad/include/net/ip_fib.h
--- linux-source-2.6.23-dsc/include/net/ip_fib.h    2007-10-09 16:31:38.0 -0400
+++ linux-source-2.6.23-dsc-dsad/include/net/ip_fib.h   2007-12-06 20:23:25.0 -0500
@@ -202,5 +202,6 @@
 extern int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
                                struct net_device *dev, __be32 *spec_dst, u32 *itag);
-extern void fib_select_multipath(const struct flowi *flp, struct fib_result *res);
+extern void fib_select_multipath(const struct flowi *flp,
+                                 struct fib_result *res, u32 itag);
 
 struct rtentry;
diff -r -U2 linux-source-2.6.23-dsc/net/ipv4/fib_semantics.c 
linux-source-2.6.23-dsc-dsad/net/ipv4/fib_semantics.c
--- linux-source-2.6.23-dsc/net/ipv4/fib_semantics.c    2007-10-09 16:31:38.0 -0400
+++ linux-source-2.6.23-dsc-dsad/net/ipv4/fib_semantics.c   2007-12-07 14:36:10.0 -0500
@@ -1164,5 +1164,6 @@
  */

Re: [PATCH net-2.6.25] qdisc: new rate limiter

2007-12-07 Thread Patrick McHardy


Stephen Hemminger wrote:

This is a time based rate limiter for use in network testing. When doing
network tests it is often useful to test at reduced bandwidths. The existing
Token Bucket Filter provides rate control, but causes bursty traffic that
can cause different performance than real world. Another alternative is
the PSPacer, but it depends on pause frames which may also cause issues.

The qdisc depends on high resolution timers and clocks, so it will probably
use more CPU than others making it a poor choice for use when doing traffic
shaping for QOS. 


Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/include/linux/pkt_sched.h 2007-10-30 09:18:29.0 -0700
+++ b/include/linux/pkt_sched.h 2007-12-07 13:37:50.0 -0800
@@ -475,4 +475,10 @@ struct tc_netem_corrupt
 
 #define NETEM_DIST_SCALE	8192
 
+struct tc_rlim_qopt
+{
+   __u32   limit;  /* fifo limit (packets) */
+   __u32   rate;   /* bits per sec */
  


This seems a bit small, 512mbit is the maximum rate.


--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/net/sched/sch_rlim.c  2007-12-07 16:22:10.0 -0800
@@ -0,0 +1,350 @@
+static struct sk_buff *rlim_dequeue(struct Qdisc *sch)
+{
+   struct rlim_sched_data *q = qdisc_priv(sch);
+   struct sk_buff *skb;
+   ktime_t now = ktime_get();
+
+   /* if haven't reached the correct time slot, start timer */
+   if (now.tv64 < q->next_send.tv64) {
+       sch->flags |= TCQ_F_THROTTLED;
+       hrtimer_start(&q->watchdog.timer, q->next_send,
+                     HRTIMER_MODE_ABS);
+       return NULL;
+   }
+
+   skb = q->qdisc->dequeue(q->qdisc);
+   if (skb) {
+       q->next_send = ktime_add_ns(now, pkt_time(q, skb));
+       sch->flags &= ~TCQ_F_THROTTLED;
  


qlen is not decremented here.

+   }
+   return skb;
+}
+
+static int rlim_requeue(struct sk_buff *skb, struct Qdisc *sch)
+{
+   struct rlim_sched_data *q = qdisc_priv(sch);
+   int ret;
+
+   ret = q->qdisc->ops->requeue(skb, q->qdisc);
+   if (!ret) {
+       q->next_send = ktime_sub_ns(q->next_send, pkt_time(q, skb));
+       sch->q.qlen++;
+       sch->qstats.requeues++;
+   }
+
+   return ret;
+}
+
+static void rlim_reset(struct Qdisc *sch)
+{
+   struct rlim_sched_data *q = qdisc_priv(sch);
+
+   qdisc_reset_queue(sch);

This should reset the child.


+
+   q->next_send = ktime_get();
+   qdisc_watchdog_cancel(&q->watchdog);
+}
+
+static int rlim_change(struct Qdisc *sch, struct rtattr *opt)
+{
+   struct rlim_sched_data *q = qdisc_priv(sch);
+   const struct tc_rlim_qopt *qopt;
+   int err;
+
+   if (opt == NULL || RTA_PAYLOAD(opt) < sizeof(struct tc_rlim_qopt))
+       return -EINVAL;
+
+   qopt = RTA_DATA(opt);
  


Using nested attributes would make sure we don't run into
problems with extensibility.


+   err = set_fifo_limit(q->qdisc, qopt->limit);
+   if (err)
+       return err;
+
+   q->limit = qopt->limit;
+   if (qopt->rate == 0)
+       q->cost = 0; /* unlimited */
+   else {
+       q->cost = (u64)NSEC_PER_SEC << NSEC_SCALE;
+       do_div(q->cost, qopt->rate);
+   }
+
+   pr_debug("rlim_change: rate=%u cost=%llu\n",
+            qopt->rate, q->cost);
+
+   return 0;
+}
+
+static struct Qdisc_class_ops rlim_class_ops = {
  


This can be const.


+   .graft = rlim_graft,
+   .leaf  = rlim_leaf,
+   .get   = rlim_get,
+   .put   = rlim_put,
+   .change= rlim_change_class,
+   .delete= rlim_delete,
+   .walk  = rlim_walk,
+   .tcf_chain = rlim_find_tcf,
+   .dump  = rlim_dump_class,
+};
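
The cost computation in rlim_change() quoted above reads like fixed-point "nanoseconds per bit". A userspace sketch of that arithmetic follows, under several assumptions: the operator between NSEC_PER_SEC and NSEC_SCALE is taken to be a left shift, NSEC_SCALE is set to 16 purely for illustration (the quoted hunks do not show its value), and pkt_time_ns() is a guess at what the patch's pkt_time() helper does.

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define NSEC_SCALE	16	/* assumed; not shown in the quoted patch */

/* Scaled nanoseconds per bit, mirroring the q->cost computation. */
static uint64_t rate_to_cost(uint32_t rate_bps)
{
	if (rate_bps == 0)
		return 0;	/* unlimited */
	return (NSEC_PER_SEC << NSEC_SCALE) / rate_bps;
}

/*
 * Hypothetical pkt_time(): nanoseconds needed to send len bytes.
 * The product can overflow 64 bits for very low rates and large
 * packets; a real implementation would have to guard against that.
 */
static uint64_t pkt_time_ns(uint64_t cost, uint32_t len)
{
	return ((uint64_t)len * 8 * cost) >> NSEC_SCALE;
}
```

At 1 Mbit/s this gives 12 ms for a 1500 byte packet, which is the spacing next_send would advance by per dequeue.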


  



Re: [PATCH net-2.6.25] qdisc: new rate limiter

2007-12-07 Thread Patrick McHardy


Patrick McHardy wrote:

Stephen Hemminger wrote:


+struct tc_rlim_qopt
+{
+   __u32   limit;  /* fifo limit (packets) */
+   __u32   rate;   /* bits per sec */
  


This seems a bit small, 512mbit is the maximum rate.


It's 4gbit of course, so I guess it's enough :)
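
For reference, the arithmetic behind the correction is sketched below. The helper names are made up; the only inputs are the __u32 type and the "bits per sec" unit from the quoted struct. The earlier 512 figure matches 2^32 / 8, i.e. the result of dividing by eight once too often.

```c
#include <stdint.h>

/* Largest value a __u32 'rate' field can hold, in bits per second. */
static uint64_t max_rate_bps(void)
{
	return UINT32_MAX;	/* 4294967295 bits/s, roughly 4.29 Gbit/s */
}

/* Convert bits per second to whole megabits per second (10^6 bits). */
static uint64_t to_mbit(uint64_t bps)
{
	return bps / 1000000;
}
```

So a rate above about 4.29 Gbit/s cannot be expressed in this field as it stands.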


Re: [PATCH 2/2] cxgb3 - Parity initialization for T3C adapters

2007-12-07 Thread Jeff Garzik


Divy Le Ray wrote:

Jeff Garzik wrote:

Divy Le Ray wrote:

From: Divy Le Ray [EMAIL PROTECTED]

Add parity initialization for T3C adapters.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/cxgb3/adapter.h       |    1
 drivers/net/cxgb3/cxgb3_main.c    |   82
 drivers/net/cxgb3/cxgb3_offload.c |   15 ++
 drivers/net/cxgb3/regs.h          |  248 +
 drivers/net/cxgb3/sge.c           |   24 +++-
 drivers/net/cxgb3/t3_hw.c         |  131 +---
 6 files changed, 472 insertions(+), 29 deletions(-)


dropped patches 2-3, did not apply




Hi Jeff,

I noticed that you applied the first patch of this 3-patch series to the
#upstream-fixes branch.
These patches are intended for the #upstream (2.6.25) branch, as they are
built on top of the last 10 patches committed: 9 from me, and the white
space clean up (thanks!).

Maybe this is the reason why they did not apply.


Ah... you need to tell me these things.  I looked for a kernel version 
in your messages but did not see one.


Does the patch #1 need to be reverted for 2.6.24?


On this topic, I have a question: how do I get to see all the netdev-2.6
branches?
After cloning a fresh netdev-2.6 tree, 'git branch' shows only the
master branch:


bash-3.1$ git --version
git version 1.5.3.rc4.29.g74276-dirty
-bash-3.1$ stg clone git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git netdev-2.6-fresh
Cloning git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git into netdev-2.6-fresh...

Initialized empty Git repository in /opt/sources/netdev-2.6-fresh/.git/
remote: Generating pack...
remote: Counting objects: 620879
Done counting 633562 objects.
remote: Deltifying 633562 objects...
remote:  100% (633562/633562) done
Indexing 633562 objects...
remote: Total 633562 (delta 517968), reused 594305 (delta 478716)
100% (633562/633562) done
Resolving 517968 deltas...
100% (517968/517968) done
Checking 23058 files out...
100% (23058/23058) done
done
-bash-3.1$ cd netdev-2.6-fresh/
-bash-3.1$ git branch
* master


git fetch -f $NETDEV_URL upstream:upstream

copies the latest upstream branch from netdev-2.6.git, and stores it as 
your local upstream branch.


You may do the same for #upstream-fixes too.

Jeff




Re: [PATCH] XFRM: assorted IPsec fixups

2007-12-07 Thread Paul Moore

On Friday 07 December 2007 3:36:08 pm Eric Paris wrote:
 On Fri, 2007-12-07 at 12:11 -0500, Paul Moore wrote:
  This patch fixes a number of small but potentially troublesome things in
  the XFRM/IPsec code:
 
   * Use the 'audit_enabled' variable already in include/linux/audit.h
     Removed the need for extern declarations local to each XFRM audit
     function

{snip}

 although it does make me wonder why audit_log_start doesn't just check
 audit_enabled itself

/me shrugs ... I have no idea, I've just always followed the lead of what was 
already written, but now that you mention it - it doesn't make much sense.  I 
suppose at some point we can go through and change all the 'audit_enabled' 
users, but I wonder if there is some point (?performance?) to having the 
callers check?

-- 
paul moore
linux security @ hp

Re: [PATCH] XFRM: assorted IPsec fixups

2007-12-07 Thread Eric Paris


On Fri, 2007-12-07 at 12:11 -0500, Paul Moore wrote:
 This patch fixes a number of small but potentially troublesome things in the
 XFRM/IPsec code:
 
  * Use the 'audit_enabled' variable already in include/linux/audit.h
    Removed the need for extern declarations local to each XFRM audit function
 
  * Convert 'sid' to 'secid'
    The 'sid' name is specific to SELinux, 'secid' is the common naming
    convention used by the kernel when referring to tokenized LSM labels
 
  * Convert address display to use standard NIP* macros
    Similar to what was recently done with the SPD audit code, this also
    includes the removal of some unnecessary memcpy() calls
 
  * Move common code to xfrm_audit_common_stateinfo()
    Code consolidation from the "less is more" book on software development
 
  * Convert the SPI in audit records to host byte order
The current SPI values in the audit record are being displayed in
network byte order, probably not what was intended
 
  * Proper spacing around commas in function arguments
Minor style tweak since I was already touching the code
 
 Signed-off-by: Paul Moore [EMAIL PROTECTED]

Acked-by: Eric Paris [EMAIL PROTECTED]

although it does make me wonder why audit_log_start doesn't just check
audit_enabled itself.  Anyway, this patch looks good.
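
To illustrate the SPI byte-order item in the list above: an SPI is carried in network byte order, so printing it without conversion shows a byte-swapped number on little-endian machines. The sketch below is a userspace illustration only; format_spi() is a hypothetical helper and the "spi=%u(0x%x)" layout is an assumption, not a quote from the patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format an SPI that is stored in network byte
 * order. ntohl() converts it to host order before it is printed.
 */
static int format_spi(char *buf, size_t len, uint32_t spi_be)
{
    uint32_t spi;

    assert(buf != NULL && len > 0);
    spi = ntohl(spi_be);
    return snprintf(buf, len, "spi=%u(0x%x)", (unsigned)spi, (unsigned)spi);
}
```

Calling format_spi(buf, sizeof(buf), htonl(0x100)) writes the host-order value 256 regardless of the machine's endianness.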

 ---
 
  include/linux/xfrm.h    |    2 +
  include/net/xfrm.h      |   18 ++--
  net/xfrm/xfrm_policy.c  |   15 +-
  net/xfrm/xfrm_state.c   |   69 +--
  security/selinux/xfrm.c |   20 +++---
  5 files changed, 58 insertions(+), 66 deletions(-)
 
 diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h
 index b58adc5..f75a337 100644
 --- a/include/linux/xfrm.h
 +++ b/include/linux/xfrm.h
 @@ -31,7 +31,7 @@ struct xfrm_sec_ctx {
   __u8ctx_doi;
   __u8ctx_alg;
   __u16   ctx_len;
 - __u32   ctx_sid;
 + __u32   ctx_secid;
   charctx_str[0];
  };
  
 diff --git a/include/net/xfrm.h b/include/net/xfrm.h
 index 58dfa82..c02e230 100644
 --- a/include/net/xfrm.h
 +++ b/include/net/xfrm.h
 @@ -462,7 +462,7 @@ struct xfrm_audit
  };
  
  #ifdef CONFIG_AUDITSYSCALL
 -static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid)
 +static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 secid)
  {
   struct audit_buffer *audit_buf = NULL;
   char *secctx;
 @@ -475,8 +475,8 @@ static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid)
  
   audit_log_format(audit_buf, "auid=%u", auid);
  
 - if (sid != 0 &&
 - security_secid_to_secctx(sid, &secctx, &secctx_len) == 0) {
 + if (secid != 0 &&
 + security_secid_to_secctx(secid, &secctx, &secctx_len) == 0) {
   audit_log_format(audit_buf, " subj=%s", secctx);
   security_release_secctx(secctx, secctx_len);
   } else
 @@ -485,13 +485,13 @@ static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid)
  }
  
  extern void xfrm_audit_policy_add(struct xfrm_policy *xp, int result,
 -   u32 auid, u32 sid);
 +   u32 auid, u32 secid);
  extern void xfrm_audit_policy_delete(struct xfrm_policy *xp, int result,
 -   u32 auid, u32 sid);
 +   u32 auid, u32 secid);
  extern void xfrm_audit_state_add(struct xfrm_state *x, int result,
 -  u32 auid, u32 sid);
 +  u32 auid, u32 secid);
  extern void xfrm_audit_state_delete(struct xfrm_state *x, int result,
 - u32 auid, u32 sid);
 + u32 auid, u32 secid);
  #else
  #define xfrm_audit_policy_add(x, r, a, s)do { ; } while (0)
  #define xfrm_audit_policy_delete(x, r, a, s) do { ; } while (0)
 @@ -621,13 +621,13 @@ extern int xfrm_selector_match(struct xfrm_selector *sel, struct flowi *fl,
  
  #ifdef CONFIG_SECURITY_NETWORK_XFRM
  /*   If neither has a context -- match
 - *   Otherwise, both must have a context and the sids, doi, alg must match
 + *   Otherwise, both must have a context and the secids, doi, alg must match
   */
  static inline int xfrm_sec_ctx_match(struct xfrm_sec_ctx *s1, struct xfrm_sec_ctx *s2)
  {
   return ((!s1 && !s2) ||
   (s1 && s2 &&
 -  (s1->ctx_sid == s2->ctx_sid) &&
 +  (s1->ctx_secid == s2->ctx_secid) &&
    (s1->ctx_doi == s2->ctx_doi) &&
    (s1->ctx_alg == s2->ctx_alg)));
  }
 diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
 index b702bd8..75f25c4 100644
 --- a/net/xfrm/xfrm_policy.c
 +++ b/net/xfrm/xfrm_policy.c
 @@ -23,6 +23,7 @@
  #include <linux/netfilter.h>
  #include <linux/module.h>
  #include <linux/cache.h>
 +#include <linux/audit.h>
  #include <net/xfrm.h>
  #include <net/ip.h>
  
 @@ -2150,15 +2151,14 @@ static inline void xfrm_audit_common_policyinfo(struct xfrm_policy *xp,
   }
  }
  
 -void
 -xfrm_audit_policy_add(struct

Re: [RFC] TCP illinois max rtt aging

2007-12-07 Thread Ilpo Järvinen

On Fri, 7 Dec 2007, Ilpo Järvinen wrote:

 On Fri, 7 Dec 2007, David Miller wrote:

  From: Ilpo_Järvinen [EMAIL PROTECTED]
  Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)

   I guess if you get a large cumulative ACK, the amount of processing is 
   still overwhelming (added DaveM if he has some idea how to combat it).

   Even a simple scenario (this isn't anything fancy at all, will occur all
   the time): Just one loss => rest skbs grow one by one into a single
   very large SACK block (and we do that efficiently for sure) => then the
   fast retransmit gets delivered and a cumulative ACK for whole orig_window
   arrives => clean_rtx_queue has to do a lot of processing. In this case we
   could optimize RB-tree cleanup away (by just blanking it all) but still
   getting rid of all those skbs is going to take a larger moment than I'd
   like to see.

   That tree blanking could be extended to cover anything which ACK more
   than half of the tree by just replacing the root (and dealing with potential
   recolorization of the root).

  Yes, it's the classic problem.  But it ought to be at least
  partially masked when TSO is in use, because we'll only process
  a handful of SKBs.  The more effectively TSO batches, the
  less work clean_rtx_queue() will do.

 No, that's not what is going to happen, TSO won't help at all
 because one-by-one SACKs will fragment every single one of them
 (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO
 case, or am I missing something?

Hmm... this could be solved though by postponing the fragmentation of a
partially sacked skb when the first sack block is likely to still grow
and so remove the need for fragmentation. This has some implications for
packet processing: it increases burstiness a bit, and tcp_max_burst kicks
in too easily.

-- 
 i.

Re: [PATCH 1/2] cxgb3 - T3C support update

2007-12-07 Thread Jeff Garzik


Divy Le Ray wrote:

From: Divy Le Ray [EMAIL PROTECTED]

Update GPIO mapping for T3C.
Update xgmac for T3C support.
Fix typo in mtu table.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]


applied #upstream-fixes



[PATCH][IPV4] Swap the ifa allocation with the ipv4_devconf_setall call

2007-12-07 Thread Pavel Emelyanov

According to Herbert, the ipv4_devconf_setall should be called
only when the ifa is added to the device. However, failed 
ifa allocation may bring things into inconsistent state.

Move the call to ipv4_devconf_setall after the ifa allocation.

Fits both net-2.6 (with offsets) and net-2.6.25 (cleanly).

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 0b5f042..1c3e20c 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -519,8 +519,6 @@ static struct in_ifaddr *rtm_to_ifaddr(struct nlmsghdr *nlh)
goto errout;
}
 
-   ipv4_devconf_setall(in_dev);
-
ifa = inet_alloc_ifa();
if (ifa == NULL) {
/*
@@ -531,6 +529,7 @@ static struct in_ifaddr *rtm_to_ifaddr(struct nlmsghdr *nlh)
goto errout;
}
 
+   ipv4_devconf_setall(in_dev);
in_dev_hold(in_dev);
 
if (tb[IFA_ADDRESS] == NULL)

[PATCH][VLAN] Merge three equal tails in vlan_skb_recv

2007-12-07 Thread Pavel Emelyanov

There are three paths in it that set skb->protocol and then
perform common receive manipulations (basically call netif_rx()).

I think we can make this code flow easier to understand
by introducing the vlan_set_encap_proto() function (I hope the
name is good) to set up the skb protocol and merge the paths calling
netif_rx() together.

Surprisingly, gcc detects this and merges these paths
by itself, so this patch doesn't make the vlan module smaller.

Fits both net-2.6 and net-2.6.25.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 4f99bb8..11198c1 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -90,6 +90,40 @@ static inline struct sk_buff *vlan_check_reorder_header(struct sk_buff *skb)
return skb;
 }
 
+static inline void vlan_set_encap_proto(struct sk_buff *skb,
+   struct vlan_hdr *vhdr)
+{
+   __be16 proto;
+   unsigned char *rawp;
+
+   /*
+* Was a VLAN packet, grab the encapsulated protocol, which the layer
+* three protocols care about.
+*/
+
+   proto = vhdr->h_vlan_encapsulated_proto;
+   if (ntohs(proto) >= 1536) {
+           skb->protocol = proto;
+           return;
+   }
+
+   rawp = skb->data;
+   if (*(unsigned short *)rawp == 0xFFFF)
+           /*
+            * This is a magic hack to spot IPX packets. Older Novell
+            * breaks the protocol design and runs IPX over 802.3 without
+            * an 802.2 LLC layer. We look for FFFF which isn't a used
+            * 802.2 SSAP/DSAP. This won't work for fault tolerant netware
+            * but does for the rest.
+            */
+           skb->protocol = htons(ETH_P_802_3);
+   else
+           /*
+            * Real 802.2 LLC
+            */
+           skb->protocol = htons(ETH_P_802_2);
+}
+
 /*
  * Determine the packet's protocol ID. The rule here is that we
  * assume 802.3 if the type field is short enough to be a length.
@@ -115,12 +149,10 @@ static inline struct sk_buff *vlan_check_reorder_header(struct sk_buff *skb)
 int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
                   struct packet_type* ptype, struct net_device *orig_dev)
 {
-   unsigned char *rawp = NULL;
    struct vlan_hdr *vhdr;
    unsigned short vid;
    struct net_device_stats *stats;
    unsigned short vlan_TCI;
-   __be16 proto;
 
    if (dev->nd_net != &init_net) {
            kfree_skb(skb);
@@ -236,70 +268,11 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
break;
}
 
-   /*  Was a VLAN packet, grab the encapsulated protocol, which the layer
-    * three protocols care about.
-    */
-   /* proto = get_unaligned(&vhdr->h_vlan_encapsulated_proto); */
-   proto = vhdr->h_vlan_encapsulated_proto;
-
-   skb->protocol = proto;
-   if (ntohs(proto) >= 1536) {
-           /* place it back on the queue to be handled by
-            * true layer 3 protocols.
-            */
-
-           /* See if we are configured to re-write the VLAN header
-            * to make it look like ethernet...
-            */
-           skb = vlan_check_reorder_header(skb);
-
-           /* Can be null if skb-clone fails when re-ordering */
-           if (skb) {
-                   netif_rx(skb);
-           } else {
-                   /* TODO:  Add a more specific counter here. */
-                   stats->rx_errors++;
-           }
-           rcu_read_unlock();
-           return 0;
-   }
-
-   rawp = skb->data;
-
-   /*
-    * This is a magic hack to spot IPX packets. Older Novell breaks
-    * the protocol design and runs IPX over 802.3 without an 802.2 LLC
-    * layer. We look for FFFF which isn't a used 802.2 SSAP/DSAP. This
-    * won't work for fault tolerant netware but does for the rest.
-    */
-   if (*(unsigned short *)rawp == 0xFFFF) {
-           skb->protocol = htons(ETH_P_802_3);
-           /* place it back on the queue to be handled by true layer 3 protocols.
-            */
-
-           /* See if we are configured to re-write the VLAN header
-            * to make it look like ethernet...
-            */
-           skb = vlan_check_reorder_header(skb);
-
-           /* Can be null if skb-clone fails when re-ordering */
-           if (skb) {
-                   netif_rx(skb);
-           } else {
-                   /* TODO:  Add a more specific counter here. */
-                   stats->rx_errors++;
-           }
-           rcu_read_unlock();
-           return 0;
-   }
-
-   /*
-    *  Real 802.2 LLC
-    */
-   skb->protocol = htons(ETH_P_802_2);
/* place it back on the queue to be handled by upper layer
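
The decision that vlan_set_encap_proto() makes in the patch above can be tried outside the kernel with a small sketch. This is a userspace approximation under assumptions: classify_encap_proto() is a hypothetical name, the payload is passed as a plain byte buffer instead of an skb, and the two SKETCH_ constants mirror the ETH_P_802_3 and ETH_P_802_2 values from linux/if_ether.h.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_P_802_3 0x0001  /* mirrors ETH_P_802_3 */
#define SKETCH_ETH_P_802_2 0x0004  /* mirrors ETH_P_802_2 */

/*
 * Pick the protocol for a de-tagged frame. Returns a value in network
 * byte order, like skb->protocol.
 */
static uint16_t classify_encap_proto(uint16_t proto_be,
                                     const unsigned char *payload, size_t len)
{
    uint16_t first = 0;

    assert(payload != NULL);

    /* Values of 1536 and above are real EtherTypes, not lengths. */
    if (ntohs(proto_be) >= 1536)
        return proto_be;

    /* Raw IPX over 802.3 starts with FFFF, which no 802.2 LLC header uses. */
    if (len >= sizeof(first))
        memcpy(&first, payload, sizeof(first));
    if (first == 0xFFFF)
        return htons(SKETCH_ETH_P_802_3);

    return htons(SKETCH_ETH_P_802_2);
}
```

Copying the first two bytes with memcpy() avoids the unaligned cast used in the kernel code; the comparison with 0xFFFF gives the same result on either endianness because both bytes are identical.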

Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

2007-12-07 Thread Andrew Morton

On Fri, 07 Dec 2007 04:51:37 + David Woodhouse [EMAIL PROTECTED] wrote:

 On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
  Well I clearly goofed when I added the initial network namespace support
  for /proc/net.  Currently things work but there are odd details visible
  to user space, even when we have a single network namespace.
  
  Since we do not cache proc_dir_entry dentries at the moment we can
  just modify ->lookup to return a different directory inode depending
  on the network namespace of the process looking at /proc/net, replacing
  the current technique of using a magic and fragile follow_link method.
  
  To accomplish that this patch:
  - introduces a shadow_proc method to allow different dentries to
    be returned from proc_lookup.
  - Removes the old /proc/net follow_link magic
  - Fixes a weakness in our not caching of proc generic dentries.
  
  As shadow_proc uses a task struct to decide which dentry to return we
  can go back later and fix the proc generic caching without modifying any
  code that uses the shadow_proc method.
  
  Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
  ---
   fs/proc/generic.c       |   12 ++-
   fs/proc/proc_net.c      |   86 +++
   include/linux/proc_fs.h |    3 ++
   3 files changed, 19 insertions(+), 82 deletions(-)
 
 (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
 
 This seems to have broken the use of /proc/bus/usb as a mountpoint. It
 always appears empty now, whatever's supposed to be mounted there.
 

Yes.  Denis and Eric are tossing around competing patches but afaik nobody
is happy with any of them.  Guys, could we get this sorted soonish please?

[PATCH] [IPV6] XFRM: Fix auditing rt6i_flags; use RTF_xxx flags instead of RTCF_xxx.

2007-12-07 Thread YOSHIFUJI Hideaki / 吉藤英明

RTCF_xxx flags (defined in include/linux/in_route.h) are available for
IPv4 route (rtable) entries only.  Use RTF_xxx flags instead,
defined in include/linux/ipv6_route.h, for IPv6 route entries (rt6_info).

Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

--
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 82e27b8..b8e9eb4 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -233,7 +233,7 @@ __xfrm6_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int
    dst_prev->output = dst_prev->xfrm->outer_mode->afinfo->output;
    /* Sheit... I remember I did this right. Apparently,
     * it was magically lost, so this code needs audit */
-   x->u.rt6.rt6i_flags    = rt0->rt6i_flags&(RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL);
+   x->u.rt6.rt6i_flags    = rt0->rt6i_flags&(RTF_ANYCAST|RTF_LOCAL);
    x->u.rt6.rt6i_metric   = rt0->rt6i_metric;
    x->u.rt6.rt6i_node     = rt0->rt6i_node;
    x->u.rt6.rt6i_gateway  = rt0->rt6i_gateway;

-- 
YOSHIFUJI Hideaki @ USAGI Project  [EMAIL PROTECTED]
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

Re: [patch resend build-breakage] make bnx2x select ZLIB_INFLATE

2007-12-07 Thread Jeff Garzik


Eliezer Tamir wrote:

The bnx2x module depends on the zlib_inflate functions.  The
build will fail if ZLIB_INFLATE has not been selected manually
or by building another module that automatically selects it.

Modify BNX2X config option to 'select ZLIB_INFLATE' like BNX2
and others.  This seems to fix it.


Signed-off-by:  Lee Schermerhorn [EMAIL PROTECTED]
Acked-by: Eliezer Tamir [EMAIL PROTECTED]
---
 drivers/net/Kconfig |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 5bafb30..b9d7f5b 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2594,6 +2594,7 @@ config TEHUTI
 config BNX2X
    tristate "Broadcom NetXtremeII 10Gb support"
depends on PCI
+   select ZLIB_INFLATE
help


applied #upstream-fixes



Re: [RFC] TCP illinois max rtt aging

2007-12-07 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Fri, 7 Dec 2007 15:05:59 +0200 (EET)

 On Fri, 7 Dec 2007, David Miller wrote:

  From: Ilpo_Järvinen [EMAIL PROTECTED]
  Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)

   I guess if you get a large cumulative ACK, the amount of processing is 
   still overwhelming (added DaveM if he has some idea how to combat it).

   Even a simple scenario (this isn't anything fancy at all, will occur all
   the time): Just one loss => rest skbs grow one by one into a single
   very large SACK block (and we do that efficiently for sure) => then the
   fast retransmit gets delivered and a cumulative ACK for whole orig_window
   arrives => clean_rtx_queue has to do a lot of processing. In this case we
   could optimize RB-tree cleanup away (by just blanking it all) but still
   getting rid of all those skbs is going to take a larger moment than I'd
   like to see.

   That tree blanking could be extended to cover anything which ACK more
   than half of the tree by just replacing the root (and dealing with potential
   recolorization of the root).

  Yes, it's the classic problem.  But it ought to be at least
  partially masked when TSO is in use, because we'll only process
  a handful of SKBs.  The more effectively TSO batches, the
  less work clean_rtx_queue() will do.

 No, that's not what is going to happen, TSO won't help at all
 because one-by-one SACKs will fragment every single one of them
 (see tcp_match_skb_to_sack) :-(. ...So we're back in non-TSO
 case, or am I missing something?

You're of course right, and it's ironic that I wrote the SACK
splitting code so I should have known this :-)

A possible approach just occurred to me wherein we maintain
the SACK state external to the SKBs so that we don't need to
mess with them at all.

That would allow us to eliminate the TSO splitting but it would
not remove the general problem of clean_rtx_queue()'s overhead.

I'll try to give some thought to this over the weekend.
