Re: L2 network namespace benchmarking

2007-03-28 Thread Daniel Lezcano

Eric W. Biederman wrote:

Daniel Lezcano [EMAIL PROTECTED] writes:


3. General observations
---

The objective of having no performance degradation when the network
namespace is turned off in the kernel is reached by both solutions.

When the network is used outside the container and network
namespaces are compiled in, there is no performance degradation.

Eric's patchset allows network devices to be moved between namespaces,
which is clearly a good feature and one missing from Dmitry's patchset.
This feature helps us see that the network namespace code does not add
overhead when the physical network device is used directly inside the
container.


Assuming these results are not contradicted, this says that the extra
dereference where we need it does not add measurably to the overhead
in the Linux network stack.  Performance-wise this should be good
enough to allow merging the code into the linux kernel, as it does
not measurably affect networking when we do not have multiple
containers in use.


I have a few questions about merging code into the linux kernel.

* How do you plan to do that ?
* When do you expect to have the network namespace into mainline ?
* Are Dave Miller and Alexey Kuznetov aware of the network namespace ?
* Did they see your patchset, or do they even know it exists ?
* Do you have any feedback from netdev about the network namespace ?



Things are good enough that we can even consider not providing
an option to compile the support out.


The loss of performance is very noticeable inside the container and
seems to be directly related to the use of the pair device and the
specific network configuration needed for the container. When
packets are sent by the container, the MAC address is that of the pair
device but the IP address is not owned by the host. That directly
implies that the host must act as a router and forward the packets,
which adds a lot of overhead.


Well it adds measurable overhead.


A hack has been made in the ip_forward function to avoid a useless
skb_cow when using the pair device/tunnel device, and the overhead
is reduced by half.


To be fully satisfactory how we get the packets to the namespace
still appears to need work.

We have overhead in routing.  That may simply be the cost of
performing routing or there may be some optimization opportunities
there.
We have about the same overhead when performing bridging which I
actually find more surprising, as the bridging code should involve
less packet handling.


Yep. I will try to figure out what is happening.


Ideally we can optimize the bridge code or something equivalent to
it so that we can take one look at the destination mac address and
know which network namespace we should be in.  Potentially moving this
work to hardware when the hardware supports multiple queues.

If we can get the overhead out of the routing code that would be
tremendous.  However I think it may be more realistic to get the
overhead out of the ethernet bridging code where we know we don't need
to modify the packet.


The routing was optimized for the loopback, no ? Why can't we do the 
same for the etun device ?

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ping6 to own link-local address

2007-03-28 Thread YOSHIFUJI Hideaki / 吉藤英明
In article [EMAIL PROTECTED] (at Wed, 21 Mar 2007 00:26:09 +0100 (CET)), 
YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] says:

 In article [EMAIL PROTECTED] (at Tue, 20 Mar 2007 15:16:40 -0700), Sridhar 
 Samudrala [EMAIL PROTECTED] says:
 
  On Tue, 2007-03-20 at 10:19 +0100, YOSHIFUJI Hideaki / 吉藤英明 wrote:
   Hello.
   
   Recent 2.6.21-git kernels do not respond to ping6 queries
   to our own (local) link-local address.  Now bisecting...
  
  The following patch seems to be the cause for this regression.
  
  [IPV6] ROUTE: Do not route packets to link-local address on other device.
  
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a0d78ebf3a0e33a1aeacf2fc518ad9273d6a1c2f
 
 Right.  Hmm...

Well, 2.6.21 is coming.  I think it is better to revert it for now.
Current situation is more critical than the original.

Dave?

--yoshfuji


Re: [1/1] netlink: no need to crash if table does not exist.

2007-03-28 Thread Evgeniy Polyakov
On Tue, Mar 27, 2007 at 04:41:54PM -0700, David Miller ([EMAIL PROTECTED]) 
wrote:
  There is no problem as-is, but I am implementing a unified cache for
  different sockets (currently tcp/udp/raw and netlink are supported), which
  does not use that table, so I currently wrap all access code in special
  ifdefs. This one can be wrapped too, but since it is not needed, removing
  it saves a couple of lines of code.
 
 It is needed.  It is there to make sure that a kernel netlink
 socket is not created before the af_netlink init code runs.
 
 We've had sequencing bugs like that in the initcall call chain
 in the past, that's why the check is there.

Argh, I see.
I failed to find the exact commit (at least it was not in 2.4 and was
created before 2.6.12), but it is indeed needed.

Thanks for the explanation.

-- 
Evgeniy Polyakov


Re: ping6 to own link-local address

2007-03-28 Thread David Miller
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 16:57:48 +0900 (JST)

 In article [EMAIL PROTECTED] (at Wed, 21 Mar 2007 00:26:09 +0100 (CET)), 
 YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] says:
 
  In article [EMAIL PROTECTED] (at Tue, 20 Mar 2007 15:16:40 -0700), 
  Sridhar Samudrala [EMAIL PROTECTED] says:
  
   The following patch seems to be the cause for this regression.
   
   [IPV6] ROUTE: Do not route packets to link-local address on other device.
   
   http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a0d78ebf3a0e33a1aeacf2fc518ad9273d6a1c2f
  
  Right.  Hmm...
 
 Well, 2.6.21 is coming.  I think it is better to revert it for now.
 Current situation is more critical than the original.
 
 Dave?

I will look into this and make some kind of decision on how
to proceed tomorrow.


Re: Patch:replace with time_after in drivers/net/eexpress.c

2007-03-28 Thread Alan Cox
On Wed, Mar 28, 2007 at 10:44:31AM +0530, Shani wrote:
 Hi,
 
 Replacing with time_after in drivers/net/eexpress.c
 Applies and compiles cleanly on the latest tree. Not tested.
 
 thanks.
 
 Signed-off-by: Shani Moideen [EMAIL PROTECTED]

NAK as not tested. The existing code is known to work, so, ugly or not,
it is better than untested changes.
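
For readers unfamiliar with what the proposed change is about: time_after() style comparisons stay correct when a free-running tick counter wraps, while a plain greater-than test does not. The following is only a userspace sketch of that idea. The names tick_after, naive_after and tick_after_demo are hypothetical and are not taken from eexpress.c or from the patch; the cast relies on the usual two's complement behaviour.

```c
#include <assert.h>
#include <limits.h>

/* Wraparound-safe "a is later than b" for an unsigned tick counter.
 * Same idea as the kernel's time_after(); this userspace form and the
 * name tick_after are illustrative only. */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Naive comparison, shown for contrast: wrong once the counter wraps. */
static int naive_after(unsigned long a, unsigned long b)
{
	return a > b;
}

/* Small self-check around the wrap point. */
static void tick_after_demo(void)
{
	unsigned long start = ULONG_MAX - 5;	/* just before wraparound */
	unsigned long now = start + 10;		/* wraps around to 4 */

	assert(tick_after(now, start));		/* correct: now is later */
	assert(!naive_after(now, start));	/* naive test misses it */
}
```

The signed difference is only meaningful while the two values are less than half the counter range apart, which is the usual assumption for timeout checks.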




[PATCH] ppp_generic: lockdep warning Re: [Bug 8132] New: pptp server lockup ...

2007-03-28 Thread Jarek Poplawski
On Mon, Mar 19, 2007 at 10:49:12AM +0300, Yuriy N. Shkandybin wrote:
 I've changed kernel to rc4 and completely changed hardware.
 Now this is
 
 I've got a new trace, but as far as I can see this is another problem,
 connected with pppoe
 
 ===
 [ INFO: possible circular locking dependency detected ]
 2.6.21-rc4 #1
 ---
 pppd/8926 is trying to acquire lock:
 (&vlan_netdev_xmit_lock_key){-...}, at: [c0265486] dev_queue_xmit+0x247/0x2f1
 
 but task is already holding lock:
 (&pch->downl){-+..}, at: [c0230c72] ppp_channel_push+0x19/0x9a
 
 which lock already depends on the new lock.
 
 
 the existing dependency chain (in reverse order) is:
 
 -> #3 (&pch->downl){-+..}:
   [c013642b] __lock_acquire+0xe62/0x1010
   [c0136642] lock_acquire+0x69/0x83
   [c02afc13] _spin_lock_bh+0x30/0x3d
   [c022f715] ppp_push+0x5a/0x9a
   [c022fb40] ppp_xmit_process+0x2e/0x511
   [c0231a05] ppp_write+0xb8/0xf2
   [c015ec26] vfs_write+0x7f/0xba
   [c015f158] sys_write+0x3d/0x64
   [c01027de] sysenter_past_esp+0x5f/0x99
   [] 0x
 
 -> #2 (&ppp->wlock){-+..}:
   [c013642b] __lock_acquire+0xe62/0x1010
   [c0136642] lock_acquire+0x69/0x83
   [c02afc13] _spin_lock_bh+0x30/0x3d
   [c022fb2b] ppp_xmit_process+0x19/0x511
   [c02318d3] ppp_start_xmit+0x18a/0x204
   [c0263a6f] dev_hard_start_xmit+0x1f6/0x2c4
   [c026ded3] __qdisc_run+0x81/0x1bc
   [c026549e] dev_queue_xmit+0x25f/0x2f1
   [c027c75f] ip_output+0x1be/0x25f
   [c02788ce] ip_forward+0x159/0x22b
   [c027745c] ip_rcv+0x297/0x4dd
   [c0263698] netif_receive_skb+0x164/0x1f2
   [c022199d] e1000_clean_rx_irq+0x12a/0x4b7
   [c02209bc] e1000_clean+0x3ff/0x5dd
   [c0265084] net_rx_action+0x7d/0x12b
   [c011e442] __do_softirq+0x82/0xf2
   [c011e509] do_softirq+0x57/0x59
   [c011e877] irq_exit+0x7f/0x81
   [c0105011] do_IRQ+0x45/0x84
   [c0103252] common_interrupt+0x2e/0x34
   [c0100b66] mwait_idle+0x12/0x14
   [c0100c60] cpu_idle+0x6c/0x86
   [c01001cd] rest_init+0x23/0x36
   [c0377d89] start_kernel+0x3ca/0x461
   [] 0x0
   [] 0x
 
 -> #1 (&dev->_xmit_lock){-+..}:
   [c013642b] __lock_acquire+0xe62/0x1010
   [c0136642] lock_acquire+0x69/0x83
   [c02afc13] _spin_lock_bh+0x30/0x3d
   [c0266861] dev_mc_add+0x34/0x16a
   [c02ab5c7] vlan_dev_set_multicast_list+0x88/0x25c
   [c0266592] __dev_mc_upload+0x22/0x24
   [c0266914] dev_mc_add+0xe7/0x16a
   [c029f323] igmp_group_added+0xe6/0xeb
   [c029f50b] ip_mc_inc_group+0x13f/0x210
   [c029f5fa] ip_mc_up+0x1e/0x61
   [c029ab81] inetdev_event+0x154/0x2c7
   [c0125a46] notifier_call_chain+0x2c/0x39
   [c0125a7c] raw_notifier_call_chain+0x8/0xa
   [c026477a] dev_open+0x6d/0x71
   [c0263028] dev_change_flags+0x51/0x101
   [c029b7ca] devinet_ioctl+0x4df/0x644
   [c029bc03] inet_ioctl+0x5c/0x6f
   [c02596e0] sock_ioctl+0x4f/0x1e8
   [c0168c32] do_ioctl+0x22/0x71
   [c0168cd6] vfs_ioctl+0x55/0x27e
   [c0168f32] sys_ioctl+0x33/0x51
   [c01027de] sysenter_past_esp+0x5f/0x99
   [] 0x
 
 -> #0 (&vlan_netdev_xmit_lock_key){-...}:
   [c0136289] __lock_acquire+0xcc0/0x1010
   [c0136642] lock_acquire+0x69/0x83
   [c02afbd6] _spin_lock+0x2b/0x38
   [c0265486] dev_queue_xmit+0x247/0x2f1
   [c02334f6] __pppoe_xmit+0x1a9/0x215
   [c023356c] pppoe_xmit+0xa/0xc
   [c0230c9a] ppp_channel_push+0x41/0x9a
   [c0231a13] ppp_write+0xc6/0xf2
   [c015ec26] vfs_write+0x7f/0xba
   [c015f158] sys_write+0x3d/0x64
   [c01027de] sysenter_past_esp+0x5f/0x99
   [] 0x
 
 other info that might help us debug this:
 
 1 lock held by pppd/8926:
 #0:  (&pch->downl){-+..}, at: [c0230c72] ppp_channel_push+0x19/0x9a
 
 stack backtrace:
 [c0103834] show_trace_log_lvl+0x1a/0x30
 [c0103f16] show_trace+0x12/0x14
 [c0103f9d] dump_stack+0x16/0x18
 [c01343cd] print_circular_bug_tail+0x68/0x71
 [c0136289] __lock_acquire+0xcc0/0x1010
 [c0136642] lock_acquire+0x69/0x83
 [c02afbd6] _spin_lock+0x2b/0x38
 [c0265486] dev_queue_xmit+0x247/0x2f1
 [c02334f6] __pppoe_xmit+0x1a9/0x215
 [c023356c] pppoe_xmit+0xa/0xc
 [c0230c9a] ppp_channel_push+0x41/0x9a
 [c0231a13] ppp_write+0xc6/0xf2
 [c015ec26] vfs_write+0x7f/0xba
 [c015f158] sys_write+0x3d/0x64
 [c01027de] sysenter_past_esp+0x5f/0x99
 ===
 Clocksource tsc unstable (delta = 4686844667 ns)
 Time: acpi_pm clocksource has been installed.
...

lockdep has seen locks #0 through #3 taken in circular
order, but IMHO the pch->downl lock in #3 (taken after
ppp->wlock in #2) differs from the pch->downl lock taken in
#0 (before vlan_netdev_xmit_lock_key), and lockdep
should be notified about this.

This patch proposal needs confirmation by some PPP expert
that channels processed in ppp_channel_push() differ from
channels 

Re: [PATCH]: Fix ipv6 round-robin locking

2007-03-28 Thread YOSHIFUJI Hideaki / 吉藤英明
Hello.

In article [EMAIL PROTECTED] (at Sat, 24 Mar 2007 12:44:36 -0700 (PDT)), 
David Miller [EMAIL PROTECTED] says:

 The fix for the most serious of them is below, and I'd appreciate
 any feedback if people spot any problems or holes in that approach.

I hoped we could save some memory per fib6_node,
but I'm fine with it.

Regards,

--yoshfuji


[PATCH] Inline net_device_stats

2007-03-28 Thread Rusty Russell
Hi all,

Does something like this make sense for future drivers?

Cheers,
Rusty.
===
Network drivers which keep stats allocate their own stats structure
then write a get_stats() function to return them.  It would be nice if
this were done by default.

1) Add a new stats field to struct net_device.
2) Add a new feature field to say this driver uses the internal one
3) Have a default get_stats which returns NULL if that feature is not set.
4) Change callers to check the result of the get_stats call for NULL, not
   whether ->get_stats is set.

This should not break backwards compatibility with older drivers, yet
allow modern drivers to shed some boilerplate code.
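
The four steps above can be modelled in a few lines of userspace C. This is only a sketch of the idea, not the patch itself: every demo_ name is hypothetical, the flag value just mirrors the 8192 bit used in the diff further down, and locking is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_F_INTERNAL_STATS 0x2000u	/* mirrors the patch's 8192 bit */

struct demo_stats {
	unsigned long rx_packets;
	unsigned long tx_packets;
};

struct demo_dev {
	unsigned int features;		/* step 2: feature bit lives here */
	struct demo_stats stats;	/* step 1: stats embedded in the device */
	struct demo_stats *(*get_stats)(struct demo_dev *dev);
};

/* Step 3: default hook, returns NULL unless the feature bit is set. */
static struct demo_stats *demo_default_get_stats(struct demo_dev *dev)
{
	assert(dev != NULL);
	if (dev->features & DEMO_F_INTERNAL_STATS)
		return &dev->stats;
	return NULL;
}

/* Every device gets the default hook, so callers may always call it. */
static void demo_dev_init(struct demo_dev *dev, unsigned int features)
{
	assert(dev != NULL);
	dev->features = features;
	dev->stats.rx_packets = 0;
	dev->stats.tx_packets = 0;
	dev->get_stats = demo_default_get_stats;
}

/* Step 4: callers test the returned pointer, not the function pointer. */
static unsigned long demo_total_rx(struct demo_dev *devs, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++) {
		struct demo_stats *s = devs[i].get_stats(&devs[i]);

		if (s == NULL)
			continue;	/* device keeps no stats */
		total += s->rx_packets;
	}
	return total;
}
```

A driver that still supplies its own get_stats simply overwrites the hook after initialisation, which is why older drivers keep working in this model.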

Lightly tested: works for a modified lguest network driver.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 0 files changed

diff -r 1ccab0a087b7 arch/s390/appldata/appldata_net_sum.c
--- a/arch/s390/appldata/appldata_net_sum.c Tue Mar 27 13:46:10 2007 +1000
+++ b/arch/s390/appldata/appldata_net_sum.c Tue Mar 27 14:28:47 2007 +1000
@@ -108,10 +108,10 @@ static void appldata_get_net_sum_data(vo
 	collisions = 0;
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev != NULL; dev = dev->next) {
-		if (dev->get_stats == NULL) {
+		stats = dev->get_stats(dev);
+		if (stats == NULL) {
 			continue;
 		}
-		stats = dev->get_stats(dev);
 		rx_packets += stats->rx_packets;
 		tx_packets += stats->tx_packets;
 		rx_bytes   += stats->rx_bytes;
diff -r 1ccab0a087b7 drivers/net/bonding/bond_main.c
--- a/drivers/net/bonding/bond_main.c   Tue Mar 27 13:46:10 2007 +1000
+++ b/drivers/net/bonding/bond_main.c   Tue Mar 27 14:29:08 2007 +1000
@@ -3621,9 +3621,8 @@ static struct net_device_stats *bond_get
 	read_lock_bh(&bond->lock);
 
 	bond_for_each_slave(bond, slave, i) {
-		if (slave->dev->get_stats) {
-			sstats = slave->dev->get_stats(slave->dev);
-
+		sstats = slave->dev->get_stats(slave->dev);
+		if (sstats) {
 			stats->rx_packets += sstats->rx_packets;
 			stats->rx_bytes += sstats->rx_bytes;
 			stats->rx_errors += sstats->rx_errors;
diff -r 1ccab0a087b7 drivers/parisc/led.c
--- a/drivers/parisc/led.c  Tue Mar 27 13:46:10 2007 +1000
+++ b/drivers/parisc/led.c  Tue Mar 27 14:29:17 2007 +1000
@@ -372,9 +372,9 @@ static __inline__ int led_get_net_activi
 			continue;
 		if (LOOPBACK(in_dev->ifa_list->ifa_local))
 			continue;
-		if (!dev->get_stats) 
+		stats = dev->get_stats(dev);
+		if (!stats) 
 			continue;
-		stats = dev->get_stats(dev);
 		rx_total += stats->rx_packets;
 		tx_total += stats->tx_packets;
 	}
diff -r 1ccab0a087b7 include/linux/netdevice.h
--- a/include/linux/netdevice.h Tue Mar 27 13:46:10 2007 +1000
+++ b/include/linux/netdevice.h Tue Mar 27 14:21:09 2007 +1000
@@ -325,6 +325,7 @@ struct net_device
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_GSO		2048	/* Enable software GSO. */
 #define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_INTERNAL_STATS	8192	/* Use stats structure in net_device */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -349,6 +350,7 @@ struct net_device
 
 
struct net_device_stats* (*get_stats)(struct net_device *dev);
+   struct net_device_stats stats;
 
 #ifdef CONFIG_WIRELESS_EXT
/* List of functions to handle Wireless Extensions (instead of ioctl).
diff -r 1ccab0a087b7 net/core/dev.c
--- a/net/core/dev.c  Tue Mar 27 13:46:10 2007 +1000
+++ b/net/core/dev.c  Tue Mar 27 14:30:05 2007 +1000
@@ -825,7 +825,6 @@ static int default_rebuild_header(struct
return 1;
 }
 
-
 /**
  * dev_open- prepare an interface for use.
  * @dev:   device to open
@@ -2120,9 +2119,9 @@ void dev_seq_stop(struct seq_file *seq, 
 
 static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
 {
-	if (dev->get_stats) {
-		struct net_device_stats *stats = dev->get_stats(dev);
-
+	struct net_device_stats *stats = dev->get_stats(dev);
+
+	if (stats) {
 		seq_printf(seq, "%6s:%8lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu "
 				"%8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n",
 			   dev->name, stats->rx_bytes, stats->rx_packets,
@@ -3146,6 +3145,13 @@ out:
 	mutex_unlock(&net_todo_run_mutex);
 }
 
+static struct net_device_stats *maybe_internal_stats(struct net_device *dev)
+{
+	if (dev->features & NETIF_F_INTERNAL_STATS)
+		return &dev->stats;
+	return NULL;
+}
+
 /**
  * alloc_netdev - allocate network device
  * @sizeof_priv:   size of private data to allocate space for
@@ -3181,6 +3187,7 

Re: [PATCH]: Fix ipv6 round-robin locking

2007-03-28 Thread David Miller
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 18:16:42 +0900 (JST)

 I hoped we could save some memory per fib6_node,
 but I'm fine with it.

I know, I did not want to add it either :(

Speaking of which, several of the potential fixes for the rt6_probe()
deadlock require adding even more things to the fib6_node (a linked
list which some workqueue or similar can run, or a timer, etc.).

So, I'm trying to figure out a way to get the rt6_probe() to run
outside of the per-table rwlock without adding more state to
fib6_node.


Re: RFC: Established connections hash function

2007-03-28 Thread Andi Kleen

 
  But I think it can be mostly ignored.
 
 With all due respect, it cannot. An attacker with a small-sized botnet 
 (which is ~250 hosts) can create chains that contain well in excess of 3000 
 items. 

Most likely they can also easily generate enough latency data to crack any
simple hash function then.

-Andi


Re: RFC: Established connections hash function

2007-03-28 Thread Evgeniy Polyakov
On Wed, Mar 28, 2007 at 11:29:43AM +0200, Andi Kleen ([EMAIL PROTECTED]) wrote:
   But I think it can be mostly ignored.
  
  With all due respect, it cannot. An attacker with a small-sized botnet 
  (which is ~250 hosts) can create chains that contain well in excess of 3000 
  items. 
 
 Most likely they can also easily generate enough latency data to crack any
 simple hash function then.

Jenkins hash is far from being simple to crack, although with some
knowledge it can be done faster.

SHA or something else is essentially the same, except that it has a
different set of rounds. We only hash three 32-bit values, so the Jenkins
result is really good.

For hash tables it is a good solution, but we can go further.
I created a multidimensional trie with that problem in mind, but it looks
like it is not an absolutely required solution right now. I will continue
to work on it to check whether it can be faster than a properly sized hash
table despite the additional trie allocation overhead, but I should
probably not push people to include it as is; for now it is for
information only.

 -Andi

-- 
Evgeniy Polyakov


Re: [RESEND] [NET] fib_rules: Flush route cache after rule modifications

2007-03-28 Thread Jarek Poplawski
On 27-03-2007 14:38, Thomas Graf wrote:
 The results of FIB rules lookups are cached in the routing cache
 except for IPv6 as no such cache exists. So far, it was the
 responsibility of the user to flush the cache after modifying any
 rules. This led to many false bug reports due to misunderstanding
 of this concept.
 
 This patch automatically flushes the route cache after inserting
 or deleting a rule.

I hope I'm wrong, but doesn't this come at the cost of admins
working with long rule sets, which will (probably) take extra
time to load now?

Regards,
Jarek P. 


Re: L2 network namespace benchmarking

2007-03-28 Thread Eric W. Biederman
Daniel Lezcano [EMAIL PROTECTED] writes:

 Eric W. Biederman wrote:
 Daniel Lezcano [EMAIL PROTECTED] writes:

 3. General observations
 ---

 The objective of having no performance degradation when the network
 namespace is turned off in the kernel is reached by both solutions.

 When the network is used outside the container and network
 namespaces are compiled in, there is no performance degradation.

 Eric's patchset allows network devices to be moved between namespaces,
 which is clearly a good feature and one missing from Dmitry's patchset.
 This feature helps us see that the network namespace code does not add
 overhead when the physical network device is used directly inside the
 container.

 Assuming these results are not contradicted, this says that the extra
 dereference where we need it does not add measurably to the overhead
 in the Linux network stack.  Performance-wise this should be good
 enough to allow merging the code into the linux kernel, as it does
 not measurably affect networking when we do not have multiple
 containers in use.

 I have a few questions about merging code into the linux kernel.

 * How do you plan to do that ?
One small comprehensible piece at a time.

Basically some variant of etun should not be a problem to merge
then I have to get some part of the network namespace code merged,
and the concept accepted.

Once the basic acceptance occurs it just becomes a long slog of
merging more and more patches.


 * When do you expect to have the network namespace into mainline ?
My current goal is to finish my rebase against 2.6.linus_lastest in
the next couple of days after having figured out how to deal with sysfs.

I have been reviewing more code than I know what to do with,
and fighting some very strange bugs during the stabilization window,
which has kept me from doing additional development.  Plus I have
had a cold.

 * Are Dave Miller and Alexey Kuznetov aware of the network namespace ?
Aware yes, reviewed not yet.  I believe Alexey is a little more
familiar with the OpenVZ work.  The high level concepts still apply.

 * Did they see your patchset, or do they even know it exists ?
Yes.

 * Do you have any feedback from netdev about the network namespace ?
Not really.  Except that Dave Miller wanted to review what I posted
last time but the timing was bad and he failed to get around to it.

 To be fully satisfactory how we get the packets to the namespace
 still appears to need work.

 We have overhead in routing.  That may simply be the cost of
 performing routing or there may be some optimization opportunities
 there.
 We have about the same overhead when performing bridging which I
 actually find more surprising, as the bridging code should involve
 less packet handling.

 Yep. I will try to figure out what is happening.

Thanks.

 Ideally we can optimize the bridge code or something equivalent to
 it so that we can take one look at the destination mac address and
 know which network namespace we should be in.  Potentially moving this
 work to hardware when the hardware supports multiple queues.

 If we can get the overhead out of the routing code that would be
 tremendous.  However I think it may be more realistic to get the
 overhead out of the ethernet bridging code where we know we don't need
 to modify the packet.

 The routing was optimized for the loopback, no ? Why can't we do the same for
 the etun device ?

I have no problem with it if we can use valid optimizations.  Avoiding a
packet copy when the packet is marked as having a second copy somewhere
else does not sound like a valid optimization to me.

Routing through both network namespaces so that we can set up a dst
cache entry that takes you to the final destination is something I am
willing to work with.  Perhaps something that hits this piece of the
etun driver, so we don't have to make a second set of routing decisions.

	if (skb->dst)
		skb->dst = dst_pop(skb->dst);   /* Allow for smart routing */

tcpdump at any phase of the process should be able to do the right thing.

Mostly I care right now in that it is interesting to know where the
performance overhead is coming from.  Unless it is something of a
merge stopper I don't much care about how we are going to fix it yet,
especially if it is only cross network namespace traffic.

If I read the results right it took a 32bit machine from AMD with
a gigabit interface before you could measure a throughput difference.
That isn't shabby for a non-optimized code path.

Eric


KERNEL: assertion ((int)tp->sacked_out >= 0) failed at net/ipv4/tcp_input.c (2626)

2007-03-28 Thread Patrick McHardy
I got this warning with the current net-2.6.22 tree:

KERNEL: assertion ((int)tp->sacked_out >= 0) failed at
net/ipv4/tcp_input.c (2626)
Leak s=4294967292 3

Can't say what exactly triggered it.


Re: L2 network namespace benchmarking

2007-03-28 Thread Eric W. Biederman
Kirill Korotaev [EMAIL PROTECTED] writes:


 Ideally we can optimize the bridge code or something equivalent to
 it so that we can take one look at the destination mac address and
 know which network namespace we should be in.  Potentially moving this
 work to hardware when the hardware supports multiple queues.
 yes, we can hack the bridge, so that packets coming out of eth devices
 can go directly to the container and get out of veth devices from
 inside the container.

 If we can get the overhead out of the routing code that would be
 tremendous.  However I think it may be more realistic to get the
 overhead out of the ethernet bridging code where we know we don't need
 to modify the packet.
 Why not optimize both? :)

If the optimizations are safe and correct I don't have a problem.

When we seem to have multiple copies of a packet in circulation and
we skip what appears to be a required copy on write, I'm dubious.

Although the more I look at the suggested optimization the less dubious
I am, as it appears all we are skipping is a ttl decrement, and the cow
flag applies exclusively to the data chunk and not the header chunk of
the packet, whatever that means.

However we still need to guard against a loop in our routing table
setup between multiple guests.

Eric


Re: RFC: Established connections hash function

2007-03-28 Thread Andi Kleen
Evgeniy Polyakov [EMAIL PROTECTED] writes:
 
 Jenkins hash is far from being simple to crack, although with some
 knowledge it can be done faster.

TCP tends to be initialized early before there is anything
good in the entropy pool.

static void init_std_data(struct entropy_store *r)
{
	struct timeval tv;
	unsigned long flags;

	spin_lock_irqsave(&r->lock, flags);
	r->entropy_count = 0;
	spin_unlock_irqrestore(&r->lock, flags);

	do_gettimeofday(&tv);
	add_entropy_words(r, (__u32 *)&tv, sizeof(tv)/4);
	add_entropy_words(r, (__u32 *)utsname(),
			  sizeof(*(utsname()))/4);
}

utsname is useless here because it runs before user space has 
a chance to set it. The only truly variable thing is the 
boot time, which can be guessed with the ns part being brute forced.

To make it secure you would need to do regular rehash like
the routing cache which would pick up true randomness on the first
rehash.
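
A rough userspace sketch of the "regular rehash" idea follows: every entry is unlinked, a fresh salt is installed, and the entries are reinserted under it, so a salt chosen early at boot only matters until the first rehash. Everything here is hypothetical illustration. The mixer is neither Jenkins nor cryptographic, the caller is assumed to supply the new salt from a good random source, and the locking a real table would need is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8u

struct node {
	uint32_t key;
	struct node *next;
};

struct table {
	uint32_t salt;
	struct node *bucket[NBUCKETS];
};

/* Illustrative salted mixer only; not Jenkins and not cryptographic. */
static uint32_t bucket_of(uint32_t key, uint32_t salt)
{
	uint32_t h = key ^ salt;

	h ^= h >> 16;
	h *= 0x45d9f3bu;
	h ^= h >> 16;
	return h % NBUCKETS;
}

static void table_insert(struct table *t, struct node *n)
{
	assert(t != NULL && n != NULL);
	uint32_t b = bucket_of(n->key, t->salt);

	n->next = t->bucket[b];
	t->bucket[b] = n;
}

/* Unlink every node, switch to the new salt, then reinsert them all. */
static void table_rehash(struct table *t, uint32_t new_salt)
{
	assert(t != NULL);
	struct node *all = NULL;

	for (size_t b = 0; b < NBUCKETS; b++) {
		while (t->bucket[b]) {
			struct node *n = t->bucket[b];

			t->bucket[b] = n->next;
			n->next = all;
			all = n;
		}
	}
	t->salt = new_salt;
	while (all) {
		struct node *n = all;

		all = n->next;
		table_insert(t, n);
	}
}

static int table_contains(const struct table *t, uint32_t key)
{
	assert(t != NULL);
	const struct node *n = t->bucket[bucket_of(key, t->salt)];

	for (; n; n = n->next)
		if (n->key == key)
			return 1;
	return 0;
}
```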

-Andi


Re: RFC: Established connections hash function II

2007-03-28 Thread Andi Kleen
Andi Kleen [EMAIL PROTECTED] writes:

 The only truly variable thing is the 
 boot time, which can be guessed 

Actually you don't need to guess it. It's in any TCP timestamp.

-Andi


Re: KERNEL: assertion ((int)tp->sacked_out >= 0) failed at net/ipv4/tcp_input.c (2626)

2007-03-28 Thread Ilpo Järvinen
On Wed, 28 Mar 2007, Patrick McHardy wrote:

 I got this warning with the current net-2.6.22 tree:
 
 KERNEL: assertion ((int)tp->sacked_out >= 0) failed at
 net/ipv4/tcp_input.c (2626)
 Leak s=4294967292 3
 
 Can't say what exactly triggered it.

It seems I'm guilty of this one. Dave, please apply to net-2.6.22
(besides this, I think tcp_sync_left_out should be changed, but I'll
prepare a patch for that later). Btw, how should this kind of email, with
some non-patch description plus a patch, be formatted?

[PATCH] [TCP]: Timedout loop must skip SACKed skbs too while marking

Marking an skb with both S and L is invalid, and that could easily
happen in the timedout loop. Later on, tcp_sync_left_out
reduces sacked_out if lost_out + sacked_out > packets_out, and
then eventually sacked_out underflows, triggering a debug trap in
tcp_clean_rtx_queue.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d116887..7a59ffe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1777,7 +1777,8 @@ static void tcp_timedout_mark_forward(st
 		if (skb == tcp_send_head(sk) || !tcp_skb_timedout(sk, skb))
 			break;
 		/* Could be lost already from a previous timedout check */
-		if (!(TCP_SKB_CB(skb)->sacked & TCPCB_LOST)) {
+		if (!(TCP_SKB_CB(skb)->sacked &
+		     (TCPCB_LOST|TCPCB_SACKED_ACKED))) {
 			TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
 			tp->lost_out += tcp_skb_pcount(skb);
 			tcp_verify_retransmit_hint(tp, skb);
-- 
1.4.2
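
The effect of the one-line change above can be shown with a small userspace model of the flag test: a segment is newly marked lost only if it carries neither the lost bit nor the SACKed bit, so the S+L combination is never produced by this path. The demo_ names and bit values are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SACKED_ACKED 0x01u	/* stand-in for the "S" bit */
#define DEMO_LOST         0x04u	/* stand-in for the "L" bit */

struct demo_skb {
	unsigned int sacked;	/* flag bits */
	unsigned int pcount;	/* packets covered by this skb */
};

/* Mark skb lost unless it is already lost or already SACKed.
 * Returns the number of packets to add to the lost counter. */
static unsigned int demo_mark_lost(struct demo_skb *skb)
{
	assert(skb != NULL);
	if (skb->sacked & (DEMO_LOST | DEMO_SACKED_ACKED))
		return 0;	/* skip: marking would double count or give S+L */
	skb->sacked |= DEMO_LOST;
	return skb->pcount;
}
```

With the old single-bit test, the SACKed case would have fallen through and ended up with both bits set, which is the state the description calls invalid.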

[PATCH] [TCP]: Rexmit hint must be cleared instead of setting it

2007-03-28 Thread Ilpo Järvinen
Stupid error on my side. When I noticed this, I hoped it would turn
out to be an optimization, but no: the counter hint is then
incorrect. Thus clearing is necessary for now (though I still
suspect that this path is never executed).

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7a59ffe..c855791 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1763,7 +1763,7 @@ static void tcp_verify_retransmit_hint(s
 	    !(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_RETRANS) &&
 	    before(TCP_SKB_CB(skb)->seq,
 		   TCP_SKB_CB(tp->retransmit_skb_hint)->seq))
-		tp->retransmit_skb_hint = skb;
+		tp->retransmit_skb_hint = NULL;
 }
 
 /* Forward walk starting from until a not timedout skb is encountered, timeout
-- 
1.4.2


Re: RFC: Established connections hash function

2007-03-28 Thread Eric Dumazet
On 28 Mar 2007 16:14:17 +0200
Andi Kleen [EMAIL PROTECTED] wrote:
 TCP tends to be initialized early before there is anything
 good in the entropy pool.
 
 static void init_std_data(struct entropy_store *r)
 {
 	struct timeval tv;
 	unsigned long flags;
 
 	spin_lock_irqsave(&r->lock, flags);
 	r->entropy_count = 0;
 	spin_unlock_irqrestore(&r->lock, flags);
 
 	do_gettimeofday(&tv);
 	add_entropy_words(r, (__u32 *)&tv, sizeof(tv)/4);
 	add_entropy_words(r, (__u32 *)utsname(),
 			  sizeof(*(utsname()))/4);
 }
 
 utsname is useless here because it runs before user space has 
 a chance to set it. The only truly variable thing is the 
 boot time, which can be guessed with the ns part being brute forced.
 
 To make it secure you would need to do regular rehash like
 the routing cache which would pick up true randomness on the first
 rehash.

Good point, but:

1) We can now use struct timespec to get more bits in init_std_data().

2) The TCP ehash salt is initialized at first socket creation, not at boot time.
Maybe we have more entropy available at that point.

3) We don't want to be 'totally secure'. We only want to raise the level, and
eventually see whether we have to spend more time on this in the coming year(s).
AFAIK we had two different reports from people being hit by the flaw in the
previous hash. Not really a critical issue.

4) We could add a hard limit on the length of one chain. Even if the bad guys
discover a flaw, it won't hurt too much.
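
Points 2 and 4 can be illustrated with a small userspace sketch. This is not the kernel's ehash code: `conn_table`, `conn_insert`, `MAX_CHAIN` and the mixing function are hypothetical names chosen for illustration. The salt is fixed at the first insertion rather than at start-up, and an insertion is refused once a bucket's chain has reached a fixed limit, so a flooded bucket cannot grow without bound.

```c
#include <stdint.h>

#define NBUCKETS  256u   /* must be a power of two */
#define MAX_CHAIN 8u     /* hypothetical hard limit per bucket */

struct conn {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	struct conn *next;
};

struct conn_table {
	struct conn *bucket[NBUCKETS];
	unsigned int chain_len[NBUCKETS];
	uint32_t salt;
	int salt_ready;
};

/* Simple salted integer mix; a stand-in for a real keyed hash. */
static uint32_t conn_hash(const struct conn_table *t, const struct conn *c)
{
	uint32_t h = c->saddr ^ t->salt;

	h = (h ^ c->daddr) * 0x9e3779b1u;
	h ^= ((uint32_t)c->sport << 16) | c->dport;
	h *= 0x85ebca6bu;
	h ^= h >> 15;
	return h & (NBUCKETS - 1);
}

/* Returns 0 on success, -1 if the target chain is already at MAX_CHAIN. */
static int conn_insert(struct conn_table *t, struct conn *c, uint32_t fresh_salt)
{
	uint32_t b;

	if (!t->salt_ready) {            /* salt chosen at first insertion */
		t->salt = fresh_salt;
		t->salt_ready = 1;
	}
	b = conn_hash(t, c);
	if (t->chain_len[b] >= MAX_CHAIN)
		return -1;
	c->next = t->bucket[b];
	t->bucket[b] = c;
	t->chain_len[b]++;
	return 0;
}
```

With a limit like this, an attacker who manages to predict the salt can still make lookups in one bucket slower, but only up to `MAX_CHAIN` entries; the cost is that legitimate connections hashing to a full bucket are refused.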
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [RFC] [TCP]: Catch skb with S+L bugs earlier

2007-03-28 Thread Ilpo Järvinen
SACKED_ACKED and LOST are mutually exclusive, thus this
condition is a bug with SACK (IMHO). NewReno, however, could get
enough duplicate ACKs which increment sacked_out, so it makes
sense to do this kind of limiting for non-SACK TCP but not for
SACK-enabled TCP. Perhaps the author had that in mind but got
the logic accidentally the wrong way around?

Eventually these bugs trigger traps in tcp_clean_rtx_queue,
but it's much more informative to catch them here (this excludes
some other possible bugs).

Maybe this BUG_TRAP is too expensive to be included everywhere
in the TCP code. Should there be some #if to surround it?

Compile tested. Sadly enough, I don't have time for a couple of
weeks to test this, as it would require some setup, and besides,
my test machines are currently occupied with other work, but this
might also be net-2.6 (or even stable) material if it really
works (feel free to cut this paragraph or part of it if you
decide to include this :-)).

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/net/tcp.h |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fe1c4f0..3c8dd13 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -733,9 +733,10 @@ static inline __u32 tcp_current_ssthresh
 
 static inline void tcp_sync_left_out(struct tcp_sock *tp)
 {
-	if (tp->rx_opt.sack_ok &&
-	    (tp->sacked_out >= tp->packets_out - tp->lost_out))
+	if (tp->sacked_out + tp->lost_out > tp->packets_out) {
+		BUG_TRAP(!tp->rx_opt.sack_ok);
 		tp->sacked_out = tp->packets_out - tp->lost_out;
+	}
 	tp->left_out = tp->sacked_out + tp->lost_out;
 }
 
-- 
1.4.2
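
The logic of the patched helper can be modelled in userspace. This is only a sketch: `struct tcp_counters` is a hypothetical stand-in holding just the counters the patch touches, and `bug_trap_hits` replaces the kernel's BUG_TRAP() so that the check can be observed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few tcp_sock fields used here. */
struct tcp_counters {
	bool sack_ok;
	unsigned int packets_out;
	unsigned int sacked_out;
	unsigned int lost_out;
	unsigned int left_out;
};

static unsigned int bug_trap_hits;  /* counts would-be BUG_TRAP() firings */

static void sync_left_out(struct tcp_counters *tp)
{
	if (tp->sacked_out + tp->lost_out > tp->packets_out) {
		/* Only NewReno's dupack-driven sacked_out may overshoot. */
		if (tp->sack_ok)
			bug_trap_hits++;
		tp->sacked_out = tp->packets_out - tp->lost_out;
	}
	tp->left_out = tp->sacked_out + tp->lost_out;
}
```

For a non-SACK connection with packets_out = 10, sacked_out = 8 and lost_out = 5, the helper clamps sacked_out to 5 silently; the same counters on a SACK connection are also clamped, but the trap counter is incremented because that state should be impossible there.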


Re: RFC: Established connections hash function

2007-03-28 Thread Andi Kleen
On Wed, Mar 28, 2007 at 03:50:47PM +0200, Eric Dumazet wrote:
> 1) We can now use struct timespec to get more bits in init_std_data()

That would be a good change, but I don't think it would help that much.
If you know the hardware (e.g. webhost farms tend to have quite
predictable kit), the kernel binary, and the boot offset from
the timestamp, you can probably guess the range of ns pretty closely
(let's say down to a few ms). With that it's a small range to search.

> 2) tcp ehash salt is initialized at first socket creation, not boot time.
> Maybe we have more available entropy at this point.

Sockets are created early too. It would be a little better, but probably
not much.

The only true random seeds are disk, keyboard/mouse and previous state. Previous
state is typically restored by a relatively late init script, probably after the
first socket.  Servers tend to have no disk/mouse activity.

Disk may work if you manage to put it after the root mount, but you
lose on diskless systems. It also wouldn't work if, e.g., nfsd is built in,
because it would create sockets before that.

Getting entropy from network interrupts would avoid the diskless
issue, but people are paranoid about that.


> 3) We dont want to be 'totally secure'. We only want to raise the level, and
> eventually see if we have to spend more time on this next year(s). AFAIK we
> had two different reports from people being hit by the flaw of previous hash.
> Not really a critical issue.

Yes, but you probably want a complexity of at least 10^5-10^6 for it to be
of any use. I don't think you will get that early in boot from random
unless you use hardware support.

>
> 4) We could add a hard limit on the length of one chain. Even if the bad guys
> discover a flaw, it wont hurt too much.

Or just use the trie?  It has other advantages too :)

-Andi



Re: [PATCH]: SAD sometimes has double SAs.

2007-03-28 Thread Joy Latten
Last night I browsed the ipsec-tools code and saw places in
racoon where improvement would actually fix this
problem. I am working on a patch and will pursue this 
on the ipsec-tools mailing list.

I apologize for any inconvenience. 
Eric, sorry as I know you already patched lspp kernel
for testing. I strongly think this should be fixed
in userspace. 

The permission check before flushing does still
need to be added to kernel. 

Regards,
Joy
  
On Mon, 2007-03-26 at 19:04 -0600, Joy Latten wrote:
> On Mon, 2007-03-26 at 14:48 -0700, David Miller wrote:
> > From: Eric Paris [EMAIL PROTECTED]
> > Date: Mon, 26 Mar 2007 17:34:59 -0400
> >
> > > I'm not at all able to speak on the correctness or validity of the
> > > solution,
> >
> > Neither am I yet :)
> >
> > > but shouldn't the ipv6 case be a && not an || like the ipv4
> > > case?  Isn't this going to match all sorts of things?  Did you test this
> > > patch on ipv6 and see it to solve your problem?
> > >
> > > I'm also not enjoying the formatting in the ipv6 part where the first
> > > time you have the cast on the same line as the object but not the second
> > > part where x->props.saddr.a6 is on its own little line.
> >
> > Also, I want to understand what is going to tear down these
> > other direction fake entries later on?  I think I can review
> > this patch better if I understand that.
> >
> I am going to refer to the other-direction-placeholder as the fake
> entry. And the larval SA that gets created for the new
> spi as result of a GETSPI message as the real entry.
>
> The fake entry gets created when the real entry does and does not have
> an spi. It shares some of the same properties of a real larval SA
> (they are created using same code) and its state
> is marked as XFRM_STATE_ACQ. The real entry has a timeout. So, should
> IKE negotiation fail, take too long, etc... it will eventually timeout
> and be deleted. So does the fake entry. It will timeout and should be
> eventually deleted. (I will test this part tomorrow for assurance.)
>
> When the IKE negotiations are successful, xfrm_state_add() and the
> xfrm_state_update() look for larval SAs in that they look for an SA with
> same src, dst, etc... and with state==XFRM_STATE_ACQUIRE. Any that are
> found are deleted and new SA added. This removes the real larval SA, and
> should also remove the fake entry too.
>
> Of course, this is all based on my assumption that IKE will install
> two SAs, one for incoming and one for outgoing.
>
> Hopefully this answers how fake entries will be removed.
> Admittedly, I may miss something or didn't understand something
> correctly as I learn the code, so please let me know.
> There may even be a better solution that I don't readily see.
>
> > As it stands, this looks to me like a workaround for an improperly
> > implemented IPSEC daemon.  Joy states it as saying that the current
> > code requires the keying daemon to manage its SAs, and I wonder
> > whether any other implementation is even valid.
> >
> My big mouth. :-) But yes, I do think more SA management in
> userspace would be ideal.
>
> This fix will hopefully ensure kernel doesn't send any
> extra acquires regardless.
>
> Joy


[PATCH] NET : secure sequence number functions can use nsec resolution instead of usec

2007-03-28 Thread Eric Dumazet
Hello David

We could use the nanosec resolution for various functions defined in 
drivers/char/random.c
(secure_tcpv6_sequence_number(), secure_tcp_sequence_number(), 
secure_dccp_sequence_number())

I am not sure if it's a netdev related patch or core kernel, so I have CC 
Andrew.

Thank you

[PATCH] NET : random functions can use nsec resolution instead of usec

In order to get more randomness for secure_tcpv6_sequence_number(), 
secure_tcp_sequence_number(), secure_dccp_sequence_number() functions, we can 
use the high resolution time services, providing nanosec resolution.

I've also done two kmalloc()/kzalloc() conversions.
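
As a rough userspace illustration of what the clock term contributes, the sketch below computes the 32-bit clock component at microsecond and at nanosecond resolution from a `struct timespec`. The helper names are hypothetical; the kernel patch itself uses `ktime_get_real()`. Because the TCP sequence number is 32 bits wide, a faster clock also wraps sooner: 2^32 microseconds is roughly 71.6 minutes, while 2^32 nanoseconds is roughly 4.3 seconds.

```c
#include <stdint.h>
#include <time.h>

/* Microseconds since the epoch, truncated to 32 bits (old behaviour). */
static uint32_t clock_term_usec(const struct timespec *ts)
{
	uint64_t us = (uint64_t)ts->tv_sec * 1000000u +
		      (uint64_t)ts->tv_nsec / 1000u;
	return (uint32_t)us;
}

/* Nanoseconds since the epoch, truncated to 32 bits (proposed behaviour). */
static uint32_t clock_term_nsec(const struct timespec *ts)
{
	uint64_t ns = (uint64_t)ts->tv_sec * 1000000000u +
		      (uint64_t)ts->tv_nsec;
	return (uint32_t)ns;
}

/* Seconds until a 32-bit counter ticking at ticks_per_sec wraps around. */
static double wrap_seconds(double ticks_per_sec)
{
	return 4294967296.0 / ticks_per_sec;
}
```

In a real program the `struct timespec` would come from `clock_gettime(CLOCK_REALTIME, &ts)`; it is passed in here so that the arithmetic can be checked with fixed values.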

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

--- linux-2.6.21-rc5/drivers/char/random.c
+++ linux-2.6.21-rc5-ed/drivers/char/random.c
@@ -881,15 +881,15 @@ EXPORT_SYMBOL(get_random_bytes);
  */
 static void init_std_data(struct entropy_store *r)
 {
-	struct timeval tv;
+	ktime_t now;
 	unsigned long flags;
 
 	spin_lock_irqsave(&r->lock, flags);
 	r->entropy_count = 0;
 	spin_unlock_irqrestore(&r->lock, flags);
 
-	do_gettimeofday(&tv);
-	add_entropy_words(r, (__u32 *)&tv, sizeof(tv)/4);
+	now = ktime_get_real();
+	add_entropy_words(r, (__u32 *)&now, sizeof(now)/4);
 	add_entropy_words(r, (__u32 *)utsname(),
 			  sizeof(*(utsname()))/4);
 }
@@ -911,14 +911,12 @@ void rand_initialize_irq(int irq)
return;
 
/*
-* If kmalloc returns null, we just won't use that entropy
+* If kzalloc returns null, we just won't use that entropy
 * source.
 */
-   state = kmalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
-   if (state) {
-   memset(state, 0, sizeof(struct timer_rand_state));
+   state = kzalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
+   if (state)
irq_timer_state[irq] = state;
-   }
 }
 
 #ifdef CONFIG_BLOCK
@@ -927,14 +925,12 @@ void rand_initialize_disk(struct gendisk
struct timer_rand_state *state;
 
/*
-* If kmalloc returns null, we just won't use that entropy
+* If kzalloc returns null, we just won't use that entropy
 * source.
 */
-   state = kmalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
-   if (state) {
-   memset(state, 0, sizeof(struct timer_rand_state));
+   state = kzalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
+   if (state)
 		disk->random = state;
-   }
 }
 #endif
 
@@ -1469,7 +1465,6 @@ late_initcall(seqgen_init);
 __u32 secure_tcpv6_sequence_number(__be32 *saddr, __be32 *daddr,
   __be16 sport, __be16 dport)
 {
-   struct timeval tv;
__u32 seq;
__u32 hash[12];
struct keydata *keyptr = get_keyptr();
@@ -1485,8 +1480,7 @@ __u32 secure_tcpv6_sequence_number(__be3
 	seq = twothirdsMD4Transform((const __u32 *)daddr, hash) & HASH_MASK;
 	seq += keyptr->count;
 
-	do_gettimeofday(&tv);
-	seq += tv.tv_usec + tv.tv_sec * 1000000;
+	seq += ktime_get_real().tv64;
 
return seq;
 }
@@ -1521,7 +1515,6 @@ __u32 secure_ip_id(__be32 daddr)
 __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 __be16 sport, __be16 dport)
 {
-   struct timeval tv;
__u32 seq;
__u32 hash[4];
struct keydata *keyptr = get_keyptr();
@@ -1543,12 +1536,11 @@ __u32 secure_tcp_sequence_number(__be32 
 *  As close as possible to RFC 793, which
 *  suggests using a 250 kHz clock.
 *  Further reading shows this assumes 2 Mb/s networks.
-*  For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
+*  For 10 Gb/s Ethernet, a 1 GHz clock is appropriate.
 *  That's funny, Linux has one built in!  Use it!
 *  (Networks are faster now - should this be increased?)
 	 */
-	do_gettimeofday(&tv);
-	seq += tv.tv_usec + tv.tv_sec * 1000000;
+	seq += ktime_get_real().tv64;
 #if 0
 	printk("init_seq(%lx, %lx, %d, %d) = %d\n",
 	       saddr, daddr, sport, dport, seq);
@@ -1598,7 +1590,6 @@ u32 secure_ipv6_port_ephemeral(const __b
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
__be16 sport, __be16 dport)
 {
-   struct timeval tv;
u64 seq;
__u32 hash[4];
struct keydata *keyptr = get_keyptr();
@@ -1611,8 +1602,7 @@ u64 secure_dccp_sequence_number(__be32 s
 	seq = half_md4_transform(hash, keyptr->secret);
 	seq |= ((u64)keyptr->count) << (32 - HASH_BITS);
 
-	do_gettimeofday(&tv);
-	seq += tv.tv_usec + tv.tv_sec * 1000000;
+	seq += ktime_get_real().tv64;
 	seq &= (1ull << 48) - 1;
 #if 0
 	printk("dccp init_seq(%lx, %lx, %d, %d) = %d\n",

Re: [RESEND] [NET] fib_rules: Flush route cache after rule modifications

2007-03-28 Thread Thomas Graf
* Jarek Poplawski [EMAIL PROTECTED] 2007-03-28 13:19
> I hope I'm wrong, but isn't this at the cost of admins
> working with long rules' sets, which (probably) take extra
> time now?

That's right, it makes the insert and delete operation more
expensive.

A compromise would be to delay the flushing and wait for
some time (2 seconds by default) to see whether more rules or
routes are being added before flushing.

[NET] fib_rules: delay route cache flush by ip_rt_min_delay

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: linux/net-2.6.22/net/decnet/dn_rules.c
===
--- linux.orig/net-2.6.22/net/decnet/dn_rules.c	2007-03-28 17:41:22.0 +0200
+++ linux/net-2.6.22/net/decnet/dn_rules.c	2007-03-28 17:41:39.0 +0200
@@ -242,7 +242,7 @@ static u32 dn_fib_rule_default_pref(void
 
 static void dn_fib_rule_flush_cache(void)
 {
-   dn_rt_cache_flush(0);
+   dn_rt_cache_flush(-1);
 }
 
 static struct fib_rules_ops dn_fib_rules_ops = {
Index: linux/net-2.6.22/net/ipv4/fib_rules.c
===
--- linux.orig/net-2.6.22/net/ipv4/fib_rules.c	2007-03-28 17:41:18.0 +0200
+++ linux/net-2.6.22/net/ipv4/fib_rules.c	2007-03-28 17:41:30.0 +0200
@@ -300,7 +300,7 @@ static size_t fib4_rule_nlmsg_payload(st
 
 static void fib4_rule_flush_cache(void)
 {
-   rt_cache_flush(0);
+   rt_cache_flush(-1);
 }
 
 static struct fib_rules_ops fib4_rules_ops = {
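
The compromise described in this message can be sketched as a timer that each change pushes back by a short delay, bounded by a cap so that a steady stream of changes cannot postpone the flush forever. This is a simplified userspace model, not the kernel's rt_cache_flush() implementation; `flush_state`, `request_flush`, `tick` and the `MAX_DELAY` value are hypothetical.

```c
#include <stdbool.h>

#define MIN_DELAY 2   /* seconds; mirrors the "2 seconds by default" above */
#define MAX_DELAY 8   /* hypothetical cap on how long a flush may be postponed */

struct flush_state {
	bool pending;
	long fire_at;          /* time at which the flush should run */
	long deadline;         /* latest time the flush may be postponed to */
	unsigned int flushes;  /* number of flushes actually performed */
};

/* Called on every rule or route change at time `now`. */
static void request_flush(struct flush_state *s, long now)
{
	if (!s->pending) {
		s->pending = true;
		s->deadline = now + MAX_DELAY;
	}
	s->fire_at = now + MIN_DELAY;
	if (s->fire_at > s->deadline)
		s->fire_at = s->deadline;
}

/* Called periodically; performs the flush once the timer has expired. */
static void tick(struct flush_state *s, long now)
{
	if (s->pending && now >= s->fire_at) {
		s->pending = false;
		s->flushes++;
	}
}
```

Three changes arriving at t = 0, 1 and 2 result in a single flush at t = 4, which is the saving an admin loading a long rule set would see; the price is that lookups may use stale cache entries for up to the delay.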


Re: [PATCH] Inline net_device_stats

2007-03-28 Thread Stephen Hemminger

Rusty Russell wrote:

Hi all,

Does something like this make sense for future drivers?

Cheers,
Rusty.
===
Network drivers which keep stats allocate their own stats structure
then write a get_stats() function to return them.  It would be nice if
this were done by default.

1) Add a new stats field to struct net_device.
2) Add a new feature field to say this driver uses the internal one
3) Have a default get_stats which returns NULL if that feature not set.
4) Change callers to check result of get_stats call for NULL, not if
   ->get_stats is set.

This should not break backwards compatibility with older drivers, yet
allow modern drivers to shed some boilerplate code.

Lightly tested: works for a modified lguest network driver.
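
The four steps can be modelled in a few lines of userspace C. The structures and the flag below are trimmed, hypothetical stand-ins for `struct net_device`, `struct net_device_stats` and `NETIF_F_INTERNAL_STATS`; the sketch only shows that callers always invoke `get_stats()` and test its result for NULL.

```c
#include <stddef.h>

#define F_INTERNAL_STATS 0x2000u  /* stand-in for NETIF_F_INTERNAL_STATS (8192) */

struct dev_stats {
	unsigned long rx_packets;
	unsigned long tx_packets;
};

struct net_dev {
	unsigned int features;
	struct dev_stats *(*get_stats)(struct net_dev *dev);
	struct dev_stats stats;       /* step 1: embedded stats */
};

/* Step 3: default hook, returns the embedded stats only if the flag is set. */
static struct dev_stats *maybe_internal_stats(struct net_dev *dev)
{
	if (dev->features & F_INTERNAL_STATS)
		return &dev->stats;
	return NULL;
}

/* Allocation-time default, so get_stats is never a NULL pointer. */
static void net_dev_init(struct net_dev *dev, unsigned int features)
{
	dev->features = features;
	dev->get_stats = maybe_internal_stats;
	dev->stats.rx_packets = 0;
	dev->stats.tx_packets = 0;
}

/* Step 4: callers check the result, not the function pointer. */
static unsigned long total_rx(struct net_dev *devs, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++) {
		struct dev_stats *st = devs[i].get_stats(&devs[i]);

		if (st)
			sum += st->rx_packets;
	}
	return sum;
}
```

A driver that keeps its own statistics would simply overwrite `get_stats` after initialisation, which is why older drivers keep working unchanged in this model.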

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 0 files changed

diff -r 1ccab0a087b7 arch/s390/appldata/appldata_net_sum.c
--- a/arch/s390/appldata/appldata_net_sum.c Tue Mar 27 13:46:10 2007 +1000
+++ b/arch/s390/appldata/appldata_net_sum.c Tue Mar 27 14:28:47 2007 +1000
@@ -108,10 +108,10 @@ static void appldata_get_net_sum_data(vo
 	collisions = 0;
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev != NULL; dev = dev->next) {
-		if (dev->get_stats == NULL) {
+		stats = dev->get_stats(dev);
+		if (stats == NULL) {
 			continue;
 		}
-		stats = dev->get_stats(dev);
 		rx_packets += stats->rx_packets;
 		tx_packets += stats->tx_packets;
 		rx_bytes   += stats->rx_bytes;
diff -r 1ccab0a087b7 drivers/net/bonding/bond_main.c
--- a/drivers/net/bonding/bond_main.c   Tue Mar 27 13:46:10 2007 +1000
+++ b/drivers/net/bonding/bond_main.c   Tue Mar 27 14:29:08 2007 +1000
@@ -3621,9 +3621,8 @@ static struct net_device_stats *bond_get
 	read_lock_bh(&bond->lock);
 
 	bond_for_each_slave(bond, slave, i) {
-		if (slave->dev->get_stats) {
-			sstats = slave->dev->get_stats(slave->dev);
-
+		sstats = slave->dev->get_stats(slave->dev);
+		if (sstats) {
 			stats->rx_packets += sstats->rx_packets;
 			stats->rx_bytes += sstats->rx_bytes;
 			stats->rx_errors += sstats->rx_errors;
diff -r 1ccab0a087b7 drivers/parisc/led.c
--- a/drivers/parisc/led.c  Tue Mar 27 13:46:10 2007 +1000
+++ b/drivers/parisc/led.c  Tue Mar 27 14:29:17 2007 +1000
@@ -372,9 +372,9 @@ static __inline__ int led_get_net_activi
 			continue;
 		if (LOOPBACK(in_dev->ifa_list->ifa_local))
 			continue;
-		if (!dev->get_stats)
+		stats = dev->get_stats(dev);
+		if (!stats)
 			continue;
-		stats = dev->get_stats(dev);
 		rx_total += stats->rx_packets;
 		tx_total += stats->tx_packets;
 	}
diff -r 1ccab0a087b7 include/linux/netdevice.h
--- a/include/linux/netdevice.h Tue Mar 27 13:46:10 2007 +1000
+++ b/include/linux/netdevice.h Tue Mar 27 14:21:09 2007 +1000
@@ -325,6 +325,7 @@ struct net_device
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_GSO		2048	/* Enable software GSO. */
 #define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_INTERNAL_STATS	8192	/* Use stats structure in net_device */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -349,6 +350,7 @@ struct net_device
 
 
 	struct net_device_stats* (*get_stats)(struct net_device *dev);
+	struct net_device_stats stats;
 
 #ifdef CONFIG_WIRELESS_EXT
 	/* List of functions to handle Wireless Extensions (instead of ioctl).
diff -r 1ccab0a087b7 net/core/dev.c
--- a/net/core/dev.c	Tue Mar 27 13:46:10 2007 +1000
+++ b/net/core/dev.c	Tue Mar 27 14:30:05 2007 +1000
@@ -825,7 +825,6 @@ static int default_rebuild_header(struct
return 1;
 }
 
-

 /**
  * dev_open- prepare an interface for use.
  * @dev:   device to open
@@ -2120,9 +2119,9 @@ void dev_seq_stop(struct seq_file *seq, 
 
 static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
 {
-	if (dev->get_stats) {
-		struct net_device_stats *stats = dev->get_stats(dev);
-
+	struct net_device_stats *stats = dev->get_stats(dev);
+
+	if (stats) {
 		seq_printf(seq, "%6s:%8lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu "
 				"%8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n",
 			   dev->name, stats->rx_bytes, stats->rx_packets,
@@ -3146,6 +3145,13 @@ out:
 	mutex_unlock(&net_todo_run_mutex);
 }
 
+static struct net_device_stats *maybe_internal_stats(struct net_device *dev)
+{
+	if (dev->features & NETIF_F_INTERNAL_STATS)
+		return &dev->stats;
+	return NULL;
+}
+
 /**
  * alloc_netdev - allocate network device
  * @sizeof_priv:   size of private data to allocate space for
@@ -3181,6 +3187,7 @@ struct 

LSPP kernels (was Re: [PATCH]: SAD sometimes has double SAs).

2007-03-28 Thread James Morris
On Wed, 28 Mar 2007, Joy Latten wrote:

> Eric, sorry as I know you already patched lspp kernel
> for testing.

I think it'd be better to have the lspp kernel join the upstream workflow 
process, rather than being a shortcut into RHEL.

Please consider creating an lspp git tree (based off Linus' tree), then 
once patches there are tested and ready to submit upstream, post them here 
or selinux-list, where they can be reviewed and applied to either my or 
DaveM's git tree.

From there, they'll be picked up in -mm for even wider testing then be 
merged into mainline as appropriate.  Then, they can be incorporated into 
distro devel kernels when they update their kernels, or backported to 
stable distro kernels as already reviewed & tested upstream patches.

If there are any objections, please respond.


- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [PATCH] NET : secure sequence number functions can use nsec resolution instead of usec

2007-03-28 Thread James Morris
On Wed, 28 Mar 2007, Eric Dumazet wrote:

> Hello David
>
> We could use the nanosec resolution for various functions defined in
> drivers/char/random.c
> (secure_tcpv6_sequence_number(), secure_tcp_sequence_number(),
> secure_dccp_sequence_number())
>
> I am not sure if it's a netdev related patch or core kernel, so I have CC
> Andrew.
>
> Thank you
>
> [PATCH] NET : random functions can use nsec resolution instead of usec
>
> In order to get more randomness for secure_tcpv6_sequence_number(),
> secure_tcp_sequence_number(), secure_dccp_sequence_number() functions, we can
> use the high resolution time services, providing nanosec resolution.
>
> I've also done two kmalloc()/kzalloc() conversions.
>
> Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

Looks good to me.

Acked-by: James Morris [EMAIL PROTECTED]



- James
-- 
James Morris
[EMAIL PROTECTED]


Re: LSPP kernels (was Re: [PATCH]: SAD sometimes has double SAs).

2007-03-28 Thread Paul Moore
On Wednesday, March 28 2007 12:20:24 pm James Morris wrote:
> On Wed, 28 Mar 2007, Joy Latten wrote:
> > Eric, sorry as I know you already patched lspp kernel
> > for testing.
>
> I think it'd be better to have the lspp kernel join the upstream workflow
> process, rather than being a shortcut into RHEL.
>
> Please consider creating an lspp git tree (based off Linus' tree), then
> once patches there are tested and ready to submit upstream, post them here
> or selinux-list, where they can be reviewed and applied to either my or
> DaveM's git tree.
>
> From there, they'll be picked up in -mm for even wider testing then be
> merged into mainline as appropriate.  Then, they can be incorporated into
> distro devel kernels when they update their kernels, or backported to
> stable distro kernels as already reviewed & tested upstream patches.
>
> If there are any objections, please respond.

I think the original intent of the LSPP kernel series was to test patches 
before they were submitted to a wider audience (not too different from what 
you are describing).  Eric Paris became the LSPP/MLS group's Andrew Morton if 
you will :)

However, for whatever reason, things appear to have stumbled a bit in recent 
months and I think making an effort to move to a more standard approach based 
on current kernel development would be a step in the right direction.  This 
would probably make backports a bit more difficult but Eric's a smart guy and 
I'm sure he wouldn't mind :)

Does anyone have access to a public site we could use to host a git tree?  If 
no one has anything available (or is willing to maintain the tree) I might be 
able to do something.

-- 
paul moore
linux security @ hp


Re: [PATCH] NET : secure sequence number functions can use nsec resolution instead of usec

2007-03-28 Thread Andi Kleen
On Wed, Mar 28, 2007 at 05:43:22PM +0200, Eric Dumazet wrote:
> Hello David
>
> We could use the nanosec resolution for various functions defined in
> drivers/char/random.c
> (secure_tcpv6_sequence_number(), secure_tcp_sequence_number(),
> secure_dccp_sequence_number())
>
> I am not sure if it's a netdev related patch or core kernel, so I have CC
> Andrew.
>
> Thank you
>
> [PATCH] NET : random functions can use nsec resolution instead of usec
>
> In order to get more randomness for secure_tcpv6_sequence_number(),
> secure_tcp_sequence_number(), secure_dccp_sequence_number() functions, we can
> use the high resolution time services, providing nanosec resolution.

It's also a little faster because it avoids one division.

You didn't mention the initial seed change.
There you could have removed the useless utsname initialization too.

> I've also done two kmalloc()/kzalloc() conversions.

Normally that should be separate patches.

-Andi


Re: LSPP kernels (was Re: [PATCH]: SAD sometimes has double SAs).

2007-03-28 Thread Eric Paris
On Wed, 2007-03-28 at 12:20 -0400, James Morris wrote:
> On Wed, 28 Mar 2007, Joy Latten wrote:
>
> > Eric, sorry as I know you already patched lspp kernel
> > for testing.
>
> I think it'd be better to have the lspp kernel join the upstream workflow
> process, rather than being a shortcut into RHEL.
>
> Please consider creating an lspp git tree (based off Linus' tree), then
> once patches there are tested and ready to submit upstream, post them here
> or selinux-list, where they can be reviewed and applied to either my or
> DaveM's git tree.
>
> From there, they'll be picked up in -mm for even wider testing then be
> merged into mainline as appropriate.  Then, they can be incorporated into
> distro devel kernels when they update their kernels, or backported to
> stable distro kernels as already reviewed & tested upstream patches.
>
> If there are any objections, please respond.

It is definitely NOT a shortcut into RHEL.  Nor is this government cert
effort (LSPP) being driven primarily on RHEL code.  Not a single patch
will go into RHEL until it is upstream or in a tree to go upstream.
That is a given.  All development is being done upstream and then being
ported back to RHEL.  The LSPP kernel she mentioned is at this time
merely a testing ground for patches which may not quite be upstream
ready or are upstream but aren't in RHEL proper yet.  As it stands now
the LSPP kernel is carrying 22 patches on top of RHEL 5 GA (which is
2.6.18 based).  Of those, let me give you a breakdown.

12 are network related.
    10 of those are in Linus's kernel
    1 is not yet in Miller's tree but I would expect it soon
    1 is likely going to be dropped according to this thread

10 remaining patches are audit patches.

There is already a viro/audit-current.git tree on kernel.org where these
should be appearing.  I could make this a little easier for the audit
tree maintainer and make my own tree which he could pull from and then
push to Linus, but a tree which should hold all of these does exist.  All
of them have been sent to the linux-audit mailing list and have been
commented on there.

I don't want to give the impression that upstream is not coming first.
All the work is being done upstream either on netdev or linux-audit and
then I pull it back into this LSPP kernel she talked about so that
people interested primarily in the testing necessary to meet that
particular government standard have a neat tidy little prebuilt RPM to
work with.  Eventually all of these will show up in RHEL, but not until
all of the patches I'm dealing with are upstream.

-Eric



Re: LSPP kernels (was Re: [PATCH]: SAD sometimes has double SAs).

2007-03-28 Thread James Morris
On Wed, 28 Mar 2007, Eric Paris wrote:

> It is definitely NOT a shortcut into RHEL.

Ok, that was a poor choice of words on my part.

> I don't want to give the impression that upstream is not coming first.
> All the work is being done upstream either on netdev or linux-audit and
> then I pull it back into this LSPP kernel she talked about so that
> people interested primarily in the testing necessary to meet that
> particular government standard have a neat tidy little prebuilt RPM to
> work with.  Eventually all of these will show up in RHEL, but not until
> all of the patches I'm dealing with are upstream.

It seems my understanding wasn't clear on the overall workflow.  If the 
consensus is to stay with this scheme, then please disregard my previous 
post.


-- 
James Morris
[EMAIL PROTECTED]


another critical bug ?

2007-03-28 Thread Denys
Something more.

With all kernel debugging enabled it did not panic (maybe because the
system becomes too slow).

Vanilla kernel 2.6.20.3 with the htb patch applied, RTL8139 Ethernet cards.

If you need anything more, let me know.

I am not sure it is iproute2, but if you can, just point me in the right
direction as to whom I need to report this to. It also happens under an
interface flood, when I bring the interface down:

(some data was not accepted by the kernel mailing list and has been changed to *)

Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: Session 1 closed for client 00:16:ec:7e:47:ea (172.16.102.2) on eth1
Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: Sent PADT
Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: PADT: Generic-Error: **
Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: PADT: Generic-Error: Received PADT from peer
Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: PADT: Generic-Error: *
Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: Sent PADT
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148236] BUG: unable to handle kernel paging request
Mar 28 20:13:29 OFFICE-PPPOE at virtual address 5b5a596c
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148497]  printing eip:
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148625] *pde = 
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148743] Oops:  [#1]
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148798]
Mar 28 20:13:29 OFFICE-PPPOE SMP
Mar 28 20:13:29 OFFICE-PPPOE
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148992] Modules linked in: netconsole xt_mac xt_tcpmss ipt_TCPMSS ipt_REJECT ts_bm xt_string ipt_ttl ifb iptable_mangle xt_MARK xt_mark pppoe pppox ppp_generic slhc xt_tcpudp em_nbyte cls_tcindex act_gact cls_rsvp sch_htb cls_fw act_mirred em_u32 sch_red sch_sfq sch_tbf sch_teql cls_basic sch_gred act_pedit sch_hfsc cls_rsvp6 sch_ingress em_meta em_text act_ipt sch_dsmark sch_prio sch_netem act_simple cls_u32 em_cmp sch_cbq cls_route iptable_nat nf_conntrack_ipv4 ipt_LOG ipt_MASQUERADE ipt_REDIRECT nf_nat nf_conntrack nfnetlink iptable_filter ip_tables x_tables 8021q tun via_velocity via_rhine sis900 ne2k_pci 8390 skge tg3 8139too e1000 e100 block2mtd usb_storage mtdblock mtd_blkdevs usbhid uhci_hcd ehci_hcd ohci_hcd usbcore
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.153713] CPU:0
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.153716] EIP:0060:[c02113c7] Not tainted VLI
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.153718] EFLAGS: 00010202   (2.6.20.3-build-0005 #18)
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.153949] EIP is at netif_rx+0x18/0x126
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.154009] eax: c0c42800   ebx: 5b5a5958   ecx: 0001   edx: c6541c80
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.154073] esi: c0319800   edi: c6541c80   ebp: c02f5f14   esp: c02f5f04
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.154135] 

Re: L2 network namespace benchmarking

2007-03-28 Thread Rick Jones

If I read the results right it took a 32bit machine from AMD with
a gigabit interface before you could measure a throughput difference.
That isn't shabby for a non-optimized code path.


Just some paranoid ramblings - one needs to look beyond just whether or 
not the performance of a bulk transfer test (eg TCP_STREAM) remains able 
to hit link-rate.  One has to also consider the change in service demand 
(the normalization of CPU util and throughput).  Also, with 
functionality like TSO in place, the ability to pass very large things 
down the stack can help cover for a multitude of path-length sins.  And 
with either multiple 1G or 10G NICs becoming more and more prevalent, we 
have another one of those NIC speed vs CPU speed switch-overs, so 
maintaining single-NIC 1 gigabit throughput, while necessary, isn't 
(IMO) sufficient.


So, it becomes very important to go beyond just TCP_STREAM tests when 
evaluating these sorts of things.  Another test to run would be the 
TCP_RR test.  TCP_RR with single-byte request/response sizes will 
bypass the TSO stuff, and the transaction rate will be more directly 
affected by the change in path length than a TCP_STREAM test.  It will 
also show up quite clearly in the service demand.  Now, with NICs doing 
interrupt coalescing, if the NIC is strapped poorly (IMO) then you may 
not see a change in transaction rate - it may be getting limited 
artificially by the NIC's interrupt coalescing.  So, one has to fall back 
on service demand, or better yet, disable the interrupt coalescing.


Otherwise, measuring peak aggregate request/response becomes necessary.
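
To make the "service demand" point concrete, here is a minimal sketch in
C of how CPU utilisation can be normalised by throughput.  The function
names are made up for illustration, and it assumes a KB is 1024 bytes and
that utilisation is reported as a percentage averaged over all CPUs.  It
is not netperf's own code.

```c
/* CPU microseconds spent per KB moved, for a bulk (TCP_STREAM-style) test.
 * Assumes cpu_util_pct is averaged over num_cpus and 1 KB = 1024 bytes. */
double stream_service_demand(double cpu_util_pct, int num_cpus,
                             double throughput_mbit_per_s)
{
    double cpu_usec_per_s = cpu_util_pct / 100.0 * num_cpus * 1e6;
    double kb_per_s = throughput_mbit_per_s * 1e6 / 8.0 / 1024.0;

    return cpu_usec_per_s / kb_per_s;
}

/* CPU microseconds spent per transaction, for a TCP_RR-style test. */
double rr_service_demand(double cpu_util_pct, int num_cpus,
                         double trans_per_s)
{
    return cpu_util_pct / 100.0 * num_cpus * 1e6 / trans_per_s;
}
```

Two runs that both reach the same link rate but burn 20% and 30% CPU
differ by a factor of 1.5 in service demand, which is exactly the
difference a throughput-only comparison hides.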


rick jones
don't be blinded by bit-rate
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND] [NET] fib_rules: Flush route cache after rule modifications

2007-03-28 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 17:49:03 +0200

 * Jarek Poplawski [EMAIL PROTECTED] 2007-03-28 13:19
  I hope I'm wrong, but isn't this at the cost of admins
  working with long rules' sets, which (probably) take extra
  time now?
 
 That's right, it makes the insert and delete operation more
 expensive.
 
 A compromise would be to delay the flushing and wait for
 some time (default 2 seconds) whether more rules or routes
 are being added before flushing.

Another idea Thomas and I tossed around was to have some kind of way
for the rule insertion to indicate that the flush should be deferred
and I kind of prefer that explicitness.

By default it's better to flush immediately, because the old
behavior is totally unexpected.  I insert a rule and it doesn't
show up?  Nobody expects that.


Xen netfront fixes for changed skbuff in net-2.6.22.git

2007-03-28 Thread Jeremy Fitzhardinge
Hi Herbert,

I wonder if you've got a chance to look at netfront in light of the new
stuff in davem's network tree  (the stuff that's in
http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.22.git).

In particular, struct sk_buff has been changed so that nh has gone,
and the replacement can be just an offset rather than a full pointer. 
This breaks the netfront because it tries to stash a page * in nh.raw. 
I had a quick look at it and couldn't see an easy fix, but I don't
really understand what's going on in there.

But you do.  Any chance you could have a look at it, and at least give
me some pointers about how to proceed?

Thanks,
J


Re: RFC: Established connections hash function

2007-03-28 Thread David Miller
From: Andi Kleen [EMAIL PROTECTED]
Date: 28 Mar 2007 16:14:17 +0200

 Evgeniy Polyakov [EMAIL PROTECTED] writes:
  
  Jenkins hash is far from being simple to crack, although with some
  knowledge it can be done faster.
 
 TCP tends to be initialized early before there is anything
 good in the entropy pool.

Andi, you're being an idiot.

You are spewing endless and uninformed bullshit about this secure hash
topic, and it must stop now!

You obviously didn't even read my patch, because if you did you would
have seen that I don't initialize the random seed until MUCH MUCH
later _EXACTLY_ to deal with this issue.

In my patch the random seed is initialized when the first TCP or DCCP
socket is created, at which point we'll have sufficient entropy.

So please stop talking such nonsense about this topic.
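
The deferred seeding described above can be sketched in userspace C as
follows.  This is only an illustration under stated assumptions:
read_entropy() is a hypothetical stand-in for the kernel's entropy
source, ehash() is a toy mixing function (not the Jenkins hash under
discussion), and locking is ignored.  It is not the code from the patch.

```c
#include <stdint.h>

static uint32_t hash_seed;          /* unset until the first socket exists */
static int hash_seed_ready;
static unsigned int entropy_reads;  /* instrumentation for this sketch only */

/* Hypothetical stand-in for the kernel's entropy source. */
static uint32_t read_entropy(void)
{
    entropy_reads++;
    return 0x9e3779b9u * entropy_reads + 1u;
}

/* Called on every socket creation; only the first call draws a seed,
 * so nothing is read from the pool during early boot. */
void on_socket_create(void)
{
    if (!hash_seed_ready) {
        hash_seed = read_entropy();
        hash_seed_ready = 1;
    }
}

/* Toy mixing function (NOT Jenkins): folds the 4-tuple with the seed. */
uint32_t ehash(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
{
    uint32_t h = hash_seed;

    h = (h ^ saddr) * 0x01000193u;
    h = (h ^ daddr) * 0x01000193u;
    h = (h ^ (((uint32_t)sport << 16) | dport)) * 0x01000193u;
    return h ^ (h >> 15);
}

unsigned int entropy_reads_so_far(void)
{
    return entropy_reads;
}
```

The entropy source is consulted exactly once, at the first
on_socket_create() call, however many sockets follow.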


Re: KERNEL: assertion ((int)tp->sacked_out >= 0) failed at net/ipv4/tcp_input.c (2626)

2007-03-28 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 16:24:40 +0300 (EEST)

 It seems I'm guilty of this one. Dave, please apply to net-2.6.22 
 (besides this, I think tcp_sync_left_out should be changed, but I'll 
 prepare a patch for that later). Btw, how should this kind of email, with 
 some non-patch description plus a patch, be formatted?

Thanks for figuring out the problem so quickly, this formatting
is fine.

 [PATCH] [TCP]: Timedout loop must skip SACKed skbs too while marking
 
 Marking skb with both S and L is invalid, and that could easily
 happen in the timedout loop. Later on, tcp_sync_left_out
 reduces sacked_out if lost_out + sacked_out > packets_out, and
 then eventually sacked_out underflows, triggering a debug trap in
 tcp_clean_rtx_queue.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Patch applied, thanks a lot!
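
The failure mode described in the patch text can be reproduced with a
small userspace model.  Everything below is a simplified sketch: struct
conn, the tag bits and the three functions are hypothetical miniatures of
the kernel's counters and loops, written only to show how one skb tagged
both S and L leads to the underflow.

```c
#include <stdint.h>

#define TAG_S 0x1  /* SACKed */
#define TAG_L 0x2  /* marked lost */

struct conn {                /* hypothetical miniature of the TCP counters */
    uint32_t packets_out, sacked_out, lost_out;
    uint8_t tag[8];          /* per-skb tags; assumes packets_out <= 8 */
};

/* Timeout loop: mark outstanding skbs lost.  With skip_sacked == 0 it
 * models the buggy loop, with skip_sacked != 0 the fixed one. */
void mark_timedout(struct conn *c, int skip_sacked)
{
    for (uint32_t i = 0; i < c->packets_out; i++) {
        if (skip_sacked && (c->tag[i] & TAG_S))
            continue;
        if (!(c->tag[i] & TAG_L)) {
            c->tag[i] |= TAG_L;
            c->lost_out++;
        }
    }
}

/* Clamp sacked_out when the counters no longer add up. */
void sync_left_out(struct conn *c)
{
    if (c->sacked_out + c->lost_out > c->packets_out)
        c->sacked_out = c->packets_out - c->lost_out;
}

/* Cumulative ACK of everything: each tagged skb gives its count back.
 * Returns sacked_out as a signed value, as the assertion reads it. */
int clean_rtx_queue(struct conn *c)
{
    for (uint32_t i = 0; i < c->packets_out; i++) {
        if (c->tag[i] & TAG_S)
            c->sacked_out--;
        if (c->tag[i] & TAG_L)
            c->lost_out--;
        c->tag[i] = 0;
    }
    c->packets_out = 0;
    return (int)c->sacked_out;
}
```

With three packets out and one of them SACKed, the unfixed loop leaves
lost_out at 3, the clamp drops sacked_out to 0, and the later decrement
for the S-tagged skb takes it to -1.  Skipping SACKed skbs keeps the sum
at 3 and the final value at 0.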


Re: [PATCH] [TCP]: Rexmit hint must be cleared instead of setting it

2007-03-28 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 16:31:50 +0300 (EEST)

 Stupid error from my side. Even though now that I noticed this,
 I hoped it would have been an optimization but no, the counter
 hint is then incorrect. Thus clearing is necessary for now (I
 still suspect though that this path is never executed).
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Better safe than sorry :-)

We can start putting more aggressive assertions around if you'd
like to get some invariants like that validated.


Re: LSPP kernels (was Re: [PATCH]: SAD sometimes has double SAs).

2007-03-28 Thread David Miller
From: Paul Moore [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 12:36:57 -0400

 Does anyone have access to a public site we could use to host a git
 tree?  If no one has anything available (or is willing to maintain
 the tree) I might be able to do something.

It's not difficult to get an account on master.kernel.org
and also there has been some success with infradead.org as
well.

James or someone else could help you get going, and he's
gone through this process already :)


Re: [PATCH]: SAD sometimes has double SAs.

2007-03-28 Thread David Miller
From: Joy Latten [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 09:15:15 -0600

 Last night I browsed the ipsec-tools code and saw places in
 racoon where improvement would actually fix this
 problem. I am working on a patch and will pursue this 
 on the ipsec-tools mailing list.
 
 I apologize for any inconvenience. 

No problem, thanks for the update.

 The permission check before flushing does still
 need to be added to kernel. 

Yep, I'll take care of integrating that patch.

Thanks!


Re: [PATCH] [TCP]: Rexmit hint must be cleared instead of setting it

2007-03-28 Thread David Miller
From: David Miller [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 12:07:09 -0700 (PDT)

 From: Ilpo_Järvinen [EMAIL PROTECTED]
 Date: Wed, 28 Mar 2007 16:31:50 +0300 (EEST)
 
  Stupid error from my side. Even though now that I noticed this,
  I hoped it would have been an optimization but no, the counter
  hint is then incorrect. Thus clearing is necessary for now (I
  still suspect though that this path is never executed).
  
  Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
 
 Better safe than sorry :-)
 
 We can start putting more aggressive assertions around if you'd
 like to get some invariants like that validated.

In case it's not clear I did apply this patch.


Re: [PATCH] [TCP]: Rexmit hint must be cleared instead of setting it

2007-03-28 Thread Ilpo Järvinen
On Wed, 28 Mar 2007, David Miller wrote:

 From: David Miller [EMAIL PROTECTED]
 Date: Wed, 28 Mar 2007 12:07:09 -0700 (PDT)
 
  From: Ilpo_Järvinen [EMAIL PROTECTED]
  Date: Wed, 28 Mar 2007 16:31:50 +0300 (EEST)
  
   Stupid error from my side. Even though now that I noticed this,
   I hoped it would have been an optimization but no, the counter
   hint is then incorrect. Thus clearing is necessary for now (I
   still suspect though that this path is never executed).
   
   Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
  
  Better safe than sorry :-)
  
  We can start putting more aggressive assertions around if you'd
  like to get some invariants like that validated.
 
 In case it's not clear I did apply this patch.

I'll think more about this on Friday; maybe a WARN_ON could be placed there so that 
no harm is done if it ever gets there, probably a candidate for 
unlikely(), if this is really needed. Anyway, applying the NULL as this patch 
does is harmless (it was supposed to be that way right from the
beginning)... :-)

...but let's keep in mind that the actual goal is, of course, to get rid of 
the hint altogether, rather than doing these clearing things... :-)

-- 
 i.

Re: [RESEND] [NET] fib_rules: Flush route cache after rule modifications

2007-03-28 Thread Thomas Graf
* David Miller [EMAIL PROTECTED] 2007-03-28 11:24
 Another idea Thomas and I tossed around was to have some kind of way
 for the rule insertion to indicate that the flush should be deferred
 and I kind of prefer that explicitness.

Right, although I believe the flag should not merely defer the
flush but suppress it altogether. This would be the optimal solution
for scripts, which can do an "ip ro flush cache" themselves as they
know what they're doing.

 By default it's better to flush immediately, because the old
 behavior is totally unexpected.  I insert a rule and it doesn't
 show up?  Nobody expects that.

It's a tough call. I'd favour an immediate flush as well, but I can
see the point in delaying by ip_rt_min_delay, which can be
configured by the user, so people can choose to flush immediately
by setting it to 0. It would also be consistent with the flush
after route changes; the same delay is used there.
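
As a rough model of that behaviour, here is a sketch in C with an
explicit clock.  struct flush_state, table_changed() and timer_tick()
are hypothetical names invented for the example, and the real route
cache flush has more to it (a maximum delay, for one), so treat this as
an illustration of the idea only.

```c
struct flush_state {         /* hypothetical model of a deferred flush */
    long min_delay;          /* 0 means "flush immediately" */
    int pending;             /* a flush is armed but has not run yet */
    long deadline;           /* time at which the armed flush runs */
    unsigned int flushes;    /* number of flushes actually performed */
};

/* A rule or route was inserted or deleted at time "now". */
void table_changed(struct flush_state *s, long now)
{
    if (s->min_delay == 0) {
        s->flushes++;        /* immediate flush */
        return;
    }
    if (!s->pending) {       /* later changes share the armed flush */
        s->pending = 1;
        s->deadline = now + s->min_delay;
    }
}

/* Periodic timer: run the armed flush once its deadline has passed. */
void timer_tick(struct flush_state *s, long now)
{
    if (s->pending && now >= s->deadline) {
        s->pending = 0;
        s->flushes++;
    }
}
```

With min_delay set to 2, three rule insertions at time 0 cost one flush
at time 2.  With min_delay set to 0, the same three insertions cost
three flushes but each takes effect at once.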


Re: [PATCH] [TCP]: Rexmit hint must be cleared instead of setting it

2007-03-28 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 22:29:05 +0300 (EEST)

 ...but let's keep in mind that the actual goal is, of course, to get rid of 
 the hint altogether, rather than doing these clearing things... :-)

Of course.

The retransmit and forward SKB hints should be easy to kill.
In the worst case we can use the cached SACK sequence numbers
and perhaps one auxiliary sequence number hint to guide the
RB tree search for SKBs to retransmit.

A space intensive, and therefore not very appealing, scheme
is to have a linked list of SKBs which have been marked lost.


Re: [RESEND] [NET] fib_rules: Flush route cache after rule modifications

2007-03-28 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 21:34:36 +0200

 So people can choose to immediately flush by setting it to 0. It
 would also be consistent with the flush after route changes, the same
 delay is used there.

That's a good point I hadn't considered.

Therefore, I think I'll apply your patch, considering that.


Re: L2 network namespace benchmarking

2007-03-28 Thread Daniel Lezcano

Rick Jones wrote:

If I read the results right it took a 32bit machine from AMD with
a gigabit interface before you could measure a throughput difference.
That isn't shabby for a non-optimized code path.


Just some paranoid ramblings - one needs to look beyond just whether 
or not the performance of a bulk transfer test (eg TCP_STREAM) remains 
able to hit link-rate.  One has to also consider the change in service 
demand (the normalization of CPU util and throughput).  Also, with 
functionality like TSO in place, the ability to pass very large things 
down the stack can help cover for a multitude of path-length sins.  
And with either multiple 1G or 10G NICs becoming more and more 
prevalent, we have another one of those NIC speed vs CPU speed 
switch-overs, so maintaining single-NIC 1 gigabit throughput, while 
necessary, isn't (IMO) sufficient.


So, it becomes very important to go beyond just TCP_STREAM tests 
when evaluating these sorts of things.  Another test to run would be 
the TCP_RR test.  TCP_RR with single-byte request/response sizes will 
bypass the TSO stuff, and the transaction rate will be more directly 
affected by the change in path length than a TCP_STREAM test.  It will 
also show up quite clearly in the service demand.  Now, with NICs 
doing interrupt coalescing, if the NIC is strapped poorly (IMO) then 
you may not see a change in transaction rate - it may be getting 
limited artificially by the NIC's interrupt coalescing.  So, one has to 
fall back on service demand, or better yet, disable the interrupt 
coalescing.


Otherwise, measuring peak aggregate request/response becomes necessary.


rick jones
don't be blinded by bit-rate

Thanks Rick,

Do you have any pointers to help with benchmarking the network, perhaps a 
checklist or some scripts for netperf?


Regards.
   -- Daniel


Re: L2 network namespace benchmarking

2007-03-28 Thread Rick Jones
Do you have any pointer to help on benchmarking the network, perhaps a 
checklist or some scripts for netperf ?


There are some scripts in doc/examples but they are probably a bit long 
in the tooth by now.


The main writeup _I_ have on netperf would be the manual, which was 
recently updated for the 2.4.3 release.


http://www.netperf.org/svn/netperf2/tags/netperf-2.4.3/doc/netperf.html

or the current top of trunk:

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html

There is also a [EMAIL PROTECTED] mailing list which one can join 
and have discussions about netperf, and a [EMAIL PROTECTED] if one 
wants to discuss actual netperf (netperf2 or netperf4) development.


rick jones


Re: RFC: Established connections hash function

2007-03-28 Thread Andi Kleen
 In my patch the random seed is initialized when the first TCP or DCCP
 socket is created, at which point we'll have sufficient entropy.

See my discussion of this case in a later mail to Evgeniy 

-Andi


[BNX2]: Fix link interrupt problem.

2007-03-28 Thread Michael Chan
[BNX2]: Fix link interrupt problem.

bnx2_has_work()'s logic is flawed and can cause the driver to miss
a link event.  The fix is to compare the status block's attn_bits
and attn_bits_ack to determine if there is a link event.

Update version to 1.5.6.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index c12e5ea..d43fe28 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -54,8 +54,8 @@
 
 #define DRV_MODULE_NAME         "bnx2"
 #define PFX DRV_MODULE_NAME     ": "
-#define DRV_MODULE_VERSION      "1.5.5"
-#define DRV_MODULE_RELDATE      "February 1, 2007"
+#define DRV_MODULE_VERSION      "1.5.6"
+#define DRV_MODULE_RELDATE      "March 28, 2007"
 
 #define RUN_AT(x) (jiffies + (x))
 
@@ -2033,8 +2033,8 @@ bnx2_has_work(struct bnx2 *bp)
        (sblk->status_tx_quick_consumer_index0 != bp->hw_tx_cons))
            return 1;
 
-   if (((sblk->status_attn_bits & STATUS_ATTN_BITS_LINK_STATE) != 0) !=
-       bp->link_up)
+   if ((sblk->status_attn_bits & STATUS_ATTN_BITS_LINK_STATE) !=
+       (sblk->status_attn_bits_ack & STATUS_ATTN_BITS_LINK_STATE))
        return 1;
 
    return 0;
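
For readers who want to see the two checks side by side, here is a
self-contained sketch.  The struct, the LINK_STATE_BIT value and the
function names are stand-ins invented for the example, and the flap
scenario in the comments is an assumption about one way the old check
could miss an event, not something stated in the patch description.

```c
#include <stdint.h>

#define LINK_STATE_BIT 0x1u   /* stand-in for STATUS_ATTN_BITS_LINK_STATE */

struct status_block {         /* hypothetical two-field miniature */
    uint32_t attn_bits;       /* current attention state from hardware */
    uint32_t attn_bits_ack;   /* state last acknowledged by the driver */
};

/* Old check: compare the hardware bit with the driver's cached link_up. */
int link_event_old(const struct status_block *s, int link_up)
{
    return ((s->attn_bits & LINK_STATE_BIT) != 0) != link_up;
}

/* New check: an event is pending whenever the two bits disagree. */
int link_event_new(const struct status_block *s)
{
    return (s->attn_bits & LINK_STATE_BIT) !=
           (s->attn_bits_ack & LINK_STATE_BIT);
}
```

Assume the link drops and comes back before the driver runs: attn_bits
shows "up" again, attn_bits_ack still disagrees, and the cached link_up
is still 1.  The old check reports no work, while the new one reports
the unacknowledged change.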




Re: [BNX2]: Fix link interrupt problem.

2007-03-28 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 13:57:18 -0800

 [BNX2]: Fix link interrupt problem.
 
 bnx2_has_work()'s logic is flawed and can cause the driver to miss
 a link event.  The fix is to compare the status block's attn_bits
 and attn_bits_ack to determine if there is a link event.
 
 Update version to 1.5.6.
 
 Signed-off-by: Michael Chan [EMAIL PROTECTED]

Applied, thanks Michael.


Re: [PATCH] NET : secure sequence number functions can use nsec resolution instead of usec

2007-03-28 Thread David Miller
From: James Morris [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 12:31:56 -0400 (EDT)

 On Wed, 28 Mar 2007, Eric Dumazet wrote:
 
  Hello David
  
  We could use the nanosec resolution for various functions defined in 
  drivers/char/random.c
  (secure_tcpv6_sequence_number(), secure_tcp_sequence_number(), 
  secure_dccp_sequence_number())
  
  I am not sure if it's a netdev related patch or core kernel, so I have CC 
  Andrew.
  
  Thank you
  
  [PATCH] NET : random functions can use nsec resolution instead of usec
  
  In order to get more randomness for secure_tcpv6_sequence_number(), 
  secure_tcp_sequence_number(), secure_dccp_sequence_number() functions, we 
  can use the high resolution time services, providing nanosec resolution.
  
  I've also done two kmalloc()/kzalloc() conversions.
  
  Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
 
 Looks good to me.
 
 Acked-by: James Morris [EMAIL PROTECTED]

To me too, patch applied, thanks everyone.
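
To illustrate what the finer resolution buys, here is a small userspace
sketch.  The two helpers are hypothetical and only model the clock
component that gets mixed into a sequence number; they are not the code
in drivers/char/random.c.

```c
#include <stdint.h>
#include <time.h>

/* Clock component at microsecond resolution (truncated to 32 bits). */
uint32_t seq_clock_usec(const struct timespec *ts)
{
    return (uint32_t)((uint64_t)ts->tv_sec * 1000000u +
                      (uint64_t)ts->tv_nsec / 1000u);
}

/* Clock component at nanosecond resolution (truncated to 32 bits). */
uint32_t seq_clock_nsec(const struct timespec *ts)
{
    return (uint32_t)((uint64_t)ts->tv_sec * 1000000000u +
                      (uint64_t)ts->tv_nsec);
}
```

Two connections opened 400 ns apart get the same microsecond component
but different nanosecond components, so the nanosecond variant
contributes more varying bits per call.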


Re: [PATCH] Inline net_device_stats

2007-03-28 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 08:52:24 -0700

 It would make sense to do it per-cpu and 64 bit for the non-error
 counters.

Good point, but that's a separate change.

I have no real objection to Rusty's change and if more than
one driver uses this thing it's useful, so I'll apply his
patch.


Re: Xen netfront fixes for changed skbuff in net-2.6.22.git

2007-03-28 Thread Herbert Xu
Hi Jeremy:

On Wed, Mar 28, 2007 at 11:36:17AM -0700, Jeremy Fitzhardinge wrote:
 
 I wonder if you've got a chance to look at netfront in light of the new
 stuff in davem's network tree  (the stuff that's in
 http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.22.git).

I've had a look at it now and you can just stuff it into one of the other
pointers that's still there.  We just need to make sure that it is
reset properly before we feed the packet into the stack.  The pointer
skb->dev is one candidate but there are plenty of others.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [Xen-devel] Re: Xen netfront fixes for changed skbuff in net-2.6.22.git

2007-03-28 Thread Jeremy Fitzhardinge
Herbert Xu wrote:
 I've had a look at it now and you can just stuff it into one of the other
 pointers that's still there.  We just need to make sure that it is
 reset properly before we feed the packet into the stack.  The pointer
 skb->dev is one candidate but there are plenty of others.
   

Hm, I was wondering if there's a nicer way of getting the same result. 
Does it need to be done that way?

Thanks,
J


netpoll question

2007-03-28 Thread Steve Wise
Hey all,

I have a netpoll question.  How does netpoll work with MSI/X, NAPI, and
NICs that set up multiple RSS-style receive queues for a single port?
From what I can tell, if you're doing something like netdump using
netpoll for IO, then you might never process incoming packets that get
posted to the rx queues not associated with the main netdevice structure,
because netpoll only calls the poll() function for the main netdev
struct, not for the dummy netdevs set up for the multiple rx queues.

Is this the case or am I confused?

Thanks,


Steve.



Re: [Xen-devel] Re: Xen netfront fixes for changed skbuff in net-2.6.22.git

2007-03-28 Thread Herbert Xu
On Wed, Mar 28, 2007 at 02:55:56PM -0700, Jeremy Fitzhardinge wrote:
 
 Hm, I was wondering if there's a nicer way of getting the same result. 
 Does it need to be done that way?

Actually you could use the skb->cb buffer which can store anything and
doesn't need to be cleaned up.
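
A minimal userspace sketch of that idea follows.  struct fake_skb and
struct netfront_cb are hypothetical miniatures (the scratch area is
assumed to be 48 bytes here), and memcpy is used so the example does not
depend on the alignment of the buffer.

```c
#include <string.h>

struct page;                 /* opaque, as it would be to most callers */

struct fake_skb {            /* hypothetical miniature sk_buff */
    char cb[48];             /* scratch area owned by the current layer */
};

struct netfront_cb {         /* hypothetical private layout for the driver */
    struct page *page;
};

/* Stash the page pointer in the scratch area. */
void stash_page(struct fake_skb *skb, struct page *pg)
{
    struct netfront_cb cb = { .page = pg };

    memcpy(skb->cb, &cb, sizeof(cb));
}

/* Read the page pointer back out of the scratch area. */
struct page *stashed_page(const struct fake_skb *skb)
{
    struct netfront_cb cb;

    memcpy(&cb, skb->cb, sizeof(cb));
    return cb.page;
}
```

Because the scratch area belongs to whichever layer currently holds the
packet, nothing has to be reset before the packet is handed up the
stack, unlike a borrowed field such as the device pointer.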

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] sis190: new PHY support

2007-03-28 Thread Francois Romieu
Reported to work on the WinFast 761GXK8MB-RS motherboard.

Plain 10/100 Mbps.

Signed-off-by: Paul Gibbons [EMAIL PROTECTED]
Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/sis190.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index b08508b..34463ce 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -324,6 +324,7 @@ static struct mii_chip_info {
        u32 feature;
 } mii_chip_table[] = {
        { "Broadcom PHY BCM5461", { 0x0020, 0x60c0 }, LAN, F_PHY_BCM5461 },
+       { "Broadcom PHY AC131",   { 0x0143, 0xbc70 }, LAN, 0 },
        { "Agere PHY ET1101B",    { 0x0282, 0xf010 }, LAN, 0 },
        { "Marvell PHY 88E",      { 0x0141, 0x0cc0 }, LAN, F_PHY_88E },
        { "Realtek PHY RTL8201",  { 0x, 0x8200 }, LAN, 0 },
-- 
1.5.0.5



Re: netpoll question

2007-03-28 Thread Mark Huth

Steve Wise wrote:

Hey all,

I have a netpoll question.  How does netpoll work with MSI/X, NAPI, and
NICs that set up multiple RSS-style receive queues for a single port?
From what I can tell, if you're doing something like netdump using
netpoll for IO, then you might never process incoming packets that get
posted to the rx queues not associated with the main netdevice structure,
because netpoll only calls the poll() function for the main netdev
struct, not for the dummy netdevs set up for the multiple rx queues.


Is this the case or am I confused?

Thanks,


Steve
You are correct.  Netpoll needs a bit of work, especially on the receive 
side, for multi-queue and some other possible problems related to taking 
locks when the system is frozen.  If I get some time soon, I'm going to 
propose an overhaul to address some of these issues that show up in the 
kgdboe and netdump cases.
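
The receive-side gap can be shown with a toy model.  The structures and
function names below are hypothetical, with each queue reduced to a
backlog counter, so this only demonstrates why polling the main queue
alone leaves packets behind.

```c
#define NQUEUES 4

struct rx_queue {            /* hypothetical: one RSS receive queue */
    int backlog;             /* packets waiting to be processed */
};

struct nic {                 /* hypothetical multi-queue NIC */
    struct rx_queue q[NQUEUES];
};

/* Drain one queue and return how many packets were processed. */
static int poll_queue(struct rx_queue *q)
{
    int n = q->backlog;

    q->backlog = 0;
    return n;
}

/* Polls only queue 0, the one tied to the main netdev. */
int netpoll_poll_main_only(struct nic *n)
{
    return poll_queue(&n->q[0]);
}

/* What a multi-queue-aware poll would have to do instead. */
int netpoll_poll_all(struct nic *n)
{
    int total = 0;

    for (int i = 0; i < NQUEUES; i++)
        total += poll_queue(&n->q[i]);
    return total;
}

/* Packets still sitting in any queue. */
int pending(const struct nic *n)
{
    int total = 0;

    for (int i = 0; i < NQUEUES; i++)
        total += n->q[i].backlog;
    return total;
}
```

With backlogs of 1, 2, 3 and 4 packets, the main-only poll handles one
packet and leaves nine pending, which is the stall Steve describes for
the netdump case.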


Mark Huth


[PATCH]: Extract out DSACK detection logic

2007-03-28 Thread David Miller

I'm about to reattempt the hacks to make tcp_sacktag_write_queue()
use the RB tree lookups.

In order to make the changes easier to review I'm trying to clean
up the function a little bit.  This one pulls out the DSACK
detection logic.

I'm starting to pepper get_unaligned() calls around the sack
block accesses as I've been getting a few of these in my logs
on sparc64 lately:

[68089.285478] Kernel unaligned access at TPC[60e3c4] tcp_sacktag_write_queue+0x40/0x86c

it's pretty easy to make it happen with NOP TCP options and
stuff like that, and we have get_unaligned() calls for other
TCP options already.
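
For anyone unfamiliar with the problem, here is a userspace sketch of
reading SACK blocks that are not 4-byte aligned.  load_unaligned_u32()
is a hypothetical stand-in for get_unaligned() built on memcpy, and
read_sack_block() is invented for the example.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl() */

/* Userspace stand-in for get_unaligned(): memcpy has no alignment needs. */
static uint32_t load_unaligned_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

struct sack_block {
    uint32_t start_seq;
    uint32_t end_seq;
};

/* opt points at the first byte after the SACK option's kind/length pair. */
struct sack_block read_sack_block(const uint8_t *opt, int index)
{
    struct sack_block b;
    const uint8_t *p = opt + 8 * index;

    b.start_seq = ntohl(load_unaligned_u32(p));
    b.end_seq = ntohl(load_unaligned_u32(p + 4));
    return b;
}
```

A single NOP in front of the SACK option puts the first block at offset
3 of the options area, so a plain 32-bit load there is misaligned on
strict architectures such as sparc64, while the memcpy-based load is
safe at any offset.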

Pushed to net-2.6.22

commit d9367183d9d8fd1853e3bc4d0b1af077553e0e8a
Author: David S. Miller [EMAIL PROTECTED]
Date:   Wed Mar 28 16:27:47 2007 -0700

[TCP]: Extract DSACK detection code from tcp_sacktag_write_queue().

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c855791..a5a8987 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -936,6 +936,39 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
  * Both of these heuristics are not used in Loss state, when we cannot
  * account for retransmits accurately.
  */
+static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
+                           struct tcp_sack_block_wire *sp, int num_sacks,
+                           u32 prior_snd_una)
+{
+   u32 start_seq_0 = ntohl(get_unaligned(&sp[0].start_seq));
+   u32 end_seq_0 = ntohl(get_unaligned(&sp[0].end_seq));
+   int dup_sack = 0;
+
+   if (before(start_seq_0, TCP_SKB_CB(ack_skb)->ack_seq)) {
+       dup_sack = 1;
+       tp->rx_opt.sack_ok |= 4;
+       NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
+   } else if (num_sacks > 1) {
+       u32 end_seq_1 = ntohl(get_unaligned(&sp[1].end_seq));
+       u32 start_seq_1 = ntohl(get_unaligned(&sp[1].start_seq));
+
+       if (!after(end_seq_0, end_seq_1) &&
+           !before(start_seq_0, start_seq_1)) {
+           dup_sack = 1;
+           tp->rx_opt.sack_ok |= 4;
+           NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV);
+       }
+   }
+
+   /* D-SACK for already forgotten data... Do dumb counting. */
+   if (dup_sack &&
+       !after(end_seq_0, prior_snd_una) &&
+       after(end_seq_0, tp->undo_marker))
+       tp->undo_retrans--;
+
+   return dup_sack;
+}
+
 static int
 tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
                         u32 prior_snd_una, u32 *mark_lost_entry_seq)
@@ -963,25 +996,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
    *mark_lost_entry_seq = tp->highest_sack;
    prior_fackets = tp->fackets_out;
 
-   /* Check for D-SACK. */
-   if (before(ntohl(sp[0].start_seq), TCP_SKB_CB(ack_skb)->ack_seq)) {
-       dup_sack = 1;
-       tp->rx_opt.sack_ok |= 4;
-       NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
-   } else if (num_sacks > 1 &&
-           !after(ntohl(sp[0].end_seq), ntohl(sp[1].end_seq)) &&
-           !before(ntohl(sp[0].start_seq), ntohl(sp[1].start_seq))) {
-       dup_sack = 1;
-       tp->rx_opt.sack_ok |= 4;
-       NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV);
-   }
-
-   /* D-SACK for already forgotten data...
-    * Do dumb counting. */
-   if (dup_sack &&
-           !after(ntohl(sp[0].end_seq), prior_snd_una) &&
-           after(ntohl(sp[0].end_seq), tp->undo_marker))
-       tp->undo_retrans--;
+   dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, prior_snd_una);
 
    /* Eliminate too old ACKs, but take into
     * account more or less fresh ones, they can


[PATCH] add Attansic L2 PCI ID

2007-03-28 Thread Chris Snook
From: Chris Snook [EMAIL PROTECTED]

Add PCI ID for the Attansic L2 100 Mb ethernet adapter.

Signed-off-by: Chris Snook [EMAIL PROTECTED]

--- linux-2.6.21-rc5.orig/include/linux/pci_ids.h   2007-03-27 23:26:50.0 -0400
+++ linux-2.6.21-rc5/include/linux/pci_ids.h    2007-03-28 15:11:03.0 -0400
@@ -2090,6 +2090,7 @@
 
 #define PCI_VENDOR_ID_ATTANSIC 0x1969
 #define PCI_DEVICE_ID_ATTANSIC_L1  0x1048
+#define PCI_DEVICE_ID_ATTANSIC_L2  0x2048
 
 #define PCI_VENDOR_ID_JMICRON  0x197B
 #define PCI_DEVICE_ID_JMICRON_JMB360   0x2360


another critical bug ?

2007-03-28 Thread Denys
Tried on 2.6.21-rc5-git3, but with preempt enabled.
Same panic, seemingly in the same place.

Mar 29 02:50:53 LINUX [  164.644102] BUG: unable to handle kernel paging 
request
Mar 29 02:50:53 LINUX at virtual address 0302014c
Mar 29 02:50:53 LINUX [  164.644242]  printing eip:
Mar 29 02:50:53 LINUX [  164.644301] *pde = 
Mar 29 02:50:53 LINUX [  164.644371] Oops:  [#1]
Mar 29 02:50:53 LINUX [  164.644485] PREEMPT
Mar 29 02:50:53 LINUX SMP
Mar 29 02:50:53 LINUX
Mar 29 02:50:53 LINUX [  164.644629] Modules linked in:
 LIST OF MODULES 
Mar 29 02:50:53 LINUX [  164.648758] CPU:0
Mar 29 02:50:53 LINUX [  164.648760] EIP:0060:[c0216e37]Not tainted 
VLI
Mar 29 02:50:53 LINUX [  164.648762] EFLAGS: 00010206   (2.6.20.3-build-0002 
#14)
Mar 29 02:50:53 LINUX [  164.648948] EIP is at netif_rx+0x12/0x115
Mar 29 02:50:53 LINUX [  164.649011] eax: c33fb000   ebx: 03020100   ecx: 
0001   edx: c3380d80
Mar 29 02:50:53 LINUX [  164.649078] esi: c4cfc000   edi: c3380d80   ebp: 
   esp: c7fb7f74
Mar 29 02:50:53 LINUX [  164.649144] ds: 007b   es: 007b   ss: 0068   
preempt: 0001
Mar 29 02:50:53 LINUX [  164.649210] Process softirq-tasklet (pid: 9, 
ti=c7fb6000 task=c7f9f000 task.ti=c7fb6000)
Mar 29 02:50:53 LINUX
Mar 29 02:50:53 LINUX [  164.649276] Stack:
Mar 29 02:50:53 LINUX c4cfc400
Mar 29 02:50:53 LINUX c4cfc000
Mar 29 02:50:53 LINUX 
Mar 29 02:50:53 LINUX c8a1f268
Mar 29 02:50:53 LINUX c4cfc45c
Mar 29 02:50:53 LINUX 000f4240
Mar 29 02:50:53 LINUX 
Mar 29 02:50:53 LINUX c011c5d8
Mar 29 02:50:53 LINUX
Mar 29 02:50:53 LINUX [  164.649754]
Mar 29 02:50:53 LINUX c7fb7fac
Mar 29 02:50:53 LINUX c026ba53
Mar 29 02:50:53 LINUX 0006
Mar 29 02:50:53 LINUX c11c5c98
Mar 29 02:50:53 LINUX c11c5c98
Mar 29 02:50:53 LINUX 0020
Mar 29 02:50:53 LINUX c011cadf
Mar 29 02:50:53 LINUX c011cbc9
Mar 29 02:50:53 LINUX
Mar 29 02:50:53 LINUX [  164.650231]
Mar 29 02:50:53 LINUX 
Mar 29 02:50:53 LINUX 0001
Mar 29 02:50:53 LINUX c7fb7fc0
Mar 29 02:50:53 LINUX 0032
Mar 29 02:50:53 LINUX c11c5c98
Mar 29 02:50:53 LINUX c7fa1ef8
Mar 29 02:50:53 LINUX c0128757
Mar 29 02:50:53 LINUX 
Mar 29 02:50:53 LINUX
Mar 29 02:50:53 LINUX [  164.650707] Call Trace:
Mar 29 02:50:53 LINUX [  164.650829]  [c8a1f268]
Mar 29 02:50:53 LINUX ri_tasklet+0xd5/0x1a1 [ifb]
Mar 29 02:50:53 LINUX [  164.650954]  [c011c5d8]
Mar 29 02:50:53 LINUX __tasklet_action+0xe5/0x126
Mar 29 02:50:53 LINUX [  164.651073]  [c026ba53]
Mar 29 02:50:53 LINUX schedule+0xe0/0xfa
Mar 29 02:50:53 LINUX [  164.651201]  [c011cadf]
Mar 29 02:50:53 LINUX ksoftirqd+0x0/0x178
Mar 29 02:50:53 LINUX [  164.651312]  [c011cbc9]
Mar 29 02:50:53 LINUX ksoftirqd+0xea/0x178
Mar 29 02:50:53 LINUX [  164.651443]  [c0128757]
Mar 29 02:50:53 LINUX kthread+0xb2/0xdb
Mar 29 02:50:53 LINUX [  164.651560]  [c01286a5]
Mar 29 02:50:53 LINUX kthread+0x0/0xdb
Mar 29 02:50:53 LINUX [  164.651683]  [c0103a5f]
Mar 29 02:50:53 LINUX kernel_thread_helper+0x7/0x10
Mar 29 02:50:53 LINUX [  164.651823]  ===
Mar 29 02:50:53 LINUX [  164.651890] Code:
Mar 29 02:50:53 LINUX 00
Mar 29 02:50:53 LINUX eb
Mar 29 02:50:53 LINUX 0c
Mar 29 02:50:53 LINUX 89
Mar 29 02:50:53 LINUX d8
Mar 29 02:50:53 LINUX e8
Mar 29 02:50:53 LINUX 7a
Mar 29 02:50:53 LINUX 5e
Mar 29 02:50:53 LINUX 05
Mar 29 02:50:53 LINUX 00
Mar 29 02:50:53 LINUX e9
Mar 29 02:50:53 LINUX bb
Mar 29 02:50:53 LINUX fe
Mar 29 02:50:53 LINUX ff
Mar 29 02:50:53 LINUX ff
Mar 29 02:50:53 LINUX 83
Mar 29 02:50:53 LINUX c4
Mar 29 02:50:53 LINUX 14
Mar 29 02:50:53 LINUX 89
Mar 29 02:50:53 LINUX f0
Mar 29 02:50:53 LINUX 5b
Mar 29 02:50:53 LINUX 5e
Mar 29 02:50:53 LINUX 5f
Mar 29 02:50:53 LINUX 5d
Mar 29 02:50:53 LINUX c3
Mar 29 02:50:53 LINUX 57
Mar 29 02:50:53 LINUX 89
Mar 29 02:50:53 LINUX c7
Mar 29 02:50:53 LINUX 56
Mar 29 02:50:53 LINUX 53
Mar 29 02:50:53 LINUX 8b
Mar 29 02:50:53 LINUX 40
Mar 29 02:50:53 LINUX 14
Mar 29 02:50:53 LINUX 8b
Mar 29 02:50:53 LINUX 98
Mar 29 02:50:53 LINUX e4
Mar 29 02:50:53 LINUX 02
Mar 29 02:50:53 LINUX 00
Mar 29 02:50:53 LINUX 00
Mar 29 02:50:53 LINUX 85
Mar 29 02:50:53 LINUX db
Mar 29 02:50:53 LINUX 74
Mar 29 02:50:53 LINUX 32
Mar 29 02:50:53 LINUX
Mar 29 02:50:53 LINUX 7b
Mar 29 02:50:53 LINUX 4c
Mar 29 02:50:53 LINUX 00
Mar 29 02:50:53 LINUX 75
Mar 29 02:50:53 LINUX 06
Mar 29 02:50:53 LINUX 83
Mar 29 02:50:53 LINUX 7b
Mar 29 02:50:53 LINUX 28
Mar 29 02:50:53 LINUX 00
Mar 29 02:50:53 LINUX 74
Mar 29 02:50:53 LINUX 26
Mar 29 02:50:53 LINUX 8d
Mar 29 02:50:53 LINUX 73
Mar 29 02:50:53 LINUX 2c
Mar 29 02:50:53 LINUX 89
Mar 29 02:50:53 LINUX f0
Mar 29 02:50:53 LINUX e8
Mar 29 02:50:53 LINUX 65
Mar 29 02:50:53 LINUX 5e
Mar 29 02:50:53 LINUX 05
Mar 29 02:50:53 LINUX
Mar 29 02:50:53 LINUX [  164.654978] EIP: [c0216e37]
Mar 29 02:50:53 LINUX netif_rx+0x12/0x115
Mar 29 02:50:53 LINUX SS:ESP 0068:c7fb7f74
Mar 29 02:50:53 LINUX [  164.655137]
Mar 29 02:50:53 LINUX Kernel panic - not syncing: Fatal exception
Mar 29 02:50:53 LINUX [  164.655280]  [c0118387]
Mar 29 

Re: [PATCH] Inline net_device_stats

2007-03-28 Thread Rusty Russell
On Wed, 2007-03-28 at 08:52 -0700, Stephen Hemminger wrote:
 It would make sense to do it per-cpu and 64 bit for the non-error counters.

Well, I looked at the e1000, it doesn't update on every packet anyway,
but seems to d/l from the card occasionally.  I assume this is the
method for high-speed drivers, otherwise we should split the tx & rx
parts of the structure.

64 bit introduces potential compatibility problems (exporting via proc).
And per-cpu feels like overkill to me.
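
For reference, the per-cpu, 64-bit scheme being discussed could look roughly like the following userspace sketch. It is not kernel code: `pcpu_stats_sketch`, `count_rx()` and `total_rx()` are made-up names, and the CPU count is fixed only for illustration.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count */

/* Hypothetical per-cpu slice holding only the non-error counters. */
struct pcpu_stats_sketch {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

static struct pcpu_stats_sketch pcpu_stats[NR_CPUS_SKETCH];

/* Hot path: a CPU increments only its own slot, so no lock is taken. */
static void count_rx(unsigned int cpu)
{
	pcpu_stats[cpu].rx_packets++;
}

/* Slow path (e.g. a stats read): fold all slots into one 64-bit total. */
static uint64_t total_rx(void)
{
	uint64_t sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += pcpu_stats[cpu].rx_packets;
	return sum;
}
```

The read side pays one pass over all CPUs, and the 64-bit total would still have to fit whatever width the existing export path expects, which is the compatibility concern raised above.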

Rusty.



[PATCH]: Make sacktag logic splittable...

2007-03-28 Thread David Miller

It's nearly impossible to break out tcp_sacktag_write_queue() into
smaller worker functions because there are so many local variables
that are live and updated throughout the inner loop and beyond.

So create a state block so we can start simplifying this function
properly.
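
A minimal sketch of the general idea, assuming nothing about the real function beyond what is said above: locals that several steps update are gathered into one struct, so a worker can take a single pointer instead of a long argument list. The names `scan_state_sketch`, `scan_one()` and `scan_all()` are hypothetical and unrelated to the TCP code.

```c
/* Hypothetical state block: variables that used to be locals of the loop. */
struct scan_state_sketch {
	unsigned int flag;	/* set once any negative value is seen */
	int first_neg;		/* lowest index holding a negative value */
};

/* Worker: updates the shared state through one pointer argument. */
static void scan_one(struct scan_state_sketch *st, int value, int index)
{
	if (value < 0) {
		st->flag |= 1;
		if (index < st->first_neg)
			st->first_neg = index;
	}
}

/* Caller: owns the state block and hands it to the worker per element. */
static struct scan_state_sketch scan_all(const int *values, int n)
{
	struct scan_state_sketch st = { 0, n };
	int i;

	for (i = 0; i < n; i++)
		scan_one(&st, values[i], i);
	return st;
}
```

Once the state lives in one place, further pieces of the loop body can be moved into their own workers without changing how the caller tracks results.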

Pushed to net-2.6.22

commit eb7723322ccc43f19714ac83395e5204fee0e5b8
Author: David S. Miller [EMAIL PROTECTED]
Date:   Wed Mar 28 17:17:19 2007 -0700

[TCP]: Create tcp_sacktag_state.

It is difficult to break out the inner-logic of
tcp_sacktag_write_queue() into worker functions because
so many local variables get updated in-place.

Start to overcome this by creating a structure block
of state variables that can be passed around into
worker routines.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a5a8987..464dc80 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -936,6 +936,15 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
  * Both of these heuristics are not used in Loss state, when we cannot
  * account for retransmits accurately.
  */
+struct tcp_sacktag_state {
+	unsigned int flag;
+	int dup_sack;
+	int reord;
+	int prior_fackets;
+	u32 lost_retrans;
+	int first_sack_index;
+};
+
 static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 			   struct tcp_sack_block_wire *sp, int num_sacks,
 			   u32 prior_snd_una)
@@ -980,23 +989,18 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 	struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
 	struct sk_buff *cached_skb;
 	int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
-	int reord = tp->packets_out;
-	int prior_fackets;
-	u32 lost_retrans = 0;
-	int flag = 0;
-	int dup_sack = 0;
+	struct tcp_sacktag_state state;
 	int cached_fack_count;
 	int i;
-	int first_sack_index;
+	int force_one_sack;
 
 	if (!tp->sacked_out) {
 		tp->fackets_out = 0;
 		tp->highest_sack = tp->snd_una;
 	} else
 		*mark_lost_entry_seq = tp->highest_sack;
-	prior_fackets = tp->fackets_out;
 
-	dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, prior_snd_una);
+	state.dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, prior_snd_una);
 
 	/* Eliminate too old ACKs, but take into
 	 * account more or less fresh ones, they can
@@ -1009,18 +1013,18 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 	 * if the only SACK change is the increase of the end_seq of
 	 * the first block then only apply that SACK block
 	 * and use retrans queue hinting otherwise slowpath */
-	flag = 1;
+	force_one_sack = 1;
 	for (i = 0; i < num_sacks; i++) {
 		__be32 start_seq = sp[i].start_seq;
 		__be32 end_seq = sp[i].end_seq;
 
 		if (i == 0) {
 			if (tp->recv_sack_cache[i].start_seq != start_seq)
-				flag = 0;
+				force_one_sack = 0;
 		} else {
 			if ((tp->recv_sack_cache[i].start_seq != start_seq) ||
 			    (tp->recv_sack_cache[i].end_seq != end_seq))
-				flag = 0;
+				force_one_sack = 0;
 		}
 		tp->recv_sack_cache[i].start_seq = start_seq;
 		tp->recv_sack_cache[i].end_seq = end_seq;
@@ -1031,8 +1035,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 		tp->recv_sack_cache[i].end_seq = 0;
 	}
 
-	first_sack_index = 0;
-	if (flag)
+	state.first_sack_index = 0;
+	if (force_one_sack)
 		num_sacks = 1;
 	else {
 		int j;
@@ -1050,17 +1054,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 					sp[j+1] = tmp;
 
 					/* Track where the first SACK block goes to */
-					if (j == first_sack_index)
-						first_sack_index = j+1;
+					if (j == state.first_sack_index)
+						state.first_sack_index = j+1;
 				}
 
 			}
 		}
 	}
 
-	/* clear flag as used for different purpose in following code */
-	flag = 0;
-
 	/* Use SACK fastpath hint if valid */
 	cached_skb = tp->fastpath_skb_hint;
 	cached_fack_count = tp->fastpath_cnt_hint;
@@ -1069,6 +1070,11 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 		cached_fack_count = 0;
 	}
 
+  

[PATCH] atl1: save mac address on remove

2007-03-28 Thread Chris Snook
From: Chris Snook [EMAIL PROTECTED]

Some atl1 boards get their MAC address written directly to the register
by the BIOS during POST, rather than storing it in EEPROM that's
accessible to the driver.  If the MAC register on one of these boards
is changed and then the module is unloaded, the permanent MAC address
will be forgotten until the box is rebooted.  We should save the
permanent address during removal if we've been messing with it.

Signed-off-by: Chris Snook [EMAIL PROTECTED]

--- a/drivers/net/atl1/atl1_main.c  2007-03-01 14:14:48.0 -0500
+++ b/drivers/net/atl1/atl1_main.c  2007-03-01 16:59:59.0 -0500
@@ -2321,6 +2321,16 @@ static void __devexit atl1_remove(struct
 		return;
 
 	adapter = netdev_priv(netdev);
+
+	/* Some atl1 boards lack persistent storage for their MAC, and get it
+	 * from the BIOS during POST.  If we've been messing with the MAC
+	 * address, we need to save the permanent one.
+	 */
+	if (memcmp(adapter->hw.mac_addr, adapter->hw.perm_mac_addr, ETH_ALEN)) {
+		memcpy(adapter->hw.mac_addr, adapter->hw.perm_mac_addr, ETH_ALEN);
+		atl1_set_mac_addr(&adapter->hw);
+	}
+
 	iowrite16(0, adapter->hw.hw_addr + REG_GPHY_ENABLE);
 	unregister_netdev(netdev);
 	pci_iounmap(pdev, adapter->hw.hw_addr);
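
The restore-on-remove logic in the patch above can be modeled in userspace roughly as follows. This is only a sketch: `hw_sketch`, `write_mac_reg()` and `restore_perm_mac()` are hypothetical stand-ins, and the "register" is just a byte array.

```c
#include <string.h>

#define ETH_ALEN 6	/* same value as the kernel's ETH_ALEN */

/* Hypothetical stand-in for the driver's hardware state. */
struct hw_sketch {
	unsigned char mac_addr[ETH_ALEN];	/* address currently in use */
	unsigned char perm_mac_addr[ETH_ALEN];	/* address set at POST */
	unsigned char mac_reg[ETH_ALEN];	/* models the MAC register */
};

/* Models writing the current address into the MAC register. */
static void write_mac_reg(struct hw_sketch *hw)
{
	memcpy(hw->mac_reg, hw->mac_addr, ETH_ALEN);
}

/* Returns 1 if the permanent address had to be written back, else 0. */
static int restore_perm_mac(struct hw_sketch *hw)
{
	if (memcmp(hw->mac_addr, hw->perm_mac_addr, ETH_ALEN) == 0)
		return 0;

	memcpy(hw->mac_addr, hw->perm_mac_addr, ETH_ALEN);
	write_mac_reg(hw);
	return 1;
}
```

Comparing first keeps the common case, where the address was never changed, free of any register write.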


Re: netpoll question

2007-03-28 Thread Steve Wise
On Wed, 2007-03-28 at 16:28 -0700, Mark Huth wrote:
 Steve Wise wrote:
  Hey all,
 
  I have netpoll question.  How does netpoll work with MSI/X, NAPI, and
  nics that setup multiple RSS style receive queues for a single port?
  From what I can tell, if you're doing something like netdump using
  netpoll for IO, then you might never process incoming packets that get
  posted to the rx queues not associated with the main netdevice structure
  because netpoll only calls the poll() function for the main netdev
  struct.  Not the dummy netdevs setup for multiple rx queues.  
 
  Is this the case or am I confused?
 
  Thanks,
 
 
  Steve
 You are correct.  Netpoll needs a bit of work, especially on the receive 
 side, for multi-queue and some other possible problems related to taking 
 locks when the system is frozen.  If I get some time soon, I'm going to 
 propose an overhaul to address some of these issues that show up in the 
 kgdboe and netdump cases.
 
 Mark Huth

Hey Mark,

What are your thoughts on how to implement this?  

Scrub every softnet_data queue-poll_list for every cpu?  Or perhaps it's
better to have a new function ptr off the netdev that sez poll all rx
queues?
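
The second option could be sketched as below, purely as a userspace model. `netdev_sketch`, its `poll_all_rx` hook and `netpoll_poll_sketch()` are hypothetical names, not existing kernel interfaces.

```c
#include <stddef.h>

#define MAX_RXQ_SKETCH 4

struct netdev_sketch {
	int pending[MAX_RXQ_SKETCH];	/* queued packets; queue 0 is the main netdev */
	int nqueues;
	int (*poll)(struct netdev_sketch *dev);		/* services queue 0 only */
	int (*poll_all_rx)(struct netdev_sketch *dev);	/* hypothetical optional hook */
};

/* Consume everything waiting on one rx queue and report how much it was. */
static int drain_queue(struct netdev_sketch *dev, int q)
{
	int n = dev->pending[q];

	dev->pending[q] = 0;
	return n;
}

/* Today's behaviour in this model: only the main queue is polled. */
static int poll_main(struct netdev_sketch *dev)
{
	return drain_queue(dev, 0);
}

/* A driver with several rx queues would supply something like this. */
static int poll_every_queue(struct netdev_sketch *dev)
{
	int q, total = 0;

	for (q = 0; q < dev->nqueues; q++)
		total += drain_queue(dev, q);
	return total;
}

/* Prefer the all-queues hook when a driver provides it. */
static int netpoll_poll_sketch(struct netdev_sketch *dev)
{
	if (dev->poll_all_rx)
		return dev->poll_all_rx(dev);
	return dev->poll(dev);
}
```

Drivers that never set the hook keep the single-queue behaviour, so only multi-queue drivers would need to change.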


Steve.




[git patches] net driver fixes

2007-03-28 Thread Jeff Garzik

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git 
upstream-linus

to receive the following updates:

 drivers/net/atl1/atl1_hw.c                   |    1 -
 drivers/net/forcedeth.c                      |    8 ++-
 drivers/net/mv643xx_eth.c                    |    4 +-
 drivers/net/myri10ge/myri10ge.c              |    7 +-
 drivers/net/qla3xxx.c                        |  110 +++--
 drivers/net/qla3xxx.h                        |    3 +-
 drivers/net/sun3lance.c                      |   16 -
 drivers/net/wireless/bcm43xx/bcm43xx_phy.c   |    4 +-
 drivers/net/wireless/bcm43xx/bcm43xx_radio.c |   12 ++--
 fs/compat_ioctl.c                            |    9 ++
 include/linux/wireless.h                     |   21 -
 include/net/iw_handler.h                     |   30 +--
 net/core/rtnetlink.c                         |    3 +-
 net/core/wireless.c                          |   82
 14 files changed, 182 insertions(+), 128 deletions(-)

Ayaz Abdulla (2):
  forcedeth: fix nic poll
  forcedeth: fix tx timeout

Brice Goglin (1):
  myri10ge: correctly detect when TSO should be used

Cyrill V. Gorcunov (1):
  SUN3/3X Lance trivial fix improved

David Woodhouse (1):
  bcm43xx: Fix machine check on PPC for version 1 PHY

Gabriel Paubert (1):
  mv643xx_eth: Fix use of uninitialized port_num field

Jay Cliburn (1):
  atl1: remove unnecessary crc inversion

Jean Tourrilhes (2):
  wext: Add missing ioctls to 64-32 conversion
  WE-22 : prevent information leak on 64 bit

Larry Finger (1):
  bcm43xx: Fix code for confusion between PHY revision and PHY version

Ron Mercer (4):
  qla3xxx: bugfix: Add tx control block memset.
  qla3xxx: bugfix: Multi segment sends were getting whacked.
  qla3xxx: bugfix: Dropping interrupt under heavy network load.
  qla3xxx: bugfix: Jumbo frame handling.

Stefano Brivio (1):
  bcm43xx: fix radio_set_tx_iq

diff --git a/drivers/net/atl1/atl1_hw.c b/drivers/net/atl1/atl1_hw.c
index 314dbaa..69482e0 100644
--- a/drivers/net/atl1/atl1_hw.c
+++ b/drivers/net/atl1/atl1_hw.c
@@ -334,7 +334,6 @@ u32 atl1_hash_mc_addr(struct atl1_hw *hw, u8 *mc_addr)
 	int i;
 
 	crc32 = ether_crc_le(6, mc_addr);
-	crc32 = ~crc32;
 	for (i = 0; i < 32; i++)
 		value |= (((crc32 >> i) & 1) << (31 - i));
 
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 46e1697..d04214e 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -2050,9 +2050,10 @@ static void nv_tx_timeout(struct net_device *dev)
 		nv_drain_tx(dev);
 		nv_init_tx(dev);
 		setup_hw_rings(dev, NV_SETUP_TX_RING);
-		netif_wake_queue(dev);
 	}
 
+	netif_wake_queue(dev);
+
 	/* 4) restart tx engine */
 	nv_start_tx(dev);
 	spin_unlock_irq(&np->lock);
@@ -3536,7 +3537,10 @@ static void nv_do_nic_poll(unsigned long data)
 	pci_push(base);
 
 	if (!using_multi_irqs(dev)) {
-		nv_nic_irq(0, dev);
+		if (np->desc_ver == DESC_VER_3)
+			nv_nic_irq_optimized(0, dev);
+		else
+			nv_nic_irq(0, dev);
 		if (np->msi_flags & NV_MSI_X_ENABLED)
 			enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
 		else
diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index c9f55bc..8015a7c 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1379,7 +1379,7 @@ static int mv643xx_eth_probe(struct platform_device *pdev)
 
 	spin_lock_init(&mp->lock);
 
-	port_num = pd->port_number;
+	port_num = mp->port_num = pd->port_number;
 
 	/* set default config values */
 	eth_port_uc_addr_get(dev, dev->dev_addr);
@@ -1411,8 +1411,6 @@ static int mv643xx_eth_probe(struct platform_device *pdev)
 	duplex = pd->duplex;
 	speed = pd->speed;
 
-	mp->port_num = port_num;
-
 	/* Hook up MII support for ethtool */
 	mp->mii.dev = dev;
 	mp->mii.mdio_read = mv643xx_mdio_read;
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index b05b20e..c216e6a 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -71,7 +71,7 @@
 #include "myri10ge_mcp.h"
 #include "myri10ge_mcp_gen_header.h"
 
-#define MYRI10GE_VERSION_STR "1.3.0-1.226"
+#define MYRI10GE_VERSION_STR "1.3.0-1.227"
 
 MODULE_DESCRIPTION("Myricom 10G driver (10GbE)");
 MODULE_AUTHOR("Maintainer: [EMAIL PROTECTED]");
@@ -2015,10 +2015,9 @@ again:
 	mss = 0;
 	max_segments = MXGEFW_MAX_SEND_DESC;
 
-	if (skb->len > (dev->mtu + ETH_HLEN)) {
+	if (skb_is_gso(skb)) {
 		mss = skb_shinfo(skb)->gso_size;
-		if (mss != 0)
-			max_segments = MYRI10GE_MAX_SEND_DESC_TSO;
+		max_segments = 

Re: IPv6: Connection reset/timeout under heavy load

2007-03-28 Thread Agoston Horvath
YOSHIFUJI Hideaki / 吉藤英明 wrote:
 Would you test with latest kernel, if possible, please?

For the archive: switching to 2.6.20.4 fixed this problem.

Thanks!

Agoston


Re: IPv6: Connection reset/timeout under heavy load

2007-03-28 Thread YOSHIFUJI Hideaki / 吉藤英明
In article [EMAIL PROTECTED] (at Wed, 28 Mar 2007 10:48:27 +0200), Agoston 
Horvath [EMAIL PROTECTED] says:

 YOSHIFUJI Hideaki / 吉藤英明 wrote:
  Would you test with latest kernel, if possible, please?
 
 For the archive: switching to 2.6.20.4 fixed this problem.

Thank you for your report.

I guess the following change will fix the issue for 2.6.16.y:
http://www.linux-ipv6.org/gitweb/gitweb.cgi?p=gitroot/yoshfuji/linux-2.6-fix.git;a=commit;h=33a79bba0cc2f197b46cc54182f94c31ff6ad334

I hope this patch will go in 2.6.16-stable...

Regards,

--yoshfuji


Re: IPv6: Connection reset/timeout under heavy load

2007-03-28 Thread David Miller
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 18:23:34 +0900 (JST)

 In article [EMAIL PROTECTED] (at Wed, 28 Mar 2007 10:48:27 +0200), Agoston 
 Horvath [EMAIL PROTECTED] says:
 
  YOSHIFUJI Hideaki / 吉藤英明 wrote:
   Would you test with latest kernel, if possible, please?
  
  For the archive: switching to 2.6.20.4 fixed this problem.
 
 Thank you for your report.
 
 I guess the following change will fix the issue for 2.6.16.y:
 http://www.linux-ipv6.org/gitweb/gitweb.cgi?p=gitroot/yoshfuji/linux-2.6-fix.git;a=commit;h=33a79bba0cc2f197b46cc54182f94c31ff6ad334
 
 I hope this patch will go in 2.6.16-stable...

Please forward this patch to Adrian Bunk ([EMAIL PROTECTED]),
he will definitely add it to 2.6.16-stable for you.


Re: IPv6: Connection reset/timeout under heavy load

2007-03-28 Thread YOSHIFUJI Hideaki / 吉藤英明
In article [EMAIL PROTECTED] (at Wed, 28 Mar 2007 02:26:24 -0700 (PDT)), 
David Miller [EMAIL PROTECTED] says:

  http://www.linux-ipv6.org/gitweb/gitweb.cgi?p=gitroot/yoshfuji/linux-2.6-fix.git;a=commit;h=33a79bba0cc2f197b46cc54182f94c31ff6ad334
  
  I hope this patch will go in 2.6.16-stable...
 
 Please forward this patch to Adrian Bunk ([EMAIL PROTECTED]),
 he will definitely add it to 2.6.16-stable for you.

I will do it again...

--yoshfuji


Re: [1/5] 2.6.21-rc5: known regressions

2007-03-28 Thread Kok, Auke

Adrian Bunk wrote:

Subject: e1000 resume weirdness
References : http://lkml.org/lkml/2007/3/26/91
Submitter  : Ingo Molnar [EMAIL PROTECTED]
Handled-By : Jesse Brandeburg [EMAIL PROTECTED]
 Auke Kok [EMAIL PROTECTED]
Status : problem is being debugged


The issue comes from a corner case and the underlying problem is that e1000 
isn't stopping tx properly. We have a fix for this pending in our tree that I'll 
push upstream for 2.6.22 to Jeff, but I don't think this should be a blocker and 
it probably is not a regression at all; the gap has always been present.


On a side note, this is probably fixed easily by turning the adapter's 
detect_tx_hung flag off in e1000_down, so if someone spots this reoccurring 
somewhat regularly, please contact me so we can debug it. I myself have a system 
suspend/resuming in circles for an hour now with traffic flying across without a 
single hit on it.


Adrian, you probably want to drop this issue from your list.

Cheers,


Auke


Re: [1/5] 2.6.21-rc5: known regressions

2007-03-28 Thread Ingo Molnar

* Kok, Auke [EMAIL PROTECTED] wrote:

 Adrian Bunk wrote:
 Subject: e1000 resume weirdness
 References : http://lkml.org/lkml/2007/3/26/91
 Submitter  : Ingo Molnar [EMAIL PROTECTED]
 Handled-By : Jesse Brandeburg [EMAIL PROTECTED]
  Auke Kok [EMAIL PROTECTED]
 Status : problem is being debugged

 Adrian, you probably want to drop this issue from your list.

agreed - i have done many suspend/resumes meanwhile, and this condition 
has not reoccurred since then. (and even when it occurred, it was 
transitory)

Ingo


Re: IP1000A: About IC Plus IP1000A Linux driver current status

2007-03-28 Thread Francois Romieu
Jesse [EMAIL PROTECTED] :
[...]
 The latest version had been modified by you. 
 Would you be kindly to do this for me.

I have just screwdriven a test machine but I will not finish a
build/test cycle today.

-- 
Ueimor

Anybody got a battery for my Ultra 10 ?


Re: [2.6 patch] the scheduled eepro100 removal

2007-03-28 Thread Bill Davidsen

Adrian Bunk wrote:

This patch contains the scheduled removal of the eepro100 driver.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]


This keeps coming around, but I haven't seen an answer to the questions 
raised by Eric Piel or Kiszka. I do know that e100 didn't work on some 
IBM rackmount servers and eepro100 did, but since I'm no longer 
responsible for those machines I can't retest. Perhaps someone will be 
able to provide data points.


IBM current offerings as of about three years ago, I had a few dozen of 
them at one time.


--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot


Re: [2.6 patch] the scheduled eepro100 removal

2007-03-28 Thread Kok, Auke

Bill Davidsen wrote:

Adrian Bunk wrote:

This patch contains the scheduled removal of the eepro100 driver.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]


This keeps coming around, but I haven't seen an answer to the questions 
raised by Eric Piel or Kiszka. I do know that e100 didn't work on some 
IBM rackmount servers and eepro100 did, but since I'm no longer 
responsible for those machines I can't retest. Perhaps someone will be 
able to provide data points.


IBM current offerings as of about three years ago, I had a few dozen of 
them at one time.


We have provided a (test) driver which allows e100 to use IO to communicate with 
the device, which seems to have helped for one person. I think we need to work 
with those changes and see if it helps the other people resolve their e100 
issues. Unfortunately it keeps slipping off to the low priority list for us.


I suggest that we should push this code into -mm for people to test or 
something. It's fairly low risk as by default the patch won't enable IO and thus 
use the old method of writing to the adapter.


Auke


Re: [RFT] e100 driver on ARM

2007-03-28 Thread Kok, Auke

Lennert Buytenhek wrote:

On Mon, Sep 04, 2006 at 06:39:29AM -0400, Jeff Garzik wrote:


1) Does e100 driver work on ARM?


FWIW, e100 seems to work okay for me on an intel ixp2400 (xscale based)
board, an ixp2850 (xscale based) board and an ixp2350 (xscale3 based)
board.  ixp2350 works both with hardware coherency turned on (cpu
snoops bus) and turned off (manual dma cache clean/invalidate as usual.)

As for the other ARM platforms that I'm interested in / have hardware
for / maintain, the at91/ep93xx/pxa270 don't have PCI, and the other
two (iop32x/iop33x) I can't test because I don't have such systems with
e100 NICs, but I expect those would work, since they're both xscale
based like the ixp2400, and the ixp2400 works.


I just got an iop342 board dropped on my lap. Once it's running, I'll make sure 
to make this the first thing to test.


Cheers,

Auke


[PATCH]: Pull out core sack tagging logic

2007-03-28 Thread David Miller

Ok, this is what I was initially trying to do, pull out the
inner-most loop main code into a helper function.

Pushed to net-2.6.22

commit b096b50b4bf3c923bee28751d1ed41e92361a298
Author: David S. Miller [EMAIL PROTECTED]
Date:   Wed Mar 28 19:35:51 2007 -0700

[TCP]: Create tcp_sacktag_one().

Worker function that implements the main logic of
the inner-most loop of tcp_sacktag_write_queue().

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 464dc80..97b9be2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -978,6 +978,115 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 	return dup_sack;
 }
 
+static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
+			    struct tcp_sacktag_state *state, int in_sack,
+			    int fack_count, u32 end_seq)
+{
+	u8 sacked = TCP_SKB_CB(skb)->sacked;
+
+	/* Account D-SACK for retransmitted packet. */
+	if ((state->dup_sack && in_sack) &&
+	    (sacked & TCPCB_RETRANS) &&
+	    after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker))
+		tp->undo_retrans--;
+
+	/* The frame is ACKed. */
+	if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una)) {
+		if (sacked & TCPCB_RETRANS) {
+			if ((state->dup_sack && in_sack) &&
+			    (sacked & TCPCB_SACKED_ACKED))
+				state->reord = min(fack_count, state->reord);
+		} else {
+			/* If it was in a hole, we detected reordering. */
+			if (fack_count < state->prior_fackets &&
+			    !(sacked & TCPCB_SACKED_ACKED))
+				state->reord = min(fack_count, state->reord);
+		}
+
+		/* Nothing to do; acked frame is about to be dropped. */
+		return;
+	}
+
+	if ((sacked & TCPCB_SACKED_RETRANS) &&
+	    after(end_seq, TCP_SKB_CB(skb)->ack_seq) &&
+	    (!state->lost_retrans || after(end_seq, state->lost_retrans)))
+		state->lost_retrans = end_seq;
+
+	if (!in_sack)
+		return;
+
+	if (!(sacked & TCPCB_SACKED_ACKED)) {
+		if (sacked & TCPCB_SACKED_RETRANS) {
+			/* If the segment is not tagged as lost,
+			 * we do not clear RETRANS, believing
+			 * that retransmission is still in flight.
+			 */
+			if (sacked & TCPCB_LOST) {
+				TCP_SKB_CB(skb)->sacked &=
+					~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
+				tp->lost_out -= tcp_skb_pcount(skb);
+				tp->retrans_out -= tcp_skb_pcount(skb);
+
+				/* clear lost hint */
+				tp->retransmit_skb_hint = NULL;
+			}
+		} else {
+			/* New sack for not retransmitted frame,
+			 * which was in hole. It is reordering.
+			 */
+			if (!(sacked & TCPCB_RETRANS) &&
+			    fack_count < state->prior_fackets)
+				state->reord = min(fack_count, state->reord);
+
+			if (sacked & TCPCB_LOST) {
+				TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
+				tp->lost_out -= tcp_skb_pcount(skb);
+
+				/* clear lost hint */
+				tp->retransmit_skb_hint = NULL;
+			}
+			/* SACK enhanced F-RTO detection.
+			 * Set flag if and only if non-rexmitted
+			 * segments below frto_highmark are
+			 * SACKed (RFC4138; Appendix B).
+			 * Clearing correct due to in-order walk
+			 */
+			if (after(end_seq, tp->frto_highmark)) {
+				state->flag &= ~FLAG_ONLY_ORIG_SACKED;
+			} else {
+				if (!(sacked & TCPCB_RETRANS))
+					state->flag |= FLAG_ONLY_ORIG_SACKED;
+			}
+		}
+
+		TCP_SKB_CB(skb)->sacked |= TCPCB_SACKED_ACKED;
+		state->flag |= FLAG_DATA_SACKED;
+		tp->sacked_out += tcp_skb_pcount(skb);
+
+		if (fack_count > tp->fackets_out)
+			tp->fackets_out = fack_count;
+
+		if (after(TCP_SKB_CB(skb)->seq,
+			  tp->highest_sack))
+			tp->highest_sack = TCP_SKB_CB(skb)->seq;
+	} else {
+		if (state->dup_sack && (sacked&TCPCB_RETRANS))
+			state->reord = min(fack_count, state->reord);
+	}
+
+

Re: [2.6 patch] the scheduled eepro100 removal

2007-03-28 Thread Yinghai Lu

On 3/28/07, Jeff Garzik [EMAIL PROTECTED] wrote:

Kok, Auke wrote:
Sounds sane to me.  My overall opinion on eepro100 removal is that we're
not there yet.  Rare problem cases remain where e100 fails but eepro100
works, and it's an older driver so it's low priority for everybody.

Needs to happen, though...



It seems that several Tyan Opteron-based systems use an IPMI add-on
card.  The IPMI card shares the onboard Intel 100 Mbit NIC.  You need
to use eepro100 instead of e100; otherwise e100 will shut down the OOB
(out-of-band) connection for IPMI when the OS is shut down.

YH


Re: [RFT] e100 driver on ARM

2007-03-28 Thread David Acker

Kok, Auke wrote:

Lennert Buytenhek wrote:

On Mon, Sep 04, 2006 at 06:39:29AM -0400, Jeff Garzik wrote:


1) Does e100 driver work on ARM?


FWIW, e100 seems to work okay for me on an intel ixp2400 (xscale based)
board, an ixp2850 (xscale based) board and an ixp2350 (xscale3 based)
board.  ixp2350 works both with hardware coherency turned on (cpu
snoops bus) and turned off (manual dma cache clean/invalidate as usual.)

As for the other ARM platforms that I'm interested in / have hardware
for / maintain, the at91/ep93xx/pxa270 don't have PCI, and the other
two (iop32x/iop33x) I can't test because I don't have such systems with
e100 NICs, but I expect those would work, since they're both xscale
based like the ixp2400, and the ixp2400 works.


I just got an iop342 board dropped on my lap. Once it's running, I'll 
make sure to make this the first thing to test.




I have a pxa255 based system with PCI added to it.  The e100 would have 
memory corruption in its receive buffers detected by slab debugging 
unless I put in the patch to use the S-bit.


Here is a link to the patch posting:
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc3/2.6.20-rc3-mm1/broken-out/git-netdev-all.patch
Search for e100.c.

http://www-gatago.com/linux/kernel/15457063.html - This discussion seems 
to hit the issue.


There appears to be a race on the cache line where the EL bit and the 
next packet info live. In my case the hardware appeared to write to a 
free packet.  The S-bit seems to make the hardware stop and spin on the 
bit, while the EL bit seems to let the hardware try to use that packet.


This race would occur less often when the receive buffer chain is always 
refilled before the hardware can use them up.  On our 400 MHz XScale, we 
can use up all 256 buffers if the PCI bus has another busy device on it. 
In our case it is an 802.11g miniPCI card and our software was routing 
all ethernet packets to the wireless interface and vice versa while TCP 
streams were running across these connections.

-Ack