Re: L2 network namespace benchmarking
Eric W. Biederman wrote:
Daniel Lezcano [EMAIL PROTECTED] writes:

3. General observations
---
The objective of having no performance degradation when the network namespace is off in the kernel is reached in both solutions. When the network is used outside the container and the network namespace is compiled in, there is no performance degradation. Eric's patchset allows network devices to be moved between namespaces. This is clearly a good feature, and it is missing from Dmitry's patchset. This feature helps us see that the network namespace code does not add overhead when the physical network device is used directly inside the container.

Assuming these results are not contradicted, this says that the extra dereference where we need it does not add measurably to the overhead in the Linux network stack. Performance-wise this should be good enough to allow merging the code into the Linux kernel, as it does not measurably affect networking when we do not have multiple containers in use.

I have a few questions about merging code into the Linux kernel.
* How do you plan to do that?
* When do you expect to have the network namespace in mainline?
* Are Dave Miller and Alexey Kuznetsov aware of the network namespace?
* Did they see your patchset, or do they even know it exists?
* Do you have any feedback from netdev about the network namespace?

Things are good enough that we can even consider not providing an option to compile the support out.

The loss of performance is very noticeable inside the container and seems to be directly related to the use of the pair device and the specific network configuration needed for the container. When the packets are sent by the container, the MAC address is that of the pair device, but the IP address is not owned by the host. This means the host has to act as a router and forward the packets. That adds a lot of overhead.

Well, it adds measurable overhead.
A hack has been made in the ip_forward function to avoid a useless skb_cow when using the pair device/tunnel device, and the overhead is reduced by half.

To be fully satisfactory, how we get the packets to the namespace still appears to need work. We have overhead in routing. That may simply be the cost of performing routing, or there may be some optimization opportunities there. We have about the same overhead when performing bridging, which I actually find more surprising, as the bridging code should involve less packet handling.

Yep. I will try to figure out what is happening.

Ideally we can optimize the bridge code, or something equivalent to it, so that we can take one look at the destination MAC address and know which network namespace we should be in, potentially moving this work to hardware when the hardware supports multiple queues. If we can get the overhead out of the routing code that would be tremendous. However, I think it may be more realistic to get the overhead out of the ethernet bridging code, where we know we don't need to modify the packet.

Routing was optimized for loopback, wasn't it? Why can't we do the same for the etun device?

- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
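The idea of taking "one look at the destination MAC address" to pick the namespace can be illustrated with a small user-space sketch. Everything here is hypothetical: ns_add(), ns_lookup(), the table size and the integer namespace ids are invented for illustration. It is not bridge or etun code. It only shows the shape of a lookup that maps a destination MAC directly to a namespace without a routing decision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6
#define NS_MAP_SIZE 64	/* hypothetical size; must be a power of two */

/* Hypothetical entry: one MAC address owned by one network namespace. */
struct ns_entry {
	uint8_t mac[ETH_ALEN];
	int ns_id;		/* 0 marks an unused slot (and means "host") */
};

static struct ns_entry ns_map[NS_MAP_SIZE];

/* Cheap hash over the last three bytes of the MAC address. */
static unsigned int mac_hash(const uint8_t *mac)
{
	return (unsigned int)(mac[3] ^ mac[4] ^ mac[5]) & (NS_MAP_SIZE - 1);
}

/* Open addressing with linear probing; returns -1 if the table is full. */
static int ns_add(const uint8_t *mac, int ns_id)
{
	unsigned int i, h = mac_hash(mac);

	assert(ns_id != 0);	/* 0 is reserved for "not a container" */
	for (i = 0; i < NS_MAP_SIZE; i++) {
		struct ns_entry *e = &ns_map[(h + i) & (NS_MAP_SIZE - 1)];

		if (e->ns_id == 0) {
			memcpy(e->mac, mac, ETH_ALEN);
			e->ns_id = ns_id;
			return 0;
		}
	}
	return -1;
}

/* One look at the destination MAC: namespace id, or 0 to stay in the host. */
static int ns_lookup(const uint8_t *dst)
{
	unsigned int i, h = mac_hash(dst);

	for (i = 0; i < NS_MAP_SIZE; i++) {
		const struct ns_entry *e = &ns_map[(h + i) & (NS_MAP_SIZE - 1)];

		if (e->ns_id == 0)
			return 0;	/* an empty slot ends the probe sequence */
		if (memcmp(e->mac, dst, ETH_ALEN) == 0)
			return e->ns_id;
	}
	return 0;
}
```

After ns_add() has registered two container MACs, ns_lookup() returns their ids, and it returns 0 for any other address. A real implementation would also need locking, deletion and per-device scoping, which this sketch leaves out.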
Re: ping6 to own link-local address
In article [EMAIL PROTECTED] (at Wed, 21 Mar 2007 00:26:09 +0100 (CET)), YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] says: In article [EMAIL PROTECTED] (at Tue, 20 Mar 2007 15:16:40 -0700), Sridhar Samudrala [EMAIL PROTECTED] says: On Tue, 2007-03-20 at 10:19 +0100, YOSHIFUJI Hideaki / 吉藤英明 wrote: Hello. Recent 2.6.21-git kernels do not respond to ping6 queries to our own (local) link-local address. Now bisecting... The following patch seems to be the cause of this regression. [IPV6] ROUTE: Do not route packets to link-local address on other device. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a0d78ebf3a0e33a1aeacf2fc518ad9273d6a1c2f Right. Hmm... Well, 2.6.21 is coming. I think it is better to revert it for now. The current situation is more critical than the original problem. Dave? --yoshfuji
Re: [1/1] netlink: no need to crash if table does not exist.
On Tue, Mar 27, 2007 at 04:41:54PM -0700, David Miller ([EMAIL PROTECTED]) wrote: There is no problem as-is, but I am implementing a unified cache for different sockets (currently tcp/udp/raw and netlink are supported) which does not use that table, so I currently wrap all access code in special ifdefs. This one can be wrapped too, but since it is not needed, removing it saves a couple of lines of code. It is needed. It is there to make sure that a kernel netlink socket is not created before the af_netlink init code runs. We've had sequencing bugs like that in the initcall call chain in the past, that's why the check is there. Argh, I see. I could not find the exact commit (at least it was not in 2.4, and it was created before 2.6.12), but it is indeed needed. Thanks for the explanation. -- Evgeniy Polyakov
Re: ping6 to own link-local address
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 16:57:48 +0900 (JST) In article [EMAIL PROTECTED] (at Wed, 21 Mar 2007 00:26:09 +0100 (CET)), YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] says: In article [EMAIL PROTECTED] (at Tue, 20 Mar 2007 15:16:40 -0700), Sridhar Samudrala [EMAIL PROTECTED] says: The following patch seems to be the cause for this regression. [IPV6] ROUTE: Do not route packets to link-local address on other device. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a0d78ebf3a0e33a1aeacf2fc518ad9273d6a1c2f Right. Hmm... Well, 2.6.21 is coming. I think it is better to revert it for now. Current situation is more critical than the original. Dave? I will look into this and make some kind of decision on how to proceed tomorrow.
Re: Patch: replace with time_after in drivers/net/eexpress.c
On Wed, Mar 28, 2007 at 10:44:31AM +0530, Shani wrote: Hi, Replacing with time_after in drivers/net/eexpress.c. Applies and compiles cleanly on the latest tree. Not tested. Thanks. Signed-off-by: Shani Moideen [EMAIL PROTECTED] NAK as not tested. The existing code is known to work, so, ugly or not, it is better than untested changes.
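Such conversions are usually proposed because time_after() compares jiffies values in a way that survives counter wraparound, while an open-coded `a > b` does not. The sketch below is a user-space stand-in: my_time_after(), my_time_before() and naive_after() are hypothetical names. It does not reproduce the eexpress.c change, and it does not touch the reason for the NAK, which concerns testing.

```c
#include <assert.h>
#include <limits.h>

/*
 * User-space stand-in for the kernel's time_after(a, b), which is
 * ((long)(b) - (long)(a) < 0). Subtracting as unsigned long and then
 * converting avoids signed overflow; on two's-complement targets the
 * result is the same.
 */
static int my_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static int my_time_before(unsigned long a, unsigned long b)
{
	return my_time_after(b, a);
}

/* What an open-coded comparison does: wrong once the counter wraps. */
static int naive_after(unsigned long a, unsigned long b)
{
	return a > b;
}
```

With `old = ULONG_MAX - 5` and `now = 10` (16 ticks later, after a wrap), my_time_after(now, old) is true and naive_after(now, old) is false.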
[PATCH] ppp_generic: lockdep warning Re: [Bug 8132] New: pptp server lockup ...
On Mon, Mar 19, 2007 at 10:49:12AM +0300, Yuriy N. Shkandybin wrote: I've changed the kernel to rc4 and completely changed the hardware. Now I've got a new trace, but as far as I can see this is another problem, connected with PPPoE.

===
[ INFO: possible circular locking dependency detected ]
2.6.21-rc4 #1
---
pppd/8926 is trying to acquire lock:
 (vlan_netdev_xmit_lock_key){-...}, at: [c0265486] dev_queue_xmit+0x247/0x2f1

but task is already holding lock:
 (pch->downl){-+..}, at: [c0230c72] ppp_channel_push+0x19/0x9a

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (pch->downl){-+..}:
       [c013642b] __lock_acquire+0xe62/0x1010
       [c0136642] lock_acquire+0x69/0x83
       [c02afc13] _spin_lock_bh+0x30/0x3d
       [c022f715] ppp_push+0x5a/0x9a
       [c022fb40] ppp_xmit_process+0x2e/0x511
       [c0231a05] ppp_write+0xb8/0xf2
       [c015ec26] vfs_write+0x7f/0xba
       [c015f158] sys_write+0x3d/0x64
       [c01027de] sysenter_past_esp+0x5f/0x99
       [] 0x

-> #2 (ppp->wlock){-+..}:
       [c013642b] __lock_acquire+0xe62/0x1010
       [c0136642] lock_acquire+0x69/0x83
       [c02afc13] _spin_lock_bh+0x30/0x3d
       [c022fb2b] ppp_xmit_process+0x19/0x511
       [c02318d3] ppp_start_xmit+0x18a/0x204
       [c0263a6f] dev_hard_start_xmit+0x1f6/0x2c4
       [c026ded3] __qdisc_run+0x81/0x1bc
       [c026549e] dev_queue_xmit+0x25f/0x2f1
       [c027c75f] ip_output+0x1be/0x25f
       [c02788ce] ip_forward+0x159/0x22b
       [c027745c] ip_rcv+0x297/0x4dd
       [c0263698] netif_receive_skb+0x164/0x1f2
       [c022199d] e1000_clean_rx_irq+0x12a/0x4b7
       [c02209bc] e1000_clean+0x3ff/0x5dd
       [c0265084] net_rx_action+0x7d/0x12b
       [c011e442] __do_softirq+0x82/0xf2
       [c011e509] do_softirq+0x57/0x59
       [c011e877] irq_exit+0x7f/0x81
       [c0105011] do_IRQ+0x45/0x84
       [c0103252] common_interrupt+0x2e/0x34
       [c0100b66] mwait_idle+0x12/0x14
       [c0100c60] cpu_idle+0x6c/0x86
       [c01001cd] rest_init+0x23/0x36
       [c0377d89] start_kernel+0x3ca/0x461
       [] 0x0
       [] 0x

-> #1 (dev->_xmit_lock){-+..}:
       [c013642b] __lock_acquire+0xe62/0x1010
       [c0136642] lock_acquire+0x69/0x83
       [c02afc13] _spin_lock_bh+0x30/0x3d
       [c0266861] dev_mc_add+0x34/0x16a
       [c02ab5c7] vlan_dev_set_multicast_list+0x88/0x25c
       [c0266592] __dev_mc_upload+0x22/0x24
       [c0266914] dev_mc_add+0xe7/0x16a
       [c029f323] igmp_group_added+0xe6/0xeb
       [c029f50b] ip_mc_inc_group+0x13f/0x210
       [c029f5fa] ip_mc_up+0x1e/0x61
       [c029ab81] inetdev_event+0x154/0x2c7
       [c0125a46] notifier_call_chain+0x2c/0x39
       [c0125a7c] raw_notifier_call_chain+0x8/0xa
       [c026477a] dev_open+0x6d/0x71
       [c0263028] dev_change_flags+0x51/0x101
       [c029b7ca] devinet_ioctl+0x4df/0x644
       [c029bc03] inet_ioctl+0x5c/0x6f
       [c02596e0] sock_ioctl+0x4f/0x1e8
       [c0168c32] do_ioctl+0x22/0x71
       [c0168cd6] vfs_ioctl+0x55/0x27e
       [c0168f32] sys_ioctl+0x33/0x51
       [c01027de] sysenter_past_esp+0x5f/0x99
       [] 0x

-> #0 (vlan_netdev_xmit_lock_key){-...}:
       [c0136289] __lock_acquire+0xcc0/0x1010
       [c0136642] lock_acquire+0x69/0x83
       [c02afbd6] _spin_lock+0x2b/0x38
       [c0265486] dev_queue_xmit+0x247/0x2f1
       [c02334f6] __pppoe_xmit+0x1a9/0x215
       [c023356c] pppoe_xmit+0xa/0xc
       [c0230c9a] ppp_channel_push+0x41/0x9a
       [c0231a13] ppp_write+0xc6/0xf2
       [c015ec26] vfs_write+0x7f/0xba
       [c015f158] sys_write+0x3d/0x64
       [c01027de] sysenter_past_esp+0x5f/0x99
       [] 0x

other info that might help us debug this:

1 lock held by pppd/8926:
 #0:  (pch->downl){-+..}, at: [c0230c72] ppp_channel_push+0x19/0x9a

stack backtrace:
 [c0103834] show_trace_log_lvl+0x1a/0x30
 [c0103f16] show_trace+0x12/0x14
 [c0103f9d] dump_stack+0x16/0x18
 [c01343cd] print_circular_bug_tail+0x68/0x71
 [c0136289] __lock_acquire+0xcc0/0x1010
 [c0136642] lock_acquire+0x69/0x83
 [c02afbd6] _spin_lock+0x2b/0x38
 [c0265486] dev_queue_xmit+0x247/0x2f1
 [c02334f6] __pppoe_xmit+0x1a9/0x215
 [c023356c] pppoe_xmit+0xa/0xc
 [c0230c9a] ppp_channel_push+0x41/0x9a
 [c0231a13] ppp_write+0xc6/0xf2
 [c015ec26] vfs_write+0x7f/0xba
 [c015f158] sys_write+0x3d/0x64
 [c01027de] sysenter_past_esp+0x5f/0x99
===
Clocksource tsc unstable (delta = 4686844667 ns)
Time: acpi_pm clocksource has been installed.
...
lockdep has seen locks #0 ... #3 taken in circular order, but IMHO the lock in #3 (pch->downl), taken after #2 (ppp->wlock), differs from the pch->downl lock taken in #0 (before vlan_netdev_xmit_lock_key), and lockdep should be notified about this. This patch proposal needs confirmation by some PPP expert that channels processed in ppp_channel_push() differ from channels
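lockdep reasons about lock classes, not individual lock instances. That is why two different channels' pch->downl locks appear as the same node in the report above. The sketch below is a heavily simplified, hypothetical model of its core check. It records an edge "B acquired while A held" and reports a problem when a new edge would close a cycle. The enum names are only labels for the four classes in the report. Real lockdep also tracks IRQ state, recursion and much more.

```c
#include <assert.h>
#include <string.h>

/* Labels for the four lock classes in the report (hypothetical model). */
enum lock_class {
	VLAN_XMIT_LOCK,		/* vlan_netdev_xmit_lock_key */
	DEV_XMIT_LOCK,		/* dev->_xmit_lock */
	PPP_WLOCK,		/* ppp->wlock */
	PCH_DOWNL,		/* pch->downl */
	NR_CLASSES
};

/* edge[a][b] != 0 means "class b has been acquired while class a was held". */
static int edge[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static int reachable(int from, int to, int *seen)
{
	int next;

	if (from == to)
		return 1;
	seen[from] = 1;
	for (next = 0; next < NR_CLASSES; next++)
		if (edge[from][next] && !seen[next] && reachable(next, to, seen))
			return 1;
	return 0;
}

/*
 * Record "acquire 'want' while holding 'held'". Returns -1 (and records
 * nothing) if 'held' is already reachable from 'want', i.e. the new edge
 * would close a cycle: a possible circular locking dependency.
 */
static int note_acquire(int held, int want)
{
	int seen[NR_CLASSES];

	memset(seen, 0, sizeof(seen));
	if (reachable(want, held, seen))
		return -1;
	edge[held][want] = 1;
	return 0;
}
```

The model is fed the dependencies behind chains #1 to #3 (VLAN xmit lock, then dev->_xmit_lock, then ppp->wlock, then pch->downl), followed by the new one from #0 (pch->downl held, vlan_netdev_xmit_lock_key wanted). With that input, note_acquire() returns -1 for the last edge. Because the model works on classes, it cannot tell whether the two pch->downl instances are different channels, which is the question raised above.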
Re: [PATCH]: Fix ipv6 round-robin locking
Hello. In article [EMAIL PROTECTED] (at Sat, 24 Mar 2007 12:44:36 -0700 (PDT)), David Miller [EMAIL PROTECTED] says: The fix for the most serious of them is below, and I'd appreciate any feedback if people spot any problems or holes in that approach. I hoped we could save some memory per fib6_node, but I'm fine with it. Regards, --yoshfuji
[PATCH] Inline net_device_stats
Hi all,

Does something like this make sense for future drivers?

Cheers,
Rusty.
===
Network drivers which keep stats allocate their own stats structure, then write a get_stats() function to return them. It would be nice if this were done by default.

1) Add a new stats field to struct net_device.
2) Add a new feature field to say this driver uses the internal one.
3) Have a default get_stats which returns NULL if that feature is not set.
4) Change callers to check the result of the get_stats call for NULL, not whether ->get_stats is set.

This should not break backwards compatibility with older drivers, yet allow modern drivers to shed some boilerplate code.

Lightly tested: works for a modified lguest network driver.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
0 files changed

diff -r 1ccab0a087b7 arch/s390/appldata/appldata_net_sum.c
--- a/arch/s390/appldata/appldata_net_sum.c	Tue Mar 27 13:46:10 2007 +1000
+++ b/arch/s390/appldata/appldata_net_sum.c	Tue Mar 27 14:28:47 2007 +1000
@@ -108,10 +108,10 @@ static void appldata_get_net_sum_data(vo
 	collisions = 0;
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev != NULL; dev = dev->next) {
-		if (dev->get_stats == NULL) {
+		stats = dev->get_stats(dev);
+		if (stats == NULL) {
 			continue;
 		}
-		stats = dev->get_stats(dev);
 		rx_packets += stats->rx_packets;
 		tx_packets += stats->tx_packets;
 		rx_bytes += stats->rx_bytes;
diff -r 1ccab0a087b7 drivers/net/bonding/bond_main.c
--- a/drivers/net/bonding/bond_main.c	Tue Mar 27 13:46:10 2007 +1000
+++ b/drivers/net/bonding/bond_main.c	Tue Mar 27 14:29:08 2007 +1000
@@ -3621,9 +3621,8 @@ static struct net_device_stats *bond_get
 	read_lock_bh(&bond->lock);
 
 	bond_for_each_slave(bond, slave, i) {
-		if (slave->dev->get_stats) {
-			sstats = slave->dev->get_stats(slave->dev);
-
+		sstats = slave->dev->get_stats(slave->dev);
+		if (sstats) {
 			stats->rx_packets += sstats->rx_packets;
 			stats->rx_bytes += sstats->rx_bytes;
 			stats->rx_errors += sstats->rx_errors;
diff -r 1ccab0a087b7 drivers/parisc/led.c
--- a/drivers/parisc/led.c	Tue Mar 27 13:46:10 2007 +1000
+++ b/drivers/parisc/led.c	Tue Mar 27 14:29:17 2007 +1000
@@ -372,9 +372,9 @@ static __inline__ int led_get_net_activi
 			continue;
 		if (LOOPBACK(in_dev->ifa_list->ifa_local))
 			continue;
-		if (!dev->get_stats)
+		stats = dev->get_stats(dev);
+		if (!stats)
 			continue;
-		stats = dev->get_stats(dev);
 		rx_total += stats->rx_packets;
 		tx_total += stats->tx_packets;
 	}
diff -r 1ccab0a087b7 include/linux/netdevice.h
--- a/include/linux/netdevice.h	Tue Mar 27 13:46:10 2007 +1000
+++ b/include/linux/netdevice.h	Tue Mar 27 14:21:09 2007 +1000
@@ -325,6 +325,7 @@ struct net_device
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_GSO		2048	/* Enable software GSO. */
 #define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_INTERNAL_STATS	8192	/* Use stats structure in net_device */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -349,6 +350,7 @@ struct net_device
 
 
 	struct net_device_stats* (*get_stats)(struct net_device *dev);
+	struct net_device_stats	stats;
 
 #ifdef CONFIG_WIRELESS_EXT
 	/* List of functions to handle Wireless Extensions (instead of ioctl).
diff -r 1ccab0a087b7 net/core/dev.c
--- a/net/core/dev.c	Tue Mar 27 13:46:10 2007 +1000
+++ b/net/core/dev.c	Tue Mar 27 14:30:05 2007 +1000
@@ -825,7 +825,6 @@ static int default_rebuild_header(struct
 	return 1;
 }
 
-
 /**
  *	dev_open	- prepare an interface for use.
  *	@dev:	device to open
@@ -2120,9 +2119,9 @@ void dev_seq_stop(struct seq_file *seq,
 
 static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
 {
-	if (dev->get_stats) {
-		struct net_device_stats *stats = dev->get_stats(dev);
-
+	struct net_device_stats *stats = dev->get_stats(dev);
+
+	if (stats) {
 		seq_printf(seq, "%6s:%8lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu "
 				"%8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n",
 			   dev->name, stats->rx_bytes, stats->rx_packets,
@@ -3146,6 +3145,13 @@ out:
 	mutex_unlock(&net_todo_run_mutex);
 }
 
+static struct net_device_stats *maybe_internal_stats(struct net_device *dev)
+{
+	if (dev->features & NETIF_F_INTERNAL_STATS)
+		return &dev->stats;
+	return NULL;
+}
+
 /**
  *	alloc_netdev - allocate network device
  *	@sizeof_priv:	size of private data to allocate space for
@@ -3181,6 +3187,7
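Steps 1) to 4) of the proposal can be modelled in user space. The model shows how a NULL-returning default keeps callers working both for drivers that opt in and for drivers that do not. This is a sketch under assumptions. The structures are trimmed stand-ins, and setup_dev() and sum_rx() are hypothetical helpers. It also assumes alloc_netdev() installs maybe_internal_stats() as the default get_stats, as step 3 describes; the hunk that would do so is cut off in the patch as quoted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct net_device_stats {
	unsigned long rx_packets;
	unsigned long tx_packets;
};

#define NETIF_F_INTERNAL_STATS 8192

struct net_device {
	unsigned long features;
	struct net_device_stats *(*get_stats)(struct net_device *dev);
	struct net_device_stats stats;	/* the proposed embedded structure */
};

/* Same logic as the patch's default hook. */
static struct net_device_stats *maybe_internal_stats(struct net_device *dev)
{
	if (dev->features & NETIF_F_INTERNAL_STATS)
		return &dev->stats;
	return NULL;
}

/* Hypothetical allocation step: every device starts with the default hook. */
static void setup_dev(struct net_device *dev, unsigned long features)
{
	dev->features = features;
	dev->get_stats = maybe_internal_stats;
	dev->stats.rx_packets = 0;
	dev->stats.tx_packets = 0;
}

/* Caller pattern after the patch: always call, then test the result. */
static unsigned long sum_rx(struct net_device **devs, size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct net_device_stats *stats = devs[i]->get_stats(devs[i]);

		if (!stats)
			continue;	/* this driver keeps no stats */
		total += stats->rx_packets;
	}
	return total;
}
```

A device set up with NETIF_F_INTERNAL_STATS returns its embedded counters. A device without the flag returns NULL and is simply skipped by sum_rx(), which matches the NULL checks the patch adds to the existing callers.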
Re: [PATCH]: Fix ipv6 round-robin locking
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 18:16:42 +0900 (JST) I hoped we could save some memory per fib6_node, but I'm fine with it. I know, I did not want to add it either :( Speaking of which, several of the potential fixes for the rt6_probe() deadlock require adding even more things to the fib6_node (a linked list which some workqueue or similar can run, or a timer, etc.). So, I'm trying to figure out a way to get the rt6_probe() to run outside of the per-table rwlock without adding more state to fib6_node.
Re: RFC: Established connections hash function
But I think it can be mostly ignored. With all due respect, it cannot. An attacker with a small-sized botnet (which is ~250 hosts) can create chains that contain well in excess of 3000 items. Most likely they can also easily generate enough latency data to crack any simple hash function then. -Andi
Re: RFC: Established connections hash function
On Wed, Mar 28, 2007 at 11:29:43AM +0200, Andi Kleen ([EMAIL PROTECTED]) wrote: But I think it can be mostly ignored. With all due respect, it cannot. An attacker with a small-sized botnet (which is ~250 hosts) can create chains that contain well in excess of 3000 items. Most likely they can also easily generate enough latency data to crack any simple hash function then. -Andi

Jenkins hash is far from simple to crack, although with some knowledge it can be done faster. SHA or anything else is essentially the same, except that it has a different set of rounds; we only hash three 32-bit values, so the Jenkins result is really good. For hash tables it is a good solution, but we can go further. I created a multidimensional trie with that problem in mind, but right now it does not look like an absolutely required solution. I will continue working on it to check whether it can be faster than a properly sized hash table despite the additional trie allocation overhead, but I probably should not push people to include it as is; it is only for information, I think. -Andi -- Evgeniy Polyakov
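To make the "three 32-bit values" point concrete: the kernel's Jenkins hash of this period (Bob Jenkins' lookup2 mixing, in include/linux/jhash.h) has a jhash_3words() variant that takes exactly three words plus an initial value, and the initial value is where a secret salt goes. The sketch reproduces jhash_3words() in user space. ehash_bucket() is a hypothetical helper that shows one way to fold an IPv4 4-tuple into those three words. It is not the kernel's inet_ehashfn().

```c
#include <assert.h>
#include <stdint.h>

#define JHASH_GOLDEN_RATIO 0x9e3779b9U

/* Bob Jenkins' lookup2 mix, as used by the kernel's jhash of this period. */
#define jhash_mix(a, b, c)			\
do {						\
	a -= b; a -= c; a ^= (c >> 13);		\
	b -= c; b -= a; b ^= (a << 8);		\
	c -= a; c -= b; c ^= (b >> 13);		\
	a -= b; a -= c; a ^= (c >> 12);		\
	b -= c; b -= a; b ^= (a << 16);		\
	c -= a; c -= b; c ^= (b >> 5);		\
	a -= b; a -= c; a ^= (c >> 3);		\
	b -= c; b -= a; b ^= (a << 10);		\
	c -= a; c -= b; c ^= (b >> 15);		\
} while (0)

static uint32_t jhash_3words(uint32_t a, uint32_t b, uint32_t c,
			     uint32_t initval)
{
	a += JHASH_GOLDEN_RATIO;
	b += JHASH_GOLDEN_RATIO;
	c += initval;		/* the secret salt enters here */
	jhash_mix(a, b, c);
	return c;
}

/* Hypothetical: bucket of an IPv4 4-tuple in a power-of-two table. */
static uint32_t ehash_bucket(uint32_t laddr, uint16_t lport,
			     uint32_t faddr, uint16_t fport,
			     uint32_t secret, uint32_t table_size)
{
	uint32_t ports = ((uint32_t)lport << 16) | fport;

	assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
	return jhash_3words(laddr, faddr, ports, secret) & (table_size - 1);
}
```

The salt only helps while it stays unpredictable. How much entropy is actually available when it is chosen is exactly what the later messages about early-boot entropy debate.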
Re: [RESEND] [NET] fib_rules: Flush route cache after rule modifications
On 27-03-2007 14:38, Thomas Graf wrote: The results of FIB rules lookups are cached in the routing cache, except for IPv6, as no such cache exists. So far, it was the responsibility of the user to flush the cache after modifying any rules. This led to many false bug reports due to misunderstanding of this concept. This patch automatically flushes the route cache after inserting or deleting a rule. I hope I'm wrong, but isn't this at the cost of admins working with long rule sets, which (probably) take extra time now? Regards, Jarek P.
Re: L2 network namespace benchmarking
Daniel Lezcano [EMAIL PROTECTED] writes:
Eric W. Biederman wrote:
Daniel Lezcano [EMAIL PROTECTED] writes:

3. General observations
---
The objective of having no performance degradation when the network namespace is off in the kernel is reached in both solutions. When the network is used outside the container and the network namespace is compiled in, there is no performance degradation. Eric's patchset allows network devices to be moved between namespaces. This is clearly a good feature, and it is missing from Dmitry's patchset. This feature helps us see that the network namespace code does not add overhead when the physical network device is used directly inside the container.

Assuming these results are not contradicted, this says that the extra dereference where we need it does not add measurably to the overhead in the Linux network stack. Performance-wise this should be good enough to allow merging the code into the Linux kernel, as it does not measurably affect networking when we do not have multiple containers in use.

I have a few questions about merging code into the Linux kernel.

* How do you plan to do that?

One small comprehensible piece at a time. Basically, some variant of etun should not be a problem to merge; then I have to get some part of the network namespace code merged and the concept accepted. Once the basic acceptance occurs, it just becomes a long slog of merging more and more patches.

* When do you expect to have the network namespace in mainline?

My current goal is to finish my rebase against 2.6.linus_latest in the next couple of days, after having figured out how to deal with sysfs. I have been reviewing more code than I know what to do with, and fighting some very strange bugs during the stabilization window, which has kept me from doing additional development. Plus I have had a cold.

* Are Dave Miller and Alexey Kuznetsov aware of the network namespace?

Aware yes, reviewed not yet.
I believe Alexey is a little more familiar with the OpenVZ work. The high-level concepts still apply.

* Did they see your patchset, or do they even know it exists?

Yes.

* Do you have any feedback from netdev about the network namespace?

Not really, except that Dave Miller wanted to review what I posted last time, but the timing was bad and he did not get around to it.

To be fully satisfactory, how we get the packets to the namespace still appears to need work. We have overhead in routing. That may simply be the cost of performing routing, or there may be some optimization opportunities there. We have about the same overhead when performing bridging, which I actually find more surprising, as the bridging code should involve less packet handling.

Yep. I will try to figure out what is happening.

Thanks.

Ideally we can optimize the bridge code, or something equivalent to it, so that we can take one look at the destination MAC address and know which network namespace we should be in, potentially moving this work to hardware when the hardware supports multiple queues. If we can get the overhead out of the routing code that would be tremendous. However, I think it may be more realistic to get the overhead out of the ethernet bridging code, where we know we don't need to modify the packet.

Routing was optimized for loopback, wasn't it? Why can't we do the same for the etun device?

I have no problem with it if we can use valid optimizations. Avoiding a packet copy when the packet is marked as having a second copy somewhere else does not sound like a valid optimization to me. Routing through both network namespaces, so that we can set up a dst cache entry that takes you to the final destination, is something I am willing to work with. Perhaps something that hits this piece of the etun driver, so we don't have to make a second set of routing decisions:

	if (skb->dst)
		skb->dst = dst_pop(skb->dst);	/* Allow for smart routing */

tcpdump at any phase of the process should be able to do the right thing.
Right now I mostly care because it is interesting to know where the performance overhead is coming from. Unless it is a merge stopper, I don't much care how we are going to fix it yet, especially if it only affects cross-namespace traffic. If I read the results right, it took a 32-bit AMD machine with a gigabit interface before you could measure a throughput difference. That isn't shabby for a non-optimized code path. Eric
KERNEL: assertion ((int)tp->sacked_out >= 0) failed at net/ipv4/tcp_input.c (2626)
I got this warning with the current net-2.6.22 tree:

KERNEL: assertion ((int)tp->sacked_out >= 0) failed at net/ipv4/tcp_input.c (2626)
Leak s=4294967292 3

Can't say what exactly triggered it.
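The odd-looking "s=4294967292" is what a negative sacked_out looks like when printed as an unsigned 32-bit value: 4294967292 = 2^32 - 4, so the counter has gone to -4. The sketch below assumes sacked_out is a u32 as in struct tcp_sock; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned subtraction wraps modulo 2^32 instead of going negative. */
static uint32_t decrement_counter(uint32_t counter, uint32_t by)
{
	return counter - by;
}

/* The assertion casts to a signed type so that a wrapped counter reads as negative. */
static int counter_looks_sane(uint32_t counter)
{
	return (int32_t)counter >= 0;
}
```

Taking 4 from a counter that is already 0 yields 4294967292. The signed view of that value is -4, so the `(int)... >= 0` check fires.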
Re: L2 network namespace benchmarking
Kirill Korotaev [EMAIL PROTECTED] writes: Ideally we can optimize the bridge code, or something equivalent to it, so that we can take one look at the destination MAC address and know which network namespace we should be in, potentially moving this work to hardware when the hardware supports multiple queues. Yes, we can hack the bridge so that packets coming out of eth devices go directly to the container and get out of veth devices from inside the container. If we can get the overhead out of the routing code that would be tremendous. However, I think it may be more realistic to get the overhead out of the ethernet bridging code, where we know we don't need to modify the packet. Why not optimize both? :)

If the optimizations are safe and correct I don't have a problem. When we seem to have multiple copies of a packet in circulation and we skip what appears to be a required copy-on-write, I'm dubious. Although the more I look at the suggested optimization, the less dubious I am: it appears all we are skipping is a TTL decrement, and the COW flag applies exclusively to the data chunk and not the header chunk of the packet, whatever that means. However, we still need to guard against a loop in our routing table setup between multiple guests. Eric
Re: RFC: Established connections hash function
Evgeniy Polyakov [EMAIL PROTECTED] writes: Jenkins hash is far from being simple to crack, although with some knowledge it can be done faster.

TCP tends to be initialized early, before there is anything good in the entropy pool.

static void init_std_data(struct entropy_store *r)
{
	struct timeval tv;
	unsigned long flags;

	spin_lock_irqsave(&r->lock, flags);
	r->entropy_count = 0;
	spin_unlock_irqrestore(&r->lock, flags);

	do_gettimeofday(&tv);
	add_entropy_words(r, (__u32 *)&tv, sizeof(tv)/4);
	add_entropy_words(r, (__u32 *)utsname(), sizeof(*(utsname()))/4);
}

utsname is useless here because it runs before user space has a chance to set it. The only truly variable thing is the boot time, which can be guessed, with the ns part being brute forced. To make it secure you would need to rehash regularly, like the routing cache does, which would pick up true randomness on the first rehash. -Andi
Re: RFC: Established connections hash function II
Andi Kleen [EMAIL PROTECTED] writes: The only truly variable thing is the boot time, which can be guessed Actually you don't need to guess it. It's in any TCP timestamp. -Andi
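Andi's point can be made concrete under three assumptions. First, as in kernels of this era, the TCP timestamp value (TSval) is simply the low 32 bits of jiffies. Second, jiffies starts at INITIAL_JIFFIES = -300*HZ, so the counter wraps five minutes after boot. Third, HZ is known or guessed; 100, 250, 300 and 1000 were all common. uptime_seconds_from_tsval() is a hypothetical helper. Subtracting its result from the current time gives the boot time to within one tick, which leaves only the sub-tick part to brute-force.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: estimate uptime from a TSval, assuming TSval == (u32)jiffies
 * and INITIAL_JIFFIES == -300 * HZ. The answer is only meaningful modulo
 * 2^32 ticks (about 49.7 days at HZ=1000).
 */
static uint32_t uptime_seconds_from_tsval(uint32_t tsval, uint32_t hz)
{
	/* (u32)(-300 * HZ): where jiffies starts counting at boot */
	uint32_t initial = 0u - 300u * hz;

	/* ticks since boot (mod 2^32), converted to whole seconds */
	return (tsval - initial) / hz;
}
```

For example, a host with HZ=1000 that has been up for ten minutes sends TSval = (u32)(-300000 + 600000) = 300000, and the helper returns 600.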
Re: KERNEL: assertion ((int)tp->sacked_out >= 0) failed at net/ipv4/tcp_input.c (2626)
On Wed, 28 Mar 2007, Patrick McHardy wrote: I got this warning with the current net-2.6.22 tree: KERNEL: assertion ((int)tp->sacked_out >= 0) failed at net/ipv4/tcp_input.c (2626) Leak s=4294967292 3 Can't say what exactly triggered it.

It seems I'm guilty of this one. Dave, please apply to net-2.6.22 (besides this, I think tcp_sync_left_out should be changed, but I'll prepare a patch for that later). Btw, how should this kind of email, with some non-patch description plus a patch, be formatted?

[PATCH] [TCP]: Timedout loop must skip SACKed skbs too while marking

Marking an skb with both S and L is invalid, and that could easily happen in the timedout loop. Later on, tcp_sync_left_out reduces sacked_out if lost_out + sacked_out > packets_out, and then eventually sacked_out underflows, triggering a debug trap in tcp_clean_rtx_queue.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d116887..7a59ffe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1777,7 +1777,8 @@ static void tcp_timedout_mark_forward(st
 		if (skb == tcp_send_head(sk) || !tcp_skb_timedout(sk, skb))
 			break;
 		/* Could be lost already from a previous timedout check */
-		if (!(TCP_SKB_CB(skb)->sacked & TCPCB_LOST)) {
+		if (!(TCP_SKB_CB(skb)->sacked &
+		      (TCPCB_LOST|TCPCB_SACKED_ACKED))) {
 			TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
 			tp->lost_out += tcp_skb_pcount(skb);
 			tcp_verify_retransmit_hint(tp, skb);
-- 
1.4.2
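The effect of the one-line mask change can be modelled with a toy scoreboard. The flag values mirror TCPCB_SACKED_ACKED and TCPCB_LOST, but struct seg, mark_timedout() and count_s_and_l() are hypothetical, and every segment counts as one packet. With the old test, a segment that was already SACKed also gets the L bit and is counted in lost_out. sacked_out + lost_out can then exceed packets_out, which is the condition tcp_sync_left_out() later "fixes" by shrinking sacked_out.

```c
#include <assert.h>
#include <stddef.h>

/* Values chosen to mirror TCPCB_SACKED_ACKED and TCPCB_LOST. */
#define SKB_SACKED_ACKED 0x01
#define SKB_LOST         0x04

/* Hypothetical per-segment scoreboard entry (one packet per segment). */
struct seg {
	unsigned char sacked;
	int timed_out;
};

/*
 * Walk forward marking timed-out segments lost, stopping at the first one
 * that has not timed out. With skip_sacked set, already-SACKed segments are
 * left alone, as the patch does by testing (TCPCB_LOST|TCPCB_SACKED_ACKED).
 */
static void mark_timedout(struct seg *q, size_t n, unsigned *lost_out,
			  int skip_sacked)
{
	unsigned char skip = SKB_LOST | (skip_sacked ? SKB_SACKED_ACKED : 0);
	size_t i;

	for (i = 0; i < n && q[i].timed_out; i++) {
		if (!(q[i].sacked & skip)) {
			q[i].sacked |= SKB_LOST;
			(*lost_out)++;
		}
	}
}

/* Count segments carrying both S and L, the invalid state described above. */
static size_t count_s_and_l(const struct seg *q, size_t n)
{
	size_t i, bad = 0;

	for (i = 0; i < n; i++)
		if ((q[i].sacked & (SKB_SACKED_ACKED | SKB_LOST)) ==
		    (SKB_SACKED_ACKED | SKB_LOST))
			bad++;
	return bad;
}
```

Take three timed-out segments with the middle one SACKed. The old test counts three lost segments, one of which is S+L, so lost_out (3) + sacked_out (1) > packets_out (3). The patched test counts two lost segments and produces no S+L segment.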
[PATCH] [TCP]: Rexmit hint must be cleared instead of setting it
Stupid error on my side. When I noticed this, I hoped it would turn out to be an optimization, but no: the counter hint is then incorrect. Thus clearing is necessary for now (though I still suspect that this path is never executed).

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7a59ffe..c855791 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1763,7 +1763,7 @@ static void tcp_verify_retransmit_hint(s
 	    !(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_RETRANS) &&
 	    before(TCP_SKB_CB(skb)->seq,
 		   TCP_SKB_CB(tp->retransmit_skb_hint)->seq))
-		tp->retransmit_skb_hint = skb;
+		tp->retransmit_skb_hint = NULL;
 }
 
 /* Forward walk starting from until a not timedout skb is encountered, timeout
-- 
1.4.2
Re: RFC: Established connections hash function
On 28 Mar 2007 16:14:17 +0200 Andi Kleen [EMAIL PROTECTED] wrote: TCP tends to be initialized early, before there is anything good in the entropy pool.

static void init_std_data(struct entropy_store *r)
{
	struct timeval tv;
	unsigned long flags;

	spin_lock_irqsave(&r->lock, flags);
	r->entropy_count = 0;
	spin_unlock_irqrestore(&r->lock, flags);

	do_gettimeofday(&tv);
	add_entropy_words(r, (__u32 *)&tv, sizeof(tv)/4);
	add_entropy_words(r, (__u32 *)utsname(), sizeof(*(utsname()))/4);
}

utsname is useless here because it runs before user space has a chance to set it. The only truly variable thing is the boot time, which can be guessed, with the ns part being brute forced. To make it secure you would need to rehash regularly, like the routing cache does, which would pick up true randomness on the first rehash.

Good point, but:
1) We can now use struct timespec to get more bits in init_std_data().
2) The tcp ehash salt is initialized at first socket creation, not at boot time. Maybe we have more available entropy at this point.
3) We don't want to be 'totally secure'. We only want to raise the level, and eventually see if we have to spend more time on this next year(s). AFAIK we had two different reports from people being hit by the flaw of the previous hash. Not really a critical issue.
4) We could add a hard limit on the length of one chain. Even if the bad guys discover a flaw, it won't hurt too much.
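Point 4) can be sketched as a hash chain that refuses to grow past a fixed length. The sketch is hypothetical throughout: CHAIN_LIMIT, struct bucket, bucket_insert() and bucket_lookup() are invented names, and the cap is tiny for illustration. What to do on refusal is a separate policy question. Possible answers include dropping the new connection, evicting an old entry, or rehashing with a fresh secret as suggested earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_LIMIT 4	/* hypothetical cap, tiny for illustration */

struct conn {
	uint32_t key;
	struct conn *next;
};

struct bucket {
	struct conn *head;
	unsigned int len;	/* kept alongside the chain so the check is O(1) */
};

/*
 * Insert at the head of the chain unless it is already at the cap.
 * Returns 0 on success, -1 if the entry was refused.
 */
static int bucket_insert(struct bucket *b, struct conn *c)
{
	if (b->len >= CHAIN_LIMIT)
		return -1;
	c->next = b->head;
	b->head = c;
	b->len++;
	return 0;
}

/* Worst-case lookup cost is now bounded by CHAIN_LIMIT comparisons. */
static struct conn *bucket_lookup(const struct bucket *b, uint32_t key)
{
	struct conn *c;

	for (c = b->head; c; c = c->next)
		if (c->key == key)
			return c;
	return NULL;
}
```

The cap bounds the damage an attacker can do to a single chain's lookup time, even if they find colliding tuples. The cost is that legitimate connections landing in a saturated bucket are turned away.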
[PATCH] [RFC] [TCP]: Catch skb with S+L bugs earlier
SACKED_ACKED and LOST are mutually exclusive, thus this condition is a bug with SACK (IMHO). NewReno, however, could get enough duplicate ACKs, which increment sacked_out, so it makes sense to do this kind of limiting for non-SACK TCP but not for SACK-enabled TCP. Perhaps the author had that in mind but accidentally got the logic the wrong way around? Eventually these bugs trigger traps in tcp_clean_rtx_queue, but it's much more informative to catch them here (this excludes some other possible bugs). Maybe this BUG_TRAP is too expensive to be included everywhere in the TCP code. Should there be some #if to surround it?

Compile tested. Sadly, I won't have time for a couple of weeks to test this, as it would require some setting up, and besides, my test machines are currently occupied with other work. But this might also be net-2.6 (or even stable) material if it really works (feel free to cut this paragraph or part of it if you decide to include this :-)).

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/net/tcp.h |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fe1c4f0..3c8dd13 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -733,9 +733,10 @@ static inline __u32 tcp_current_ssthresh
 
 static inline void tcp_sync_left_out(struct tcp_sock *tp)
 {
-	if (tp->rx_opt.sack_ok &&
-	    (tp->sacked_out >= tp->packets_out - tp->lost_out))
+	if (tp->sacked_out + tp->lost_out > tp->packets_out) {
+		BUG_TRAP(!tp->rx_opt.sack_ok);
 		tp->sacked_out = tp->packets_out - tp->lost_out;
+	}
 	tp->left_out = tp->sacked_out + tp->lost_out;
 }
 
-- 
1.4.2
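The before/after behaviour of tcp_sync_left_out() can be compared on a trimmed-down scoreboard. This is a hypothetical model: struct sb keeps only the fields involved, and BUG_TRAP() is modelled as a counter rather than a kernel warning. It shows the two points made in the description. The old code silently clamps an over-count when SACK is on, which is exactly when S and L must not overlap. It also never clamps when SACK is off, which is the NewReno case where over-counting is expected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical trimmed scoreboard; field names follow struct tcp_sock. */
struct sb {
	uint32_t packets_out, sacked_out, lost_out, left_out;
	int sack_ok;
	int trapped;	/* how often the new BUG_TRAP would have fired */
};

/* Pre-patch logic: clamp sacked_out only when SACK is enabled. */
static void sync_left_out_old(struct sb *tp)
{
	if (tp->sack_ok &&
	    (tp->sacked_out >= tp->packets_out - tp->lost_out))
		tp->sacked_out = tp->packets_out - tp->lost_out;
	tp->left_out = tp->sacked_out + tp->lost_out;
}

/* Patched logic: clamp any over-count, but flag it if SACK is enabled. */
static void sync_left_out_new(struct sb *tp)
{
	if (tp->sacked_out + tp->lost_out > tp->packets_out) {
		if (tp->sack_ok)
			tp->trapped++;	/* stands in for BUG_TRAP(!sack_ok) */
		tp->sacked_out = tp->packets_out - tp->lost_out;
	}
	tp->left_out = tp->sacked_out + tp->lost_out;
}
```

Consider a SACK connection with 3 packets out, 1 SACKed and 3 marked lost, which is the double count the previous patch removes. Both versions shrink sacked_out to 0, but only the new one reports it. For a non-SACK connection with sacked_out inflated by duplicate ACKs, the old version leaves left_out larger than packets_out, while the new one clamps it quietly.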
Re: RFC: Established connections hash function
On Wed, Mar 28, 2007 at 03:50:47PM +0200, Eric Dumazet wrote:
> 1) We can now use struct timespec to get more bits in init_std_data()

That would be a good change, but I don't think it would help that much. If you know the hardware (e.g. webhost farms tend to have quite predictable kit) and the kernel binary and the boot offset from the timestamp, you can probably guess the range of ns pretty closely (let's say down to a few ms). With that it's a small range to search.

> 2) tcp ehash salt is initialized at first socket creation, not boot time. Maybe we have more available entropy at this point.

Sockets are created early too. It would be a little better, but probably not much. The only true random seeds are disk, keyboard/mouse and previous state. Previous state is typically restored by a relatively late init script, probably after the first socket. Servers tend to have no disk/mouse activity. Disk may work if you manage to put it after the root mount, but you lose on diskless systems. E.g. if nfsd is built in, that wouldn't work, because it would create sockets before that. Getting entropy from network interrupts would avoid the diskless issue, but people are paranoid about that.

> 3) We don't want to be 'totally secure'. We only want to raise the level, and eventually see if we have to spend more time on this next year(s). AFAIK we had two different reports from people being hit by the flaw of the previous hash. Not really a critical issue.

Yes, but you probably want a complexity of at least 10^5-10^6 to be of any use. I don't think you will get that early in boot from random unless you use hardware support.

> 4) We could add a hard limit on the length of one chain. Even if the bad guys discover a flaw, it won't hurt too much.

Or just use the trie? It has other advantages too :)

-Andi
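Andi's argument is about the size of the search space, which is simple arithmetic: the number of candidate seeds is the width of the window in which the boot timestamp is known, divided by the clock resolution. The sketch below computes that count and its size in bits. The 3 ms window in the usage note is an assumed example of "a few ms", not a measured figure.

```c
#include <assert.h>
#include <stdint.h>

/* Candidate seeds an attacker must try when a timestamp is known to
 * within window_ns and is sampled at resolution_ns. */
static uint64_t seed_candidates(uint64_t window_ns, uint64_t resolution_ns)
{
	assert(resolution_ns > 0);
	return window_ns / resolution_ns;
}

/* floor(log2(n)): roughly how many bits of uncertainty n candidates give. */
static unsigned int floor_log2(uint64_t n)
{
	unsigned int bits = 0;

	assert(n > 0);
	while (n >>= 1)
		bits++;
	return bits;
}
```

With an assumed 3 ms window, nanosecond sampling gives 3,000,000 candidates (about 21 bits), while microsecond sampling gives only 3,000 (about 11 bits); both figures can be compared against the 10^5-10^6 complexity Andi mentions.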
Re: [PATCH]: SAD sometimes has double SAs.
Last night I browsed the ipsec-tools code and saw places in racoon where improvements would actually fix this problem. I am working on a patch and will pursue this on the ipsec-tools mailing list. I apologize for any inconvenience.

Eric, sorry, as I know you already patched the lspp kernel for testing.

I strongly think this should be fixed in userspace. The permission check before flushing still needs to be added to the kernel.

Regards,
Joy

On Mon, 2007-03-26 at 19:04 -0600, Joy Latten wrote:
> On Mon, 2007-03-26 at 14:48 -0700, David Miller wrote:
> > From: Eric Paris [EMAIL PROTECTED]
> > Date: Mon, 26 Mar 2007 17:34:59 -0400
> >
> > > I'm not at all able to speak on the correctness or validity of the solution,
> >
> > Neither am I yet :)
> >
> > > but shouldn't the ipv6 case be a && not an || like the ipv4 case? Isn't this going to match all sorts of things? Did you test this patch on ipv6 and see it solve your problem? I'm also not enjoying the formatting in the ipv6 part, where the first time you have the cast on the same line as the object, but not the second part, where x->props.saddr.a6 is on its own little line.
> >
> > Also, I want to understand what is going to tear down these other-direction fake entries later on? I think I can review this patch better if I understand that.
>
> I am going to refer to the other-direction placeholder as the "fake entry", and to the larval SA that gets created for the new spi as a result of a GETSPI message as the "real entry".
>
> The fake entry gets created when the real entry does, and does not have an spi. It shares some of the same properties of a real larval SA (they are created using the same code) and its state is marked as XFRM_STATE_ACQ.
>
> The real entry has a timeout. So, should IKE negotiation fail, take too long, etc., it will eventually time out and be deleted. So does the fake entry: it will time out and should eventually be deleted. (I will test this part tomorrow for assurance.)
>
> When the IKE negotiations are successful, xfrm_state_add() and xfrm_state_update() look for larval SAs, in that they look for an SA with the same src, dst, etc. and with state==XFRM_STATE_ACQUIRE. Any that are found are deleted and the new SA is added. This removes the real larval SA, and should also remove the fake entry. Of course, this is all based on my assumption that IKE will install two SAs, one for incoming and one for outgoing.
>
> Hopefully this answers how fake entries will be removed. Admittedly, I may have missed or misunderstood something as I learn the code, so please let me know. There may even be a better solution that I don't readily see.
>
> > As it stands, this looks to me like a workaround for an improperly implemented IPSEC daemon. Joy states it as saying that the current code requires the keying daemon to manage its SAs, and I wonder whether any other implementation is even valid.
>
> My big mouth. :-) But yes, I do think more SA management in userspace would be ideal. This fix will hopefully ensure the kernel doesn't send any extra acquires regardless.
>
> Joy
[PATCH] NET : secure sequence number functions can use nsec resolution instead of usec
Hello David

We could use the nanosec resolution for various functions defined in drivers/char/random.c (secure_tcpv6_sequence_number(), secure_tcp_sequence_number(), secure_dccp_sequence_number()).

I am not sure if it's a netdev-related patch or core kernel, so I have CC'd Andrew.

Thank you

[PATCH] NET : random functions can use nsec resolution instead of usec

In order to get more randomness for the secure_tcpv6_sequence_number(), secure_tcp_sequence_number() and secure_dccp_sequence_number() functions, we can use the high-resolution time services, providing nanosec resolution.

I've also done two kmalloc()/kzalloc() conversions.

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

--- linux-2.6.21-rc5/drivers/char/random.c
+++ linux-2.6.21-rc5-ed/drivers/char/random.c
@@ -881,15 +881,15 @@ EXPORT_SYMBOL(get_random_bytes);
  */
 static void init_std_data(struct entropy_store *r)
 {
-	struct timeval tv;
+	ktime_t now;
 	unsigned long flags;
 
 	spin_lock_irqsave(&r->lock, flags);
 	r->entropy_count = 0;
 	spin_unlock_irqrestore(&r->lock, flags);
 
-	do_gettimeofday(&tv);
-	add_entropy_words(r, (__u32 *)&tv, sizeof(tv)/4);
+	now = ktime_get_real();
+	add_entropy_words(r, (__u32 *)&now, sizeof(now)/4);
 	add_entropy_words(r, (__u32 *)utsname(), sizeof(*(utsname()))/4);
 }
 
@@ -911,14 +911,12 @@ void rand_initialize_irq(int irq)
 		return;
 
 	/*
-	 * If kmalloc returns null, we just won't use that entropy
+	 * If kzalloc returns null, we just won't use that entropy
 	 * source.
 	 */
-	state = kmalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
-	if (state) {
-		memset(state, 0, sizeof(struct timer_rand_state));
+	state = kzalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
+	if (state)
 		irq_timer_state[irq] = state;
-	}
 }
 
 #ifdef CONFIG_BLOCK
@@ -927,14 +925,12 @@ void rand_initialize_disk(struct gendisk
 	struct timer_rand_state *state;
 
 	/*
-	 * If kmalloc returns null, we just won't use that entropy
+	 * If kzalloc returns null, we just won't use that entropy
 	 * source.
 	 */
-	state = kmalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
-	if (state) {
-		memset(state, 0, sizeof(struct timer_rand_state));
+	state = kzalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
+	if (state)
 		disk->random = state;
-	}
 }
 #endif
 
@@ -1469,7 +1465,6 @@ late_initcall(seqgen_init);
 __u32 secure_tcpv6_sequence_number(__be32 *saddr, __be32 *daddr,
 				   __be16 sport, __be16 dport)
 {
-	struct timeval tv;
 	__u32 seq;
 	__u32 hash[12];
 	struct keydata *keyptr = get_keyptr();
@@ -1485,8 +1480,7 @@ __u32 secure_tcpv6_sequence_number(__be3
 	seq = twothirdsMD4Transform((const __u32 *)daddr, hash) & HASH_MASK;
 	seq += keyptr->count;
 
-	do_gettimeofday(&tv);
-	seq += tv.tv_usec + tv.tv_sec * 1000000;
+	seq += ktime_get_real().tv64;
 
 	return seq;
 }
@@ -1521,7 +1515,6 @@ __u32 secure_ip_id(__be32 daddr)
 __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 				 __be16 sport, __be16 dport)
 {
-	struct timeval tv;
 	__u32 seq;
 	__u32 hash[4];
 	struct keydata *keyptr = get_keyptr();
@@ -1543,12 +1536,11 @@ __u32 secure_tcp_sequence_number(__be32
 	 *	As close as possible to RFC 793, which
 	 *	suggests using a 250 kHz clock.
 	 *	Further reading shows this assumes 2 Mb/s networks.
-	 *	For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
+	 *	For 10 Gb/s Ethernet, a 1 GHz clock is appropriate.
 	 *	That's funny, Linux has one built in!  Use it!
 	 *	(Networks are faster now - should this be increased?)
 	 */
-	do_gettimeofday(&tv);
-	seq += tv.tv_usec + tv.tv_sec * 1000000;
+	seq += ktime_get_real().tv64;
 #if 0
 	printk("init_seq(%lx, %lx, %d, %d) = %d\n",
 	       saddr, daddr, sport, dport, seq);
@@ -1598,7 +1590,6 @@ u32 secure_ipv6_port_ephemeral(const __b
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport)
 {
-	struct timeval tv;
 	u64 seq;
 	__u32 hash[4];
 	struct keydata *keyptr = get_keyptr();
@@ -1611,8 +1602,7 @@ u64 secure_dccp_sequence_number(__be32 s
 	seq = half_md4_transform(hash, keyptr->secret);
 	seq |= ((u64)keyptr->count) << (32 - HASH_BITS);
 
-	do_gettimeofday(&tv);
-	seq += tv.tv_usec + tv.tv_sec * 1000000;
+	seq += ktime_get_real().tv64;
 	seq &= (1ull << 48) - 1;
 #if 0
 	printk("dccp init_seq(%lx, %lx, %d, %d) = %d\n",
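One consequence of the clock change is easy to quantify: the TCP initial sequence numbers are 32-bit, while the DCCP one is masked to 48 bits, so the clock term wraps at a rate set by its frequency. The sketch below just does that arithmetic in user space; the helper names are hypothetical and nothing here calls kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a counter 'bits' wide, advanced 'hz' times per second,
 * wraps around. */
static double wrap_seconds(unsigned int bits, double hz)
{
	assert(bits > 0 && bits < 64 && hz > 0.0);
	return (double)(UINT64_C(1) << bits) / hz;
}

/* Adding a 64-bit timestamp to a 32-bit sequence number keeps only the
 * low 32 bits of the sum (unsigned conversion is modulo 2^32). */
static uint32_t add_clock_u32(uint32_t seq, uint64_t clock_ticks)
{
	return (uint32_t)(seq + clock_ticks);
}
```

By this arithmetic a 1 MHz clock term wraps a 32-bit sequence number about every 4295 s (roughly 71.6 minutes), a 1 GHz term about every 4.29 s, and a 1 GHz term in the 48-bit DCCP space about every 281,475 s (roughly 78 hours).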
Re: [RESEND] [NET] fib_rules: Flush route cache after rule modifications
* Jarek Poplawski [EMAIL PROTECTED] 2007-03-28 13:19
> I hope I'm wrong, but isn't this at the cost of admins working with long rules' sets, which (probably) take extra time now?

That's right, it makes the insert and delete operations more expensive. A compromise would be to delay the flushing and wait for some time (default 2 seconds) to see whether more rules or routes are being added before flushing.

[NET] fib_rules: delay route cache flush by ip_rt_min_delay

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: linux/net-2.6.22/net/decnet/dn_rules.c
===
--- linux.orig/net-2.6.22/net/decnet/dn_rules.c	2007-03-28 17:41:22.0 +0200
+++ linux/net-2.6.22/net/decnet/dn_rules.c	2007-03-28 17:41:39.0 +0200
@@ -242,7 +242,7 @@ static u32 dn_fib_rule_default_pref(void
 
 static void dn_fib_rule_flush_cache(void)
 {
-	dn_rt_cache_flush(0);
+	dn_rt_cache_flush(-1);
 }
 
 static struct fib_rules_ops dn_fib_rules_ops = {
Index: linux/net-2.6.22/net/ipv4/fib_rules.c
===
--- linux.orig/net-2.6.22/net/ipv4/fib_rules.c	2007-03-28 17:41:18.0 +0200
+++ linux/net-2.6.22/net/ipv4/fib_rules.c	2007-03-28 17:41:30.0 +0200
@@ -300,7 +300,7 @@ static size_t fib4_rule_nlmsg_payload(st
 
 static void fib4_rule_flush_cache(void)
 {
-	rt_cache_flush(0);
+	rt_cache_flush(-1);
 }
 
 static struct fib_rules_ops fib4_rules_ops = {
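The compromise Thomas describes is a debounced flush: each rule change pushes the flush back by a minimum delay, but a pending flush must not be postponed forever. The sketch below is a hypothetical user-space model of that idea using abstract ticks; it is not the code of rt_cache_flush(), and the hard upper bound (`max_delay`) is an assumption added so that a steady stream of rule changes still ends in a flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a deferred flush. min_delay plays the role of the
 * "default 2 seconds" above; max_delay is an assumed hard limit. */
struct flush_sched {
	long min_delay, max_delay;
	bool pending;
	long fire_at;    /* when the pending flush will run */
	long deadline;   /* latest time a pending flush may run */
};

/* Called for every rule insert/delete. */
static void request_flush(struct flush_sched *s, long now)
{
	long when = now + s->min_delay;

	assert(s->min_delay >= 0 && s->max_delay >= s->min_delay);
	if (!s->pending) {
		s->pending = true;
		s->deadline = now + s->max_delay;
	}
	s->fire_at = when < s->deadline ? when : s->deadline;
}

/* Returns true if the flush runs at time 'now', clearing the request. */
static bool maybe_run_flush(struct flush_sched *s, long now)
{
	if (s->pending && now >= s->fire_at) {
		s->pending = false;
		return true;
	}
	return false;
}
```

An admin loading a long rule set one rule per tick triggers a single flush at the deadline rather than one flush per rule.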
Re: [PATCH] Inline net_device_stats
Rusty Russell wrote:
> Hi all,
>
> Does something like this make sense for future drivers?
>
> Cheers,
> Rusty.
> ===
> Network drivers which keep stats allocate their own stats structure and then write a get_stats() function to return them. It would be nice if this were done by default.
>
> 1) Add a new stats field to struct net_device.
> 2) Add a new feature field to say "this driver uses the internal one".
> 3) Have a default get_stats which returns NULL if that feature is not set.
> 4) Change callers to check the result of the get_stats call for NULL, not whether ->get_stats is set.
>
> This should not break backwards compatibility with older drivers, yet allow modern drivers to shed some boilerplate code.
>
> Lightly tested: works for a modified lguest network driver.
>
> Signed-off-by: Rusty Russell [EMAIL PROTECTED]
> ---
>  0 files changed
>
> diff -r 1ccab0a087b7 arch/s390/appldata/appldata_net_sum.c
> --- a/arch/s390/appldata/appldata_net_sum.c	Tue Mar 27 13:46:10 2007 +1000
> +++ b/arch/s390/appldata/appldata_net_sum.c	Tue Mar 27 14:28:47 2007 +1000
> @@ -108,10 +108,10 @@ static void appldata_get_net_sum_data(vo
>  	collisions = 0;
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev != NULL; dev = dev->next) {
> -		if (dev->get_stats == NULL) {
> +		stats = dev->get_stats(dev);
> +		if (stats == NULL) {
>  			continue;
>  		}
> -		stats = dev->get_stats(dev);
>  		rx_packets += stats->rx_packets;
>  		tx_packets += stats->tx_packets;
>  		rx_bytes += stats->rx_bytes;
> diff -r 1ccab0a087b7 drivers/net/bonding/bond_main.c
> --- a/drivers/net/bonding/bond_main.c	Tue Mar 27 13:46:10 2007 +1000
> +++ b/drivers/net/bonding/bond_main.c	Tue Mar 27 14:29:08 2007 +1000
> @@ -3621,9 +3621,8 @@ static struct net_device_stats *bond_get
>  	read_lock_bh(&bond->lock);
>  
>  	bond_for_each_slave(bond, slave, i) {
> -		if (slave->dev->get_stats) {
> -			sstats = slave->dev->get_stats(slave->dev);
> -
> +		sstats = slave->dev->get_stats(slave->dev);
> +		if (sstats) {
>  			stats->rx_packets += sstats->rx_packets;
>  			stats->rx_bytes += sstats->rx_bytes;
>  			stats->rx_errors += sstats->rx_errors;
> diff -r 1ccab0a087b7 drivers/parisc/led.c
> --- a/drivers/parisc/led.c	Tue Mar 27 13:46:10 2007 +1000
> +++ b/drivers/parisc/led.c	Tue Mar 27 14:29:17 2007 +1000
> @@ -372,9 +372,9 @@ static __inline__ int led_get_net_activi
>  			continue;
>  		if (LOOPBACK(in_dev->ifa_list->ifa_local))
>  			continue;
> -		if (!dev->get_stats)
> +		stats = dev->get_stats(dev);
> +		if (!stats)
>  			continue;
> -		stats = dev->get_stats(dev);
>  		rx_total += stats->rx_packets;
>  		tx_total += stats->tx_packets;
>  	}
> diff -r 1ccab0a087b7 include/linux/netdevice.h
> --- a/include/linux/netdevice.h	Tue Mar 27 13:46:10 2007 +1000
> +++ b/include/linux/netdevice.h	Tue Mar 27 14:21:09 2007 +1000
> @@ -325,6 +325,7 @@ struct net_device
>  #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
>  #define NETIF_F_GSO		2048	/* Enable software GSO. */
>  #define NETIF_F_LLTX		4096	/* LockLess TX */
> +#define NETIF_F_INTERNAL_STATS	8192	/* Use stats structure in net_device */
>  
>  	/* Segmentation offload features */
>  #define NETIF_F_GSO_SHIFT	16
> @@ -349,6 +350,7 @@ struct net_device
>  
>  	struct net_device_stats* (*get_stats)(struct net_device *dev);
> +	struct net_device_stats	stats;
>  
>  #ifdef CONFIG_WIRELESS_EXT
>  	/* List of functions to handle Wireless Extensions (instead of ioctl).
> diff -r 1ccab0a087b7 net/core/dev.c
> --- a/net/core/dev.c	Tue Mar 27 13:46:10 2007 +1000
> +++ b/net/core/dev.c	Tue Mar 27 14:30:05 2007 +1000
> @@ -825,7 +825,6 @@ static int default_rebuild_header(struct
>  	return 1;
>  }
>  
> -
>  /**
>   *	dev_open	- prepare an interface for use.
>   *	@dev:	device to open
> @@ -2120,9 +2119,9 @@ void dev_seq_stop(struct seq_file *seq,
>  
>  static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
>  {
> -	if (dev->get_stats) {
> -		struct net_device_stats *stats = dev->get_stats(dev);
> -
> +	struct net_device_stats *stats = dev->get_stats(dev);
> +
> +	if (stats) {
>  		seq_printf(seq, "%6s:%8lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu "
>  				"%8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n",
>  			   dev->name, stats->rx_bytes, stats->rx_packets,
> @@ -3146,6 +3145,13 @@ out:
>  	mutex_unlock(&net_todo_run_mutex);
>  }
>  
> +static struct net_device_stats *maybe_internal_stats(struct net_device *dev)
> +{
> +	if (dev->features & NETIF_F_INTERNAL_STATS)
> +		return &dev->stats;
> +	return NULL;
> +}
> +
>  /**
>   * alloc_netdev - allocate network device
>   * @sizeof_priv:	size of private data to allocate space for
> @@ -3181,6 +3187,7 @@ struct
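The calling convention Rusty proposes can be exercised in user space. This is a sketch under assumptions: the structs are trimmed-down stand-ins for the real `net_device` definitions, and `model_alloc_netdev()` is a hypothetical helper. Installing `maybe_internal_stats` as the default presumably happens in the alloc_netdev() hunk, which is cut off in the archive, so that part is inferred from point 3 of the description.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down, hypothetical stand-ins for the kernel definitions. */
#define NETIF_F_INTERNAL_STATS 8192

struct net_device_stats {
	unsigned long rx_packets, tx_packets;
};

struct net_device {
	unsigned long features;
	struct net_device_stats *(*get_stats)(struct net_device *dev);
	struct net_device_stats stats;   /* embedded, used when flag is set */
};

/* Default get_stats from the patch. */
static struct net_device_stats *maybe_internal_stats(struct net_device *dev)
{
	if (dev->features & NETIF_F_INTERNAL_STATS)
		return &dev->stats;
	return NULL;
}

/* Hypothetical allocator: get_stats is never NULL after this. */
static void model_alloc_netdev(struct net_device *dev)
{
	assert(dev != NULL);
	*dev = (struct net_device){ 0 };
	dev->get_stats = maybe_internal_stats;
}

/* Caller pattern after the patch: always call, then test the result. */
static unsigned long total_rx(struct net_device *devs, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct net_device_stats *stats = devs[i].get_stats(&devs[i]);

		if (stats == NULL)
			continue;
		sum += stats->rx_packets;
	}
	return sum;
}
```

The design choice is that callers no longer test the function pointer; a driver that neither sets the flag nor overrides get_stats simply reports no statistics.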
LSPP kernels (was Re: [PATCH]: SAD sometimes has double SAs).
On Wed, 28 Mar 2007, Joy Latten wrote:
> Eric, sorry as I know you already patched lspp kernel for testing.

I think it'd be better to have the lspp kernel join the upstream workflow process, rather than being a shortcut into RHEL.

Please consider creating an lspp git tree (based off Linus' tree); then, once patches there are tested and ready to submit upstream, post them here or to selinux-list, where they can be reviewed and applied to either my or DaveM's git tree. From there, they'll be picked up in -mm for even wider testing, then be merged into mainline as appropriate. Then they can be incorporated into distro devel kernels when they update their kernels, or backported to stable distro kernels as already reviewed and tested upstream patches.

If there are any objections, please respond.

- James
-- 
James Morris
[EMAIL PROTECTED]
Re: [PATCH] NET : secure sequence number functions can use nsec resolution instead of usec
On Wed, 28 Mar 2007, Eric Dumazet wrote:
> Hello David
>
> We could use the nanosec resolution for various functions defined in drivers/char/random.c (secure_tcpv6_sequence_number(), secure_tcp_sequence_number(), secure_dccp_sequence_number()).
>
> I am not sure if it's a netdev-related patch or core kernel, so I have CC'd Andrew.
>
> Thank you
>
> [PATCH] NET : random functions can use nsec resolution instead of usec
>
> In order to get more randomness for the secure_tcpv6_sequence_number(), secure_tcp_sequence_number() and secure_dccp_sequence_number() functions, we can use the high-resolution time services, providing nanosec resolution.
>
> I've also done two kmalloc()/kzalloc() conversions.
>
> Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

Looks good to me.

Acked-by: James Morris [EMAIL PROTECTED]

- James
-- 
James Morris
[EMAIL PROTECTED]
Re: LSPP kernels (was Re: [PATCH]: SAD sometimes has double SAs).
On Wednesday, March 28 2007 12:20:24 pm James Morris wrote:
> On Wed, 28 Mar 2007, Joy Latten wrote:
> > Eric, sorry as I know you already patched lspp kernel for testing.
>
> I think it'd be better to have the lspp kernel join the upstream workflow process, rather than being a shortcut into RHEL.
>
> Please consider creating an lspp git tree (based off Linus' tree); then, once patches there are tested and ready to submit upstream, post them here or to selinux-list, where they can be reviewed and applied to either my or DaveM's git tree. From there, they'll be picked up in -mm for even wider testing, then be merged into mainline as appropriate. Then they can be incorporated into distro devel kernels when they update their kernels, or backported to stable distro kernels as already reviewed and tested upstream patches.
>
> If there are any objections, please respond.

I think the original intent of the LSPP kernel series was to test patches before they were submitted to a wider audience (not too different from what you are describing). Eric Paris became the LSPP/MLS group's Andrew Morton, if you will :) However, for whatever reason, things appear to have stumbled a bit in recent months, and I think making an effort to move to a more standard approach based on current kernel development would be a step in the right direction. This would probably make backports a bit more difficult, but Eric's a smart guy and I'm sure he wouldn't mind :)

Does anyone have access to a public site we could use to host a git tree? If no one has anything available (or is willing to maintain the tree), I might be able to do something.

-- 
paul moore
linux security @ hp
Re: [PATCH] NET : secure sequence number functions can use nsec resolution instead of usec
On Wed, Mar 28, 2007 at 05:43:22PM +0200, Eric Dumazet wrote:
> Hello David
>
> We could use the nanosec resolution for various functions defined in drivers/char/random.c (secure_tcpv6_sequence_number(), secure_tcp_sequence_number(), secure_dccp_sequence_number()).
>
> I am not sure if it's a netdev-related patch or core kernel, so I have CC'd Andrew.
>
> Thank you
>
> [PATCH] NET : random functions can use nsec resolution instead of usec
>
> In order to get more randomness for the secure_tcpv6_sequence_number(), secure_tcp_sequence_number() and secure_dccp_sequence_number() functions, we can use the high-resolution time services, providing nanosec resolution.

It's also a little faster because it avoids one division.

You didn't mention the initial seed change. There you could have removed the useless utsname initialization too.

> I've also done two kmalloc()/kzalloc() conversions.

Normally those should be separate patches.

-Andi
Re: LSPP kernels (was Re: [PATCH]: SAD sometimes has double SAs).
On Wed, 2007-03-28 at 12:20 -0400, James Morris wrote:
> On Wed, 28 Mar 2007, Joy Latten wrote:
> > Eric, sorry as I know you already patched lspp kernel for testing.
>
> I think it'd be better to have the lspp kernel join the upstream workflow process, rather than being a shortcut into RHEL.
>
> Please consider creating an lspp git tree (based off Linus' tree); then, once patches there are tested and ready to submit upstream, post them here or to selinux-list, where they can be reviewed and applied to either my or DaveM's git tree. From there, they'll be picked up in -mm for even wider testing, then be merged into mainline as appropriate. Then they can be incorporated into distro devel kernels when they update their kernels, or backported to stable distro kernels as already reviewed and tested upstream patches.
>
> If there are any objections, please respond.

It is definitely NOT a shortcut into RHEL. Nor is this government cert effort (LSPP) being driven primarily on RHEL code. Not a single patch will go into RHEL until it is upstream or in a tree to go upstream. That is a given. All development is being done upstream and then ported back to RHEL. The LSPP kernel she mentioned is, at this time, merely a testing ground for patches which may not quite be upstream-ready, or which are upstream but aren't in RHEL proper yet.

As it stands now, the LSPP kernel is carrying 22 patches on top of RHEL 5 GA (which is 2.6.18 based). Here is a breakdown:

12 are network related:
 - 10 of those are in Linus's kernel
 - 1 is not yet in Miller's tree, but I would expect it soon
 - 1 is likely going to be dropped, according to this thread

The remaining 10 patches are audit patches. There is already a viro/audit-current.git tree on kernel.org where these should be appearing. I could make this a little easier for the audit tree maintainer and make my own tree which he could pull from and then push to Linus, but a tree which should hold all of these does exist. All of them have been sent to the linux-audit mailing list and have been commented on there.

I don't want to give the impression that upstream is not coming first. All the work is being done upstream, either on netdev or linux-audit, and then I pull it back into this LSPP kernel she talked about, so that people interested primarily in the testing necessary to meet that particular government standard have a neat, tidy little prebuilt RPM to work with. Eventually all of these will show up in RHEL, but not until all of the patches I'm dealing with are upstream.

-Eric
Re: LSPP kernels (was Re: [PATCH]: SAD sometimes has double SAs).
On Wed, 28 Mar 2007, Eric Paris wrote:
> It is definitely NOT a shortcut into RHEL.

Ok, that was a poor choice of words on my part.

> I don't want to give the impression that upstream is not coming first. All the work is being done upstream, either on netdev or linux-audit, and then I pull it back into this LSPP kernel she talked about, so that people interested primarily in the testing necessary to meet that particular government standard have a neat, tidy little prebuilt RPM to work with. Eventually all of these will show up in RHEL, but not until all of the patches I'm dealing with are upstream.

It seems my understanding of the overall workflow wasn't clear. If the consensus is to stay with this scheme, then please disregard my previous post.

-- 
James Morris
[EMAIL PROTECTED]
another critical bug ?
Something more: with all kernel debugging enabled, it did not produce this panic (maybe because the system becomes too slow).

Vanilla kernel 2.6.20.3 with the htb patch applied, RTL8139 ethernet cards. If you need anything more, let me know. I'm not sure it is iproute2, but if you can, just point me in the right direction, to whom I need to report it. It also happens on interface flood, when I bring it down:

(some data not accepted by kernel maillist, changed to *)

Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: Session 1 closed for client 00:16:ec:7e:47:ea (172.16.102.2) on eth1
Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: Sent PADT
Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: PADT: Generic-Error: **
Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: PADT: Generic-Error: Received PADT from peer
Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: PADT: Generic-Error: *
Mar 28 22:14:36 OFFICE-PPPOE pppoe-server[1456]: Sent PADT
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148236] BUG: unable to handle kernel paging request
Mar 28 20:13:29 OFFICE-PPPOE at virtual address 5b5a596c
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148497] printing eip:
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148625] *pde =
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148743] Oops: [#1]
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148798]
Mar 28 20:13:29 OFFICE-PPPOE SMP
Mar 28 20:13:29 OFFICE-PPPOE
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.148992] Modules linked in: netconsole xt_mac xt_tcpmss ipt_TCPMSS ipt_REJECT ts_bm xt_string ipt_ttl ifb iptable_mangle xt_MARK xt_mark pppoe pppox ppp_generic slhc xt_tcpudp em_nbyte cls_tcindex act_gact cls_rsvp sch_htb cls_fw act_mirred em_u32 sch_red sch_sfq sch_tbf sch_teql cls_basic sch_gred act_pedit sch_hfsc cls_rsvp6 sch_ingress em_meta em_text act_ipt sch_dsmark sch_prio sch_netem act_simple cls_u32 em_cmp sch_cbq cls_route iptable_nat nf_conntrack_ipv4 ipt_LOG ipt_MASQUERADE ipt_REDIRECT nf_nat nf_conntrack nfnetlink iptable_filter ip_tables x_tables 8021q tun via_velocity via_rhine sis900 ne2k_pci 8390 skge tg3 8139too e1000 e100 block2mtd usb_storage mtdblock mtd_blkdevs usbhid uhci_hcd ehci_hcd ohci_hcd usbcore
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.153713] CPU:    0
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.153716] EIP:    0060:[<c02113c7>]    Not tainted VLI
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.153718] EFLAGS: 00010202   (2.6.20.3- build-0005 #18)
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.153949] EIP is at netif_rx+0x18/0x126
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.154009] eax: c0c42800   ebx: 5b5a5958   ecx: 0001   edx: c6541c80
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.154073] esi: c0319800   edi: c6541c80   ebp: c02f5f14   esp: c02f5f04
Mar 28 20:13:29 OFFICE-PPPOE [ 1758.154135]
Re: L2 network namespace benchmarking
> If I read the results right it took a 32bit machine from AMD with a gigabit interface before you could measure a throughput difference. That isn't shabby for a non-optimized code path.

Just some paranoid ramblings: one needs to look beyond just whether or not the performance of a bulk transfer test (e.g. TCP_STREAM) remains able to hit link rate. One also has to consider the change in service demand (the normalization of CPU utilization and throughput). Also, with functionality like TSO in place, the ability to pass very large things down the stack can help cover for a multitude of path-length sins.

And with either multiple 1G or 10G NICs becoming more and more prevalent, we have another one of those NIC speed vs CPU speed switch-overs, so maintaining single-NIC 1 gigabit throughput, while necessary, isn't (IMO) sufficient.

So, it becomes very important to go beyond just TCP_STREAM tests when evaluating these sorts of things. Another test to run would be the TCP_RR test. TCP_RR with single-byte request/response sizes will bypass the TSO stuff, and the transaction rate will be more directly affected by the change in path length than a TCP_STREAM test. It will also show up quite clearly in the service demand.

Now, with NICs doing interrupt coalescing, if the NIC is strapped poorly (IMO) then you may not see a change in transaction rate: it may be getting limited artificially by the NIC's interrupt coalescing. So, one has to fall back on service demand, or better yet, disable the interrupt coalescing. Otherwise, measuring peak aggregate request/response becomes necessary.

rick jones
don't be blinded by bit-rate
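Service demand is what lets two runs with the same transaction rate be told apart. The sketch below uses one common formulation, CPU time consumed per unit of work: utilization as a fraction, times the number of CPUs, times 10^6 microseconds per second, divided by transactions per second. The function name and the numbers in the usage note are hypothetical, not netperf output.

```c
#include <assert.h>

/* Microseconds of CPU consumed per transaction. */
static double service_demand_us_per_tran(double cpu_util_pct, int ncpus,
					 double trans_per_sec)
{
	assert(cpu_util_pct >= 0.0 && ncpus > 0 && trans_per_sec > 0.0);
	return (cpu_util_pct / 100.0) * ncpus * 1e6 / trans_per_sec;
}
```

For example, two hypothetical TCP_RR runs that both achieve 10,000 transactions/s on a 2-CPU host, at 50% and 60% CPU utilization, have service demands of 100 and 120 usec/transaction: identical throughput, but a 20% longer path.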
Re: [RESEND] [NET] fib_rules: Flush route cache after rule modifications
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 17:49:03 +0200

> * Jarek Poplawski [EMAIL PROTECTED] 2007-03-28 13:19
> > I hope I'm wrong, but isn't this at the cost of admins working with long rules' sets, which (probably) take extra time now?
>
> That's right, it makes the insert and delete operations more expensive. A compromise would be to delay the flushing and wait for some time (default 2 seconds) to see whether more rules or routes are being added before flushing.

Another idea Thomas and I tossed around was to have some kind of way for the rule insertion to indicate that the flush should be deferred, and I kind of prefer that explicitness.

By default it's better to flush immediately, because the old behavior is totally unexpected. "I insert a rule and it doesn't show up?" Nobody expects that.
Xen netfront fixes for changed skbuff in net-2.6.22.git
Hi Herbert, I wonder if you've had a chance to look at netfront in light of the new stuff in davem's network tree (the stuff that's in http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.22.git). In particular, struct sk_buff has been changed so that nh has gone, and the replacement can be just an offset rather than a full pointer. This breaks netfront because it tries to stash a page * in nh.raw. I had a quick look at it and couldn't see an easy fix, but I don't really understand what's going on in there. But you do. Any chance you could have a look at it, and at least give me some pointers about how to proceed? Thanks, J
Re: RFC: Established connections hash function
From: Andi Kleen [EMAIL PROTECTED] Date: 28 Mar 2007 16:14:17 +0200 Evgeniy Polyakov [EMAIL PROTECTED] writes: Jenkins hash is far from being simple to crack, although with some knowledge it can be done faster. TCP tends to be initialized early before there is anything good in the entropy pool. Andi, you're being an idiot. You are spewing endless and uninformed bullshit about this secure hash topic, and it must stop now! You obviously didn't even read my patch, because if you did you would have seen that I don't initialize the random seed until MUCH MUCH later _EXACTLY_ to deal with this issue. In my patch the random seed is initialized when the first TCP or DCCP socket is created, at which point we'll have sufficient entropy. So please stop talking such nonsense about this topic.
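The ordering described above (draw the hash secret lazily, on first socket creation, rather than at boot) can be sketched in user space as follows. Everything here is hypothetical: /dev/urandom stands in for the kernel's entropy pool, and Bob Jenkins' one-at-a-time hash stands in for the kernel's jhash. The point is only that the secret is read once, late, and then reused for every lookup.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for kernel state; names are illustrative only. */
static uint32_t ehash_secret;
static bool ehash_secret_ready;

/* Assumed entropy source: /dev/urandom in user space. Falls back to a fixed
 * value only so the sketch still runs where /dev/urandom is unavailable. */
static uint32_t read_entropy(void)
{
    uint32_t v = 0x9e3779b9u;
    FILE *f = fopen("/dev/urandom", "rb");
    if (f) {
        if (fread(&v, sizeof(v), 1, f) != 1)
            v = 0x9e3779b9u;
        fclose(f);
    }
    return v;
}

/* Called on every socket creation; only the first call draws entropy.
 * Not thread-safe: a real implementation needs a lock or a once-guard. */
static void socket_created(void)
{
    if (!ehash_secret_ready) {
        ehash_secret = read_entropy();
        ehash_secret_ready = true;
    }
}

/* Jenkins' one-at-a-time hash, seeded with the secret (not the kernel's jhash). */
static uint32_t oaat(const uint8_t *key, size_t len, uint32_t seed)
{
    uint32_t h = seed;
    for (size_t i = 0; i < len; i++) {
        h += key[i];
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Hash a connection 4-tuple with the lazily initialized secret. */
static uint32_t ehash(uint32_t saddr, uint32_t daddr,
                      uint16_t sport, uint16_t dport)
{
    uint8_t buf[12];
    memcpy(buf, &saddr, 4);
    memcpy(buf + 4, &daddr, 4);
    memcpy(buf + 8, &sport, 2);
    memcpy(buf + 10, &dport, 2);
    return oaat(buf, sizeof(buf), ehash_secret);
}
```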
Re: KERNEL: assertion ((int)tp->sacked_out >= 0) failed at net/ipv4/tcp_input.c (2626)
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 16:24:40 +0300 (EEST) It seems I'm guilty of this one. Dave, please apply to net-2.6.22 (besides this, I think tcp_sync_left_out should be changed, but I'll prepare a patch for that later). Btw, how should this kind of email, with some non-patch description plus a patch, be formatted? Thanks for figuring out the problem so quickly; this formatting is fine. [PATCH] [TCP]: Timedout loop must skip SACKed skbs too while marking. Marking an skb with both S and L is invalid, and that could easily happen in the timedout loop. Later on, tcp_sync_left_out reduces sacked_out if lost_out + sacked_out > packets_out, and then eventually sacked_out underflows, triggering a debug trap in tcp_clean_rtx_queue. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Patch applied, thanks a lot!
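A small model of the invariant the patch restores: a segment already tagged SACKed must not also be tagged lost, so sacked_out + lost_out cannot exceed packets_out. The structures and the loop are simplified stand-ins, not the kernel's code; the flag values mirror the TCPCB_* bits of this era but are assumptions (only their distinctness matters here).

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed flag values for the per-segment "sacked" byte. */
#define TCPCB_SACKED_ACKED 0x01
#define TCPCB_LOST         0x04

struct seg {
    unsigned char sacked;
    bool timed_out;
};

struct counters {
    unsigned int packets_out;
    unsigned int sacked_out;
    unsigned int lost_out;
};

/* Mark timed-out segments lost, skipping any already SACKed or lost, so no
 * segment is counted in both sacked_out and lost_out. */
static void mark_timedout_lost(struct seg *q, size_t n, struct counters *c)
{
    for (size_t i = 0; i < n; i++) {
        if (!q[i].timed_out)
            continue;
        if (q[i].sacked & (TCPCB_SACKED_ACKED | TCPCB_LOST))
            continue;
        q[i].sacked |= TCPCB_LOST;
        c->lost_out++;
    }
}

/* The invariant whose violation led to sacked_out underflowing. */
static bool left_out_ok(const struct counters *c)
{
    return c->sacked_out + c->lost_out <= c->packets_out;
}
```

Without the SACKED_ACKED check, two timed-out segments that are both SACKed would push lost_out to 2 with sacked_out already 2, giving 4 > packets_out.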
Re: [PATCH] [TCP]: Rexmit hint must be cleared instead of setting it
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 16:31:50 +0300 (EEST) Stupid error on my side. When I noticed this, I hoped it would turn out to be an optimization, but no: the counter hint is then incorrect. Thus clearing is necessary for now (though I still suspect that this path is never executed). Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Better safe than sorry :-) We can start putting more aggressive assertions around if you'd like to get some invariants like that validated.
Re: LSPP kernels (was Re: [PATCH]: SAD sometimes has double SAs).
From: Paul Moore [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 12:36:57 -0400 Does anyone have access to a public site we could use to host a git tree? If no one has anything available (or is willing to maintain the tree) I might be able to do something. It's not difficult to get an account on master.kernel.org, and there has also been some success with infradead.org. James or someone else could help you get going, and he's gone through this process already :)
Re: [PATCH]: SAD sometimes has double SAs.
From: Joy Latten [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 09:15:15 -0600 Last night I browsed the ipsec-tools code and saw places in racoon where improvement would actually fix this problem. I am working on a patch and will pursue this on the ipsec-tools mailing list. I apologize for any inconvenience. No problem, thanks for the update. The permission check before flushing does still need to be added to the kernel. Yep, I'll take care of integrating that patch. Thanks!
Re: [PATCH] [TCP]: Rexmit hint must be cleared instead of setting it
From: David Miller [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 12:07:09 -0700 (PDT) From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 16:31:50 +0300 (EEST) Stupid error from my side. Even though now that I noticed this, I hoped it would have been an optimization but no, the counter hint is then incorrect. Thus clearing is necessary for now (I still suspect though that this path is never executed). Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Better safe than sorry :-) We can start putting more aggressive assertions around if you'd like to get some invariants like that validated. In case it's not clear I did apply this patch.
Re: [PATCH] [TCP]: Rexmit hint must be cleared instead of setting it
On Wed, 28 Mar 2007, David Miller wrote: From: David Miller [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 12:07:09 -0700 (PDT) From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 16:31:50 +0300 (EEST) Stupid error from my side. Even though now that I noticed this, I hoped it would have been an optimization but no, the counter hint is then incorrect. Thus clearing is necessary for now (I still suspect though that this path is never executed). Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Better safe than sorry :-) We can start putting more aggressive assertions around if you'd like to get some invariants like that validated. In case it's not clear I did apply this patch. I'll think about this more on Friday; maybe a WARN_ON could be placed there so that no harm is done if it ever gets there, probably a candidate for unlikely, if this is really needed. Anyway, applying this NULL patch does no harm (it was supposed to be that way right from the beginning)... :-) ...but let's keep in mind that the actual goal is, of course, to get rid of the hint altogether, rather than doing these clearing things... :-) -- i.
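The WARN_ON-plus-unlikely idea might look roughly like this in a user-space sketch: warn on the path believed unreachable, but still invalidate the hint so nothing downstream trusts a stale counter. WARN_ON_SKETCH, struct hint_state and on_suspect_path are hypothetical names; the macro relies on GCC/Clang statement expressions and __builtin_expect, which is what the kernel's unlikely() expands to.

```c
#include <stddef.h>
#include <stdio.h>

/* Imitation of the kernel's WARN_ON(): report without crashing and evaluate
 * to the condition so the caller can react to it. */
#define WARN_ON_SKETCH(cond) ({                                        \
    int __ret = !!(cond);                                              \
    if (__builtin_expect(__ret, 0))                                    \
        fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__);     \
    __ret;                                                             \
})

struct seg {
    int len;            /* placeholder payload for a queue entry */
};

/* Hypothetical subset of the retransmit-hint state. */
struct hint_state {
    struct seg *retransmit_skb_hint;
};

/* If the suspicious path is reached, warn and clear the hint (NULL) rather
 * than pointing it at a segment, since the counter hint would be wrong. */
static void on_suspect_path(struct hint_state *h, int reached)
{
    if (WARN_ON_SKETCH(reached))
        h->retransmit_skb_hint = NULL;
}
```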
Re: [RESEND] [NET] fib_rules: Flush route cache after rule modifications
* David Miller [EMAIL PROTECTED] 2007-03-28 11:24 Another idea Thomas and I tossed around was to have some kind of way for the rule insertion to indicate that the flush should be deferred, and I kind of prefer that explicitness. Right, although I believe the flag should not only defer the flush but skip it entirely. This would be the optimal solution for scripts, which can issue an ip ro flush cache themselves as they know what they're doing. By default it's better to flush immediately, because the old behavior is totally unexpected. I insert a rule and it doesn't show up? Nobody expects that. It's a tough call. I'd favour an immediate flush as well, but I can see the point in delaying by ip_rt_min_delay, which can be configured by the user. So people can choose to flush immediately by setting it to 0. It would also be consistent with the flush after route changes; the same delay is used there.
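The delayed-flush compromise can be sketched as a simple coalescing timer: the first change arms a deadline, later changes within the window ride along, and a delay of 0 degenerates to one immediate flush per change. This is a hypothetical model for illustration only, not how rt_cache_flush() and ip_rt_min_delay are actually implemented.

```c
#include <stdbool.h>

/* Hypothetical flush controller. */
struct flush_ctl {
    long min_delay_ms;   /* 0 means flush immediately on every change */
    bool pending;
    long deadline_ms;
    unsigned int flushes;
};

static void do_flush(struct flush_ctl *f)
{
    f->pending = false;
    f->flushes++;
}

/* Called after every rule or route change. */
static void rule_changed(struct flush_ctl *f, long now_ms)
{
    if (f->min_delay_ms == 0) {
        do_flush(f);
        return;
    }
    if (!f->pending) {
        f->pending = true;
        f->deadline_ms = now_ms + f->min_delay_ms;
    }
}

/* Called periodically (a timer in a real implementation). */
static void flush_tick(struct flush_ctl *f, long now_ms)
{
    if (f->pending && now_ms >= f->deadline_ms)
        do_flush(f);
}
```

With a 2000 ms delay, a script inserting three rules within 200 ms causes a single flush at the deadline instead of three.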
Re: [PATCH] [TCP]: Rexmit hint must be cleared instead of setting it
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 22:29:05 +0300 (EEST) ...but lets keep in mind that the actual goal is, of course, to get rid of the hint altogether, rather than doing these clearing things... :-) Of course. The retransmit and forward SKB hints should be easy to kill. In the worst case we can use the cached SACK sequence numbers and perhaps one auxiliary sequence number hint to guide the RB tree search for SKBs to retransmit. A space-intensive, and therefore not very appealing, scheme is to have a linked list of SKBs which have been marked lost.
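One way to read "use sequence numbers to guide the search" is to keep a sequence-number hint instead of an skb pointer and seek to it each time. The sketch below uses a binary search over a sorted array as a stand-in for the RB tree. before() matches the wraparound-safe helper in the kernel's include/net/tcp.h; seek_seq() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe "seq1 comes before seq2" in 32-bit sequence space. */
static inline int before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* start_seq[] holds segment start sequences in ascending sequence-space
 * order (standing in for the RB tree). Returns the index of the first
 * segment whose start is not before hint_seq, or n if there is none. */
static size_t seek_seq(const uint32_t *start_seq, size_t n, uint32_t hint_seq)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (before(start_seq[mid], hint_seq))
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

Because the comparison is done in sequence space, the search keeps working when the queue straddles the 2^32 wrap, which a plain unsigned comparison would not.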
Re: [RESEND] [NET] fib_rules: Flush route cache after rule modifications
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 21:34:36 +0200 So people can choose to flush immediately by setting it to 0. It would also be consistent with the flush after route changes; the same delay is used there. That's a good point I hadn't considered. With that in mind, I think I'll apply your patch.
Re: L2 network namespace benchmarking
Rick Jones wrote: If I read the results right it took a 32-bit machine from AMD with a gigabit interface before you could measure a throughput difference. That isn't shabby for a non-optimized code path. Just some paranoid ramblings - one needs to look beyond just whether or not the performance of a bulk transfer test (e.g. TCP_STREAM) remains able to hit link-rate. One has to also consider the change in service demand (the normalization of CPU util and throughput). Also, with functionality like TSO in place, the ability to pass very large things down the stack can help cover for a multitude of path-length sins. And with either multiple 1G or 10G NICs becoming more and more prevalent, we have another one of those NIC speed vs CPU speed switch-overs, so maintaining single-NIC 1 gigabit throughput, while necessary, isn't (IMO) sufficient. So, it becomes very important to go beyond just TCP_STREAM tests when evaluating these sorts of things. Another test to run would be the TCP_RR test. TCP_RR with single-byte request/response sizes will bypass the TSO stuff, and the transaction rate will be more directly affected by the change in path length than a TCP_STREAM test. It will also show up quite clearly in the service demand. Now, with NICs doing interrupt coalescing, if the NIC is strapped poorly (IMO) then you may not see a change in transaction rate - it may be getting limited artificially by the NIC's interrupt coalescing. So, one has to fall back on service demand, or better yet, disable the interrupt coalescing. Otherwise, measuring peak aggregate request/response becomes necessary. rick jones don't be blinded by bit-rate Thanks Rick, do you have any pointers to help with benchmarking the network, perhaps a checklist or some scripts for netperf? Regards. -- Daniel
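The point about TCP_RR can be put as arithmetic: with one transaction outstanding, the rate is the reciprocal of the round-trip time, so any added per-transaction path length shows up directly. A tiny C sketch with hypothetical numbers (a 50 us baseline round trip, 5 us added):

```c
/* Transactions per second for a synchronous request/response test with
 * exactly one transaction in flight. */
static double rr_rate(double rtt_us)
{
    return 1e6 / rtt_us;
}

/* Fractional drop in transaction rate when extra_us is added to every
 * transaction; algebraically this is extra_us / (base_rtt_us + extra_us). */
static double rr_drop(double base_rtt_us, double extra_us)
{
    return 1.0 - rr_rate(base_rtt_us + extra_us) / rr_rate(base_rtt_us);
}
```

Here a 5 us increase costs about 9% of the transaction rate (5/55), while a TSO-assisted bulk transfer may absorb the same per-packet cost with no visible change in bit-rate.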
Re: L2 network namespace benchmarking
Do you have any pointers to help with benchmarking the network, perhaps a checklist or some scripts for netperf? There are some scripts in doc/examples, but they are probably a bit long in the tooth by now. The main writeup _I_ have on netperf would be the manual, which was recently updated for the 2.4.3 release: http://www.netperf.org/svn/netperf2/tags/netperf-2.4.3/doc/netperf.html or the current top of trunk: http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html There is also a [EMAIL PROTECTED] mailing list which one can join to have discussions about netperf, and a [EMAIL PROTECTED] list if one wants to discuss actual netperf (netperf2 or netperf4) development. rick jones
Re: RFC: Established connections hash function
In my patch the random seed is initialized when the first TCP or DCCP socket is created, at which point we'll have sufficient entropy. See my discussion of this case in a later mail to Evgeniy -Andi
[BNX2]: Fix link interrupt problem.
[BNX2]: Fix link interrupt problem.

bnx2_has_work()'s logic is flawed and can cause the driver to miss a link event. The fix is to compare the status block's attn_bits and attn_bits_ack to determine if there is a link event.

Update version to 1.5.6.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index c12e5ea..d43fe28 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -54,8 +54,8 @@
 #define DRV_MODULE_NAME		"bnx2"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"1.5.5"
-#define DRV_MODULE_RELDATE	"February 1, 2007"
+#define DRV_MODULE_VERSION	"1.5.6"
+#define DRV_MODULE_RELDATE	"March 28, 2007"
 #define RUN_AT(x) (jiffies + (x))
@@ -2033,8 +2033,8 @@ bnx2_has_work(struct bnx2 *bp)
 	    (sblk->status_tx_quick_consumer_index0 != bp->hw_tx_cons))
 		return 1;
-	if (((sblk->status_attn_bits & STATUS_ATTN_BITS_LINK_STATE) != 0) !=
-	    bp->link_up)
+	if ((sblk->status_attn_bits & STATUS_ATTN_BITS_LINK_STATE) !=
+	    (sblk->status_attn_bits_ack & STATUS_ATTN_BITS_LINK_STATE))
 		return 1;
 	return 0;
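The difference between the old and new tests in bnx2_has_work() can be illustrated in isolation. The bit value and the cut-down status block below are assumptions for the sketch (the real definitions live in the driver's headers), and the example state is illustrative, not taken from the bug report: the driver's cached link state already matches the hardware, but the attention has not yet been acknowledged.

```c
#include <stdint.h>

/* Assumed bit position; the real value is defined by the bnx2 driver. */
#define STATUS_ATTN_BITS_LINK_STATE (1u << 0)

/* Cut-down status block with only the two fields the check uses. */
struct status_block_sketch {
    uint32_t status_attn_bits;
    uint32_t status_attn_bits_ack;
};

/* Old test: current link bit versus the driver's cached link_up. */
static int old_link_work(const struct status_block_sketch *sblk, int link_up)
{
    return ((sblk->status_attn_bits & STATUS_ATTN_BITS_LINK_STATE) != 0) !=
           link_up;
}

/* New test: an event is pending whenever the attention bit differs from its
 * acknowledged copy, independent of the cached link state. */
static int new_link_work(const struct status_block_sketch *sblk)
{
    return (sblk->status_attn_bits & STATUS_ATTN_BITS_LINK_STATE) !=
           (sblk->status_attn_bits_ack & STATUS_ATTN_BITS_LINK_STATE);
}
```

In the example state (link bit clear, ack bit set, link_up already 0) the old test reports no work, so the unacknowledged attention could go unserviced; the new test reports work.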
Re: [BNX2]: Fix link interrupt problem.
From: Michael Chan [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 13:57:18 -0800 [BNX2]: Fix link interrupt problem. bnx2_has_work()'s logic is flawed and can cause the driver to miss a link event. The fix is to compare the status block's attn_bits and attn_bits_ack to determine if there is a link event. Update version to 1.5.6. Signed-off-by: Michael Chan [EMAIL PROTECTED] Applied, thanks Michael.
Re: [PATCH] NET : secure sequence number functions can use nsec resolution instead of usec
From: James Morris [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 12:31:56 -0400 (EDT) On Wed, 28 Mar 2007, Eric Dumazet wrote: Hello David, We could use nanosec resolution for various functions defined in drivers/char/random.c (secure_tcpv6_sequence_number(), secure_tcp_sequence_number(), secure_dccp_sequence_number()). I am not sure if it's a netdev-related patch or a core kernel one, so I have CC'd Andrew. Thank you [PATCH] NET : random functions can use nsec resolution instead of usec In order to get more randomness for the secure_tcpv6_sequence_number(), secure_tcp_sequence_number() and secure_dccp_sequence_number() functions, we can use the high resolution time services, providing nanosec resolution. I've also done two kmalloc()/kzalloc() conversions. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] Looks good to me. Acked-by: James Morris [EMAIL PROTECTED] To me too, patch applied, thanks everyone.
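Why finer resolution helps: two sequence numbers computed for the same connection tuple less than a microsecond apart get the same clock contribution at microsecond resolution but different ones at nanosecond resolution. This is a simplified sketch; the keyed hash of the tuple is taken as an input, and the exact scaling the patch applies to the nanosecond clock is not shown in this thread, so none is assumed here.

```c
#include <stdint.h>

/* Hypothetical ISN builders: tuple_hash is the keyed hash of the 4-tuple,
 * now_ns a wall-clock reading in nanoseconds. */
static uint32_t isn_usec(uint32_t tuple_hash, uint64_t now_ns)
{
    return tuple_hash + (uint32_t)(now_ns / 1000);   /* microsecond clock */
}

static uint32_t isn_nsec(uint32_t tuple_hash, uint64_t now_ns)
{
    return tuple_hash + (uint32_t)now_ns;            /* nanosecond clock */
}
```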
Re: [PATCH] Inline net_device_stats
From: Stephen Hemminger [EMAIL PROTECTED] Date: Wed, 28 Mar 2007 08:52:24 -0700 It would make sense to do it per-cpu and 64 bit for the non-error counters. Good point, but that's a separate change. I have no real objection to Rusty's change, and if more than one driver uses this thing it's useful, so I'll apply his patch.
Re: Xen netfront fixes for changed skbuff in net-2.6.22.git
Hi Jeremy: On Wed, Mar 28, 2007 at 11:36:17AM -0700, Jeremy Fitzhardinge wrote: I wonder if you've got a chance to look at netfront in light of the new stuff in davem's network tree (the stuff that's in http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.22.git). I've had a look at it now, and you can just stuff it into one of the other pointers that are still there. We just need to make sure that it is reset properly before we feed the packet into the stack. The pointer skb->dev is one candidate, but there are plenty of others. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [Xen-devel] Re: Xen netfront fixes for changed skbuff in net-2.6.22.git
Herbert Xu wrote: I've had a look at it now, and you can just stuff it into one of the other pointers that are still there. We just need to make sure that it is reset properly before we feed the packet into the stack. The pointer skb->dev is one candidate, but there are plenty of others. Hm, I was wondering if there's a nicer way of getting the same result. Does it need to be done that way? Thanks, J
netpoll question
Hey all, I have a netpoll question. How does netpoll work with MSI/X, NAPI, and NICs that set up multiple RSS-style receive queues for a single port? From what I can tell, if you're doing something like netdump using netpoll for IO, then you might never process incoming packets that get posted to the rx queues not associated with the main netdevice structure, because netpoll only calls the poll() function for the main netdev struct, not the dummy netdevs set up for multiple rx queues. Is this the case or am I confused? Thanks, Steve.
Re: [Xen-devel] Re: Xen netfront fixes for changed skbuff in net-2.6.22.git
On Wed, Mar 28, 2007 at 02:55:56PM -0700, Jeremy Fitzhardinge wrote: Hm, I was wondering if there's a nicer way of getting the same result. Does it need to be done that way? Actually you could use the skb->cb buffer, which can store anything and doesn't need to be cleaned up. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
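The suggestion of stashing per-packet private state in skb->cb can be sketched as follows. struct skb_sketch, the 48-byte size and the netfront_*_page helpers are assumptions for illustration. The helpers use memcpy so the sketch stays within standard C aliasing and alignment rules, whereas kernel drivers usually cast a private struct directly over cb.

```c
#include <string.h>

struct page {
    int dummy;          /* placeholder; only pointers to it are stored */
};

/* Minimal stand-in for struct sk_buff: only the control buffer matters.
 * The 48-byte size is assumed to match cb[] of this era. */
struct skb_sketch {
    char cb[48];
};

_Static_assert(sizeof(struct page *) <= sizeof(((struct skb_sketch *)0)->cb),
               "private state must fit in skb->cb");

/* Store the driver's page pointer in the skb's scratch area. */
static void netfront_set_page(struct skb_sketch *skb, struct page *pg)
{
    memcpy(skb->cb, &pg, sizeof(pg));
}

/* Retrieve it later; nothing needs resetting before the stack sees the skb,
 * because cb is scratch space owned by whichever layer holds the skb. */
static struct page *netfront_get_page(const struct skb_sketch *skb)
{
    struct page *pg;
    memcpy(&pg, skb->cb, sizeof(pg));
    return pg;
}
```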
[PATCH] sis190: new PHY support
Reported to work on the WinFast 761GXK8MB-RS motherboard. Plain 10/100 Mbps.

Signed-off-by: Paul Gibbons [EMAIL PROTECTED]
Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/sis190.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index b08508b..34463ce 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -324,6 +324,7 @@ static struct mii_chip_info {
 	u32 feature;
 } mii_chip_table[] = {
 	{ "Broadcom PHY BCM5461", { 0x0020, 0x60c0 }, LAN, F_PHY_BCM5461 },
+	{ "Broadcom PHY AC131",   { 0x0143, 0xbc70 }, LAN, 0 },
 	{ "Agere PHY ET1101B",    { 0x0282, 0xf010 }, LAN, 0 },
 	{ "Marvell PHY 88E",      { 0x0141, 0x0cc0 }, LAN, F_PHY_88E },
 	{ "Realtek PHY RTL8201",  { 0x, 0x8200 }, LAN, 0 },
-- 
1.5.0.5
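For context, a table like mii_chip_table is typically searched by comparing the two MII PHY identifier registers against each entry. This sketch is hypothetical: it matches id[0] exactly and ignores the low four revision bits of id[1] (an assumption about how sis190 compares), and it includes only three of the entries from the patch.

```c
#include <stddef.h>
#include <stdint.h>

struct mii_chip_sketch {
    const char *name;
    uint16_t id[2];   /* MII PHY identifier registers 2 and 3 */
};

static const struct mii_chip_sketch chips[] = {
    { "Broadcom PHY BCM5461", { 0x0020, 0x60c0 } },
    { "Broadcom PHY AC131",   { 0x0143, 0xbc70 } },
    { "Agere PHY ET1101B",    { 0x0282, 0xf010 } },
};

/* Return the matching entry, or NULL if the PHY is unknown. */
static const struct mii_chip_sketch *find_phy(uint16_t id0, uint16_t id1)
{
    for (size_t i = 0; i < sizeof(chips) / sizeof(chips[0]); i++) {
        if (chips[i].id[0] == id0 && chips[i].id[1] == (id1 & 0xfff0))
            return &chips[i];
    }
    return NULL;
}
```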
Re: netpoll question
Steve Wise wrote: Hey all, I have a netpoll question. How does netpoll work with MSI/X, NAPI, and NICs that set up multiple RSS-style receive queues for a single port? From what I can tell, if you're doing something like netdump using netpoll for IO, then you might never process incoming packets that get posted to the rx queues not associated with the main netdevice structure, because netpoll only calls the poll() function for the main netdev struct, not the dummy netdevs set up for multiple rx queues. Is this the case or am I confused? Thanks, Steve You are correct. Netpoll needs a bit of work, especially on the receive side, for multi-queue and some other possible problems related to taking locks when the system is frozen. If I get some time soon, I'm going to propose an overhaul to address some of these issues that show up in the kgdboe and netdump cases. Mark Huth
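The gap described above can be modelled in a few lines: if netpoll services only the main device's queue, packets steered by RSS to the other queues sit untouched. All names here are hypothetical; this only illustrates the difference between polling one queue and walking every queue within a budget.

```c
#include <stddef.h>

/* Hypothetical RX queue: just a count of packets waiting. */
struct rx_queue {
    int pending;
};

/* Process up to budget packets from one queue; returns packets handled. */
static int rx_poll(struct rx_queue *q, int budget)
{
    int done = q->pending < budget ? q->pending : budget;
    q->pending -= done;
    return done;
}

/* Polling only queue 0 (the "main" netdev) leaves the other queues alone. */
static int netpoll_main_only(struct rx_queue *qs, size_t nq, int budget)
{
    (void)nq;
    return rx_poll(&qs[0], budget);
}

/* Walking every queue also drains RSS-steered traffic. */
static int netpoll_all_queues(struct rx_queue *qs, size_t nq, int budget)
{
    int total = 0;
    for (size_t i = 0; i < nq && total < budget; i++)
        total += rx_poll(&qs[i], budget - total);
    return total;
}
```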
[PATCH]: Extract out DSACK detection logic
I'm about to reattempt the hacks to make tcp_sacktag_write_queue() use the RB tree lookups. In order to make the changes easier to review, I'm trying to clean up the function a little bit. This one pulls out the DSACK detection logic.

I'm starting to pepper get_unaligned() calls around the sack block accesses as I've been getting a few of these in my logs on sparc64 lately:

[68089.285478] Kernel unaligned access at TPC[60e3c4] tcp_sacktag_write_queue+0x40/0x86c

It's pretty easy to make it happen with NOP TCP options and stuff like that, and we have get_unaligned() calls for other TCP options already.

Pushed to net-2.6.22

commit d9367183d9d8fd1853e3bc4d0b1af077553e0e8a
Author: David S. Miller [EMAIL PROTECTED]
Date:   Wed Mar 28 16:27:47 2007 -0700

    [TCP]: Extract DSACK detection code from tcp_sacktag_write_queue().

    Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c855791..a5a8987 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -936,6 +936,39 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
  * Both of these heuristics are not used in Loss state, when we cannot
  * account for retransmits accurately.
  */
+static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
+			   struct tcp_sack_block_wire *sp, int num_sacks,
+			   u32 prior_snd_una)
+{
+	u32 start_seq_0 = ntohl(get_unaligned(&sp[0].start_seq));
+	u32 end_seq_0 = ntohl(get_unaligned(&sp[0].end_seq));
+	int dup_sack = 0;
+	if (before(start_seq_0, TCP_SKB_CB(ack_skb)->ack_seq)) {
+		dup_sack = 1;
+		tp->rx_opt.sack_ok |= 4;
+		NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
+	} else if (num_sacks > 1) {
+		u32 end_seq_1 = ntohl(get_unaligned(&sp[1].end_seq));
+		u32 start_seq_1 = ntohl(get_unaligned(&sp[1].start_seq));
+
+		if (!after(end_seq_0, end_seq_1) &&
+		    !before(start_seq_0, start_seq_1)) {
+			dup_sack = 1;
+			tp->rx_opt.sack_ok |= 4;
+			NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV);
+		}
+	}
+
+	/* D-SACK for already forgotten data...
+	 * Do dumb counting. */
+	if (dup_sack &&
+	    !after(end_seq_0, prior_snd_una) &&
+	    after(end_seq_0, tp->undo_marker))
+		tp->undo_retrans--;
+
+	return dup_sack;
+}
+
 static int
 tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 			u32 prior_snd_una, u32 *mark_lost_entry_seq)
@@ -963,25 +996,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 		*mark_lost_entry_seq = tp->highest_sack;
 	prior_fackets = tp->fackets_out;
 
-	/* Check for D-SACK. */
-	if (before(ntohl(sp[0].start_seq), TCP_SKB_CB(ack_skb)->ack_seq)) {
-		dup_sack = 1;
-		tp->rx_opt.sack_ok |= 4;
-		NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
-	} else if (num_sacks > 1 &&
-		   !after(ntohl(sp[0].end_seq), ntohl(sp[1].end_seq)) &&
-		   !before(ntohl(sp[0].start_seq), ntohl(sp[1].start_seq))) {
-		dup_sack = 1;
-		tp->rx_opt.sack_ok |= 4;
-		NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV);
-	}
-
-	/* D-SACK for already forgotten data...
-	 * Do dumb counting. */
-	if (dup_sack &&
-	    !after(ntohl(sp[0].end_seq), prior_snd_una) &&
-	    after(ntohl(sp[0].end_seq), tp->undo_marker))
-		tp->undo_retrans--;
+	dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, prior_snd_una);
 
 	/* Eliminate too old ACKs, but take into
 	 * account more or less fresh ones, they can
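A user-space rendition of the D-SACK test that tcp_check_dsack() performs, for trying the logic outside the kernel. load_be32() is a hypothetical stand-in for ntohl(get_unaligned(...)), using memcpy so it is safe on unaligned option bytes; the undo_retrans accounting and MIB counters are left out.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Wraparound-safe sequence comparisons, as in the kernel's tcp.h. */
static inline int before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}
#define after(seq2, seq1) before(seq1, seq2)

/* Read a big-endian 32-bit value from possibly unaligned option bytes. */
static uint32_t load_be32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return ntohl(v);
}

/* opt points at the first SACK block (start, end pairs, 8 bytes each).
 * Returns 1 if the first block is a D-SACK, else 0. */
static int check_dsack(const unsigned char *opt, int num_sacks, uint32_t ack_seq)
{
    uint32_t start0 = load_be32(opt);
    uint32_t end0 = load_be32(opt + 4);

    if (before(start0, ack_seq))
        return 1;                       /* block lies below the cumulative ACK */
    if (num_sacks > 1) {
        uint32_t start1 = load_be32(opt + 8);
        uint32_t end1 = load_be32(opt + 12);
        if (!after(end0, end1) && !before(start0, start1))
            return 1;                   /* first block nested inside the second */
    }
    return 0;
}
```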
[PATCH] add Attansic L2 PCI ID
From: Chris Snook [EMAIL PROTECTED]

Add PCI ID for the Attansic L2 100 Mb Ethernet adapter.

Signed-off-by: Chris Snook [EMAIL PROTECTED]

--- linux-2.6.21-rc5.orig/include/linux/pci_ids.h	2007-03-27 23:26:50.0 -0400
+++ linux-2.6.21-rc5/include/linux/pci_ids.h	2007-03-28 15:11:03.0 -0400
@@ -2090,6 +2090,7 @@
 #define PCI_VENDOR_ID_ATTANSIC		0x1969
 #define PCI_DEVICE_ID_ATTANSIC_L1	0x1048
+#define PCI_DEVICE_ID_ATTANSIC_L2	0x2048
 
 #define PCI_VENDOR_ID_JMICRON		0x197B
 #define PCI_DEVICE_ID_JMICRON_JMB360	0x2360
another critical bug ?
Tried on 2.6.21-rc5-git3, but with preempt enabled. Same panic, in the same place it seems:

Mar 29 02:50:53 LINUX [ 164.644102] BUG: unable to handle kernel paging request at virtual address 0302014c
Mar 29 02:50:53 LINUX [ 164.644242] printing eip:
Mar 29 02:50:53 LINUX [ 164.644301] *pde =
Mar 29 02:50:53 LINUX [ 164.644371] Oops: [#1]
Mar 29 02:50:53 LINUX [ 164.644485] PREEMPT SMP
Mar 29 02:50:53 LINUX [ 164.644629] Modules linked in: LIST OF MODULES
Mar 29 02:50:53 LINUX [ 164.648758] CPU: 0
Mar 29 02:50:53 LINUX [ 164.648760] EIP: 0060:[c0216e37] Not tainted VLI
Mar 29 02:50:53 LINUX [ 164.648762] EFLAGS: 00010206 (2.6.20.3-build-0002 #14)
Mar 29 02:50:53 LINUX [ 164.648948] EIP is at netif_rx+0x12/0x115
Mar 29 02:50:53 LINUX [ 164.649011] eax: c33fb000 ebx: 03020100 ecx: 0001 edx: c3380d80
Mar 29 02:50:53 LINUX [ 164.649078] esi: c4cfc000 edi: c3380d80 ebp: esp: c7fb7f74
Mar 29 02:50:53 LINUX [ 164.649144] ds: 007b es: 007b ss: 0068 preempt: 0001
Mar 29 02:50:53 LINUX [ 164.649210] Process softirq-tasklet (pid: 9, ti=c7fb6000 task=c7f9f000 task.ti=c7fb6000)
Mar 29 02:50:53 LINUX [ 164.649276] Stack: c4cfc400 c4cfc000 c8a1f268 c4cfc45c 000f4240 c011c5d8
Mar 29 02:50:53 LINUX [ 164.649754] c7fb7fac c026ba53 0006 c11c5c98 c11c5c98 0020 c011cadf c011cbc9
Mar 29 02:50:53 LINUX [ 164.650231] 0001 c7fb7fc0 0032 c11c5c98 c7fa1ef8 c0128757
Mar 29 02:50:53 LINUX [ 164.650707] Call Trace:
Mar 29 02:50:53 LINUX [ 164.650829] [c8a1f268] ri_tasklet+0xd5/0x1a1 [ifb]
Mar 29 02:50:53 LINUX [ 164.650954] [c011c5d8] __tasklet_action+0xe5/0x126
Mar 29 02:50:53 LINUX [ 164.651073] [c026ba53] schedule+0xe0/0xfa
Mar 29 02:50:53 LINUX [ 164.651201] [c011cadf] ksoftirqd+0x0/0x178
Mar 29 02:50:53 LINUX [ 164.651312] [c011cbc9] ksoftirqd+0xea/0x178
Mar 29 02:50:53 LINUX [ 164.651443] [c0128757] kthread+0xb2/0xdb
Mar 29 02:50:53 LINUX [ 164.651560] [c01286a5] kthread+0x0/0xdb
Mar 29 02:50:53 LINUX [ 164.651683] [c0103a5f] kernel_thread_helper+0x7/0x10
Mar 29 02:50:53 LINUX [ 164.651823] ===
Mar 29 02:50:53 LINUX [ 164.651890] Code: 00 eb 0c 89 d8 e8 7a 5e 05 00 e9 bb fe ff ff 83 c4 14 89 f0 5b 5e 5f 5d c3 57 89 c7 56 53 8b 40 14 8b 98 e4 02 00 00 85 db 74 32 7b 4c 00 75 06 83 7b 28 00 74 26 8d 73 2c 89 f0 e8 65 5e 05
Mar 29 02:50:53 LINUX [ 164.654978] EIP: [c0216e37] netif_rx+0x12/0x115 SS:ESP 0068:c7fb7f74
Mar 29 02:50:53 LINUX [ 164.655137] Kernel panic - not syncing: Fatal exception
Mar 29 02:50:53 LINUX [ 164.655280] [c0118387]
Mar 29
Re: [PATCH] Inline net_device_stats
On Wed, 2007-03-28 at 08:52 -0700, Stephen Hemminger wrote: It would make sense to do it per-cpu and 64 bit for the non-error counters. Well, I looked at the e1000: it doesn't update on every packet anyway, but seems to download the counters from the card occasionally. I assume this is the method for high-speed drivers; otherwise we should split the tx and rx parts of the structure. 64 bit introduces potential compatibility problems (exporting via proc). And per-cpu feels like overkill to me. Rusty.
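The per-cpu, 64-bit suggestion could look roughly like this: each CPU bumps only its own slot, and readers sum the slots. NR_CPUS_SKETCH and the explicit cpu argument are hypothetical. A real implementation would use the kernel's per-cpu allocator, pad slots to avoid false sharing, and deal with torn 64-bit reads on 32-bit machines.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4   /* hypothetical fixed CPU count */

/* 64-bit counters: a 32-bit byte counter wraps within seconds at 10G. */
struct pcpu_stats {
    uint64_t rx_packets;
    uint64_t rx_bytes;
};

static struct pcpu_stats stats[NR_CPUS_SKETCH];

/* Fast path: only the current CPU's slot is touched, so no lock is needed. */
static void count_rx(int cpu, unsigned int len)
{
    stats[cpu].rx_packets++;
    stats[cpu].rx_bytes += len;
}

/* Slow path (e.g. reporting via proc): sum every CPU's slot. */
static struct pcpu_stats sum_stats(void)
{
    struct pcpu_stats t = { 0, 0 };
    for (int i = 0; i < NR_CPUS_SKETCH; i++) {
        t.rx_packets += stats[i].rx_packets;
        t.rx_bytes += stats[i].rx_bytes;
    }
    return t;
}
```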
[PATCH]: Make sacktag logic splittable...
It's nearly impossible to break out tcp_sacktag_write_queue() into smaller worker functions because there are so many local variables that are live and updated throughout the inner loop and beyond. So create a state block so we can start simplifying this function properly.

Pushed to net-2.6.22

commit eb7723322ccc43f19714ac83395e5204fee0e5b8
Author: David S. Miller [EMAIL PROTECTED]
Date:   Wed Mar 28 17:17:19 2007 -0700

    [TCP]: Create tcp_sacktag_state.
    
    It is difficult to break out the inner-logic of
    tcp_sacktag_write_queue() into worker functions because
    so many local variables get updated in-place.
    
    Start to overcome this by creating a structure block
    of state variables that can be passed around into
    worker routines.
    
    Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a5a8987..464dc80 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -936,6 +936,15 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
  * Both of these heuristics are not used in Loss state, when we cannot
  * account for retransmits accurately.
  */
+struct tcp_sacktag_state {
+	unsigned int flag;
+	int dup_sack;
+	int reord;
+	int prior_fackets;
+	u32 lost_retrans;
+	int first_sack_index;
+};
+
 static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 			   struct tcp_sack_block_wire *sp, int num_sacks,
 			   u32 prior_snd_una)
@@ -980,23 +989,18 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 	struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
 	struct sk_buff *cached_skb;
 	int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
-	int reord = tp->packets_out;
-	int prior_fackets;
-	u32 lost_retrans = 0;
-	int flag = 0;
-	int dup_sack = 0;
+	struct tcp_sacktag_state state;
 	int cached_fack_count;
 	int i;
-	int first_sack_index;
+	int force_one_sack;
 
 	if (!tp->sacked_out) {
 		tp->fackets_out = 0;
 		tp->highest_sack = tp->snd_una;
 	} else
 		*mark_lost_entry_seq = tp->highest_sack;
-	prior_fackets = tp->fackets_out;
 
-	dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, prior_snd_una);
+	state.dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, prior_snd_una);
 
 	/* Eliminate too old ACKs, but take into
 	 * account more or less fresh ones, they can
@@ -1009,18 +1013,18 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 	 * if the only SACK change is the increase of the end_seq of
 	 * the first block then only apply that SACK block
 	 * and use retrans queue hinting otherwise slowpath */
-	flag = 1;
+	force_one_sack = 1;
 	for (i = 0; i < num_sacks; i++) {
 		__be32 start_seq = sp[i].start_seq;
 		__be32 end_seq = sp[i].end_seq;
 
 		if (i == 0) {
 			if (tp->recv_sack_cache[i].start_seq != start_seq)
-				flag = 0;
+				force_one_sack = 0;
 		} else {
 			if ((tp->recv_sack_cache[i].start_seq != start_seq) ||
 			    (tp->recv_sack_cache[i].end_seq != end_seq))
-				flag = 0;
+				force_one_sack = 0;
 		}
 		tp->recv_sack_cache[i].start_seq = start_seq;
 		tp->recv_sack_cache[i].end_seq = end_seq;
@@ -1031,8 +1035,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 		tp->recv_sack_cache[i].end_seq = 0;
 	}
 
-	first_sack_index = 0;
-	if (flag)
+	state.first_sack_index = 0;
+	if (force_one_sack)
 		num_sacks = 1;
 	else {
 		int j;
@@ -1050,17 +1054,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 					sp[j+1] = tmp;
 
 					/* Track where the first SACK block goes to */
-					if (j == first_sack_index)
-						first_sack_index = j+1;
+					if (j == state.first_sack_index)
+						state.first_sack_index = j+1;
 				}
 
 			}
 		}
 	}
 
-	/* clear flag as used for different purpose in following code */
-	flag = 0;
-
 	/* Use SACK fastpath hint if valid */
 	cached_skb = tp->fastpath_skb_hint;
 	cached_fack_count = tp->fastpath_cnt_hint;
@@ -1069,6 +1070,11 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb,
 		cached_fack_count = 0;
 	}
+
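The refactoring pattern the commit message describes (gather every variable the loop body updates in place into one struct, then hand the body a pointer to it) can be sketched in isolation. This is a toy with hypothetical names (`scan_state`, `scan_one`, `scan_all`); it mirrors only the shape of `tcp_sacktag_state`, not any TCP logic.

```c
#include <assert.h>

/* Hypothetical names throughout; only the shape mirrors the patch. */
struct scan_state {
	unsigned int flag;  /* bits accumulated across iterations */
	int reord;          /* lowest index where reordering was seen */
	int prior_max;      /* snapshot taken before the loop starts */
};

#define SCAN_FLAG_SEEN 0x1u

/* Worker: what used to be the body of the inner loop. Everything it
 * updates in place is reached through 'st', so no locals are shared. */
static void scan_one(struct scan_state *st, int idx, int value)
{
	st->flag |= SCAN_FLAG_SEEN;
	/* a value below the pre-loop snapshot counts as "reordered" */
	if (value < st->prior_max && idx < st->reord)
		st->reord = idx;
}

/* Caller: initializes the state block once, then loops. */
static int scan_all(const int *vals, int n, int prior_max)
{
	struct scan_state st = { .flag = 0, .reord = n, .prior_max = prior_max };

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		scan_one(&st, i, vals[i]);
	return st.reord;
}
```

Once the state lives in one struct, further pieces of the loop can be split into more workers without growing their argument lists.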
[PATCH] atl1: save mac address on remove
From: Chris Snook [EMAIL PROTECTED]

Some atl1 boards get their MAC address written directly to the register by the BIOS during POST, rather than storing it in EEPROM that's accessible to the driver. If the MAC register on one of these boards is changed and then the module is unloaded, the permanent MAC address will be forgotten until the box is rebooted. We should save the permanent address during removal if we've been messing with it.

Signed-off-by: Chris Snook [EMAIL PROTECTED]

--- a/drivers/net/atl1/atl1_main.c	2007-03-01 14:14:48.0 -0500
+++ b/drivers/net/atl1/atl1_main.c	2007-03-01 16:59:59.0 -0500
@@ -2321,6 +2321,16 @@ static void __devexit atl1_remove(struct
 		return;
 
 	adapter = netdev_priv(netdev);
+
+	/* Some atl1 boards lack persistent storage for their MAC, and get it
+	 * from the BIOS during POST.  If we've been messing with the MAC
+	 * address, we need to save the permanent one.
+	 */
+	if (memcmp(adapter->hw.mac_addr, adapter->hw.perm_mac_addr, ETH_ALEN)) {
+		memcpy(adapter->hw.mac_addr, adapter->hw.perm_mac_addr, ETH_ALEN);
+		atl1_set_mac_addr(&adapter->hw);
+	}
+
 	iowrite16(0, adapter->hw.hw_addr + REG_GPHY_ENABLE);
 	unregister_netdev(netdev);
 	pci_iounmap(pdev, adapter->hw.hw_addr);
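The restore-on-remove logic can be exercised outside the driver with a minimal user-space sketch. `struct fake_hw` and `fake_set_mac_addr()` are hypothetical stand-ins for `struct atl1_hw` and `atl1_set_mac_addr()`; the register is simulated by a byte array plus a write counter.

```c
#include <assert.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for the relevant part of struct atl1_hw. */
struct fake_hw {
	unsigned char mac_addr[ETH_ALEN];      /* address currently in use */
	unsigned char perm_mac_addr[ETH_ALEN]; /* address read at probe time */
	unsigned char mac_reg[ETH_ALEN];       /* simulated MAC register */
	int reg_writes;                        /* number of register writes */
};

/* Stand-in for atl1_set_mac_addr(): program mac_addr into the register. */
static void fake_set_mac_addr(struct fake_hw *hw)
{
	memcpy(hw->mac_reg, hw->mac_addr, ETH_ALEN);
	hw->reg_writes++;
}

/* Same check as the patch: only rewrite the register if the address
 * was changed after probe. */
static void fake_restore_perm_mac(struct fake_hw *hw)
{
	if (memcmp(hw->mac_addr, hw->perm_mac_addr, ETH_ALEN)) {
		memcpy(hw->mac_addr, hw->perm_mac_addr, ETH_ALEN);
		fake_set_mac_addr(hw);
	}
}
```

Calling the restore twice writes the register only once, since the second call finds the addresses already equal.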
Re: netpoll question
On Wed, 2007-03-28 at 16:28 -0700, Mark Huth wrote:
> Steve Wise wrote:
> > Hey all,
> >
> > I have a netpoll question. How does netpoll work with MSI/X, NAPI, and NICs that set up multiple RSS-style receive queues for a single port? From what I can tell, if you're doing something like netdump using netpoll for IO, then you might never process incoming packets that get posted to the rx queues not associated with the main netdevice structure, because netpoll only calls the poll() function for the main netdev struct, not the dummy netdevs set up for multiple rx queues. Is this the case or am I confused?
> >
> > Thanks,
> > Steve
>
> You are correct. Netpoll needs a bit of work, especially on the receive side, for multi-queue and some other possible problems related to taking locks when the system is frozen. If I get some time soon, I'm going to propose an overhaul to address some of these issues that show up in the kgdboe and netdump cases.
>
> Mark Huth

Hey Mark,

What are your thoughts on how to implement this? Scrub every softnet_data queue->poll_list for every cpu? Or perhaps it's better to have a new function ptr off the netdev that says "poll all rx queues"?

Steve.
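Steve's second option (a per-device hook that polls every rx queue) can be modelled in user space. In this sketch `sim_netdev`, `poll_all_rx` and the rest are hypothetical names, not existing kernel fields; the fallback path reproduces the behaviour described in the thread, where only the primary queue is ever polled.

```c
#include <assert.h>

#define SIM_MAX_RXQ 4

/* Hypothetical receive queue: 'pending' stands in for descriptors the
 * NIC has posted but software has not processed yet. */
struct sim_rxq {
	int pending;
};

/* Hypothetical device with several RSS-style receive queues. */
struct sim_netdev {
	int num_rxq;
	struct sim_rxq rxq[SIM_MAX_RXQ];
	/* proposed hook: poll every rx queue, not just the primary one */
	int (*poll_all_rx)(struct sim_netdev *dev, int budget);
};

/* Process up to 'budget' packets from one queue. */
static int sim_poll_queue(struct sim_rxq *q, int budget)
{
	int done = q->pending < budget ? q->pending : budget;
	q->pending -= done;
	return done;
}

/* One possible poll_all_rx implementation: walk each queue in turn
 * until the budget is used up. */
static int sim_poll_all_rx(struct sim_netdev *dev, int budget)
{
	int done = 0;
	for (int i = 0; i < dev->num_rxq && done < budget; i++)
		done += sim_poll_queue(&dev->rxq[i], budget - done);
	return done;
}

/* netpoll-style caller: use the all-queues hook when the driver provides
 * one, otherwise poll only queue 0 (the behaviour described above). */
static int sim_netpoll_poll(struct sim_netdev *dev, int budget)
{
	if (dev->poll_all_rx)
		return dev->poll_all_rx(dev, budget);
	return sim_poll_queue(&dev->rxq[0], budget);
}
```

A hook keeps netpoll ignorant of how a driver lays out its queues, whereas scrubbing every CPU's poll_list would make netpoll depend on scheduler state that may be inconsistent when the system is frozen.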
[git patches] net driver fixes
Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-linus

to receive the following updates:

 drivers/net/atl1/atl1_hw.c                   |    1 -
 drivers/net/forcedeth.c                      |    8 ++-
 drivers/net/mv643xx_eth.c                    |    4 +-
 drivers/net/myri10ge/myri10ge.c              |    7 +-
 drivers/net/qla3xxx.c                        |  110 +++--
 drivers/net/qla3xxx.h                        |    3 +-
 drivers/net/sun3lance.c                      |   16 -
 drivers/net/wireless/bcm43xx/bcm43xx_phy.c   |    4 +-
 drivers/net/wireless/bcm43xx/bcm43xx_radio.c |   12 ++--
 fs/compat_ioctl.c                            |    9 ++
 include/linux/wireless.h                     |   21 -
 include/net/iw_handler.h                     |   30 +--
 net/core/rtnetlink.c                         |    3 +-
 net/core/wireless.c                          |   82
 14 files changed, 182 insertions(+), 128 deletions(-)

Ayaz Abdulla (2):
      forcedeth: fix nic poll
      forcedeth: fix tx timeout

Brice Goglin (1):
      myri10ge: correctly detect when TSO should be used

Cyrill V. Gorcunov (1):
      SUN3/3X Lance trivial fix improved

David Woodhouse (1):
      bcm43xx: Fix machine check on PPC for version 1 PHY

Gabriel Paubert (1):
      mv643xx_eth: Fix use of uninitialized port_num field

Jay Cliburn (1):
      atl1: remove unnecessary crc inversion

Jean Tourrilhes (2):
      wext: Add missing ioctls to 64-32 conversion
      WE-22 : prevent information leak on 64 bit

Larry Finger (1):
      bcm43xx: Fix code for confusion between PHY revision and PHY version

Ron Mercer (4):
      qla3xxx: bugfix: Add tx control block memset.
      qla3xxx: bugfix: Multi segment sends were getting whacked.
      qla3xxx: bugfix: Dropping interrupt under heavy network load.
      qla3xxx: bugfix: Jumbo frame handling.

Stefano Brivio (1):
      bcm43xx: fix radio_set_tx_iq

diff --git a/drivers/net/atl1/atl1_hw.c b/drivers/net/atl1/atl1_hw.c
index 314dbaa..69482e0 100644
--- a/drivers/net/atl1/atl1_hw.c
+++ b/drivers/net/atl1/atl1_hw.c
@@ -334,7 +334,6 @@ u32 atl1_hash_mc_addr(struct atl1_hw *hw, u8 *mc_addr)
 	int i;
 
 	crc32 = ether_crc_le(6, mc_addr);
-	crc32 = ~crc32;
 	for (i = 0; i < 32; i++)
 		value |= (((crc32 >> i) & 1) << (31 - i));
 
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 46e1697..d04214e 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -2050,9 +2050,10 @@ static void nv_tx_timeout(struct net_device *dev)
 		nv_drain_tx(dev);
 		nv_init_tx(dev);
 		setup_hw_rings(dev, NV_SETUP_TX_RING);
-		netif_wake_queue(dev);
 	}
 
+	netif_wake_queue(dev);
+
 	/* 4) restart tx engine */
 	nv_start_tx(dev);
 	spin_unlock_irq(&np->lock);
@@ -3536,7 +3537,10 @@ static void nv_do_nic_poll(unsigned long data)
 	pci_push(base);
 
 	if (!using_multi_irqs(dev)) {
-		nv_nic_irq(0, dev);
+		if (np->desc_ver == DESC_VER_3)
+			nv_nic_irq_optimized(0, dev);
+		else
+			nv_nic_irq(0, dev);
 		if (np->msi_flags & NV_MSI_X_ENABLED)
 			enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
 		else
diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index c9f55bc..8015a7c 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1379,7 +1379,7 @@ static int mv643xx_eth_probe(struct platform_device *pdev)
 
 	spin_lock_init(&mp->lock);
 
-	port_num = pd->port_number;
+	port_num = mp->port_num = pd->port_number;
 
 	/* set default config values */
 	eth_port_uc_addr_get(dev, dev->dev_addr);
@@ -1411,8 +1411,6 @@ static int mv643xx_eth_probe(struct platform_device *pdev)
 	duplex = pd->duplex;
 	speed = pd->speed;
 
-	mp->port_num = port_num;
-
 	/* Hook up MII support for ethtool */
 	mp->mii.dev = dev;
 	mp->mii.mdio_read = mv643xx_mdio_read;
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index b05b20e..c216e6a 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -71,7 +71,7 @@
 #include "myri10ge_mcp.h"
 #include "myri10ge_mcp_gen_header.h"
 
-#define MYRI10GE_VERSION_STR "1.3.0-1.226"
+#define MYRI10GE_VERSION_STR "1.3.0-1.227"
 
 MODULE_DESCRIPTION("Myricom 10G driver (10GbE)");
 MODULE_AUTHOR("Maintainer: [EMAIL PROTECTED]");
@@ -2015,10 +2015,9 @@ again:
 	mss = 0;
 	max_segments = MXGEFW_MAX_SEND_DESC;
 
-	if (skb->len > (dev->mtu + ETH_HLEN)) {
+	if (skb_is_gso(skb)) {
 		mss = skb_shinfo(skb)->gso_size;
-		if (mss != 0)
-			max_segments = MYRI10GE_MAX_SEND_DESC_TSO;
+		max_segments =
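The atl1 hunk in this pull removes the `crc32 = ~crc32;` step, so the multicast hash becomes the bit-reversed output of `ether_crc_le()` taken as-is. A minimal user-space sketch of what remains: `sim_ether_crc_le()` is a bitwise reimplementation that assumes the kernel convention (reflected CRC-32, polynomial 0xEDB88320, seed ~0, no final inversion), and the reversal loop is the one from `atl1_hash_mc_addr()`. All `sim_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise little-endian (reflected) CRC-32, polynomial 0xEDB88320,
 * seeded with ~0 and with no final inversion. */
static uint32_t sim_ether_crc_le(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
	}
	return crc;
}

/* The loop from atl1_hash_mc_addr(): bit i of crc32 becomes
 * bit (31 - i) of the result. */
static uint32_t sim_reverse_bits(uint32_t crc32)
{
	uint32_t value = 0;
	for (int i = 0; i < 32; i++)
		value |= ((crc32 >> i) & 1u) << (31 - i);
	return value;
}

/* Hash of a 6-byte multicast address after the fix: no ~crc32 step. */
static uint32_t sim_hash_mc_addr(const uint8_t mc_addr[6])
{
	return sim_reverse_bits(sim_ether_crc_le(mc_addr, 6));
}
```

As a sanity check, the standard CRC-32 check value for "123456789" is 0xCBF43926 with the final inversion, so without it this function should yield its complement, 0x340BC6D9.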
Re: IPv6: Connection reset/timeout under heavy load
YOSHIFUJI Hideaki / 吉藤英明 wrote:
> Would you test with latest kernel, if possible, please?

For the archive: switching to 2.6.20.4 fixed this problem.

Thanks!
Agoston
Re: IPv6: Connection reset/timeout under heavy load
In article [EMAIL PROTECTED] (at Wed, 28 Mar 2007 10:48:27 +0200), Agoston Horvath [EMAIL PROTECTED] says:

> YOSHIFUJI Hideaki / 吉藤英明 wrote:
> > Would you test with latest kernel, if possible, please?
>
> For the archive: switching to 2.6.20.4 fixed this problem.

Thank you for your report. I guess the following change will fix the issue for 2.6.16.y:
http://www.linux-ipv6.org/gitweb/gitweb.cgi?p=gitroot/yoshfuji/linux-2.6-fix.git;a=commit;h=33a79bba0cc2f197b46cc54182f94c31ff6ad334

I hope this patch will go in 2.6.16-stable...

Regards,

--yoshfuji
Re: IPv6: Connection reset/timeout under heavy load
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
Date: Wed, 28 Mar 2007 18:23:34 +0900 (JST)

> In article [EMAIL PROTECTED] (at Wed, 28 Mar 2007 10:48:27 +0200), Agoston Horvath [EMAIL PROTECTED] says:
>
> > YOSHIFUJI Hideaki / 吉藤英明 wrote:
> > > Would you test with latest kernel, if possible, please?
> >
> > For the archive: switching to 2.6.20.4 fixed this problem.
>
> Thank you for your report. I guess the following change will fix the issue for 2.6.16.y:
> http://www.linux-ipv6.org/gitweb/gitweb.cgi?p=gitroot/yoshfuji/linux-2.6-fix.git;a=commit;h=33a79bba0cc2f197b46cc54182f94c31ff6ad334
>
> I hope this patch will go in 2.6.16-stable...

Please forward this patch to Adrian Bunk ([EMAIL PROTECTED]); he will definitely add it to 2.6.16-stable for you.
Re: IPv6: Connection reset/timeout under heavy load
In article [EMAIL PROTECTED] (at Wed, 28 Mar 2007 02:26:24 -0700 (PDT)), David Miller [EMAIL PROTECTED] says:

> > http://www.linux-ipv6.org/gitweb/gitweb.cgi?p=gitroot/yoshfuji/linux-2.6-fix.git;a=commit;h=33a79bba0cc2f197b46cc54182f94c31ff6ad334
> >
> > I hope this patch will go in 2.6.16-stable...
>
> Please forward this patch to Adrian Bunk ([EMAIL PROTECTED]), he will definitely add it to 2.6.16-stable for you.

I will do it again...

--yoshfuji
Re: [1/5] 2.6.21-rc5: known regressions
Adrian Bunk wrote:
> Subject    : e1000 resume weirdness
> References : http://lkml.org/lkml/2007/3/26/91
> Submitter  : Ingo Molnar [EMAIL PROTECTED]
> Handled-By : Jesse Brandeburg [EMAIL PROTECTED]
>              Auke Kok [EMAIL PROTECTED]
> Status     : problem is being debugged

The issue comes from a corner case, and the underlying problem is that e1000 isn't stopping tx properly. We have a fix for this pending in our tree that I'll push upstream to Jeff for 2.6.22, but I don't think this should be a blocker, and it's probably not a regression at all: the gap has always been present.

On a side note, this is probably fixed easily by turning the adapter's detect_tx_hung flag off in e1000_down, so if someone spots this recurring somewhat regularly, please contact me so we can debug it. I myself have had a system suspending/resuming in circles for an hour now, with traffic flying across, without a single hit on it.

Adrian, you probably want to drop this issue from your list.

Cheers,

Auke
Re: [1/5] 2.6.21-rc5: known regressions
* Kok, Auke [EMAIL PROTECTED] wrote:

> Adrian Bunk wrote:
> > Subject    : e1000 resume weirdness
> > References : http://lkml.org/lkml/2007/3/26/91
> > Submitter  : Ingo Molnar [EMAIL PROTECTED]
> > Handled-By : Jesse Brandeburg [EMAIL PROTECTED]
> >              Auke Kok [EMAIL PROTECTED]
> > Status     : problem is being debugged
>
> Adrian, you probably want to drop this issue from your list.

agreed - i have done many suspend/resumes meanwhile, and this condition has not recurred since then. (and even when it occurred, it was transitionary)

	Ingo
Re: IP1000A: About IC Plus IP1000A Linux driver current status
Jesse [EMAIL PROTECTED] :
[...]
> The latest version had been modified by you. Would you kindly do this for me?

I have just screwdriven a test machine, but I will not finish a build/test cycle today.

-- 
Ueimor

Anybody got a battery for my Ultra 10 ?
Re: [2.6 patch] the scheduled eepro100 removal
Adrian Bunk wrote:
> This patch contains the scheduled removal of the eepro100 driver.
>
> Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

This keeps coming around, but I haven't seen an answer to the questions raised by Eric Piel or Kiszka. I do know that e100 didn't work on some IBM rackmount servers and eepro100 did, but since I'm no longer responsible for those machines I can't retest. Perhaps someone will be able to provide data points. They were IBM's current offerings as of about three years ago; I had a few dozen of them at one time.

-- 
Bill Davidsen [EMAIL PROTECTED]
  "We have more to fear from the bungling of the incompetent than from the machinations of the wicked."  - from Slashdot
Re: [2.6 patch] the scheduled eepro100 removal
Bill Davidsen wrote:
> Adrian Bunk wrote:
> > This patch contains the scheduled removal of the eepro100 driver.
> >
> > Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
>
> This keeps coming around, but I haven't seen an answer to the questions raised by Eric Piel or Kiszka. I do know that e100 didn't work on some IBM rackmount servers and eepro100 did, but since I'm no longer responsible for those machines I can't retest. Perhaps someone will be able to provide data points. IBM current offerings as of about three years ago, I had a few dozen of them at one time.

We have provided a (test) driver which allows e100 to use IO to communicate with the device, and that seems to have helped for one person. I think we need to work with those changes and see if they help the other people resolve their e100 issues. Unfortunately it keeps slipping to the low-priority list for us.

I suggest that we push this code into -mm for people to test. It's fairly low risk, as by default the patch won't enable IO and will thus use the old method of writing to the adapter.

Auke
Re: [RFT] e100 driver on ARM
Lennert Buytenhek wrote:
> On Mon, Sep 04, 2006 at 06:39:29AM -0400, Jeff Garzik wrote:
> > 1) Does e100 driver work on ARM?
>
> FWIW, e100 seems to work okay for me on an intel ixp2400 (xscale based) board, an ixp2850 (xscale based) board and an ixp2350 (xscale3 based) board. ixp2350 works both with hardware coherency turned on (cpu snoops bus) and turned off (manual dma cache clean/invalidate as usual.)
>
> As for the other ARM platforms that I'm interested in / have hardware for / maintain, the at91/ep93xx/pxa270 don't have PCI, and the other two (iop32x/iop33x) I can't test because I don't have such systems with e100 NICs, but I expect those would work, since they're both xscale based like the ixp2400, and the ixp2400 works.

I just got an iop342 board dropped on my lap. Once it's running, I'll make this the first thing I test.

Cheers,

Auke
[PATCH]: Pull out core sack tagging logic
Ok, this is what I was initially trying to do: pull out the inner-most loop main code into a helper function.

Pushed to net-2.6.22

commit b096b50b4bf3c923bee28751d1ed41e92361a298
Author: David S. Miller [EMAIL PROTECTED]
Date:   Wed Mar 28 19:35:51 2007 -0700

    [TCP]: Create tcp_sacktag_one().
    
    Worker function that implements the main logic of
    the inner-most loop of tcp_sacktag_write_queue().
    
    Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 464dc80..97b9be2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -978,6 +978,115 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 	return dup_sack;
 }
 
+static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
+			    struct tcp_sacktag_state *state, int in_sack,
+			    int fack_count, u32 end_seq)
+{
+	u8 sacked = TCP_SKB_CB(skb)->sacked;
+
+	/* Account D-SACK for retransmitted packet. */
+	if ((state->dup_sack && in_sack) &&
+	    (sacked & TCPCB_RETRANS) &&
+	    after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker))
+		tp->undo_retrans--;
+
+	/* The frame is ACKed. */
+	if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una)) {
+		if (sacked & TCPCB_RETRANS) {
+			if ((state->dup_sack && in_sack) &&
+			    (sacked & TCPCB_SACKED_ACKED))
+				state->reord = min(fack_count, state->reord);
+		} else {
+			/* If it was in a hole, we detected reordering. */
+			if (fack_count < state->prior_fackets &&
+			    !(sacked & TCPCB_SACKED_ACKED))
+				state->reord = min(fack_count, state->reord);
+		}
+
+		/* Nothing to do; acked frame is about to be dropped. */
+		return;
+	}
+
+	if ((sacked & TCPCB_SACKED_RETRANS) &&
+	    after(end_seq, TCP_SKB_CB(skb)->ack_seq) &&
+	    (!state->lost_retrans || after(end_seq, state->lost_retrans)))
+		state->lost_retrans = end_seq;
+
+	if (!in_sack)
+		return;
+
+	if (!(sacked & TCPCB_SACKED_ACKED)) {
+		if (sacked & TCPCB_SACKED_RETRANS) {
+			/* If the segment is not tagged as lost,
+			 * we do not clear RETRANS, believing
+			 * that retransmission is still in flight.
+			 */
+			if (sacked & TCPCB_LOST) {
+				TCP_SKB_CB(skb)->sacked &=
+					~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
+				tp->lost_out -= tcp_skb_pcount(skb);
+				tp->retrans_out -= tcp_skb_pcount(skb);
+
+				/* clear lost hint */
+				tp->retransmit_skb_hint = NULL;
+			}
+		} else {
+			/* New sack for not retransmitted frame,
+			 * which was in hole. It is reordering.
+			 */
+			if (!(sacked & TCPCB_RETRANS) &&
+			    fack_count < state->prior_fackets)
+				state->reord = min(fack_count, state->reord);
+
+			if (sacked & TCPCB_LOST) {
+				TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
+				tp->lost_out -= tcp_skb_pcount(skb);
+
+				/* clear lost hint */
+				tp->retransmit_skb_hint = NULL;
+			}
+			/* SACK enhanced F-RTO detection.
+			 * Set flag if and only if non-rexmitted
+			 * segments below frto_highmark are
+			 * SACKed (RFC4138; Appendix B).
+			 * Clearing correct due to in-order walk
+			 */
+			if (after(end_seq, tp->frto_highmark)) {
+				state->flag &= ~FLAG_ONLY_ORIG_SACKED;
+			} else {
+				if (!(sacked & TCPCB_RETRANS))
+					state->flag |= FLAG_ONLY_ORIG_SACKED;
+			}
+		}
+
+		TCP_SKB_CB(skb)->sacked |= TCPCB_SACKED_ACKED;
+		state->flag |= FLAG_DATA_SACKED;
+		tp->sacked_out += tcp_skb_pcount(skb);
+
+		if (fack_count > tp->fackets_out)
+			tp->fackets_out = fack_count;
+
+		if (after(TCP_SKB_CB(skb)->seq,
+			  tp->highest_sack))
+			tp->highest_sack = TCP_SKB_CB(skb)->seq;
+	} else {
+		if (state->dup_sack && (sacked&TCPCB_RETRANS))
+			state->reord = min(fack_count, state->reord);
+	}
+
+
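The tag transitions that tcp_sacktag_one() applies to a newly SACKed segment can be isolated in a small user-space sketch. The `SIM_*` bit values are arbitrary stand-ins chosen for the sketch, not the kernel's `TCPCB_*` values, and it deliberately leaves out reordering detection, D-SACK, F-RTO and hint handling.

```c
#include <assert.h>

/* Illustrative bit values for this sketch only. */
#define SIM_SACKED_ACKED   0x01
#define SIM_SACKED_RETRANS 0x02
#define SIM_LOST           0x04

struct sim_counters {
	int sacked_out;
	int lost_out;
	int retrans_out;
};

/* Apply a new SACK to one segment covering 'pcount' packets and
 * return its updated tag bits. */
static unsigned char sim_sack_segment(unsigned char sacked, int pcount,
				      struct sim_counters *c)
{
	if (sacked & SIM_SACKED_ACKED)
		return sacked;  /* already SACKed: nothing new to count */

	if (sacked & SIM_SACKED_RETRANS) {
		/* retransmitted: RETRANS is cleared only if it was marked lost,
		 * otherwise the retransmission is assumed still in flight */
		if (sacked & SIM_LOST) {
			sacked &= ~(SIM_LOST | SIM_SACKED_RETRANS);
			c->lost_out -= pcount;
			c->retrans_out -= pcount;
		}
	} else if (sacked & SIM_LOST) {
		/* never retransmitted but marked lost: it arrived after all */
		sacked &= ~SIM_LOST;
		c->lost_out -= pcount;
	}

	sacked |= SIM_SACKED_ACKED;
	c->sacked_out += pcount;
	return sacked;
}
```

A segment that was retransmitted but never marked lost keeps its RETRANS bit while still being counted in sacked_out, matching the comment in the patch.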
Re: [2.6 patch] the scheduled eepro100 removal
On 3/28/07, Jeff Garzik [EMAIL PROTECTED] wrote:
> Kok, Auke wrote:
> Sounds sane to me.
>
> My overall opinion on eepro100 removal is that we're not there yet. Rare problem cases remain where e100 fails but eepro100 works, and it's older drivers so it's low priority for everybody. Needs to happen, though...

It seems that several Tyan Opteron-based systems use an IPMI add-on card that shares the onboard Intel 100 Mbit NIC. On those you need to use eepro100 instead of e100; otherwise e100 will shut down the OOB (out-of-band) IPMI connection when the OS is shut down.

YH
Re: [RFT] e100 driver on ARM
Kok, Auke wrote:
> Lennert Buytenhek wrote:
> > On Mon, Sep 04, 2006 at 06:39:29AM -0400, Jeff Garzik wrote:
> > > 1) Does e100 driver work on ARM?
> >
> > FWIW, e100 seems to work okay for me on an intel ixp2400 (xscale based) board, an ixp2850 (xscale based) board and an ixp2350 (xscale3 based) board. ixp2350 works both with hardware coherency turned on (cpu snoops bus) and turned off (manual dma cache clean/invalidate as usual.)
> >
> > As for the other ARM platforms that I'm interested in / have hardware for / maintain, the at91/ep93xx/pxa270 don't have PCI, and the other two (iop32x/iop33x) I can't test because I don't have such systems with e100 NICs, but I expect those would work, since they're both xscale based like the ixp2400, and the ixp2400 works.
>
> I just got an iop342 board dropped on my lap. Once it's running, I'll make sure to make this the first thing to test.

I have a pxa255-based system with PCI added to it. The e100 would have memory corruption in its receive buffers, detected by slab debugging, unless I put in the patch to use the S-bit. Here is a link to the patch posting:
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc3/2.6.20-rc3-mm1/broken-out/git-netdev-all.patch
Search for e100.c.

http://www-gatago.com/linux/kernel/15457063.html - This discussion seems to hit the issue.

There appears to be a race on the cache line where the EL bit and the next packet info live. In my case the hardware appeared to write to a free packet. The S-bit seems to make the hardware stop and spin on the bit, while the EL bit seems to let the hardware try to use that packet. This race would occur less often when the receive buffer chain is always refilled before the hardware can use the buffers up. On our 400 MHz XScale, we can use up all 256 buffers if the PCI bus has another busy device on it. In our case it is an 802.11g miniPCI card, and our software was routing all ethernet packets to the wireless interface and vice versa while TCP streams were running across these connections.

-Ack