Re: [PATCH 2/7] CAN: Add PF_CAN core module
On Thu, 2007-11-15 at 08:40 +0100, Oliver Hartkopp wrote:
Stephen Hemminger wrote:

+#ifdef CONFIG_CAN_DEBUG_CORE
+extern void can_debug_skb(struct sk_buff *skb);
+extern void can_debug_cframe(const char *msg, struct can_frame *cframe);
+#define DBG(fmt, args...)  (DBG_VAR & 1 ? printk( \
+					KERN_DEBUG DBG_PREFIX ": %s: " fmt, \
+					__func__, ##args) : 0)
+#define DBG_FRAME(fmt, cf) (DBG_VAR & 2 ? can_debug_cframe(fmt, cf) : 0)
+#define DBG_SKB(skb)       (DBG_VAR & 4 ? can_debug_skb(skb) : 0)
+#else
+#define DBG(fmt, args...)
+#define DBG_FRAME(fmt, cf)
+#define DBG_SKB(skb)
+#endif

I would prefer the more frequently used macro style:

#define DBG(fmt, args...) \
	do { if (DBG_VAR & 1) printk(KERN_DEBUG DBG_PREFIX ": %s: " fmt, \
				     __func__, ##args); } while (0)
#define DBG_FRAME(fmt, cf) \
	do { if (DBG_VAR & 2) can_debug_cframe(fmt, cf); } while (0)
#define DBG_SKB(skb) \
	do { if (DBG_VAR & 4) can_debug_skb(skb); } while (0)

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
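Editor's note: the do { ... } while (0) form requested above can be tried outside the kernel. The following is a minimal user-space sketch, not the CAN code itself: demo_debug, demo_calls and report() are hypothetical names, printf() stands in for printk(), and the `args...` / `##args` syntax is a GNU C extension, as in the patch.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the patch's DBG_VAR and DBG_PREFIX. */
static int demo_debug = 1;
static int demo_calls;		/* counts how often the macro body ran */

#define DBG_VAR    demo_debug
#define DBG_PREFIX "demo"

/* Wrapped in do { } while (0) so the macro expands to one statement. */
#define DBG(fmt, args...) \
	do { \
		if (DBG_VAR & 1) { \
			demo_calls++; \
			printf(DBG_PREFIX ": %s: " fmt, __func__, ##args); \
		} \
	} while (0)

/* An unbraced if/else around the macro keeps its intended pairing. */
static int report(int value)
{
	if (value < 0)
		DBG("negative value %d\n", value);
	else
		DBG("value %d\n", value);
	return demo_calls;
}
```

Because the expansion is a single statement that still needs its trailing semicolon, the else in report() binds to the outer if rather than to the if hidden inside the macro.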
Re: [PATCH] NET : rt_check_expire() can take a long time, add a cond_resched()
Andi Kleen wrote:
Eric Dumazet [EMAIL PROTECTED] writes:

Using a if (need_resched()) test before calling cond_resched(); is necessary to avoid spending too much time doing the resched check.

The only difference between cond_resched() and if (need_resched()) cond_resched() is one function call less and one might_sleep less. If the might_sleep or the function call are really problems (did you measure it? -- i doubt it somewhat) then it would be better to fix the generic code to either inline that or supply a __cond_resched() without might_sleep.

Please note that:

	if (need_resched())
		cond_resched();

will re-test need_resched() once cond_resched() is called. So it may sound unnecessary, but in the rt_check_expire() case, with a loop potentially doing XXX.XXX iterations, being able to bypass the function call is a clear win (in my bench case, 25 ms instead of 88 ms). Impact on I-cache is irrelevant here as this rt_check_expires() runs once every 60 sec.

I think the current cond_resched() is fine for other uses in the kernel that are not in a loop: in the general case, kernel text size should be as small as possible to reduce I-cache pressure, so a function call is better than an inline.

A cheaper change might have been to just limit the number of buckets scanned.

Well, not in some particular cases, for example when there are 3 million routes in the cache. We really want to scan/free them eventually :)

An admin already has the possibility to tune /proc/sys/net/ipv4/route/gc_interval and /proc/sys/net/ipv4/route/gc_timeout, so on a big cache, they will probably set gc_interval to 1 instead of 60.

The next step will be to move the ip route flush cache and rt_secret_rebuild() handling from softirq to process context too, since this can still kill a machine.
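Editor's note: the pattern argued for above, a cheap inline test guarding an out-of-line call inside a hot loop, can be sketched in user space. This is an analogy only: demo_need_resched(), demo_cond_resched() and demo_scan() are hypothetical stand-ins, not the scheduler API, and no timing claim is made for it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-task "reschedule needed" flag. */
static volatile bool demo_resched_pending;
/* Counts how many times the out-of-line helper was actually entered. */
static unsigned long demo_slow_calls;

/* Cheap inline test, playing the role of need_resched(). */
static inline bool demo_need_resched(void)
{
	return demo_resched_pending;
}

/* Out-of-line helper, playing the role of cond_resched(): it re-tests
 * the flag itself, so the caller's test is only a shortcut. */
static int demo_cond_resched(void)
{
	demo_slow_calls++;
	if (!demo_resched_pending)
		return 0;
	demo_resched_pending = false;	/* pretend we yielded the CPU */
	return 1;
}

/* Loop in the style of rt_check_expire(): the helper is entered only on
 * iterations where the inline test says a reschedule is pending. */
static unsigned long demo_scan(unsigned long iterations,
			       unsigned long request_every)
{
	unsigned long i;

	for (i = 0; i < iterations; i++) {
		if (request_every && i % request_every == 0)
			demo_resched_pending = true;
		if (demo_need_resched())
			demo_cond_resched();
	}
	return demo_slow_calls;
}
```

With no pending request the helper is never entered, which is the saving Eric describes; whether it matters in a given loop still has to be measured, as Andi asks.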
[PATCH 1/2][INET] (resend) Fix potential kfree on vmalloc-ed area of request_sock_queue
The request_sock_queue's listen_opt is either vmalloc-ed or kmalloc-ed depending on the number of table entries. Thus it is expected to be handled properly on free, which is done in the reqsk_queue_destroy().

However the error path in inet_csk_listen_start() calls the lite version of reqsk_queue_destroy, called __reqsk_queue_destroy, which calls the kfree unconditionally.

Fix this and move the __reqsk_queue_destroy into a .c file as it looks too big to be inline.

As David also noticed, this is an error recovery path only, so no locking is required and the lopt is known to be not NULL.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 7aed02c..0a954ee 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -136,11 +136,7 @@ static inline struct listen_sock *reqsk_queue_yank_listen_sk(struct request_sock
 	return lopt;
 }

-static inline void __reqsk_queue_destroy(struct request_sock_queue *queue)
-{
-	kfree(reqsk_queue_yank_listen_sk(queue));
-}
-
+extern void __reqsk_queue_destroy(struct request_sock_queue *queue);
 extern void reqsk_queue_destroy(struct request_sock_queue *queue);

 static inline struct request_sock *
diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index 5f0818d..dd78b85 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -71,6 +71,28 @@ int reqsk_queue_alloc(struct request_sock_queue *queue,

 EXPORT_SYMBOL(reqsk_queue_alloc);

+void __reqsk_queue_destroy(struct request_sock_queue *queue)
+{
+	struct listen_sock *lopt;
+	size_t lopt_size;
+
+	/*
+	 * this is an error recovery path only
+	 * no locking needed and the lopt is not NULL
+	 */
+
+	lopt = queue->listen_opt;
+	lopt_size = sizeof(struct listen_sock) +
+		lopt->nr_table_entries * sizeof(struct request_sock *);
+
+	if (lopt_size > PAGE_SIZE)
+		vfree(lopt);
+	else
+		kfree(lopt);
+}
+
+EXPORT_SYMBOL(__reqsk_queue_destroy);
+
 void reqsk_queue_destroy(struct request_sock_queue *queue)
 {
 	/* make all the listen_opt local to us */
--
1.5.3.4
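Editor's note: the fix recomputes the table size at free time so that the release routine matches the allocator that was used. A minimal user-space sketch of that idea follows. Everything in it is hypothetical (DEMO_PAGE_SIZE, struct demo_table, the demo_* helpers), and because user space has only free(), the sketch reports which branch was chosen instead of calling kfree() or vfree().

```c
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/* Which allocator a table of a given size would have come from. */
enum demo_alloc_kind { DEMO_SMALL, DEMO_LARGE };

struct demo_table {
	unsigned int nr_entries;
	void *entries[];	/* flexible array, like the listen_sock table */
};

static size_t demo_table_size(unsigned int nr_entries)
{
	return sizeof(struct demo_table) + nr_entries * sizeof(void *);
}

static enum demo_alloc_kind demo_kind_for(size_t size)
{
	return size > DEMO_PAGE_SIZE ? DEMO_LARGE : DEMO_SMALL;
}

static struct demo_table *demo_table_alloc(unsigned int nr_entries)
{
	struct demo_table *t = calloc(1, demo_table_size(nr_entries));

	if (t)
		t->nr_entries = nr_entries;
	return t;
}

/* The size is recomputed from the stored entry count, so the free path
 * picks the same branch the allocation path used. */
static enum demo_alloc_kind demo_table_free(struct demo_table *t)
{
	enum demo_alloc_kind kind = demo_kind_for(demo_table_size(t->nr_entries));

	free(t);	/* the kernel would call vfree() or kfree() here */
	return kind;
}
```

The design point is that the entry count stored in the object is enough to recover the decision, so no separate "how was I allocated" flag is needed.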
[PATCH 2/2][INET] (resend) Move the reqsk_queue_yank_listen_sk from header
This function is used in the net/core/request_sock.c only. No need in keeping it in the header file.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 0a954ee..cff4608 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -124,18 +124,6 @@ struct request_sock_queue {
 extern int reqsk_queue_alloc(struct request_sock_queue *queue,
 			     unsigned int nr_table_entries);

-static inline struct listen_sock *reqsk_queue_yank_listen_sk(struct request_sock_queue *queue)
-{
-	struct listen_sock *lopt;
-
-	write_lock_bh(&queue->syn_wait_lock);
-	lopt = queue->listen_opt;
-	queue->listen_opt = NULL;
-	write_unlock_bh(&queue->syn_wait_lock);
-
-	return lopt;
-}
-
 extern void __reqsk_queue_destroy(struct request_sock_queue *queue);
 extern void reqsk_queue_destroy(struct request_sock_queue *queue);

diff --git a/net/core/request_sock.c b/net/core/request_sock.c
index dd78b85..45aed75 100644
--- a/net/core/request_sock.c
+++ b/net/core/request_sock.c
@@ -93,6 +93,19 @@ void __reqsk_queue_destroy(struct request_sock_queue *queue)

 EXPORT_SYMBOL(__reqsk_queue_destroy);

+static inline struct listen_sock *reqsk_queue_yank_listen_sk(
+		struct request_sock_queue *queue)
+{
+	struct listen_sock *lopt;
+
+	write_lock_bh(&queue->syn_wait_lock);
+	lopt = queue->listen_opt;
+	queue->listen_opt = NULL;
+	write_unlock_bh(&queue->syn_wait_lock);
+
+	return lopt;
+}
+
 void reqsk_queue_destroy(struct request_sock_queue *queue)
 {
 	/* make all the listen_opt local to us */
Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
On Wed, 14 Nov 2007, David Miller wrote:

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Wed, 14 Nov 2007 15:32:58 +0200 (EET)

[PATCH] [TCP] FRTO: Clear frto_highmark only after process_frto that uses it

I broke this in commit 3de96471bd7fb76406e975ef6387abe3a0698149. tcp_process_frto should always see a valid frto_highmark. An invalid frto_highmark (zero) is very likely what ultimately caused a seqno compare in tcp_frto_enter_loss to do the wrong thing, leading to the LOST-bit leak.

Having LOST-bit integrity ensured, as done after commit 23aeeec365dcf8bc87fae44c533e50d0bb4f23cc, won't hurt. It may still be useful in some other, possibly legitimate, scenario.

Reported by Chazarain Guillaume [EMAIL PROTECTED].

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.

Thanks for making such an incredibly thorough investigation into this bug!

I suppose this bug also caused all those spurious RTOs I used to see with my home connection (~10% of all RTOs during a 10M scp transfer). They seemed a bit out of place because it's all wired and low RTT. There are bandwidth limits enforced by the ISP, which I first suspected could cause it, besides suspecting a bug in my own code of course :-).

...It seems I can drop investigating them now, since last evening's test run gave 0 spurious RTOs :-).

Thanks Chazarain for your report.

--
 i.
Re: [PATCH 2/2][INET] Move the reqsk_queue_yank_listen_sk from header
Simon Horman wrote:
On Wed, Nov 14, 2007 at 09:11:06PM +0300, Pavel Emelyanov wrote:

This function is used in the net/core/request_sock.c only. No need in keeping it in the header file.

I feel like I am missing something here, but doesn't __reqsk_queue_destroy() in include/net/request_sock.h use reqsk_queue_yank_listen_sk()?

It does, but this is a patch number 2. The patch number 1 moved this __reqsk_queue_destroy() into request_sock.c.

static inline void __reqsk_queue_destroy(struct request_sock_queue *queue)
{
	kfree(reqsk_queue_yank_listen_sk(queue));
}

Thanks,
Pavel
Re: [PATCH] via-velocity: don't oops on MTU change.
On 15-11-2007 04:38, Stephen Hemminger wrote:

Simple mtu change when device is down.
Fix http://bugzilla.kernel.org/show_bug.cgi?id=9382.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/via-velocity.c	2007-10-22 09:38:11.0 -0700
+++ b/drivers/net/via-velocity.c	2007-11-14 19:34:30.0 -0800
@@ -1963,6 +1963,11 @@ static int velocity_change_mtu(struct ne
 		return -EINVAL;
 	}

+	if (!netif_running(dev)) {
+		dev->mtu = new_mtu;
+		return 0;
+	}
+
 	if (new_mtu != oldmtu) {
 		spin_lock_irqsave(&vptr->lock, flags);

Shouldn't this latter 'if' be removed now, btw?

Regards,
Jarek P.
Re: [PATCH] NET : rt_check_expire() can take a long time, add a cond_resched()
From: Andi Kleen [EMAIL PROTECTED]
Date: Thu, 15 Nov 2007 08:30:16 +0100

Eric Dumazet [EMAIL PROTECTED] writes:

Using a if (need_resched()) test before calling cond_resched(); is necessary to avoid spending too much time doing the resched check.

The only difference between cond_resched() and if (need_resched()) cond_resched() is one function call less and one might_sleep less. If the might_sleep or the function call are really problems (did you measure it? -- i doubt it somewhat) then it would be better to fix the generic code to either inline that or supply a __cond_resched() without might_sleep.

A cheaper change might have been to just limit the number of buckets scanned.

Fix up unmap_vmas() too if this is done as it does a similar need_resched() check too.
Re: [PATCH] NET : rt_check_expire() can take a long time, add a cond_resched()
From: Eric Dumazet [EMAIL PROTECTED]
Date: Thu, 15 Nov 2007 09:25:59 +0100

Please note that:

	if (need_resched())
		cond_resched();

will re-test need_resched() once cond_resched() is called. So it may sound unnecessary but in the rt_check_expire() case, with a loop potentially doing XXX.XXX iterations, being able to bypass the function call is a clear win (in my bench case, 25 ms instead of 88 ms). Impact on I-cache is irrelevant here as this rt_check_expires() runs once every 60 sec.

BTW, Eric, initially I was going to recommend that you do something like:

	if ((goal % SOME_POWER_OF_2) == 0)
		cond_resched();

to mitigate this cost but decided it wasn't worth the bother.
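Editor's note: the alternative David mentions, calling the helper only on every Nth iteration with N a power of two, can be sketched as below. DEMO_YIELD_INTERVAL, demo_yield() and demo_scan_buckets() are hypothetical names for illustration; for an unsigned counter and a power-of-two interval, the mask test used here is equivalent to the `%` test in the mail.

```c
/* Hypothetical interval; it must be a power of two for the mask to work. */
#define DEMO_YIELD_INTERVAL 64u

static unsigned long demo_yields;	/* how many times we "yielded" */

/* Stand-in for cond_resched(); it only counts calls in this sketch. */
static void demo_yield(void)
{
	demo_yields++;
}

/* Counts goal down to zero and yields whenever the remaining count is a
 * multiple of DEMO_YIELD_INTERVAL. */
static unsigned long demo_scan_buckets(unsigned long goal)
{
	for (; goal > 0; goal--) {
		if ((goal & (DEMO_YIELD_INTERVAL - 1)) == 0)
			demo_yield();
		/* per-bucket work would go here */
	}
	return demo_yields;
}
```

The trade-off against the need_resched() test is that this throttle yields on a fixed schedule whether or not anything is waiting to run.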
Re: [PATCH 1/2][INET] (resend) Fix potential kfree on vmalloc-ed area of request_sock_queue
On Thu, 15 Nov 2007 11:41:37 +0300 Pavel Emelyanov [EMAIL PROTECTED] wrote:

The request_sock_queue's listen_opt is either vmalloc-ed or kmalloc-ed depending on the number of table entries. Thus it is expected to be handled properly on free, which is done in the reqsk_queue_destroy().

However the error path in inet_csk_listen_start() calls the lite version of reqsk_queue_destroy, called __reqsk_queue_destroy, which calls the kfree unconditionally.

Fix this and move the __reqsk_queue_destroy into a .c file as it looks too big to be inline.

As David also noticed, this is an error recovery path only, so no locking is required and the lopt is known to be not NULL.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Acked-by: Eric Dumazet [EMAIL PROTECTED]

Thank you for finding this bug Pavel
Re: tg3: strange errors and non-working-ness
On 13-11-2007 19:57, Jon Nelson wrote:

I'm not sure if this is the right place,

Me too. Looks more like an acpi or pci problem. Did you try to experiment with something like the pci=noacpi or acpi=off boot parameters? A pointer to your .config and dmesg would probably be useful too, so taking it to bugzilla and sending the bug number as a follow-up to this thread would be reasonable. Btw, I am adding the main kernel list to cc.

Regards,
Jarek P.

but I've got a pair of GiG-E cards that do not work correctly. Everything appears to come up just fine, but sooner or later (typically fairly quickly) the cards weird out and never really come back. The best info I've got is this:

Nov 10 22:21:19 frank kernel: tg3.c:v3.65 (August 07, 2006)
Nov 10 22:21:19 frank kernel: ACPI: PCI Interrupt :00:0b.0[A] - Link [LNKB] - GSI 3 (level, low) - IRQ 3
Nov 10 22:21:19 frank kernel: eth0: Tigon3 [partno(AC91002A1) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:09:5b:09:b1:69
Nov 10 22:21:19 frank kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
Nov 10 22:21:19 frank kernel: eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]
Nov 10 22:21:19 frank kernel: PM: Writing back config space on device :00:0b.0 at offset b (was 164514e4, writing 302a1385)
Nov 10 22:21:19 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 3 (was 0, writing 4008)
Nov 10 22:21:19 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 2 (was 200, writing 215)
Nov 10 22:21:19 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
Nov 10 22:21:20 frank kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Nov 10 22:21:20 frank kernel: tg3: eth0: Flow control is on for TX and on for RX.
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset b (was 164514e4, writing 302a1385)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 3 (was 0, writing 4008)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 2 (was 200, writing 215)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
Nov 10 22:21:20 frank kernel: ACPI: PCI interrupt for device :00:0b.0 disabled
Nov 10 22:21:20 frank kernel: PCI: Enabling device :00:0b.0 (0100 - 0102)
Nov 10 22:21:20 frank kernel: ACPI: PCI Interrupt :00:0b.0[A] - Link [LNKB] - GSI 3 (level, low) - IRQ 3
Nov 10 22:21:20 frank kernel: eth0: Tigon3 [partno(AC91002A1) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:09:5b:09:b1:69
Nov 10 22:21:20 frank kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
Nov 10 22:21:20 frank kernel: eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset b (was 164514e4, writing 302a1385)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 3 (was 0, writing 4008)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 2 (was 200, writing 215)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
Nov 10 22:21:20 frank kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Nov 10 22:21:20 frank kernel: tg3: eth0: Flow control is on for TX and on for RX.
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset b (was 164514e4, writing 302a1385)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 3 (was 0, writing 4008)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 2 (was 200, writing 215)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 1 (was 2b0, writing 2b00106)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset b (was 164514e4, writing 302a1385)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 3 (was 0, writing 4008)
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 2
[PATCH][NET-2.6.25] Move sock_valbool_flag to socket.c
The sock_valbool_flag() helper is used in setsockopt to set or reset some flag on the sock. This helper is required in the net/socket.c only, so move it there.

Besides, patch two places in sys_setsockopt() that repeat this helper functionality manually.

Since this is not a bugfix, but a trivial cleanup, I prepared this patch against net-2.6.25, but it also applies (with a single offset) to the latest net-2.6.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/sock.h b/include/net/sock.h
index cfb946a..80ca671 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1393,14 +1393,6 @@ extern int net_msg_warn;
 	lock_sock(sk); \
 }

-static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
-{
-	if (valbool)
-		sock_set_flag(sk, bit);
-	else
-		sock_reset_flag(sk, bit);
-}
-
 extern __u32 sysctl_wmem_max;
 extern __u32 sysctl_rmem_max;

diff --git a/net/core/sock.c b/net/core/sock.c
index 2029d09..98b243a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -419,6 +419,14 @@ out:
 	return ret;
 }

+static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
+{
+	if (valbool)
+		sock_set_flag(sk, bit);
+	else
+		sock_reset_flag(sk, bit);
+}
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.
@@ -463,11 +471,8 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 	case SO_DEBUG:
 		if (val && !capable(CAP_NET_ADMIN)) {
 			ret = -EACCES;
-		}
-		else if (valbool)
-			sock_set_flag(sk, SOCK_DBG);
-		else
-			sock_reset_flag(sk, SOCK_DBG);
+		} else
+			sock_valbool_flag(sk, SOCK_DBG, valbool);
 		break;
 	case SO_REUSEADDR:
 		sk->sk_reuse = valbool;
@@ -477,10 +482,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 		ret = -ENOPROTOOPT;
 		break;
 	case SO_DONTROUTE:
-		if (valbool)
-			sock_set_flag(sk, SOCK_LOCALROUTE);
-		else
-			sock_reset_flag(sk, SOCK_LOCALROUTE);
+		sock_valbool_flag(sk, SOCK_LOCALROUTE, valbool);
 		break;
 	case SO_BROADCAST:
 		sock_valbool_flag(sk, SOCK_BROADCAST, valbool);
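Editor's note: the helper in this patch collapses a repeated "set the flag if true, otherwise clear it" pattern into one call. A user-space sketch of the same shape is shown below; struct demo_sock, enum demo_flag and the demo_* functions are hypothetical stand-ins, not the kernel's struct sock API.

```c
/* Hypothetical socket-like object holding a bitmask of flags. */
struct demo_sock {
	unsigned long flags;
};

/* Flag numbers are bit positions, as with the kernel's sock flags. */
enum demo_flag { DEMO_DBG = 0, DEMO_LOCALROUTE = 1, DEMO_BROADCAST = 2 };

static void demo_set_flag(struct demo_sock *sk, enum demo_flag bit)
{
	sk->flags |= 1UL << bit;
}

static void demo_reset_flag(struct demo_sock *sk, enum demo_flag bit)
{
	sk->flags &= ~(1UL << bit);
}

static int demo_flag(const struct demo_sock *sk, enum demo_flag bit)
{
	return (int)((sk->flags >> bit) & 1UL);
}

/* Same shape as sock_valbool_flag(): one call replaces each if/else pair. */
static void demo_valbool_flag(struct demo_sock *sk, enum demo_flag bit,
			      int valbool)
{
	if (valbool)
		demo_set_flag(sk, bit);
	else
		demo_reset_flag(sk, bit);
}
```

Each setsockopt-style case then shrinks to a single line such as demo_valbool_flag(&sk, DEMO_BROADCAST, valbool), which is the cleanup the last two hunks apply to SO_DEBUG and SO_DONTROUTE.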
Re: 2.6.24-rc2-mm1 -- strange apparent network failures
When testing some of the later 2.6.24-rc2-mm1+hotfix combinations on three of our test systems one job from each batch (1/4) failed. In each case the machine appears to have booted normally all the way to a login: prompt. However in the failed boots the networking though apparently initialised completely and correctly (as far as I can tell from the console output), is reported as not responding to ssh connections. The network interface seems to have been initialised on the right port, and the ssh daemons started.

Two of the machines are powerpc boxes, the other an older x86_64. One machine is 4/4 in testing, just one. Most of the other machines are still not able to compile this stack so do not contribute to our knowledge.

Any ideas?

-apw
Re: Re : Re : Bug in using inet_lookup ()
On Wed, Nov 14, 2007 at 04:47:22PM +, Nj A ([EMAIL PROTECTED]) wrote:

By setting the ID of the ingress device passed to inet_lookup() to 0, the machine reboots automatically. Setting /proc/sys/kernel/panic* to non-zero values doesn't help either.

Sorry, I did not understand. Do you mean that after you passed zero to inet_lookup() instead of the device ID, it started to reboot?

--
	Evgeniy Polyakov
Re: [Bugme-new] [Bug 9384] New: Appletalk packets are delivered to the last interface FD_SET
(switching to email for netdev - please respond via emailed reply-to-all, not via the bugzilla UI)

On Thu, 15 Nov 2007 01:56:07 -0800 (PST) [EMAIL PROTECTED] wrote:

http://bugzilla.kernel.org/show_bug.cgi?id=9384

           Summary: Appletalk packets are delivered to the last interface FD_SET
           Product: Networking
           Version: 2.5
     KernelVersion: 2.6.21.3
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

Most recent kernel where this bug did not occur: 2.6.10. Maybe 2.6.15? It was in 2.6.18 along with bug 7421 which caused me to disable netatalk until now.

Distribution: Debian etch (4.0)

Hardware Environment: Pentium 4 2.8GHz, HT off, Intel D865GLC motherboard, 256MB RAM, onboard Intel GigE, PCI Intel e100.

Software Environment: Netatalk 2.0.3, ipset patch for iptables and kernel

Problem Description: Appletalk packets appear to come from the wrong interface, specifically the last one FD_SET.

Using wireshark I see Appletalk rtmp packets arrive from the upstream router on eth1 (the e100). Netatalk then reports the packet as having arrived on eth0.3, which is the only other appletalk enabled interface, and prints "rtmp_packet interface mismatch" because the packet appears to come from the wrong interface.

I'm fairly sure it's the kernel doing it, because wireshark is listening on eth1 and shows the packet from the upstream router's MAC address and DDP address, then the debug code in atalkd immediately after the recvfrom prints the ifr_name, which is eth0.3. Also netatalk 2.0.3 was released over 2 years ago, so the only code that's changed is the kernel.

Enabling appletalk on eth0.2 clarifies the problem - packets are delivered to fds belonging to the last interface FD_SET. Reordering the interfaces also shows this, as changing the order of the interfaces in the config file changes the order they're looped through for FD_SET.

Steps to reproduce: Set up a multi-interface netatalk config and watch for "rtmp_packet interface mismatch" messages. I added a bunch of log statements to debug this; the most useful places to put them are at the end of setaddr() and after the select() in main().

The machine is a router, so I have to minimise the downtime of testing different kernel versions. I am happy to instrument atalkd or provide packet captures.
Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
David Miller [EMAIL PROTECTED] wrote:

Chazarain please let us know if it does indeed cure your problem.

Unfortunately, I couldn't manage to reproduce the problem with an unpatched kernel. But your investigation Ilpo was really impressive.

BTW, even though I messed up the yahoo webmail configuration, you can call me by my first name: Guillaume ;-)

Thanks again for such an awesome bug fixing attitude!

--
Guillaume
Re: [PATCH 1/2][INET] (resend) Fix potential kfree on vmalloc-ed area of request_sock_queue
From: Eric Dumazet [EMAIL PROTECTED]
Date: Thu, 15 Nov 2007 10:21:01 +0100

On Thu, 15 Nov 2007 11:41:37 +0300 Pavel Emelyanov [EMAIL PROTECTED] wrote:

The request_sock_queue's listen_opt is either vmalloc-ed or kmalloc-ed depending on the number of table entries. Thus it is expected to be handled properly on free, which is done in the reqsk_queue_destroy().

However the error path in inet_csk_listen_start() calls the lite version of reqsk_queue_destroy, called __reqsk_queue_destroy, which calls the kfree unconditionally.

Fix this and move the __reqsk_queue_destroy into a .c file as it looks too big to be inline.

As David also noticed, this is an error recovery path only, so no locking is required and the lopt is known to be not NULL.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Acked-by: Eric Dumazet [EMAIL PROTECTED]

Thank you for finding this bug Pavel

Indeed. I applied this, but what I did was I combined both changes into one because to me they logically belong together.

Thanks again Pavel!
Re: [PATCH][NET-2.6.25] Move sock_valbool_flag to socket.c
From: Pavel Emelyanov [EMAIL PROTECTED]
Date: Thu, 15 Nov 2007 12:43:51 +0300

The sock_valbool_flag() helper is used in setsockopt to set or reset some flag on the sock. This helper is required in the net/socket.c only, so move it there.

Besides, patch two places in sys_setsockopt() that repeat this helper functionality manually.

Since this is not a bugfix, but a trivial cleanup, I prepared this patch against net-2.6.25, but it also applies (with a single offset) to the latest net-2.6.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Applied to net-2.6.25, thanks Pavel.
[PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0 (was: Strange behavior in arp probe reply, bug or feature?)
Fix the arp reply sent when an arp probe with sender ip 0 is received. I can't find any grounds in RFC2131 for sending a non-valid arp reply in the special case of the sender ip being set to 0.

- Bug fix for arp handling when sender ip is set to 0. Send a correct arp reply instead of one with the sender ip and sender hardware address in the target fields. Now sends target ip and target hw as received in the arp probe.

Signed-off-by: Jonas Danielsson [EMAIL PROTECTED]

---
 arp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: arp.c
===
RCS file: /usr/local/cvs/linux/os/linux-2.6/net/ipv4/arp.c,v
retrieving revision 1.22
diff -u -w -r1.22 arp.c
--- arp.c	13 Oct 2006 12:45:47 -	1.22
+++ arp.c	15 Nov 2007 10:34:44 -
@@ -827,7 +827,8 @@
 		if (arp->ar_op == htons(ARPOP_REQUEST) &&
 		    inet_addr_type(tip) == RTN_LOCAL &&
 		    !arp_ignore(in_dev,dev,sip,tip))
-			arp_send(ARPOP_REPLY,ETH_P_ARP,tip,dev,tip,sha,dev->dev_addr,dev->dev_addr);
+			arp_send(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha,
+				 dev->dev_addr, sha);
 		goto out;
 	}
Re: [PATCH 2/7] CAN: Add PF_CAN core module
Stephen Hemminger [EMAIL PROTECTED] writes:

+#ifdef CONFIG_CAN_DEBUG_CORE
+extern void can_debug_skb(struct sk_buff *skb);
+extern void can_debug_cframe(const char *msg, struct can_frame *cframe);
+#define DBG(fmt, args...)  (DBG_VAR & 1 ? printk( \
+					KERN_DEBUG DBG_PREFIX ": %s: " fmt, \
+					__func__, ##args) : 0)
+#define DBG_FRAME(fmt, cf) (DBG_VAR & 2 ? can_debug_cframe(fmt, cf) : 0)
+#define DBG_SKB(skb)       (DBG_VAR & 4 ? can_debug_skb(skb) : 0)
+#else
+#define DBG(fmt, args...)
+#define DBG_FRAME(fmt, cf)
+#define DBG_SKB(skb)
+#endif

This non-standard debugging seems like it needs a better interface. Also, need paren's around (DBG_VAR & 1) and don't use UPPERCASE for variable names.

No additional parentheses are needed here. ?: is the lowest precedence operator above assignment and ,. Also, DBG_VAR is not a variable name. It's a macro that expands to a variable name like can_debug, raw_debug or bcm_debug.

+HLIST_HEAD(rx_dev_list);

Please either make rx_dev_list static or call it can_rx_dev_list to avoid name conflicts.

+static struct dev_rcv_lists rx_alldev_list;
+static DEFINE_SPINLOCK(rcv_lists_lock);
+
+static struct kmem_cache *rcv_cache __read_mostly;
+
+/* table of registered CAN protocols */
+static struct can_proto *proto_tab[CAN_NPROTO] __read_mostly;
+static DEFINE_SPINLOCK(proto_tab_lock);
+
+struct timer_list stattimer; /* timer for statistics update */
+struct s_stats stats; /* packet statistics */
+struct s_pstats pstats; /* receive list statistics */

More global variables without prefix.

These variables are not exported with EXPORT_SYMBOL, so there should be no name conflict. They cannot be made static because they are used in af_can.c and proc.c. Nevertheless we can prefix them with can_ if you still think it's necessary.

+static int can_proc_read_stats(char *page, char **start, off_t off,
+			       int count, int *eof, void *data)
+{
+}

The read interface should use seq_file interface rather than formatting into page buffer.

Why? For this simple function a page buffer is enough space and the seq_file API would require more effort. IMHO, seq_files offer advantages if the proc file shows some sequence of data generated in an iteration through some loop (see below).

+static int can_proc_read_reset_stats(char *page, char **start, off_t off,
+				     int count, int *eof, void *data)
+{
+}

Why not have a write interface to do the reset?

I haven't looked into writable proc files yet. Will do so.

+static int can_proc_read_rcvlist(char *page, char **start, off_t off,
+				 int count, int *eof, void *data)
+{
+	/* double cast to prevent GCC warning */
+	int idx = (int)(long)data;
+}

This is where I would prefer sequence files. However, the seq file interface doesn't allow me to pass additional info like the `data' argument. This means I would have to write separate functions instead.

Output from checkpatch:

WARNING: do not add new typedefs
#116: FILE: include/linux/can.h:41:
+typedef __u32 canid_t;

WARNING: do not add new typedefs
#124: FILE: include/linux/can.h:49:
+typedef __u32 can_err_mask_t;

These typedefs were considered OK in previous discussions on the list.

ERROR: use tabs not spaces
#498: FILE: net/can/af_can.c:159:
+^I^I^I^I not implemented.\n, module_name);$

Fixed.

WARNING: braces {} are not necessary for single statement blocks
#1080: FILE: net/can/af_can.c:741:
+	if (!proto_tab[proto]) {
+		printk(KERN_ERR "BUG: can: protocol %d is not registered\n",
+		       proto);
+	}

Hm, isn't it common to use braces for single statements if they span more than one line?

Thanks for your review.

urs
Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)
In article [EMAIL PROTECTED] (at Wed, 14 Nov 2007 22:44:17 -0800), Templin, Fred L [EMAIL PROTECTED] says:

> --- linux-2.6.24-rc2/net/ipv6/addrconf.c.orig 2007-11-08 11:59:35.0 -0800
> +++ linux-2.6.24-rc2/net/ipv6/addrconf.c 2007-11-14 22:17:28.0 -0800
> @@ -1424,6 +1424,21 @@ static int addrconf_ifid_infiniband(u8 *
>  	return 0;
>  }
>
> +static int addrconf_ifid_isatap(u8 *eui, __be32 addr)
> +{
> +
> +	eui[0] = 0x02; eui[1] = 0; eui[2] = 0x5E; eui[3] = 0xFE;
> +	memcpy (eui+4, &addr, 4);
> +
> +	if (ZERONET(addr) || PRIVATE_10(addr) || LOOPBACK(addr) ||
> +	    LINKLOCAL_169(addr) || PRIVATE_172(addr) || TEST_192(addr) ||
> +	    ANYCAST_6TO4(addr) || PRIVATE_192(addr) || TEST_198(addr) ||
> +	    MULTICAST(addr) || BADCLASS(addr))
> +		eui[0] &= ~0x02;
> +
> +	return 0;
> +}
> +
>  static int ipv6_generate_eui64(u8 *eui, struct net_device *dev)
>  {
>  	switch (dev->type) {

{
	eui[0] = (ZERONET(addr) || PRIVATE_10(addr) || LOOPBACK(addr) ||
		  LINKLOCAL_169(addr) || PRIVATE_172(addr) || TEST_192(addr) ||
		  ANYCAST_6TO4(addr) || PRIVATE_192(addr) || TEST_198(addr) ||
		  MULTICAST(addr) || BADCLASS(addr)) ? 0 : 2;
	eui[1] = 0; eui[2] = 0x5E; eui[3] = 0xFE;
	memcpy (eui+4, &addr, 4);
}

> @@ -2167,7 +2185,8 @@ static void addrconf_dev_config(struct n
>  	    (dev->type != ARPHRD_FDDI) &&
>  	    (dev->type != ARPHRD_IEEE802_TR) &&
>  	    (dev->type != ARPHRD_ARCNET) &&
> -	    (dev->type != ARPHRD_INFINIBAND)) {
> +	    (dev->type != ARPHRD_INFINIBAND) &&
> +	    !(dev->priv_flags & IFF_ISATAP)) {
>  		/* Alas, we support only Ethernet autoconfiguration. */
>  		return;
>  	}

Because priv_flags are local to device type, you need to check dev->type:

	(dev->type == ARPHRD_SIT && !(dev->priv_flags & IFF_ISATAP))

or something like this.

> +	struct ip_tunnel *t = netdev_priv(ifp->idev->dev);
> +	if (t->parms.i_key != INADDR_NONE) {
> +		spin_lock(&ifp->lock);

I guess INADDR_ANY.

--yoshfuji
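Editor's sketch of the interface-identifier layout discussed in this message, written for user space. It assumes the ISATAP format 00-00-5E-FE (or 02-00-5E-FE when the IPv4 address is globally unique) followed by the four IPv4 address bytes. The function name isatap_ifid and the is_unique flag are hypothetical; the flag stands in for the ZERONET()/PRIVATE_10()/... tests in the patch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Build the 8-byte ISATAP interface identifier from an IPv4 address.
 *
 * ipv4_net:  the IPv4 address, 4 bytes in network byte order
 * is_unique: nonzero if the address is globally unique (hypothetical
 *            replacement for the address-class macros in the patch)
 */
static void isatap_ifid(uint8_t eui[8], const uint8_t ipv4_net[4],
			int is_unique)
{
	eui[0] = is_unique ? 0x02 : 0x00;	/* "u" bit of the identifier */
	eui[1] = 0x00;
	eui[2] = 0x5E;
	eui[3] = 0xFE;
	memcpy(eui + 4, ipv4_net, 4);		/* embed the IPv4 address */
}
```

Computing the first byte in a single expression, as the reply suggests, avoids setting the bit and clearing it again afterwards.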
Re: reproducible cxgb kernel panic in FC8 kernel 2.6.23.1-49
Ben Greear wrote:

> This panic happens (almost?) immediately after starting TCP traffic between the cxgb nic on this system and another. We also got at least one crash on a custom/tainted 2.6.20.12 kernel, but it would run for at least a few minutes at ~1Gbps first.
>
> I think my serial console chomped some of this..but it's very reproducible, so if you need more info I can make the terminal wider and do it again.

Hi Ben,

I just posted a patch fixing this T2 crash. It appeared in 2.6.22, when eth_type_trans() was modified to set skb->dev. cxgb3 got fixed at the time, but I obviously forgot the chelsio driver. I'm a bit behind on T2 updates. I will get to it in a few days.

Cheers,
Divy
Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)
In article [EMAIL PROTECTED] (at Wed, 14 Nov 2007 22:44:17 -0800), Templin, Fred L [EMAIL PROTECTED] says:

> From: Fred L. Templin [EMAIL PROTECTED]
>
> This patch includes support for the Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) per RFC4214. It uses the SIT module, and is configured using extensions to the iproute2 utility.
>
> The following diffs are specific to the Linux 2.6.24-rc2 kernel distribution. This message includes the full and patchable diff text; please use this version to apply patches.
>
> Signed-off-by: Fred L. Templin [EMAIL PROTECTED]

BTW, how will we handle DNS names (and TTL) and/or multiple PRL entries in RFC4214? I doubt that we really need to handle PRL refresh in the kernel.

--yoshfuji
Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
On Thu, 15 Nov 2007, Guillaume Chazarain wrote:

> David Miller [EMAIL PROTECTED] wrote:
> > Chazarain please let us know if it does indeed cure your problem.
>
> Unfortunately, I couldn't manage to reproduce the problem with an unpatched kernel. But your investigation Ilpo was really impressive.

These are usually very sensitive to other traffic because even a simple change in the packet pattern changes the behavior enough for it to disappear. The same thing occurred with the fackets_out miscount a month ago as well; on a different weekday it just wasn't reproducible.

...Anyway, I'm pretty sure it's now fixed because there's a simple explanation for it: the frto_highmark premature clearing bug. But if you still end up seeing them after that, make sure to report it... :-)

> BTW, even though I messed up the yahoo webmail configuration, you can call me by my first name: Guillaume ;-)

Fair enough. :-)

> Thanks again for such an awesome bug fixing attitude!

The best thing is that usually when forced to really think about what could go wrong, other, unrelated bugs seem to come up as well, though up to 10% of the initial oh-nos end up being genuine bugs. ...Thus I still have a couple of miscount-due-to-GSOhints fixes to do as a result of this venture besides the problems already fixed.

--
 i.
Re: [PATCH 2/7] CAN: Add PF_CAN core module
Joe Perches [EMAIL PROTECTED] writes:

> On Thu, 2007-11-15 at 08:40 +0100, Oliver Hartkopp wrote:
> > Stephen Hemminger wrote:
> > > +#ifdef CONFIG_CAN_DEBUG_CORE
> > > +extern void can_debug_skb(struct sk_buff *skb);
> > > +extern void can_debug_cframe(const char *msg, struct can_frame *cframe);
> > > +#define DBG(fmt, args...)  (DBG_VAR & 1 ? printk( \
> > > +					KERN_DEBUG DBG_PREFIX ": %s: " fmt, \
> > > +					__func__, ##args) : 0)
> > > +#define DBG_FRAME(fmt, cf) (DBG_VAR & 2 ? can_debug_cframe(fmt, cf) : 0)
> > > +#define DBG_SKB(skb)       (DBG_VAR & 4 ? can_debug_skb(skb) : 0)
> > > +#else
> > > +#define DBG(fmt, args...)
> > > +#define DBG_FRAME(fmt, cf)
> > > +#define DBG_SKB(skb)
> > > +#endif
>
> I would prefer the more frequently used macro style:
>
> #define DBG(fmt, args...) \
> do { if (DBG_VAR & 1) printk(KERN_DEBUG DBG_PREFIX ": %s: " fmt, \
> 			     __func__, ##args); } while (0)
>
> #define DBG_FRAME(fmt, cf) \
> do { if (DBG_VAR & 2) can_debug_cframe(fmt, cf); } while (0)
>
> #define DBG_SKB(skb) \
> do { if (DBG_VAR & 4) can_debug_skb(skb); } while (0)

I prefer our code because it is shorter (fits into one line) and can be used anywhere an expression is allowed, rather than only where a statement is allowed. Actually, I first had

	#define DBG( ... )   ((debug & 1) && printk( ... ))

and so on, but that didn't work with can_debug_{cframe,skb} since they return void.

Admittedly, the benefit of expression vs. statement is really negligible, and since this issue has come up several times I will change these macros to use do-while.

urs
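Editor's sketch of the two macro styles compared in this exchange, runnable in user space. debug_mask, debug_hook, DBG_STMT, DBG_EXPR and demo are hypothetical names chosen for illustration. The expression form uses (void)0 in the else arm so that a callee returning void is valid standard C, which is the problem the reply describes for can_debug_cframe() and can_debug_skb().

```c
static int debug_mask = 1;	/* hypothetical stand-in for DBG_VAR */
static int calls;		/* counts hook invocations, for illustration */

/* Returns void, like can_debug_skb() in the patch. */
static void debug_hook(const char *msg)
{
	(void)msg;
	calls++;
}

/* Statement style: safe in an unbraced if/else, but not an expression. */
#define DBG_STMT(msg) \
	do { if (debug_mask & 1) debug_hook(msg); } while (0)

/* Expression style: both arms have type void, so this is valid C. */
#define DBG_EXPR(msg) \
	((debug_mask & 1) ? debug_hook(msg) : (void)0)

static int demo(int cond)
{
	calls = 0;
	if (cond)
		DBG_STMT("taken");	/* the trailing ';' ends the do-while */
	else
		DBG_EXPR("not taken");

	/* Only the expression form can sit inside a larger expression. */
	return (DBG_EXPR("inside a comma expression"), calls);
}
```

With the mask bit set, demo() runs the hook twice on either branch; with the bit cleared it runs it zero times. The do-while form gives up the last use in demo() but is the style reviewers on the list asked for.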
Re: [alsa-devel] [BUG] New Kernel Bugs
On Thu, Nov 15, 2007 at 06:59:34AM +0100, Rene Herman wrote:

> On 15-11-07 05:16, Bron Gondwana wrote:
> > Totally unrelated - I sent something to the kolab mailing list a couple
> > [ ... ]
> > I'm sure if I had something that I considered worth informing the ALSA project of, I'd be wary of spending the same effort writing a good post knowing it may be dropped in between by a list moderator just selecting all and bouncing them.
>
> Totally unrelated indeed so why are you spouting crap? If the kolab list has a problem take it up with them but keep ALSA out of it. alsa-devel has only ever moderated out spam -- nothing else.

As an outsider to the list, how do I know what your policy will be other than "I've been rejected out of hand by someone else's list, so my experience is that member only lists aren't willing to listen to something I have to say unless I make the effort to sign up and have yet another folder accumulating unread messages".

I don't. Well, ok - maybe I do here since I've let myself be dragged into the debate. Oops.

I get the same information from both project websites: moderated for non-members, public archives - no way of knowing that ALSA will accept me informing them of something they would be interested in without committing to reading or bit-bucketing their list.

The alternative is to subscribe just long enough to send something and then unsubscribe again, or cold-email a member and ask them to pass a message along. Or post and hope it doesn't get rejected, not even knowing for a day or so.

Bron.
Re: [PATCH 2/7] CAN: Add PF_CAN core module
From: Urs Thuermann [EMAIL PROTECTED]
Date: 15 Nov 2007 12:51:34 +0100

> I prefer our code because it is shorter (fits into one line) and can be used anywhere an expression is allowed, rather than only where a statement is allowed. Actually, I first had
>
> 	#define DBG( ... )   ((debug & 1) && printk( ... ))
>
> and so on, but that didn't work with can_debug_{cframe,skb} since they return void.
>
> Admittedly, the benefit of expression vs. statement is really negligible, and since this issue has come up several times I will change these macros to use do-while.

I really frown upon these local debugging macros people tend to want to submit with their changes.

It really craps up the tree, even though it might be useful to you.

So please remove this stuff or replace the debugging statements with some generic kernel debugging facility, there are several.

Thank you.
Re: Null pointer dereference in nf_nat_move_storage(), kernel 2.6.23.1
Hi Chuck.

On Wed, Nov 14, 2007 at 06:25:15PM -0500, Chuck Ebbert ([EMAIL PROTECTED]) wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=259501#c14
>
> [<f8b61643>] __nf_ct_ext_add+0x12f/0x1c4 [nf_conntrack]
>
> nf_nat_move_storage():
> /usr/src/debug/kernel-2.6.23/linux-2.6.23.i686/net/ipv4/netfilter/nf_nat_core.c:612
>   87:	f7 47 64 80 01 00 00	testl  $0x180,0x64(%edi)
>   8e:	74 39			je     c9 <nf_nat_move_storage+0x65>
>
> line 612:
> 	if (!(ct->status & IPS_NAT_DONE_MASK))
> 		return;

Please test the attached patch.

This routine is called each time the hash should be replaced. nf_conn has an extension list which contains pointers to connection tracking users (like nat, which is right now the only such user), so when the replace takes place it should copy its own extensions. The loop above checks for its own extension, but tries to move a higher-layer one, which can lead to the above oops.

Not tested, derived from code observation only.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
index a1a65a1..cf6ba66 100644
--- a/net/netfilter/nf_conntrack_extend.c
+++ b/net/netfilter/nf_conntrack_extend.c
@@ -109,7 +109,7 @@ void *__nf_ct_ext_add(struct nf_conn *ct, enum nf_ct_ext_id id, gfp_t gfp)
 			rcu_read_lock();
 			t = rcu_dereference(nf_ct_ext_types[i]);
 			if (t && t->move)
-				t->move(ct, ct->ext + ct->ext->offset[id]);
+				t->move(ct, ct->ext + ct->ext->offset[i]);
 			rcu_read_unlock();
 		}
 		kfree(ct->ext);

--
	Evgeniy Polyakov
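Editor's sketch of the indexing mistake the patch above corrects, reduced to a user-space model. Nothing here is kernel code: struct ext_area, move_one, move_all and moved_at are hypothetical names, and an offset of 0 is assumed to mean "extension not present".

```c
#include <stddef.h>

#define EXT_NUM 3	/* hypothetical number of extension types */

struct ext_area {
	size_t offset[EXT_NUM];	/* where each extension's data lives; 0 = absent */
	char data[64];
};

/* Records which offset each move callback was handed (illustration only). */
static size_t moved_at[EXT_NUM];

static void move_one(int type, struct ext_area *ext, size_t off)
{
	(void)ext;
	moved_at[type] = off;
}

/*
 * Tell every present extension where its own data is.  'id' is the
 * extension currently being added; indexing offset[] with it inside the
 * loop would hand every callback the wrong (possibly unset) location.
 */
static void move_all(struct ext_area *ext, int id)
{
	int i;

	(void)id;
	for (i = 0; i < EXT_NUM; i++) {
		if (!ext->offset[i])
			continue;
		move_one(i, ext, ext->offset[i]);	/* offset[i], not offset[id] */
	}
}
```

In this model, adding extension 1 while extensions 0 and 2 already exist moves them at offsets 8 and 24; with offset[id] both callbacks would receive offset[1], which is still 0.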
Re: [PATCH] netfilter : struct xt_table_info diet
Eric Dumazet wrote:
> On Wed, 14 Nov 2007 18:19:41 +0100 Patrick McHardy [EMAIL PROTECTED] wrote:
>
> > > diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
> > > index 2909c92..ed3bd0b 100644
> > > --- a/net/ipv4/netfilter/arp_tables.c
> > > +++ b/net/ipv4/netfilter/arp_tables.c
> > > @@ -811,7 +811,7 @@ static int do_replace(void __user *user, unsigned int len)
> > >  		return -ENOPROTOOPT;
> > >
> > >  	/* overflow check */
> > > -	if (tmp.size >= (INT_MAX - sizeof(struct xt_table_info)) / NR_CPUS -
> > > +	if (tmp.size >= (INT_MAX - XT_TABLE_INFO_SZ) / NR_CPUS -
> > >  			SMP_CACHE_BYTES)
> >
> > Shouldn't NR_CPUs be replaced by nr_cpu_ids here? I'm wondering why we still include NR_CPUs in the calculation at all though, unlike in 2.4, we don't allocate one huge area of memory anymore but do one allocation per CPU. IIRC it even was you who changed that.
>
> Yes, doing an allocation per possible cpu was better than one giant allocation (memory savings and NUMA aware)
>
> Well, technically speaking you are right, we may also replace these divides per NR_CPUS by nr_cpu_ids (or even better : num_possible_cpus())
>
> Because with NR_CPUS=4096, we actually limit tmp.size to about 524000, what a shame ! :)

We actually had complaints about number of rule limitations, but that was more likely caused by vmalloc limits :) But of course we do need to include the number of CPUs in the check, I misread the code.
[PATCH] chelsio - Fix skb->dev setting
From: Divy Le Ray [EMAIL PROTECTED]

eth_type_trans() now sets skb->dev. Access skb->dev after it gets set.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---

 drivers/net/chelsio/sge.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c
index ffa7e64..4436662 100644
--- a/drivers/net/chelsio/sge.c
+++ b/drivers/net/chelsio/sge.c
@@ -1379,11 +1379,11 @@ static void sge_rx(struct sge *sge, struct freelQ *fl, unsigned int len)
 	}
 	__skb_pull(skb, sizeof(*p));
 
-	skb->dev->last_rx = jiffies;
 	st = per_cpu_ptr(sge->port_stats[p->iff], smp_processor_id());
 	st->rx_packets++;
 
 	skb->protocol = eth_type_trans(skb, adapter->port[p->iff].dev);
+	skb->dev->last_rx = jiffies;
 	if ((adapter->flags & RX_CSUM_ENABLED) && p->csum == 0x &&
 	    skb->protocol == htons(ETH_P_IP) &&
 	    (skb->data[9] == IPPROTO_TCP || skb->data[9] == IPPROTO_UDP)) {
Re: [alsa-devel] [BUG] New Kernel Bugs
On 15-11-07 13:02, Bron Gondwana wrote:

> I get the same information from both project websites: moderated for non-members, public archives - no way of knowing that ALSA will accept me informing them of something they would be interested in without committing to reading or bit-bucketing their list.

Can you please just shelve this crap? You have a way of knowing that ALSA will accept you and that is knowing or assuming that the ALSA project doesn't consist of drooling retards.

When a project list goes to the difficulty of moderating non-subscribers it has made the explicit choice to _not_ become subscriber only. Then refusing valid non-subscribers after all makes no sense whatsoever.

I'm sorry you got your feelings hurt by that other list but it was no doubt an accident; take it up with them.

Rene.
Re: [PATCH, take2] netfilter : struct xt_table_info diet
Eric Dumazet wrote:
> [PATCH] netfilter : struct xt_table_info diet
>
> Instead of using a big array of NR_CPUS entries, we can compute the size needed at runtime, using nr_cpu_ids
>
> This should save some ram (especially on David's machines where NR_CPUS=4096 : 32 KB can be saved per table, and 64KB for dynamically allocated ones (because of slab/slub alignments) )
>
> In particular, the 'bootstrap' tables are not any more static (in data section) but on stack as their size is now very small.
>
> This also should reduce the size used on stack in compat functions (get_info() declares an automatic variable, that could be bigger than kernel stack size for big NR_CPUS)

I fixed a compilation error with CONFIG_COMPAT and applied it, thanks Eric. One question though:

> +#define XT_TABLE_INFO_SZ (offsetof(struct xt_table_info, entries) \
> +			   + nr_cpu_ids * sizeof(char *))
>
>  	/* overflow check */
> -	if (tmp.size >= (INT_MAX - sizeof(struct xt_table_info)) / NR_CPUS -
> -			SMP_CACHE_BYTES)
> +	if (tmp.size >= INT_MAX / num_possible_cpus())
>  		return -ENOMEM;

We need to make sure offsetof(struct xt_table_info, entries) + nr_cpu_ids * sizeof(char *) doesn't overflow, so why doesn't it use nr_cpu_ids here as well?
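Editor's sketch of an overflow-safe size check of the kind being discussed, written for user space. It is one possible formulation, not the check that was merged: table_size_ok and its parameters are hypothetical, with ncpus standing in for nr_cpu_ids and hdr_size for the table header size.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Return 1 if a header of hdr_size bytes plus one copy of per_cpu_size
 * bytes for each of ncpus CPUs stays within INT_MAX, otherwise 0.
 */
static int table_size_ok(unsigned int per_cpu_size, unsigned int ncpus,
			 size_t hdr_size)
{
	if (ncpus == 0 || hdr_size > INT_MAX)
		return 0;
	/* Divide instead of multiplying so the check itself cannot overflow. */
	return per_cpu_size <= (INT_MAX - hdr_size) / ncpus;
}
```

With 4096 CPUs and no header the limit works out to 524287 bytes per CPU, which matches the "about 524000" figure quoted earlier in the thread for NR_CPUS=4096.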
Re: [alsa-devel] [BUG] New Kernel Bugs
On Thu, 15 November 2007 13:26:51 +0100, Rene Herman wrote:

> Can you please just shelve this crap? You have a way of knowing that ALSA will accept you and that is knowing or assuming that the ALSA project doesn't consist of drooling retards.

Well, my experience with moderation has been that moderated mails are stuck in some queue for weeks. Two separate lists, neither of them was alsa. If alsa is doing a better job, great. But it still has to live with the general reputation of non-subscriber moderation.

> When a project list goes to the difficulty of moderating non-subscribers it has made the explicit choice to _not_ become subscriber only. Then refusing valid non-subscribers after all makes no sense whatsoever.
>
> I'm sorry you got your feelings hurt by that other list but it was no doubt an accident; take it up with them.

Been there, done that. In spite of people not being drooling retards, the amount of time and effort they invest into either moderation or improving the ruleset is quite limited. Problems persist.

And even without mails being held hostage for weeks, every single moderation mail is annoying. Like the one I'm sure to receive after sending this out.

Jörn

--
Joern's library part 5:
http://www.faqs.org/faqs/compression-faq/part2/section-9.html
Re: [alsa-devel] [BUG] New Kernel Bugs
On Thu, Nov 15, 2007 at 06:59:34AM +0100, Rene Herman wrote:

> Totally unrelated indeed so why are you spouting crap? If the kolab list has a problem take it up with them but keep ALSA out of it. alsa-devel has only ever moderated out spam -- nothing else.

That is incorrect. Hopefully it is the case now though, since my experience of the subject was years ago.

  OG.
Re: [alsa-devel] [BUG] New Kernel Bugs
At Thu, 15 Nov 2007 14:17:27 +0100, Olivier Galibert wrote:

> On Thu, Nov 15, 2007 at 06:59:34AM +0100, Rene Herman wrote:
> > Totally unrelated indeed so why are you spouting crap? If the kolab list has a problem take it up with them but keep ALSA out of it. alsa-devel has only ever moderated out spam -- nothing else.
>
> That is incorrect. Hopefully it is the case now though, since my experience of the subject was years ago.

Yeah, it was really years ago that we once switched to the open list. Funny that people never forget such a thing :)

Takashi
Re: network interface state
On Thu, 2007-15-11 at 10:11 +0800, Herbert Xu wrote:

> We don't make use of that on recvmsg() though although theoretically user-space is supposed to be ready to handle that too.

iproute2 handles that well. Anyone writing netlink apps should program with the thought that a single received datagram will include many netlink messages.

On the concept of putting in some generation marker/counter: it is one of those things that has bothered me as well for some time, but I can't think of a clean way to solve it for every user of netlink. One way to transport this from the kernel is to stash it in the netlink sequence number, but that would violate things when a user expects a specific sequence.

For the ifla/iflink case, it should be trivial to solve by adding a marker in the kernel that gets set to jiffies (or some incremental counter) every time an event happens. You then transport this to user space as an attribute any time someone does a GET. Clearly the best way to solve it is to be generic, but we would need to revamp netlink totally.

Note, we do today signal to user space that a message was lost because of a buffer overrun. So a hack (not applicable to the poster given they don't have a daemon) would be to listen to events and set the rx socket buffer to be very small so you lose every message.

cheers,
jamal
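Editor's sketch of the advice above that one received datagram may carry several netlink messages. It assumes a Linux user-space build and uses the NLMSG_OK() and NLMSG_NEXT() macros from <linux/netlink.h>; the function name count_nlmsgs is hypothetical, and a real program would dispatch on nlmsg_type instead of counting.

```c
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Count the netlink messages packed into one received datagram.
 * buf and len are what a recv() on a netlink socket returned.
 */
static int count_nlmsgs(void *buf, int len)
{
	struct nlmsghdr *nlh;
	int n = 0;

	/* NLMSG_NEXT() advances the pointer and shrinks len as it goes. */
	for (nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len);
	     nlh = NLMSG_NEXT(nlh, len)) {
		if (nlh->nlmsg_type == NLMSG_DONE)
			break;	/* end of a multipart reply */
		n++;
	}
	return n;
}
```

The loop stops cleanly on a truncated trailing message because NLMSG_OK() checks the header length against the bytes that remain.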
[PATCH 05/10] [TCP]: Make lost retrans detection more self-contained
Highest_sack_end_seq is no longer calculated in the loop, thus it can be pushed to the worker function altogether making that function independent of the sacktag.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   20 +++-
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c25704f..b7af304 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1115,16 +1115,23 @@ static int tcp_is_sackblock_valid(struct tcp_sock *tp, int is_dsack,
  *
  * Search retransmitted skbs from write_queue that were sent when snd_nxt was
  * less than what is now known to be received by the other end (derived from
- * SACK blocks by the caller). Also calculate the lowest snd_nxt among the
- * remaining retransmitted skbs to avoid some costly processing per ACKs.
+ * highest SACK block). Also calculate the lowest snd_nxt among the remaining
+ * retransmitted skbs to avoid some costly processing per ACKs.
  */
-static int tcp_mark_lost_retrans(struct sock *sk, u32 received_upto)
+static int tcp_mark_lost_retrans(struct sock *sk)
 {
+	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	int flag = 0;
 	int cnt = 0;
 	u32 new_low_seq = tp->snd_nxt;
+	u32 received_upto = TCP_SKB_CB(tp->highest_sack)->end_seq;
+
+	if (!tcp_is_fack(tp) || !tp->retrans_out ||
+	    !after(received_upto, tp->lost_retrans_low) ||
+	    icsk->icsk_ca_state != TCP_CA_Recovery)
+		return flag;
 
 	tcp_for_write_queue(skb, sk) {
 		u32 ack_seq = TCP_SKB_CB(skb)->ack_seq;
@@ -1245,7 +1252,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 	int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
 	int reord = tp->packets_out;
 	int prior_fackets;
-	u32 highest_sack_end_seq;
 	int flag = 0;
 	int found_dup_sack = 0;
 	int cached_fack_count;
@@ -1513,11 +1519,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 		flag &= ~FLAG_ONLY_ORIG_SACKED;
 	}
 
-	highest_sack_end_seq = TCP_SKB_CB(tp->highest_sack)->end_seq;
-	if (tcp_is_fack(tp) && tp->retrans_out &&
-	    after(highest_sack_end_seq, tp->lost_retrans_low) &&
-	    icsk->icsk_ca_state == TCP_CA_Recovery)
-		flag |= tcp_mark_lost_retrans(sk, highest_sack_end_seq);
+	flag |= tcp_mark_lost_retrans(sk);
 
 	tcp_verify_left_out(tp);
 
--
1.5.0.6
[RFC PATCH 09/10] [TCP]: Rewrite SACK block processing & sack_recv_cache use
Key points of this patch are:

  - In case new SACK information is of the advance-only type, no skb processing below the previously discovered highest point is done
  - Optimize cases below the highest point too, since there's no need to always go up to the highest point (which is very likely still present in that SACK). This is not entirely true though, because I'm dropping the fastpath_skb_hint which could previously optimize those cases even better. Whether that's significant, I'm not too sure.

Currently it will provide skipping by walking. Combined with an RB-tree, all skipping would become fast too regardless of window size (can be done incrementally later).

Previously a number of cases in TCP SACK processing failed to take advantage of the costly stored information in sack_recv_cache, most importantly expected events such as cumulative ACKs and new hole ACKs. Processing such ACKs results in rather long walks building up latencies (which easily get nasty when the window is huge). Those latencies are often completely unnecessary compared with the amount of _new_ information received; usually for a cumulative ACK there's no new information at all, yet TCP walks the whole queue unnecessarily, potentially taking a number of costly cache misses on the way, etc.!

Since the inclusion of highest_sack, there's a lot of information that is very likely redundant (SACK fastpath hint stuff, fackets_out, highest_sack), though there's no ultimate guarantee that they'll remain the same the whole time (in all unearthly scenarios). Take advantage of this knowledge here and drop the fastpath hint and use direct access to the highest SACKed skb as a replacement.

Effectively the special-cased fastpath is dropped. This change adds some complexity to introduce a fastpath with better coverage, though the added complexity should make TCP behave in a more cache-friendly way.

The current ACK's SACK blocks are compared against each cached block individually and only ranges that are new are then scanned by the high constant walk.
For other parts of write queue, even when in previously known part of the SACK blocks, a faster skip function is used (if necessary at all). In addition, whenever possible, TCP fast-forwards to highest_sack skb that was made available by an earlier patch. In typical case, no other things but this fast-forward and mandatory markings after that occur making the access pattern quite similar to the former fastpath special case. DSACKs are special case that must always be walked. The local to recv_sack_cache copying could be more intelligent w.r.t DSACKs which are likely to be there only once but that is left to a separate patch. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- include/linux/tcp.h |3 - include/net/tcp.h |1 - net/ipv4/tcp_input.c | 277 +++-- net/ipv4/tcp_output.c | 14 +--- 4 files changed, 175 insertions(+), 120 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 794497c..08027f1 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -343,10 +343,7 @@ struct tcp_sock { struct sk_buff *scoreboard_skb_hint; struct sk_buff *retransmit_skb_hint; struct sk_buff *forward_skb_hint; - struct sk_buff *fastpath_skb_hint; - int fastpath_cnt_hint; /* Lags behind by current skb's pcount -* compared to respective fackets_out */ int lost_cnt_hint; int retransmit_cnt_hint; diff --git a/include/net/tcp.h b/include/net/tcp.h index 3444647..0844261 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1078,7 +1078,6 @@ static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp) static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) { tcp_clear_retrans_hints_partial(tp); - tp-fastpath_skb_hint = NULL; } /* MD5 Signature */ diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 69f2f79..5833b01 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1333,6 +1333,88 @@ static int tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp, return flag; } +static struct sk_buff *tcp_sacktag_walk(struct 
sk_buff *skb, struct sock *sk, + struct tcp_sack_block *next_dup, + u32 start_seq, u32 end_seq, + int dup_sack_in, int *fack_count, + int *reord, int *flag) +{ + struct tcp_sock *tp = tcp_sk(sk); + + tcp_for_write_queue_from(skb, sk) { + int in_sack = 0; + int dup_sack = dup_sack_in; + + if (skb == tcp_send_head(sk)) + break; + + /* queue is in-order = we can short-circuit the walk early */ + if (!before(TCP_SKB_CB(skb)-seq, end_seq)) + break; + + if ((next_dup != NULL) +
[PATCH 06/10] [TCP]: Prior_fackets can be replaced by highest_sack seq
Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b7af304..29fff81 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1251,7 +1251,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 	struct sk_buff *cached_skb;
 	int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
 	int reord = tp->packets_out;
-	int prior_fackets;
 	int flag = 0;
 	int found_dup_sack = 0;
 	int cached_fack_count;
@@ -1264,7 +1263,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 			tp->fackets_out = 0;
 		tp->highest_sack = tcp_write_queue_head(sk);
 	}
-	prior_fackets = tp->fackets_out;
 	found_dup_sack = tcp_check_dsack(tp, ack_skb, sp,
 					 num_sacks, prior_snd_una);
 
@@ -1457,7 +1455,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 					/* New sack for not retransmitted frame,
 					 * which was in hole. It is reordering.
 					 */
-					if (fack_count < prior_fackets)
+					if (before(TCP_SKB_CB(skb)->seq,
+						   tcp_highest_sack_seq(tp)))
 						reord = min(fack_count, reord);
 
 					/* SACK enhanced F-RTO (RFC4138; Appendix B) */
--
1.5.0.6
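Editor's sketch of the wraparound-safe sequence comparison that the before() call in this patch relies on, written for user space. The names seq_before and seq_after are hypothetical; to the editor's knowledge the kernel's before() and after() helpers use the same signed-difference idea, but treat that as an assumption rather than a quotation of the kernel source.

```c
#include <stdint.h>

/*
 * "seq1 comes before seq2" for 32-bit sequence numbers that wrap.
 * The subtraction is done modulo 2^32 and then read as a signed value,
 * so the result is correct as long as the two numbers are less than
 * 2^31 apart.
 */
static inline int seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* "seq2 comes after seq1" is the same test with the arguments swapped. */
#define seq_after(seq2, seq1)	seq_before(seq1, seq2)
```

A plain `<` on the raw values would report 0xfffffff0 as later than 0x10 even though, after wrapping, it is 32 bytes earlier; the signed difference gets this case right. That is why the patch can compare an skb's start sequence against the highest SACKed sequence instead of comparing packet counts.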
[PATCH 04/10] [TCP]: Convert highest_sack to sk_buff to allow direct access
It is going to replace the sack fastpath hint quite soon... :-) Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- include/linux/tcp.h |6 -- include/net/tcp.h | 10 ++ net/ipv4/tcp_input.c | 12 ++-- net/ipv4/tcp_output.c | 19 ++- 4 files changed, 30 insertions(+), 17 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index bac17c5..34acee6 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -332,8 +332,10 @@ struct tcp_sock { struct tcp_sack_block_wire recv_sack_cache[4]; - u32 highest_sack; /* Start seq of globally highest revd SACK -* (validity guaranteed only if sacked_out 0) */ + struct sk_buff *highest_sack; /* highest skb with SACK received +* (validity guaranteed only if +* sacked_out 0) +*/ /* from STCP, retrans queue hinting */ struct sk_buff* lost_skb_hint; diff --git a/include/net/tcp.h b/include/net/tcp.h index d695cea..3444647 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1306,6 +1306,16 @@ static inline int tcp_write_queue_empty(struct sock *sk) return skb_queue_empty(sk-sk_write_queue); } +/* Start sequence of the highest skb with SACKed bit, valid only if + * sacked 0 or when the caller has ensured validity by itself. 
+ */ +static inline u32 tcp_highest_sack_seq(struct tcp_sock *tp) +{ + if (!tp-sacked_out) + return tp-snd_una; + return TCP_SKB_CB(tp-highest_sack)-seq; +} + /* /proc */ enum tcp_seq_states { TCP_SEQ_STATE_LISTENING, diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ef8187b..c25704f 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1245,7 +1245,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)3; int reord = tp-packets_out; int prior_fackets; - u32 highest_sack_end_seq = tp-lost_retrans_low; + u32 highest_sack_end_seq; int flag = 0; int found_dup_sack = 0; int cached_fack_count; @@ -1256,7 +1256,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ if (!tp-sacked_out) { if (WARN_ON(tp-fackets_out)) tp-fackets_out = 0; - tp-highest_sack = tp-snd_una; + tp-highest_sack = tcp_write_queue_head(sk); } prior_fackets = tp-fackets_out; @@ -1483,10 +1483,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ if (fack_count tp-fackets_out) tp-fackets_out = fack_count; - if (after(TCP_SKB_CB(skb)-seq, tp-highest_sack)) { - tp-highest_sack = TCP_SKB_CB(skb)-seq; - highest_sack_end_seq = TCP_SKB_CB(skb)-end_seq; - } + if (after(TCP_SKB_CB(skb)-seq, tcp_highest_sack_seq(tp))) + tp-highest_sack = skb; + } else { if (dup_sack (sackedTCPCB_RETRANS)) reord = min(fack_count, reord); @@ -1514,6 +1513,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ flag = ~FLAG_ONLY_ORIG_SACKED; } + highest_sack_end_seq = TCP_SKB_CB(tp-highest_sack)-end_seq; if (tcp_is_fack(tp) tp-retrans_out after(highest_sack_end_seq, tp-lost_retrans_low) icsk-icsk_ca_state == TCP_CA_Recovery) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 324b420..a5863f9 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -657,13 +657,15 @@ static void tcp_set_skb_tso_segs(struct sock *sk, 
struct sk_buff *skb, unsigned * tweak SACK fastpath hint too as it would overwrite all changes unless * hint is also changed. */ -static void tcp_adjust_fackets_out(struct tcp_sock *tp, struct sk_buff *skb, +static void tcp_adjust_fackets_out(struct sock *sk, struct sk_buff *skb, int decr) { + struct tcp_sock *tp = tcp_sk(sk); + if (!tp-sacked_out || tcp_is_reno(tp)) return; - if (!before(tp-highest_sack, TCP_SKB_CB(skb)-seq)) + if (!before(tcp_highest_sack_seq(tp), TCP_SKB_CB(skb)-seq)) tp-fackets_out -= decr; /* cnt_hint is off-by-one compared with fackets_out (see sacktag) */ @@ -712,9 +714,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss TCP_SKB_CB(buff)-end_seq = TCP_SKB_CB(skb)-end_seq;
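For readers following the highest_sack change, here is a minimal userspace sketch of the idea: keep a pointer to the highest SACKed skb and derive the sequence number on demand. The `model_*` types and `model_mark_sacked()` are hypothetical stand-ins invented for illustration; only the shape of the accessor follows tcp_highest_sack_seq() from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe sequence comparisons, in the style of the kernel's
 * before()/after() helpers. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static int seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

/* Hypothetical stand-ins for struct sk_buff and struct tcp_sock. */
struct model_skb {
	uint32_t seq;
	uint32_t end_seq;
};

struct model_tp {
	uint32_t snd_una;
	unsigned int sacked_out;
	struct model_skb *highest_sack;	/* only meaningful if sacked_out > 0 */
};

/* Same shape as tcp_highest_sack_seq() in the patch. */
static uint32_t model_highest_sack_seq(const struct model_tp *tp)
{
	if (!tp->sacked_out)
		return tp->snd_una;
	assert(tp->highest_sack != NULL);
	return tp->highest_sack->seq;
}

/* Simplified bookkeeping for one newly SACKed skb (an assumption of this
 * sketch, not the kernel's exact flow). */
static void model_mark_sacked(struct model_tp *tp, struct model_skb *skb)
{
	if (!tp->sacked_out || seq_after(skb->seq, model_highest_sack_seq(tp)))
		tp->highest_sack = skb;
	tp->sacked_out++;
}
```

With the pointer stored, both the start and the end sequence of the highest SACKed skb come from one field, which is why the separate highest_sack_end_seq tracking inside the loop can go away.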
[RFC PATCH net-2.6.25 0/10] [TCP]: Cleanups, tweaks & sacktag recode
Hi Dave,

Here's the sacktag recv_sack_cache usage rewrite which you were
interested in looking at earlier; due to other fixes it has dragged
on this long... Besides that, a couple of new bugs^W^Wcleanups &
tweaks are there as well :-). I'll probably have to summon the
"create tcp_sacktag_state" patch back to avoid all that pointer
passing all around. But those won't be earth-shattering changes.

The first two are probably trivial enough to be accepted as is.

Boot & simple transfer tested, minor fixes after that. I'll try to
arrange time at some point to do more verification for the new
sacktag and rfc3517 code, and compare old and new sacktag to get
some numbers on accessed skbs per sacktag operation.

-- 
 i.
[PATCH 01/10] [TCP]: Move !in_sack test earlier in sacktag & reorganize if()s
All intermediate conditions include it already, make them
simpler as well.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   31 ++++++++++++++-----------------
 1 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0f0c1c9..c470b5a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1406,28 +1406,25 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 			if (unlikely(in_sack < 0))
 				break;
 
+			if (!in_sack) {
+				fack_count += tcp_skb_pcount(skb);
+				continue;
+			}
+
 			sacked = TCP_SKB_CB(skb)->sacked;
 
 			/* Account D-SACK for retransmitted packet. */
-			if ((dup_sack && in_sack) &&
-			    (sacked & TCPCB_RETRANS) &&
-			    after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker))
-				tp->undo_retrans--;
-
-			/* The frame is ACKed. */
-			if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una)) {
-				if (sacked&TCPCB_RETRANS) {
-					if ((dup_sack && in_sack) &&
-					    (sacked&TCPCB_SACKED_ACKED))
-						reord = min(fack_count, reord);
-				}
-
-				/* Nothing to do; acked frame is about to be dropped. */
-				fack_count += tcp_skb_pcount(skb);
-				continue;
+			if (dup_sack && (sacked & TCPCB_RETRANS)) {
+				if (after(TCP_SKB_CB(skb)->end_seq, tp->undo_marker))
+					tp->undo_retrans--;
+				if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una) &&
+				    (sacked & TCPCB_SACKED_ACKED))
+					reord = min(fack_count, reord);
 			}
 
-			if (!in_sack) {
+
+			/* Nothing to do; acked frame is about to be dropped (was ACKed). */
+			if (!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una)) {
 				fack_count += tcp_skb_pcount(skb);
 				continue;
 			}
-- 
1.5.0.6
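The claim that the reordered tests behave the same can be checked by brute force. The sketch below is a userspace model, not kernel code: each condition from the hunk is reduced to a boolean (`past_undo` stands for the after(end_seq, undo_marker) test, `acked` for !after(end_seq, snd_una)), and `old_order()`/`new_order()` are hypothetical names for the two arrangements.

```c
#include <assert.h>
#include <stdbool.h>

/* What one loop iteration does to the skb under inspection. */
struct outcome {
	bool undo_dec;	/* tp->undo_retrans-- happened */
	bool reord_upd;	/* reord = min(fack_count, reord) happened */
	bool skip;	/* fack_count += pcount; continue; */
};

/* Arrangement before the patch. */
static struct outcome old_order(bool in_sack, bool dup_sack, bool retrans,
				bool sacked_acked, bool past_undo, bool acked)
{
	struct outcome o = { false, false, false };

	if (dup_sack && in_sack && retrans && past_undo)
		o.undo_dec = true;
	if (acked) {
		if (retrans && dup_sack && in_sack && sacked_acked)
			o.reord_upd = true;
		o.skip = true;
		return o;
	}
	if (!in_sack)
		o.skip = true;
	return o;
}

/* Arrangement after the patch: !in_sack is tested first. */
static struct outcome new_order(bool in_sack, bool dup_sack, bool retrans,
				bool sacked_acked, bool past_undo, bool acked)
{
	struct outcome o = { false, false, false };

	if (!in_sack) {
		o.skip = true;
		return o;
	}
	if (dup_sack && retrans) {
		if (past_undo)
			o.undo_dec = true;
		if (acked && sacked_acked)
			o.reord_upd = true;
	}
	if (acked)
		o.skip = true;
	return o;
}

/* Compare both arrangements over all 64 input combinations. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (unsigned int m = 0; m < 64; m++) {
		bool in_sack = m & 1, dup_sack = m & 2, retrans = m & 4;
		bool sacked_acked = m & 8, past_undo = m & 16, acked = m & 32;
		struct outcome a = old_order(in_sack, dup_sack, retrans,
					     sacked_acked, past_undo, acked);
		struct outcome b = new_order(in_sack, dup_sack, retrans,
					     sacked_acked, past_undo, acked);

		if (a.undo_dec != b.undo_dec || a.reord_upd != b.reord_upd ||
		    a.skip != b.skip)
			mismatches++;
	}
	return mismatches;
}
```

In this model the two orderings agree on every input, which matches the commit message's point that every intermediate condition already included in_sack.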
[PATCH 3/10] [TCP]: non-FACK SACK follows conservative SACK loss recovery
Many assumptions that are true when no reordering or other
strange events happen are not a part of the RFC3517. FACK
implementation is based on such assumptions. Previously (before
the rewrite) the non-FACK SACK was basically doing fast rexmit
and then it times out all skbs when first cumulative ACK arrives,
which cannot really be called SACK based recovery :-).

RFC3517 SACK disables these things:
- Per SKB timeouts & head timeout entry to recovery
- Marking at least one skb while in recovery (RFC3517 does this
  only for the fast retransmission but not for the other skbs
  when cumulative ACKs arrive in the recovery)
- Sacktag's loss detection flavors B and C (see comment before
  tcp_sacktag_write_queue)

This does not implement the "last resort" rule 3 of NextSeg, which
allows retransmissions also when not enough SACK blocks have
yet arrived above a segment for IsLost to return true [RFC3517].

The implementation differs from RFC3517 in these points:
- Rate-halving is used instead of FlightSize / 2
- Instead of using dupACKs to trigger the recovery, the number
  of SACK blocks is used as FACK does with SACK blocks+holes
  (which provides a more accurate number). It seems that the
  difference can affect negatively only if the receiver does not
  generate SACK blocks at all even though it claimed to be
  SACK-capable.
- Dupthresh is not a constant one. Dynamical adjustments include
  both holes and sacked segments (equal to what FACK has) due to
  the complexity involved in determining the number of sacked
  blocks between highest_sack and the reordered segment. Thus it
  will be an over-estimate.

Implementation note:

tcp_clean_rtx_queue doesn't need a lost_cnt tweak because head
skb at that point cannot be SACKED_ACKED (nor would such
situation last for long enough to cause problems).
Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- net/ipv4/tcp_input.c | 80 ++--- 1 files changed, 62 insertions(+), 18 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 48c059d..c1b5339 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -863,6 +863,9 @@ void tcp_enter_cwr(struct sock *sk, const int set_ssthresh) */ static void tcp_disable_fack(struct tcp_sock *tp) { + /* RFC3517 uses different metric in lost marker = reset on change */ + if (tcp_is_fack(tp)) + tp-lost_skb_hint = NULL; tp-rx_opt.sack_ok = ~2; } @@ -1470,6 +1473,13 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ tp-sacked_out += tcp_skb_pcount(skb); fack_count += tcp_skb_pcount(skb); + + /* Lost marker hint past SACKed? Tweak RFC3517 cnt */ + if (!tcp_is_fack(tp) (tp-lost_skb_hint != NULL) + before(TCP_SKB_CB(skb)-seq, + TCP_SKB_CB(tp-lost_skb_hint)-seq)) + tp-lost_cnt_hint += tcp_skb_pcount(skb); + if (fack_count tp-fackets_out) tp-fackets_out = fack_count; @@ -1504,7 +1514,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ flag = ~FLAG_ONLY_ORIG_SACKED; } - if (tp-retrans_out + if (tcp_is_fack(tp) tp-retrans_out after(highest_sack_end_seq, tp-lost_retrans_low) icsk-icsk_ca_state == TCP_CA_Recovery) flag |= tcp_mark_lost_retrans(sk, highest_sack_end_seq); @@ -1858,6 +1868,26 @@ static inline int tcp_fackets_out(struct tcp_sock *tp) return tcp_is_reno(tp) ? tp-sacked_out+1 : tp-fackets_out; } +/* Heurestics to calculate number of duplicate ACKs. There's no dupACKs + * counter when SACK is enabled (without SACK, sacked_out is used for + * that purpose). + * + * Instead, with FACK TCP uses fackets_out that includes both SACKed + * segments up to the highest received SACK block so far and holes in + * between them. 
+ * + * With reordering, holes may still be in flight, so RFC3517 recovery + * uses pure sacked_out (total number of SACKed segments) even though + * it violates the RFC that uses duplicate ACKs, often these are equal + * but when e.g. out-of-window ACKs or packet duplication occurs, + * they differ. Since neither occurs due to loss, TCP should really + * ignore them. + */ +static inline int tcp_dupack_heurestics(struct tcp_sock *tp) +{ + return tcp_is_fack(tp) ? tp-fackets_out : tp-sacked_out + 1; +} + static inline int tcp_skb_timedout(struct sock *sk, struct sk_buff *skb) { return (tcp_time_stamp - TCP_SKB_CB(skb)-when inet_csk(sk)-icsk_rto); @@ -1978,13 +2008,13 @@ static int tcp_time_to_recover(struct sock *sk) return 1; /* Not-A-Trick#2 : Classic rule... */ - if (tcp_fackets_out(tp)
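The difference between the two dupACK estimates is easiest to see with numbers. The following userspace sketch uses a hypothetical `model_tp` struct; `model_dupack_heuristics()` follows the tcp_dupack_heurestics() helper added above, while the comparison against `reordering` in `model_time_to_recover()` is an assumption about how the truncated "classic rule" hunk uses it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few tcp_sock fields involved. */
struct model_tp {
	bool fack;			/* FACK enabled? */
	unsigned int fackets_out;	/* SACKed segments plus holes below highest SACK */
	unsigned int sacked_out;	/* SACKed segments only */
	unsigned int reordering;	/* current dupthresh estimate */
};

/* FACK counts holes as well; RFC3517 mode counts SACKed segments + 1. */
static unsigned int model_dupack_heuristics(const struct model_tp *tp)
{
	assert(!tp->fack || tp->sacked_out <= tp->fackets_out);
	return tp->fack ? tp->fackets_out : tp->sacked_out + 1;
}

/* Assumed form of the classic rule: recover once the estimate
 * exceeds the reordering threshold. */
static bool model_time_to_recover(const struct model_tp *tp)
{
	return model_dupack_heuristics(tp) > tp->reordering;
}
```

For example, with five segments in flight, segments 3 and 5 SACKed and reordering at 3, FACK sees fackets_out = 5 and enters recovery, while the RFC3517 path sees sacked_out + 1 = 3 and waits for one more SACKed segment.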
[PATCH 02/10] [TCP]: Extend reordering detection to cover CA_Loss partially
This implements more accurately what is stated in sacktag's
overall comment:

  "Both of these heuristics are not used in Loss state, when
   we cannot account for retransmits accurately."

When CA_Loss state is entered, the state changer ensures that
undo_marker is only set if no TCPCB_RETRANS skbs were found,
thus having non-zero undo_marker in CA_Loss basically tells
that the R-bits still accurately reflect the current state
of TCP.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c470b5a..48c059d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1511,7 +1511,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 
 	tcp_verify_left_out(tp);
 
-	if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
+	if ((reord < tp->fackets_out) &&
+	    ((icsk->icsk_ca_state != TCP_CA_Loss) || tp->undo_marker) &&
 	    (!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
 		tcp_update_reordering(sk, tp->fackets_out - reord, 0);
 
-- 
1.5.0.6
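A small userspace sketch of the resulting predicate may help when reading the one-line change. `model_state` and `model_may_update_reordering()` are hypothetical names; `in_loss` stands in for icsk_ca_state == TCP_CA_Loss, and the sequence comparison mimics the kernel's after().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is after b", in the style of the kernel's after(). */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical stand-in for the state the condition looks at. */
struct model_state {
	bool in_loss;		/* icsk_ca_state == TCP_CA_Loss */
	uint32_t undo_marker;	/* non-zero: R-bits are still trustworthy */
	uint32_t frto_highmark;
	uint32_t snd_una;
	int fackets_out;
};

/* Reordering may be updated outside CA_Loss, or inside it while
 * undo_marker is still set. */
static bool model_may_update_reordering(const struct model_state *s, int reord)
{
	assert(s->fackets_out >= 0);
	return reord < s->fackets_out &&
	       (!s->in_loss || s->undo_marker) &&
	       (!s->frto_highmark || seq_after(s->snd_una, s->frto_highmark));
}
```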
[PATCH 07/10] [TCP]: Create tcp_sacktag_one().
Worker function that implements the main logic of the inner-most loop of tcp_sacktag_write_queue(). Idea was originally presented by David S. Miller. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- net/ipv4/tcp_input.c | 192 +- 1 files changed, 96 insertions(+), 96 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 29fff81..b301abb 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1240,6 +1240,99 @@ static int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb, return in_sack; } +static int tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp, + int *reord, int dup_sack, int fack_count) +{ + u8 sacked = TCP_SKB_CB(skb)-sacked; + int flag = 0; + + /* Account D-SACK for retransmitted packet. */ + if (dup_sack (sacked TCPCB_RETRANS)) { + if (after(TCP_SKB_CB(skb)-end_seq, tp-undo_marker)) + tp-undo_retrans--; + if (!after(TCP_SKB_CB(skb)-end_seq, tp-snd_una) + (sacked TCPCB_SACKED_ACKED)) + *reord = min(fack_count, *reord); + } + + /* Nothing to do; acked frame is about to be dropped (was ACKed). */ + if (!after(TCP_SKB_CB(skb)-end_seq, tp-snd_una)) + return flag; + + if (!(sacked TCPCB_SACKED_ACKED)) { + if (sacked TCPCB_SACKED_RETRANS) { + /* If the segment is not tagged as lost, +* we do not clear RETRANS, believing +* that retransmission is still in flight. +*/ + if (sacked TCPCB_LOST) { + TCP_SKB_CB(skb)-sacked = + ~(TCPCB_LOST|TCPCB_SACKED_RETRANS); + tp-lost_out -= tcp_skb_pcount(skb); + tp-retrans_out -= tcp_skb_pcount(skb); + + /* clear lost hint */ + tp-retransmit_skb_hint = NULL; + } + } else { + if (!(sacked TCPCB_RETRANS)) { + /* New sack for not retransmitted frame, +* which was in hole. It is reordering. 
+*/ + if (before(TCP_SKB_CB(skb)-seq, + tcp_highest_sack_seq(tp))) + *reord = min(fack_count, *reord); + + /* SACK enhanced F-RTO (RFC4138; Appendix B) */ + if (!after(TCP_SKB_CB(skb)-end_seq, tp-frto_highmark)) + flag |= FLAG_ONLY_ORIG_SACKED; + } + + if (sacked TCPCB_LOST) { + TCP_SKB_CB(skb)-sacked = ~TCPCB_LOST; + tp-lost_out -= tcp_skb_pcount(skb); + + /* clear lost hint */ + tp-retransmit_skb_hint = NULL; + } + } + + TCP_SKB_CB(skb)-sacked |= TCPCB_SACKED_ACKED; + flag |= FLAG_DATA_SACKED; + tp-sacked_out += tcp_skb_pcount(skb); + + fack_count += tcp_skb_pcount(skb); + + /* Lost marker hint past SACKed? Tweak RFC3517 cnt */ + if (!tcp_is_fack(tp) (tp-lost_skb_hint != NULL) + before(TCP_SKB_CB(skb)-seq, + TCP_SKB_CB(tp-lost_skb_hint)-seq)) + tp-lost_cnt_hint += tcp_skb_pcount(skb); + + if (fack_count tp-fackets_out) + tp-fackets_out = fack_count; + + if (after(TCP_SKB_CB(skb)-seq, tcp_highest_sack_seq(tp))) + tp-highest_sack = skb; + + } else { + if (dup_sack (sacked TCPCB_RETRANS)) + *reord = min(fack_count, *reord); + } + + /* D-SACK. We can detect redundant retransmission in S|R and plain R +* frames and clear it. undo_retrans is decreased above, L|R frames +* are accounted above as well. +*/ + if (dup_sack (TCP_SKB_CB(skb)-sacked TCPCB_SACKED_RETRANS)) { + TCP_SKB_CB(skb)-sacked = ~TCPCB_SACKED_RETRANS; + tp-retrans_out -= tcp_skb_pcount(skb); + tp-retransmit_skb_hint = NULL; + } + + return flag; +} + static int tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_una) { @@ -1375,7 +1468,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ tcp_for_write_queue_from(skb, sk) { int in_sack = 0; - u8 sacked; if (skb
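The tag handling inside tcp_sacktag_one() can be followed more easily in isolation. The sketch below is a reduced userspace model of only the "newly SACKed" branch: the `M_*` bits, `model_counts` and `model_sack_one()` are hypothetical, and D-SACK handling, reordering detection and hint updates are deliberately left out.

```c
#include <assert.h>

/* Hypothetical tag bits modelled on the S, R and L bits of the skb. */
#define M_SACKED_ACKED		0x1u
#define M_SACKED_RETRANS	0x2u
#define M_LOST			0x4u

struct model_counts {
	int sacked_out;
	int lost_out;
	int retrans_out;
};

/* Apply a SACK to one skb and return its new tag bits. */
static unsigned int model_sack_one(unsigned int sacked, int pcount,
				   struct model_counts *c)
{
	assert(pcount > 0);

	if (sacked & M_SACKED_ACKED)
		return sacked;		/* already SACKed, nothing to account */

	if (sacked & M_SACKED_RETRANS) {
		/* Not tagged lost: keep R, the retransmission may still
		 * be in flight. */
		if (sacked & M_LOST) {
			sacked &= ~(M_LOST | M_SACKED_RETRANS);
			c->lost_out -= pcount;
			c->retrans_out -= pcount;
		}
	} else if (sacked & M_LOST) {
		sacked &= ~M_LOST;
		c->lost_out -= pcount;
	}

	sacked |= M_SACKED_ACKED;
	c->sacked_out += pcount;
	return sacked;
}
```

Pulling this logic into a worker that returns its result, as the patch does with the flag bits and the `reord` pointer, keeps the caller's loop down to matching skbs against SACK blocks.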
[PATCH 08/10] [TCP]: Earlier SACK block verification & simplify access to them
Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- include/linux/tcp.h |2 +- net/ipv4/tcp_input.c | 85 ++ 2 files changed, 52 insertions(+), 35 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 34acee6..794497c 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -330,7 +330,7 @@ struct tcp_sock { struct tcp_sack_block duplicate_sack[1]; /* D-SACK block */ struct tcp_sack_block selective_acks[4]; /* The SACKS themselves*/ - struct tcp_sack_block_wire recv_sack_cache[4]; + struct tcp_sack_block recv_sack_cache[4]; struct sk_buff *highest_sack; /* highest skb with SACK received * (validity guaranteed only if diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b301abb..69f2f79 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1340,9 +1340,11 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ struct tcp_sock *tp = tcp_sk(sk); unsigned char *ptr = (skb_transport_header(ack_skb) + TCP_SKB_CB(ack_skb)-sacked); - struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2); + struct tcp_sack_block_wire *sp_wire = (struct tcp_sack_block_wire *)(ptr+2); + struct tcp_sack_block sp[4]; struct sk_buff *cached_skb; int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)3; + int used_sacks; int reord = tp-packets_out; int flag = 0; int found_dup_sack = 0; @@ -1357,7 +1359,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ tp-highest_sack = tcp_write_queue_head(sk); } - found_dup_sack = tcp_check_dsack(tp, ack_skb, sp, + found_dup_sack = tcp_check_dsack(tp, ack_skb, sp_wire, num_sacks, prior_snd_una); if (found_dup_sack) flag |= FLAG_DSACKING_ACK; @@ -1372,14 +1374,49 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ if (!tp-packets_out) goto out; + used_sacks = 0; + first_sack_index = 0; + for (i = 0; i num_sacks; i++) { + int dup_sack = !i found_dup_sack; + + sp[used_sacks].start_seq = 
ntohl(get_unaligned(sp_wire[i].start_seq)); + sp[used_sacks].end_seq = ntohl(get_unaligned(sp_wire[i].end_seq)); + + if (!tcp_is_sackblock_valid(tp, dup_sack, + sp[used_sacks].start_seq, + sp[used_sacks].end_seq)) { + if (dup_sack) { + if (!tp-undo_marker) + NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDNOUNDO); + else + NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDOLD); + } else { + /* Don't count olds caused by ACK reordering */ + if ((TCP_SKB_CB(ack_skb)-ack_seq != tp-snd_una) + !after(sp[used_sacks].end_seq, tp-snd_una)) + continue; + NET_INC_STATS_BH(LINUX_MIB_TCPSACKDISCARD); + } + if (i == 0) + first_sack_index = -1; + continue; + } + + /* Ignore very old stuff early */ + if (!after(sp[used_sacks].end_seq, prior_snd_una)) + continue; + + used_sacks++; + } + /* SACK fastpath: * if the only SACK change is the increase of the end_seq of * the first block then only apply that SACK block * and use retrans queue hinting otherwise slowpath */ force_one_sack = 1; - for (i = 0; i num_sacks; i++) { - __be32 start_seq = sp[i].start_seq; - __be32 end_seq = sp[i].end_seq; + for (i = 0; i used_sacks; i++) { + u32 start_seq = sp[i].start_seq; + u32 end_seq = sp[i].end_seq; if (i == 0) { if (tp-recv_sack_cache[i].start_seq != start_seq) @@ -1398,19 +1435,17 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ tp-recv_sack_cache[i].end_seq = 0; } - first_sack_index = 0; if (force_one_sack) - num_sacks = 1; + used_sacks = 1; else { int j; tp-fastpath_skb_hint = NULL; /* order SACK blocks to allow in order walk of the retrans queue */ - for (i = num_sacks-1; i 0; i--) { + for (i = used_sacks - 1; i 0; i--) {
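The first loop of this patch converts the wire-format SACK blocks to host order once and drops unusable ones before any queue walking starts. A userspace sketch of that step follows; `collect_sacks()` and `load_be32()` are hypothetical helpers, and the validity test here is a much simpler assumption than tcp_is_sackblock_valid().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SACKS 4

struct sack_block {
	uint32_t start_seq;
	uint32_t end_seq;
};

/* Wraparound-safe "a is after b", in the style of the kernel's after(). */
static int seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Read a big-endian 32-bit value from a possibly unaligned buffer. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Copy usable blocks into out[] in host order and return how many
 * were kept. Each wire block is 8 bytes: start_seq then end_seq. */
static int collect_sacks(const unsigned char *wire, int num_sacks,
			 uint32_t prior_snd_una, struct sack_block *out)
{
	int used = 0;

	assert(num_sacks >= 0 && num_sacks <= MAX_SACKS);

	for (int i = 0; i < num_sacks; i++) {
		struct sack_block b;

		b.start_seq = load_be32(wire + 8 * i);
		b.end_seq = load_be32(wire + 8 * i + 4);

		/* Simplified validity check: block must not be inverted. */
		if (!seq_after(b.end_seq, b.start_seq))
			continue;

		/* Ignore very old stuff early. */
		if (!seq_after(b.end_seq, prior_snd_una))
			continue;

		out[used++] = b;
	}
	return used;
}
```

Later stages then only ever touch `out[0..used-1]` in host byte order, which is the simplification the subject line refers to.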
[PATCH 10/10] [TCP]: Track sacktag (DEVEL PATCH)
This is not intented to go to mainline, provided just for those who are interested enough about the algorithm internals during a test. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- include/linux/snmp.h | 19 +++ net/ipv4/proc.c | 19 +++ net/ipv4/tcp_input.c | 50 -- 3 files changed, 86 insertions(+), 2 deletions(-) diff --git a/include/linux/snmp.h b/include/linux/snmp.h index 89f0c2b..fbcd62d 100644 --- a/include/linux/snmp.h +++ b/include/linux/snmp.h @@ -214,6 +214,25 @@ enum LINUX_MIB_TCPDSACKIGNOREDOLD, /* TCPSACKIgnoredOld */ LINUX_MIB_TCPDSACKIGNOREDNOUNDO,/* TCPSACKIgnoredNoUndo */ LINUX_MIB_TCPSPURIOUSRTOS, /* TCPSpuriousRTOs */ + LINUX_MIB_TCP_SACK0, + LINUX_MIB_TCP_SACK1, + LINUX_MIB_TCP_SACK2, + LINUX_MIB_TCP_SACK3, + LINUX_MIB_TCP_SACK4, + LINUX_MIB_TCP_WALKEDSKBS, + LINUX_MIB_TCP_WALKEDDSACKS, + LINUX_MIB_TCP_SKIPPEDSKBS, + LINUX_MIB_TCP_NOCACHE, + LINUX_MIB_TCP_HEADWALK, + LINUX_MIB_TCP_FULLSKIP, + LINUX_MIB_TCP_TAILSKIP, + LINUX_MIB_TCP_HEADSKIP_TOHIGH, + LINUX_MIB_TCP_TAIL_TOHIGH, + LINUX_MIB_TCP_HEADSKIP, + LINUX_MIB_TCP_NEWSKIP, + LINUX_MIB_TCP_FULLWALK, + LINUX_MIB_TCP_TAILWALK, + LINUX_MIB_TCP_CACHEREMAINING, __LINUX_MIB_MAX }; diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index ce34b28..a5e842d 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -227,6 +227,25 @@ static const struct snmp_mib snmp4_net_list[] = { SNMP_MIB_ITEM(TCPDSACKIgnoredOld, LINUX_MIB_TCPDSACKIGNOREDOLD), SNMP_MIB_ITEM(TCPDSACKIgnoredNoUndo, LINUX_MIB_TCPDSACKIGNOREDNOUNDO), SNMP_MIB_ITEM(TCPSpuriousRTOs, LINUX_MIB_TCPSPURIOUSRTOS), + SNMP_MIB_ITEM(TCP_SACK0, LINUX_MIB_TCP_SACK0), + SNMP_MIB_ITEM(TCP_SACK1, LINUX_MIB_TCP_SACK1), + SNMP_MIB_ITEM(TCP_SACK2, LINUX_MIB_TCP_SACK2), + SNMP_MIB_ITEM(TCP_SACK3, LINUX_MIB_TCP_SACK3), + SNMP_MIB_ITEM(TCP_SACK4, LINUX_MIB_TCP_SACK4), + SNMP_MIB_ITEM(TCP_WALKEDSKBS, LINUX_MIB_TCP_WALKEDSKBS), + SNMP_MIB_ITEM(TCP_WALKEDDSACKS, LINUX_MIB_TCP_WALKEDDSACKS), + SNMP_MIB_ITEM(TCP_SKIPPEDSKBS, LINUX_MIB_TCP_SKIPPEDSKBS), + 
SNMP_MIB_ITEM(TCP_NOCACHE, LINUX_MIB_TCP_NOCACHE), + SNMP_MIB_ITEM(TCP_FULLWALK, LINUX_MIB_TCP_FULLWALK), + SNMP_MIB_ITEM(TCP_HEADWALK, LINUX_MIB_TCP_HEADWALK), + SNMP_MIB_ITEM(TCP_TAILWALK, LINUX_MIB_TCP_TAILWALK), + SNMP_MIB_ITEM(TCP_FULLSKIP, LINUX_MIB_TCP_FULLSKIP), + SNMP_MIB_ITEM(TCP_TAILSKIP, LINUX_MIB_TCP_TAILSKIP), + SNMP_MIB_ITEM(TCP_HEADSKIP, LINUX_MIB_TCP_HEADSKIP), + SNMP_MIB_ITEM(TCP_HEADSKIP_TOHIGH, LINUX_MIB_TCP_HEADSKIP_TOHIGH), + SNMP_MIB_ITEM(TCP_TAIL_TOHIGH, LINUX_MIB_TCP_TAIL_TOHIGH), + SNMP_MIB_ITEM(TCP_NEWSKIP, LINUX_MIB_TCP_NEWSKIP), + SNMP_MIB_ITEM(TCP_CACHEREMAINING, LINUX_MIB_TCP_CACHEREMAINING), SNMP_MIB_SENTINEL }; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 5833b01..87ab327 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1370,6 +1370,10 @@ static struct sk_buff *tcp_sacktag_walk(struct sk_buff *skb, struct sock *sk, *flag |= tcp_sacktag_one(skb, tp, reord, dup_sack, *fack_count); *fack_count += tcp_skb_pcount(skb); + + NET_INC_STATS_BH(LINUX_MIB_TCP_WALKEDSKBS); + if (dup_sack) + NET_INC_STATS_BH(LINUX_MIB_TCP_WALKEDDSACKS); } return skb; } @@ -1386,6 +1390,8 @@ static struct sk_buff *tcp_sacktag_skip(struct sk_buff *skb, struct sock *sk, if (before(TCP_SKB_CB(skb)-end_seq, skip_to_seq)) break; + + NET_INC_STATS_BH(LINUX_MIB_TCP_SKIPPEDSKBS); } return skb; } @@ -1434,6 +1440,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ int fack_count; int i, j; int first_sack_index; + int fullwalk = 1; if (!tp-sacked_out) { if (WARN_ON(tp-fackets_out)) @@ -1523,6 +1530,17 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ cache++; } + switch (used_sacks) { + case 0: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK0); break; + case 1: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK1); break; + case 2: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK2); break; + case 3: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK3); break; + case 4: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK4); break; + } + + if 
(!tcp_sack_cache_ok(tp, cache)) + NET_INC_STATS_BH(LINUX_MIB_TCP_NOCACHE); + while (i used_sacks) { u32 start_seq = sp[i].start_seq; u32 end_seq = sp[i].end_seq; @@
Re: [PATCH 1/2] cleanup pernet operation without CONFIG_NET_NS
Denis V. Lunev [EMAIL PROTECTED] writes:

> If CONFIG_NET_NS is not set, only one namespace is possible.
>
> This patch removes the list of pernet_operations and cleans up the
> code a bit. This list is not needed if there are no namespaces. We
> should just call the ->init method.
>
> Additionally, ->exit will be called on module unloading only. This
> case is safe - the code is not discarded. For the in-kernel code,
> ->exit should never be called.

This patch looks sane, and reasonable in the !CONFIG_NET_NS case.

Eric
Re: [PATCH 2/2] move unneeded data to initdata section
Eric W. Biederman wrote:
> Denis V. Lunev [EMAIL PROTECTED] writes:
>
>> This patch reverts Eric's commit 2b008b0a8e96b726c603c5e1a5a7a509b5f61e35
>> It diets .text & .data section of the kernel if CONFIG_NET_NS is not set.
>> This is safe after list operations cleanup.
>
> Ok. This patch is technically safe because none of the touched code
> can live in a module and so we never touch the exit code path.
>
> However in the general case and as a code idiom this __net_initdata
> on struct pernet_operations is fundamentally horribly broken.
>
> Look at what happens if we use this idiom in a module. There is only
> one definition of __initdata .init.data. The module loader places all
> sections that begin with .init in a region of memory that will be
> discarded after module initialization.

Nothing is discarded after module load. Though, I can be wrong. Could
you point me to the exact place?

Regards,
	Den
Re: [PATCH 2/2] move unneeded data to initdata section
Denis V. Lunev [EMAIL PROTECTED] writes:

> This patch reverts Eric's commit 2b008b0a8e96b726c603c5e1a5a7a509b5f61e35
> It diets .text & .data section of the kernel if CONFIG_NET_NS is not set.
> This is safe after list operations cleanup.

Ok. This patch is technically safe because none of the touched code
can live in a module and so we never touch the exit code path.

However in the general case and as a code idiom this __net_initdata
on struct pernet_operations is fundamentally horribly broken.

Look at what happens if we use this idiom in a module. There is only
one definition of __initdata .init.data. The module loader places all
sections that begin with .init in a region of memory that will be
discarded after module initialization.

So in register_pernet_operations we pass in a pointer to struct
pernet_operations and call the init method. Later when we remove the
module we again pass in the pointer to struct pernet_operations, which
lived in an init section, so it has been discarded. We dereference that
pointer to find the exit method and KABOOM

So I'm still opposed to __net_initdata on the grounds that at best
it is like putting our head under a guillotine and reaching up and
sawing at the rope that holds the blade up with a pocket knife. It
is a thick rope and a puny knife so you are safe for a while

Eric
Re: [alsa-devel] [BUG] New Kernel Bugs
On 15-11-07 14:00, Jörn Engel wrote:

> And even without mails being held hostage for weeks, every single
> moderation mail is annoying. Like the one I'm sure to receive after
> sending this out.

Certainly. Up to this thread I wasn't actually aware the list was doing
that. While it might be informative once, getting it each time quickly
gets old.

Don't know if mailman can do anything like it but I'd suggest anyone
running a non-subscriber-moderation list configure it to send such
messages at most once a time-period per address or some such. And just
disable the message if it cannot do that.

Fortunately, alsa-devel is (almost) no longer such a list anyway as it's
moving to vger. Hurrah. David -- thanks.

Rene.
Re: [PATCH 2/7] CAN: Add PF_CAN core module
> > > +
> > > +struct timer_list stattimer; /* timer for statistics update */
> > > +struct s_stats stats; /* packet statistics */
> > > +struct s_pstats pstats; /* receive list statistics */
> >
> > More global variables without prefix.
>
> These variables are not exported with EXPORT_SYMBOL, so there should
> be no name conflict. They cannot be made static because they are used
> in af_can.c and proc.c. Nevertheless we can prefix them with can_ if
> you still think it's necessary.

When this is built-in they will be in the global kernel namespace.
So please add can_ prefix.

	Sam
Re: [PATCH 2/7] CAN: Add PF_CAN core module
On Thu, Nov 15, 2007 at 04:05:30AM -0800, David Miller wrote:
> From: Urs Thuermann [EMAIL PROTECTED]
> Date: 15 Nov 2007 12:51:34 +0100
>
> > I prefer our code because it is shorter (fits into one line) and can
> > be used anywhere where an expression is allowed compared to only
> > where a statement is allowed. Actually, I first had
> >
> >     #define DBG( ... ) ((debug & 1) && printk( ... ))
> >
> > and so on, but that didn't work with can_debug_{cframe,sbk} since
> > they return void.
> >
> > Admitted, the benefit of expr vs. statement is really negligible and
> > since this issue has come up several times I will change these
> > macros using do-while.
>
> I really frown upon these local debugging macros people tend
> to want to submit with their changes.
>
> It really craps up the tree, even though it might be useful
> to you.
>
> So please remove this stuff or replace the debugging statements
> with some generic kernel debugging facility, there are several.

It would be useful if someone could make a short intro to the preferred
ones and we could stuff it in Documentation/*

Had same comment but had nowhere to point the can guys at.

	Sam
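The expression-versus-statement point discussed in this subthread can be reproduced outside the kernel. The sketch below is plain userspace C with hypothetical names (`debug_mask`, `dump_frame()`, `demo()`); it only illustrates why the do-while form copes with a helper that returns void and still behaves as a single statement.

```c
#include <assert.h>
#include <stdio.h>

static int debug_mask;		/* hypothetical stand-in for the debug variable */
static int frames_dumped;

/* Returns void, like can_debug_cframe() in the quoted patch. */
static void dump_frame(const char *msg)
{
	frames_dumped++;
	fprintf(stderr, "frame: %s\n", msg);
}

/*
 * An expression-style macro such as
 *     ((debug_mask & 2) && dump_frame(msg))
 * does not compile, because a void value cannot be an operand of &&.
 * The statement-style form has no such restriction.
 */
#define DBG_FRAME(msg) \
	do { if (debug_mask & 2) dump_frame(msg); } while (0)

static int demo(int mask)
{
	debug_mask = mask;
	frames_dumped = 0;

	if (mask)
		DBG_FRAME("hello");	/* one statement, so the else still pairs up */
	else
		frames_dumped = -1;

	return frames_dumped;
}
```

The trailing `while (0)` is what lets the macro call take a semicolon and sit in an unbraced if/else without changing which branch the else belongs to.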
Re: [PATCH 2/2] move unneeded data to initdata section
On Thu, Nov 15, 2007 at 05:42:04PM +0300, Denis V. Lunev wrote:
> Eric W. Biederman wrote:
> > Denis V. Lunev [EMAIL PROTECTED] writes:
> >
> >> This patch reverts Eric's commit 2b008b0a8e96b726c603c5e1a5a7a509b5f61e35
> >> It diets .text & .data section of the kernel if CONFIG_NET_NS is not set.
> >> This is safe after list operations cleanup.
> >
> > Ok. This patch is technically safe because none of the touched code
> > can live in a module and so we never touch the exit code path.
> >
> > However in the general case and as a code idiom this __net_initdata
> > on struct pernet_operations is fundamentally horribly broken.
> >
> > Look at what happens if we use this idiom in a module. There is only
> > one definition of __initdata .init.data. The module loader places
> > all sections that begin with .init in a region of memory that will
> > be discarded after module initialization.
>
> Nothing is discarded after module load. Though, I can be wrong. Could
> you point me to the exact place?

If __initdata is not discarded after module load then we should do it.
There is no reason to waste __initdata RAM when the module is loaded.

	Sam
[PATCH 2.6.25 4/6] net: Make AF_PACKET handle multiple network namespaces
This is done by making packet_sklist_lock and packet_sklist per network namespace and adding an additional filter condition on received packets to ensure they came from the proper network namespace. Changes from v1: - prohibit to call inet_dgram_ops.ioctl in other than init_net Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- include/net/net_namespace.h |4 + net/packet/af_packet.c | 131 --- 2 files changed, 89 insertions(+), 46 deletions(-) diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 90802a6..4d0d634 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -32,6 +32,10 @@ struct net { struct hlist_head *dev_index_head; struct sock *rtnl; /* rtnetlink socket */ + + /* List of all packet sockets. */ + rwlock_tpacket_sklist_lock; + struct hlist_head packet_sklist; }; #ifdef CONFIG_NET diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 8a7807d..45e3cbc 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -135,10 +135,6 @@ dev-hard_header == NULL (ll header is added by device, we cannot control it) packet classifier depends on it. */ -/* List of all packet sockets. */ -static HLIST_HEAD(packet_sklist); -static DEFINE_RWLOCK(packet_sklist_lock); - /* Private packet socket structures. */ struct packet_mclist @@ -246,9 +242,6 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev, struct struct sock *sk; struct sockaddr_pkt *spkt; - if (dev-nd_net != init_net) - goto out; - /* * When we registered the protocol we saved the socket in the data * field for just this event. 
@@ -270,6 +263,9 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev, struct if (skb-pkt_type == PACKET_LOOPBACK) goto out; + if (dev-nd_net != sk-sk_net) + goto out; + if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL) goto oom; @@ -341,7 +337,7 @@ static int packet_sendmsg_spkt(struct kiocb *iocb, struct socket *sock, */ saddr-spkt_device[13] = 0; - dev = dev_get_by_name(init_net, saddr-spkt_device); + dev = dev_get_by_name(sk-sk_net, saddr-spkt_device); err = -ENODEV; if (dev == NULL) goto out_unlock; @@ -449,15 +445,15 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev, struct packet int skb_len = skb-len; unsigned int snaplen, res; - if (dev-nd_net != init_net) - goto drop; - if (skb-pkt_type == PACKET_LOOPBACK) goto drop; sk = pt-af_packet_priv; po = pkt_sk(sk); + if (dev-nd_net != sk-sk_net) + goto drop; + skb-dev = dev; if (dev-header_ops) { @@ -566,15 +562,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, struct packe struct sk_buff *copy_skb = NULL; struct timeval tv; - if (dev-nd_net != init_net) - goto drop; - if (skb-pkt_type == PACKET_LOOPBACK) goto drop; sk = pt-af_packet_priv; po = pkt_sk(sk); + if (dev-nd_net != sk-sk_net) + goto drop; + if (dev-header_ops) { if (sk-sk_type != SOCK_DGRAM) skb_push(skb, skb-data - skb_mac_header(skb)); @@ -732,7 +728,7 @@ static int packet_sendmsg(struct kiocb *iocb, struct socket *sock, } - dev = dev_get_by_index(init_net, ifindex); + dev = dev_get_by_index(sk-sk_net, ifindex); err = -ENXIO; if (dev == NULL) goto out_unlock; @@ -799,15 +795,17 @@ static int packet_release(struct socket *sock) { struct sock *sk = sock-sk; struct packet_sock *po; + struct net *net; if (!sk) return 0; + net = sk-sk_net; po = pkt_sk(sk); - write_lock_bh(packet_sklist_lock); + write_lock_bh(net-packet_sklist_lock); sk_del_node_init(sk); - write_unlock_bh(packet_sklist_lock); + write_unlock_bh(net-packet_sklist_lock); /* * Unhook packet receive handler. 
@@ -916,7 +914,7 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr, int add return -EINVAL; strlcpy(name,uaddr-sa_data,sizeof(name)); - dev = dev_get_by_name(init_net, name); + dev = dev_get_by_name(sk-sk_net, name); if (dev) { err = packet_do_bind(sk, dev, pkt_sk(sk)-num); dev_put(dev); @@ -943,7 +941,7 @@ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len if (sll-sll_ifindex) {
[PATCH 2.6.25 5/6] net: Make AF_UNIX per network namespace safe (v2)
From 337f0867c81ab93a1bc645e62896a798d0c864ac Mon Sep 17 00:00:00 2001 From: Denis V. Lunev [EMAIL PROTECTED] Date: Thu, 15 Nov 2007 15:04:12 +0300 Subject: [PATCH] net: Make AF_UNIX per network namespace safe [v2] Because of the global nature of garbage collection, and because of the cost of per namespace hash tables unix_socket_table has been kept global. With a filter added on lookups so we don't see sockets from the wrong namespace. Currently I don't fold the namesapce into the hash so multiple namespaces using the same socket name will be guaranteed a hash collision. Changes from v1: - fixed unix_seq_open Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- net/unix/af_unix.c | 118 --- 1 files changed, 92 insertions(+), 26 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index e835da8..93d7e55 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -270,7 +270,8 @@ static inline void unix_insert_socket(struct hlist_head *list, struct sock *sk) spin_unlock(unix_table_lock); } -static struct sock *__unix_find_socket_byname(struct sockaddr_un *sunname, +static struct sock *__unix_find_socket_byname(struct net *net, + struct sockaddr_un *sunname, int len, int type, unsigned hash) { struct sock *s; @@ -279,6 +280,9 @@ static struct sock *__unix_find_socket_byname(struct sockaddr_un *sunname, sk_for_each(s, node, unix_socket_table[hash ^ type]) { struct unix_sock *u = unix_sk(s); + if (s-sk_net != net) + continue; + if (u-addr-len == len !memcmp(u-addr-name, sunname, len)) goto found; @@ -288,21 +292,22 @@ found: return s; } -static inline struct sock *unix_find_socket_byname(struct sockaddr_un *sunname, +static inline struct sock *unix_find_socket_byname(struct net *net, + struct sockaddr_un *sunname, int len, int type, unsigned hash) { struct sock *s; spin_lock(unix_table_lock); - s = __unix_find_socket_byname(sunname, len, type, hash); + s = __unix_find_socket_byname(net, sunname, len, type, 
hash); if (s) sock_hold(s); spin_unlock(unix_table_lock); return s; } -static struct sock *unix_find_socket_byinode(struct inode *i) +static struct sock *unix_find_socket_byinode(struct net *net, struct inode *i) { struct sock *s; struct hlist_node *node; @@ -312,6 +317,9 @@ static struct sock *unix_find_socket_byinode(struct inode *i) unix_socket_table[i-i_ino (UNIX_HASH_SIZE - 1)]) { struct dentry *dentry = unix_sk(s)-dentry; + if (s-sk_net != net) + continue; + if(dentry dentry-d_inode == i) { sock_hold(s); @@ -631,9 +639,6 @@ out: static int unix_create(struct net *net, struct socket *sock, int protocol) { - if (net != init_net) - return -EAFNOSUPPORT; - if (protocol protocol != PF_UNIX) return -EPROTONOSUPPORT; @@ -677,6 +682,7 @@ static int unix_release(struct socket *sock) static int unix_autobind(struct socket *sock) { struct sock *sk = sock-sk; + struct net *net = sk-sk_net; struct unix_sock *u = unix_sk(sk); static u32 ordernum = 1; struct unix_address * addr; @@ -703,7 +709,7 @@ retry: spin_lock(unix_table_lock); ordernum = (ordernum+1)0xF; - if (__unix_find_socket_byname(addr-name, addr-len, sock-type, + if (__unix_find_socket_byname(net, addr-name, addr-len, sock-type, addr-hash)) { spin_unlock(unix_table_lock); /* Sanity yield. It is unusual case, but yet... */ @@ -723,7 +729,8 @@ out:mutex_unlock(u-readlock); return err; } -static struct sock *unix_find_other(struct sockaddr_un *sunname, int len, +static struct sock *unix_find_other(struct net *net, + struct sockaddr_un *sunname, int len, int type, unsigned hash, int *error) { struct sock *u; @@ -741,7 +748,7 @@ static struct sock *unix_find_other(struct sockaddr_un *sunname, int len, err = -ECONNREFUSED; if (!S_ISSOCK(nd.dentry-d_inode-i_mode)) goto put_fail; - u=unix_find_socket_byinode(nd.dentry-d_inode); + u=unix_find_socket_byinode(net, nd.dentry-d_inode); if (!u) goto put_fail; @@ -757,7 +764,7 @@
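A minimal userspace sketch of the lookup filter this patch describes: one hash table shared by every namespace, with each lookup skipping entries owned by a different namespace. The names used here (struct ns, struct entry, insert, find_byname) are invented for illustration and are not the kernel's; the real code works on struct sock and unix_socket_table under unix_table_lock.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 16

struct ns { int id; };                 /* hypothetical stand-in for struct net */

struct entry {                         /* hypothetical stand-in for a bound socket */
	struct entry *next;
	const struct ns *ns;           /* owning namespace, like sk->sk_net */
	const char *name;
};

static struct entry *table[TABLE_SIZE];  /* one table for all namespaces */

/* The namespace is deliberately not folded into the hash, so the same
 * name bound in two namespaces always lands in the same bucket. */
static unsigned int hash_name(const char *name)
{
	unsigned int h = 0;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return h & (TABLE_SIZE - 1);
}

static void insert(struct entry *e)
{
	unsigned int h = hash_name(e->name);

	e->next = table[h];
	table[h] = e;
}

static struct entry *find_byname(const struct ns *ns, const char *name)
{
	struct entry *e;

	for (e = table[hash_name(name)]; e; e = e->next) {
		if (e->ns != ns)       /* filter: wrong namespace, keep walking */
			continue;
		if (strcmp(e->name, name) == 0)
			return e;
	}
	return NULL;
}
```

The cost of keeping the table global shows up as longer bucket chains when many namespaces bind the same name, which is the guaranteed collision the changelog mentions.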
[PATCH 2.6.25 6/6] net: consolidate net namespace related proc files creation
net: consolidate net namespace related proc files creation Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- fs/proc/proc_net.c | 38 ++ include/linux/seq_file.h | 13 + net/core/dev.c | 28 +--- net/core/dev_mcast.c | 26 -- net/netlink/af_netlink.c | 33 +++-- net/packet/af_packet.c | 26 -- net/unix/af_unix.c | 31 ++- net/wireless/wext.c | 24 +++- 8 files changed, 80 insertions(+), 139 deletions(-) diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c index 131f9c6..421ea28 100644 --- a/fs/proc/proc_net.c +++ b/fs/proc/proc_net.c @@ -22,10 +22,48 @@ #include linux/mount.h #include linux/nsproxy.h #include net/net_namespace.h +#include linux/seq_file.h #include internal.h +int seq_open_net(struct inode *ino, struct file *f, +const struct seq_operations *ops, int size) +{ + struct net *net; + struct seq_net_private *p; + + BUG_ON(size sizeof(*p)); + + net = get_proc_net(ino); + if (net == NULL) + return -ENXIO; + + p = __seq_open_private(f, ops, size); + if (p == NULL) { + put_net(net); + return -ENOMEM; + } + p-net = net; + return 0; +} +EXPORT_SYMBOL_GPL(seq_open_net); + +int seq_release_net(struct inode *ino, struct file *f) +{ + struct seq_file *seq; + struct seq_net_private *p; + + seq = f-private_data; + p = seq-private; + + put_net(p-net); + seq_release_private(ino, f); + return 0; +} +EXPORT_SYMBOL_GPL(seq_release_net); + + struct proc_dir_entry *proc_net_fops_create(struct net *net, const char *name, mode_t mode, const struct file_operations *fops) { diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h index ebbc02b..648dfeb 100644 --- a/include/linux/seq_file.h +++ b/include/linux/seq_file.h @@ -63,5 +63,18 @@ extern struct list_head *seq_list_start_head(struct list_head *head, extern struct list_head *seq_list_next(void *v, struct list_head *head, loff_t *ppos); +struct net; +struct seq_net_private { + struct net *net; +}; + +int seq_open_net(struct inode *, struct file *, +const struct 
seq_operations *, int); +int seq_release_net(struct inode *, struct file *); +static inline struct net *seq_file_net(struct seq_file *seq) +{ + return ((struct seq_net_private *)seq-private)-net; +} + #endif #endif diff --git a/net/core/dev.c b/net/core/dev.c index 86d6261..043e2f8 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2360,7 +2360,7 @@ static int dev_ifconf(struct net *net, char __user *arg) */ void *dev_seq_start(struct seq_file *seq, loff_t *pos) { - struct net *net = seq-private; + struct net *net = seq_file_net(seq); loff_t off; struct net_device *dev; @@ -2378,7 +2378,7 @@ void *dev_seq_start(struct seq_file *seq, loff_t *pos) void *dev_seq_next(struct seq_file *seq, void *v, loff_t *pos) { - struct net *net = seq-private; + struct net *net = seq_file_net(seq); ++*pos; return v == SEQ_START_TOKEN ? first_net_device(net) : next_net_device((struct net_device *)v); @@ -2477,26 +2477,8 @@ static const struct seq_operations dev_seq_ops = { static int dev_seq_open(struct inode *inode, struct file *file) { - struct seq_file *seq; - int res; - res = seq_open(file, dev_seq_ops); - if (!res) { - seq = file-private_data; - seq-private = get_proc_net(inode); - if (!seq-private) { - seq_release(inode, file); - res = -ENXIO; - } - } - return res; -} - -static int dev_seq_release(struct inode *inode, struct file *file) -{ - struct seq_file *seq = file-private_data; - struct net *net = seq-private; - put_net(net); - return seq_release(inode, file); + return seq_open_net(inode, file, dev_seq_ops, + sizeof(struct seq_net_private)); } static const struct file_operations dev_seq_fops = { @@ -2504,7 +2486,7 @@ static const struct file_operations dev_seq_fops = { .open= dev_seq_open, .read= seq_read, .llseek = seq_lseek, - .release = dev_seq_release, + .release = seq_release_net, }; static const struct seq_operations softnet_seq_ops = { diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c index 69fff16..63f0b33 100644 --- a/net/core/dev_mcast.c +++ 
b/net/core/dev_mcast.c @@ -187,7 +187,7 @@ EXPORT_SYMBOL(dev_mc_unsync); #ifdef CONFIG_PROC_FS static void *dev_mc_seq_start(struct seq_file *seq, loff_t *pos) { - struct net *net = seq-private; +
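A rough userspace sketch of the pattern seq_open_net() relies on: the caller passes the size of its own private state, that state begins with the common header holding the namespace pointer, and a generic accessor reads the pointer back without knowing the caller's type. Everything here (open_private, private_net, struct my_iter_state, the _sketch suffix) is hypothetical and only mirrors the shape of the patch; it does not take or drop namespace references.

```c
#include <stdlib.h>

struct net { int id; };                  /* stand-in for the kernel's struct net */

struct seq_net_private_sketch {          /* common header, as in the patch */
	struct net *net;
};

struct my_iter_state {                   /* hypothetical per-file iterator state */
	struct seq_net_private_sketch p; /* must be the first member */
	int bucket;
};

/* Allocate zeroed private state of the caller's size and record the
 * namespace. The kernel version uses BUG_ON() for the size check; this
 * sketch returns NULL instead. */
static void *open_private(struct net *net, size_t size)
{
	struct seq_net_private_sketch *p;

	if (size < sizeof(*p))
		return NULL;
	p = calloc(1, size);
	if (p)
		p->net = net;
	return p;
}

/* Generic accessor, analogous to seq_file_net(): valid for any private
 * state that starts with the common header. */
static struct net *private_net(void *priv)
{
	return ((struct seq_net_private_sketch *)priv)->net;
}
```

Callers with no extra state pass sizeof the header itself, which is what dev_seq_open() does in the diff above.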
Re: [PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0 (was: Strange behavior in arp probe reply, bug or feature?)
Hello!

> Send a correct arp reply instead of one with sender ip and sender
> hardware address in target fields.

I do not see anything more legal in setting the target address to 0.

Actually, the semantics of the target address in an ARP reply are ambiguous. If it is a reply to some real request, it is set to the address of the requestor, and the protocol requires the recipient of this ARP reply to test that the address matches its own address before creating a new entry triggered by an unsolicited ARP reply. That's all.

In the case of duplicate address detection, the requestor does not have any address, so it is absolutely not essential what we use as the target address. The only place that could depend on this is the tool that tests for duplicate addresses. At least arping, written by me, should work with any variant.

So, please, could you explain what made you think that use of 0 is better?

Alexey
Re: [PATCH, take2] netfilter : struct xt_table_info diet
On Thu, 15 Nov 2007 13:41:54 +0100 Patrick McHardy [EMAIL PROTECTED] wrote:

> Eric Dumazet wrote:
> > [PATCH] netfilter : struct xt_table_info diet
> >
> > Instead of using a big array of NR_CPUS entries, we can compute the size needed at runtime, using nr_cpu_ids.
> >
> > This should save some RAM (especially on David's machines where NR_CPUS=4096: 32 KB can be saved per table, and 64 KB for dynamically allocated ones, because of slab/slub alignment).
> >
> > In particular, the 'bootstrap' tables are no longer static (in the data section) but on the stack, as their size is now very small.
> >
> > This should also reduce the stack space used in compat functions (get_info() declares an automatic variable that could be bigger than the kernel stack size for big NR_CPUS).
>
> I fixed a compilation error with CONFIG_COMPAT and applied it, thanks Eric. One question though:
>
> > +#define XT_TABLE_INFO_SZ (offsetof(struct xt_table_info, entries) \
> > +			  + nr_cpu_ids * sizeof(char *))
>
> >  	/* overflow check */
> > -	if (tmp.size >= (INT_MAX - sizeof(struct xt_table_info)) / NR_CPUS -
> > -			SMP_CACHE_BYTES)
> > +	if (tmp.size >= INT_MAX / num_possible_cpus())
> >  		return -ENOMEM;
>
> We need to make sure offsetof(struct xt_table_info, entries) + nr_cpu_ids * sizeof(char *) doesn't overflow, so why doesn't it use nr_cpu_ids here as well?

nr_cpu_ids is <= NR_CPUS, so XT_TABLE_INFO_SZ cannot overflow.

The 'overflow check' we do here is in fact not very useful now that we don't need to multiply tmp.size by NR_CPUS and potentially overflow the result. We can delete the test, because kmalloc()/vmalloc() will probably fail gracefully if we ask for too much memory.

We could imagine a dual Opteron machine with a total of 32 GB of RAM, where it could be possible to load a 3 GB iptable (that would consume 2*3 GB of RAM), but the 'overflow check' test actually forbids such a scenario.
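To make the sizing arithmetic concrete, here is a small userspace sketch of a structure whose trailing per-CPU pointer array is sized at runtime, in the spirit of XT_TABLE_INFO_SZ. struct table_info and both helpers are hypothetical simplifications, not the real struct xt_table_info, and ncpus stands in for nr_cpu_ids.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct xt_table_info. */
struct table_info {
	unsigned int size;
	unsigned int number;
	char *entries[];   /* one slot per possible CPU, sized at runtime */
};

/* Header size plus one pointer per CPU, mirroring the macro's
 * offsetof(...) + nr_cpu_ids * sizeof(char *) form. */
static size_t table_info_size(unsigned int ncpus)
{
	return offsetof(struct table_info, entries) + ncpus * sizeof(char *);
}

static struct table_info *table_info_alloc(unsigned int ncpus)
{
	return calloc(1, table_info_size(ncpus));
}
```

With 8-byte pointers, a fixed array of 4096 slots costs 32 KB for the pointers alone, which matches the per-table saving quoted in the changelog when only a handful of CPUs are actually possible.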
[PATCH 2.6.25 1/6] net: Modify all rtnetlink methods to only work in the initial namespace (v2)
Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnletlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- net/bridge/br_netlink.c |9 + net/core/fib_rules.c| 11 +++ net/core/neighbour.c| 18 ++ net/core/rtnetlink.c| 19 +++ net/decnet/dn_dev.c | 12 net/decnet/dn_fib.c |8 net/decnet/dn_route.c |8 net/decnet/dn_table.c |4 net/ipv4/devinet.c | 12 net/ipv4/fib_frontend.c | 12 net/ipv4/route.c|4 net/ipv6/addrconf.c | 31 +++ net/ipv6/addrlabel.c| 12 net/ipv6/ip6_fib.c |4 net/ipv6/route.c| 12 net/sched/act_api.c | 10 ++ net/sched/cls_api.c | 10 ++ net/sched/sch_api.c | 21 + 18 files changed, 217 insertions(+), 0 deletions(-) diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 53ab8e0..a4ffa2b 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -13,6 +13,7 @@ #include linux/kernel.h #include net/rtnetlink.h #include net/net_namespace.h +#include net/sock.h #include br_private.h static inline size_t br_nlmsg_size(void) @@ -107,9 +108,13 @@ errout: */ static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) { + struct net *net = skb-sk-sk_net; struct net_device *dev; int idx; + if (net != init_net) + return 0; + idx = 0; for_each_netdev(init_net, dev) { /* not a bridge port */ @@ -135,12 +140,16 @@ skip: */ static int br_rtm_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) { + struct net *net = skb-sk-sk_net; struct ifinfomsg *ifm; struct nlattr *protinfo; struct net_device *dev; struct net_bridge_port *p; u8 new_state; + if (net != init_net) + return -EINVAL; + if (nlmsg_len(nlh) sizeof(*ifm)) return -EINVAL; diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c 
index 848132b..3b20b6f 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -228,6 +228,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg) struct nlattr *tb[FRA_MAX+1]; int err = -EINVAL, unresolved = 0; + if (net != init_net) + return -EINVAL; + if (nlh-nlmsg_len nlmsg_msg_size(sizeof(*frh))) goto errout; @@ -358,12 +361,16 @@ errout: static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg) { + struct net *net = skb-sk-sk_net; struct fib_rule_hdr *frh = nlmsg_data(nlh); struct fib_rules_ops *ops = NULL; struct fib_rule *rule, *tmp; struct nlattr *tb[FRA_MAX+1]; int err = -EINVAL; + if (net != init_net) + return -EINVAL; + if (nlh-nlmsg_len nlmsg_msg_size(sizeof(*frh))) goto errout; @@ -539,9 +546,13 @@ skip: static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb) { + struct net *net = skb-sk-sk_net; struct fib_rules_ops *ops; int idx = 0, family; + if (net != init_net) + return -EINVAL; + family = rtnl_msg_family(cb-nlh); if (family != AF_UNSPEC) { /* Protocol specific dump request */ diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 175bbc0..29f0a4d 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1449,6 +1449,9 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) struct net_device *dev = NULL; int err = -EINVAL; + if (net != init_net) + return -EINVAL; + if (nlmsg_len(nlh) sizeof(*ndm)) goto out; @@ -1515,6 +1518,9 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) struct net_device *dev = NULL; int err; + if (net != init_net) + return -EINVAL; + err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL); if (err 0) goto out; @@ -1789,11 +1795,15 @@ static const struct nla_policy nl_ntbl_parm_policy[NDTPA_MAX+1] = { static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) { + struct net *net = skb-sk-sk_net; struct neigh_table *tbl; struct ndtmsg *ndtmsg; struct nlattr 
*tb[NDTA_MAX+1]; int err; + if
[PATCH 2.6.25 2/6] net: Make rtnetlink infrastructure network namespace aware (v3)
After this patch none of the netlink callback support anything except the initial network namespace but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- include/linux/rtnetlink.h |8 +++--- include/net/net_namespace.h |3 ++ net/bridge/br_netlink.c |4 +- net/core/fib_rules.c|4 +- net/core/neighbour.c|4 +- net/core/rtnetlink.c| 63 +++--- net/decnet/dn_dev.c |4 +- net/decnet/dn_route.c |2 +- net/decnet/dn_table.c |4 +- net/ipv4/devinet.c |4 +- net/ipv4/fib_semantics.c|4 +- net/ipv4/ipmr.c |4 +- net/ipv4/route.c|2 +- net/ipv6/addrconf.c | 14 +- net/ipv6/addrlabel.c|2 +- net/ipv6/ndisc.c|5 ++- net/ipv6/route.c|6 ++-- net/sched/act_api.c |8 +++--- net/sched/cls_api.c |2 +- net/sched/sch_api.c |4 +- net/wireless/wext.c |5 +++- 21 files changed, 102 insertions(+), 54 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index e20dcc8..b014f6b 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -620,11 +620,11 @@ extern int __rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr, ({ data = RTA_PAYLOAD(rta) = len ? 
RTA_DATA(rta) : NULL; \ __rtattr_parse_nested_compat(tb, max, rta, len); }) -extern int rtnetlink_send(struct sk_buff *skb, u32 pid, u32 group, int echo); -extern int rtnl_unicast(struct sk_buff *skb, u32 pid); -extern int rtnl_notify(struct sk_buff *skb, u32 pid, u32 group, +extern int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, u32 group, int echo); +extern int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid); +extern int rtnl_notify(struct sk_buff *skb, struct net *net, u32 pid, u32 group, struct nlmsghdr *nlh, gfp_t flags); -extern void rtnl_set_sk_err(u32 group, int error); +extern void rtnl_set_sk_err(struct net *net, u32 group, int error); extern int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics); extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst, u32 id, u32 ts, u32 tsage, long expires, diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 5dd6d90..90802a6 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -10,6 +10,7 @@ struct proc_dir_entry; struct net_device; +struct sock; struct net { atomic_tcount; /* To decided when the network * namespace should be freed. 
@@ -29,6 +30,8 @@ struct net { struct list_headdev_base_head; struct hlist_head *dev_name_head; struct hlist_head *dev_index_head; + + struct sock *rtnl; /* rtnetlink socket */ }; #ifdef CONFIG_NET diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index a4ffa2b..f5d6933 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -97,10 +97,10 @@ void br_ifinfo_notify(int event, struct net_bridge_port *port) kfree_skb(skb); goto errout; } - err = rtnl_notify(skb, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC); + err = rtnl_notify(skb, init_net,0, RTNLGRP_LINK, NULL, GFP_ATOMIC); errout: if (err 0) - rtnl_set_sk_err(RTNLGRP_LINK, err); + rtnl_set_sk_err(init_net, RTNLGRP_LINK, err); } /* diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index 3b20b6f..0af0538 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -599,10 +599,10 @@ static void notify_rule_change(int event, struct fib_rule *rule, kfree_skb(skb); goto errout; } - err = rtnl_notify(skb, pid, ops-nlgroup, nlh, GFP_KERNEL); + err = rtnl_notify(skb, init_net, pid, ops-nlgroup, nlh, GFP_KERNEL); errout: if (err 0) - rtnl_set_sk_err(ops-nlgroup, err); + rtnl_set_sk_err(init_net, ops-nlgroup, err); } static void attach_rules(struct list_head *rules, struct net_device *dev) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 29f0a4d..a8b72c1 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -2467,10 +2467,10 @@ static void __neigh_notify(struct neighbour *n, int type, int flags) kfree_skb(skb); goto errout; } - err = rtnl_notify(skb, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC); + err = rtnl_notify(skb, init_net, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC);
[PATCH 2.6.25 3/6] net: Make the netlink methods in rtnetlink handle multiple network namespaces
From: Eric W. Biederman [EMAIL PROTECTED]

After the previous prep work this just consists of removing checks limiting the code to work in the initial network namespace, and updating rtmsg_ifinfo so we can generate events for devices in something other than the initial network namespace.

Referring to other network devices, as the IFLA_LINK and IFLA_MASTER attributes do, gets interesting if those network devices happen to be in other network namespaces. Currently ifindex numbers are allocated globally, so I have taken the path of least resistance and still report the information even though the devices they are talking about are invisible.

If applications start getting confused, or when ifindex numbers become local to the network namespace, we may need to do something different in the future.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/rtnetlink.c |   27 +++------------------------
 1 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index be8e10c..4a07e83 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -705,9 +705,6 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 	int s_idx = cb->args[0];
 	struct net_device *dev;
 
-	if (net != &init_net)
-		return 0;
-
 	idx = 0;
 	for_each_netdev(net, dev) {
 		if (idx < s_idx)
@@ -910,9 +907,6 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct nlattr *tb[IFLA_MAX+1];
 	char ifname[IFNAMSIZ];
 
-	if (net != &init_net)
-		return -EINVAL;
-
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 	if (err < 0)
 		goto errout;
@@ -961,9 +955,6 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct nlattr *tb[IFLA_MAX+1];
 	int err;
 
-	if (net != &init_net)
-		return -EINVAL;
-
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 	if (err < 0)
 		return err;
@@ -1045,9 +1036,6 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct nlattr *linkinfo[IFLA_INFO_MAX+1];
 	int err;
 
-	if (net != &init_net)
-		return -EINVAL;
-
 #ifdef CONFIG_KMOD
 replay:
 #endif
@@ -1174,9 +1162,6 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	struct sk_buff *nskb;
 	int err;
 
-	if (net != &init_net)
-		return -EINVAL;
-
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 	if (err < 0)
 		return err;
@@ -1212,13 +1197,9 @@ errout:
 
 static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	struct net *net = skb->sk->sk_net;
 	int idx;
 	int s_idx = cb->family;
 
-	if (net != &init_net)
-		return 0;
-
 	if (s_idx == 0)
 		s_idx = 1;
 	for (idx=1; idx<NPROTO; idx++) {
@@ -1240,6 +1221,7 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
 {
+	struct net *net = dev->nd_net;
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 
@@ -1254,10 +1236,10 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
 		kfree_skb(skb);
 		goto errout;
 	}
-	err = rtnl_notify(skb, &init_net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
+	err = rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
 errout:
 	if (err < 0)
-		rtnl_set_sk_err(&init_net, RTNLGRP_LINK, err);
+		rtnl_set_sk_err(net, RTNLGRP_LINK, err);
 }
 
 /* Protected by RTNL sempahore. */
@@ -1350,9 +1332,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
 {
 	struct net_device *dev = ptr;
 
-	if (dev->nd_net != &init_net)
-		return NOTIFY_DONE;
-
 	switch (event) {
 	case NETDEV_UNREGISTER:
 		rtmsg_ifinfo(RTM_DELLINK, dev, ~0U);
-- 
1.5.3.rc5
[MACVLAN 00/02]: Macvlan update
These two patches remove an unnecessary check in macvlan_broadcast() and add the ability to change the mac address while the device is up.

Please apply, thanks.

 drivers/net/macvlan.c |   26 ++++++++++++++++++++++++--
 1 files changed, 24 insertions(+), 2 deletions(-)

Patrick McHardy (2):
      [MACVLAN]: Remove unnecessary IFF_UP check
      [MACVLAN]: Allow setting mac address while device is up
[MACVLAN 01/02]: Remove unnecessary IFF_UP check
[MACVLAN]: Remove unnecessary IFF_UP check

Only devices that are UP are in the hash, so macvlan_broadcast() doesn't need to check for IFF_UP.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit e2d06a34b52a999e8c539d1cdef51ff523e2f2c2
tree f95a5eef37c421950ddc7318797909c0031ee948
parent 86aa441a13a474e66d484af38575609d9a0ff8ec
author Patrick McHardy [EMAIL PROTECTED] Thu, 15 Nov 2007 16:33:24 +0100
committer Patrick McHardy [EMAIL PROTECTED] Thu, 15 Nov 2007 16:33:24 +0100

 drivers/net/macvlan.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 2e4bcd5..461149c 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -73,8 +73,6 @@ static void macvlan_broadcast(struct sk_buff *skb,
 	for (i = 0; i < MACVLAN_HASH_SIZE; i++) {
 		hlist_for_each_entry_rcu(vlan, n, &port->vlan_hash[i], hlist) {
 			dev = vlan->dev;
-			if (unlikely(!(dev->flags & IFF_UP)))
-				continue;
 
 			nskb = skb_clone(skb, GFP_ATOMIC);
 			if (nskb == NULL) {
Re: [PATCH, take2] netfilter : struct xt_table_info diet
Eric Dumazet wrote:
> On Thu, 15 Nov 2007 13:41:54 +0100 Patrick McHardy [EMAIL PROTECTED] wrote:
>
> > > +#define XT_TABLE_INFO_SZ (offsetof(struct xt_table_info, entries) \
> > > +			  + nr_cpu_ids * sizeof(char *))
> >
> > >  	/* overflow check */
> > > -	if (tmp.size >= (INT_MAX - sizeof(struct xt_table_info)) / NR_CPUS -
> > > -			SMP_CACHE_BYTES)
> > > +	if (tmp.size >= INT_MAX / num_possible_cpus())
> > >  		return -ENOMEM;
> >
> > We need to make sure offsetof(struct xt_table_info, entries) + nr_cpu_ids * sizeof(char *) doesn't overflow, so why doesn't it use nr_cpu_ids here as well?
>
> nr_cpu_ids is <= NR_CPUS, so XT_TABLE_INFO_SZ cannot overflow

Yes, but nr_cpu_ids is >= num_possible_cpus, which is what we're using with your patch.

> The 'overflow check' we do here is in fact not very useful now that we don't need to multiply tmp.size by NR_CPUS and potentially overflow the result. We can delete the test, because kmalloc()/vmalloc() will probably fail gracefully if we ask for too much memory.

You're right, I'll remove it. Thanks.
Re: Re : Re : Re : Bug in using inet_lookup ()
On Thu, Nov 15, 2007 at 05:29:52PM +0100, Nj A ([EMAIL PROTECTED]) wrote:
> Hello all,
> No bugs are due to the inet_lookup call now using the following:
>
> if ((s_skb = alloc_skb (MAX_TCP_HEADER + 15, GFP_ATOMIC)) == NULL) {
> 	printk ("%s: Unable to allocate memory \n", __FUNCTION__);
> 	err = -ENOMEM;
> }
> dev = s_skb->dev;
> if (!dev)
> 	printk ("%s: no device attached to s_skb\n", __FUNCTION__);
> goto process_dev;
> sk = inet_lookup (&tcp_hashinfo, src, p_src, dst, p_dst, inet_iif (s_skb));
> bh_lock_sock (sk);
> process_dev:
> spin_lock (tmp_lock);
> new_dev = list_entry (tmp, struct net_device, todo_list);
> spin_unlock (tmp_lock);
> if (!new_dev)
> 	printk ("%s: no device attached to new_dev \n", __FUNCTION__);
> s_skb->dev = new_dev;
> ...
> bh_unlock_sock (sk);
> ...
>
> However, I am not getting the right results. I checked with an established socket and expected to see that the socket is established (which is the case), but got the wrong state when testing sk->sk_state, and the socket seems to be in the TIME_WAIT / CLOSE state.
> Maybe I am corrupting the search by manually attaching a device to the skb?
> Any idea please?

Well, your code will oops just like before: you provide an empty skb to inet_iif(), which is wrong. Actually you will not even reach that point, since your code will exit after the skb->dev check.

Try a simple inet_lookup(&tcp_hashinfo, src, p_src, dst, p_dst, 0). It does work.

-- 
	Evgeniy Polyakov
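The remark that the code never reaches the lookup follows from plain C syntax: without braces, only the printk belongs to the if, so the goto is unconditional. A small userspace sketch of that control flow, with hypothetical function names and a counter standing in for printk; whether the original author meant the goto to be conditional is an assumption.

```c
#include <stddef.h>

/* Mirrors the posted snippet: returns 1 if the "lookup" step ran. */
static int lookup_reached_unbraced(const void *dev)
{
	int warned = 0;
	int looked_up = 0;

	if (!dev)
		warned = 1;        /* only this statement is conditional */
	goto process_dev;          /* always taken, device or not */

	looked_up = 1;             /* stands in for inet_lookup(); never runs */
process_dev:
	(void)warned;
	return looked_up;
}

/* Same flow with braces, so the jump happens only when dev is missing. */
static int lookup_reached_braced(const void *dev)
{
	int looked_up = 0;

	if (!dev) {
		goto process_dev;
	}
	looked_up = 1;
process_dev:
	return looked_up;
}
```

In the posted code this means sk is never assigned before the later bh_unlock_sock(sk), independent of what inet_lookup() itself would return.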
Re : Re : Re : Re : Bug in using inet_lookup ()
> Well, your code will oops just like before: you provide an empty skb to inet_iif(), which is wrong. Actually you will not even reach that point, since your code will exit after the skb->dev check.
> Try a simple inet_lookup(&tcp_hashinfo, src, p_src, dst, p_dst, 0).

But trying inet_lookup(&tcp_hashinfo, src, p_src, dst, p_dst, 0), the machine either hangs or panics.
Is there any clean manner to come across this issue?

Cheers,
Re: [BUG] New Kernel Bugs
On Wed, Nov 14, 2007 at 06:23:34PM -0500, Daniel Barkalow wrote:
> I don't see any reason that we couldn't have a tool accessible to Ubuntu users that does a real git bisect. Git is really good at being scripted by fancy GUIs. It should be easy enough to have a drop down with all of the Ubuntu kernel package releases, where the user selects what works and what doesn't.

Users who haven't yet downloaded a git repository may have to surmount some obstacles that could cause them to lose interest. First, they have to download some 190 megs of git repository, and if they have a slow link, that can take a while. Then they have to build each kernel, which can also take a while. A full kernel build with everything selected can take a good 30 minutes or more, and that's on a fast dual-core machine with 4 gigs of memory and 7200rpm disk drives. On a slower, memory-limited laptop, doing a single kernel build can take more time than the user has patience for; multiply that by 7 or 8 build and test boots, and it starts to get tiresome.

And then on top of that there are the issues about whether there is enough support for dealing with hitting kernel revisions that fail due to other bugs getting merged in during the -rc1 process, etc.

I agree that a tool that automated the bisection process and walked the user through it would be helpful, but I believe it would be possible for us to do better.

						- Ted
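The "7 or 8 build and test boots" figure is what repeated halving gives for a range of roughly 128 to 256 candidate commits. A minimal sketch of that arithmetic follows; bisect_steps is a hypothetical helper, and git bisect's real step count on a branching history can differ from this linear-range estimate.

```c
/* Rounds of build-and-test needed to isolate one bad commit among n
 * candidates when each round rules out half of them: ceil(log2(n)). */
static unsigned int bisect_steps(unsigned long n)
{
	unsigned int steps = 0;

	while (n > 1) {
		n -= n / 2;        /* keep the larger half, i.e. ceil(n / 2) */
		steps++;
	}
	return steps;
}
```

At 30 minutes per build, 8 rounds is about 4 hours of compile time before counting reboots and testing, which is the cost being weighed in this thread.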
Re: Re : Re : Re : Re : Bug in using inet_lookup ()
On Thu, Nov 15, 2007 at 04:57:17PM +, Nj A ([EMAIL PROTECTED]) wrote:
> > Well, your code will oops just like before: you provide an empty skb to inet_iif(), which is wrong. Actually you will not even reach that point, since your code will exit after the skb->dev check.
> > Try a simple inet_lookup(&tcp_hashinfo, src, p_src, dst, p_dst, 0).
>
> But trying inet_lookup(&tcp_hashinfo, src, p_src, dst, p_dst, 0), the machine either hangs or panics.

Hmmm, it does not. Please show at least one bug trace where inet_lookup(&tcp_hashinfo, 0, 0, 0, 0, 0) fails :)

> Is there any clean manner to come across this issue?

Yes: show the code you are using. Sorry, all mind readers are on vacation.

-- 
	Evgeniy Polyakov
Re: [BUG] New Kernel Bugs
On Thu, 15 Nov 2007, Theodore Tso wrote:
> On Wed, Nov 14, 2007 at 06:23:34PM -0500, Daniel Barkalow wrote:
> > I don't see any reason that we couldn't have a tool accessible to Ubuntu users that does a real git bisect. Git is really good at being scripted by fancy GUIs. It should be easy enough to have a drop down with all of the Ubuntu kernel package releases, where the user selects what works and what doesn't.
>
> It's possible users who haven't yet downloaded a git repository have to surmount some obstacles that might cause them to lose interest. First, they have to download some 190 megs of git repository, and if they have a slow link, that can take a while, and then they have to build each kernel, which can take a while.

It should be possible for it to clone only the portion that they actually care about based on where the known-good version is. It should also (in theory, anyway) be possible to put off some amount of the download until it's actually going to be relevant.

> A full kernel build with everything selected can take a good 30 minutes or more, and that's on a fast dual-core machine with 4 gigs of memory and 7200 rpm disk drives. On a slower, memory-limited laptop, doing a single kernel build can take more time than the user has patience for; multiply that by 7 or 8 build and test boots, and it starts to get tiresome.

None of this is going to take as long, even on a slow link and a slow computer, as waiting for a response to a mailing list post. It'd annoy users who are specifically waiting for it. But suppose the interface works like this: the user says "kernel package X didn't work but the current kernel does", and the tool says "I'll let you know when I've got something to test". The user watches a DVD, afterward finds a message saying there's something to test, tries it, and reports how it went. The process repeats until it narrows things down to a single commit after a couple of days of the user getting occasional responses. That is not very different from asking for help online.

> And then on top of that there are the issues about whether there is enough support for dealing with hitting kernel revisions that fail due to other bugs getting merged in during the -rc1 process, etc.

The distro could provide a mask of things that aren't worth testing, and possibly back-ported fixes for revisions in particular ranges.

> I agree that a tool that automated the bisection process and walked the user through it would be helpful, but I believe it would be possible for us to do better.

That would probably help for giving the user something to try right away. I still think that the main cost to the user is the number of times that the user has to stop doing stuff to reboot with a kernel to test, whether the test kernels are available quickly from the distro site, slowly built locally, or slowly as suggested by humans helping online.

-Daniel
*This .sig left intentionally blank*
Re : Re : Re : Bug in using inet_lookup ()
Hello all,

The inet_lookup call itself no longer triggers a bug when I use the following:

    if ((s_skb = alloc_skb (MAX_TCP_HEADER + 15, GFP_ATOMIC)) == NULL) {
            printk ("%s: Unable to allocate memory \n", __FUNCTION__);
            err = -ENOMEM;
    }
    dev = s_skb->dev;
    if (!dev)
            printk ("%s: no device attached to s_skb\n", __FUNCTION__);
    goto process_dev;
    sk = inet_lookup (&tcp_hashinfo, src, p_src, dst, p_dst, inet_iif (s_skb));
    bh_lock_sock (sk);
    process_dev:
    spin_lock (tmp_lock);
    new_dev = list_entry (tmp, struct net_device, todo_list);
    spin_unlock (tmp_lock);
    if (!new_dev)
            printk ("%s: no device attached to new_dev \n", __FUNCTION__);
    s_skb->dev = new_dev;
    ...
    bh_unlock_sock (sk);
    ...

However, I am not getting the right results. I checked with an established socket and expected to see that the socket is established (which is the case), but I got the wrong state when testing sk->sk_state: the socket appears to be in the TIME_WAIT / CLOSE state. Maybe I am corrupting the search by manually attaching a device to the skb? Any idea please?

Cheers,

----- Original message -----
From: Evgeniy Polyakov [EMAIL PROTECTED]
To: Nj A [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Thursday, 15 November 2007, 11:12:28
Subject: Re: Re : Re : Bug in using inet_lookup ()

On Wed, Nov 14, 2007 at 04:47:22PM +, Nj A ([EMAIL PROTECTED]) wrote:
> By setting the ID of the ingress device to the inet_lookup() to 0, the machine reboots automatically. Setting proc/sys/kernel/panic* to non zero values doesn't help more..

Sorry, I did not understand? You mean after you provide zero to inet_lookup() instead of device id it started to reboot?

--
Evgeniy Polyakov
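A note on the control flow in the snippet above, assuming it is exactly as posted with no braces around the `if (!dev)` body: the `goto process_dev;` is then unconditional, so the lookup and `bh_lock_sock()` are never reached. The following user-space sketch only illustrates that point; `struct trace`, `flow_as_posted` and `flow_with_braces` are hypothetical names and no kernel API is involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: records whether the lookup step was reached. */
struct trace {
    bool lookup_ran;
};

/* Mirrors the posted control flow: the goto is outside the if body. */
static void flow_as_posted(const void *dev, struct trace *t)
{
    t->lookup_ran = false;
    if (!dev)
        (void)0;            /* the printk() would go here */
    goto process_dev;       /* unconditional: always skips the lookup */

    t->lookup_ran = true;   /* stands in for inet_lookup() + bh_lock_sock() */
process_dev:
    return;
}

/* Same flow with braces, so only the "no device" case skips the lookup. */
static void flow_with_braces(const void *dev, struct trace *t)
{
    t->lookup_ran = false;
    if (!dev) {
        goto process_dev;
    }
    t->lookup_ran = true;
process_dev:
    return;
}
```

Calling `flow_as_posted()` with a non-NULL `dev` still leaves `lookup_ran` false, while `flow_with_braces()` sets it to true. Whether braces were intended in the original code is a guess; the sketch only shows what the text as posted does.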
[MACVLAN 02/02]: Allow setting mac address while device is up
[MACVLAN]: Allow setting mac address while device is up

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 3c50588260810d735231220f9a8ebaa6a6e8fb1e
tree 48ee2625502caf2454263a05b5a9869648de3aed
parent e2d06a34b52a999e8c539d1cdef51ff523e2f2c2
author Patrick McHardy [EMAIL PROTECTED] Thu, 15 Nov 2007 16:38:06 +0100
committer Patrick McHardy [EMAIL PROTECTED] Thu, 15 Nov 2007 16:38:06 +0100

 drivers/net/macvlan.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 461149c..3acf8cd 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -213,6 +213,29 @@ static int macvlan_stop(struct net_device *dev)
 	return 0;
 }
 
+static int macvlan_set_mac_address(struct net_device *dev, void *p)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct net_device *lowerdev = vlan->lowerdev;
+	struct sockaddr *addr = p;
+	int err;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	if (!(dev->flags & IFF_UP))
+		goto out;
+
+	err = dev_unicast_add(lowerdev, addr->sa_data, ETH_ALEN);
+	if (err < 0)
+		return err;
+	dev_unicast_delete(lowerdev, dev->dev_addr, ETH_ALEN);
+
+out:
+	memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
+	return 0;
+}
+
 static void macvlan_change_rx_flags(struct net_device *dev, int change)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
@@ -300,6 +323,7 @@ static void macvlan_setup(struct net_device *dev)
 	dev->stop		= macvlan_stop;
 	dev->change_mtu		= macvlan_change_mtu;
 	dev->change_rx_flags	= macvlan_change_rx_flags;
+	dev->set_mac_address	= macvlan_set_mac_address;
 	dev->set_multicast_list	= macvlan_set_multicast_list;
 	dev->hard_start_xmit	= macvlan_hard_start_xmit;
 	dev->destructor		= free_netdev;
RE: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)
Yoshifuji,

See below for follow-up:

> -----Original Message-----
> From: YOSHIFUJI Hideaki / 吉藤英明 [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 15, 2007 3:22 AM
> To: Templin, Fred L
> Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)
>
> In article [EMAIL PROTECTED] eing.com (at Wed, 14 Nov 2007 22:44:17 -0800), Templin, Fred L [EMAIL PROTECTED] says:
>
> > --- linux-2.6.24-rc2/net/ipv6/addrconf.c.orig 2007-11-08 11:59:35.0 -0800
> > +++ linux-2.6.24-rc2/net/ipv6/addrconf.c 2007-11-14 22:17:28.0 -0800
> > @@ -1424,6 +1424,21 @@ static int addrconf_ifid_infiniband(u8 *
> >  	return 0;
> >  }
> >
> > +static int addrconf_ifid_isatap(u8 *eui, __be32 addr)
> > +{
> > +
> > +	eui[0] = 0x02; eui[1] = 0; eui[2] = 0x5E; eui[3] = 0xFE;
> > +	memcpy (eui+4, &addr, 4);
> > +
> > +	if (ZERONET(addr) || PRIVATE_10(addr) || LOOPBACK(addr) ||
> > +	    LINKLOCAL_169(addr) || PRIVATE_172(addr) || TEST_192(addr) ||
> > +	    ANYCAST_6TO4(addr) || PRIVATE_192(addr) || TEST_198(addr) ||
> > +	    MULTICAST(addr) || BADCLASS(addr))
> > +		eui[0] = ~0x02;
> > +
> > +	return 0;
> > +}
> > +
> >  static int ipv6_generate_eui64(u8 *eui, struct net_device *dev)
> >  {
> >  	switch (dev->type) {
>
> {
> 	eui[0] = (ZERONET(addr) || PRIVATE_10(addr) || LOOPBACK(addr) ||
> 		  LINKLOCAL_169(addr) || PRIVATE_172(addr) || TEST_192(addr) ||
> 		  ANYCAST_6TO4(addr) || PRIVATE_192(addr) || TEST_198(addr) ||
> 		  MULTICAST(addr) || BADCLASS(addr)) ? 0 : 2;
> 	eui[1] = 0;
> 	eui[2] = 0x5E;
> 	eui[3] = 0xFE;
> 	memcpy (eui+4, &addr, 4);
> }

OK; I'll make this change.

> > @@ -2167,7 +2185,8 @@ static void addrconf_dev_config(struct n
> >  	    (dev->type != ARPHRD_FDDI) &&
> >  	    (dev->type != ARPHRD_IEEE802_TR) &&
> >  	    (dev->type != ARPHRD_ARCNET) &&
> > -	    (dev->type != ARPHRD_INFINIBAND)) {
> > +	    (dev->type != ARPHRD_INFINIBAND) &&
> > +	    !(dev->priv_flags & IFF_ISATAP)) {
> >  		/* Alas, we support only Ethernet autoconfiguration. */
> >  		return;
> >  	}
>
> Because priv_flags are local to device type, you need to check dev->type:
> 	(dev->type == ARPHRD_SIT && !(dev->priv_flags & IFF_ISATAP))
> or something like this.

OK.

> > +	struct ip_tunnel *t = netdev_priv(ifp->idev->dev);
> > +	if (t->parms.i_key != INADDR_NONE) {
> > +		spin_lock(&ifp->lock);
>
> I guess INADDR_ANY.

No; INADDR_NONE is correct. A non-zero router value is the way 'ip' tells the kernel that the interface is ISATAP. INADDR_NONE means ISATAP, but no router. The ISATAP router will never be INADDR_ANY.

Thanks - Fred
[EMAIL PROTECTED]

> --yoshfuji
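The interface-identifier rule being reviewed above (00-00-5E-FE or 02-00-5E-FE followed by the four IPv4 bytes, with the 0x02 bit set only for addresses outside the private and special-purpose ranges) can be sketched in user-space C. This is an illustration under stated assumptions, not the patch itself: `isatap_ifid` and `ipv4_is_non_global` are hypothetical names, the address is taken in host byte order, and the range check covers only a subset of the macros the patch tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true for a few well-known non-global IPv4 ranges.
 * The address is in host byte order. This is a subset of the checks in
 * the patch, chosen only to keep the example short. */
static bool ipv4_is_non_global(uint32_t a)
{
    return (a >> 24) == 0          /* 0.0.0.0/8       */
        || (a >> 24) == 10         /* 10.0.0.0/8      */
        || (a >> 24) == 127        /* 127.0.0.0/8     */
        || (a >> 16) == 0xA9FE     /* 169.254.0.0/16  */
        || (a >> 20) == 0xAC1      /* 172.16.0.0/12   */
        || (a >> 16) == 0xC0A8     /* 192.168.0.0/16  */
        || (a >> 28) >= 0xE;       /* multicast and class E */
}

/* Build the 8-byte ISATAP interface identifier for an IPv4 address,
 * following the form suggested in the review: first byte 0 or 2, then
 * 00 5E FE, then the IPv4 address in network byte order. */
static void isatap_ifid(uint8_t eui[8], uint32_t addr)
{
    eui[0] = ipv4_is_non_global(addr) ? 0x00 : 0x02;
    eui[1] = 0x00;
    eui[2] = 0x5E;
    eui[3] = 0xFE;
    eui[4] = (uint8_t)(addr >> 24);
    eui[5] = (uint8_t)(addr >> 16);
    eui[6] = (uint8_t)(addr >> 8);
    eui[7] = (uint8_t)addr;
}
```

For 192.168.0.1 (0xC0A80001) this yields 00:00:5e:fe:c0:a8:00:01. Computing the first byte in one expression, as the review suggests, also avoids the `eui[0] = ~0x02` assignment in the original hunk, which would store 0xFD rather than clearing a single bit.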
RE: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)
Yoshifuji,

> -----Original Message-----
> From: YOSHIFUJI Hideaki / 吉藤英明 [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 15, 2007 3:48 AM
> To: Templin, Fred L
> Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)
>
> In article [EMAIL PROTECTED] eing.com (at Wed, 14 Nov 2007 22:44:17 -0800), Templin, Fred L [EMAIL PROTECTED] says:
>
> > From: Fred L. Templin [EMAIL PROTECTED]
> >
> > This patch includes support for the Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) per RFC4214. It uses the SIT module, and is configured using extensions to the iproute2 utility. The following diffs are specific to the Linux 2.6.24-rc2 kernel distribution. This message includes the full and patchable diff text; please use this version to apply patches.
> >
> > Signed-off-by: Fred L. Templin [EMAIL PROTECTED]
>
> BTW, how will we handle DNS name (and TTL) and/or multiple PRL entries in RFC4214? I'm doubting if we really need to handle PRL refresh in kernel.

DNS name and PRL refresh are done in a daemon that either exec's 'ip' or issues the device ioctl's directly. When there are multiple default router IPv4 addresses, the daemon picks one as the primary and writes it to the kernel. It can then change to a different primary later if it wants to. Also possible is something like VRRP to allow several routers for fault tolerance even though there is only a single default router address.

Thanks - Fred
[EMAIL PROTECTED]

> --yoshfuji
Re: [PATCH 2/2] move unneeded data to initdata section
Sam Ravnborg [EMAIL PROTECTED] writes:
> On Thu, Nov 15, 2007 at 05:42:04PM +0300, Denis V. Lunev wrote:
> > nothing is discarded after module load. Though, I can be wrong. Could you point me to the exact place?
>
> If __initdata is not discarded after module load then we should do it. There is no reason to waste __initdata RAM when the module is loaded.

Down at the bottom of sys_init_module we have:

	/* Drop initial reference. */
	module_put(mod);
	unwind_remove_table(mod->unwind_info, 1);
	module_free(mod, mod->module_init);
	^^^^^^^^^^^
	mod->module_init = NULL;
	mod->init_size = 0;
	mod->init_text_size = 0;
	mutex_unlock(&module_mutex);
	return 0;

Which frees the memory for the .init sections.

Eric
Re: [PATCH 2.6.25] add bnx2x driver for BCM57710 - bnx2x_init.h
posting individual files for comments. --- /* bnx2x_init.h: Broadcom Everest network driver. * * Copyright (c) 2007 Broadcom Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation. * * Written by: Eliezer Tamir [EMAIL PROTECTED] */ #ifndef BNX2X_INIT_H #define BNX2X_INIT_H #define COMMON 0x1 #define PORT0 0x2 #define PORT1 0x4 #define INIT_EMULATION 0x1 #define INIT_FPGA 0x2 #define INIT_ASIC 0x4 #define INIT_HARDWARE 0x7 #define STORM_INTMEM_SIZE (0x5800 / 4) #define TSTORM_INTMEM_ADDR 0x1a #define CSTORM_INTMEM_ADDR 0x22 #define XSTORM_INTMEM_ADDR 0x2a #define USTORM_INTMEM_ADDR 0x32 /* Init operation types and structures */ #define OP_RD 0x1 /* read single register */ #define OP_WR 0x2 /* write single register */ #define OP_IW 0x3 /* write single register using mailbox */ #define OP_SW 0x4 /* copy a string to the device */ #define OP_SI 0x5 /* copy a string using mailbox */ #define OP_ZR 0x6 /* clear memory */ #define OP_ZP 0x7 /* unzip then copy with DMAE */ #define OP_WB 0x8 /* copy a string using DMAE */ struct raw_op { u32 op :8; u32 offset :24; u32 raw_data; }; struct op_read { u32 op :8; u32 offset :24; u32 pad; }; struct op_write { u32 op :8; u32 offset :24; u32 val; }; struct op_string_write { u32 op :8; u32 offset :24; #ifdef __LITTLE_ENDIAN u16 data_off; u16 data_len; #else /* __BIG_ENDIAN */ u16 data_len; u16 data_off; #endif }; struct op_zero { u32 op :8; u32 offset :24; u32 len; }; union init_op { struct op_read read; struct op_write write; struct op_string_write str_wr; struct op_zero zero; struct raw_op raw; }; #include bnx2x_init_values.h static void bnx2x_reg_wr_ind(struct bnx2x *bp, u32 addr, u32 val); static void bnx2x_write_dmae(struct bnx2x *bp, dma_addr_t dma_addr, u32 dst_addr, u32 len32); static int bnx2x_gunzip(struct bnx2x *bp, u8 *zbuf, int len); static void bnx2x_init_str_wr(struct bnx2x *bp, u32 addr, 
const u32 *data, u32 len) { int i; for (i = 0; i len; i++) { REG_WR(bp, addr + i*4, data[i]); if (!(i % 1)) { touch_softlockup_watchdog(); cpu_relax(); } } } #define INIT_MEM_WR(reg, data, reg_off, len) \ bnx2x_init_str_wr(bp, reg + reg_off*4, data, len) static void bnx2x_init_ind_wr(struct bnx2x *bp, u32 addr, const u32 *data, u16 len) { int i; for (i = 0; i len; i++) { REG_WR_IND(bp, addr + i*4, data[i]); if (!(i % 1)) { touch_softlockup_watchdog(); cpu_relax(); } } } static void bnx2x_init_wr_wb(struct bnx2x *bp, u32 addr, const u32 *data, u32 len, int gunzip) { int offset = 0; if (gunzip) { int rc; #ifdef __BIG_ENDIAN int i, size; u32 *temp; temp = kmalloc(len, GFP_KERNEL); size = (len / 4) + ((len % 4) ? 1 : 0); for (i = 0; i size; i++) temp[i] = swab32(data[i]); data = temp; #endif rc = bnx2x_gunzip(bp, (u8 *)data, len); if (rc) { DP(NETIF_MSG_HW, gunzip failed ! rc %d\n, rc); return; } len = bp-gunzip_outlen; #ifdef __BIG_ENDIAN kfree(temp); for (i = 0; i len; i++) ((u32 *)bp-gunzip_buf)[i] = swab32(((u32 *)bp-gunzip_buf)[i]); #endif } else { if ((len * 4) FW_BUF_SIZE) { BNX2X_ERR(LARGE DMAE OPERATION ! len 0x%x\n, len*4); return; } memcpy(bp-gunzip_buf, data, len * 4); } while (len DMAE_LEN32_MAX) { bnx2x_write_dmae(bp, bp-gunzip_mapping + offset, addr + offset, DMAE_LEN32_MAX); offset += DMAE_LEN32_MAX * 4; len -=
Re: tg3: strange errors and non-working-ness
On Thu, 2007-11-15 at 13:17 -0600, Jon Nelson wrote:
> Is this what you mean? I pulled this from the quoted text:
>
> Nov 10 22:45:52 frank kernel: NETDEV WATCHDOG: eth0: transmit timed out

Right. This explains the reset at 22:45:52, but not the earlier reset at 22:24:40. Link never came up after that earlier reset.

Is this a new problem introduced by a new driver? I notice you are using tg3 3.65. Have you used newer versions or older versions?
Re: [PATCH 2/2] move unneeded data to initdata section
On Thu, Nov 15, 2007 at 10:17:14PM +0300, Denis V. Lunev wrote:
> Would you object to this?
>
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index 5dd6d90..d136707 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -119,10 +119,14 @@ static inline struct net *maybe_get_net(struct net *net)
>  #ifdef CONFIG_NET_NS
>  #define __net_init
>  #define __net_exit
> -#define __net_initdata
>  #else
>  #define __net_init __init
>  #define __net_exit __exit_refok
> +#endif
> +
> +#if defined(CONFIG_NET_NS) || defined(MODULE)
> +#define __net_initdata
> +#else
>  #define __net_initdata __initdata
>  #endif
>

In principle I am against this approach. __initdata is far too overloaded with different stuff.

A much better approach would be to create new sections named, for example, .init.data.net and .init.data.net.module, and then decide the location of these sections in include/asm-generic/vmlinux.lds.h. On top of this we would have to teach modpost about these new sections.

The advantage of this approach is that the section mismatch checks are *independent* of whether the code is built as a MODULE or built-in. The check will still happen. In this way we avoid the situation where a warning only pops up in certain configurations.

Doing so will obviously require a bit more linker script consolidation, but if you or someone else could step in and do this it would be great!

Sam
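The dedicated-section idea above could start from an annotation macro along these lines. This is a minimal user-space sketch, assuming GCC or Clang on an ELF target: the macro body and the `example_*` names are hypothetical, and the linker-script and modpost changes Sam mentions are not shown.

```c
/* Hypothetical definition: route net init data into its own named
 * section, using the section names proposed in the message above.
 * In the kernel, vmlinux.lds.h would then decide where each section
 * ends up; here the toolchain simply keeps them as ordinary data. */
#ifdef MODULE
#define __net_initdata __attribute__((__section__(".init.data.net.module")))
#else
#define __net_initdata __attribute__((__section__(".init.data.net")))
#endif

/* Example data carrying the annotation; the values are placeholders. */
static int example_net_defaults[2] __net_initdata = { 1, 60 };

/* Accessor so the example has something observable. */
int example_first_default(void)
{
    return example_net_defaults[0];
}
```

Because the annotation is the same in both configurations and only the section name changes, a checker that keys on the section name can run identically for modular and built-in builds, which is the property argued for above.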
Re: tg3: strange errors and non-working-ness
On 11/15/07, Michael Chan [EMAIL PROTECTED] wrote:
> On Thu, 2007-11-15 at 10:47 +0100, Jarek Poplawski wrote:
> > On 13-11-2007 19:57, Jon Nelson wrote:
> > > The best info I've got is this:
>
> It looks like the card is being reset periodically. Every time the card gets reset, you'll see those PM messages in the version of the driver you're using. Do you see NETDEV WATCHDOG message as well in the dmesg log?

Is this what you mean? I pulled this from the quoted text:

Nov 10 22:45:52 frank kernel: NETDEV WATCHDOG: eth0: transmit timed out

--
Jon
Re: [PATCH] r6040 various bugfixes
Hello Stephen,

On Thursday 15 November 2007, Stephen Hemminger wrote:
> Looks good, thanks. There is a function to make this easier:
>
> @@ -756,10 +803,8 @@ r6040_open(struct net_device *dev)
>  	if (lp->switch_sig != ICPLUS_PHY_ID) {
>  		/* set and active a timer process */
>  		init_timer(&lp->timer);
> -		lp->timer.expires = TIMER_WUT;
>  		lp->timer.data = (unsigned long)dev;
>  		lp->timer.function = r6040_timer;
> -		add_timer(&lp->timer);
>
> Could be:
>
> 	setup_timer(&lp->timer, r6040_timer, dev);
> 	if (lp->switch_sig != ICPLUS_PHY_ID)
> 		mod_timer(&lp->timer, jiffies + HZ);

I will send a fix later, once I have tested your suggestion to use a slightly larger buffer and skb_reserve(skb, NET_IP_ALIGN). Thank you.

--
Florian
Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)
In article [EMAIL PROTECTED] (at Thu, 15 Nov 2007 10:11:16 -0800), Templin, Fred L [EMAIL PROTECTED] says:

> Yoshifuji,
>
> > -----Original Message-----
> > From: YOSHIFUJI Hideaki / 吉藤英明 [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, November 15, 2007 3:48 AM
> > To: Templin, Fred L
> > Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Subject: Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.2)
> >
> > In article [EMAIL PROTECTED] eing.com (at Wed, 14 Nov 2007 22:44:17 -0800), Templin, Fred L [EMAIL PROTECTED] says:
> >
> > > From: Fred L. Templin [EMAIL PROTECTED]
> > >
> > > This patch includes support for the Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) per RFC4214. It uses the SIT module, and is configured using extensions to the iproute2 utility. The following diffs are specific to the Linux 2.6.24-rc2 kernel distribution. This message includes the full and patchable diff text; please use this version to apply patches.
> > >
> > > Signed-off-by: Fred L. Templin [EMAIL PROTECTED]
> >
> > BTW, how will we handle DNS name (and TTL) and/or multiple PRL entries in RFC4214? I'm doubting if we really need to handle PRL refresh in kernel.
>
> DNS name and PRL refresh are done in a daemon that either exec's 'ip' or issues the device ioctl's directly. When there are multiple default router IPv4 addresses, the daemon picks one as the primary and writes it to the kernel. It can then change to a different primary later if it wants to. Also possible is something like VRRP to allow several routers for fault tolerance even though there is only a single default router address.

Why? All PRL entries should be installed in the kernel so that standard router selection can be used. For this, I think we should have just one isatap interface per set of PRLs providing a virtual link, especially if each of them provides the same prefix.

--yoshfuji
Re: [PATCH 2/2] move unneeded data to initdata section
On Thu, Nov 15, 2007 at 11:19:26AM -0700, Eric W. Biederman wrote:
> Sam Ravnborg [EMAIL PROTECTED] writes:
> > On Thu, Nov 15, 2007 at 05:42:04PM +0300, Denis V. Lunev wrote:
> > > nothing is discarded after module load. Though, I can be wrong. Could you point me to the exact place?
> >
> > If __initdata is not discarded after module load then we should do it. There is no reason to waste __initdata RAM when the module is loaded.
>
> Down at the bottom of sys_init_module we have:
>
> 	/* Drop initial reference. */
> 	module_put(mod);
> 	unwind_remove_table(mod->unwind_info, 1);
> 	module_free(mod, mod->module_init);
> 	^^^^^^^^^^^
> 	mod->module_init = NULL;
> 	mod->init_size = 0;
> 	mod->init_text_size = 0;
> 	mutex_unlock(&module_mutex);
> 	return 0;
>
> Which frees the memory for the .init sections.

Thanks for clarifying this, Eric - I should have looked myself.

Sam
Re: tg3: strange errors and non-working-ness
On Thu, 2007-11-15 at 10:47 +0100, Jarek Poplawski wrote: On 13-11-2007 19:57, Jon Nelson wrote: The best info I've got is this: It looks like the card is being reset periodically. Every time the card gets reset, you'll see those PM messages in the version of the driver you're using. Do you see NETDEV WATCHDOG message as well in the dmesg log? Nov 10 22:21:19 frank kernel: tg3.c:v3.65 (August 07, 2006) Nov 10 22:21:19 frank kernel: ACPI: PCI Interrupt :00:0b.0[A] - Link [LNKB] - GSI 3 (level, low) - IRQ 3 Nov 10 22:21:19 frank kernel: eth0: Tigon3 [partno(AC91002A1) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:09:5b:09:b1:69 Nov 10 22:21:19 frank kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[0] Nov 10 22:21:19 frank kernel: eth0: dma_rwctrl[76ff000f] dma_mask[64-bit] Nov 10 22:21:19 frank kernel: PM: Writing back config space on device :00:0b.0 at offset b (was 164514e4, writing 302a1385) Nov 10 22:21:19 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 3 (was 0, writing 4008) Nov 10 22:21:19 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 2 (was 200, writing 215) Nov 10 22:21:19 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 1 (was 2b0, writing 2b00106) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b) Nov 10 22:21:20 frank kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex. Nov 10 22:21:20 frank kernel: tg3: eth0: Flow control is on for TX and on for RX. 
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset b (was 164514e4, writing 302a1385) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 3 (was 0, writing 4008) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 2 (was 200, writing 215) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 1 (was 2b0, writing 2b00106) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b) Nov 10 22:21:20 frank kernel: ACPI: PCI interrupt for device :00:0b.0 disabled Nov 10 22:21:20 frank kernel: PCI: Enabling device :00:0b.0 (0100 - 0102) Nov 10 22:21:20 frank kernel: ACPI: PCI Interrupt :00:0b.0[A] - Link [LNKB] - GSI 3 (level, low) - IRQ 3 Nov 10 22:21:20 frank kernel: eth0: Tigon3 [partno(AC91002A1) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:09:5b:09:b1:69 Nov 10 22:21:20 frank kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[0] Nov 10 22:21:20 frank kernel: eth0: dma_rwctrl[76ff000f] dma_mask[64-bit] Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset b (was 164514e4, writing 302a1385) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 3 (was 0, writing 4008) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 2 (was 200, writing 215) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 1 (was 2b0, writing 2b00106) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b) Nov 10 22:21:20 frank kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex. Nov 10 22:21:20 frank kernel: tg3: eth0: Flow control is on for TX and on for RX. 
Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset b (was 164514e4, writing 302a1385) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 3 (was 0, writing 4008) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 2 (was 200, writing 215) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 1 (was 2b0, writing 2b00106) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 0 (was 164514e4, writing 3ea173b) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset b (was 164514e4, writing 302a1385) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 3 (was 0, writing 4008) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 2 (was 200, writing 215) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device :00:0b.0 at offset 1 (was 2b0, writing 2b00106) Nov 10 22:21:20 frank kernel: PM: Writing back config space on device
Re: [PATCH 2.6.25] add bnx2x driver for BCM57710 - bnx2x_fw_defs.h
posting individual files for comments. --- /* bnx2x_fw_defs.h: Broadcom Everest network driver. * * Copyright (c) 2007 Broadcom Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation. */ #define CSTORM_DEF_SB_HC_DISABLE_OFFSET(port, index)\ (0x1922 + (port * 0x40) + (index * 0x4)) #define CSTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port)\ (0x1900 + (port * 0x40)) #define CSTORM_HC_BTR_OFFSET(port)\ (0x1984 + (port * 0xc0)) #define CSTORM_SB_HC_DISABLE_OFFSET(port, cpu_id, index)\ (0x141a + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4)) #define CSTORM_SB_HC_TIMEOUT_OFFSET(port, cpu_id, index)\ (0x1418 + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4)) #define CSTORM_SB_HOST_SB_ADDR_OFFSET(port, cpu_id)\ (0x1400 + (port * 0x280) + (cpu_id * 0x28)) #define CSTORM_STATS_FLAGS_OFFSET(port) (0x5108 + (port * 0x8)) #define TSTORM_CLIENT_CONFIG_OFFSET(port, client_id)\ (0x1510 + (port * 0x240) + (client_id * 0x20)) #define TSTORM_DEF_SB_HC_DISABLE_OFFSET(port, index)\ (0x138a + (port * 0x28) + (index * 0x4)) #define TSTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port)\ (0x1370 + (port * 0x28)) #define TSTORM_ETH_STATS_QUERY_ADDR_OFFSET(port)\ (0x4b70 + (port * 0x8)) #define TSTORM_FUNCTION_COMMON_CONFIG_OFFSET(function)\ (0x1418 + (function * 0x30)) #define TSTORM_HC_BTR_OFFSET(port)\ (0x13c4 + (port * 0x18)) #define TSTORM_INDIRECTION_TABLE_OFFSET(port)\ (0x22c8 + (port * 0x80)) #define TSTORM_INDIRECTION_TABLE_SIZE 0x80 #define TSTORM_MAC_FILTER_CONFIG_OFFSET(port)\ (0x1420 + (port * 0x30)) #define TSTORM_RCQ_PROD_OFFSET(port, client_id)\ (0x1508 + (port * 0x240) + (client_id * 0x20)) #define TSTORM_STATS_FLAGS_OFFSET(port) (0x4b90 + (port * 0x8)) #define USTORM_DEF_SB_HC_DISABLE_OFFSET(port, index)\ (0x191a + (port * 0x28) + (index * 0x4)) #define USTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port)\ (0x1900 + (port * 0x28)) #define USTORM_HC_BTR_OFFSET(port)\ (0x1954 + 
(port * 0xb8)) #define USTORM_MEM_WORKAROUND_ADDRESS_OFFSET(port)\ (0x5408 + (port * 0x8)) #define USTORM_SB_HC_DISABLE_OFFSET(port, cpu_id, index)\ (0x141a + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4)) #define USTORM_SB_HC_TIMEOUT_OFFSET(port, cpu_id, index)\ (0x1418 + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4)) #define USTORM_SB_HOST_SB_ADDR_OFFSET(port, cpu_id)\ (0x1400 + (port * 0x280) + (cpu_id * 0x28)) #define XSTORM_ASSERT_LIST_INDEX_OFFSET 0x1000 #define XSTORM_ASSERT_LIST_OFFSET(idx) (0x1020 + (idx * 0x10)) #define XSTORM_DEF_SB_HC_DISABLE_OFFSET(port, index)\ (0x141a + (port * 0x28) + (index * 0x4)) #define XSTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port)\ (0x1400 + (port * 0x28)) #define XSTORM_ETH_STATS_QUERY_ADDR_OFFSET(port)\ (0x5408 + (port * 0x8)) #define XSTORM_HC_BTR_OFFSET(port)\ (0x1454 + (port * 0x18)) #define XSTORM_SPQ_PAGE_BASE_OFFSET(port)\ (0x5328 + (port * 0x18)) #define XSTORM_SPQ_PROD_OFFSET(port)\ (0x5330 + (port * 0x18)) #define XSTORM_STATS_FLAGS_OFFSET(port) (0x53f8 + (port * 0x8)) #define COMMON_ASM_INVALID_ASSERT_OPCODE 0x0 /** * This file defines HSI constatnts for the ETH flow */ /* hash types */ #define DEFAULT_HASH_TYPE 0 #define IPV4_HASH_TYPE 1 #define TCP_IPV4_HASH_TYPE 2 #define IPV6_HASH_TYPE 3 #define TCP_IPV6_HASH_TYPE 4 /* values of command IDs in the ramrod message */ #define RAMROD_CMD_ID_ETH_PORT_SETUP(80) #define RAMROD_CMD_ID_ETH_CLIENT_SETUP (85) #define RAMROD_CMD_ID_ETH_STAT_QUERY(90) #define RAMROD_CMD_ID_ETH_UPDATE(100) #define RAMROD_CMD_ID_ETH_HALT (105) #define RAMROD_CMD_ID_ETH_SET_MAC (110) #define RAMROD_CMD_ID_ETH_CFC_DEL (115) #define RAMROD_CMD_ID_ETH_PORT_DEL (120) #define RAMROD_CMD_ID_ETH_FORWARD_SETUP (125) /* command values for set mac command */ #define T_ETH_MAC_COMMAND_SET 0 #define T_ETH_MAC_COMMAND_INVALIDATE1 #define T_ETH_INDIRECTION_TABLE_SIZE128 /* Maximal L2 clients supported */ #define ETH_MAX_RX_CLIENTS (18) /** * This file defines HSI constatnts common to all microcode flows */ 
/* Connection types */ #define ETH_CONNECTION_TYPE 0 #define PROTOCOL_STATE_BIT_OFFSET 6 #define ETH_STATE (ETH_CONNECTION_TYPE PROTOCOL_STATE_BIT_OFFSET) /* microcode fixed page page size 4K (chains and ring segments)
[PATCH] r6040 various bugfixes
This patch fixes various bugfixes spotted by Stephen, thanks ! - add functions to allocate/free TX and RX buffers - recover from transmit timeout and use the 4 helpers defined below - use netdev_alloc_skb instead of dev_alloc_skb - do not use a private stats structure to store statistics - break each TX/RX error to a separate line for better reading - suppress volatiles and make checkpatch happy - better control of the timer - fix spin_unlock_irq typo in netdev_get_settings - fix various typos and spelling in the driver Signed-off-by: Florian Fainelli [EMAIL PROTECTED] -- diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c index edce5a4..529c903 100644 --- a/drivers/net/r6040.c +++ b/drivers/net/r6040.c @@ -172,7 +172,6 @@ struct r6040_private { struct net_device *dev; struct mii_if_info mii_if; struct napi_struct napi; - struct net_device_stats stats; u16 napi_rx_running; void __iomem *base; }; @@ -233,18 +232,121 @@ static void mdio_write(struct net_device *dev, int mii_id, int reg, int val) phy_write(ioaddr, lp-phy_addr, reg, val); } +static void r6040_free_txbufs(struct net_device *dev) +{ + struct r6040_private *lp = netdev_priv(dev); + int i; + + for (i = 0; i TX_DCNT; i++) { + if (lp-tx_insert_ptr-skb_ptr) { + pci_unmap_single(lp-pdev, lp-tx_insert_ptr-buf, + MAX_BUF_SIZE, PCI_DMA_TODEVICE); + dev_kfree_skb(lp-tx_insert_ptr-skb_ptr); + lp-rx_insert_ptr-skb_ptr = NULL; + } + lp-tx_insert_ptr = lp-tx_insert_ptr-vndescp; + } +} + +static void r6040_free_rxbufs(struct net_device *dev) +{ + struct r6040_private *lp = netdev_priv(dev); + int i; + + for (i = 0; i RX_DCNT; i++) { + if (lp-rx_insert_ptr-skb_ptr) { + pci_unmap_single(lp-pdev, lp-rx_insert_ptr-buf, + MAX_BUF_SIZE, PCI_DMA_FROMDEVICE); + dev_kfree_skb(lp-rx_insert_ptr-skb_ptr); + lp-rx_insert_ptr-skb_ptr = NULL; + } + lp-rx_insert_ptr = lp-rx_insert_ptr-vndescp; + } +} + +static void r6040_alloc_txbufs(struct net_device *dev) +{ + struct r6040_private *lp = netdev_priv(dev); + struct r6040_descriptor 
*descptr; + int i; + dma_addr_t desc_dma, start_dma; + + lp-tx_free_desc = TX_DCNT; + /* Zero all descriptors */ + memset(lp-desc_pool, 0, ALLOC_DESC_SIZE); + lp-tx_insert_ptr = (struct r6040_descriptor *)lp-desc_pool; + lp-tx_remove_ptr = lp-tx_insert_ptr; + + /* Init TX descriptor */ + descptr = lp-tx_insert_ptr; + desc_dma = lp-desc_dma; + start_dma = desc_dma; + for (i = 0; i TX_DCNT; i++) { + descptr-ndesc = cpu_to_le32(desc_dma + + sizeof(struct r6040_descriptor)); + descptr-vndescp = (descptr + 1); + descptr = (descptr + 1); + desc_dma += sizeof(struct r6040_descriptor); + } + (descptr - 1)-ndesc = cpu_to_le32(start_dma); + (descptr - 1)-vndescp = lp-tx_insert_ptr; +} + +static void r6040_alloc_rxbufs(struct net_device *dev) +{ + struct r6040_private *lp = netdev_priv(dev); + struct r6040_descriptor *descptr; + int i; + dma_addr_t desc_dma, start_dma; + + lp-rx_free_desc = 0; + /* Zero all descriptors */ + memset(lp-desc_pool, 0, ALLOC_DESC_SIZE); + lp-rx_insert_ptr = (struct r6040_descriptor *)lp-tx_insert_ptr + + TX_DCNT; + lp-rx_remove_ptr = lp-rx_insert_ptr; + + /* Init RX descriptor */ + start_dma = desc_dma; + descptr = lp-rx_insert_ptr; + for (i = 0; i RX_DCNT; i++) { + descptr-ndesc = cpu_to_le32(desc_dma + + sizeof(struct r6040_descriptor)); + descptr-vndescp = (descptr + 1); + descptr = (descptr + 1); + desc_dma += sizeof(struct r6040_descriptor); + } + (descptr - 1)-ndesc = cpu_to_le32(start_dma); + (descptr - 1)-vndescp = lp-rx_insert_ptr; +} + static void r6040_tx_timeout(struct net_device *dev) { struct r6040_private *priv = netdev_priv(dev); + void __iomem *ioaddr = priv-base; + printk(KERN_WARNING %s: transmit timed out, status %4.4x, PHY status + %4.4x\n, + dev-name, ioread16(ioaddr + MIER), + mdio_read(dev, priv-mii_if.phy_id, MII_BMSR)); disable_irq(dev-irq); napi_disable(priv-napi); + spin_lock(priv-lock); - dev-stats.tx_errors++; + /* Clear all descriptors */ + r6040_free_txbufs(dev); + r6040_free_rxbufs(dev); + r6040_alloc_txbufs(dev); 
+ r6040_alloc_rxbufs(dev); + + /* Reset MAC */ + iowrite16(MAC_RST, ioaddr + MCR1);
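For readers following the allocation helpers in the patch above: each descriptor is linked to its successor, and the last one is pointed back at the first so the hardware sees a ring. The following is only a userspace sketch of that linking step, not driver code. `struct desc`, `ring_init()` and the `base_dma` parameter are hypothetical stand-ins (the patch uses `ndesc` and `vndescp` inside `struct r6040_descriptor`), and the endianness conversion done by `cpu_to_le32()` is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct r6040_descriptor: only the two link
 * fields matter here (ndesc / vndescp in the patch). */
struct desc {
	uint32_t next_dma;	/* bus address of the next descriptor */
	struct desc *next;	/* CPU pointer to the next descriptor */
};

/* Link `count` descriptors starting at `ring` into a circular list.
 * `base_dma` is the assumed bus address of ring[0]. */
static void ring_init(struct desc *ring, size_t count, uint32_t base_dma)
{
	size_t i;

	assert(ring != NULL && count > 0);

	for (i = 0; i < count; i++) {
		size_t n = (i + 1) % count;	/* last entry wraps to the first */

		ring[i].next_dma = base_dma + (uint32_t)(n * sizeof(struct desc));
		ring[i].next = &ring[n];
	}
}
```

Computing the successor index with a modulo closes the ring inside the loop, so no separate fix-up of the last descriptor is needed afterwards.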
Re: [PATCH] r6040 various bugfixes
On Thu, 15 Nov 2007 19:37:43 +0100
Florian Fainelli [EMAIL PROTECTED] wrote:

> This patch fixes various bugfixes spotted by Stephen, thanks !
> - add functions to allocate/free TX and RX buffers
> - recover from transmit timeout and use the 4 helpers defined below
> - use netdev_alloc_skb instead of dev_alloc_skb
> - do not use a private stats structure to store statistics
> - break each TX/RX error to a separate line for better reading
> - suppress volatiles and make checkpatch happy
> - better control of the timer
> - fix spin_unlock_irq typo in netdev_get_settings
> - fix various typos and spelling in the driver
>
> Signed-off-by: Florian Fainelli [EMAIL PROTECTED]

Looks good, thanks.

There is a function to make this easier:

> @@ -756,10 +803,8 @@ r6040_open(struct net_device *dev)
>  	if (lp->switch_sig != ICPLUS_PHY_ID) {
>  		/* set and active a timer process */
>  		init_timer(&lp->timer);
> -		lp->timer.expires = TIMER_WUT;
>  		lp->timer.data = (unsigned long)dev;
>  		lp->timer.function = r6040_timer;
> -		add_timer(&lp->timer);

Could be:

	setup_timer(&lp->timer, r6040_timer, dev);
	if (lp->switch_sig != ICPLUS_PHY_ID)
		mod_timer(&lp->timer, jiffies + HZ);

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 2.6.25] add bnx2x driver for BCM57710
Dave,

Here is the latest version for bnx2x.
Please consider applying to 2.6.25.

This patch also applies cleanly to net-2.6 for anyone that would like
to test it.

Major changes from last post:
 * parts of the slowpath have been re-factored.
 * slowpath task now runs in work queue context, which allowed us to
   replace the mdelays with msleeps.

ftp link
ftp://[EMAIL PROTECTED]/0001-add-bnx2x-driver-for-BCM57710.patch
gzipped
ftp://[EMAIL PROTECTED]/0001-add-bnx2x-driver-for-BCM57710.patch.gz

I will also post individual files for review as replies to this post.

Thanks,
Eliezer
Re: [PATCH] via-velocity: don't oops on MTU change.
On Thu, 15 Nov 2007 09:26:00 +0100
Jarek Poplawski [EMAIL PROTECTED] wrote:

> On 15-11-2007 04:38, Stephen Hemminger wrote:
> > Simple mtu change when device is down.
> > Fix http://bugzilla.kernel.org/show_bug.cgi?id=9382.
> >
> > Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
> >
> > --- a/drivers/net/via-velocity.c	2007-10-22 09:38:11.0 -0700
> > +++ b/drivers/net/via-velocity.c	2007-11-14 19:34:30.0 -0800
> > @@ -1963,6 +1963,11 @@ static int velocity_change_mtu(struct ne
> >  		return -EINVAL;
> >  	}
> >
> > +	if (!netif_running(dev)) {
> > +		dev->mtu = new_mtu;
> > +		return 0;
> > +	}
> > +
> >  	if (new_mtu != oldmtu) {
> >  		spin_lock_irqsave(&vptr->lock, flags);
>
> Shouldn't this latter 'if' be removed now, btw?

No, it makes sense that if mtu is same, no action need be taken.
Actually, it would make sense to push the same check up into the
netdevice core management.

--
Stephen Hemminger [EMAIL PROTECTED]
Re: [PATCH 2.6.25] add bnx2x driver for BCM57710 - bnx2x.h
posting individual files for comments. --- /* bnx2x.h: Broadcom Everest network driver. * * Copyright (c) 2007 Broadcom Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation. * * Written by: Eliezer Tamir [EMAIL PROTECTED] * Based on code from Michael Chan's bnx2 driver */ #ifndef BNX2X_H #define BNX2X_H /* error/debug prints */ #define DRV_MODULE_NAME bnx2x #define PFX DRV_MODULE_NAME : /* for messages that are currently off */ #define BNX2X_MSG_OFF 0 #define BNX2X_MSG_MCP 0x1 /* was: NETIF_MSG_HW */ #define BNX2X_MSG_STATS 0x2 /* was: NETIF_MSG_TIMER */ #define NETIF_MSG_NVM 0x4 /* was: NETIF_MSG_HW */ #define NETIF_MSG_DMAE 0x8 /* was: NETIF_MSG_HW */ #define DP_LEVELKERN_NOTICE /* was: KERN_DEBUG */ /* regular debug print */ #define DP(__mask, __fmt, __args...) do { \ if (bp-msglevel (__mask)) \ printk(DP_LEVEL [%s:%d(%s)] __fmt, __FUNCTION__, \ __LINE__, bp-dev?(bp-dev-name):?, ##__args); \ } while (0) /* for errors (never masked) */ #define BNX2X_ERR(__fmt, __args...) do { \ printk(KERN_ERR [%s:%d(%s)] __fmt, __FUNCTION__, \ __LINE__, bp-dev?(bp-dev-name):?, ##__args); \ } while (0) /* before we have a dev-name use dev_info() */ #define BNX2X_DEV_INFO(__fmt, __args...) 
do { \ if (bp-msglevel NETIF_MSG_PROBE) \ dev_info(bp-pdev-dev, __fmt, ##__args); \ } while (0) #ifdef BNX2X_STOP_ON_ERROR #define bnx2x_panic() do { \ bp-panic = 1; \ BNX2X_ERR(driver assert\n); \ bnx2x_disable_int(bp); \ bnx2x_panic_dump(bp); \ } while (0) #else #define bnx2x_panic() do { \ BNX2X_ERR(driver assert\n); \ bnx2x_panic_dump(bp); \ } while (0) #endif #define U64_LO(x) (((u64)x) 0x) #define U64_HI(x) (((u64)x) 32) #define HILO_U64(hi, lo)(((u64)hi 32) + lo) #define REG_ADDR(bp, offset)(bp-regview + offset) #define REG_RD(bp, offset) readl(REG_ADDR(bp, offset)) #define REG_RD8(bp, offset) readb(REG_ADDR(bp, offset)) #define REG_RD64(bp, offset)readq(REG_ADDR(bp, offset)) #define REG_WR(bp, offset, val) writel((u32)val, REG_ADDR(bp, offset)) #define REG_WR8(bp, offset, val)writeb((u8)val, REG_ADDR(bp, offset)) #define REG_WR16(bp, offset, val) writew((u16)val, REG_ADDR(bp, offset)) #define REG_WR32(bp, offset, val) REG_WR(bp, offset, val) #define REG_RD_IND(bp, offset) bnx2x_reg_rd_ind(bp, offset) #define REG_WR_IND(bp, offset, val) bnx2x_reg_wr_ind(bp, offset, val) #define REG_WR_DMAE(bp, offset, val, len32) \ do { \ memcpy(bnx2x_sp(bp, wb_data[0]), val, len32 * 4); \ bnx2x_write_dmae(bp, bnx2x_sp_mapping(bp, wb_data), \ offset, len32); \ } while (0) #define SHMEM_RD(bp, type) \ REG_RD(bp, bp-shmem_base + offsetof(struct shmem_region, type)) #define SHMEM_WR(bp, type, val) \ REG_WR(bp, bp-shmem_base + offsetof(struct shmem_region, type), val) #define NIG_WR(reg, val)REG_WR(bp, reg, val) #define EMAC_WR(reg, val) REG_WR(bp, emac_base + reg, val) #define BMAC_WR(reg, val) REG_WR(bp, GRCBASE_NIG + bmac_addr + reg, val) #define for_each_queue(bp, var) for (var = 0; var bp-num_queues; var++) #define for_each_nondefault_queue(bp, var) \ for (var = 1; var bp-num_queues; var++) #define is_multi(bp)(bp-num_queues 1) struct regp { u32 lo; u32 hi; }; struct bmac_stats { struct regp tx_gtpkt; struct regp tx_gtxpf; struct regp tx_gtfcs; struct regp tx_gtmca; struct 
regp tx_gtgca; struct regp tx_gtfrg; struct regp tx_gtovr; struct regp tx_gt64; struct regp tx_gt127; struct regp tx_gt255; /* 10 */ struct regp tx_gt511; struct regp tx_gt1023; struct regp tx_gt1518; struct regp tx_gt2047; struct regp tx_gt4095; struct regp tx_gt9216; struct regp tx_gt16383; struct regp tx_gtmax; struct regp tx_gtufl; struct regp tx_gterr; /* 20 */ struct regp tx_gtbyt; struct regp rx_gr64; struct regp rx_gr127; struct regp rx_gr255; struct regp rx_gr511; struct regp rx_gr1023; struct regp rx_gr1518; struct regp rx_gr2047; struct regp rx_gr4095; struct regp rx_gr9216; /* 30 */ struct regp rx_gr16383; struct regp rx_grmax; struct
Re: [Bugme-new] [Bug 9386] New: sis190 network driver crash
On Thu, 15 Nov 2007 07:30:53 -0800 (PST) [EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9386
>
>            Summary: sis190 network driver crash
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.23.1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Network
>         AssignedTo: [EMAIL PROTECTED]
>         ReportedBy: [EMAIL PROTECTED]
>                 CC: [EMAIL PROTECTED]
>
> I have a problem where I can lock up a number of machines by changing
> the link state on a sis190 Ethernet port. For example, during a data
> transfer such as FTP if I unplug the Ethernet cable and plug it back
> in, the Ethernet interface will stop responding and the machine will
> lock up after a minute or so. This behaviour is repeatable. I have the
> sis190 driver loaded as a module.
>
> I haven't found a kernel version where this doesn't happen. It happens
> with kernel 2.6.20.15, for example.
Re: tg3: strange errors and non-working-ness
On 11/15/07, Michael Chan [EMAIL PROTECTED] wrote:
> On Thu, 2007-11-15 at 13:17 -0600, Jon Nelson wrote:
> > Is this what you mean? I pulled this from the quoted text:
> >
> > Nov 10 22:45:52 frank kernel: NETDEV WATCHDOG: eth0: transmit timed out
>
> Right. This explains the reset at 22:45:52, but not the earlier reset
> at 22:24:40. Link never came up after that earlier reset.
>
> Is this a new problem introduced by a new driver? I notice you are
> using tg3 3.65. Have you used newer versions or older versions?

This is not a new problem - these cards have done this or something
like it for as long as I've had them*. They work just fine in 100 MBit
mode but not in all of my machines, and in none of them at gig-e.

I've tried every version of the driver since SUSE 9.1 without much luck
(at least as far back as 2.6.9). I'd try a newer driver, esp. if I
could make it compile on 2.6.22.12 (I prefer but do not require to stay
with the stock distro kernel, modules notwithstanding).

NOTE: to avoid list noise, I can make a bug out of this on
bugzilla.kernel.org and we can proceed from there if that is preferred.

[*] Actually, they worked OK in 2.4.something way-back-when but only
for short durations at gig-e speeds.

--
Jon
Re: [BUG] New Kernel Bugs
On Tue, Nov 13, 2007 at 10:34:37PM +, Russell King wrote:
> On Tue, Nov 13, 2007 at 06:25:16PM +, Alan Cox wrote:
> > > Given the wide range of ARM platforms today, it is utterly idiotic
> > > to expect a single person to be able to provide responses for all
> > > ARM bugs. I for one wish I'd never *VOLUNTEERED* to be a part of
> > > the kernel bugzilla, and really *WISH* I could pull out of that
> > > function.
> >
> > You can. Perhaps that bugzilla needs to point to some kind of
> > [EMAIL PROTECTED] list for the various ARM platform maintainers ?
>
> That might work - though it would be hard to get all the platform
> maintainers to be signed up to yet another mailing list, I'm sure
> sufficient would do.

As long as it would just be bug reports, I'm sure that most of us could
be persuaded to subscribe. Adding another list for general discussions
is probably not going to be read, the current list provides more than
enough to keep us busy.

--
Ben

Q: What's a light-year?
A: One-third less calories than a regular year.
RE: [PATCH] PATCH 1/2 [SCHED 2.6.24]: Check subqueue status before calling hard_start_xmit
> You could optimize this by getting HARD_TX_LOCK after the check. I
> assume that netif_stop_subqueue (from another CPU) would always be
> called by the driver xmit, and that is not possible since we hold
> the __LINK_STATE_QDISC_RUNNING bit. Does that sound correct?

Sorry for not responding sooner; Dave hit it on the head though with
his response. I agree with your changes, and I'll incorporate them in
the lockless stack patches I've been working on (in the software
queuing mode). Now if I could just find some time to finish them up
and get them out for review...

Thanks,
-PJ Waskiewicz
Re: [PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0 (was: Strange behavior in arp probe reply, bug or feature?)
Hi,

I started to look at this code when I was working on a project of
rewriting a dhcp-client. I wanted to make the client use arp to
determine if the offered address was free or in use. That's when I
noticed that linux machines responded in this, for me, odd way.

The problem is not really the target ip address in the reply, it is
the fact that the target hardware address is set to the hardware
address of the machine that is sending the reply. The target hardware
address should be the same as the destination address in the ethernet
frame.

The dhcp clients I examined, and the implementation of the arpcheck
that I use, will compare the target hardware field of the arp-reply
and match it against their own mac, to verify the reply. And this fails
with the current implementation in the kernel.

As for the target ip set to 0, that is the behavior I saw in Windows
and OpenBSD machines and figured it was a valid approach. The main
thing is however that the target machine address in the arp reply in
this case will confuse dhcp-clients trying to verify the reply.

And even if your arping implementation will work with any variant,
other implementations of this approach to duplicate ip detection
expect a different behavior.

Is there a reason that the target hardware address isn't the target
hardware address?

-Jonas

2007/11/15, Alexey Kuznetsov [EMAIL PROTECTED]:
> Hello!
>
> > Send a correct arp reply instead of one with sender ip and sender
> > hardware address in target fields.
>
> I do not see anything more legal in setting target address to 0.
>
> Actually, semantics of target address in ARP reply is ambiguous.
> If it is a reply to some real request, it is set to address of
> requestor and protocol requires recipient of this arp reply to test
> that the address matches its own address before creating new entry
> triggered by unsolicited arp reply. That's all.
>
> In the case of duplicate address detection, requestor does not have
> any address, so that it is absolutely not essential what we use as
> target address. The only place which could depend on this is the
> tool which tests for duplicate address. At least, arping written by
> me should work with any variant.
>
> So, please, could you explain what forced you to think that use of 0
> is better?
>
> Alexey
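To make the disagreement concrete, here is a sketch of the kind of check Jonas describes on the probing side: the client only accepts a reply whose target hardware address is its own MAC. It is an illustration under assumptions, not code from any real DHCP client. `struct arp_fields` and `reply_confirms_conflict()` are hypothetical names, the struct holds already-parsed fields in host byte order rather than a wire-format packet, and only the reply opcode value 2 comes from the ARP specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARP_OP_REPLY 2	/* ARP opcode for a reply */

/* Parsed ARP fields for Ethernet/IPv4 (hypothetical layout, host order). */
struct arp_fields {
	uint16_t op;
	uint8_t sha[6];	/* sender hardware address */
	uint8_t spa[4];	/* sender protocol (IP) address */
	uint8_t tha[6];	/* target hardware address */
	uint8_t tpa[4];	/* target protocol (IP) address */
};

/* Return 1 if `r` is a reply about `probed_ip` addressed to `own_mac`,
 * i.e. what a strict probing client would treat as "address in use". */
static int reply_confirms_conflict(const struct arp_fields *r,
				   const uint8_t own_mac[6],
				   const uint8_t probed_ip[4])
{
	assert(r != NULL);

	if (r->op != ARP_OP_REPLY)
		return 0;
	if (memcmp(r->spa, probed_ip, 4) != 0)
		return 0;
	/* This is the comparison that fails when the replier puts its own
	 * MAC into the target hardware field. */
	return memcmp(r->tha, own_mac, 6) == 0;
}
```

A more tolerant client could drop the last comparison and rely on the sender fields alone, which is roughly the position Alexey takes for arping.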
Re: [PATCH 2/5] accounting unit and variable
Herbert Xu wrote:
> On Wed, Nov 14, 2007 at 06:30:51PM -0500, Hideo AOKI wrote:
> > +#define SK_DATAGRAM_MEM_QUANTUM ((unsigned int)PAGE_SIZE)
> > +
> > +static inline int sk_datagram_pages(int amt)
> > +{
> > +	/* Cast to unsigned as an optimization, since amt is always positive. */
> > +	return DIV_ROUND_UP((unsigned int)amt, SK_DATAGRAM_MEM_QUANTUM);
> > +}
> > +
>
> Thanks, this looks OK to me.

Hello,

Thank you for reviewing.
Then, I'll send take 8 patch set later.

Regards,
Hideo

--
Hitachi Computer Products (America) Inc.
[PATCH 0/5] UDP memory accounting and limitation (take 8)
Hello,

This is the latest patch set of UDP memory accounting and limitation.

I modified sk_datagram_pages() to avoid using a divide instruction.
In addition, I fixed the memory accounting code in udp_recvmsg(): in
previous takes, the accounting code referred to the truesize of an
sk_buff that had already been released. The fix can be found in the
3rd patch of the patch set.

The patch set is for net-2.6. Please apply.

Changelog take 7 -> take 8:
 * sk_datagram_pages(): avoided using divide instruction
 * udp_recvmsg(): fixed referring to released truesize in accounting

Changelog take 6 -> take 7:
 * renamed /proc/sys/net/ipv4/udp_rmem to /proc/sys/net/ipv4/udp_rmem_min
 * renamed /proc/sys/net/ipv4/udp_wmem to /proc/sys/net/ipv4/udp_wmem_min
 * rebased to net-2.6

Changelog take 5 -> take 6:
 * removed minimal limit of /proc/sys/net/ipv4/udp_mem
 * added udp_init() for default value calculation of parameters
 * added /proc/sys/net/ipv4/udp_rmem and /proc/sys/net/ipv4/udp_wmem
 * added limitation code to ip_ufo_append_data()
 * improved accounting for receiving packet
 * fixed typos
 * rebased to 2.6.24-rc1

Changelog take 4 -> take 5:
 * removing unnecessary EXPORT_SYMBOLs
 * adding minimal limit of /proc/sys/net/ipv4/udp_mem
 * bugfix of UDP limit affecting protocol other than UDP
 * introducing __ip_check_max_skb_pages()
 * using CTL_UNNUMBERED
 * adding udp_mem usage to Documentation/networking/ip_sysctl.txt

Best regards,
Hideo Aoki

--
Hitachi Computer Products (America) Inc.
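As an illustration of the rounding that sk_datagram_pages() performs: when the page size is a power of two and the operand is unsigned, rounding a byte count up to whole pages can be done with a shift, so no divide instruction is needed. The sketch below assumes a 4096-byte page; `DEMO_PAGE_SHIFT`, `DEMO_PAGE_SIZE` and both function names are made up for this note and are not taken from the patch set.

```c
/* Assumed page geometry for the sketch: 4096-byte pages. */
#define DEMO_PAGE_SHIFT	12u
#define DEMO_PAGE_SIZE	(1u << DEMO_PAGE_SHIFT)

/* Round a byte count up to pages using a division. */
static unsigned int pages_by_division(unsigned int amt)
{
	return (amt + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/* Same result with a shift; valid because the page size is a power of
 * two and the value is unsigned. */
static unsigned int pages_by_shift(unsigned int amt)
{
	return (amt + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}
```

Both helpers assume `amt` is small enough that adding `DEMO_PAGE_SIZE - 1` does not wrap around. With an unsigned operand and a constant power-of-two divisor, a compiler will usually turn the division form into the shift form on its own, which is the point of the unsigned cast in the posted helper.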
[PATCH 4/5] udp: memory limitation by using udp_mem
This patch introduces memory limitation for UDP. signed-off-by: Satoshi Oshima [EMAIL PROTECTED] signed-off-by: Hideo Aoki [EMAIL PROTECTED] --- Documentation/networking/ip-sysctl.txt |6 include/net/udp.h |3 ++ net/ipv4/af_inet.c |3 ++ net/ipv4/ip_output.c | 47 ++--- net/ipv4/sysctl_net_ipv4.c | 11 +++ net/ipv4/udp.c | 24 6 files changed, 91 insertions(+), 3 deletions(-) diff -pruN net-2.6-udp-p3/Documentation/networking/ip-sysctl.txt net-2.6-udp-p4/Documentation/networking/ip-sysctl.txt --- net-2.6-udp-p3/Documentation/networking/ip-sysctl.txt 2007-11-14 10:48:49.0 -0500 +++ net-2.6-udp-p4/Documentation/networking/ip-sysctl.txt 2007-11-15 14:44:21.0 -0500 @@ -446,6 +446,12 @@ tcp_dma_copybreak - INTEGER and CONFIG_NET_DMA is enabled. Default: 4096 +UDP variables: + +udp_mem - INTEGER + Number of pages allowed for queueing by all UDP sockets. + Default is calculated at boot time from amount of available memory. + CIPSOv4 Variables: cipso_cache_enable - BOOLEAN diff -pruN net-2.6-udp-p3/include/net/udp.h net-2.6-udp-p4/include/net/udp.h --- net-2.6-udp-p3/include/net/udp.h2007-11-15 14:44:13.0 -0500 +++ net-2.6-udp-p4/include/net/udp.h2007-11-15 14:44:21.0 -0500 @@ -66,6 +66,7 @@ extern rwlock_t udp_hash_lock; extern struct proto udp_prot; extern atomic_t udp_memory_allocated; +extern int sysctl_udp_mem; struct sk_buff; @@ -175,4 +176,6 @@ extern void udp_proc_unregister(struct u extern int udp4_proc_init(void); extern void udp4_proc_exit(void); #endif + +extern void udp_init(void); #endif /* _UDP_H */ diff -pruN net-2.6-udp-p3/net/ipv4/af_inet.c net-2.6-udp-p4/net/ipv4/af_inet.c --- net-2.6-udp-p3/net/ipv4/af_inet.c 2007-11-15 14:44:18.0 -0500 +++ net-2.6-udp-p4/net/ipv4/af_inet.c 2007-11-15 14:44:21.0 -0500 @@ -1446,6 +1446,9 @@ static int __init inet_init(void) /* Setup TCP slab cache for open requests. 
*/ tcp_init(); + /* Setup UDP memory threshold */ + udp_init(); + /* Add UDP-Lite (RFC 3828) */ udplite4_register(); diff -pruN net-2.6-udp-p3/net/ipv4/ip_output.c net-2.6-udp-p4/net/ipv4/ip_output.c --- net-2.6-udp-p3/net/ipv4/ip_output.c 2007-11-15 14:44:18.0 -0500 +++ net-2.6-udp-p4/net/ipv4/ip_output.c 2007-11-15 14:44:21.0 -0500 @@ -75,6 +75,7 @@ #include net/icmp.h #include net/checksum.h #include net/inetpeer.h +#include net/udp.h #include linux/igmp.h #include linux/netfilter_ipv4.h #include linux/netfilter_bridge.h @@ -699,6 +700,20 @@ csum_page(struct page *page, int offset, return csum; } +static inline int __ip_check_max_skb_pages(struct sock *sk, int size) +{ + switch(sk-sk_protocol) { + case IPPROTO_UDP: + if (atomic_read(sk-sk_prot-memory_allocated) + size +sk-sk_prot-sysctl_mem[0]) + return -ENOBUFS; + /* Fall through */ + default: + break; + } + return 0; +} + static inline int ip_ufo_append_data(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), @@ -707,16 +722,20 @@ static inline int ip_ufo_append_data(str { struct sk_buff *skb; int err; + int size = 0; /* There is support for UDP fragmentation offload by network * device, so create one single skb packet containing complete * udp datagram */ if ((skb = skb_peek_tail(sk-sk_write_queue)) == NULL) { - skb = sock_alloc_send_skb(sk, - hh_len + fragheaderlen + transhdrlen + 20, - (flags MSG_DONTWAIT), err); + size = hh_len + fragheaderlen + transhdrlen + 20; + err = __ip_check_max_skb_pages(sk, sk_datagram_pages(size)); + if (err) + return err; + skb = sock_alloc_send_skb(sk, size, (flags MSG_DONTWAIT), + err); if (skb == NULL) return err; @@ -737,6 +756,10 @@ static inline int ip_ufo_append_data(str sk-sk_sndmsg_off = 0; } + err = __ip_check_max_skb_pages(sk, sk_datagram_pages(size + length - +transhdrlen)); + if (err) + goto fail; err = skb_append_datato_frags(sk,skb, getfrag, from, (length - transhdrlen)); if (!err) { @@ -752,6 +775,7 @@ static 
inline int ip_ufo_append_data(str /* There is not enough support do UFO , * so follow normal path */ +fail: kfree_skb(skb);
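The limit test that __ip_check_max_skb_pages() adds in the patch above can be modelled outside the kernel as follows. This is a simplified sketch, not the patch itself: `struct demo_proto` and both function names are hypothetical, the counter is a plain int rather than an atomic_t, and the comparison operator is assumed to be "greater than" because the operator did not survive in this archive copy.

```c
#include <errno.h>

/* Hypothetical per-protocol accounting state. */
struct demo_proto {
	int memory_allocated;	/* pages currently charged to the protocol */
	int limit;		/* models sysctl_mem[0], in pages */
};

/* Refuse a request that would push the page count past the limit. */
static int demo_check_max_pages(const struct demo_proto *p, int pages)
{
	if (p->memory_allocated + pages > p->limit)
		return -ENOBUFS;
	return 0;
}

/* Check first, then charge; the caller allocates only on success. */
static int demo_charge(struct demo_proto *p, int pages)
{
	int err = demo_check_max_pages(p, pages);

	if (err)
		return err;
	p->memory_allocated += pages;
	return 0;
}
```

In this model a request that lands exactly on the limit is still accepted; only a request that would exceed it gets -ENOBUFS.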
[PATCH 2/5] udp: accounting unit and variable
This patch introduces global variable for UDP memory accounting. The unit is page. signed-off-by: Satoshi Oshima [EMAIL PROTECTED] signed-off-by: Hideo Aoki [EMAIL PROTECTED] --- include/net/sock.h |8 include/net/udp.h |2 ++ net/ipv4/proc.c|3 ++- net/ipv4/udp.c |2 ++ 4 files changed, 14 insertions(+), 1 deletion(-) diff -pruN net-2.6-udp-p1/include/net/sock.h net-2.6-udp-p2/include/net/sock.h --- net-2.6-udp-p1/include/net/sock.h 2007-11-15 12:42:04.0 -0500 +++ net-2.6-udp-p2/include/net/sock.h 2007-11-15 14:44:13.0 -0500 @@ -778,6 +778,14 @@ static inline int sk_stream_wmem_schedul sk_stream_mem_schedule(sk, size, 0); } +#define SK_DATAGRAM_MEM_QUANTUM ((unsigned int)PAGE_SIZE) + +static inline int sk_datagram_pages(int amt) +{ + /* Cast to unsigned as an optimization, since amt is always positive. */ + return DIV_ROUND_UP((unsigned int)amt, SK_DATAGRAM_MEM_QUANTUM); +} + /* Used by processes to lock a socket state, so that * interrupts and bottom half handlers won't change it * from under us. 
It essentially blocks any incoming diff -pruN net-2.6-udp-p1/include/net/udp.h net-2.6-udp-p2/include/net/udp.h --- net-2.6-udp-p1/include/net/udp.h2007-11-14 10:49:05.0 -0500 +++ net-2.6-udp-p2/include/net/udp.h2007-11-15 14:44:13.0 -0500 @@ -65,6 +65,8 @@ extern rwlock_t udp_hash_lock; extern struct proto udp_prot; +extern atomic_t udp_memory_allocated; + struct sk_buff; /* diff -pruN net-2.6-udp-p1/net/ipv4/proc.c net-2.6-udp-p2/net/ipv4/proc.c --- net-2.6-udp-p1/net/ipv4/proc.c 2007-11-14 10:49:07.0 -0500 +++ net-2.6-udp-p2/net/ipv4/proc.c 2007-11-15 14:44:13.0 -0500 @@ -56,7 +56,8 @@ static int sockstat_seq_show(struct seq_ sock_prot_inuse(tcp_prot), atomic_read(tcp_orphan_count), tcp_death_row.tw_count, atomic_read(tcp_sockets_allocated), atomic_read(tcp_memory_allocated)); - seq_printf(seq, UDP: inuse %d\n, sock_prot_inuse(udp_prot)); + seq_printf(seq, UDP: inuse %d mem %d\n, sock_prot_inuse(udp_prot), + atomic_read(udp_memory_allocated)); seq_printf(seq, UDPLITE: inuse %d\n, sock_prot_inuse(udplite_prot)); seq_printf(seq, RAW: inuse %d\n, sock_prot_inuse(raw_prot)); seq_printf(seq, FRAG: inuse %d memory %d\n, diff -pruN net-2.6-udp-p1/net/ipv4/udp.c net-2.6-udp-p2/net/ipv4/udp.c --- net-2.6-udp-p1/net/ipv4/udp.c 2007-11-14 10:49:07.0 -0500 +++ net-2.6-udp-p2/net/ipv4/udp.c 2007-11-15 14:44:13.0 -0500 @@ -114,6 +114,8 @@ DEFINE_SNMP_STAT(struct udp_mib, udp_sta struct hlist_head udp_hash[UDP_HTABLE_SIZE]; DEFINE_RWLOCK(udp_hash_lock); +atomic_t udp_memory_allocated; + static inline int __udp_lib_lport_inuse(__u16 num, const struct hlist_head udptable[]) { -- Hitachi Computer Products (America) Inc. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] udp: memory accounting
This patch adds UDP memory usage accounting in IPv4. signed-off-by: Satoshi Oshima [EMAIL PROTECTED] signed-off-by: Hideo Aoki [EMAIL PROTECTED] --- af_inet.c | 30 +- ip_output.c | 25 ++--- udp.c | 10 ++ 3 files changed, 61 insertions(+), 4 deletions(-) diff -pruN net-2.6-udp-p2/net/ipv4/af_inet.c net-2.6-udp-p3/net/ipv4/af_inet.c --- net-2.6-udp-p2/net/ipv4/af_inet.c 2007-11-14 10:49:06.0 -0500 +++ net-2.6-udp-p3/net/ipv4/af_inet.c 2007-11-15 14:44:18.0 -0500 @@ -126,13 +126,41 @@ extern void ip_mc_drop_socket(struct soc static struct list_head inetsw[SOCK_MAX]; static DEFINE_SPINLOCK(inetsw_lock); +/** + * __skb_queue_purge_and_sub_memory_allocated + * - empty a list and subtruct memory allocation counter + * @sk: sk + * @list: list to empty + * Delete all buffers on an sk_buff list and subtruct the + * truesize of the sk_buff for memory accounting. Each buffer + * is removed from the list and one reference dropped. This + * function does not take the list lock and the caller must + * hold the relevant locks to use it. 
+ */ +static inline void __skb_queue_purge_and_sub_memory_allocated(struct sock *sk, + struct sk_buff_head *list) +{ + struct sk_buff *skb; + int purged_skb_size = 0; + while ((skb = __skb_dequeue(list)) != NULL) { + purged_skb_size += sk_datagram_pages(skb-truesize); + kfree_skb(skb); + } + atomic_sub(purged_skb_size, sk-sk_prot-memory_allocated); +} + /* New destruction routine */ void inet_sock_destruct(struct sock *sk) { struct inet_sock *inet = inet_sk(sk); - __skb_queue_purge(sk-sk_receive_queue); + if (sk-sk_prot-memory_allocated sk-sk_type != SOCK_STREAM) + __skb_queue_purge_and_sub_memory_allocated(sk, + sk-sk_receive_queue); + else + __skb_queue_purge(sk-sk_receive_queue); + __skb_queue_purge(sk-sk_error_queue); if (sk-sk_type == SOCK_STREAM sk-sk_state != TCP_CLOSE) { diff -pruN net-2.6-udp-p2/net/ipv4/ip_output.c net-2.6-udp-p3/net/ipv4/ip_output.c --- net-2.6-udp-p2/net/ipv4/ip_output.c 2007-11-15 14:44:11.0 -0500 +++ net-2.6-udp-p3/net/ipv4/ip_output.c 2007-11-15 14:44:18.0 -0500 @@ -743,6 +743,8 @@ static inline int ip_ufo_append_data(str /* specify the length of each IP datagram fragment*/ skb_shinfo(skb)-gso_size = mtu - fragheaderlen; skb_shinfo(skb)-gso_type = SKB_GSO_UDP; + atomic_add(sk_datagram_pages(skb-truesize), + sk-sk_prot-memory_allocated); __skb_queue_tail(sk-sk_write_queue, skb); return 0; @@ -924,6 +926,9 @@ alloc_new_skb: } if (skb == NULL) goto error; + if (sk-sk_prot-memory_allocated) + atomic_add(sk_datagram_pages(skb-truesize), + sk-sk_prot-memory_allocated); /* * Fill in the control structures @@ -1023,6 +1028,8 @@ alloc_new_skb: frag = skb_shinfo(skb)-frags[i]; skb-truesize += PAGE_SIZE; atomic_add(PAGE_SIZE, sk-sk_wmem_alloc); + if (sk-sk_prot-memory_allocated) + atomic_inc(sk-sk_prot-memory_allocated); } else { err = -EMSGSIZE; goto error; @@ -1123,7 +1130,9 @@ ssize_t ip_append_page(struct sock *sk, if (unlikely(!skb)) { err = -ENOBUFS; goto error; - } + } else if (sk-sk_prot-memory_allocated) + 
atomic_add(sk_datagram_pages(skb-truesize), + sk-sk_prot-memory_allocated); /* * Fill in the control structures @@ -1213,13 +1222,14 @@ int ip_push_pending_frames(struct sock * struct iphdr *iph; __be16 df = 0; __u8 ttl; - int err = 0; + int err = 0, send_page_size; if ((skb = __skb_dequeue(sk-sk_write_queue)) == NULL) goto out; tail_skb = (skb_shinfo(skb)-frag_list); /* move skb-data to ip header from ext header */ + send_page_size = sk_datagram_pages(skb-truesize); if (skb-data skb_network_header(skb)) __skb_pull(skb, skb_network_offset(skb)); while ((tmp_skb = __skb_dequeue(sk-sk_write_queue)) != NULL) { @@ -1229,6 +1239,7 @@ int
[PATCH 1/5] udp: fix send buffer check
This patch introduces sndbuf size check before memory allocation for
send buffer.

signed-off-by: Satoshi Oshima [EMAIL PROTECTED]
signed-off-by: Hideo Aoki [EMAIL PROTECTED]
---

 ip_output.c |    5 +++++
 1 file changed, 5 insertions(+)

diff -pruN net-2.6/net/ipv4/ip_output.c net-2.6-udp-p1/net/ipv4/ip_output.c
--- net-2.6/net/ipv4/ip_output.c	2007-11-14 10:49:06.0 -0500
+++ net-2.6-udp-p1/net/ipv4/ip_output.c	2007-11-15 14:44:11.0 -0500
@@ -1004,6 +1004,11 @@ alloc_new_skb:
 					frag = &skb_shinfo(skb)->frags[i];
 				}
 			} else if (i < MAX_SKB_FRAGS) {
+				if (atomic_read(&sk->sk_wmem_alloc) + PAGE_SIZE
+				    > 2 * sk->sk_sndbuf) {
+					err = -ENOBUFS;
+					goto error;
+				}
 				if (copy > PAGE_SIZE)
 					copy = PAGE_SIZE;
 				page = alloc_pages(sk->sk_allocation, 0);
--
Hitachi Computer Products (America) Inc.
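The arithmetic of the check added by this patch, shown in isolation: a new page is allowed only while the bytes already queued plus one more page stay within twice the socket's send buffer size. This is a sketch under assumptions, not the kernel code. `demo_may_alloc_page()` and `DEMO_PAGE_SIZE` are invented names, a 4096-byte page is assumed, and the "greater than" comparison is inferred because the operator is missing from this archive copy.

```c
#include <errno.h>

#define DEMO_PAGE_SIZE 4096	/* assumed page size for the sketch */

/* Return 0 if one more page may be queued, -ENOBUFS otherwise.
 * wmem_alloc: bytes already charged to the socket's write queue.
 * sndbuf:     the socket's send buffer size in bytes. */
static int demo_may_alloc_page(int wmem_alloc, int sndbuf)
{
	if (wmem_alloc + DEMO_PAGE_SIZE > 2 * sndbuf)
		return -ENOBUFS;
	return 0;
}
```

For example, with an 8192-byte send buffer the budget is 16384 bytes, so a socket that already has 12288 bytes queued may take one more page, while one with 12289 bytes queued may not.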