Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
David S. Miller wrote:

> Eric, how important do you honestly think the per-hashchain spinlocks
> are? That's the big barrier from making rt_secret_rebuild() a simple
> rehash instead of flushing the whole table as it does now.

No problem for me in going to a single spinlock. I did the hashed spinlock patch in order to reduce the size of the route hash table without hurting big NUMA machines. If you think a single spinlock is OK, that's even better!

> The lock is only grabbed for updates, and the access to these locks is
> random and as such probably non-local when taken anyways. Back before
> we used RCU for reads, this array-of-spinlock thing made a lot more
> sense.
>
> I mean something like this patch:
>
> +static DEFINE_SPINLOCK(rt_hash_lock);

Just one point: this should be cache-line aligned, and use one full cache line, to avoid false sharing at least. (If a CPU takes the lock, there is no need to invalidate *rt_hash_table for all other CPUs.)

Eric

- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
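The false-sharing point can be illustrated outside the kernel. The following is only a userspace sketch, assuming a 64-byte cache line; the names `CACHE_LINE`, `padded_lock`, `demo_lock()` and `demo_unlock()` are made up for illustration and are not the kernel's spinlock API. It shows a lock that occupies a full, aligned cache line so that taking it does not invalidate neighbouring data.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumption: common line size, not universal */

/*
 * A toy lock that owns a whole cache line. The alignas() on the member
 * raises the alignment of the struct, and sizeof is then rounded up to a
 * multiple of that alignment, so nothing else can share the line.
 */
struct padded_lock {
	alignas(CACHE_LINE) atomic_flag locked;
};

/* Hypothetical stand-in for the single rt_hash_lock discussed above. */
static struct padded_lock rt_hash_lock_demo = { .locked = ATOMIC_FLAG_INIT };

static void demo_lock(struct padded_lock *l)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		;
}

static void demo_unlock(struct padded_lock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Whether a full line per lock is worth the memory depends on how many locks there are; with a single global lock the cost is one line.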
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
From: Eric Dumazet [EMAIL PROTECTED]
Date: Sat, 07 Jan 2006 08:53:52 +0100

> I have no problem with this, since the biggest server I have is 4 way,
> but are you sure big machines won't suffer from this single spinlock?

It is the main question.

> Also I don't understand what you want to do after this single spinlock
> patch. How is it supposed to help the 'ip route flush cache' problem?
> In my case, I have about 600.000 dst-entries :

I don't claim to have a solution to this problem currently.

Doing RCU and going through the whole DST GC machinery is overkill for an active system. So, perhaps a very simple solution will do:

1) On rt_run_flush(), do not rt_free(), instead collect all active routing cache entries onto a global list, begin a timer to fire in 10 seconds (or some sysctl configurable amount).

2) When a new routing cache entry is needed, check the global list appended to in #1 above first, failing that do dst_alloc() as is done currently.

3) If timer expires, rt_free() any entries in the global list.

The missing trick is how to ensure RCU semantics when reallocating from the global list.

The idea is that an active system will immediately repopulate itself with all of these entries just flushed from the table.

RCU really doesn't handle this kind of problem very well. It truly excels when work is generated by process context work, not interrupt work.
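The three steps above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct entry`, `entry_alloc()`, `cache_flush()` and `recycle_expire()` are hypothetical stand-ins for the route cache, dst_alloc(), rt_run_flush() and the timer handler, and the part the mail calls the missing trick (RCU-safe reuse while readers may still hold an entry) is deliberately not addressed.

```c
#include <assert.h>
#include <stdlib.h>

struct entry {
	struct entry *next;
	int key;
};

static struct entry *active;	/* stand-in for the routing cache */
static struct entry *recycle;	/* global list filled by a flush */
static int fresh_allocs;	/* how often we fell back to malloc() */

/* Step 2: prefer an entry from the recycle list, else allocate. */
static struct entry *entry_alloc(int key)
{
	struct entry *e = recycle;

	if (e) {
		recycle = e->next;
	} else {
		e = malloc(sizeof(*e));
		if (!e)
			return NULL;
		fresh_allocs++;
	}
	e->key = key;
	e->next = active;
	active = e;
	return e;
}

/* Step 1: a flush moves entries to the recycle list instead of freeing. */
static void cache_flush(void)
{
	while (active) {
		struct entry *e = active;

		active = e->next;
		e->next = recycle;
		recycle = e;
	}
}

/* Step 3: on timer expiry, free whatever was not reused. */
static void recycle_expire(void)
{
	while (recycle) {
		struct entry *e = recycle;

		recycle = e->next;
		free(e);
	}
}
```

On a busy system most entries would be picked up again by `entry_alloc()` before the timer fires, which is the point of the proposal.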
Re: [PATCH] Endian-annotate struct iphdr
BTW, why does csum_tcpudp_nofold() have such a prototype? It takes IP addresses as unsigned long and proto as unsigned short; the former is bloody odd on 64bit boxen and the latter is bloody odd, period.

The value we are interested in is 8bit; all callers pass either an explicit constant smaller than 256 or iphdr->protocol (__u8). Moreover, in any arithmetic both __u8 and __u16 will be promoted to the same thing, so even arguments about avoiding casts in the function body do not apply...

Even funnier, the prototype depends on target; amd64 csum_tcpudp_nofold() takes unsigned saddr, unsigned daddr; alpha has both unsigned long... At the same time, amd64 has

	static inline unsigned short int
	csum_tcpudp_magic(unsigned long saddr, unsigned long daddr,
			  unsigned short len, unsigned short proto,
			  unsigned int sum)
	{
		return csum_fold(csum_tcpudp_nofold(saddr,daddr,len,proto,sum));
	}

so we pass 32bit value to that puppy as 64bit argument, only to cut it back to 32bit when we pass it to csum_tcpudp_nofold()...

That's one hell of a hot path, so I'd rather avoid messing with it without a very good idea of what's going on. One guaranteed-to-be-safe way of annotating it is

	#define csum_tcp_magic(saddr, daddr, len, proto, sum) \
		csum_tcp_magic((__force u32)(__be32)saddr, (__force u32)(__be32)daddr,\
			       len, proto, sum)

which would verify that arguments have the right types and have those casts leave the generated code as-is - actual arguments _are_ unsigned int from C point of view, so those casts will leave arguments as-is.

Another issue is more subtle; what should be returned by csum_fold() and friends? Answers are split between unsigned short and unsigned int; that wouldn't matter, but if we make it __be16, we'll immediately run into major mess with sparse.

The thing is, for smaller-than-int bitwise types (e.g. __be16) ~ is a prohibited operation. The reason is simple: integer promotion happens before ~, so we are guaranteed that value of ~x will _not_ be within range of our type.
I can try to teach sparse about such slightly fouled __be16, so that conversion back to our type (e.g. from assignment) would go without complaints, but there are some limits to that.

[following is for people familiar with sparse internals; everybody else can safely skip it]

What I propose is a new node type: SYM_FOULED. It would be similar to SYM_RESTRICTED and base type always would be some smaller-than-int restricted type. Rules:

	* ~small_restricted = corresponding fouled
	* any arithmetic that would be banned for restricted = same as if we would have restricted
	* if t1 is restricted type and t2 - its fouled analog, then t1 & t2 = t1, t1 | t2 = t2, t1 ^ t2 = t2.
	* conversion of t2 to t1 is silent (be it passing as argument or assignment). Anything else is banned.
	* x ? t1 : t2 = t2
	* ~t2 = t2 (_not_ t1; something like ~(x ? y : ~y) is still fouled)
	* x ? t2 : t2 = t2, t2 {&,|,^} t2 = t2 (yes, even ^ - same as before).
	* x ? t2 : constant_valid_for_t1 = t2
	* !t2 = warning, ditto for comparisons involving t2 in any way.
	* wrt casts t2 acts exactly as t1 would.
	* for sizeof, typeof and alignof t2 acts as promoted t1. Note that fouled can never be an lvalue or have types derived from it - can't happen.

Objections? Basically, from C POV any fouled value is int or unsigned int and we avoid generating a warning only if its upper bits will eventually be discarded. Rules above guarantee that, AFAICS. I can implement that without too much PITA, if nobody objects. Linus?

[here endeth the sparse-related part]

Note that some places around checksum handling *WILL* give complaints, no matter how smart sparse might become. When we stuff big-endian 16bit values or ~ of such values into u32 array and calculate checksum of that, we rely on the fact that both all-0 and all-1 16bit words _and_ any reordering of words do not affect the checksum, which is far beyond anything sparse could be expected to understand.
For places like that (mostly in netfilter code) a forced cast and comment explaining what's going on and what we are relying upon would be the right thing, IMO. No need to reproduce the entire RFC1071, but reference to it would not be a bad thing either...
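The checksum properties relied upon in the last paragraphs (word order does not matter, and extra all-0 or all-1 16bit words do not change the result) can be checked with a small userspace sketch. `csum_words()` and `csum_fold_demo()` are illustrative helpers written for this note, not the kernel's csum_partial()/csum_fold(); they work on host-order values and ignore the endianness annotations that the thread is actually about. The all-1 case holds only when the sum is nonzero in the ones'-complement sense.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add up 16-bit words into a 32-bit accumulator (no folding yet). */
static uint32_t csum_words(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += w[i];
	return sum;
}

/*
 * Fold the carries back into the low 16 bits (ones'-complement add)
 * and return the complement, in the style of RFC 1071.
 */
static uint16_t csum_fold_demo(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Same words in three arrangements: plain, reordered, with 0/all-1 added. */
static const uint16_t words_a[] = { 0x4500, 0x0054, 0xc0a8, 0x0001 };
static const uint16_t words_b[] = { 0x0001, 0xc0a8, 0x4500, 0x0054 };
static const uint16_t words_c[] = { 0x4500, 0xffff, 0x0054, 0x0000,
				    0xc0a8, 0x0001 };
```

All three arrays fold to the same 16-bit checksum, which is the invariant a forced cast in such code would be silently relying on.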
SiS190 on ASUS. monodirectional traffic
Hi All,

this is my first attempt at writing a bug report.

Asus motherboard K8SMV with integrated SiS190 ethernet
driver: vanilla kernel 2.6.15 (but the same also for .14) on Fedora 4

The SiS190 works when sending data out but hangs at around 90K when receiving data. There is no problem at all with the additional 8139 board.

I noticed the same with scp, but for a clean test I used vsftp on the ASUS and ncftp-cygwin on a laptop connected directly with a 100Mbit twisted cable, and tried to move a large file in both directions:

	out:     37.68 MB at 6.32 MB/s, OK
	receive: hangs around 90K

I see no relevant message in /var/log/messages, and from lurking and googling I had the impression that there is no SiS190 on ASUS that really works.

Additional note: when the cable is not connected, the driver bores me with a long list of this message:

	kernel: eth1: PHY reset until link up

Any suggestion or further test to be tried is welcome.

### System info :

Linux version 2.6.15 ([EMAIL PROTECTED]) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #7 PREEMPT Tue Jan 3 19:31:16 CET 2006

	vendor_id  : AuthenticAMD
	cpu family : 15
	model      : 12
	model name : AMD Athlon(tm) 64 Processor 3200+

filtered /var/log/messages at boot:

	kernel: 8139too Fast Ethernet driver 0.9.27
	kernel: ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 5
	kernel: PCI: setting IRQ 5 as level-triggered
	kernel: ACPI: PCI Interrupt :00:0a.0[A] - Link [LNKB] - GSI 5 (level, low) - IRQ 5
	kernel: eth0: RealTek RTL8139 at 0x8400, 00:06:7b:03:cf:42, IRQ 5
	kernel: sis190 Gigabit Ethernet driver 1.2 loaded.
	kernel: ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 5
	kernel: ACPI: PCI Interrupt :00:04.0[A] - Link [LNKD] - GSI 5 (level, low) - IRQ 5
	kernel: :00:04.0: Read MAC address from APC.
	kernel: :00:04.0: Realtek PHY RTL8201 transceiver at address 1.
	kernel: :00:04.0: Using transceiver at address 1 as default.
	kernel: :00:04.0: SiS 190 PCI Fast Ethernet adapter at f8806c00 (IRQ: 5), 00:13:d4:16:a8:bb
	kernel: eth1: GMII mode.
	kernel: eth1: Enabling Auto-negotiation.

filtered lsmod:

	sis190   15876  0
	8139too  20160  0

filtered lspci -vvv:

	00:04.0 Ethernet controller: Silicon Integrated Systems [SiS]: Unknown device 0190
		Subsystem: ASUSTeK Computer Inc.: Unknown device 8139
		Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
		Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR-
		Latency: 0
		Interrupt: pin A routed to IRQ 5
		Region 0: Memory at fbefbc00 (32-bit, non-prefetchable) [size=128]
		Region 1: I/O ports at b000 [size=128]
		Capabilities: [40] Power Management version 2
			Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+, D3hot+,D3cold+)
			Status: D0 PME-Enable- DSel=0 DScale=0 PME-

filtered lspci -xxx:

	00:04.0 Ethernet controller: Silicon Integrated Systems [SiS]: Unknown device 0190
	00: 39 10 90 01 07 00 10 02 00 00 00 02 00 00 00 00
	10: 00 bc ef fb 01 b0 00 00 00 00 00 00 00 00 00 00
	20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 39 81
	30: 00 00 00 00 40 00 00 00 00 00 00 00 05 01 00 00
	40: 01 00 02 fe 00 00 00 00 00 00 00 00 00 00 00 00
	50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	70: 00 00 80 f5 00 00 00 00 00 00 00 00 04 05 00 00
	80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Regards
Marco

--
marco.atzeri at fastwebnet.it
The first of the Frequently Asked Questions: where are the FAQs? it.faq
Re: [PATCH] Endian-annotate struct iphdr
From: Al Viro [EMAIL PROTECTED]
Date: Sat, 7 Jan 2006 08:44:42 +

[ BTW Al, I noticed you subscribed to netdev, you don't need to do that if all you want to do is make a posting and be involved in that particular discussion. If you really are interested in everything else that goes on here, that's fine too :-) ]

> BTW, why does csum_tcpudp_nofold() have such a prototype?

Historic baggage...

> The value we are interested in is 8bit; all callers pass either an
> explicit constant smaller than 256 or iphdr->protocol (__u8).
> Moreover, in any arithmetic both __u8 and __u16 will be promoted to
> the same thing, so even arguments about avoiding casts in the function
> body do not apply...

Agreed.

> Even funnier, the prototype depends on target; amd64
> csum_tcpudp_nofold() takes unsigned saddr, unsigned daddr; alpha has
> both unsigned long... At the same time, amd64 has
>
>	static inline unsigned short int
>	csum_tcpudp_magic(unsigned long saddr, unsigned long daddr,
>			  unsigned short len, unsigned short proto,
>			  unsigned int sum)
>	{
>		return csum_fold(csum_tcpudp_nofold(saddr,daddr,len,proto,sum));
>	}
>
> so we pass 32bit value to that puppy as 64bit argument, only to cut it
> back to 32bit when we pass it to csum_tcpudp_nofold()...

Arch specific typing is bad for these interfaces...

But, one thing. Richard Henderson and I always talked about keeping around a 64-bit running checksum on 64-bit platforms so we didn't need to fold the values so many times while building a packet.

However, this idea never materialized, so currently all platforms should basically be using the same types.

> That's one hell of a hot path, so I'd rather avoid messing with it
> without a very good idea of what's going on. One guaranteed-to-be-safe
> way of annotating it is
>
>	#define csum_tcp_magic(saddr, daddr, len, proto, sum) \
>		csum_tcp_magic((__force u32)(__be32)saddr, (__force u32)(__be32)daddr,\
>			       len, proto, sum)
>
> which would verify that arguments have the right types and have those
> casts leave the generated code as-is - actual arguments _are_ unsigned
> int from C point of view, so those casts will leave arguments as-is.

Not beautiful, but functional...

> The thing is, for smaller-than-int bitwise types (e.g. __be16) ~ is a
> prohibited operation. The reason is simple: integer promotion happens
> before ~, so we are guaranteed that value of ~x will _not_ be within
> range of our type. I can try to teach sparse about such slightly
> fouled __be16, so that conversion back to our type (e.g. from
> assignment) would go without complaints, but there are some limits to
> that.

Why not just make some kind of negate() macro that hides away all of the typing issues?

Another way to describe the operation is as a xor of X with an all-1's bitmask the same size of X. Maybe that helps describe it better?

> Note that some places around checksum handling *WILL* give complaints,
> no matter how smart sparse might become. When we stuff big-endian
> 16bit values or ~ of such values into u32 array and calculate checksum
> of that, we rely on the fact that both all-0 and all-1 16bit words
> _and_ any reordering of words do not affect the checksum, which is far
> beyond anything sparse could be expected to understand. For places
> like that (mostly in netfilter code) a forced cast and comment
> explaining what's going on and what we are relying upon would be the
> right thing, IMO. No need to reproduce the entire RFC1071, but
> reference to it would not be a bad thing either...

Indeed.
Re: [PATCH] Endian-annotate struct iphdr
On Sat, Jan 07, 2006 at 12:59:15AM -0800, David S. Miller wrote:
> Why not just make some kind of negate() macro that hides away all of
> the typing issues?
>
> Another way to describe the operation is as a xor of X with an all-1's
> bitmask the same size of X. Maybe that helps describe it better?

	unsigned short x, y(void);

	~x & y();
	(unsigned short)~x & y();
	(x ^ 0xffff) & y();

are all equivalent, but I wouldn't trust gcc to notice that. Basically, the first one is

	r1 = (u32)x; r1 = ~r1; r2 = y(); r2 &= 0xffff; r3 = r1 & r2;

the second

	r1 = (u32)x; r1 = ~r1; r1 &= 0xffff; r2 = y(); r2 &= 0xffff; r3 = r1 & r2;

and the third -

	r1 = (u32)x; r1 ^= 0xffff; r2 = y(); r2 &= 0xffff; r3 = r1 & r2;

I wouldn't particularly hope that gcc gets the second variant optimized as well as the first one; I _definitely_ would not hope it does the third one as well as the rest. So this negate() might be not harmless, especially since that stuff sits on hot paths.

Basically, I'd rather teach sparse to understand that at least (1) and (2) are equivalent so it could DTRT itself; that's what the sparse-related chunk had been about...

BTW, is there any reason why

	static inline void ip_eth_mc_map(u32 addr, char *buf)
	{
		addr=ntohl(addr);
		buf[0]=0x01;
		buf[1]=0x00;
		buf[2]=0x5e;
		buf[5]=addr&0xFF;
		addr>>=8;
		buf[4]=addr&0xFF;
		addr>>=8;
		buf[3]=addr&0x7F;
	}

is not doing just

		unsigned char *p = &addr;
		buf[0]=0x01;
		buf[1]=0x00;
		buf[2]=0x5e;
		buf[3]=p[1]&0x7F;
		buf[4]=p[2];
		buf[5]=p[3];

(and similar for much scarier ip_ib_mc_map() right next to it in include/net/ip.h)? I would expect that to give better code, actually...

Mind you, in case of ip_ib_mc_map() I suspect that

		static const unsigned char prefix[16] = {
			0,	/* Reserved */
			0xff,	/* Multicast QPN */
			0xff,
			0xff,
			0xff,
			0x12,	/* link local scope */
			0x40,	/* IPv4 signature */
			0x1b
		};
		memcpy(buf, prefix, 16);
		addr &= htonl((1<<28)-1);
		memcpy(buf + 16, &addr, 4);

might give even better variant, but that's a separate story...
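The equivalence of the three expressions, and the reason ~ on a 16-bit value leaves the 16-bit range, can be checked with a small userspace sketch. It assumes a platform where int is wider than 16 bits; `v1()`, `v2()`, `v3()` and `not_in_u16_range()` are names invented for this illustration, and the sketch says nothing about what code gcc generates for each form.

```c
#include <assert.h>
#include <stdint.h>

/* Form 1: x is promoted to int before ~, the mask comes from y. */
static unsigned v1(uint16_t x, uint16_t y)
{
	return ~x & y;
}

/* Form 2: truncate ~x back to 16 bits before the AND. */
static unsigned v2(uint16_t x, uint16_t y)
{
	return (uint16_t)~x & y;
}

/* Form 3: xor with an all-ones mask of the same width as x. */
static unsigned v3(uint16_t x, uint16_t y)
{
	return (x ^ 0xffff) & y;
}

/*
 * After promotion ~x is a negative int, so it can never be a valid
 * 16-bit value; this is why sparse has to treat it specially.
 */
static int not_in_u16_range(uint16_t x)
{
	int promoted = ~x;

	return promoted < 0 || promoted > 0xffff;
}
```

The three forms agree only because the other operand has no bits above bit 15; that is exactly the "upper bits will eventually be discarded" condition from the proposed sparse rules.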
Re: [PATCH] Endian-annotate struct iphdr
On Sat, Jan 07, 2006 at 02:03:42AM -0800, David S. Miller wrote:
> From: Al Viro [EMAIL PROTECTED]
> Date: Sat, 7 Jan 2006 09:39:10 +
>
> > BTW, is there any reason why
> >
> >	static inline void ip_eth_mc_map(u32 addr, char *buf)
> >	{
> >		addr=ntohl(addr);
> >		buf[0]=0x01;
> >		buf[1]=0x00;
> >		buf[2]=0x5e;
> >		buf[5]=addr&0xFF;
> >		addr>>=8;
> >		buf[4]=addr&0xFF;
> >		addr>>=8;
> >		buf[3]=addr&0x7F;
> >	}
> >
> > is not doing just
> >
> >		unsigned char *p = &addr;
> >		buf[0]=0x01;
> >		buf[1]=0x00;
> >		buf[2]=0x5e;
> >		buf[3]=p[1]&0x7F;
> >		buf[4]=p[2];
> >		buf[5]=p[3];
>
> Because GCC can't make anything reasonable with it. And besides
> wouldn't that need to be:
>
>		buf[3]=p[3]&0x7F;
>		buf[4]=p[2];
>		buf[5]=p[1];
>
> on big-endian? :-)

No. Note the lack of ntohl() in that variant. We are getting last 3 octets of address (sans one bit) into the last 3 octets of MAC, preserving the order.

> GCC isn't (currently) smart enough to avoid tossing addr onto the
> stack when you do the pointer games like that. With gcc-4.0.2 on sparc
> I get:
>
>	mov	1, %g1
>	st	%o0, [%sp+68]
>	mov	94, %g2
>	stb	%g1, [%o1]
>	ldub	[%sp+69], %g1
>	and	%g1, 127, %g1
>	stb	%g2, [%o1+2]
>	stb	%g1, [%o1+3]
>	ldub	[%sp+70], %g2
>	ldub	[%sp+71], %g1
>	stb	%g0, [%o1+1]
>	stb	%g2, [%o1+4]
>	jmp	%o7+8
>	 stb	%g1, [%o1+5]
>
> That's with -O2, 32-bit.

Gaack... I've just spent a while playing with sparc-linux-gcc and apparently the only way to convince it _not_ to shit on stack is to do an equivalent of put_unaligned(). Amusingly, __builtin_memcpy(&n, p, 4) does worse than store to unaligned field... Anyway, it's way too fscking ugly to be taken seriously.

OK. Another question: do you have any objections against

	static inline void ip_eth_mc_map(__be32 addr, char *buf)
	{
		__u32 n=ntohl(addr);
		buf[0]=0x01;
		buf[1]=0x00;
		buf[2]=0x5e;
		buf[5]=n&0xFF;
		n>>=8;
		buf[4]=n&0xFF;
		n>>=8;
		buf[3]=n&0x7F;
	}

That does compile to exact same code as original - gcc fortunately has enough clue to realize that addr is never used past the initialization of n, which is, from C point of view, of exact same type as addr.
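That the shift-based and the byte-wise variants produce the same MAC regardless of host endianness can be checked in userspace. This is a sketch for illustration only: `mc_map_shift()` and `mc_map_bytes()` are made-up names, plain uint32_t stands in for __be32, and memcpy() replaces the pointer cast from the mail. It makes no claim about which variant compiles to better code.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Variant with ntohl() and shifts, as in the existing function. */
static void mc_map_shift(uint32_t naddr, unsigned char *buf)
{
	uint32_t n = ntohl(naddr);

	buf[0] = 0x01;
	buf[1] = 0x00;
	buf[2] = 0x5e;
	buf[5] = n & 0xFF;
	n >>= 8;
	buf[4] = n & 0xFF;
	n >>= 8;
	buf[3] = n & 0x7F;
}

/*
 * Variant that reads the octets of the network-order address directly.
 * No ntohl() is needed: octet order in memory is already wire order.
 */
static void mc_map_bytes(uint32_t naddr, unsigned char *buf)
{
	unsigned char p[4];

	memcpy(p, &naddr, sizeof(p));
	buf[0] = 0x01;
	buf[1] = 0x00;
	buf[2] = 0x5e;
	buf[3] = p[1] & 0x7F;
	buf[4] = p[2];
	buf[5] = p[3];
}
```

For 239.255.1.2 both variants give 01:00:5e:7f:01:02: the low 23 bits of the group address are copied into the MAC in order.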
Re: SiS190 on ASUS. monodirectional traffic
Marco Atzeri [EMAIL PROTECTED] :
[...]
> first tentative to write a bug report

You are welcome.

> Asus motherboard K8SMV with integrated SIS190 ethernet
> driver vanilla kernel 2.6.15 (but the same also for .14) on Fedora 4
>
> The SIS190 works sending data out but hangs around 90K when receiving
> data in. No problem at all with the additional 8139 board
>
> I noted the same also with scp but for a clean test I used vsftp on
> the ASUS, ncftp-cygwin on a laptop with 100Mbit direct twisted cable
> and tried to move a large file in both directions
>
> out 37.68 MB at 6.32 MB/s OK
> receive hangs around 90K

Since it implies a really moderate amount of data, can you put a complete tcpdump/tethereal log somewhere (+ send a few pings on the misbehaving interface before and after the test)?

Just to be sure: it is a soft lock. The application hangs but the system is still responsive/usable. However it is impossible to send/receive any data through the sis190 interface, even with a different application.
- is the description right?

> I see no relevant message on /var/log/messages and from lurking and
> googling I had the impression that there are no SIS190 on ASUS really
> working.

At least for the K8SMX, my Mail/linux/support/sis190 mailbox disagrees. People have the bad habit of sending private mail _without_ Ccing netdev though.

> Additional note, when the cable is not connected the driver bores me
> with a long list of this message
> kernel: eth1: PHY reset until link up

You can control the verbosity level of the driver with the msglvl option of 'ethtool' and/or the debug option of the module when it is modprobed.

> any suggestion or further test to be tried is welcomed

1 - Please send your current .config and a complete dmesg.
2 - Add 'ifconfig' and '/proc/interrupts' output after the devices are loaded and you have issued a few pings.
3 - Same thing as 2) when the transfer is hung.
4 - Don't filter lspci nor lsmod output (yep, I am a control freak).
5 - Please add the output of an 'ethtool eth1' to the mix.

[...]
> Linux version 2.6.15 ([EMAIL PROTECTED]) (gcc version 4.0.2 20051125
> (Red Hat 4.0.2-8)) #7 PREEMPT Tue Jan 3 19:31:16 CET 2006

PREEMPT makes me nervous. I am ok to try and debug any issue which could be related to it but I'd prefer that the whole system is made usable with a no-PREEMPT config first (SMP is fine though). Please send the updated .config as well.

If the files are big and/or you do not have a permanent web repository, you can open a PR at http://bugzilla.kernel.org and add me in the Cc: list.

--
Ueimor
Re: State of the Union: Wireless
On Friday 06 January 2006 13:31, Johannes Berg wrote:
> On Fri, 2006-01-06 at 12:00 +0100, Michael Buesch wrote:
>
> > * master interface as real device node
> > * Virtual interfaces (net_devices)
>
> I didn't want to spam the netdev wiki with this (yet) so I collected
> some more structured things outside. Anyone feel free to edit:
> http://softmac.sipsolutions.net/802.11

I am confused.

There is http://softmac.sipsolutions.net/softmac-snapshot.tar.bz2 at http://softmac.sipsolutions.net/SoftMAC, and the page also says

	Projects using this layer:
	* Broadcom 43xx driver

but the Broadcom driver page at ftp://ftp.berlios.de/pub/bcm43xx/snapshots/softmac/ has ftp://ftp.berlios.de/pub/bcm43xx/snapshots/softmac/ieee80211softmac-20060107.tar.bz2 which is not the same. For example, the ieee80211softmac.h file exists in both tarballs but is not identical.

Suppose one wants to use softmac in a project. What tarball contains the bleeding edge of softmac?

> I'll move that content to the netdev wiki if anyone else thinks it
> would be a good way forward to start with requirements, API issues and
> similar. Until we get there, we'll fix up softmac to make it usable
> for most people in basic station mode without any kind of virtual
> devices, which will need some slight changes to the current ieee80211.

--
vda
[PATCH] Additional options for resetting packet statistics
This adds some setsockopt(SOL_PACKET) options for changing the behavior when getting packet statistics from the kernel.

Signed-off-by: Kris Katterjohn [EMAIL PROTECTED]

This is a diff from 2.6.15 and I AM subscribed to netdev, so you don't need to CC me anymore. I sent this to the linux-kernel mailing list at the end of November, so I'm resending it here now.

This adds PACKET_AUTO_STATISTICS, PACKET_MANUAL_STATISTICS, and PACKET_RESET_STATISTICS setsockopt() options.

PACKET_AUTO_STATISTICS is the default and the kernel will zero the packet statistics when the PACKET_STATISTICS getsockopt() call is used.

PACKET_MANUAL_STATISTICS changes it so that the kernel won't zero the stats unless you use PACKET_RESET_STATISTICS or call PACKET_AUTO_STATISTICS to go back to the default behavior.

This way you don't have to keep track of the stats in userland if you use PACKET_MANUAL_STATISTICS/PACKET_RESET_STATISTICS.

You can zero the stats with PACKET_RESET_STATISTICS even if you are in AUTO mode.

Thanks!

--- x/net/packet/af_packet.c	2006-01-07 11:31:07.0 -0600
+++ y/net/packet/af_packet.c	2006-01-07 11:28:56.0 -0600
@@ -41,6 +41,12 @@
  *					will simply extend the hardware address
  *					byte arrays at the end of sockaddr_ll
  *					and packet_mreq.
+ *		Kris Katterjohn	:	Added setsockopt options:
+ *					PACKET_AUTO_STATISTICS,
+ *					PACKET_MANUAL_STATISTICS, and
+ *					PACKET_RESET_STATISTICS to handle the
+ *					zero-ing of packet stats when using
+ *					PACKET_STATISTICS. 2005-11-29.
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -189,6 +195,7 @@ struct packet_sock {
 	/* struct sock has to be the first member of packet_sock */
 	struct sock		sk;
 	struct tpacket_stats	stats;
+	int			auto_reset_stats;
 #ifdef CONFIG_PACKET_MMAP
 	char *			*pg_vec;
 	unsigned int		head;
@@ -1020,6 +1027,7 @@ static int packet_create(struct socket *
 	po = pkt_sk(sk);
 	sk->sk_family = PF_PACKET;
 	po->num = protocol;
+	po->auto_reset_stats = 1;
 
 	sk->sk_destruct = packet_sock_destruct;
 	atomic_inc(&packet_socks_nr);
@@ -1324,6 +1332,7 @@ static int
 packet_setsockopt(struct socket *sock, int level, int optname, char __user *optval, int optlen)
 {
 	struct sock *sk = sock->sk;
+	struct packet_sock *po = pkt_sk(sk);
 	int ret;
 
 	if (level != SOL_PACKET)
@@ -1352,6 +1361,21 @@ packet_setsockopt(struct socket *sock, i
 		return ret;
 	}
 #endif
+
+	case PACKET_AUTO_STATISTICS:
+		po->auto_reset_stats = 1;
+		return 0;
+
+	case PACKET_MANUAL_STATISTICS:
+		po->auto_reset_stats = 0;
+		return 0;
+
+	case PACKET_RESET_STATISTICS:
+		spin_lock_bh(&sk->sk_receive_queue.lock);
+		memset(&po->stats, 0, sizeof po->stats);
+		spin_unlock_bh(&sk->sk_receive_queue.lock);
+		return 0;
+
 #ifdef CONFIG_PACKET_MMAP
 	case PACKET_RX_RING:
 	{
@@ -1406,7 +1430,8 @@ static int packet_getsockopt(struct sock
 		len = sizeof(struct tpacket_stats);
 		spin_lock_bh(&sk->sk_receive_queue.lock);
 		st = po->stats;
-		memset(&po->stats, 0, sizeof(st));
+		if (po->auto_reset_stats)
+			memset(&po->stats, 0, sizeof po->stats);
 		spin_unlock_bh(&sk->sk_receive_queue.lock);
 		st.tp_packets += st.tp_drops;
 
--- x/include/linux/if_packet.h	2006-01-02 21:21:10.0 -0600
+++ y/include/linux/if_packet.h	2006-01-07 11:43:47.0 -0600
@@ -38,7 +38,10 @@ struct sockaddr_ll
 /* Value 4 is still used by obsolete turbo-packet. */
 #define PACKET_RX_RING			5
 #define PACKET_STATISTICS		6
-#define PACKET_COPY_THRESH		7
+#define PACKET_AUTO_STATISTICS		7
+#define PACKET_MANUAL_STATISTICS	8
+#define PACKET_RESET_STATISTICS		9
+#define PACKET_COPY_THRESH		10
 
 struct tpacket_stats
 {
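The intended semantics of the three options can be modelled without a socket. The sketch below is a plain userspace model, not kernel code: `struct sock_demo`, `demo_get_stats()` and `demo_reset_stats()` are hypothetical names that mirror the `auto_reset_stats` flag from the patch. It assumes nothing about whether these options exist in any given kernel.

```c
#include <assert.h>
#include <string.h>

struct stats_demo {
	unsigned int packets;
	unsigned int drops;
};

struct sock_demo {
	struct stats_demo stats;
	int auto_reset_stats;	/* 1 = AUTO (default), 0 = MANUAL */
};

static void demo_sock_init(struct sock_demo *s)
{
	memset(s, 0, sizeof(*s));
	s->auto_reset_stats = 1;	/* same default as in the patch */
}

/* Models the PACKET_STATISTICS read: copy, maybe zero, then adjust. */
static struct stats_demo demo_get_stats(struct sock_demo *s)
{
	struct stats_demo st = s->stats;

	if (s->auto_reset_stats)
		memset(&s->stats, 0, sizeof(s->stats));
	st.packets += st.drops;	/* mirrors tp_packets += tp_drops */
	return st;
}

/* Models PACKET_RESET_STATISTICS: zero the counters in either mode. */
static void demo_reset_stats(struct sock_demo *s)
{
	memset(&s->stats, 0, sizeof(s->stats));
}
```

In MANUAL mode repeated reads return cumulative counters, so a monitoring tool does not have to add up deltas itself.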
[2.6 patch] net/ipv6/: small cleanups
This patch contains the following cleanups:
- addrconf.c: make addrconf_dad_stop() static
- inet6_connection_sock.c should #include <net/inet6_connection_sock.h> for getting the prototypes of its global functions

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

---

 net/ipv6/addrconf.c              |    2 +-
 net/ipv6/inet6_connection_sock.c |    1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

--- linux-2.6.15-mm2-full/net/ipv6/addrconf.c.old	2006-01-07 17:30:04.0 +0100
+++ linux-2.6.15-mm2-full/net/ipv6/addrconf.c	2006-01-07 17:30:13.0 +0100
@@ -1228,7 +1228,7 @@
 
 /* Gets referenced address, destroys ifaddr */
 
-void addrconf_dad_stop(struct inet6_ifaddr *ifp)
+static void addrconf_dad_stop(struct inet6_ifaddr *ifp)
 {
 	if (ifp->flags&IFA_F_PERMANENT) {
 		spin_lock_bh(&ifp->lock);
--- linux-2.6.15-mm2-full/net/ipv6/inet6_connection_sock.c.old	2006-01-07 17:30:42.0 +0100
+++ linux-2.6.15-mm2-full/net/ipv6/inet6_connection_sock.c	2006-01-07 17:30:57.0 +0100
@@ -25,6 +25,7 @@
 #include <net/inet_hashtables.h>
 #include <net/ip6_route.h>
 #include <net/sock.h>
+#include <net/inet6_connection_sock.h>
 
 int inet6_csk_bind_conflict(const struct sock *sk,
 			    const struct inet_bind_bucket *tb)
[RFC: 2.6 patch] net/ipv4/ip_output.c: make ip_fragment() static
Since there's no longer any external user of ip_fragment() we can make it static.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

---

 include/net/ip.h     |    1 -
 net/ipv4/ip_output.c |    5 +++--
 2 files changed, 3 insertions(+), 3 deletions(-)

--- linux-2.6.15-mm2-full/include/net/ip.h.old	2006-01-07 17:12:04.0 +0100
+++ linux-2.6.15-mm2-full/include/net/ip.h	2006-01-07 17:12:11.0 +0100
@@ -95,7 +95,6 @@
 extern int		ip_mr_input(struct sk_buff *skb);
 extern int		ip_output(struct sk_buff *skb);
 extern int		ip_mc_output(struct sk_buff *skb);
-extern int		ip_fragment(struct sk_buff *skb, int (*out)(struct sk_buff*));
 extern int		ip_do_nat(struct sk_buff *skb);
 extern void		ip_send_check(struct iphdr *ip);
 extern int		ip_queue_xmit(struct sk_buff *skb, int ipfragok);
--- linux-2.6.15-mm2-full/net/ipv4/ip_output.c.old	2006-01-07 17:12:21.0 +0100
+++ linux-2.6.15-mm2-full/net/ipv4/ip_output.c	2006-01-07 17:21:33.0 +0100
@@ -85,6 +85,8 @@
 
 int sysctl_ip_default_ttl = IPDEFTTL;
 
+static int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff*));
+
 /* Generate a checksum for an outgoing IP datagram. */
 __inline__ void ip_send_check(struct iphdr *iph)
 {
@@ -409,7 +411,7 @@
  *	single device frame, and queue such a frame for sending.
  */
 
-int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff*))
+static int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff*))
 {
 	struct iphdr *iph;
 	int raw = 0;
@@ -1391,7 +1393,6 @@
 #endif
 }
 
-EXPORT_SYMBOL(ip_fragment);
 EXPORT_SYMBOL(ip_generic_getfrag);
 EXPORT_SYMBOL(ip_queue_xmit);
 EXPORT_SYMBOL(ip_send_check);
Re: [PATCH] Endian-annotate struct iphdr
On Sat, 7 Jan 2006, Al Viro wrote:
>
> [following is for people familiar with sparse internals; everybody
> else can safely skip it]
> ...
> Objections? Basically, from C POV any fouled value is int or unsigned
> int and we avoid generating a warning only if its upper bits will
> eventually be discarded. Rules above guarantee that, AFAICS. I can
> implement that without too much PITA, if nobody objects. Linus?

I don't have any objections. I can't say that it strikes me as being a huge deal, but

 (a) you've done most of the sparse annotations in the kernel by far by now, so whatever you say matters a lot more than my gut feel

and

 (b) letting you get your way has historically proven to be the right thing anyway ;)

So go wild.

(And read the Good Kind of people section of ManagementStyle)

		Linus
Re: State of the Union: Wireless
Hi,

so, can we agree on this:

a) We want to distinguish between physical devices and virtual devices. Physical devices represent a network card, virtual devices a function based on the card (access point, sta, ...). Some cards can handle multiple functions in parallel; we support it this way.

   Caveats:
   - rfmon can affect all virtual devices, as Mike pointed out
   - As a matter of fact, virtual devices are not independent even without rfmon, simply because one physical device can only tune to one channel at a time

   Questions:
   - Which link type should be used by the virtual device? Is it easier to change all protocols to support 802.11 frames, or is ethernet emulation mode simplest?
   - If we use 802.11 natively, should we always tack on radiotap headers?

b) We want to replace the wireless extension ioctl()s with netlink so we have the possibility to change multiple config items at once.

   Questions:
   - What happens to iwpriv?
   - Where are the netlink messages/wireless extensions handled? Should the device driver forward into the stack, or should the stack call into the device driver?

c) Which stack to use? Actually, it's intel vs. devicescape. I'm about to port the driver of a popular card to devicescape to get a personal opinion.

I think we should start discussing a), beginning with b) when we have answered most questions concerning a).

Thoughts?

Stefan
Re: [PATCH, RFC] RCU : OOM avoidance and lower latency
On Sat, Jan 07, 2006 at 12:36:25AM -0800, David S. Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Sat, 07 Jan 2006 08:53:52 +0100 I have no problem with this, since the biggest server I have is 4 way, but are you sure big machines wont suffer from this single spinlock ? It is the main question. Also I dont understand what you want to do after this single spinlock patch. How is it supposed to help the 'ip route flush cache' problem ? In my case, I have about 600.000 dst-entries : I don't claim to have a solution to this problem currently. Doing RCU and going through the whole DST GC machinery is overkill for an active system. So, perhaps a very simple solution will do: 1) On rt_run_flush(), do not rt_free(), instead collect all active routing cache entries onto a global list, begin a timer to fire in 10 seconds (or some sysctl configurable amount). 2) When a new routing cache entry is needed, check the global list appended to in #1 above first, failing that do dst_alloc() as is done currently. 3) If timer expires, rt_free() any entries in the global list. The missing trick is how to ensure RCU semantics when reallocating from the global list. The straightforward ways of doing this require a per-entry lock in addition to the dst_entry reference count -- lots of read-side overhead. More complex approaches use a generation number that is incremented when adding to or removing from the global list. When the generation number overflows, unconditionally rt_free() it rather than adding to the global list again. Then there needs to be some clever code on the read side to detect the case when the generation number changes while acquiring a reference. And memory barriers. Also lots of read-side overhead. Also, it is now -always- necessary to acquire a reference on the read-side. The idea is that an active system will immediately repopulate itself with all of these entries just flushed from the table. RCU really doesn't handle this kind of problem very well. 
It truly excels when work is generated by process context work, not interrupt work.

Sounds like a challenge to me. ;-)

Well, one possible way to attack Eric's workload might be the following:

o Size the hash table to strike the appropriate balance between read-side search overhead and memory consumption. Call the number of hash-chain headers N.

o Create a hashed array of locks sized to allow the update to proceed sufficiently quickly. Call the number of locks M, probably a power of two. This means that M CPUs can be doing the update in parallel.

o Create an array of M^2 list headers (call it xfer[][]), but since this is only needed during an update, it can be allocated and deallocated if need be. (Me, with my big-server experience, would probably just create the array, since M is not likely to be too large. But your mileage may vary. And you really only need M*(M-1) list headers, but that makes the index calculation a bit more annoying.)

o Use a two-phase update. In the first phase, each updating CPU acquires the corresponding lock and removes entries from the corresponding partition of the hash table. If the new location of a given entry falls into the same partition, it is added back to the appropriate hash chain of that partition. Otherwise, add the entry to xfer[dst][src], where src and dst are indexes of the corresponding partitions.

o When all CPUs finish removing entries from their partition, they check into a barrier. Once all have checked in, they can start the second phase of the update.

o In the second phase, each CPU removes the entries from the xfer array that are destined for its partition and adds them to the hash chain that they are destined for.

Some commentary and variations, in the hope that this inspires someone to come up with an even better idea:

o Unless M is at least three, there is no performance gain over a single global lock with a single CPU doing the update, since each element must now undergo four list operations rather than just two.

o The xfer[][] array must have each entry cache-aligned, or you lose big on cacheline effects. Note that it is -not- sufficient to simply align the rows or the columns, since each CPU has its own column when inserting and its own row when removing from xfer[][].

o And the data-skew effects are less severe if this procedure runs from process context. A spinning barrier must be used otherwise. But note that the per-partition locks could remain spinlocks, only the barrier need involve sleeping (in case that helps, am getting a bit ahead of my understanding of this part of the kernel).
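The two-phase update described above can be sketched in userspace C. This is only an illustration under stated assumptions: the sizes N_BUCKETS and M_PARTS, the hash function, and every function name here are hypothetical and not taken from the kernel, and the "CPUs" are simulated by ordinary loops, so the per-partition locks and the barrier appear only as comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 16                     /* N: hash-chain headers (assumed) */
#define M_PARTS   4                      /* M: partitions, one lock each (assumed) */
#define PER_PART  (N_BUCKETS / M_PARTS)  /* buckets per partition */

struct entry {
    uint32_t key;
    struct entry *next;
};

static struct entry *table[N_BUCKETS];
static struct entry *xfer[M_PARTS][M_PARTS];   /* indexed xfer[dst][src] */
static uint32_t hash_secret;

static unsigned bucket_of(uint32_t key)
{
    uint32_t h = (key ^ hash_secret) * 2654435761u;   /* multiplicative hash */
    return (h >> 16) % N_BUCKETS;
}

static void push(struct entry **head, struct entry *e)
{
    e->next = *head;
    *head = e;
}

static void table_insert(struct entry *e)
{
    push(&table[bucket_of(e->key)], e);
}

static struct entry *table_lookup(uint32_t key)
{
    struct entry *e = table[bucket_of(key)];

    while (e && e->key != key)
        e = e->next;
    return e;
}

/* Phase 1 for partition src; a real updater would hold lock[src] here. */
static void rehash_phase1(unsigned src)
{
    struct entry *pending = NULL, *e;
    unsigned b;

    /* Drain the partition first so re-added entries are not visited twice. */
    for (b = src * PER_PART; b < (src + 1) * PER_PART; b++)
        while ((e = table[b]) != NULL) {
            table[b] = e->next;
            push(&pending, e);
        }
    while ((e = pending) != NULL) {
        unsigned nb = bucket_of(e->key), dst = nb / PER_PART;

        pending = e->next;
        /* Same partition: back onto its chain; otherwise hand it over. */
        push(dst == src ? &table[nb] : &xfer[dst][src], e);
    }
}

/* Phase 2 for partition dst: collect everything other partitions sent here. */
static void rehash_phase2(unsigned dst)
{
    struct entry *e;
    unsigned src;

    for (src = 0; src < M_PARTS; src++)
        while ((e = xfer[dst][src]) != NULL) {
            xfer[dst][src] = e->next;
            push(&table[bucket_of(e->key)], e);
        }
}

static void rehash_all(uint32_t new_secret)
{
    unsigned p;

    hash_secret = new_secret;
    for (p = 0; p < M_PARTS; p++)     /* would run on M CPUs in parallel */
        rehash_phase1(p);
    /* barrier: every partition must finish phase 1 before phase 2 starts */
    for (p = 0; p < M_PARTS; p++)
        rehash_phase2(p);
}
```

The sketch leaves out the read side entirely: it says nothing about how RCU readers would see entries while they sit on xfer[][], which is the hard part of the real problem.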
Re: Multicast using bound socket
[EMAIL PROTECTED] wrote on 01/07/2006 08:13:17 AM: I've encountered the following problem: when a UDP socket was bound to a specific interface and a multicast group was joined, no packets sent to that multicast group were delivered to the application. If the socket was not bound to a specific interface, packets were delivered correctly. A bound socket will only receive packets that match the binding. If you want to receive packets sent to a particular multicast address, you need to bind to that address. Group membership is per-interface, not per-socket. You can join a group on a socket that cannot receive packets from that group. Similarly, you don't have to join a group on a particular socket to receive multicast packets from that group, as long as the binding matches and someone on the machine has joined that group. +-DLS
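One way to follow the advice above is to bind the socket to the group address itself and join the group on the wanted interface. The following is a minimal sketch, assuming IPv4 and the standard sockets API; the function names mcast_prepare and mcast_open are hypothetical helpers made up for this illustration, and the addresses used with them are placeholders.

```c
#define _DEFAULT_SOURCE          /* for struct ip_mreq under strict -std modes */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in the bind address and membership request; 0 on success, -1 on bad input. */
int mcast_prepare(const char *group, const char *ifaddr, unsigned short port,
                  struct sockaddr_in *bind_addr, struct ip_mreq *mreq)
{
    memset(bind_addr, 0, sizeof(*bind_addr));
    memset(mreq, 0, sizeof(*mreq));
    bind_addr->sin_family = AF_INET;
    bind_addr->sin_port = htons(port);
    if (inet_pton(AF_INET, group, &bind_addr->sin_addr) != 1)
        return -1;
    if (!IN_MULTICAST(ntohl(bind_addr->sin_addr.s_addr)))
        return -1;                          /* not a class D address */
    mreq->imr_multiaddr = bind_addr->sin_addr;
    if (inet_pton(AF_INET, ifaddr, &mreq->imr_interface) != 1)
        return -1;
    return 0;
}

/* Open a UDP socket bound to the group address and join the group on ifaddr. */
int mcast_open(const char *group, const char *ifaddr, unsigned short port)
{
    struct sockaddr_in addr;
    struct ip_mreq mreq;
    int fd;

    if (mcast_prepare(group, ifaddr, port, &addr, &mreq) < 0)
        return -1;
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    /* The binding matches the group; the membership is per-interface. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A call such as mcast_open("239.1.2.3", "192.168.1.10", 5000) would then return a descriptor that only matches datagrams addressed to that group and port, provided the interface address exists on the machine.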
Re: Multicast using bound socket
From: David Stevens [EMAIL PROTECTED] Date: Sat, 7 Jan 2006 13:04:51 -0800 A bound socket will only receive packets that match the binding. If you want to receive packets sent to a particular multicast address, you need to bind to that address. That's how I see this as well.
Re: [PATCH] Endian-annotate struct iphdr
From: Al Viro [EMAIL PROTECTED] Date: Sat, 7 Jan 2006 12:26:06 + OK. Another question: do you have any objections against static inline void ip_eth_mc_map(__be32 addr, char *buf) { __u32 n=ntohl(addr); buf[0]=0x01; buf[1]=0x00; buf[2]=0x5e; buf[5]=n&0xFF; addr>>=8; buf[4]=n&0xFF; addr>>=8; buf[3]=n&0x7F; } That does compile to exact same code as original - gcc fortunately has enough clue to realize that addr is never used past the initialization of n, which is, from C point of view, of exact same type as addr. Why are you shifting addr instead of n? Are you working with some parallel universe version of C I am unaware of? :-)
Re: [RFC: 2.6 patch] net/ipv4/ip_output.c: make ip_fragment() static
From: Adrian Bunk [EMAIL PROTECTED] Date: Sat, 7 Jan 2006 19:15:33 +0100 Since there's no longer any external user of ip_fragment() we can make it static. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] Works for me, applied. Thanks.
Re: [2.6 patch] net/ipv6/: small cleanups
From: Adrian Bunk [EMAIL PROTECTED] Date: Sat, 7 Jan 2006 19:17:23 +0100 This patch contains the following cleanups: - addrconf.c: make addrconf_dad_stop() static - inet6_connection_sock.c should #include net/inet6_connection_sock.h for getting the prototypes of its global functions Signed-off-by: Adrian Bunk [EMAIL PROTECTED] Also applied, thanks.
Fw: [Bugme-new] [Bug 5848] New: pcmcia novatel merlin u530 not working properly on 2.6.13 and up
The card seems to be sending and receiving OK. I'm wondering if this is a ppp problem?

Begin forwarded message:

Date: Sat, 7 Jan 2006 08:29:54 -0800
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bugme-new] [Bug 5848] New: pcmcia novatel merlin u530 not working properly on 2.6.13 and up

http://bugzilla.kernel.org/show_bug.cgi?id=5848

Summary: pcmcia novatel merlin u530 not working properly on 2.6.13 and up
Kernel Version: 2.6.13, 2.6.14(, 2.6.15?)
Status: NEW
Severity: normal
Owner: [EMAIL PROTECTED]
Submitter: [EMAIL PROTECTED]

Most recent kernel where this bug did not occur: 2.6.12
Distribution: Gentoo Linux AMD64
Hardware Environment: Athlon64, asus k8v deluxe, pcmcia/pci bridge, Novatel Merlin u530
Software Environment: gcc 3.4.4-r1, binutils 2.16.1, pcmcia-cs 3.2.8-r2

Problem Description: Novatel Merlin u530 is a GPRS/UMTS (3G) PCMCIA card. When I insert the card, it is mapped as ttyS2. Then I run pon gprs (a script which works for my ISP) and it DOES establish a connection, BUT it starts working at 1Kbps and, within a few seconds, slows down until it reaches 0 B/s. The connection isn't hung up, but it won't work anymore. My guess is that this bug might be related somehow to http://bugzilla.kernel.org/show_bug.cgi?id=5569 and http://bugzilla.kernel.org/show_bug.cgi?id=5678 . But that is just my guess.
This is a cut from /var/log/messages on 2.6.14.5 sources from kernel.org (NON_WORKING): Jan 7 15:44:41 localhost Yenta: CardBus bridge found at :00:0d.0 [414e:454c] Jan 7 15:44:41 localhost Yenta: Enabling burst memory read transactions Jan 7 15:44:41 localhost Yenta: Using CSCINT to route CSC interrupts to PCI Jan 7 15:44:41 localhost Yenta: Routing CardBus interrupts to PCI Jan 7 15:44:41 localhost Yenta TI: socket :00:0d.0, mfunc 0x00c00d02, devctl 0x42 Jan 7 15:44:41 localhost Yenta: ISA IRQ mask 0x, PCI irq 169 Jan 7 15:44:41 localhost Socket status: 3006 Jan 7 15:44:41 localhost ACPI: PCI Interrupt :00:0d.1[A] - GSI 18 (level, low) - IRQ 169 Jan 7 15:44:41 localhost Yenta: CardBus bridge found at :00:0d.1 [414e:454c] Jan 7 15:44:41 localhost Yenta: Using CSCINT to route CSC interrupts to PCI Jan 7 15:44:41 localhost Yenta: Routing CardBus interrupts to PCI Jan 7 15:44:41 localhost Yenta TI: socket :00:0d.1, mfunc 0x00c00d02, devctl 0x42 Jan 7 15:44:42 localhost Yenta: ISA IRQ mask 0x, PCI irq 169 Jan 7 15:44:42 localhost Socket status: 3006 Jan 7 15:44:42 localhost ds: ds_open(socket 0) Jan 7 15:44:42 localhost pcmcia: Detected deprecated PCMCIA ioctl usage. Jan 7 15:44:42 localhost pcmcia: This interface will soon be removed from the kernel; please expect breakage unless you upgrade to new tools. Jan 7 15:44:42 localhost pcmcia: see http://www.kernel.org/pub/linux/utils/kernel/pcmcia/pcmcia.html for details. 
Jan 7 15:44:42 localhost ds: ds_open(socket 1) Jan 7 15:44:42 localhost ds: ds_open(socket 2) Jan 7 15:44:42 localhost ds: ds_open(socket 2) Jan 7 15:44:42 localhost cardmgr[7041]: watching 2 sockets Jan 7 15:45:14 localhost cardmgr[7042]: socket 1: Serial or Modem Jan 7 15:45:14 localhost ttyS2 at I/O 0x3e8 (irq = 169) is a 16550A Jan 7 15:45:32 localhost pppd[7386]: pppd 2.4.2 started by root, uid 0 Jan 7 15:45:33 localhost chat[7387]: timeout set to 5 seconds Jan 7 15:45:33 localhost chat[7387]: abort on (\nBUSY\r) Jan 7 15:45:33 localhost chat[7387]: abort on (\nERROR\r) Jan 7 15:45:33 localhost chat[7387]: abort on (\nNO ANSWER\r) Jan 7 15:45:33 localhost chat[7387]: abort on (\nNO CARRIER\r) Jan 7 15:45:33 localhost chat[7387]: abort on (\nNO DIALTONE\r) Jan 7 15:45:33 localhost chat[7387]: abort on (\nRINGING\r\n\r\nRINGING\r) Jan 7 15:45:33 localhost chat[7387]: send (^MAT^M) Jan 7 15:45:33 localhost chat[7387]: timeout set to 5 seconds Jan 7 15:45:33 localhost chat[7387]: expect (OK) Jan 7 15:45:33 localhost chat[7387]: AT^M^M Jan 7 15:45:33 localhost chat[7387]: OK Jan 7 15:45:33 localhost chat[7387]: -- got it Jan 7 15:45:33 localhost chat[7387]: send (ATE1^M) Jan 7 15:45:33 localhost chat[7387]: expect (OK) Jan 7 15:45:33 localhost chat[7387]: ^M Jan 7 15:45:33 localhost chat[7387]: ATE1^M^M Jan 7 15:45:33 localhost chat[7387]: OK Jan 7 15:45:33 localhost chat[7387]: -- got it Jan 7 15:45:33 localhost chat[7387]: send (AT+cgdcont=1,IP,movistar.es^M) Jan 7 15:45:34 localhost chat[7387]: expect (OK) Jan 7 15:45:34 localhost chat[7387]: ^M Jan 7 15:45:34 localhost chat[7387]: AT+cgdcont=1,IP,movistar.es^M^M Jan 7 15:45:34 localhost chat[7387]: OK Jan 7 15:45:34 localhost chat[7387]: -- got it Jan 7 15:45:34 localhost chat[7387]: send (ATD*99***1#^M) Jan 7 15:45:34 localhost chat[7387]: expect (CONNECT)
Re: Multicast using bound socket
On Saturday 07 January 2006 22:04, you wrote: A bound socket will only receive packets that match the binding. If you want to receive packets sent to a particular multicast address, you need to bind to that address. Group membership is per-interface, not per-socket. You can join a group on a socket that cannot receive packets from that group. Similarly, you don't have to join a group on a particular socket to receive multicast packets from that group, as long as the binding matches and someone on the machine has joined that group. Then does this mean that there is no way to ensure that a particular socket can only receive multicast packets which arrived on a specific interface? Bye, Jori
Re: Multicast using bound socket
From: Jori Liesenborgs [EMAIL PROTECTED] Date: Sat, 7 Jan 2006 23:04:06 +0100 Then does this mean that there is no way to ensure that a particular socket can only receive multicast packets which arrived on a specific interface? Try SO_BINDTODEVICE.
Re: Multicast using bound socket
On Saturday 07 January 2006 23:10, you wrote: Then does this mean that there is no way to ensure that a particular socket can only receive multicast packets which arrived on a specific interface? Try SO_BINDTODEVICE. Ok, thanks! Bye, Jori
[RFC: 2.6 patch] kernel/posix-timers.c: remove do_posix_clock_notimer_create()
Is there any reason for this function that is neither used nor has any real contents?

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

---

 include/linux/posix-timers.h |    1 -
 kernel/posix-timers.c        |    6 ------
 2 files changed, 7 deletions(-)

--- linux-2.6.15-mm2-full/include/linux/posix-timers.h.old 2006-01-07 23:13:08.0 +0100
+++ linux-2.6.15-mm2-full/include/linux/posix-timers.h 2006-01-07 23:13:17.0 +0100
@@ -84,7 +84,6 @@
 void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock);

 /* error handlers for timer_create, nanosleep and settime */
-int do_posix_clock_notimer_create(struct k_itimer *timer);
 int do_posix_clock_nonanosleep(const clockid_t, int flags, struct timespec *,
                                struct timespec __user *);
 int do_posix_clock_nosettime(const clockid_t, struct timespec *tp);
--- linux-2.6.15-mm2-full/kernel/posix-timers.c.old 2006-01-07 23:13:25.0 +0100
+++ linux-2.6.15-mm2-full/kernel/posix-timers.c 2006-01-07 23:13:30.0 +0100
@@ -875,12 +875,6 @@
 }
 EXPORT_SYMBOL_GPL(do_posix_clock_nosettime);

-int do_posix_clock_notimer_create(struct k_itimer *timer)
-{
-        return -EINVAL;
-}
-EXPORT_SYMBOL_GPL(do_posix_clock_notimer_create);
-
 int do_posix_clock_nonanosleep(const clockid_t clock, int flags,
                                struct timespec *t, struct timespec __user *r)
 {
Re: SiS190 on ASUS. monodirectional traffic
On Sat, Jan 07, 2006 at 03:45:14PM +0100, Francois Romieu wrote: Marco Atzeri [EMAIL PROTECTED] : [...] Asus motherboard K8SMV with integrated SIS190 ethernet driver vanilla kernel 2.6.15 (but the same also for .14) on Fedora 4 out 37.68 MB at 6.32 MB/s OK receive hangs around 90K Since it implies a really moderate amount of data, can you put a complete tcpdump/tethereal log somewhere (+ send a few ping on the misbehaving interface before and after the test) ? everything uploaded at http://www.geocities.com/marco_atzeri/sis190/ Just to be sure: it is a soft lock. The application hangs but the system is still responsive/usable. However it is impossible to send/receive any data through the sis190 interface, even with a different application. - is the description right ? only the interface eth1 hang on output, the rest of system works fine. I compiled with no-PREEMPT, but the situation is worse, the interface hang during the connection to the ftp server. From the tethereal log I see communication from the laptop 192.168.1.252 to the ASUS .2 but no more in the reverse direction. Only the first ping sessions have reply, after the hang also ping have no reply and nmap see no more open ports. At least for the K8SMX, my Mail/linux/support/sis190 mailbox disagrees. People have the bad habit of sending private mail _without_ Ccing netdev though. Nice, so I have some hopes ;-) any suggestion or further test to be tried is welcomed 1 - Please send your current .config and a complete dmesg. 2 - Add 'ifconfig' and '/proc/interrupts' output after the devices are loaded and you have issued a few pings. 3 - Same thing as 2) when the transfer is hung. 4 - Don't filter lspci nor lsmod output (yep, I am a control freak). 5 - Please add the output of an 'ethtool eth1' to the mix. [...] Linux version 2.6.15 ([EMAIL PROTECTED]) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #7 PREEMPT Tue Jan 3 19:31:16 CET 2006 PREEMPT makes me nervous. 
I am ok to try and debug any issue which could be related to it but I'd prefer that the whole system is made usable with a no-PREEMPT config first (SMP is fine though). Please send the updated .config as well.

Linux version 2.6.15 ([EMAIL PROTECTED]) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #8 Sat Jan 7 19:40:08 CET 2006

I note now that eth1 shares IRQ 5 with other drivers while eth0 is alone:

  5:    392  XT-PIC  ohci_hcd:usb2, ohci_hcd:usb4, ohci1394, eth1
 10:  21238  XT-PIC  eth0

Could this be a reason for the hang?

Ueimor

Thanks
Marco

--
marco.atzeri at fastwebnet.it
The first of the Frequently Asked Questions: where are the FAQ? it.faq
Re: [PATCH] Endian-annotate struct iphdr
On Sat, Jan 07, 2006 at 01:05:53PM -0800, David S. Miller wrote: Why are you shifting addr instead of n? Because of a braino done when (re)typing it? ;-)
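With the typo corrected so that n is shifted rather than addr, the mapping under discussion can be sketched in userspace as follows. This is an illustration, not the kernel function: the name ip_eth_mc_map_sketch is made up, uint32_t stands in for the kernel's __be32 and __u32 types, and the buffer is unsigned char to keep the byte values unambiguous.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Map an IPv4 multicast address (network byte order) to its Ethernet MAC. */
void ip_eth_mc_map_sketch(uint32_t addr_be, unsigned char buf[6])
{
    uint32_t n = ntohl(addr_be);    /* host order, so the shifts pick known bytes */

    buf[0] = 0x01;                  /* fixed 01:00:5e multicast prefix */
    buf[1] = 0x00;
    buf[2] = 0x5e;
    buf[5] = n & 0xFF;
    n >>= 8;                        /* shift n, not addr */
    buf[4] = n & 0xFF;
    n >>= 8;
    buf[3] = n & 0x7F;              /* only the low 23 address bits are mapped */
}
```

For example, 239.255.255.250 maps to 01:00:5e:7f:ff:fa, and 224.129.0.1 maps to 01:00:5e:01:00:01 because the top bit of the second octet is dropped.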
[PATCH 1/6][NET]: Convert net/{ipv4,ipv6,sched} to netdev_priv
Hi Dave, following are a couple of assorted patches I found while cleaning out old trees. Please apply, thanks. [NET]: Convert net/{ipv4,ipv6,sched} to netdev_priv Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit 763b3dea377647e2eb0d7638143b9a22d32fb989 tree d4932b5cfe487c8d6e4caf59b9464cd68e1fa46d parent ab1afff516ec3ea3fa2afa8fcf94afdc0c2f6464 author Patrick McHardy [EMAIL PROTECTED] Sat, 07 Jan 2006 23:10:18 +0100 committer Patrick McHardy [EMAIL PROTECTED] Sat, 07 Jan 2006 23:10:18 +0100 net/ipv4/ip_gre.c | 33 +++-- net/ipv4/ipip.c | 18 +- net/ipv4/ipmr.c | 22 +++--- net/ipv6/ip6_tunnel.c | 24 net/ipv6/sit.c| 20 ++-- net/sched/sch_teql.c | 12 ++-- 6 files changed, 63 insertions(+), 66 deletions(-) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 912c42f..0e7c743 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -188,7 +188,7 @@ static struct ip_tunnel * ipgre_tunnel_l } if (ipgre_fb_tunnel_dev-flagsIFF_UP) - return ipgre_fb_tunnel_dev-priv; + return netdev_priv(ipgre_fb_tunnel_dev); return NULL; } @@ -278,7 +278,7 @@ static struct ip_tunnel * ipgre_tunnel_l return NULL; dev-init = ipgre_tunnel_init; - nt = dev-priv; + nt = netdev_priv(dev); nt-parms = *parms; if (register_netdevice(dev) 0) { @@ -286,9 +286,6 @@ static struct ip_tunnel * ipgre_tunnel_l goto failed; } - nt = dev-priv; - nt-parms = *parms; - dev_hold(dev); ipgre_tunnel_link(nt); return nt; @@ -299,7 +296,7 @@ failed: static void ipgre_tunnel_uninit(struct net_device *dev) { - ipgre_tunnel_unlink((struct ip_tunnel*)dev-priv); + ipgre_tunnel_unlink(netdev_priv(dev)); dev_put(dev); } @@ -518,7 +515,7 @@ out: skb2-dst-ops-update_pmtu(skb2-dst, rel_info); rel_info = htonl(rel_info); } else if (type == ICMP_TIME_EXCEEDED) { - struct ip_tunnel *t = (struct ip_tunnel*)skb2-dev-priv; + struct ip_tunnel *t = netdev_priv(skb2-dev); if (t-parms.iph.ttl) { rel_type = ICMP_DEST_UNREACH; rel_code = ICMP_HOST_UNREACH; @@ -669,7 +666,7 @@ drop_nolock: static int ipgre_tunnel_xmit(struct 
sk_buff *skb, struct net_device *dev) { - struct ip_tunnel *tunnel = (struct ip_tunnel*)dev-priv; + struct ip_tunnel *tunnel = netdev_priv(dev); struct net_device_stats *stats = tunnel-stat; struct iphdr *old_iph = skb-nh.iph; struct iphdr *tiph; @@ -914,7 +911,7 @@ ipgre_tunnel_ioctl (struct net_device *d t = ipgre_tunnel_locate(p, 0); } if (t == NULL) - t = (struct ip_tunnel*)dev-priv; + t = netdev_priv(dev); memcpy(p, t-parms, sizeof(p)); if (copy_to_user(ifr-ifr_ifru.ifru_data, p, sizeof(p))) err = -EFAULT; @@ -954,7 +951,7 @@ ipgre_tunnel_ioctl (struct net_device *d } else { unsigned nflags=0; -t = (struct ip_tunnel*)dev-priv; +t = netdev_priv(dev); if (MULTICAST(p.iph.daddr)) nflags = IFF_BROADCAST; @@ -1003,7 +1000,7 @@ ipgre_tunnel_ioctl (struct net_device *d if ((t = ipgre_tunnel_locate(p, 0)) == NULL) goto done; err = -EPERM; - if (t == ipgre_fb_tunnel_dev-priv) + if (t == netdev_priv(ipgre_fb_tunnel_dev)) goto done; dev = t-dev; } @@ -1020,12 +1017,12 @@ done: static struct net_device_stats *ipgre_tunnel_get_stats(struct net_device *dev) { - return (((struct ip_tunnel*)dev-priv)-stat); + return (((struct ip_tunnel*)netdev_priv(dev))-stat); } static int ipgre_tunnel_change_mtu(struct net_device *dev, int new_mtu) { - struct ip_tunnel *tunnel = (struct ip_tunnel*)dev-priv; + struct ip_tunnel *tunnel = netdev_priv(dev); if (new_mtu 68 || new_mtu 0xFFF8 - tunnel-hlen) return -EINVAL; dev-mtu = new_mtu; @@ -1065,7 +1062,7 @@ static int ipgre_tunnel_change_mtu(struc static int ipgre_header(struct sk_buff *skb, struct net_device *dev, unsigned short type, void *daddr, void *saddr, unsigned len) { - struct ip_tunnel *t = (struct ip_tunnel*)dev-priv; + struct ip_tunnel *t = netdev_priv(dev); struct iphdr *iph = (struct iphdr *)skb_push(skb, t-hlen); u16 *p = (u16*)(iph+1); @@ -1092,7 +1089,7 @@ static int ipgre_header(struct sk_buff * static int ipgre_open(struct net_device *dev) { - struct ip_tunnel *t = (struct ip_tunnel*)dev-priv; + struct ip_tunnel *t = 
netdev_priv(dev); if (MULTICAST(t-parms.iph.daddr)) { struct flowi fl = { .oif = t-parms.link, @@ -1116,7 +1113,7 @@ static int ipgre_open(struct net_device static int ipgre_close(struct net_device *dev) { - struct ip_tunnel *t = (struct ip_tunnel*)dev-priv; + struct ip_tunnel *t = netdev_priv(dev); if (MULTICAST(t-parms.iph.daddr) t-mlink) { struct in_device *in_dev = inetdev_by_index(t-mlink); if (in_dev) { @@ -1156,7 +1153,7 @@ static int ipgre_tunnel_init(struct net_ int mtu = ETH_DATA_LEN; int addend = sizeof(struct iphdr) + 4; - tunnel = (struct ip_tunnel*)dev-priv; + tunnel = netdev_priv(dev); iph =
[PATCH 2/6][PKT_SCHED]: Use USEC_PER_SEC
[PKT_SCHED]: Use USEC_PER_SEC Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit ab1afff516ec3ea3fa2afa8fcf94afdc0c2f6464 tree 6e41ef25fd8994436d1ab32c9503f46470e9e985 parent 0aec63e67c69545ca757a73a66f5dcf05fa484bf author Patrick McHardy [EMAIL PROTECTED] Sat, 07 Jan 2006 22:47:18 +0100 committer Patrick McHardy [EMAIL PROTECTED] Sat, 07 Jan 2006 22:47:18 +0100 include/net/pkt_sched.h | 22 +++--- net/sched/sch_hfsc.c|8 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 6492e73..fb9ef75 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -59,8 +59,8 @@ typedef struct timeval psched_time_t; typedef long psched_tdiff_t; #define PSCHED_GET_TIME(stamp) do_gettimeofday((stamp)) -#define PSCHED_US2JIFFIE(usecs) (((usecs)+(100/HZ-1))/(100/HZ)) -#define PSCHED_JIFFIE2US(delay) ((delay)*(100/HZ)) +#define PSCHED_US2JIFFIE(usecs) (((usecs)+(USEC_PER_SEC/HZ-1))/(USEC_PER_SEC/HZ)) +#define PSCHED_JIFFIE2US(delay) ((delay)*(USEC_PER_SEC/HZ)) #else /* !CONFIG_NET_SCH_CLK_GETTIMEOFDAY */ @@ -123,9 +123,9 @@ do { \ default: \ __delta = 0; \ case 2: \ - __delta += 100; \ + __delta += USEC_PER_SEC; \ case 1: \ - __delta += 100; \ + __delta += USEC_PER_SEC; \ } \ } \ __delta; \ @@ -136,9 +136,9 @@ psched_tod_diff(int delta_sec, int bound { int delta; - if (bound = 100 || delta_sec (0x7FFF/100)-1) + if (bound = USEC_PER_SEC || delta_sec (0x7FFF/USEC_PER_SEC)-1) return bound; - delta = delta_sec * 100; + delta = delta_sec * USEC_PER_SEC; if (delta bound || delta 0) delta = bound; return delta; @@ -152,9 +152,9 @@ psched_tod_diff(int delta_sec, int bound default: \ __delta = psched_tod_diff(__delta_sec, bound); break; \ case 2: \ - __delta += 100; \ + __delta += USEC_PER_SEC; \ case 1: \ - __delta += 100; \ + __delta += USEC_PER_SEC; \ case 0: \ if (__delta bound || __delta 0) \ __delta = bound; \ @@ -170,15 +170,15 @@ psched_tod_diff(int delta_sec, int bound ({ \ int __delta = (tv).tv_usec + 
(delta); \ (tv_res).tv_sec = (tv).tv_sec; \ - if (__delta 100) { (tv_res).tv_sec++; __delta -= 100; } \ + if (__delta USEC_PER_SEC) { (tv_res).tv_sec++; __delta -= USEC_PER_SEC; } \ (tv_res).tv_usec = __delta; \ }) #define PSCHED_TADD(tv, delta) \ ({ \ (tv).tv_usec += (delta); \ - if ((tv).tv_usec 100) { (tv).tv_sec++; \ - (tv).tv_usec -= 100; } \ + if ((tv).tv_usec USEC_PER_SEC) { (tv).tv_sec++; \ + (tv).tv_usec -= USEC_PER_SEC; } \ }) /* Set/check that time is in the past perfect; diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c index c26764b..cad1758 100644 --- a/net/sched/sch_hfsc.c +++ b/net/sched/sch_hfsc.c @@ -208,7 +208,7 @@ struct hfsc_sched do { \ struct timeval tv; \ do_gettimeofday(tv); \ - (stamp) = 100ULL * tv.tv_sec + tv.tv_usec; \ + (stamp) = 1ULL * USEC_PER_SEC * tv.tv_sec + tv.tv_usec; \ } while (0) #endif @@ -502,8 +502,8 @@ d2dx(u32 d) u64 dx; dx = ((u64)d * PSCHED_JIFFIE2US(HZ)); - dx += 100 - 1; - do_div(dx, 100); + dx += USEC_PER_SEC - 1; + do_div(dx, USEC_PER_SEC); return dx; } @@ -523,7 +523,7 @@ dx2d(u64 dx) { u64 d; - d = dx * 100; + d = dx * USEC_PER_SEC; do_div(d, PSCHED_JIFFIE2US(HZ)); return (u32)d; }
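The tick conversions touched by this patch can be tried out in isolation. The sketch below assumes a tick rate of 1000 (the kernel's HZ is a build-time setting, so SKETCH_HZ is only a stand-in) and uses hypothetical function names in place of the PSCHED_US2JIFFIE and PSCHED_JIFFIE2US macros; USEC_PER_SEC is the kernel's named constant for 1000000.

```c
#include <assert.h>

#define SKETCH_HZ     1000L       /* assumed tick rate for this sketch */
#define USEC_PER_SEC  1000000L    /* microseconds per second */

/* Round a microsecond interval up to whole ticks. */
long us_to_jiffies(long usecs)
{
    long us_per_tick = USEC_PER_SEC / SKETCH_HZ;

    return (usecs + (us_per_tick - 1)) / us_per_tick;
}

/* Convert ticks back to microseconds. */
long jiffies_to_us(long delay)
{
    return delay * (USEC_PER_SEC / SKETCH_HZ);
}
```

With these assumptions one tick is 1000 microseconds, so an interval of 1001 microseconds rounds up to 2 ticks while exactly 1000 microseconds stays at 1.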
[PATCH 5/6][PKT_SCHED]: Fix memory leak when dumping in pedit action
[PKT_SCHED]: Fix memory leak when dumping in pedit action

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit b73952761225e41cb81afe157cb312a594a95693
tree be5312ebbabc4f10c9e08a00b34adaec3c636088
parent 0aec63e67c69545ca757a73a66f5dcf05fa484bf
author Patrick McHardy [EMAIL PROTECTED] Sun, 08 Jan 2006 00:28:58 +0100
committer Patrick McHardy [EMAIL PROTECTED] Sun, 08 Jan 2006 00:28:58 +0100

 net/sched/pedit.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/sched/pedit.c b/net/sched/pedit.c
index 767d24f..e469c5a 100644
--- a/net/sched/pedit.c
+++ b/net/sched/pedit.c
@@ -246,10 +246,12 @@ tcf_pedit_dump(struct sk_buff *skb, stru
         t.lastuse = jiffies_to_clock_t(jiffies - p->tm.lastuse);
         t.expires = jiffies_to_clock_t(p->tm.expires);
         RTA_PUT(skb, TCA_PEDIT_TM, sizeof(t), &t);
+        kfree(opt);
         return skb->len;

 rtattr_failure:
         skb_trim(skb, b - skb->data);
+        kfree(opt);
         return -1;
 }
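The shape of this fix, releasing a temporary buffer on both the success path and the failure path, can be shown with a small userspace sketch. Everything in it is hypothetical: dump_record, tracked_alloc, tracked_free and the live_allocs counter are made up for the illustration, with malloc-family calls standing in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs;   /* outstanding buffers, so a leak shows up as non-zero */

static void *tracked_alloc(size_t n)
{
    void *p = calloc(1, n);

    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Copy payload into out via a temporary buffer; returns bytes written or -1. */
int dump_record(char *out, size_t cap, const char *payload)
{
    size_t len = strlen(payload) + 1;
    char *opt = tracked_alloc(len);

    if (!opt)
        return -1;
    memcpy(opt, payload, len);
    if (len > cap)
        goto failure;               /* output buffer too small */
    memcpy(out, opt, len);
    tracked_free(opt);              /* success path frees the temporary */
    return (int)len;

failure:
    tracked_free(opt);              /* failure path must free it as well */
    return -1;
}
```

Dropping either tracked_free() call leaves live_allocs above zero after the corresponding path runs, which is the leak the patch closes.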
[PATCH 6/6][PKT_SCHED]: Prefix tc actions with act_
This patch prefixes the tc actions with act_. I didn't include the actual renames in the patch because I think it makes git lose its history, please execute these commands when applying the patch:

git-rename net/sched/mirred.c net/sched/act_mirred.c
git-rename net/sched/ipt.c net/sched/act_ipt.c
git-rename net/sched/gact.c net/sched/act_gact.c
git-rename net/sched/police.c net/sched/act_police.c
git-rename net/sched/simple.c net/sched/act_simple.c
git-rename net/sched/pedit.c net/sched/act_pedit.c

Thanks.

[PKT_SCHED]: Prefix tc actions with act_

Clean up the net/sched directory a bit by prefixing all actions with act_.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 9162c4b1cddbc7a67e16641c249b0aac95a2194a
tree db22e79bd87036655398574e93a9c38434182704
parent 1acf07902f3ae56bb78ce68b619e252d655781a4
author Patrick McHardy [EMAIL PROTECTED] Sun, 08 Jan 2006 00:04:55 +0100
committer Patrick McHardy [EMAIL PROTECTED] Sun, 08 Jan 2006 00:04:55 +0100

 net/sched/Makefile  |   14 +++---
 net/sched/act_api.c |    2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/sched/Makefile b/net/sched/Makefile
index e48d0d4..0f06aec 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -7,13 +7,13 @@ obj-y := sch_generic.o
 obj-$(CONFIG_NET_SCHED) += sch_api.o sch_fifo.o sch_blackhole.o
 obj-$(CONFIG_NET_CLS) += cls_api.o
 obj-$(CONFIG_NET_CLS_ACT) += act_api.o
-obj-$(CONFIG_NET_ACT_POLICE) += police.o
-obj-$(CONFIG_NET_CLS_POLICE) += police.o
-obj-$(CONFIG_NET_ACT_GACT) += gact.o
-obj-$(CONFIG_NET_ACT_MIRRED) += mirred.o
-obj-$(CONFIG_NET_ACT_IPT) += ipt.o
-obj-$(CONFIG_NET_ACT_PEDIT) += pedit.o
-obj-$(CONFIG_NET_ACT_SIMP) += simple.o
+obj-$(CONFIG_NET_ACT_POLICE) += act_police.o
+obj-$(CONFIG_NET_CLS_POLICE) += act_police.o
+obj-$(CONFIG_NET_ACT_GACT) += act_gact.o
+obj-$(CONFIG_NET_ACT_MIRRED) += act_mirred.o
+obj-$(CONFIG_NET_ACT_IPT) += act_ipt.o
+obj-$(CONFIG_NET_ACT_PEDIT) += act_pedit.o
+obj-$(CONFIG_NET_ACT_SIMP) += act_simple.o
 obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
 obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
 obj-$(CONFIG_NET_SCH_HPFQ) += sch_hpfq.o
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index bd651a4..792ce59 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -290,7 +290,7 @@ struct tc_action *tcf_action_init_1(stru
                 if (a_o == NULL) {
 #ifdef CONFIG_KMOD
                         rtnl_unlock();
-                        request_module(act_name);
+                        request_module("act_%s", act_name);
                         rtnl_lock();

                         a_o = tc_lookup_action_n(act_name);
[PATCH 4/6][PKT_SCHED]: Remove some obsolete policer exports
[PKT_SCHED]: Remove some obsolete policer exports

Also make sure the legacy code is only built when CONFIG_NET_CLS_ACT is not set.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 913c3439802f049aa6938116a662875d0ba1b62f
tree e14e9413ea0732e397932459545134fcd0f9adf7
parent c69c8677ae026b0a9c84c4dd0384ad3bcfc11fc8
author Patrick McHardy [EMAIL PROTECTED] Sat, 07 Jan 2006 23:52:00 +0100
committer Patrick McHardy [EMAIL PROTECTED] Sat, 07 Jan 2006 23:52:00 +0100

 net/sched/police.c |   14 +++---
 1 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/net/sched/police.c b/net/sched/police.c
index a834516..fa877f8 100644
--- a/net/sched/police.c
+++ b/net/sched/police.c
@@ -407,7 +407,7 @@ police_cleanup_module(void)
 module_init(police_init_module);
 module_exit(police_cleanup_module);

-#endif
+#else /* CONFIG_NET_CLS_ACT */

 struct tcf_police * tcf_police_locate(struct rtattr *rta, struct rtattr *est)
 {
@@ -544,6 +544,7 @@ int tcf_police(struct sk_buff *skb, stru
         spin_unlock(&p->lock);
         return p->action;
 }
+EXPORT_SYMBOL(tcf_police);

 int tcf_police_dump(struct sk_buff *skb, struct tcf_police *p)
 {
@@ -600,13 +601,4 @@ errout:
         return -1;
 }

-
-EXPORT_SYMBOL(tcf_police);
-EXPORT_SYMBOL(tcf_police_destroy);
-EXPORT_SYMBOL(tcf_police_dump);
-EXPORT_SYMBOL(tcf_police_dump_stats);
-EXPORT_SYMBOL(tcf_police_hash);
-EXPORT_SYMBOL(tcf_police_ht);
-EXPORT_SYMBOL(tcf_police_locate);
-EXPORT_SYMBOL(tcf_police_lookup);
-EXPORT_SYMBOL(tcf_police_new_index);
+#endif /* CONFIG_NET_CLS_ACT */
[PATCH 3/6][PKT_SCHED]: Convert tc action functions to single skb pointers
[PKT_SCHED]: Convert tc action functions to single skb pointers

tcf_action_exec only gets a single skb pointer and doesn't own the skb,
but passes double skb pointers (to a local variable) to the action
functions. Change to use single skb pointers everywhere.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit c69c8677ae026b0a9c84c4dd0384ad3bcfc11fc8
tree fd2db81ea53cd361afa2d06fe7078231a0f1be3a
parent 763b3dea377647e2eb0d7638143b9a22d32fb989
author Patrick McHardy [EMAIL PROTECTED] Sat, 07 Jan 2006 23:44:55 +0100
committer Patrick McHardy [EMAIL PROTECTED] Sat, 07 Jan 2006 23:44:55 +0100

 include/net/act_api.h |    2 +-
 net/sched/act_api.c   |    2 +-
 net/sched/gact.c      |    3 +--
 net/sched/ipt.c       |    6 ++++--
 net/sched/mirred.c    |    3 +--
 net/sched/pedit.c     |    3 +--
 net/sched/police.c    |    3 +--
 net/sched/simple.c    |    3 +--
 8 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index b55eb7c..11e9eaf 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -63,7 +63,7 @@ struct tc_action_ops
 	__u32 type; /* TBD to match kind */
 	__u32 capab; /* capabilities includes 4 bit version */
 	struct module *owner;
-	int (*act)(struct sk_buff **, struct tc_action *, struct tcf_result *);
+	int (*act)(struct sk_buff *, struct tc_action *, struct tcf_result *);
 	int (*get_stats)(struct sk_buff *, struct tc_action *);
 	int (*dump)(struct sk_buff *, struct tc_action *,int , int);
 	int (*cleanup)(struct tc_action *, int bind);
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 2ce1cb2..bd651a4 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -165,7 +165,7 @@ int tcf_action_exec(struct sk_buff *skb,
 	while ((a = act) != NULL) {
 repeat:
 		if (a->ops && a->ops->act) {
-			ret = a->ops->act(&skb, a, res);
+			ret = a->ops->act(skb, a, res);
 			if (TC_MUNGED & skb->tc_verd) {
 				/* copied already, allow trampling */
 				skb->tc_verd = SET_TC_OK2MUNGE(skb->tc_verd);
diff --git a/net/sched/gact.c b/net/sched/gact.c
index d1c6d54..a1e68f7 100644
--- a/net/sched/gact.c
+++ b/net/sched/gact.c
@@ -135,10 +135,9 @@ tcf_gact_cleanup(struct tc_action *a, in
 }

 static int
-tcf_gact(struct sk_buff **pskb, struct tc_action *a, struct tcf_result *res)
+tcf_gact(struct sk_buff *skb, struct tc_action *a, struct tcf_result *res)
 {
 	struct tcf_gact *p = PRIV(a, gact);
-	struct sk_buff *skb = *pskb;
 	int action = TC_ACT_SHOT;

 	spin_lock(&p->lock);
diff --git a/net/sched/ipt.c b/net/sched/ipt.c
index f50136e..b500193 100644
--- a/net/sched/ipt.c
+++ b/net/sched/ipt.c
@@ -201,11 +201,10 @@ tcf_ipt_cleanup(struct tc_action *a, int
 }

 static int
-tcf_ipt(struct sk_buff **pskb, struct tc_action *a, struct tcf_result *res)
+tcf_ipt(struct sk_buff *skb, struct tc_action *a, struct tcf_result *res)
 {
 	int ret = 0, result = 0;
 	struct tcf_ipt *p = PRIV(a, ipt);
-	struct sk_buff *skb = *pskb;

 	if (skb_cloned(skb)) {
 		if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
@@ -222,6 +221,9 @@ tcf_ipt(struct sk_buff **pskb, struct tc
 	   worry later - danger - this API seems to have changed
 	   from earlier kernels */

+	/* iptables targets take a double skb pointer in case the skb
+	 * needs to be replaced. We don't own the skb, so this must not
+	 * happen. The pskb_expand_head above should make sure of this */
 	ret = p->t->u.kernel.target->target(&skb, skb->dev, NULL,
 					    p->hook, p->t->data, NULL);
 	switch (ret) {
diff --git a/net/sched/mirred.c b/net/sched/mirred.c
index 20d0691..4fcccbd 100644
--- a/net/sched/mirred.c
+++ b/net/sched/mirred.c
@@ -158,12 +158,11 @@ tcf_mirred_cleanup(struct tc_action *a,
 }

 static int
-tcf_mirred(struct sk_buff **pskb, struct tc_action *a, struct tcf_result *res)
+tcf_mirred(struct sk_buff *skb, struct tc_action *a, struct tcf_result *res)
 {
 	struct tcf_mirred *p = PRIV(a, mirred);
 	struct net_device *dev;
 	struct sk_buff *skb2 = NULL;
-	struct sk_buff *skb = *pskb;
 	u32 at = G_TC_AT(skb->tc_verd);

 	spin_lock(&p->lock);
diff --git a/net/sched/pedit.c b/net/sched/pedit.c
index 767d24f..b5167af 100644
--- a/net/sched/pedit.c
+++ b/net/sched/pedit.c
@@ -130,10 +130,9 @@ tcf_pedit_cleanup(struct tc_action *a, i
 }

 static int
-tcf_pedit(struct sk_buff **pskb, struct tc_action *a, struct tcf_result *res)
+tcf_pedit(struct sk_buff *skb, struct tc_action *a, struct tcf_result *res)
 {
 	struct tcf_pedit *p = PRIV(a, pedit);
-	struct sk_buff *skb = *pskb;
 	int i, munged = 0;
 	u8 *pptr;

diff --git a/net/sched/police.c b/net/sched/police.c
index eb39fb2..a834516 100644
--- a/net/sched/police.c
+++ b/net/sched/police.c
@@ -284,11 +284,10 @@ static int tcf_act_police_cleanup(struct
 	return 0;
 }

-static int tcf_act_police(struct sk_buff **pskb, struct tc_action *a,
+static int tcf_act_police(struct sk_buff *skb, struct tc_action *a,
                           struct tcf_result *res)
 {
 	psched_time_t now;
-	struct sk_buff *skb = *pskb;
 	struct
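To illustrate why the commit message above cares about single versus double pointers, here is a small user-space sketch. struct buf is a hypothetical stand-in for struct sk_buff, and act_double()/act_single() are made-up names, not kernel functions. The point is only that a callee given a pointer-to-pointer can replace the caller's buffer, while a callee given a plain pointer cannot.

```c
#include <assert.h>
#include <stddef.h>

struct buf { int len; };   /* hypothetical stand-in for struct sk_buff */

static struct buf replacement = { 99 };

/* Old style: the callee receives a pointer to the caller's pointer and
 * can therefore make the caller's variable point at a different buffer. */
static int act_double(struct buf **pb)
{
	*pb = &replacement;
	return 0;
}

/* New style: the callee only gets a copy of the pointer. It can modify
 * the buffer contents, but cannot swap the buffer out from under the
 * caller. */
static int act_single(struct buf *b)
{
	b->len += 1;
	b = &replacement;  /* only changes the local copy */
	(void)b;
	return 0;
}
```

Since tcf_action_exec does not own the skb, the single-pointer form makes it visible in the type that an action must not replace it.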
[W1]: Remove incorrect MODULE_ALIAS
[W1]: Remove incorrect MODULE_ALIAS

The w1 netlink socket is created by a hardware specific driver calling
w1_add_master_device, so there is no point in including a module alias
for netlink autoloading in the core.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit a8657adb8c04bbe30544306ec55005a635ba65fd
tree 2c029cf104239958220629d34c76c7290bd99e43
parent b73952761225e41cb81afe157cb312a594a95693
author Patrick McHardy [EMAIL PROTECTED] Sun, 08 Jan 2006 00:42:42 +0100
committer Patrick McHardy [EMAIL PROTECTED] Sun, 08 Jan 2006 00:42:42 +0100

 drivers/w1/w1_int.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/w1/w1_int.c b/drivers/w1/w1_int.c
index c3f67ea..e2920f0 100644
--- a/drivers/w1/w1_int.c
+++ b/drivers/w1/w1_int.c
@@ -217,5 +217,3 @@ void w1_remove_master_device(struct w1_b

 EXPORT_SYMBOL(w1_add_master_device);
 EXPORT_SYMBOL(w1_remove_master_device);
-
-MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_W1);
Re: [PATCH 6/6][PKT_SCHED]: Prefix tc actions with act_
From: Patrick McHardy [EMAIL PROTECTED]
Date: Sun, 08 Jan 2006 00:38:03 +0100

> This patch prefixes the tc actions with act_. I didn't include the
> actual renames in the patch because I think it makes git lose its
> history, please execute these commands when applying the patch:

Note that when you generate patches using GIT, it emits some header
line information in the diff describing moves, deletes, and stuff like
that. So, when you feed such a patch back into the GIT patch
application programs, it knows exactly what to do.

You don't have to regenerate your changes or anything like that, I'm
just letting you know for the future. :-)
[PATCH] Change some if (x) BUG(); to BUG_ON(x);
This changes some simple "if (x) BUG();" statements to "BUG_ON(x);"

Signed-off-by: Kris Katterjohn [EMAIL PROTECTED]

This is a diff from 2.6.15. Obviously I wasn't able to test these
changes per se, but is there a reason they wouldn't work correctly?

Thanks!

--- x/net/core/dev.c	2006-01-02 21:21:10.0 -0600
+++ y/net/core/dev.c	2006-01-07 20:00:45.0 -0600
@@ -1092,15 +1092,12 @@ int skb_checksum_help(struct sk_buff *sk
 		goto out;
 	}

-	if (offset > (int)skb->len)
-		BUG();
+	BUG_ON(offset > (int)skb->len);
 	csum = skb_checksum(skb, offset, skb->len-offset, 0);

 	offset = skb->tail - skb->h.raw;
-	if (offset <= 0)
-		BUG();
-	if (skb->csum + 2 > offset)
-		BUG();
+	BUG_ON(offset <= 0);
+	BUG_ON(skb->csum + 2 > offset);

 	*(u16*)(skb->h.raw + skb->csum) = csum_fold(csum);
 	skb->ip_summed = CHECKSUM_NONE;
--- x/net/core/skbuff.c	2006-01-02 21:21:10.0 -0600
+++ y/net/core/skbuff.c	2006-01-07 20:01:55.0 -0600
@@ -792,8 +792,7 @@ int ___pskb_trim(struct sk_buff *skb, un
 		int end = offset + skb_shinfo(skb)->frags[i].size;
 		if (end > len) {
 			if (skb_cloned(skb)) {
-				if (!realloc)
-					BUG();
+				BUG_ON(!realloc);
 				if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
 					return -ENOMEM;
 			}
@@ -895,8 +894,7 @@ unsigned char *__pskb_pull_tail(struct s
 		struct sk_buff *insp = NULL;

 		do {
-			if (!list)
-				BUG();
+			BUG_ON(!list);

 			if (list->len <= eat) {
 				/* Eaten as whole. */
@@ -1200,8 +1198,7 @@ unsigned int skb_checksum(const struct s
 			start = end;
 		}
 	}
-	if (len)
-		BUG();
+	BUG_ON(len);

 	return csum;
 }
@@ -1283,8 +1280,7 @@ unsigned int skb_copy_and_csum_bits(cons
 			start = end;
 		}
 	}
-	if (len)
-		BUG();
+	BUG_ON(len);

 	return csum;
 }
@@ -1298,8 +1294,7 @@ void skb_copy_and_csum_dev(const struct
 	else
 		csstart = skb_headlen(skb);

-	if (csstart > skb_headlen(skb))
-		BUG();
+	BUG_ON(csstart > skb_headlen(skb));

 	memcpy(to, skb->data, csstart);

--- x/net/ipv4/icmp.c	2006-01-02 21:21:10.0 -0600
+++ y/net/ipv4/icmp.c	2006-01-07 20:02:07.0 -0600
@@ -898,8 +898,7 @@ static void icmp_address_reply(struct sk
 		u32 _mask, *mp;

 		mp = skb_header_pointer(skb, 0, sizeof(_mask), &_mask);
-		if (mp == NULL)
-			BUG();
+		BUG_ON(mp == NULL);
 		for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next) {
 			if (*mp == ifa->ifa_mask &&
 			    inet_ifa_match(rt->rt_src, ifa))
--- x/net/ipv4/inetpeer.c	2006-01-02 21:21:10.0 -0600
+++ y/net/ipv4/inetpeer.c	2006-01-07 20:02:26.0 -0600
@@ -304,8 +304,7 @@ static void unlink_from_pool(struct inet
 		/* look for a node to insert instead of p */
 		struct inet_peer *t;
 		t = lookup_rightempty(p);
-		if (*stackptr[-1] != t)
-			BUG();
+		BUG_ON(*stackptr[-1] != t);
 		**--stackptr = t->avl_left;
 		/* t is removed, t->v4daddr > x->v4daddr for any
 		 * x in p->avl_left subtree.
@@ -314,8 +313,7 @@ static void unlink_from_pool(struct inet
 		t->avl_left = p->avl_left;
 		t->avl_right = p->avl_right;
 		t->avl_height = p->avl_height;
-		if (delp[1] != &p->avl_left)
-			BUG();
+		BUG_ON(delp[1] != &p->avl_left);
 		delp[1] = &t->avl_left; /* was &p->avl_left */
 	}
 	peer_avl_rebalance(stack, stackptr);
--- x/net/ipv4/tcp_input.c	2006-01-02 21:21:10.0 -0600
+++ y/net/ipv4/tcp_input.c	2006-01-07 20:02:54.0 -0600
@@ -3307,7 +3307,7 @@ tcp_collapse(struct sock *sk, struct sk_
 			int offset = start - TCP_SKB_CB(skb)->seq;
 			int size = TCP_SKB_CB(skb)->end_seq - start;

-			if (offset < 0) BUG();
+			BUG_ON(offset < 0);
 			if (size > 0) {
 				size = min(copy, size);
 				if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))
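On the question of whether the two forms behave the same, here is a user-space model that may help. It is only a sketch: BUG() is redefined here to count hits instead of halting, so that both forms can be compared side by side, and BUG_ON() is defined in the obvious way on top of it. check_old() and check_new() are hypothetical helpers, not kernel code.

```c
#include <assert.h>

/* User-space model: BUG() here only counts hits so that both forms can
 * be compared. The kernel's BUG() does not return. */
static int bug_hits;
#define BUG()		do { bug_hits++; } while (0)
#define BUG_ON(cond)	do { if ((cond) != 0) BUG(); } while (0)

/* Old form, as removed by the patch. */
static int check_old(int offset, int len)
{
	bug_hits = 0;
	if (offset > len)
		BUG();
	return bug_hits;
}

/* New form, as added by the patch. */
static int check_new(int offset, int len)
{
	bug_hits = 0;
	BUG_ON(offset > len);
	return bug_hits;
}
```

In this model both helpers trigger under exactly the same conditions. The general thing to watch for in such conversions is a condition with side effects; none of the conditions touched by this patch appear to have any.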
mv643xx_eth_start_xmit: calls skb_linearize with interrupts off
Hi,

I am seeing backtraces on 2.6.15-rc7 on NFS traffic that are similar
to this one:

Badness in local_bh_enable at kernel/softirq.c:140
Call trace:
 [c0005340] check_bug_trap+0xb0/0xd0
 [c0005bb4] program_check_exception+0x174/0x4f0
 [c0004f04] ret_from_except_full+0x0/0x4c
 [c0021b8c] local_bh_enable+0x1c/0x90
 [c0208410] skb_copy_bits+0x2a0/0x3c0
 [c020f778] __skb_linearize+0x98/0x190
 [c019f3b0] mv643xx_eth_start_xmit+0x2c0/0x5a0
 [c02113f8] dev_queue_xmit+0x258/0x3c0
 [c0232ffc] ip_finish_output+0x14c/0x2e0
 [c0231c50] ip_fragment+0x4c0/0x720
 [c02338c8] ip_output+0x258/0x360
 [c02322d8] ip_push_pending_frames+0x428/0x4c0
 [c0252b38] udp_push_pending_frames+0x108/0x230
 [c02536b0] udp_sendmsg+0x3b0/0x6a0
 [c025c660] inet_sendmsg+0x50/0x80
 [c02025d8] sock_sendmsg+0xa8/0xf0
 [c0202654] kernel_sendmsg+0x34/0x60

CONFIG_PPC=y
CONFIG_HIGHMEM=y
CONFIG_PREEMPT=y
CONFIG_MV643XX_ETH=y

In mv643xx_eth_start_xmit:

[...]
	spin_lock_irqsave(&mp->lock, flags);
[...]
	/* Since hardware can't handle unaligned fragments smaller
	 * than 9 bytes, if we find any, we linearize the skb
	 * and start again. */
[...]
	skb_linearize(skb, GFP_ATOMIC);
[...]

which ends up calling kunmap_skb_frag(vaddr), which, when
CONFIG_HIGHMEM=y, calls local_bh_enable with interrupts off.

-- Paul
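The call ordering described in the report can be modelled in user space. This is a toy sketch only: every model_* function and the two xmit_* orderings are hypothetical names invented for illustration, and the second ordering is shown merely for contrast, not as the fix the driver should or did adopt.

```c
#include <assert.h>

static int irqs_off;       /* models the CPU interrupt-disable state */
static int badness_count;  /* models "Badness in local_bh_enable" reports */

static void model_irq_save(void)    { irqs_off = 1; }
static void model_irq_restore(void) { irqs_off = 0; }

/* Models the check that fired in the trace: enabling bottom halves
 * while interrupts are disabled is reported as badness. */
static void model_local_bh_enable(void)
{
	if (irqs_off)
		badness_count++;
}

/* Models skb_linearize() on a highmem fragment, which ends in the
 * kunmap path that re-enables bottom halves. */
static void model_linearize(void)
{
	model_local_bh_enable();
}

/* Ordering described in the report: linearize while the lock is held
 * with interrupts off. */
static int xmit_linearize_under_lock(void)
{
	badness_count = 0;
	model_irq_save();
	model_linearize();
	model_irq_restore();
	return badness_count;
}

/* Hypothetical alternative ordering, for contrast only: linearize
 * first, then take the lock. */
static int xmit_linearize_before_lock(void)
{
	badness_count = 0;
	model_linearize();
	model_irq_save();
	model_irq_restore();
	return badness_count;
}
```

In the model, only the first ordering reaches the bottom-half enable with interrupts off, which matches the path shown in the backtrace.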
Re: (2nd try) [PATCH] corruption during e100 MDI register access
On 1/6/06, ODonnell, Michael [EMAIL PROTECTED] wrote:
> [ 2nd transmission. Microsoft mailer helpfully reformatted the patch
> in the last one... :-( ]
>
> Greetings,
>
> We have identified two related bugs in the e100 driver and we request
> that they be repaired in the official Intel version of the driver.
>
> Both bugs are related to manipulation of the MDI control register.
>
> The first problem is that the Ready bit is being ignored when writing
> to the Control register; we noticed this because the Linux bonding
> driver would occasionally come to the spurious conclusion that the
> link was down when querying Link State. It turned out that by failing
> to wait for a previous command to complete it was selecting what was
> essentially a random register in the MDI register set. When we added
> code that waits for the Ready bit (as shown in the patch file below)
> all such problems ceased.

Damn, you know, I had seen this on one machine only, and the machine
had other problems, so I thought it wasn't e100. I can't quite figure
out why we haven't seen this more often given how long the bug appears
to have existed.

> The second problem is that, although access to the MDI registers
> involves multiple steps which must not be intermixed, nothing was
> defending against two or more threads attempting simultaneous access.
> The most obvious situation where such interference could occur
> involves the watchdog versus ioctl paths, but there are probably
> others, so we recommend the locking shown in our patch file.

Agreed, but once again I am simply amazed this has been there so long.

I think these are both good patches and I'll ack this and absorb it
for our next release. It will be a bit before it's completely through
our process, but it's okay with me if this goes into the kernel now.

Jesse
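The first problem discussed above (issuing a new MDI command without waiting for Ready) can be sketched against a simulated register. Everything here is an assumption made for illustration: struct fake_mdi, the fake_* accessors, mdi_issue(), and the MDI_READY bit value are not taken from the patch under discussion, which is not reproduced in this message.

```c
#include <assert.h>
#include <stdint.h>

#define MDI_READY 0x10000000u  /* assumed Ready bit for this sketch */

/* Hypothetical simulated device: the control register reports Ready
 * only after busy_polls further reads. */
struct fake_mdi {
	uint32_t ctrl;
	int busy_polls;
	int writes_while_busy;  /* counts commands issued too early */
};

static uint32_t fake_read_ctrl(struct fake_mdi *m)
{
	if (m->busy_polls > 0)
		m->busy_polls--;
	else
		m->ctrl |= MDI_READY;
	return m->ctrl;
}

static void fake_write_ctrl(struct fake_mdi *m, uint32_t cmd)
{
	if (!(m->ctrl & MDI_READY))
		m->writes_while_busy++;
	m->ctrl = cmd & ~MDI_READY;  /* a new command clears Ready */
}

/* Poll for Ready (bounded) before issuing cmd.
 * Returns 0 on success, -1 if Ready was not seen within max_polls. */
static int mdi_issue(struct fake_mdi *m, uint32_t cmd, int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		if (fake_read_ctrl(m) & MDI_READY) {
			fake_write_ctrl(m, cmd);
			return 0;
		}
	}
	return -1;
}
```

The second problem is not modelled here: in a real driver the whole poll-then-write sequence would also have to run under a lock, so that the watchdog and ioctl paths cannot interleave their steps.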