Re: myri10ge conversion to non-contiguous skb
On 8/24/06, Brice Goglin [EMAIL PROTECTED] wrote:
> During the submission of the myri10ge driver, some people raised the
> question of using pages (or any kind of non-contiguous skb) instead of
> our current 16kB contiguous skb. We are looking at this right now and it
> is not clear what solution is the best.
>
> From what we understand, Linux provides two mostly redundant mechanisms
> to handle discontinuous skb, the skb->frags and the skb->frag_list, s2io
> using the latter while e1000 uses the former. Is one or the other
> recommended? What is the purpose of having them both in the net core?

you really only have one option, to use PAGE_SIZE pages and frags[]
w/nr_frags. e1000 tried the frag_list option but that is used by ip
reassembly and badly conflicts with driver generated frag_list.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
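To make the frags[]/nr_frags suggestion concrete, here is a minimal userspace sketch of how a large receive buffer could be described as page-sized fragments. It is only a model: struct model_skb, model_fill_frags(), MODEL_PAGE_SIZE and MODEL_MAX_FRAGS are hypothetical names invented for this illustration and are not the kernel's skb_shared_info or any myri10ge code.

```c
#include <stddef.h>

/* Hypothetical userspace model; these are NOT kernel definitions. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_FRAGS 18u

struct model_frag {
	size_t offset;	/* offset of this fragment within the payload */
	size_t size;	/* bytes held by this fragment (<= one page) */
};

struct model_skb {
	unsigned int nr_frags;
	struct model_frag frags[MODEL_MAX_FRAGS];
};

/*
 * Describe a len-byte payload as a list of page-sized fragments.
 * Returns 0 on success, or -1 if the payload needs more than
 * MODEL_MAX_FRAGS fragments (nr_frags is left untouched then).
 */
int model_fill_frags(struct model_skb *skb, size_t len)
{
	unsigned int n = 0;
	size_t off = 0;

	while (off < len) {
		size_t chunk = len - off;

		if (n == MODEL_MAX_FRAGS)
			return -1;
		if (chunk > MODEL_PAGE_SIZE)
			chunk = MODEL_PAGE_SIZE;
		skb->frags[n].offset = off;
		skb->frags[n].size = chunk;
		n++;
		off += chunk;
	}
	skb->nr_frags = n;
	return 0;
}
```

Under these assumptions a 9000-byte jumbo frame ends up in three fragments (4096 + 4096 + 808 bytes), so no contiguous 16kB allocation is needed.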
[PATCH]d80211: fix iwconfig key [x] behavior
iwconfig key [x] behavior is not correctly handled in the stack.
This patch also modifies the giwencode method to show the key info.

Thanks,
Hong

[PATCH]d80211: fix iwconfig key [x] behavior

Signed-off-by: Hong Liu [EMAIL PROTECTED]

diff --git a/net/d80211/ieee80211_ioctl.c b/net/d80211/ieee80211_ioctl.c
index dd52555..d3dc59c 100644
--- a/net/d80211/ieee80211_ioctl.c
+++ b/net/d80211/ieee80211_ioctl.c
@@ -2811,9 +2811,10 @@ static int ieee80211_ioctl_siwencode(str
 		if (sdata->default_key == NULL)
 			idx = 0;
 		else for (i = 0; i < NUM_DEFAULT_KEYS; i++) {
-			if (sdata->default_key == sdata->keys[i])
+			if (sdata->default_key == sdata->keys[i]) {
 				idx = i;
-			break;
+				break;
+			}
 		}
 		if (idx < 0)
 			return -EINVAL;
@@ -2824,16 +2825,22 @@ static int ieee80211_ioctl_siwencode(str
 		alg = ALG_NONE;
 	else if (erq->length == 0) {
 		/* No key data - just set the default TX key index */
-		sdata->default_key = sdata->keys[idx];
+		if (sdata->default_key != sdata->keys[idx]) {
+			if (sdata->default_key)
+				ieee80211_key_sysfs_remove_default(sdata);
+			sdata->default_key = sdata->keys[idx];
+			if (sdata->default_key)
+				ieee80211_key_sysfs_add_default(sdata);
+		}
+		return 0;
 	}
 	return ieee80211_set_encryption(
 		dev, bcaddr,
-		idx, erq->length == 0 ? ALG_NONE : ALG_WEP,
+		idx, alg,
 		sdata->default_key == NULL,
 		NULL, keybuf, erq->length);
-	return 0;
 }
@@ -2852,9 +2859,10 @@ static int ieee80211_ioctl_giwencode(str
 		if (sdata->default_key == NULL)
 			idx = 0;
 		else for (i = 0; i < NUM_DEFAULT_KEYS; i++) {
-			if (sdata->default_key == sdata->keys[i])
+			if (sdata->default_key == sdata->keys[i]) {
 				idx = i;
-			break;
+				break;
+			}
 		}
 		if (idx < 0)
 			return -EINVAL;
@@ -2869,7 +2877,8 @@ static int ieee80211_ioctl_giwencode(str
 		return 0;
 	}
-	erq->length = 0;
+	erq->length = sdata->keys[idx]->keylen;
+	memcpy(key, sdata->keys[idx]->key, erq->length);
 	erq->flags |= IW_ENCODE_ENABLED;
 	return 0;
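The first and third hunks of the patch fix a missing-braces bug: without braces, the break is not governed by the if, so the loop always exits after its first iteration. The following standalone sketch reproduces both shapes outside the kernel. find_default_idx_buggy(), find_default_idx_fixed() and NUM_KEYS are hypothetical names used only for this illustration; they are not d80211 functions.

```c
#include <stddef.h>

#define NUM_KEYS 4	/* stands in for NUM_DEFAULT_KEYS; value assumed */

/*
 * Pre-patch shape: only "idx = i" belongs to the if. The break runs
 * unconditionally, so only keys[0] is ever compared.
 */
int find_default_idx_buggy(const void *def, const void *const keys[NUM_KEYS])
{
	int idx = -1;

	for (int i = 0; i < NUM_KEYS; i++) {
		if (def == keys[i])
			idx = i;
		break;
	}
	return idx;
}

/* Post-patch shape: braces make the break conditional on a match. */
int find_default_idx_fixed(const void *def, const void *const keys[NUM_KEYS])
{
	int idx = -1;

	for (int i = 0; i < NUM_KEYS; i++) {
		if (def == keys[i]) {
			idx = i;
			break;
		}
	}
	return idx;
}
```

With the default key stored in slot 2, the buggy variant reports "not found" (-1), which in the ioctl path turns into -EINVAL, while the fixed variant returns 2.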
Re: [PATCH] net/*: use SLAB_PANIC
On Sun, Aug 27, 2006 at 03:08:41AM +0400, Alexey Dobriyan wrote:
> Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
> ---
>
> Forgive me reformatting, in some cases making it fit in 80 columns was hard.
>
>  net/core/flow.c       |    6 +-
>  net/core/neighbour.c  |   12
>  net/core/skbuff.c     |    9 ++---
>  net/decnet/dn_route.c |   11 +++
>  net/ipv4/inetpeer.c   |    5 +
>  net/ipv4/ipmr.c       |    5 +
>  net/ipv4/route.c      |   10 +++---
>  net/ipv4/tcp.c        |    4 +---
>  net/ipv6/ip6_fib.c    |    4 +---
>  net/ipv6/route.c      |   10 +++---

ipv6 can be modular, so panicking on an initialization failure is wrong.
Re: [PATCH 2.6.17 2/9] NetXen: Hardware access routines
On Monday 21 August 2006 19:33, Stephen Hemminger wrote:
> On Mon, 21 Aug 2006 13:57:23 +0530
> Amit S. Kale [EMAIL PROTECTED] wrote:
> > We can certainly create a table for all error messages. It'll hurt
> > readability of code in many of the other places where printks are used
> > to indicate some hardware error.
> > -Amit
>
> My suggestion was intended as a way to handle multiple driver versions
> all using the same firmware or vice versa. By locking the firmware and
> driver version together you might make maintenance more difficult.

Ah, I had missed that completely in your first email. Thanks for your
suggestion.

The NetXen firmware will most probably keep changing. Its hardware is
flexible enough, so the firmware changes will possibly be varied in nature.

Thinking about this further, it seems we should coalesce firmware-dependent
code into a few isolated functions. While this may be difficult, we should
do it anyway. Hopefully future changes will not cause these efforts to go
to waste.
-Amit
Re: [PATCH 1/4] net: VM deadlock avoidance framework
On Sat, 2006-08-26 at 04:37 +0200, Indan Zupancic wrote:
> On Fri, August 25, 2006 17:39, Peter Zijlstra said:
> > @@ -282,7 +282,8 @@ struct sk_buff {
> >  				nfctinfo:3;
> >  	__u8			pkt_type:3,
> >  				fclone:2,
> > -				ipvs_property:1;
> > +				ipvs_property:1,
> > +				emerg:1;
> >  	__be16			protocol;
>
> Why not 'emergency'? Looks like 'emerge' with a typo now. ;-)

hehe, me lazy, you gentoo ;-)
sed -i -e 's/emerg/emregency/g' -e 's/EMERG/EMERGENCY/g' *.patch

> > @@ -391,6 +391,7 @@ enum sock_flags {
> >  	SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
> >  	SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
> >  	SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
> > +	SOCK_VMIO, /* promise to never block on receive */
>
> It might be used for IO related to the VM, but that doesn't tell _what_ it
> does. It also does much more than just not blocking on receive, so overall,
> aren't both the vmio name and the comment slightly misleading?

I'm so having trouble with this name; I had SOCK_NONBLOCKING for a while,
but that is a very bad name because nonblocking has this well defined
meaning when talking about sockets, and this is not that.

Hence I came up with the VMIO, because that is the only selecting criteria
for being special. - I'll fix up the comment.

> > +static inline int emerg_rx_pages_try_inc(void)
> > +{
> > +	return atomic_read(&vmio_socks) &&
> > +		atomic_add_unless(&emerg_rx_pages_used, 1, RX_RESERVE_PAGES);
> > +}
>
> It looks cleaner to move that first check to the caller, as it is often
> redundant and in the other cases makes it more clear what the caller is
> really doing.

Yes, very good suggestion indeed, what was I thinking?!

> > @@ -82,6 +82,7 @@ EXPORT_SYMBOL(zone_table);
> >  static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
> >  int min_free_kbytes = 1024;
> > +int var_free_kbytes;
>
> Using var_free_pages makes the code slightly simpler, as all that needless
> conversion isn't needed anymore. Perhaps the same is true for
> min_free_kbytes...

't seems I'm a bit puzzled as to what you mean here.

> > +noskb:
> > +	/* Attempt emergency allocation when RX skb. */
> > +	if (!(flags & SKB_ALLOC_RX))
> > +		goto out;
>
> So only incoming skb allocation is guaranteed? What about outgoing skbs?
> What am I missing? Or can we sleep then, and increasing var_free_kbytes is
> sufficient to guarantee it?

->sk_allocation |= __GFP_EMERGENCY - will take care of the outgoing packets.
Also, since one only sends a limited number of packets out and then will
wait for answers, we do not need to worry about fragmentation issues that
much in this case.

> > +static void emerg_free_skb(struct kmem_cache *cache, void *objp)
> > +{
> > +	free_page((unsigned long)objp);
> > +	emerg_rx_pages_dec();
> > +}
> > +
> >  /*
> >   *	Free an skbuff by memory without cleaning the state.
> >   */
> > @@ -326,17 +373,21 @@ void kfree_skbmem(struct sk_buff *skb)
> >  {
> >  	struct sk_buff *other;
> >  	atomic_t *fclone_ref;
> > +	void (*free_skb)(struct kmem_cache *, void *);
> >
> >  	skb_release_data(skb);
> > +
> > +	free_skb = skb->emerg ? emerg_free_skb : kmem_cache_free;
> > +
> >  	switch (skb->fclone) {
> >  	case SKB_FCLONE_UNAVAILABLE:
> > -		kmem_cache_free(skbuff_head_cache, skb);
> > +		free_skb(skbuff_head_cache, skb);
> >  		break;
> >
> >  	case SKB_FCLONE_ORIG:
> >  		fclone_ref = (atomic_t *) (skb + 2);
> >  		if (atomic_dec_and_test(fclone_ref))
> > -			kmem_cache_free(skbuff_fclone_cache, skb);
> > +			free_skb(skbuff_fclone_cache, skb);
> >  		break;
> >
> >  	case SKB_FCLONE_CLONE:
> > @@ -349,7 +400,7 @@ void kfree_skbmem(struct sk_buff *skb)
> >  		skb->fclone = SKB_FCLONE_UNAVAILABLE;
> >
> >  		if (atomic_dec_and_test(fclone_ref))
> > -			kmem_cache_free(skbuff_fclone_cache, other);
> > +			free_skb(skbuff_fclone_cache, other);
> >  		break;
> >  	};
> >  }
>
> I don't have the original code in front of me, but isn't it possible to
> add a goto free which has all the freeing in one place? That would get
> rid of the function pointer stuff and emerg_free_skb.

perhaps, yes, however I prefer this one, it allows access to the size.

> > @@ -435,6 +486,17 @@ struct sk_buff *skb_clone(struct sk_buff
> >  		atomic_t *fclone_ref = (atomic_t *) (n + 1);
> >  		n->fclone = SKB_FCLONE_CLONE;
> >  		atomic_inc(fclone_ref);
> > +	} else if (skb->emerg) {
> > +		if (!emerg_rx_pages_try_inc())
> > +			return NULL;
> > +
> > +		n = (void *)__get_free_page(gfp_mask | __GFP_EMERG);
> > +		if (!n) {
> > +			WARN_ON(1);
> > +			emerg_rx_pages_dec();
> > +			return NULL;
> > +		}
> > +		n->fclone = SKB_FCLONE_UNAVAILABLE;
> >  	} else {
> >  		n =
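The reservation accounting discussed in this thread relies on an "add unless the counter already equals the limit" operation. As a rough userspace sketch of that idea, using C11 atomics rather than the kernel's atomic_t API: add_unless() below is a stand-in written for this illustration, and the RX_RESERVE_PAGES value of 4 is an arbitrary assumption, not the value used by the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RX_RESERVE_PAGES 4	/* assumed limit, for illustration only */

static atomic_int emerg_rx_pages_used;	/* zero-initialised */

/*
 * Add 'a' to *v unless *v currently equals 'u'.
 * Returns true if the addition happened. Userspace model of the
 * add-unless idea; not the kernel implementation.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure the CAS reloads 'cur', so just retry. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Take one page from the reserve; fails once the reserve is exhausted. */
bool emerg_rx_pages_try_inc(void)
{
	return add_unless(&emerg_rx_pages_used, 1, RX_RESERVE_PAGES);
}

/* Give one page back to the reserve. */
void emerg_rx_pages_dec(void)
{
	atomic_fetch_sub(&emerg_rx_pages_used, 1);
}
```

The sketch leaves out the vmio_socks check on purpose, in line with the review comment that this test reads better in the caller.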
Re: [RFC][PATCH 0/3] net: a lighter UDP-Lite (RFC 3828)
[NET/IPv4]: update for udp.c only, to match 2.6.18-rc4-mm3

This is an update only, as the previous patch can not cope with recent
changes to udp.c (all other files remain the same).

Up-to-date, complete patches can always be taken from
http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---
 udp.c |  606 --
 1 file changed, 410 insertions(+), 196 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 514c1e9..4ddd8e6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -92,10 +92,8 @@ #include <linux/errno.h>
 #include <linux/timer.h>
 #include <linux/mm.h>
 #include <linux/inet.h>
-#include <linux/ipv6.h>
 #include <linux/netdevice.h>
 #include <net/snmp.h>
-#include <net/ip.h>
 #include <net/tcp_states.h>
 #include <net/protocol.h>
 #include <linux/skbuff.h>
@@ -121,7 +119,19 @@ DEFINE_RWLOCK(udp_hash_lock);
 /* Shared by v4/v6 udp. */
 int udp_port_rover;
-static int udp_v4_get_port(struct sock *sk, unsigned short snum)
+/* the extensions for UDP-Lite (RFC 3828) */
+#include "udplite.c"
+
+/**
+ * 	__udp_get_port  -  find an unbound UDP(-Lite) port
+ *
+ * 	@sk:          udp_sock
+ * 	@snum:        port number to look up
+ * 	@udptable:    hash list table, must be of UDP_HTABLE_SIZE
+ * 	@port_rover:  pointer to record of last unallocated port
+ */
+int __udp_get_port(struct sock *sk, unsigned short snum,
+		   struct hlist_head udptable[], int *port_rover)
 {
 	struct hlist_node *node;
 	struct sock *sk2;
@@ -131,16 +141,16 @@ static int udp_v4_get_port(struct sock *
 	if (snum == 0) {
 		int best_size_so_far, best, result, i;
-		if (udp_port_rover > sysctl_local_port_range[1] ||
-		    udp_port_rover < sysctl_local_port_range[0])
-			udp_port_rover = sysctl_local_port_range[0];
+		if (*port_rover > sysctl_local_port_range[1] ||
+		    *port_rover < sysctl_local_port_range[0])
+			*port_rover = sysctl_local_port_range[0];
 		best_size_so_far = 32767;
-		best = result = udp_port_rover;
+		best = result = *port_rover;
 		for (i = 0; i < UDP_HTABLE_SIZE; i++, result++) {
 			struct hlist_head *list;
 			int size;
-			list = &udp_hash[result & (UDP_HTABLE_SIZE - 1)];
+			list = &udptable[result & (UDP_HTABLE_SIZE - 1)];
 			if (hlist_empty(list)) {
 				if (result > sysctl_local_port_range[1])
 					result = sysctl_local_port_range[0] +
@@ -162,16 +172,16 @@ static int udp_v4_get_port(struct sock *
 				result = sysctl_local_port_range[0] +
 					((result - sysctl_local_port_range[0]) &
 					 (UDP_HTABLE_SIZE - 1));
-			if (!udp_lport_inuse(result))
+			if (! __udp_lport_inuse(result, udptable))
 				break;
 		}
 		if (i >= (1 << 16) / UDP_HTABLE_SIZE)
 			goto fail;
 gotit:
-		udp_port_rover = snum = result;
+		*port_rover = snum = result;
 	} else {
 		sk_for_each(sk2, node,
-			    &udp_hash[snum & (UDP_HTABLE_SIZE - 1)]) {
+			    &udptable[snum & (UDP_HTABLE_SIZE - 1)]) {
 			struct inet_sock *inet2 = inet_sk(sk2);
 			if (inet2->num == snum &&
@@ -189,7 +199,7 @@ gotit:
 	}
 	inet->num = snum;
 	if (sk_unhashed(sk)) {
-		struct hlist_head *h = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+		struct hlist_head *h = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
 		sk_add_node(sk, h);
 		sock_prot_inc_use(sk->sk_prot);
@@ -202,6 +212,11 @@ fail:
 	return 1;
 }
+static __inline__ int udp_v4_get_port(struct sock *sk, unsigned short snum)
+{
+	return __udp_get_port(sk, snum, udp_hash, &udp_port_rover);
+}
+
 static void udp_v4_hash(struct sock *sk)
 {
 	BUG();
@@ -217,18 +232,24 @@ static void udp_v4_unhash(struct sock *s
 	write_unlock_bh(&udp_hash_lock);
 }
-/* UDP is nearly always wildcards out the wazoo, it makes no sense to try
- * harder than this. -DaveM
+/**
+ * 	__udp_lookup  -  find UDP(-Lite) socket
+ *
+ * 	@udptable:  hash list table, must be of UDP_HTABLE_SIZE
+ *
+ * 	UDP nearly always wildcards out the wazoo, it makes no sense to try
+ * 	harder than this. -DaveM
  */
-static struct sock *udp_v4_lookup_longway(u32 saddr, u16 sport,
-					  u32 daddr, u16 dport, int dif)
+struct sock *__udp_lookup(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif,
+
Re: [take14 0/3] kevent: Generic event handling mechanism.
On 8/28/06, Nicholas Miell [EMAIL PROTECTED] wrote:
> Also complicated is the case where waiting threads have different
> priorities, different timeouts, and different minimum event counts --
> how do you decide which thread gets events first? What if the decisions
> are different depending on whether you want to maximize throughput or
> interactivity?

BTW, what is the intended use of the min event count parameter? The
obvious reason I can see, avoiding waking up a thread too often with
few queued events, would imo be handled more cleanly by just passing a
parameter telling the kernel to try to queue more events.

With a min event count you'd have to use a rather low timeout to
ensure that events get handled within a reasonable time.

Rakshasa
Re: [RFC][PATCH 0/3] net: a lighter UDP-Lite (RFC 3828)
On 8/28/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> [NET/IPv4]: update for udp.c only, to match 2.6.18-rc4-mm3
>
> This is an update only, as the previous patch can not cope with recent
> changes to udp.c (all other files remain the same).
>
> Up-to-date, complete patches can always be taken from
> http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
>
> Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
> ---
>  udp.c |  606 --
>  1 file changed, 410 insertions(+), 196 deletions(-)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 514c1e9..4ddd8e6 100644

> @@ -731,12 +801,12 @@ out:
>  }
>  /*
> - *	IOCTL requests applicable to the UDP protocol
> + *	IOCTL requests applicable to the UDP(-Lite) protocol
>   */

Avoid these changes to reduce patch file size, please

> -
> +
>  int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)
>  {
> -	switch(cmd)
> +	switch(cmd)

Ditto

> -/*
> - * 	This should be easy, if there is something there we
> - * 	return it, otherwise we block.
> +/**
> + * 	udp_recvmsg  -  generic UDP/-Lite receive processing
> + *
> + * 	This routine is udplite-aware and works for both protocols.

> @@ -980,7 +1055,11 @@ #else
>  #endif
>  }
> -/* returns:
> +/**
> + * 	udp_queue_rcv_skb  -  receive queue processing
> + *
> + * 	This routine is udplite-aware and works on both sockets.

>  	if (up->encap_type) {
> @@ -1010,7 +1087,7 @@ static int udp_queue_rcv_skb(struct sock
>  		 * If it's an encapsulateed packet, then pass it to the
>  		 * IPsec xfrm input and return the response
>  		 * appropriately.  Otherwise, just fall through and
> -		 * pass this up the UDP socket.
> +		 * pass this up the UDP/-Lite socket.
>  		 */

> -		/* FALLTHROUGH -- it's a UDP Packet */
> +		/* FALLTHROUGH -- it's a UDP/-Lite Packet */
>  	}

>  /*
> - *	All we need to do is get the socket, and then do a checksum.
> + *	All we need to do is get the socket, and then do a checksum.
>   */
> -

Huh, what was this one? trailing whitespace? Can you leave this for
another cset doing just the reformatting?

> @@ -1219,7 +1363,7 @@ static int udp_destroy_sock(struct sock
>  }
>  /*
> - *	Socket option code for UDP
> + *	Socket option code for UDP and UDP-Lite (shared).
>   */

>  #endif
> +
>  /**
> - * 	udp_poll - wait for a UDP event.
> + * 	udp_poll - wait for a UDP(-Lite) event.

See next comment

>   *	@file - file struct
>   *	@sock - socket
>   *	@wait - poll table
> @@ -1348,11 +1528,14 @@ #endif
>   *	then it could get return from select indicating data available
>   *	but then block when reading it. Add special case code
>   *	to work around these arguably broken applications.
> + *
> + *	The routine is udplite-aware and works for both protocols.

I guess these comments can go as well, as one can quickly realise the
function handles UDP-Lite from all the IS_UDPLITE(sk) calls and
is_{udp}lite variables :-)

>   */
>  unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
>  {
>  	unsigned int mask = datagram_poll(file, sock, wait);
>  	struct sock *sk = sock->sk;
> +	int is_lite = IS_UDPLITE(sk);

Regards,

- Arnaldo
Re: [RFC][PATCH 0/3] net: a lighter UDP-Lite (RFC 3828)
Quoting Arnaldo Carvalho de Melo:
| Avoid these changes to reduce patch file size, please

I apologize for the bad patch format - I am revising the entire patch
to improve readability and will resend.

- Gerrit
Re: e1000: operation without eeprom?
On Sunday 27 August 2006 19:50, Lennert Buytenhek wrote:
> Hi,
>
> There are a couple of ARM boards out there with on-board e1000s but
> without any kind of eeprom. The boot loader and kernel board support
> code have all the info necessary to configure the e1000, but the e1000
> driver bombs out because there isn't an eeprom connected -- how are we
> supposed to deal with this situation?

u-boot, which uses modified versions of the linux e1000 drivers, handles
these special cases with a bunch of platform-specific #ifdefs:

http://www.denx.de/cgi-bin/gitweb.cgi?p=u-boot.git;a=blob;h=927acbb26737a20e02962f67047e192545a870a1;hb=16850919ff8666f20d047cb83b4ee77581336515;f=drivers/e1000.c

I fear that running without an eeprom might not work so well across the
e1000 product line in general. We've had 82545's work OK without an
eeprom, but the 82572 did not work so well. Some chips appear to work OK
with pure software config, whereas others might need some special setup
parameters that would work best at chip power-up via the eeprom.

As a general solution (with fewer ifdefs than the u-boot solution), it
might be nice to have a read_eeprom_virtual(..) method in the driver where
one could supply a binary blob to the driver instead of having a real
eeprom. All of the driver code that relies on the eeprom could work like
normal. I've been toying with this under u-boot for a custom ARM board
without an eeprom too, though it does have the side-effect of bloating
the u-boot driver a bit with the fake eeprom data that's really useless
after boot (it's mostly 0x's though, so you could totally optimize it.)

- Brent
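A rough sketch of what such a blob-backed read could look like, purely as an illustration of the idea above. Nothing here is taken from the e1000 or u-boot sources: struct eeprom_blob and the exact read_eeprom_virtual() signature are hypothetical, and the sketch simply assumes the eeprom image is an array of 16-bit words supplied by board support code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a board-supplied eeprom image. */
struct eeprom_blob {
	const uint16_t *words;	/* eeprom contents, one entry per word */
	size_t nwords;		/* number of words in the image */
};

/*
 * Copy 'count' words starting at word 'offset' out of the blob,
 * in place of a real eeprom access.
 * Returns 0 on success, -1 on a bad argument or out-of-range read.
 */
int read_eeprom_virtual(const struct eeprom_blob *blob, size_t offset,
			size_t count, uint16_t *out)
{
	if (!blob || !blob->words || !out)
		return -1;
	if (offset > blob->nwords || count > blob->nwords - offset)
		return -1;

	for (size_t i = 0; i < count; i++)
		out[i] = blob->words[offset + i];
	return 0;
}
```

The point of the design is that the rest of the driver keeps calling one read routine and never needs to know whether the data came from hardware or from the blob, which is what would remove most of the per-platform #ifdefs.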
Re: [RFC][PATCH 0/3] net: a lighter UDP-Lite (RFC 3828)
On 8/28/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> Quoting Arnaldo Carvalho de Melo:
> | Avoid these changes to reduce patch file size, please
>
> I apologize for the bad patch format - I am revising the entire patch
> to improve readability and will resend.

No need for apologies and thanks for taking my suggestions into account.

- Arnaldo
Re: 2.6.18-rc4-mm[1,2,3] -- Network card not getting assigned an eth device name
On 8/27/06, Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:
> Andrew Morton wrote:
> > Jeremy reported that a while back too. I do not know what is causing
> > it and as far as I know no net developers have yet looked into it.
>
> It went away with -rc4-mm[23] for me...

I just reproduced it with rc4-mm3, ipw2200 after coming out of suspend.
I'll apply the patch from David Miller and see if anything shows up in
the log.

regards,

Benoit
Re: 2.6.18-rc4-mm[1,2,3] -- Network card not getting assigned an eth device name
On 8/27/06, David Miller [EMAIL PROTECTED] wrote:
> From: Andrew Morton [EMAIL PROTECTED]
> Date: Sun, 27 Aug 2006 00:19:43 -0700
>
> > Jeremy reported that a while back too. I do not know what is causing
> > it and as far as I know no net developers have yet looked into it.
>
> A debugging patch like this one should help figure out the culprit.
>
> If we don't see the gibberish netdevice name printed in the kernel
> logs, then likely something is corrupting the netdevice structure
> or the memory holding the name.
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d4a1ec3..45f9b19 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -738,6 +738,11 @@ int dev_change_name(struct net_device *d
>  	if (!dev_valid_name(newname))
>  		return -EINVAL;
>
> +#if 1
> +	printk("[%s:%d]: Changing netdevice name from [%s] to [%s]\n",
> +	       current->comm, current->pid,
> +	       dev->name, newname);
> +#endif
>  	if (strchr(newname, '%')) {
>  		err = dev_alloc_name(dev, newname);

Dan, do you have any idea why NetworkManager from Ubuntu 6.06.1 would
be corrupting network device names on recent MM kernels? I haven't seen
this happening with Ubuntu's kernels. If you like, I can send you my
kernel .config file.

Here's what I get: [NetworkManager:5399]: Changing netdevice name from [eth0] to [��] ��: link down ADDRCONF(NETDEV_UP): ��: link is not ready [NetworkManager:5399]: Changing netdevice name from [eth1] to [7G*e] 7G*e: no IPv6 routers present Here's the result of strace -f -F -v -a50 NetworkManager: execve(./NetworkManager.bak, [./NetworkManager.bak], [TERM=linux, SHELL=/bin/bash, HUSHLOGIN=FALSE, OLDPWD=/home/miles, USER=root, LS_COLORS=no=00:fi=00:di=01;34:l..., SUDO_USER=miles, SUDO_UID=1000, PATH=/usr/local/sbin:/usr/local/..., MAIL=/var/mail/miles, PWD=/usr/sbin, LANG=en_US.UTF-8, HISTCONTROL=ignoredups, SUDO_COMMAND=/bin/bash, HOME=/home/miles, SHLVL=2, LANGUAGE=en_US:en_GB:en, LOGNAME=root, LESSOPEN=| /usr/bin/lesspipe %s, SUDO_GID=1000, LESSCLOSE=/usr/bin/lesspipe %s %..., _=/usr/bin/strace]) = 0 uname({sysname=Linux, nodename=Dumbleedor, release=2.6.18-rc4-mm3, version=#32 Sun Aug 27 01:01:35 PDT 2006, machine=i686}) = 0 brk(0)= 0x808b000 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f8a000 access(/etc/ld.so.nohwcap, F_OK)= -1 ENOENT (No such file or directory) old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f88000 access(/etc/ld.so.preload, R_OK)= -1 ENOENT (No such file or directory) open(/etc/ld.so.cache, O_RDONLY)= 3 fstat64(3, {st_dev=makedev(3, 10), st_ino=195836, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=216, st_size=102666, st_atime=2006/08/28-00:34:02, st_mtime=2006/08/25-22:58:56, st_ctime=2006/08/25-22:58:56}) = 0 old_mmap(NULL, 102666, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f6e000 close(3) = 0 access(/etc/ld.so.nohwcap, F_OK)= -1 ENOENT (No such file or directory) open(/usr/lib/libhal.so.1, O_RDONLY)= 3 read(3, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\\36\0..., 512) = 512 fstat64(3, {st_dev=makedev(3, 10), st_ino=830757, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=64, st_size=30448, 
st_atime=2006/08/28-00:34:02, st_mtime=2006/05/22-08:09:25, st_ctime=2006/07/05-21:10:31}) = 0 old_mmap(NULL, 33464, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7f65000 old_mmap(0xb7f6d000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0xb7f6d000 close(3) = 0 access(/etc/ld.so.nohwcap, F_OK)= -1 ENOENT (No such file or directory) open(/lib/libiw.so.28, O_RDONLY)= 3 read(3, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300\25..., 512) = 512 fstat64(3, {st_dev=makedev(3, 10), st_ino=814477, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=48, st_size=23228, st_atime=2006/08/28-00:34:02, st_mtime=2006/02/09-15:38:09, st_ctime=2006/07/05-21:19:53}) = 0 old_mmap(NULL, 26188, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7f5e000 old_mmap(0xb7f64000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0xb7f64000 close(3) = 0 access(/etc/ld.so.nohwcap, F_OK)= -1 ENOENT (No such file or directory) open(/usr/lib/libnl.so.1, O_RDONLY) = 3 read(3, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\\236..., 512) = 512 fstat64(3, {st_dev=makedev(3, 10), st_ino=831039, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=368, st_size=180452, st_atime=2006/08/28-00:34:03, st_mtime=2006/03/22-05:46:12, st_ctime=2006/03/29-09:41:12}) = 0 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE,
Re: [PATCH 1/4] net: VM deadlock avoidance framework
On Mon, August 28, 2006 12:22, Peter Zijlstra said:
> On Sat, 2006-08-26 at 04:37 +0200, Indan Zupancic wrote:
> > Why not 'emergency'? Looks like 'emerge' with a typo now. ;-)
>
> hehe, me lazy, you gentoo ;-)
> sed -i -e 's/emerg/emregency/g' -e 's/EMERG/EMERGENCY/g' *.patch

I used it for a while, long ago, until I figured out that there were
better alternatives. I didn't like the overly complex init and portage
system though.

But if you say emerg it will sound as emerge, and all other fields in
that struct aren't abbreviated either and often longer, so it just makes
more sense to use the full name.

> > > @@ -391,6 +391,7 @@ enum sock_flags {
> > >  	SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
> > >  	SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
> > >  	SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
> > > +	SOCK_VMIO, /* promise to never block on receive */
> >
> > It might be used for IO related to the VM, but that doesn't tell _what_
> > it does. It also does much more than just not blocking on receive, so
> > overall, aren't both the vmio name and the comment slightly misleading?
>
> I'm so having trouble with this name; I had SOCK_NONBLOCKING for a while,
> but that is a very bad name because nonblocking has this well defined
> meaning when talking about sockets, and this is not that.
>
> Hence I came up with the VMIO, because that is the only selecting
> criteria for being special. - I'll fix up the comment.

It's nice and short, but it might be weird if someone after a while finds
another way of using this stuff. And its relation to 'emergency' looks
unclear. So maybe calling both the same makes most sense, no matter how
you name it.

> > > @@ -82,6 +82,7 @@ EXPORT_SYMBOL(zone_table);
> > >  static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
> > >  int min_free_kbytes = 1024;
> > > +int var_free_kbytes;
> >
> > Using var_free_pages makes the code slightly simpler, as all that
> > needless conversion isn't needed anymore. Perhaps the same is true for
> > min_free_kbytes...
>
> 't seems I'm a bit puzzled as to what you mean here.

I mean to store the variable reserve in pages instead of kilobytes.
Currently you're converting from the one to the other both when setting
and when using the value. That doesn't make much sense and can be avoided
by storing the value in pages from the start.

> > > +noskb:
> > > +	/* Attempt emergency allocation when RX skb. */
> > > +	if (!(flags & SKB_ALLOC_RX))
> > > +		goto out;
> >
> > So only incoming skb allocation is guaranteed? What about outgoing skbs?
> > What am I missing? Or can we sleep then, and increasing var_free_kbytes
> > is sufficient to guarantee it?
>
> ->sk_allocation |= __GFP_EMERGENCY - will take care of the outgoing
> packets. Also, since one only sends a limited number of packets out and
> then will wait for answers, we do not need to worry about fragmentation
> issues that much in this case.

Ah, missed that one. Didn't know that the alloc flags were stored in the
sock.

> > > +static void emerg_free_skb(struct kmem_cache *cache, void *objp)
> > > +{
> > > +	free_page((unsigned long)objp);
> > > +	emerg_rx_pages_dec();
> > > +}
> > > +
> > >  /*
> > >   *	Free an skbuff by memory without cleaning the state.
> > >   */
> > > @@ -326,17 +373,21 @@ void kfree_skbmem(struct sk_buff *skb)
> > >  {
> > >  	struct sk_buff *other;
> > >  	atomic_t *fclone_ref;
> > > +	void (*free_skb)(struct kmem_cache *, void *);
> > >
> > >  	skb_release_data(skb);
> > > +
> > > +	free_skb = skb->emerg ? emerg_free_skb : kmem_cache_free;
> > > +
> > >  	switch (skb->fclone) {
> > >  	case SKB_FCLONE_UNAVAILABLE:
> > > -		kmem_cache_free(skbuff_head_cache, skb);
> > > +		free_skb(skbuff_head_cache, skb);
> > >  		break;
> > >
> > >  	case SKB_FCLONE_ORIG:
> > >  		fclone_ref = (atomic_t *) (skb + 2);
> > >  		if (atomic_dec_and_test(fclone_ref))
> > > -			kmem_cache_free(skbuff_fclone_cache, skb);
> > > +			free_skb(skbuff_fclone_cache, skb);
> > >  		break;
> > >
> > >  	case SKB_FCLONE_CLONE:
> > > @@ -349,7 +400,7 @@ void kfree_skbmem(struct sk_buff *skb)
> > >  		skb->fclone = SKB_FCLONE_UNAVAILABLE;
> > >
> > >  		if (atomic_dec_and_test(fclone_ref))
> > > -			kmem_cache_free(skbuff_fclone_cache, other);
> > > +			free_skb(skbuff_fclone_cache, other);
> > >  		break;
> > >  	};
> > >  }
> >
> > I don't have the original code in front of me, but isn't it possible to
> > add a goto free which has all the freeing in one place? That would get
> > rid of the function pointer stuff and emerg_free_skb.
>
> perhaps, yes, however I prefer this one, it allows access to the size.

What size are you talking about?

What I had in mind is probably less readable, but it avoids a bunch of
function calls and that indirect function call, so with luck it has less
overhead and smaller object size:

void kfree_skbmem(struct sk_buff *skb)
{
	struct sk_buff *other;
	atomic_t *fclone_ref;
	struct kmem_cache *cache = skbuff_head_cache;
	struct sk_buff *free = skb;

	skb_release_data(skb);
	switch
Re: [PATCH 5/10] rt2x00: Register initialization fixes
On Sun, Aug 27, 2006 at 05:39:14PM +0200, Ivo van Doorn wrote:
> Various register initialization fixes to make the device work properly.
> This will fix the RX/TX issue for rt61pci.
>
> Signed-off-by Ivo van Doorn [EMAIL PROTECTED]
>
> ---
>
> diff -rU3 wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2400pci.c wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2400pci.c
> --- wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2400pci.c 2006-08-27 16:11:40.0 +0200
> +++ wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2400pci.c 2006-08-27 16:17:02.0 +0200
> @@ -1192,11 +1192,7 @@
>  	rt2x00_register_write(rt2x00dev, RXCSR0, reg);
>
>  	rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00217223));
> -
> -	rt2x00_register_read(rt2x00dev, MACCSR1, &reg);
> -	rt2x00_set_field32(&reg, MACCSR1_AUTO_TXBBP, 1);
> -	rt2x00_set_field32(&reg, MACCSR1_AUTO_RXBBP, 1);
> -	rt2x00_register_write(rt2x00dev, MACCSR1, reg);
> +	rt2x00_register_write(rt2x00dev, MACCSR1, cpu_to_le32(0x00235518));
>
>  	rt2x00_register_read(rt2x00dev, MACCSR2, &reg);
>  	rt2x00_set_field32(&reg, MACCSR2_DELAY, 64);
> diff -rU3 wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2500pci.c wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2500pci.c
> --- wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2500pci.c 2006-08-27 16:12:03.0 +0200
> +++ wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2500pci.c 2006-08-27 16:17:56.0 +0200
> @@ -1249,6 +1249,7 @@
>  		return -EBUSY;
>
>  	rt2x00_register_write(rt2x00dev, PWRCSR0, cpu_to_le32(0x3f3b3100));
> +	rt2x00_register_write(rt2x00dev, PCICSR, cpu_to_le32(0x03b8));
>
>  	rt2x00_register_write(rt2x00dev, PSCSR0, cpu_to_le32(0x00020002));
>  	rt2x00_register_write(rt2x00dev, PSCSR1, cpu_to_le32(0x0002));
> @@ -1272,12 +1273,11 @@
>  	rt2x00_set_field32(&reg, RXCSR0_DISABLE_RX, 0);
>  	rt2x00_register_write(rt2x00dev, RXCSR0, reg);
>
> -	rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00213223));
> +	rt2x00_register_write(rt2x00dev, GPIOCSR, cpu_to_le32(0xff00));
> +	rt2x00_register_write(rt2x00dev, TESTCSR, cpu_to_le32(0x00f0));
>
> -	rt2x00_register_read(rt2x00dev, MACCSR1, &reg);
> -	rt2x00_set_field32(&reg, MACCSR1_AUTO_TXBBP, 1);
> -	rt2x00_set_field32(&reg, MACCSR1_AUTO_RXBBP, 1);
> -	rt2x00_register_write(rt2x00dev, MACCSR1, reg);
> +	rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00213223));
> +	rt2x00_register_write(rt2x00dev, MACCSR1, cpu_to_le32(0x00235518));
>
>  	rt2x00_register_read(rt2x00dev, MACCSR2, &reg);
>  	rt2x00_set_field32(&reg, MACCSR2_DELAY, 64);

Lots of magic numbers here...can we do something about that?

John
-- 
John W. Linville
[EMAIL PROTECTED]
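One way to read the "magic numbers" remark: a register value assembled from named fields documents itself, whereas a raw constant does not. The sketch below is a userspace model of that style only. struct field32, set_field32(), get_field32() and the EXAMPLE_* bit positions are assumptions made up for this illustration; they are not the rt2x00 definitions, and the real bit layout would have to come from the chipset documentation.

```c
#include <stdint.h>

/* Hypothetical field descriptor: bit offset plus mask within a 32-bit register. */
struct field32 {
	uint32_t shift;
	uint32_t mask;
};

/* Example fields with assumed bit positions, for illustration only. */
#define EXAMPLE_AUTO_TXBBP	((struct field32){ 3, 0x00000008u })
#define EXAMPLE_AUTO_RXBBP	((struct field32){ 4, 0x00000010u })

/* Return 'reg' with field 'f' replaced by 'value'; other bits are preserved. */
static inline uint32_t set_field32(uint32_t reg, struct field32 f, uint32_t value)
{
	reg &= ~f.mask;
	reg |= (value << f.shift) & f.mask;
	return reg;
}

/* Extract field 'f' from 'reg'. */
static inline uint32_t get_field32(uint32_t reg, struct field32 f)
{
	return (reg & f.mask) >> f.shift;
}
```

Building the value with two set_field32() calls says which features are being enabled; a bare 32-bit constant leaves the reader to decode the bits by hand.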
[PATCH] [IPV6] ROUTE: Fix dst refcounting.
[IPV6] ROUTE: Fix dst reference counting in ip6_pol_route_lookup().

In ip6_pol_route_lookup(), when we finish backtracking at the top-level root entry, we need to hold it.

Bug noticed by Mitsuru Chinen [EMAIL PROTECTED].

Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]
---
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 626ff7a..9d61ae9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -510,8 +510,8 @@ restart:
 	rt = fn->leaf;
 	rt = rt6_device_match(rt, fl->oif, flags);
 	BACKTRACK(&fl->fl6_src);
-	dst_hold(&rt->u.dst);
 out:
+	dst_hold(&rt->u.dst);
 	read_unlock_bh(&table->tb6_lock);
 
 	rt->u.dst.lastuse = jiffies;

--
YOSHIFUJI Hideaki @ USAGI Project [EMAIL PROTECTED]
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
Re: e1000 and 802.1ad/stacked vlan tagging
Hello Jesse,

thank you for answering anyway. Though I think your answer covers only the obvious half of the problem. Indeed one might think that this solves the issue, as long as there are only Linux kernels involved. Unfortunately my setup is a bit more complicated in terms of hardware. So I should probably have clarified the question this way: how do you configure the interface so that packets with a data length of 1500 get transferred, and not only 1496? I tried enlarging the MTU of both the real device and the first vlan interface, but that does not work. I really thought that the visible device setting of mtu=1500 should have worked, and that the driver (or some code in between) should have corrected the allowed frame size to reflect the actual setup. Shouldn't it?

Regards,
Stephan

PS: crossposted to both lists; list members, please keep in mind that I am not subscribed when answering! Thank you.

On Mon, 28 Aug 2006 10:23:09 -0700 Brandeburg, Jesse [EMAIL PROTECTED] wrote:

Stephan von Krawczynski wrote:

Hello Jesse, sorry to bother you directly, but since you did the patch for my e1000 interrupt problem last time (February) I hope you have a short-hand idea for my current issue, too. I am trying to make stacked vlan tagging work under kernel 2.4 with e1000. Generally I do this on two boxes connected back-to-back:

ifconfig eth0 up
vconfig add eth0 4094
ifconfig eth0.4094 up
vconfig add eth0.4094 1
ifconfig eth0.4094.1 192.168.1.1 netmask 255.255.255.0 broadcast 192.168.1.255 up

(of course the second box gets another ip, let's assume .2). If you do a

ping -s 1472 192.168.1.2

through the stacked vlan you see the packets vanish. With

ping -s 1468 192.168.1.2

everything seems ok. I have the impression that the stacked vlans show some problem with mtu handling inside the e1000 driver. Mtu is set to 1500, but because stacking tags uses 4 bytes more the packets cannot use the full mtu. Any ideas what happens here?

The packet is being dropped because it is longer than the allowed frame size for 1500 MTU. Check

ethtool -S eth0

Mine shows

rx_long_length_errors: 169

which indicates that you need to change the MTU on the stacked interface to 1496. After I did

ip l s eth1.4094.1 mtu 1496

on both sides of my connection, everything was working. I think in this case it is just a configuration problem. When you stack vlans you have to account for the extra inserted length someplace, and that place is by reducing the MTU.

I'd appreciate it in the future if you could use [EMAIL PROTECTED] or netdev@vger.kernel.org for support questions like this, because I'm not the only one who can answer questions (and I might have been on vacation! :-) )

Jesse
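The numbers in this exchange (1472 fails, 1468 works, MTU 1496) follow from simple header arithmetic. The sketch below is only an illustration: the helper names are made up, and it assumes the NIC's frame-size limit already allows for one VLAN tag, so that only the second, stacked tag has to come out of the MTU.

```c
#include <assert.h>

enum {
	IPV4_HDR_LEN = 20,	/* IPv4 header without options */
	ICMP_HDR_LEN = 8,	/* ICMP echo header */
	VLAN_TAG_LEN = 4,	/* one 802.1Q tag */
};

/* Largest "ping -s" payload that fits in one unfragmented packet. */
static int max_ping_payload(int mtu)
{
	assert(mtu >= IPV4_HDR_LEN + ICMP_HDR_LEN);
	return mtu - IPV4_HDR_LEN - ICMP_HDR_LEN;
}

/*
 * MTU usable on a stacked VLAN interface when tags_stacked tags are
 * inserted but the frame-size limit only allows for tags_accounted of
 * them (assumption: one tag is accounted for, as in the thread).
 */
static int stacked_vlan_mtu(int base_mtu, int tags_stacked, int tags_accounted)
{
	int extra = tags_stacked - tags_accounted;

	return extra > 0 ? base_mtu - extra * VLAN_TAG_LEN : base_mtu;
}
```

With a base MTU of 1500, two stacked tags and one accounted for, this gives an MTU of 1496 and a largest ping payload of 1468, while 1472 is the largest payload for an untagged 1500-byte MTU.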
Re: [PATCH 1/4] net: VM deadlock avoidance framework
On Mon, 2006-08-28 at 18:03 +0200, Indan Zupancic wrote: On Mon, August 28, 2006 12:22, Peter Zijlstra said: @@ -391,6 +391,7 @@ enum sock_flags { SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */ SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */ SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */ +SOCK_VMIO, /* promise to never block on receive */ It might be used for IO related to the VM, but that doesn't tell _what_ it does. It also does much more than just not blocking on receive, so overal, aren't both the vmio name and the comment slightly misleading? I'm so having trouble with this name; I had SOCK_NONBLOCKING for a while, but that is a very bad name because nonblocking has this well defined meaning when talking about sockets, and this is not that. Hence I came up with the VMIO, because that is the only selecting criteria for being special. - I'll fix up the comment. It's nice and short, but it might be weird if someone after a while finds another way of using this stuff. And it's relation to 'emergency' looks unclear. So maybe calling both the same makes most sense, no matter how you name it. I've tried to come up with another use-case, but failed (of course that doesn't mean there is no). Also, I'm really past caring what the thing is called ;-) But if ppl object I guess its easy enough to run yet another sed command over the patches. @@ -82,6 +82,7 @@ EXPORT_SYMBOL(zone_table); static char *zone_names[MAX_NR_ZONES] = { DMA, DMA32, Normal, HighMem }; int min_free_kbytes = 1024; +int var_free_kbytes; Using var_free_pages makes the code slightly simpler, as all that needless convertion isn't needed anymore. Perhaps the same is true for min_free_kbytes... 't seems I'm a bit puzzled as to what you mean here. I mean to store the variable reserve in pages instead of kilobytes. Currently you're converting from the one to the other both when setting and when using the value. 
That doesn't make much sense and can be avoided by storing the value in pages from the start. right, will have a peek. void kfree_skbmem(struct sk_buff *skb) { struct sk_buff *other; atomic_t *fclone_ref; struct kmem_cache *cache = skbuff_head_cache; struct sk_buff *free = skb; skb_release_data(skb); switch (skb-fclone) { case SKB_FCLONE_UNAVAILABLE: goto free; case SKB_FCLONE_ORIG: fclone_ref = (atomic_t *) (skb + 2); if (atomic_dec_and_test(fclone_ref)){ cache = skbuff_fclone_cache; goto free; } break; case SKB_FCLONE_CLONE: fclone_ref = (atomic_t *) (skb + 1); other = skb - 1; /* The clone portion is available for * fast-cloning again. */ skb-fclone = SKB_FCLONE_UNAVAILABLE; if (atomic_dec_and_test(fclone_ref)){ cache = skbuff_fclone_cache; free = other; goto free; } break; }; return; free: if (!skb-emergency) kmem_cache_free(cache, free); else emergency_rx_free(free, kmem_cache_size(cache)); } Ah, like so, sure, that looks good. You can get rid of the memalloc_reserve and vmio_request_queues variables if you want, they aren't really needed for anything. If using them reduces the total code size I'd keep them though. I find my version easier to read, but that might just be the way my brain works. Maybe true, but I believe my version is more natural in the sense that it makes more clear what the code is doing. Less bookkeeping, more real work, so to speak. Ok, I'll have another look at it, perhaps my gray matter has shifted ;-) But after another look things seem a bit shaky, in the locking corner anyway. sk_adjust_memalloc() calls adjust_memalloc_reserve(), which changes var_free_kbytes and then calls setup_per_zone_pages_min(), which does the real work. But it reads min_free_kbytes without holding any locks. In mainline that's fine as the function is only called by the proc handler and in obscure memory hotplug stuff. But with your code it can also be called at any moment when a VMIO socket is made, which now races with the proc callback. 
More a theoretical than a real problem, but still slightly messy. Knew about that, hadn't made up my mind on a fix yet. Good spot never the less. Time to actually fix it I guess. adjust_memalloc_reserve() has no locking at all, while it might be called concurrently from different sources. Luckily sk_adjust_memalloc() is the only user, and which uses its own spinlock for synchronization, so things go well by accident now. It seems
[RFC][PATCHv2 2.6.18-rc4-mm3 2/3] net/ipv4: UDP and generic UDP(-Lite) processing
[Net/IPv4]: REVISED Modifications to the UDP module and generic UDP/-Lite processing. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] --- include/net/udp.h | 68 ++- net/ipv4/udp.c| 489 -- 2 files changed, 395 insertions(+), 162 deletions(-) diff --git a/include/net/udp.h b/include/net/udp.h index 766fba1..77c5fb9 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -26,9 +26,48 @@ #include linux/list.h #include net/inet_sock.h #include net/sock.h #include net/snmp.h +#include net/ip.h +#include linux/ipv6.h #include linux/seq_file.h #define UDP_HTABLE_SIZE128 +#include net/udplite.h + +/** + * struct udp_skb_cb - UDP(-Lite) private variables + * + * @header: private variables used by IPv4/IPv6 + * @cscov: checksum coverage length (UDP-Lite only) + * @partial_cov: if set indicates partial csum coverage + */ +struct udp_skb_cb { + union { + struct inet_skb_parmh4; +#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) + struct inet6_skb_parm h6; +#endif + } header; + __u16 cscov; + __u8partial_cov; +}; +#define UDP_SKB_CB(__skb) ((struct udp_skb_cb *)((__skb)-cb)) + +/* + * Generic checksumming routines for UDP(-Lite) v4 and v6 + */ +static inline u16 __udp_checksum_complete(struct sk_buff *skb) +{ + if (! 
UDP_SKB_CB(skb)-partial_cov) + return __skb_checksum_complete(skb); + return csum_fold(skb_checksum(skb, 0, UDP_SKB_CB(skb)-cscov, + skb-csum)); +} + +static __inline__ int udp_checksum_complete(struct sk_buff *skb) +{ + return skb-ip_summed != CHECKSUM_UNNECESSARY + __udp_checksum_complete(skb); +} /* udp.c: This needs to be shared by v4 and v6 because the lookup *and hashing code needs to work with different AF's yet @@ -39,16 +78,24 @@ extern rwlock_t udp_hash_lock; extern int udp_port_rover; -static inline int udp_lport_inuse(u16 num) +/* + * XXX: since udp_v{4,6}_get_port use common code, these two functions + * will soon go + */ +static inline int __udp_lport_inuse(u16 num, struct hlist_head udptable[]) { struct sock *sk; struct hlist_node *node; - sk_for_each(sk, node, udp_hash[num (UDP_HTABLE_SIZE - 1)]) + sk_for_each(sk, node, udptable[num (UDP_HTABLE_SIZE - 1)]) if (inet_sk(sk)-num == num) return 1; return 0; } +static __inline__ int udp_lport_inuse(u16 num) +{ + return __udp_lport_inuse(num, udp_hash); +} /* Note: this must match 'valbool' in sock_setsockopt */ #define UDP_CSUM_NOXMIT1 @@ -75,21 +122,32 @@ extern unsigned int udp_poll(struct file poll_table *wait); DECLARE_SNMP_STAT(struct udp_mib, udp_statistics); -#define UDP_INC_STATS(field) SNMP_INC_STATS(udp_statistics, field) -#define UDP_INC_STATS_BH(field) SNMP_INC_STATS_BH(udp_statistics, field) -#define UDP_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_statistics, field) +/* + * SNMP statistics for UDP and UDP-Lite + */ +#define UDP_INC_STATS_USER(field, is_udplite) do { \ + if (is_udplite) SNMP_INC_STATS_USER(udplite_statistics, field); \ + elseSNMP_INC_STATS_USER(udp_statistics, field); } while(0) +#define UDP_INC_STATS_BH(field, is_udplite) do { \ + if (is_udplite) SNMP_INC_STATS_BH(udplite_statistics, field); \ + elseSNMP_INC_STATS_BH(udp_statistics, field);} while(0) +#define UDP_DEC_STATS_BH(field, is_udplite) do { \ + if (is_udplite) SNMP_DEC_STATS_BH(udplite_statistics, field); \ + 
elseSNMP_DEC_STATS_BH(udp_statistics, field);} while(0) /* /proc */ struct udp_seq_afinfo { struct module *owner; char*name; sa_family_t family; + struct hlist_head *hashtable; int (*seq_show) (struct seq_file *m, void *v); struct file_operations *seq_fops; }; struct udp_iter_state { sa_family_t family; + struct hlist_head *hashtable; int bucket; struct seq_operations seq_ops; }; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 514c1e9..5ca0db3 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -92,10 +92,8 @@ #include linux/errno.h #include linux/timer.h #include linux/mm.h #include linux/inet.h -#include linux/ipv6.h #include linux/netdevice.h #include net/snmp.h -#include net/ip.h #include net/tcp_states.h #include net/protocol.h #include linux/skbuff.h @@ -108,6 +106,8 @@ #include net/route.h #include net/inet_common.h #include net/checksum.h #include net/xfrm.h +/* the extensions
[RFC][PATCHv2 2.6.18-rc4-mm3 1/3] net/ipv4: UDP-Lite extensions
[Net/IPv4]: REVISED UDP-Lite standalone support and shared UDP/-Lite socket structure. This is in principle the same patch as posted earlier, with the difference that all whitespace changes have been removed; in addition, statements have been re-ordered so as to give a better-readable patch. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] --- include/linux/udp.h | 11 ++ include/net/udplite.h | 35 net/ipv4/udplite.c| 209 ++ 3 files changed, 255 insertions(+) diff --git a/include/linux/udp.h b/include/linux/udp.h index 90223f0..1b7cf10 100644 --- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -50,12 +50,23 @@ struct udp_sock { * when the socket is uncorked. */ __u16len; /* total length of pending frames */ + /* +* Fields specific to UDP-Lite. +*/ + __u16pcslen; + __u16pcrlen; +/* indicator bits used by pcflag: */ +#define UDPLITE_BIT 0x1 /* set by udplite proto init function */ +#define UDPLITE_SEND_CC 0x2 /* set via udplite setsockopt */ +#define UDPLITE_RECV_CC 0x4 /* set via udplite setsocktopt*/ + __u8 pcflag;/* marks socket as UDP-Lite if 0*/ }; static inline struct udp_sock *udp_sk(const struct sock *sk) { return (struct udp_sock *)sk; } +#define IS_UDPLITE(__sk) (udp_sk(__sk)-pcflag) #endif diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c new file mode 100644 index 000..3911403 --- /dev/null +++ b/net/ipv4/udplite.c @@ -0,0 +1,209 @@ +/* + * UDPLITE An implementation of the UDP-Lite protocol (RFC 3828). + * + * Version:$Id: udplite.c,v 1.22 2006/08/22 13:01:52 gerrit Exp gerrit $ + * + * Authors:Gerrit Renker [EMAIL PROTECTED] + * + * Changes: + * Fixes: + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +struct hlist_head udplite_hash[UDP_HTABLE_SIZE]; +intudplite_port_rover; +DEFINE_SNMP_STAT(struct udp_mib, udplite_statistics) __read_mostly; + +/* these functions are called by UDP-Lite with protocol-specific parameters */ +static int __udp_get_port(struct sock *, unsigned short, + struct hlist_head *, int *); +static struct sock *__udp_lookup(u32 , u16, u32, u16, int, struct hlist_head *); +static int __udp_mcast_deliver(struct sk_buff *, struct udphdr *, + u32, u32, struct hlist_head * ); +static int __udp_common_rcv(struct sk_buff *, int is_udplite); +static void__udp_err(struct sk_buff *, u32, struct hlist_head *); +#ifdef CONFIG_PROC_FS +static int udp4_seq_show(struct seq_file *, void *); +#endif + +/* + * Designate sk as UDP-Lite socket + */ +static inline int udplite_sk_init(struct sock *sk) +{ + udp_sk(sk)-pcflag = UDPLITE_BIT; + return 0; +} + +static __inline__ int udplite_v4_get_port(struct sock *sk, unsigned short snum) +{ + return __udp_get_port(sk, snum, udplite_hash, udplite_port_rover); +} + +static __inline__ struct sock *udplite_v4_lookup(u32 saddr, u16 sport, +u32 daddr, u16 dport, int dif) +{ + return __udp_lookup(saddr, sport, daddr, dport, dif, udplite_hash); +} + +static __inline__ int udplite_v4_mcast_deliver(struct sk_buff *skb, + struct udphdr *uh, u32 saddr, u32 daddr) +{ + return __udp_mcast_deliver(skb, uh, saddr, daddr, udplite_hash); +} + +__inline__ int udplite_rcv(struct sk_buff *skb) +{ + return __udp_common_rcv(skb, 1); +} + +__inline__ void udplite_err(struct sk_buff *skb, u32 info) +{ + return __udp_err(skb, info, udplite_hash); +} + +static int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh, +unsigned short len, u32 saddr, u32 daddr) +{ + u16 cscov; + +/* In UDPv4 a zero checksum means that the transmitter generated no + * checksum. UDP-Lite (like IPv6) mandates checksums, hence packets + * with a zero checksum field are illegal. 
*/ + if (uh-check == 0) { + LIMIT_NETDEBUG(KERN_DEBUG UDPLITE: zeroed csum field + (%d.%d.%d.%d:%d - %d.%d.%d.%d:%d)\n, NIPQUAD(saddr), + ntohs(uh-source), NIPQUAD(daddr), ntohs(uh-dest)); + return 0; + } + +UDP_SKB_CB(skb)-partial_cov = 0; +cscov = ntohs(uh-len); + + if (cscov == 0) /* Indicates that full coverage is required. */ + cscov =
[PATCH 1/9] sky2: remove cloned/pskb_expand_head check
The code to handle cloned skb overwriting is unnecessary in the sky2 driver and is buggy. The bug is that pskb_expand_head can change the skb but the driver has already mapped in the header. Since the sky2 hardware doesn't need to overwrite memory, the buggy code can just be removed; it was mistakenly copied from the tg3 driver.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-08-25 16:00:28.0 -0700
+++ sky2/drivers/net/sky2.c	2006-08-25 16:01:33.0 -0700
@@ -1239,13 +1239,6 @@
 	/* Check for TCP Segmentation Offload */
 	mss = skb_shinfo(skb)->gso_size;
 	if (mss != 0) {
-		/* just drop the packet if non-linear expansion fails */
-		if (skb_header_cloned(skb) &&
-		    pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
-			dev_kfree_skb(skb);
-			goto out_unlock;
-		}
-
 		mss += ((skb->h.th->doff - 5) * 4);	/* TCP options */
 		mss += (skb->nh.iph->ihl * 4) + sizeof(struct tcphdr);
 		mss += ETH_HLEN;
@@ -1341,7 +1334,6 @@
 
 	sky2_put_idx(hw, txqaddr[sky2->port], sky2->tx_prod);
 
-out_unlock:
 	spin_unlock(&sky2->tx_lock);
 
 	dev->trans_start = jiffies;

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 9/9] sky2: version 1.7
Change version number for this bundle.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-08-22 14:55:42.0 -0700
+++ sky2/drivers/net/sky2.c	2006-08-22 14:55:46.0 -0700
@@ -50,7 +50,7 @@
 #include "sky2.h"
 
 #define DRV_NAME		"sky2"
-#define DRV_VERSION		"1.6"
+#define DRV_VERSION		"1.7"
 #define PFX			DRV_NAME " "
 
 /*

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 8/9] sky2: pci post bug
Make sure that PCI write occurs before the delay.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-08-28 10:00:17.0 -0700
+++ sky2/drivers/net/sky2.c	2006-08-28 10:00:20.0 -0700
@@ -531,6 +531,7 @@
 		reg1 |= phy_power[port];
 
 	sky2_pci_write32(hw, PCI_DEV_REG1, reg1);
+	sky2_pci_read32(hw, PCI_DEV_REG1);
 	udelay(100);
 }
 
@@ -766,9 +767,10 @@
 /* Update chip's next pointer */
 static inline void sky2_put_idx(struct sky2_hw *hw, unsigned q, u16 idx)
 {
+	q = Y2_QADDR(q, PREF_UNIT_PUT_IDX);
 	wmb();
-	sky2_write16(hw, Y2_QADDR(q, PREF_UNIT_PUT_IDX), idx);
-	mmiowb();
+	sky2_write16(hw, q, idx);
+	sky2_read16(hw, q);
 }
 
 

--
Stephen Hemminger [EMAIL PROTECTED]
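Why a read-back makes the write "occur before the delay" can be shown with a toy model. Everything below is hypothetical (it is not a kernel or PCI API): a write may sit in a bridge buffer, and a read from the same device cannot complete until earlier posted writes have been delivered, so the read acts as a flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a bus that may hold ("post") a write in a bridge. */
struct toy_bus {
	uint32_t device_reg;	/* value the device currently sees */
	uint32_t posted_val;	/* write still sitting in the bridge */
	bool posted;		/* true while a write is pending */
};

/* The write returns immediately; the device has not seen it yet. */
static void toy_write32(struct toy_bus *bus, uint32_t val)
{
	assert(bus);
	bus->posted_val = val;
	bus->posted = true;
}

/* A read forces any pending write out to the device first. */
static uint32_t toy_read32(struct toy_bus *bus)
{
	assert(bus);
	if (bus->posted) {
		bus->device_reg = bus->posted_val;
		bus->posted = false;
	}
	return bus->device_reg;
}
```

In this model, a delay started right after toy_write32() could elapse before the device ever sees the new value; calling toy_read32() first guarantees the delay is measured from the point where the device has it.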
[PATCH 4/9] sky2: MSI test timing
The test for MSI IRQ could have timing issues. The PCI write needs to be pushed out before waiting, and the wait queue should be initialized before the IRQ.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-08-25 16:05:10.0 -0700
+++ sky2/drivers/net/sky2.c	2006-08-25 16:05:14.0 -0700
@@ -3189,6 +3189,8 @@
 	struct pci_dev *pdev = hw->pdev;
 	int err;
 
+	init_waitqueue_head (&hw->msi_wait);
+
 	sky2_write32(hw, B0_IMSK, Y2_IS_IRQ_SW);
 
 	err = request_irq(pdev->irq, sky2_test_intr, IRQF_SHARED, DRV_NAME, hw);
@@ -3198,10 +3200,8 @@
 		return err;
 	}
 
-	init_waitqueue_head (&hw->msi_wait);
-
 	sky2_write8(hw, B0_CTST, CS_ST_SW_IRQ);
-	wmb();
+	sky2_read8(hw, B0_CTST);
 
 	wait_event_timeout(hw->msi_wait, hw->msi_detected, HZ/10);
 

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 6/9] sky2: optimize checksum offload information
Since many packets have the same checksum starting offset and insertion location, the driver can save the last information and only tell the hardware when it changes.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-08-28 10:00:08.0 -0700
+++ sky2/drivers/net/sky2.c	2006-08-28 10:00:13.0 -0700
@@ -1280,12 +1280,17 @@
 		if (skb->nh.iph->protocol == IPPROTO_UDP)
 			ctrl |= UDPTCP;
 
-		le = get_tx_le(sky2);
-		le->tx.csum.start = cpu_to_le16(hdr);
-		le->tx.csum.offset = cpu_to_le16(offset);
-		le->length = 0;	/* initial checksum value */
-		le->ctrl = 1;	/* one packet */
-		le->opcode = OP_TCPLISW | HW_OWNER;
+		if (hdr != sky2->tx_csum_start || offset != sky2->tx_csum_offset) {
+			sky2->tx_csum_start = hdr;
+			sky2->tx_csum_offset = offset;
+
+			le = get_tx_le(sky2);
+			le->tx.csum.start = cpu_to_le16(hdr);
+			le->tx.csum.offset = cpu_to_le16(offset);
+			le->length = 0;	/* initial checksum value */
+			le->ctrl = 1;	/* one packet */
+			le->opcode = OP_TCPLISW | HW_OWNER;
+		}
 	}
 
 	le = get_tx_le(sky2);
--- sky2.orig/drivers/net/sky2.h	2006-08-28 09:59:36.0 -0700
+++ sky2/drivers/net/sky2.h	2006-08-28 10:00:13.0 -0700
@@ -1843,6 +1843,8 @@
 	u32		     tx_addr64;
 	u16		     tx_pending;
 	u16		     tx_last_mss;
+	u16		     tx_csum_start;
+	u16		     tx_csum_offset;
 
 	struct ring_info     *rx_ring ____cacheline_aligned_in_smp;
 	struct sky2_rx_le    *rx_le;

--
Stephen Hemminger [EMAIL PROTECTED]
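The caching idea in this patch can be sketched outside the driver. The structure and function below are hypothetical stand-ins (not the sky2 types): the last checksum parameters handed to the hardware are remembered, and a new setup descriptor is queued only when they differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port transmit state; not the sky2 structure. */
struct tx_state {
	uint16_t csum_start;	/* last checksum start told to hardware */
	uint16_t csum_offset;	/* last insertion offset told to hardware */
	unsigned descriptors;	/* setup descriptors queued so far */
};

/*
 * Queue a checksum setup descriptor only if the parameters changed.
 * Returns true when a descriptor was queued.
 */
static bool tx_csum_setup(struct tx_state *tx, uint16_t start, uint16_t offset)
{
	assert(tx);
	if (start == tx->csum_start && offset == tx->csum_offset)
		return false;	/* hardware already has these values */

	tx->csum_start = start;
	tx->csum_offset = offset;
	tx->descriptors++;	/* stands in for filling a list element */
	return true;
}
```

One design point worth noting: the cached values must match what the hardware actually holds, so a real driver would presumably have to reset them whenever the transmit engine is reinitialized. The zero-initialized state here is only a placeholder for that.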
[PATCH 3/9] sky2: dont use force status bit
Don't use force status bit. It was never implemented on all chips, or has no impact.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-08-25 16:02:27.0 -0700
+++ sky2/drivers/net/sky2.c	2006-08-25 16:05:10.0 -0700
@@ -1192,7 +1192,6 @@
 	struct sky2_tx_le *le = NULL;
 	struct tx_ring_info *re;
 	unsigned i, len;
-	int avail;
 	dma_addr_t mapping;
 	u32 addr64;
 	u16 mss;
@@ -1328,12 +1327,8 @@
 	re->idx = sky2->tx_prod;
 	le->ctrl |= EOP;
 
-	avail = tx_avail(sky2);
-	if (mss != 0 || avail < TX_MIN_PENDING) {
-		le->ctrl |= FRC_STAT;
-		if (avail <= MAX_SKB_TX_LE)
-			netif_stop_queue(dev);
-	}
+	if (tx_avail(sky2) <= MAX_SKB_TX_LE)
+		netif_stop_queue(dev);
 
 	sky2_put_idx(hw, txqaddr[sky2->port], sky2->tx_prod);
 
--- sky2.orig/drivers/net/sky2.h	2006-08-25 16:00:28.0 -0700
+++ sky2/drivers/net/sky2.h	2006-08-25 16:05:10.0 -0700
@@ -1748,7 +1748,6 @@
 	INIT_SUM= 1<<3,
 	LOCK_SUM= 1<<4,
 	INS_VLAN= 1<<5,
-	FRC_STAT= 1<<6,
 	EOP	= 1<<7,
 };
 

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH]: Add to MAINTAINERS file
This patch adds Jim Lewis to the MAINTAINERS file for the Spidernet network driver.

Signed-off-by: James K Lewis [EMAIL PROTECTED]
---
 MAINTAINERS |6 ++
 1 file changed, 6 insertions(+)

Index: linux-2.6.18-rc2/MAINTAINERS
===
--- linux-2.6.18-rc2.orig/MAINTAINERS	2006-08-21 16:59:25.0 -0500
+++ linux-2.6.18-rc2/MAINTAINERS	2006-08-21 17:19:14.0 -0500
@@ -2702,6 +2702,12 @@
 M:	[EMAIL PROTECTED]
 L:	linux-kernel@vger.kernel.org ?
 S:	Supported
+SPIDERNET NETWORK DRIVER for CELL
+P:	Jim Lewis
+M:	[EMAIL PROTECTED]
+L:	netdev@vger.kernel.org
+S:	Supported
+
 SRM (Alpha) environment access
 P:	Jan-Benedict Glaw
 M:	[EMAIL PROTECTED]
[PATCH 0/9] sky2 1.7 non-critical bug fixes
This is a set of non-critical fixes to the sky2 driver.

1. cloned skb tso bug fix
2. netdev_alloc_skb
3. don't use force status on transmit
4. MSI pci posting bug
5. TSO segment size optimization
6. checksum offload optimization
7. power up PHY only on network open
8. pci post bug before delay
9. version 1.7
[PATCH 5/9] sky2: TSO mss optimization
The MSS in the transmit engine only has to change if the TSO MTU changes. This means fewer commands to the chip when mixing TSO and regular data.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-08-28 10:00:07.0 -0700
+++ sky2/drivers/net/sky2.c	2006-08-28 10:00:08.0 -0700
@@ -1244,15 +1244,15 @@
 		mss += ((skb->h.th->doff - 5) * 4);	/* TCP options */
 		mss += (skb->nh.iph->ihl * 4) + sizeof(struct tcphdr);
 		mss += ETH_HLEN;
-	}
 
-	if (mss != sky2->tx_last_mss) {
-		le = get_tx_le(sky2);
-		le->tx.tso.size = cpu_to_le16(mss);
-		le->tx.tso.rsvd = 0;
-		le->opcode = OP_LRGLEN | HW_OWNER;
-		le->ctrl = 0;
-		sky2->tx_last_mss = mss;
+		if (mss != sky2->tx_last_mss) {
+			le = get_tx_le(sky2);
+			le->tx.tso.size = cpu_to_le16(mss);
+			le->tx.tso.rsvd = 0;
+			le->opcode = OP_LRGLEN | HW_OWNER;
+			le->ctrl = 0;
+			sky2->tx_last_mss = mss;
+		}
 	}
 
 	ctrl = 0;
@@ -1320,7 +1320,7 @@
 		le->opcode = OP_BUFFER | HW_OWNER;
 
 		fre = sky2->tx_ring
-			+ RING_NEXT((re - sky2->tx_ring) + i, TX_RING_SIZE);
+			+ RING_NEXT((re - sky2->tx_ring) + i, TX_RING_SIZE);
 		pci_unmap_addr_set(fre, mapaddr, mapping);
 	}
 

--
Stephen Hemminger [EMAIL PROTECTED]
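The context lines of the first hunk turn the TCP payload size (gso_size) into the full frame size that the chip is told about. A standalone sketch of that arithmetic follows; the function name and constants are illustrative, with 20 standing in for sizeof(struct tcphdr) and 14 for ETH_HLEN.

```c
#include <assert.h>
#include <stdint.h>

enum {
	ETH_HDR_LEN = 14,	/* Ethernet header (ETH_HLEN in the patch) */
	TCP_BASE_HDR_LEN = 20,	/* TCP header without options */
};

/*
 * Frame size for one TSO segment. tcp_doff and ip_ihl are header
 * lengths in 32-bit words, as stored in the TCP and IPv4 headers.
 */
static uint16_t tso_frame_len(uint16_t gso_size, unsigned tcp_doff,
			      unsigned ip_ihl)
{
	unsigned len = gso_size;

	assert(tcp_doff >= 5 && ip_ihl >= 5);
	len += (tcp_doff - 5) * 4;		/* TCP options */
	len += ip_ihl * 4 + TCP_BASE_HDR_LEN;	/* IPv4 + base TCP header */
	len += ETH_HDR_LEN;
	return (uint16_t)len;
}
```

For a 1500-byte MTU this comes out at 1514 both for a 1460-byte payload without TCP options and for a 1448-byte payload with 12 bytes of options, which is consistent with the patch's observation that the value rarely changes between packets and is worth caching.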
[PATCH 7/9] sky2: power down PHY when not up
To save power, don't enable power to the PHY until device is brought up. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- sky2.orig/drivers/net/sky2.c2006-08-28 10:00:13.0 -0700 +++ sky2/drivers/net/sky2.c 2006-08-28 10:00:17.0 -0700 @@ -195,7 +195,6 @@ static void sky2_set_power_state(struct sky2_hw *hw, pci_power_t state) { u16 power_control; - u32 reg1; int vaux; pr_debug(sky2_set_power_state %d\n, state); @@ -228,20 +227,9 @@ else sky2_write8(hw, B2_Y2_CLK_GATE, 0); - /* Turn off phy power saving */ - reg1 = sky2_pci_read32(hw, PCI_DEV_REG1); - reg1 = ~(PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD); - - /* looks like this XL is back asswards .. */ - if (hw-chip_id == CHIP_ID_YUKON_XL hw-chip_rev 1) { - reg1 |= PCI_Y2_PHY1_COMA; - if (hw-ports 1) - reg1 |= PCI_Y2_PHY2_COMA; - } - sky2_pci_write32(hw, PCI_DEV_REG1, reg1); - udelay(100); - if (hw-chip_id == CHIP_ID_YUKON_EC_U) { + u32 reg1; + sky2_pci_write32(hw, PCI_DEV_REG3, 0); reg1 = sky2_pci_read32(hw, PCI_DEV_REG4); reg1 = P_ASPM_CONTROL_MSK; @@ -253,15 +241,6 @@ case PCI_D3hot: case PCI_D3cold: - /* Turn on phy power saving */ - reg1 = sky2_pci_read32(hw, PCI_DEV_REG1); - if (hw-chip_id == CHIP_ID_YUKON_XL hw-chip_rev 1) - reg1 = ~(PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD); - else - reg1 |= (PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD); - sky2_pci_write32(hw, PCI_DEV_REG1, reg1); - udelay(100); - if (hw-chip_id == CHIP_ID_YUKON_XL hw-chip_rev 1) sky2_write8(hw, B2_Y2_CLK_GATE, 0); else @@ -285,7 +264,7 @@ sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_OFF); } -static void sky2_phy_reset(struct sky2_hw *hw, unsigned port) +static void sky2_gmac_reset(struct sky2_hw *hw, unsigned port) { u16 reg; @@ -533,6 +512,28 @@ gm_phy_write(hw, port, PHY_MARV_INT_MASK, PHY_M_DEF_MSK); } +static void sky2_phy_power(struct sky2_hw *hw, unsigned port, int onoff) +{ + u32 reg1; + static const u32 phy_power[] + = { PCI_Y2_PHY1_POWD, PCI_Y2_PHY2_POWD }; + + /* looks like this XL is back asswards .. 
*/ + if (hw-chip_id == CHIP_ID_YUKON_XL hw-chip_rev 1) + onoff = !onoff; + + reg1 = sky2_pci_read32(hw, PCI_DEV_REG1); + + if (onoff) + /* Turn off phy power saving */ + reg1 = ~phy_power[port]; + else + reg1 |= phy_power[port]; + + sky2_pci_write32(hw, PCI_DEV_REG1, reg1); + udelay(100); +} + /* Force a renegotiation */ static void sky2_phy_reinit(struct sky2_port *sky2) { @@ -1088,6 +1089,8 @@ if (!sky2-rx_ring) goto err_out; + sky2_phy_power(hw, port, 1); + sky2_mac_init(hw, port); /* Determine available ram buffer space (in 4K blocks). @@ -1421,7 +1424,7 @@ /* Stop more packets from being queued */ netif_stop_queue(dev); - sky2_phy_reset(hw, port); + sky2_gmac_reset(hw, port); /* Stop transmitter */ sky2_write32(hw, Q_ADDR(txqaddr[port], Q_CSR), BMU_STOP); @@ -1469,6 +1472,8 @@ imask = ~portirq_msk[port]; sky2_write32(hw, B0_IMSK, imask); + sky2_phy_power(hw, port, 0); + /* turn off LED's */ sky2_write16(hw, B0_Y2LED, LED_STAT_OFF); @@ -2403,7 +2408,7 @@ sky2_write32(hw, B0_HWE_IMSK, Y2_HWE_ALL_MASK); for (i = 0; i hw-ports; i++) - sky2_phy_reset(hw, i); + sky2_gmac_reset(hw, i); memset(hw-st_le, 0, STATUS_LE_BYTES); hw-st_idx = 0; -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/9] sky2: use netdev_alloc_skb
Use netdev_alloc_skb for buffer allocation to allow for headroom. This prevents bugs in code paths that assume extra space at the front and makes sky2 behave like other drivers. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- sky2.orig/drivers/net/sky2.c2006-08-25 16:01:33.0 -0700 +++ sky2/drivers/net/sky2.c 2006-08-25 16:02:27.0 -0700 @@ -954,14 +954,16 @@ /* * It appears the hardware has a bug in the FIFO logic that * cause it to hang if the FIFO gets overrun and the receive buffer - * is not aligned. ALso alloc_skb() won't align properly if slab - * debugging is enabled. + * is not 64 byte aligned. The buffer returned from netdev_alloc_skb is + * aligned except if slab debugging is enabled. */ -static inline struct sk_buff *sky2_alloc_skb(unsigned int size, gfp_t gfp_mask) +static inline struct sk_buff *sky2_alloc_skb(struct net_device *dev, +unsigned int length, +gfp_t gfp_mask) { struct sk_buff *skb; - skb = alloc_skb(size + RX_SKB_ALIGN, gfp_mask); + skb = __netdev_alloc_skb(dev, length + RX_SKB_ALIGN, gfp_mask); if (likely(skb)) { unsigned long p = (unsigned long) skb-data; skb_reserve(skb, ALIGN(p, RX_SKB_ALIGN) - p); @@ -997,7 +999,8 @@ for (i = 0; i sky2-rx_pending; i++) { struct ring_info *re = sky2-rx_ring + i; - re-skb = sky2_alloc_skb(sky2-rx_bufsize, GFP_KERNEL); + re-skb = sky2_alloc_skb(sky2-netdev, sky2-rx_bufsize, +GFP_KERNEL); if (!re-skb) goto nomem; @@ -1829,15 +1832,16 @@ * For small packets or errors, just reuse existing skb. * For larger packets, get new buffer. 
*/ -static struct sk_buff *sky2_receive(struct sky2_port *sky2, +static struct sk_buff *sky2_receive(struct net_device *dev, u16 length, u32 status) { + struct sky2_port *sky2 = netdev_priv(dev); struct ring_info *re = sky2-rx_ring + sky2-rx_next; struct sk_buff *skb = NULL; if (unlikely(netif_msg_rx_status(sky2))) printk(KERN_DEBUG PFX %s: rx slot %u status 0x%x len %d\n, - sky2-netdev-name, sky2-rx_next, status, length); + dev-name, sky2-rx_next, status, length); sky2-rx_next = (sky2-rx_next + 1) % sky2-rx_pending; prefetch(sky2-rx_ring + sky2-rx_next); @@ -1848,11 +1852,11 @@ if (!(status GMR_FS_RX_OK)) goto resubmit; - if (length sky2-netdev-mtu + ETH_HLEN) + if (length dev-mtu + ETH_HLEN) goto oversize; if (length copybreak) { - skb = alloc_skb(length + 2, GFP_ATOMIC); + skb = netdev_alloc_skb(dev, length + 2); if (!skb) goto resubmit; @@ -1867,7 +1871,7 @@ } else { struct sk_buff *nskb; - nskb = sky2_alloc_skb(sky2-rx_bufsize, GFP_ATOMIC); + nskb = sky2_alloc_skb(dev, sky2-rx_bufsize, GFP_ATOMIC); if (!nskb) goto resubmit; @@ -1897,7 +1901,7 @@ if (netif_msg_rx_err(sky2) net_ratelimit()) printk(KERN_INFO PFX %s: rx error, status 0x%x length %d\n, - sky2-netdev-name, status, length); + dev-name, status, length); if (status (GMR_FS_LONG_ERR | GMR_FS_UN_SIZE)) sky2-net_stats.rx_length_errors++; @@ -1951,11 +1955,10 @@ switch (le-opcode ~HW_OWNER) { case OP_RXSTAT: - skb = sky2_receive(sky2, length, status); + skb = sky2_receive(dev, length, status); if (!skb) break; - skb-dev = dev; skb-protocol = eth_type_trans(skb, dev); dev-last_rx = jiffies; -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
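The alignment step in sky2_alloc_skb(), skb_reserve(skb, ALIGN(p, RX_SKB_ALIGN) - p), over-allocates by the alignment and then skips just enough bytes to reach the next aligned address. A small userspace sketch of that computation follows; the helper names are made up, and the 64-byte value is taken from the comment in the patch rather than from the driver's actual RX_SKB_ALIGN definition.

```c
#include <assert.h>
#include <stdint.h>

/* Round p up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t p, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (p + align - 1) & ~(align - 1);
}

/*
 * Number of bytes to reserve at the front of a buffer starting at
 * address data so that the payload begins on an aligned boundary.
 */
static uintptr_t reserve_for_alignment(uintptr_t data, uintptr_t align)
{
	return align_up(data, align) - data;
}
```

The result is always smaller than the alignment, which is why allocating length + RX_SKB_ALIGN bytes leaves room for the full requested length after the reserve.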
Re: [patch 5/5] d80211: add ioctl to stop data frame tx
Hi Johannes, Johannes Berg [EMAIL PROTECTED] writes: On Tue, 2006-08-22 at 10:34 -0700, David Kimdon wrote: This ioctl is used when radar is detected on a channel. Data frames must stop but management frames must be allowed to continue for some time to communicate the channel switch to stations. Which does lead to the question: How are you detecting radar in userspace in the first place?? I've been working on merging Devicescape's 802.11h / radar detection implementation into the open source hostapd (and the wireless-dev kernel). Radar is initially detected by the low-level radio driver. Userspace gets notified of radar via calls to ieee80211_radar_status, which generates a fake management frame with a struct ieee80211_radar_info in it. Userspace is then responsible for handling the resultant activities, such as stopping transmission on that channel, selecting another channel, sending out channel switch announcements, changing channels, and remembering to block use of the old channel for the required time. I'll reply to your and Jiri's other question separately. Thanks, elliot
[PATCH 1/1][SCTP]: Fix sctp_primitive_ABORT() call in sctp_close()
Dave,

The recent SCTP CVE fix that went into 2.6.18 changed sctp_primitive_ABORT() callers to create an ABORT chunk and pass it as an arg instead of struct msghdr. While submitting this fix, i missed the other location in sctp_close() where this is called. Please apply this patch to 2.6.18 and it should also go into the stable series.

Thanks
Sridhar

[SCTP]: Fix sctp_primitive_ABORT() call in sctp_close().

With the recent fix, the callers of sctp_primitive_ABORT() need to create an ABORT chunk and pass it as an argument rather than msghdr that was passed earlier.

Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index fde3f55..dab1594 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1289,9 +1289,13 @@ SCTP_STATIC void sctp_close(struct sock
 			}
 		}
 
-		if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime)
-			sctp_primitive_ABORT(asoc, NULL);
-		else
+		if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {
+			struct sctp_chunk *chunk;
+
+			chunk = sctp_make_abort_user(asoc, NULL, 0);
+			if (chunk)
+				sctp_primitive_ABORT(asoc, chunk);
+		} else
 			sctp_primitive_SHUTDOWN(asoc, NULL);
 	}
 
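For readers wondering when the ABORT branch in sctp_close() is taken: it depends on the socket having SO_LINGER enabled with a zero timeout. The sketch below shows that condition from the application side. abortive_linger() and close_sends_abort() are hypothetical helper names used only for illustration; the predicate simply mirrors the test in the hunk above.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* SO_LINGER value that asks for an abortive close: enabled, zero timeout. */
static inline struct linger abortive_linger(void)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };
	return lg;
}

/* Mirrors the kernel test: SOCK_LINGER set and sk_lingertime == 0. */
static inline bool close_sends_abort(const struct linger *lg)
{
	return lg->l_onoff != 0 && lg->l_linger == 0;
}
```

An application would pass the structure to setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) before calling close(); any other setting ends up in the SHUTDOWN branch.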
Re: [PATCH 5/10] rt2x00: Register initialization fixes
On Monday 28 August 2006 18:08, John W. Linville wrote: On Sun, Aug 27, 2006 at 05:39:14PM +0200, Ivo van Doorn wrote: Various register initialization fixes to make the device work properly. This will fix the RX/TX issue for rt61pci. Signed-off-by Ivo van Doorn [EMAIL PROTECTED] --- diff -rU3 wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2400pci.c wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2400pci.c --- wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2400pci.c 2006-08-27 16:11:40.0 +0200 +++ wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2400pci.c 2006-08-27 16:17:02.0 +0200 @@ -1192,11 +1192,7 @@ rt2x00_register_write(rt2x00dev, RXCSR0, reg); rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00217223)); - - rt2x00_register_read(rt2x00dev, MACCSR1, reg); - rt2x00_set_field32(reg, MACCSR1_AUTO_TXBBP, 1); - rt2x00_set_field32(reg, MACCSR1_AUTO_RXBBP, 1); - rt2x00_register_write(rt2x00dev, MACCSR1, reg); + rt2x00_register_write(rt2x00dev, MACCSR1, cpu_to_le32(0x00235518)); rt2x00_register_read(rt2x00dev, MACCSR2, reg); rt2x00_set_field32(reg, MACCSR2_DELAY, 64); diff -rU3 wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2500pci.c wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2500pci.c --- wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2500pci.c 2006-08-27 16:12:03.0 +0200 +++ wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2500pci.c 2006-08-27 16:17:56.0 +0200 @@ -1249,6 +1249,7 @@ return -EBUSY; rt2x00_register_write(rt2x00dev, PWRCSR0, cpu_to_le32(0x3f3b3100)); + rt2x00_register_write(rt2x00dev, PCICSR, cpu_to_le32(0x03b8)); rt2x00_register_write(rt2x00dev, PSCSR0, cpu_to_le32(0x00020002)); rt2x00_register_write(rt2x00dev, PSCSR1, cpu_to_le32(0x0002)); @@ -1272,12 +1273,11 @@ rt2x00_set_field32(reg, RXCSR0_DISABLE_RX, 0); rt2x00_register_write(rt2x00dev, RXCSR0, reg); - rt2x00_register_write(rt2x00dev, 
MACCSR0, cpu_to_le32(0x00213223)); + rt2x00_register_write(rt2x00dev, GPIOCSR, cpu_to_le32(0xff00)); + rt2x00_register_write(rt2x00dev, TESTCSR, cpu_to_le32(0x00f0)); - rt2x00_register_read(rt2x00dev, MACCSR1, reg); - rt2x00_set_field32(reg, MACCSR1_AUTO_TXBBP, 1); - rt2x00_set_field32(reg, MACCSR1_AUTO_RXBBP, 1); - rt2x00_register_write(rt2x00dev, MACCSR1, reg); + rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00213223)); + rt2x00_register_write(rt2x00dev, MACCSR1, cpu_to_le32(0x00235518)); rt2x00_register_read(rt2x00dev, MACCSR2, reg); rt2x00_set_field32(reg, MACCSR2_DELAY, 64); Lots of magic numbers here...can we do something about that? Only partially, there are currently a couple problems with: - Some of the above registers are documented, but there is however a main problem with current documentation from Ralink, not all register maps are correct. The driver indicates some other function for the register than our documentation claims. - The register not defined, the documentation gives no details and I cannot understand the meaning of the value from the original Ralink code. This is a main problem with the original code since its code contains a great deal of magical values, most have them have been analysed and its meaning or origin was determined. There are however still some magical values left. - In some other situations it is indeed possible to use the bitmaps as defined in the header, but those would not always explain the meaning of the value clearer (GPIOCSR is for example just a list of BIT0, BIT1, BIT2 etc). - Using the defines fom the headers and using the rt2x00_set_field32 would result in quite a lot of cpu_to_le32 calls, this would be a waste on big endian machines when the particular register is only used once. I will go over the values again and see which ones can be made clearer with comments, and if using the rt2x00_set/get_field32 can be used instead to make things clearer. 
Ivo
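To make the trade-off discussed above concrete, here is a small sketch of building a register value from named bit fields instead of writing a magic constant. The field names and bit positions are hypothetical (they are not the real MACCSR1 layout), and set_field32()/get_field32() are local stand-ins, not the rt2x00 helpers.

```c
#include <stdint.h>

/* A register field described by its bit offset and mask. */
struct reg_field {
	uint32_t shift;
	uint32_t mask;
};

/* Hypothetical fields for illustration only. */
static const struct reg_field EXAMPLE_AUTO_TXBBP = { 4, 0x00000010 };
static const struct reg_field EXAMPLE_AUTO_RXBBP = { 7, 0x00000080 };

/* Clear the field, then store value in it (excess bits are dropped). */
static inline void set_field32(uint32_t *reg, struct reg_field f, uint32_t value)
{
	*reg &= ~f.mask;
	*reg |= (value << f.shift) & f.mask;
}

/* Extract the field value. */
static inline uint32_t get_field32(uint32_t reg, struct reg_field f)
{
	return (reg & f.mask) >> f.shift;
}
```

One way to address the endianness concern is to assemble the whole value in CPU byte order with helpers like these and convert it once, just before the register write.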
[RFC][PATCHv2 2.6.18-rc4-mm3 3/3] net/ipv4: misc. support files
[Net/IPv4]: REVISED Miscellaneous changes which complete the v4 support for UDP-Lite. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] --- include/linux/in.h |1 + include/linux/socket.h |1 + include/net/snmp.h |2 ++ include/net/xfrm.h |2 ++ net/ipv4/af_inet.c | 15 ++- net/ipv4/proc.c| 16 ++-- net/ipv6/udp.c |1 + 7 files changed, 35 insertions(+), 3 deletions(-) diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 877e5b3..43faef2 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1223,10 +1223,14 @@ static int __init init_ipv4_mibs(void) tcp_statistics[1] = alloc_percpu(struct tcp_mib); udp_statistics[0] = alloc_percpu(struct udp_mib); udp_statistics[1] = alloc_percpu(struct udp_mib); + udplite_statistics[0] = alloc_percpu(struct udp_mib); + udplite_statistics[1] = alloc_percpu(struct udp_mib); + if (! (net_statistics[0] net_statistics[1] ip_statistics[0] ip_statistics[1] tcp_statistics[0] tcp_statistics[1] - udp_statistics[0] udp_statistics[1])) + udp_statistics[0] udp_statistics[1] + udplite_statistics[0] udplite_statistics[1] ) ) return -ENOMEM; (void) tcp_mib_init(); @@ -1300,6 +1304,11 @@ #endif inet_register_protosw(q); /* +* Add UDP-Lite (RFC 3828) +*/ + udplite4_register(); + + /* * Set the ARP module up */ @@ -1367,6 +1376,8 @@ static int __init ipv4_proc_init(void) goto out_tcp; if (udp4_proc_init()) goto out_udp; + if (udplite4_proc_init()) + goto out_udplite; if (fib_proc_init()) goto out_fib; if (ip_misc_proc_init()) @@ -1376,6 +1387,8 @@ out: out_misc: fib_proc_exit(); out_fib: + udplite4_proc_exit(); +out_udplite: udp4_proc_exit(); out_udp: tcp4_proc_exit(); diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index 9c6cbe3..608fe34 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -66,9 +66,10 @@ static int sockstat_seq_show(struct seq_ tcp_death_row.tw_count, atomic_read(tcp_sockets_allocated), atomic_read(tcp_memory_allocated)); seq_printf(seq, UDP: inuse %d\n, fold_prot_inuse(udp_prot)); + seq_printf(seq, UDPLITE: inuse %d\n, 
fold_prot_inuse(udplite_prot)); seq_printf(seq, RAW: inuse %d\n, fold_prot_inuse(raw_prot)); - seq_printf(seq, FRAG: inuse %d memory %d\n, ip_frag_nqueues, - atomic_read(ip_frag_mem)); + seq_printf(seq, FRAG: inuse %d memory %d\n, ip_frag_nqueues, +atomic_read(ip_frag_mem)); return 0; } @@ -304,6 +305,17 @@ static int snmp_seq_show(struct seq_file fold_field((void **) udp_statistics, snmp4_udp_list[i].entry)); + /* the UDP and UDP-Lite MIBs are the same */ + seq_puts(seq, \nUdpLite:); + for (i = 0; snmp4_udp_list[i].name != NULL; i++) + seq_printf(seq, %s, snmp4_udp_list[i].name); + + seq_puts(seq, \nUdpLite:); + for (i = 0; snmp4_udp_list[i].name != NULL; i++) + seq_printf(seq, %lu, + fold_field((void **) udplite_statistics, + snmp4_udp_list[i].entry) ); + seq_putc(seq, '\n'); return 0; } diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index b9cc55c..b72540b 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1073,6 +1073,7 @@ static struct udp_seq_afinfo udp6_seq_af .owner = THIS_MODULE, .name = udp6, .family = AF_INET6, + .hashtable = udp_hash, .seq_show = udp6_seq_show, .seq_fops = udp6_seq_fops, }; diff --git a/include/linux/in.h b/include/linux/in.h index 94f557f..5ada82e 100644 --- a/include/linux/in.h +++ b/include/linux/in.h @@ -44,6 +44,7 @@ enum { IPPROTO_COMP = 108,/* Compression Header protocol */ IPPROTO_SCTP = 132,/* Stream Control Transport Protocol */ + IPPROTO_UDPLITE = 136, /* UDP-Lite (RFC 3828) */ IPPROTO_RAW = 255, /* Raw IP packets */ IPPROTO_MAX diff --git a/include/linux/socket.h b/include/linux/socket.h index 3614090..592b666 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -264,6 +264,7 @@ #define SOL_UDP 17 #define SOL_IPV6 41 #define SOL_ICMPV6 58 #define SOL_SCTP 132 +#define SOL_UDPLITE136 /* UDP-Lite (RFC 3828) */ #define SOL_RAW255 #define SOL_IPX256 #define SOL_AX25 257 diff --git a/include/net/snmp.h b/include/net/snmp.h index
Re: [RFC][PATCHv2 2.6.18-rc4-mm3 3/3] net/ipv4: misc. support files
[EMAIL PROTECTED] wrote:

[Net/IPv4]: REVISED Miscellaneous changes which complete the v4 support for UDP-Lite.

--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -467,6 +467,7 @@ u16 xfrm_flowi_sport(struct flowi *fl)
 	switch(fl->proto) {
 	case IPPROTO_TCP:
 	case IPPROTO_UDP:
+	case IPPROTO_UDPLITE:
 	case IPPROTO_SCTP:
 		port = fl->fl_ip_sport;
 		break;
@@ -492,6 +493,7 @@ u16 xfrm_flowi_dport(struct flowi *fl)
 	switch(fl->proto) {
 	case IPPROTO_TCP:
 	case IPPROTO_UDP:
+	case IPPROTO_UDPLITE:
 	case IPPROTO_SCTP:
 		port = fl->fl_ip_dport;
 		break;

You also need to adapt _decode_session[46] in xfrm[46]_policy.c for IPsec. While you're at it you might consider adjusting xt_tcpudp, xt_multiport, ipt_LOG and ip6t_LOG as well to get some basic netfilter support. I'm going to take care of connection tracking and NAT once this is in mainline.
Re: 2.6.18-rc4-mm[1,2,3] -- Network card not getting assigned an eth device name
On Mon, 28 Aug 2006 08:52:02 -0700 Miles Lane [EMAIL PROTECTED] wrote:

On 8/27/06, David Miller [EMAIL PROTECTED] wrote:

From: Andrew Morton [EMAIL PROTECTED] Date: Sun, 27 Aug 2006 00:19:43 -0700

Jeremy reported that a while back too. I do not know what is causing it and as far as I know no net developers have yet looked into it.

A debugging patch like this one should help figure out the culprit. If we don't see the gibberish netdevice name printed in the kernel logs, then likely something is corrupting the netdevice structure or the memory holding the name.

diff --git a/net/core/dev.c b/net/core/dev.c
index d4a1ec3..45f9b19 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -738,6 +738,11 @@ int dev_change_name(struct net_device *d
 	if (!dev_valid_name(newname))
 		return -EINVAL;
 
+#if 1
+	printk("[%s:%d]: Changing netdevice name from [%s] to [%s]\n",
+	       current->comm, current->pid,
+	       dev->name, newname);
+#endif
 	if (strchr(newname, '%')) {
 		err = dev_alloc_name(dev, newname);

Dan, do you have any idea why NetworkManager from Ubuntu 6.06.1 would be corrupting network device names on recent MM kernels? I haven't seen this happening with Ubuntu's kernels. If you like, I can send you my kernel .config file. Here's what I get:

grepping for `ioctl' gives:

ioctl(9, SIOCGIWNAME, 0xbfe38d8c) = -1 EINVAL (Invalid argument)
ioctl(9, SIOCETHTOOL, 0xbfe38d2c) = 0
ioctl(11, SIOCGIFHWADDR, {ifr_name=eth0, ???}) = -1 ENODEV (No such device)
ioctl(11, SIOCGIFFLAGS, {ifr_name=eth0, ???}) = -1 ENODEV (No such device)

Perhaps you could generate the strace output for 2.6.18-rc5, grep that for ioctl, look for differences? That initial SIOCGIWNAME failure is fishy.
Re: [PATCH] [IPV6] ROUTE: Fix dst refcounting.
From: YOSHIFUJI Hideaki [EMAIL PROTECTED] Date: Tue, 29 Aug 2006 01:46:49 +0900 (JST) [IPV6] ROUTE: Fix dst reference counting in ip6_pol_route_lookup(). In ip6_pol_route_lookup(), when we finish backtracking at the top-level root entry, we need to hold it. Bug noticed by Mitsuru Chinen [EMAIL PROTECTED]. Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] Applied, thank you.
[PATCH 0/7] d80211: support more wireless command
The following patches are based on the earlier patches. I separated each set of commands into its own patch, with enhancements based on your comments.
Re: [PATCH] net/*: use SLAB_PANIC
From: Christoph Hellwig [EMAIL PROTECTED] Date: Mon, 28 Aug 2006 10:36:51 +0100 ipv6 can be modular, so panicking on an initialization failure is wrong. That may be the case, but he merely translated the code as it existed; he didn't change it to start panic()'ing, it already did. It would be a separate change to undo the panic() in the ipv6 code.
[PATCH 3/7] d80211: report supported rates and channels in SIOCGIWRANGE
This patch modify d80211 to report more information like supported rate and channel in SIOCGIWRANGE command. Signed-off-by: Mohamed Abbas [EMAIL PROTECTED] diff --git a/net/d80211/ieee80211_ioctl.c b/net/d80211/ieee80211_ioctl.c index 89a58e3..3d8156c 100644 --- a/net/d80211/ieee80211_ioctl.c +++ b/net/d80211/ieee80211_ioctl.c @@ -1566,6 +1566,10 @@ static int ieee80211_ioctl_giwrange(stru struct iw_point *data, char *extra) { struct iw_range *range = (struct iw_range *) extra; + int i,j,c,n; + int skip = 0; + struct ieee80211_local *local = dev-ieee80211_ptr; + struct ieee80211_hw_modes *bg = NULL; data-length = sizeof(struct iw_range); memset(range, 0, sizeof(struct iw_range)); @@ -1581,6 +1585,55 @@ static int ieee80211_ioctl_giwrange(stru range-min_frag = 256; range-max_frag = 2346; + j = 0; + for (i = 0; i local-num_curr_rates j IW_MAX_BITRATES; i++) { + struct ieee80211_rate *rate = local-curr_rates[i]; + + if (rate-flags IEEE80211_RATE_SUPPORTED) { + range-bitrate[j] = rate-rate * 10; + j++; + } + } + range-num_bitrates = j; + + c = 0; + for (i = 0; i local-hw-num_modes; i++) { + struct ieee80211_hw_modes *mode = local-hw-modes[i]; + + for (j = 0; + j mode-num_channels c IW_MAX_FREQUENCIES; j++) { + struct ieee80211_channel *chan = mode-channels[j]; + + /* skip any repeated bg channel */ + skip = 0; + if (bg + ((mode-mode == MODE_IEEE80211G) || + (mode-mode == MODE_IEEE80211B))) { + +for (n = 0; n bg-num_channels; n++) { + if (bg-channels[0].chan == chan-chan){ + skip = 1; + break; + } +} + } + + if (skip) +continue; + + range-freq[c].i = chan-chan; + range-freq[c].m = chan-freq * 10; + range-freq[c].e = 1; + c++; + } + if (!bg ((mode-mode == MODE_IEEE80211G) || + (mode-mode == MODE_IEEE80211B))) + bg = mode; + + } + range-num_channels = c; + range-num_frequency = c; + return 0; }
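As background for the range->freq[] assignments above: Wireless Extensions express a frequency as a mantissa m and a decimal exponent e, meaning m × 10^e Hz. The sketch below shows one way to encode a value given in MHz with e = 1. It uses a local stand-in struct rather than struct iw_freq, the helper names are hypothetical, and it assumes the driver stores channel frequencies in MHz; the exact multiplier in the posted patch is not legible in this archive.

```c
#include <stdint.h>

/* Local stand-in for the mantissa/exponent pair of struct iw_freq. */
struct freq_me {
	int32_t m;	/* mantissa */
	int16_t e;	/* decimal exponent */
};

/* Encode a frequency given in MHz as m * 10^e Hz, using e = 1. */
static inline struct freq_me encode_mhz(int32_t mhz)
{
	struct freq_me f = { mhz * 100000, 1 };
	return f;
}

/* Decode back to Hz by applying the decimal exponent. */
static inline int64_t decode_hz(struct freq_me f)
{
	int64_t hz = f.m;
	for (int i = 0; i < f.e; i++)
		hz *= 10;
	return hz;
}
```

With e = 1 the mantissa for 5 GHz channels still fits in 32 bits, which is presumably why this scaling is common.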
RE: e1000 and 802.1ad/stacked vlan tagging
Stephan von Krawczynski wrote: thank you for answering anyway. Though I think your answer covers only the obvious half of the problem. Indeed one might think that this solves the issue - as long as there are only linux kernels involved. Unfortunately my setup is a bit more complicated in terms of hardware. So I should have probably clarified the question this way: how do you configure the interface in a manner that packets with data length of 1500 get transferred, and not only 1496 ? I tried setting mtu 2000 and everything worked with the virtual device (both) mtu at 1500. If you are getting long_packet errors at the mtu settings you tried (you didn't mention which ones) then the hardware is dropping the packets due to being over 1522 bytes in length (including CRC). I tried enlarging both real-device and first vlan interface mtu but that does not work out. I really thought that the visible device setting of mtu=1500 should have worked out and that the driver (or some code in between) should have corrected the allowed frame size to reflect the actual setup, not? unfortunately I believe that your hardware MTU on the base interface MUST be adjusted in order to do stacked vlans because the vlan code doesn't fragment packets, it just inserts tags. The e1000 hardware is capable of inserting/stripping 1 level of tags without dropping overlong frames, but cannot seamlessly handle 1+n levels of inserted tags. Transmit, we don't care how long the frames are that are given to us (the driver doesn't enforce MTU on transmit) but on receive we have limited space so it is important that each frame fit into the allocated buffer (including CRC). Please try MTU 1508 on eth0 (base interface only), as that configuration worked for me. Jesse
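The numbers in this exchange (1496, 1500, 1508, 1522) follow from simple framing arithmetic: a 14-byte Ethernet header, a 4-byte CRC, and 4 bytes per VLAN tag. The sketch below works them through; the function names are illustrative, and base_mtu_for() is a conservative assumption that counts every tag, even though the hardware is said above to tolerate one tag on its own.

```c
/* Ethernet framing constants, in bytes. */
#define ETH_HDR_LEN   14
#define ETH_CRC_LEN    4
#define VLAN_TAG_LEN   4

/* On-wire frame length for a payload carried under vlan_tags stacked tags. */
static inline int frame_len(int payload, int vlan_tags)
{
	return ETH_HDR_LEN + vlan_tags * VLAN_TAG_LEN + payload + ETH_CRC_LEN;
}

/* Base-interface MTU that leaves room for payload plus vlan_tags tags,
 * assuming the VLAN code only inserts tags and never fragments. */
static inline int base_mtu_for(int payload, int vlan_tags)
{
	return payload + vlan_tags * VLAN_TAG_LEN;
}
```

A 1500-byte payload with one tag gives the 1522-byte limit mentioned above, two tags give 1526 bytes, and base_mtu_for(1500, 2) yields the suggested MTU of 1508.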
[PATCH 6/7] d80211: display supported rates in readable format
This patch modify d80211 to report supported rates in readable format in iwlist scan command. Signed-off-by: Mohamed Abbas [EMAIL PROTECTED] diff --git a/net/d80211/ieee80211_sta.c b/net/d80211/ieee80211_sta.c index a933d92..b2e45a4 100644 --- a/net/d80211/ieee80211_sta.c +++ b/net/d80211/ieee80211_sta.c @@ -2714,15 +2714,21 @@ ieee80211_sta_scan_result(struct net_dev current_ev = iwe_stream_add_point(current_ev, end_buf, iwe, buf); - p = buf; - p += sprintf(p, supp_rates=); - for (i = 0; i bss-supp_rates_len; i++) - p+= sprintf(p, %02x, bss-supp_rates[i]); - memset(iwe, 0, sizeof(iwe)); - iwe.cmd = IWEVCUSTOM; - iwe.u.data.length = strlen(buf); - current_ev = iwe_stream_add_point(current_ev, end_buf, iwe, - buf); + /* dispaly all support rates in readable format */ + p = current_ev + IW_EV_LCP_LEN; + iwe.cmd = SIOCGIWRATE; + /* Those two flags are ignored... */ + iwe.u.bitrate.fixed = iwe.u.bitrate.disabled = 0; + + for (i = 0; i bss-supp_rates_len; i++) { + iwe.u.bitrate.value = ((bss-supp_rates[i] + 0x7f) * 50); + p = iwe_stream_add_value(current_ev, p, + end_buf, iwe, IW_EV_PARAM_LEN); + } + /* Check if we added any rate */ + if((p - current_ev) IW_EV_LCP_LEN) + current_ev = p; kfree(buf); break;
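The conversion in the loop above relies on how 802.11 encodes a Supported Rates octet: the low seven bits give the rate in units of 500 kb/s, and the top bit marks a basic rate. A minimal sketch of that decoding follows; the function and macro names are illustrative and do not come from the patch.

```c
#include <stdint.h>

/* Bit 7 marks a basic rate; bits 0-6 carry the rate in 500 kb/s units. */
#define RATE_BASIC_FLAG 0x80u
#define RATE_VALUE_MASK 0x7fu

/* Convert one Supported Rates octet to bits per second. */
static inline uint32_t supp_rate_to_bps(uint8_t octet)
{
	return (uint32_t)(octet & RATE_VALUE_MASK) * 500000u;
}

/* Report whether the octet is flagged as a basic rate. */
static inline int supp_rate_is_basic(uint8_t octet)
{
	return (octet & RATE_BASIC_FLAG) != 0;
}
```

For example, the octet 0x82 decodes to a basic rate of 1 Mb/s and 0x6c to 54 Mb/s.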
[PATCH 5/7] d80211: indicate if unassociate/radio off status
This patch indicates the unassociated and radio-off status in the name field.

Signed-off-by: Mohamed Abbas [EMAIL PROTECTED]

diff --git a/net/d80211/ieee80211_ioctl.c b/net/d80211/ieee80211_ioctl.c
index 89a58e3..44b2698 100644
--- a/net/d80211/ieee80211_ioctl.c
+++ b/net/d80211/ieee80211_ioctl.c
@@ -1538,6 +1538,19 @@ static int ieee80211_ioctl_giwname(struc
 				   char *name, char *extra)
 {
 	struct ieee80211_local *local = dev->ieee80211_ptr;
+	struct ieee80211_sub_if_data *sdata;
+
+	sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+	if (!local->conf.radio_enabled) {
+		strcpy(name, "radio off");
+		return 0;
+	} else if (sdata->type == IEEE80211_IF_TYPE_STA) {
+		if ((sdata->u.sta.state != IEEE80211_ASSOCIATED) ||
+		    (sdata->u.sta.probereq_poll)) {
+			strcpy(name, "unassociated");
+			return 0;
+		}
+	}
 
 	switch (local->conf.phymode) {
 	case MODE_IEEE80211A:
Re: [PATCH 1/1][SCTP]: Fix sctp_primitive_ABORT() call in sctp_close()
From: Sridhar Samudrala [EMAIL PROTECTED] Date: Mon, 28 Aug 2006 12:11:36 -0700 The recent SCTP CVE fix that went into 2.6.18 changed sctp_primitive_ABORT() callers to create an ABORT chunk and pass it as an arg instead of struct msghdr. While submitting this fix, i missed the other location in sctp_close() where this is called. Please apply this patch to 2.6.18 and it should also go into the stable series. This shows why embargoes are detrimental to Linux kernel development. There might have been a chance of this being caught earlier if the original bug fix had been posted for review here on netdev. Now we have the situation where a -stable kernel release went out with the bogus version of the fix because it got _ZERO_ public review, and it is likely that several distributions have therefore shipped kernels with this problem too. This is unacceptable. Please everyone, learn from this, and understand that I will always ignore embargoed bug reports sent to me. Discuss networking bugs, no matter how severe, here on netdev from the beginning so we can fix things correctly and not via some private group of individuals. Thanks.
[PATCH 4/7] d80211: add support for SIOCSIWNICKN SIOCGIWNICKN
This patch modify d80211 to add nick wireless command Signed-off-by: Mohamed Abbas [EMAIL PROTECTED] diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h index 0d2d79d..02242c6 100644 --- a/net/d80211/ieee80211_i.h +++ b/net/d80211/ieee80211_i.h @@ -241,6 +241,7 @@ struct ieee80211_if_sta { IEEE80211_IBSS_SEARCH, IEEE80211_IBSS_JOINED } state; struct timer_list timer; + u8 nick[IW_ESSID_MAX_SIZE]; u8 bssid[ETH_ALEN], prev_bssid[ETH_ALEN]; u8 ssid[IEEE80211_MAX_SSID_LEN]; size_t ssid_len; diff --git a/net/d80211/ieee80211_ioctl.c b/net/d80211/ieee80211_ioctl.c index 89a58e3..956eabb 100644 --- a/net/d80211/ieee80211_ioctl.c +++ b/net/d80211/ieee80211_ioctl.c @@ -2153,6 +2153,39 @@ static void ieee80211_ioctl_unmask_chann } +static int ieee80211_ioctl_siwnick(struct net_device *dev, + struct iw_request_info *info, + union iwreq_data *wrqu, char *extra) +{ + struct ieee80211_sub_if_data *sdata; + struct ieee80211_if_sta *ifsta; + + sdata = IEEE80211_DEV_TO_SUB_IF(dev); + ifsta = sdata-u.sta; + if (wrqu-data.length = IW_ESSID_MAX_SIZE) + return -E2BIG; + + memset(ifsta-nick, 0, sizeof(ifsta-nick)); + memcpy(ifsta-nick, extra, wrqu-data.length); + return 0; +} + +static int ieee80211_ioctl_giwnick(struct net_device *dev, + struct iw_request_info *info, + union iwreq_data *wrqu, char *extra) +{ + struct ieee80211_sub_if_data *sdata; + struct ieee80211_if_sta *ifsta; + + sdata = IEEE80211_DEV_TO_SUB_IF(dev); + ifsta = sdata-u.sta; + + wrqu-data.length = strlen(ifsta-nick) + 1; + memcpy(extra, ifsta-nick, wrqu-data.length); + wrqu-data.flags = 1; /* active */ + return 0; +} + static int ieee80211_ioctl_test_mode(struct net_device *dev, int mode) { struct ieee80211_local *local = dev-ieee80211_ptr; @@ -3138,8 +3171,8 @@ static const iw_handler ieee80211_handle (iw_handler) ieee80211_ioctl_giwscan, /* SIOCGIWSCAN */ (iw_handler) ieee80211_ioctl_siwessid, /* SIOCSIWESSID */ (iw_handler) ieee80211_ioctl_giwessid, /* SIOCGIWESSID */ - (iw_handler) NULL,/* 
SIOCSIWNICKN */ - (iw_handler) NULL,/* SIOCGIWNICKN */ + (iw_handler) ieee80211_ioctl_siwnick, /* SIOCSIWNICKN */ + (iw_handler) ieee80211_ioctl_giwnick, /* SIOCGIWNICKN */ (iw_handler) NULL,/* -- hole -- */ (iw_handler) NULL,/* -- hole -- */ (iw_handler) NULL,/* SIOCSIWRATE */
[PATCH 7/7] d80211: getting wrong freq value if we did hardware scan
In this patch we search all A-BAND available channels to get the right frequency value. this might not be the right thing to do in beacon parsing. Another approach is to have a static array of the maximum A-BAND channel number then we can map from channel to frequency fast. we can set the values of this array at run time. This patch modify d80211 to fix getting wrong frequency value for scan implemented in hardware. With harware scan we might get beacon of a network that is on different channel that in local-conf.channel causing set freq to wrong value. Signed-off-by: Mohamed Abbas [EMAIL PROTECTED] diff --git a/net/d80211/ieee80211_sta.c b/net/d80211/ieee80211_sta.c index a933d92..374193e 100644 --- a/net/d80211/ieee80211_sta.c +++ b/net/d80211/ieee80211_sta.c @@ -1543,8 +1543,6 @@ #endif bss-channel = channel; bss-freq = local-conf.freq; if (channel != local-conf.channel - (local-conf.phymode == MODE_IEEE80211G || - local-conf.phymode == MODE_IEEE80211B) channel = 1 channel = 14) { static const int freq_list[] = { 2412, 2417, 2422, 2427, 2432, 2437, 2442, @@ -1553,6 +1551,32 @@ #endif /* IEEE 802.11g/b mode can receive packets from neighboring * channels, so map the channel into frequency. */ bss-freq = freq_list[channel - 1]; + + if (bss-hw_mode != MODE_IEEE80211G + bss-hw_mode != MODE_IEEE80211B) + bss-hw_mode = MODE_IEEE80211G; + + } else if (channel != local-conf.channel ) { + int j, i; + int b_found = 0; + + /* not a bg channel search in other mode */ + for (i = 0; i local-hw-num_modes; i++) { + struct ieee80211_hw_modes *mode = local-hw-modes[i]; + + if ((mode-mode != MODE_IEEE80211G) + (mode-mode != MODE_IEEE80211B)){ +for (j = 0; mode-num_channels; j++) +if (mode-channels[j].chan == channel) { + bss-freq = mode-channels[j].freq; + b_found = 1; + bss-hw_mode = mode-mode; + break; +} + } + if (b_found) +break; + } } bss-timestamp = timestamp; bss-last_update = jiffies;
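The freq_list[] table in the hunk above covers the 2.4 GHz band, where the mapping from channel number to centre frequency is regular enough to compute: channels 1 to 13 are spaced 5 MHz apart starting at 2412 MHz, and channel 14 sits at 2484 MHz. A small sketch, with an illustrative function name that is not part of the patch:

```c
/* Map a 2.4 GHz channel number to its centre frequency in MHz.
 * Returns 0 for channels outside 1..14. */
static inline int chan_to_mhz_24(int channel)
{
	if (channel == 14)
		return 2484;
	if (channel >= 1 && channel <= 13)
		return 2407 + 5 * channel;
	return 0;
}
```

The A-band has no equally simple rule across all regulatory domains, which is presumably why the patch searches the hardware's channel list there instead.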
Re: [PATCH 5/7] d80211: indicate if unassociate/radio off status
It would be helpful if you inlined your patches instead of attaching them next time. I'm not comfortable with using the name for this purpose. Don't we just report 00:00:00:00:00:00 when not associated? Also, for radio off, wasn't that being covered by the rfkill patches? -Michael Wu
Re: e1000 and 802.1ad/stacked vlan tagging
Stephan von Krawczynski wrote: Hello Jesse, thank you for answering anyway. Though I think your answer covers only the obvious half of the problem. Indeed one might think that this solves the issue - as long as there are only linux kernels involved. Unfortunately my setup is a bit more complicated in terms of hardware. So I should have probably clarified the question this way: how do you configure the interface in a manner that packets with data length of 1500 get transferred, and not only 1496 ? I tried enlarging both real-device and first vlan interface mtu but that does not work out. I really thought that the visible device setting of mtu=1500 should have worked out and that the driver (or some code in between) should have corrected the allowed frame size to reflect the actual setup, not? Unless you are patching the VLAN code, stacked VLANs are not going to work anyway. Search the archives of the VLAN mailing list for reasons why..and at least a few patches that 'fix' the problem for a few types of uses. Ben
Re: 2.6.18-rc4-mm[1,2,3] -- Network card not getting assigned an eth device name
From: Andrew Morton [EMAIL PROTECTED] Date: Mon, 28 Aug 2006 12:03:28 -0700 grepping for `ioctl' gives: ioctl(9, SIOCGIWNAME, 0xbfe38d8c) = -1 EINVAL (Invalid argument) ioctl(9, SIOCETHTOOL, 0xbfe38d2c) = 0 ioctl(11, SIOCGIFHWADDR, {ifr_name=eth0, ???}) = -1 ENODEV (No such device) ioctl(11, SIOCGIFFLAGS, {ifr_name=eth0, ???}) = -1 ENODEV (No such device) Perhaps you could generate the strace output for 2.6.18-rc5, grep that for ioctl, look for differences? That initial SIOCGIWNAME failure is fishy. That might help, but SIOCGIWNAME just gets a string that says what wireless mode the device is in, not the device name. Although NetworkManager might use this for something interesting. All of the interesting config calls are probably happening via netlink, which doesn't get decoded by strace. But changes via netlink can get traced by using ip in monitor mode; try "ip monitor all" as root during such a NetworkManager run.
Re: divide error: 0000 in fib6_rule_match [Re: 2.6.18-rc4-mm3]
On Mon, 28 Aug 2006 22:07:16 +0200 Mattia Dongili [EMAIL PROTECTED] wrote: On Sat, Aug 26, 2006 at 04:09:22PM -0700, Andrew Morton wrote: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc4/2.6.18-rc4-mm3/ [...] git-net.patch got this one when starting sshd: [ 44.412000] divide error: [#1] [ 44.412000] 4K_STACKS PREEMPT [ 44.412000] last sysfs file: /devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load [ 44.412000] Modules linked in: nfsd exportfs lockd sunrpc ipt_MASQUERADE iptable_nat ip_nat xt_tcpudp xt_state ip_conntrack iptable_filter ip_tables x_tables ipv6 jfs aes dm_crypt dm_mod rtc sony_acpi tun psmouse sonypi speedstep_ich speedstep_lib cpufreq_conservative cpufreq_ondemand freq_table cpufreq_powersave sd_mod usb_storage scsi_mod usbhid pcmcia snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer intel_agp agpgart i2c_i801 snd soundcore snd_page_alloc yenta_socket rsrc_nonstatic pcmcia_core uhci_hcd usbcore evdev e100 mii pcspkr [ 44.412000] CPU:0 [ 44.412000] EIP:0060:[d1516aca]Not tainted VLI [ 44.412000] EFLAGS: 00210246 (2.6.18-rc4-mm3-1 #6) [ 44.412000] EIP is at fib6_rule_match+0x7a/0x150 [ipv6] [ 44.412000] eax: ebx: cd9d4e30 ecx: d15290e0 edx: [ 44.412000] esi: cd7d9e08 edi: cd9d4e30 ebp: cd9d4d34 esp: cd9d4d0c [ 44.412000] ds: 007b es: 007b ss: 0068 [ 44.412000] Process sshd (pid: 3780, ti=cd9d4000 task=cf131590 task.ti=cd9d4000) [ 44.412000] Stack: 0003 c018b200 ced9df60 cd9d4d6c ced9df60 d15290e0 [ 44.412000]cd7d9e08 cd9d4e30 cd9d4d58 c02c198e d15290e0 cd9d4e30 c123f380 [ 44.412000]cd9d4e30 cd7d9e08 cd9d4e30 cd9d4d80 d15169dc d15290a0 cd9d4e30 [ 44.412000] Call Trace: [ 44.412000] [c02c198e] fib_rules_lookup+0x5e/0xe0 [ 44.412000] [d15169dc] fib6_rule_lookup+0x3c/0xb0 [ipv6] [ 44.412000] [d14f8702] ip6_route_output+0x32/0x40 [ipv6] [ 44.412000] [d14ed155] ip6_dst_lookup_tail+0x95/0xd0 [ipv6] [ 44.412000] [d14ed1a7] ip6_dst_lookup+0x17/0x20 [ipv6] [ 44.412000] [d15120ce] 
ip6_datagram_connect+0x36e/0x6c0 [ipv6] [ 44.412000] [c02f6829] inet_dgram_connect+0x39/0x80 [ 44.412000] [c02a6ceb] sys_connect+0x6b/0x90 [ 44.412000] [c02a846f] sys_socketcall+0x9f/0x260 [ 44.412000] [c010325b] syscall_call+0x7/0xb [ 44.412000] [b7c7c93c] 0xb7c7c93c [ 44.412000] === [ 44.412000] Code: 00 00 00 89 d8 83 e0 1f 0f 85 9a 00 00 00 8b 5d 08 0f b6 53 68 84 d2 75 78 8b 55 08 8b 5d 0c 8b 4a 60 8b 43 28 31 c8 89 d1 31 d2 f7 71 64 85 c0 0f 94 c0 0f b6 c0 8b 5d f4 8b 75 f8 8b 7d fc 89 [ 44.412000] EIP: [d1516aca] fib6_rule_match+0x7a/0x150 [ipv6] SS:ESP 0068:cd9d4d0c I cannot work out how the heck you got a divide instruction in fib6_rule_match(). Can you please do `make net/ipv6/fib6_rules.s', find the code which implements fib6_rule_match() (line starting with fib6_rule_match:) and send that plus the next 200-odd lines? Or just stick fib6_rules.s on a server somewhere? Or mail me fib6_rules.s off-list. Thanks.
Re: divide error: 0000 in fib6_rule_match [Re: 2.6.18-rc4-mm3]
> I cannot work out how the heck you got a divide instruction in
> fib6_rule_match().

This might be another symptom of the broken smp-alternatives patch. It tended
to randomly corrupt some instructions by inserting different bytes which then
crash in interesting ways. I already sent a fix for that, but it's not in yet.

-Andi
Re: divide error: 0000 in fib6_rule_match [Re: 2.6.18-rc4-mm3]
On Mon, 28 Aug 2006 22:07:16 +0200 Mattia Dongili [EMAIL PROTECTED] wrote:

> [ 44.412000] ===
> [ 44.412000] Code: 00 00 00 89 d8 83 e0 1f 0f 85 9a 00 00 00 8b 5d 08 0f b6 53 68 84 d2 75 78 8b 55 08 8b 5d 0c 8b 4a 60 8b 43 28 31 c8 89 d1 31 d2 f7 71 64 85 c0 0f 94 c0 0f b6 c0 8b 5d f4 8b 75 f8 8b 7d fc 89
> [ 44.412000] EIP: [d1516aca] fib6_rule_match+0x7a/0x150 [ipv6] SS:ESP 0068:cd9d4d0c
> [ 44.412000] <6>note: sshd[3780] exited with preempt_count 1
>
> config and full dmesg:
> http://oioio.altervista.org/linux/config-2.6.18-rc4-mm3-1
> http://oioio.altervista.org/linux/dmesg-2.6.18-rc4-mm3-1
>
> it's at fib6_rules.c:132 but since I can't tell why r->fwmask is 0 I'll
> avoid proposing a wrong patch :)

Oh. It looks like this has already been fixed:

#ifdef CONFIG_IPV6_ROUTE_FWMARK
	if ((r->fwmark ^ fl->fl6_fwmark) & r->fwmask)
		return 0;
#endif

there's no divide in there now.
Re: divide error: 0000 in fib6_rule_match
From: Andrew Morton [EMAIL PROTECTED]
Date: Mon, 28 Aug 2006 14:30:03 -0700

> Oh. It looks like this has already been fixed:
>
> #ifdef CONFIG_IPV6_ROUTE_FWMARK
> 	if ((r->fwmark ^ fl->fl6_fwmark) & r->fwmask)
> 		return 0;
> #endif
>
> there's no divide in there now.

That's right, there used to be a typo there.
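The masked comparison quoted in this thread can be tried outside the kernel. What follows is a minimal userspace sketch, assuming plain 32-bit marks; struct mark_rule and mark_rule_matches() are hypothetical names for illustration, not kernel API. It shows that a mask of 0 simply matches every mark, whereas an integer divide by that same field would trap on x86, which fits the report that r->fwmask was 0 at the faulting line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the rule fields discussed in the thread. */
struct mark_rule {
	uint32_t fwmark;
	uint32_t fwmask;
};

/*
 * Returns true when the packet mark agrees with the rule mark in every
 * bit selected by the mask. A mask of 0 selects no bits, so it matches
 * any mark; no division is involved, so a zero mask cannot trap.
 */
static bool mark_rule_matches(const struct mark_rule *r, uint32_t pkt_mark)
{
	return ((r->fwmark ^ pkt_mark) & r->fwmask) == 0;
}
```

Note that the kernel snippet returns 0 (no match) when the masked XOR is non-zero; the helper above expresses the same test as a boolean.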
Re: IPSec kernel oops on ppc64
> Joy Latten [EMAIL PROTECTED] wrote:
> > I installed 2.6.17 + patch-2.6.18-rc4 + 2.6.18-rc4-mm2 onto two
> > pSeries power 5 (ppc64 lpars) machines. I configured IPSec using
> > the configuration listed below.
>
> Could you try straight 2.6.17? If that crashes too, then at least
> we can be sure that it isn't something new.

A straight 2.6.17 kernel does not crash and my pings work.
A 2.6.17 + patch-2.6.18-rc4 does crash and my pings do not work.
The above tests were done on a ppc64.

I can try patch-2.6.18-rc1, etc... to see which one it stops working
on to narrow it down.

Regards,
Joy
[PATCH 2/6] neighbour: convert neighbour hash table to hlist
Change the neighbour table hash list to hlist from list.h to allow for easier later conversion to RCU. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- include/net/neighbour.h |6 - net/core/neighbour.c| 160 +--- 2 files changed, 88 insertions(+), 78 deletions(-) --- net-2.6.19.orig/include/net/neighbour.h +++ net-2.6.19/include/net/neighbour.h @@ -88,10 +88,10 @@ struct neigh_statistics struct neighbour { - struct neighbour*next; + struct hlist_node hlist; struct neigh_table *tbl; struct neigh_parms *parms; - struct net_device *dev; + struct net_device *dev; unsigned long used; unsigned long confirmed; unsigned long updated; @@ -161,7 +161,7 @@ struct neigh_table unsigned long last_rand; kmem_cache_t*kmem_cachep; struct neigh_statistics *stats; - struct neighbour**hash_buckets; + struct hlist_head *hash_buckets; unsigned inthash_mask; __u32 hash_rnd; unsigned inthash_chain_gc; --- net-2.6.19.orig/net/core/neighbour.c +++ net-2.6.19/net/core/neighbour.c @@ -126,10 +126,11 @@ static int neigh_forced_gc(struct neigh_ write_lock_bh(tbl-lock); for (i = 0; i = tbl-hash_mask; i++) { - struct neighbour *n, **np; + struct neighbour *n; + struct hlist_node *node, *tmp; - np = tbl-hash_buckets[i]; - while ((n = *np) != NULL) { + hlist_for_each_entry_safe(n, node, tmp, + tbl-hash_buckets[i], hlist) { /* Neighbour record may be discarded if: * - nobody refers to it. 
* - it is not permanent @@ -137,7 +138,7 @@ static int neigh_forced_gc(struct neigh_ write_lock(n-lock); if (atomic_read(n-refcnt) == 1 !(n-nud_state NUD_PERMANENT)) { - *np = n-next; + hlist_del(n-hlist); n-dead = 1; shrunk = 1; write_unlock(n-lock); @@ -145,7 +146,6 @@ static int neigh_forced_gc(struct neigh_ continue; } write_unlock(n-lock); - np = n-next; } } @@ -181,14 +181,15 @@ static void neigh_flush_dev(struct neigh int i; for (i = 0; i = tbl-hash_mask; i++) { - struct neighbour *n, **np = tbl-hash_buckets[i]; + struct hlist_node *node, *tmp; + struct neighbour *n; - while ((n = *np) != NULL) { - if (dev n-dev != dev) { - np = n-next; + hlist_for_each_entry_safe(n, node, tmp, + tbl-hash_buckets[i], hlist) { + if (dev n-dev != dev) continue; - } - *np = n-next; + + hlist_del(n-hlist); write_lock(n-lock); neigh_del_timer(n); n-dead = 1; @@ -279,23 +280,20 @@ out_entries: goto out; } -static struct neighbour **neigh_hash_alloc(unsigned int entries) +static struct hlist_head *neigh_hash_alloc(unsigned int entries) { - unsigned long size = entries * sizeof(struct neighbour *); - struct neighbour **ret; + unsigned long size = entries * sizeof(struct hlist_head); - if (size = PAGE_SIZE) { - ret = kzalloc(size, GFP_ATOMIC); - } else { - ret = (struct neighbour **) + if (size = PAGE_SIZE) + return kzalloc(size, GFP_ATOMIC); + else + return (struct hlist_head *) __get_free_pages(GFP_ATOMIC|__GFP_ZERO, get_order(size)); - } - return ret; } -static void neigh_hash_free(struct neighbour **hash, unsigned int entries) +static void neigh_hash_free(struct hlist_head *hash, unsigned int entries) { - unsigned long size = entries * sizeof(struct neighbour *); + unsigned long size = entries * sizeof(struct hlist_head); if (size = PAGE_SIZE) kfree(hash); @@ -305,7 +303,7 @@ static void neigh_hash_free(struct neigh static void neigh_hash_grow(struct neigh_table *tbl, unsigned long new_entries) { - struct neighbour **new_hash, **old_hash; + struct hlist_head *new_hash, *old_hash; 
unsigned int i, new_hash_mask, old_entries; NEIGH_CACHE_STAT_INC(tbl, hash_grows); @@ -321,16
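For readers unfamiliar with the hlist layout this patch switches to, here is a minimal userspace sketch. It assumes nothing beyond standard C; every my_* name is hypothetical and only mimics the shape of the list.h helpers. The point it illustrates is that each node stores the address of the pointer that points at it (pprev), so an entry can be unlinked without walking the bucket with a `**np` cursor as the old code did.

```c
#include <assert.h>
#include <stddef.h>

struct my_hlist_node {
	struct my_hlist_node *next;
	struct my_hlist_node **pprev;	/* address of the pointer that points at us */
};

struct my_hlist_head {
	struct my_hlist_node *first;	/* a bucket is a single pointer */
};

/* Insert n at the front of bucket h. */
static void my_hlist_add_head(struct my_hlist_node *n, struct my_hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n; no reference to the bucket head is needed. */
static void my_hlist_del(struct my_hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	/* The kernel poisons these pointers; this sketch just clears them. */
	n->next = NULL;
	n->pprev = NULL;
}

/* Hypothetical table entry with an embedded node, like struct neighbour. */
struct my_neigh {
	int key;
	struct my_hlist_node hlist;
};

/* Recover the containing entry from its embedded node. */
static struct my_neigh *my_neigh_entry(struct my_hlist_node *node)
{
	return (struct my_neigh *)((char *)node - offsetof(struct my_neigh, hlist));
}

/* Count the entries chained in one bucket. */
static int my_bucket_len(const struct my_hlist_head *h)
{
	int len = 0;

	for (const struct my_hlist_node *p = h->first; p; p = p->next)
		len++;
	return len;
}
```

Because the head is one pointer rather than two, a hash table of hlist heads costs the same memory per bucket as the old array of `struct neighbour *`.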
[PATCH 6/6] neighbour: convert hard header cache to sequence number
The reading of the hard header cache in the output path can be made lockless using seqlock. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- include/linux/netdevice.h |3 ++- include/net/neighbour.h |2 ++ net/core/neighbour.c | 40 +++- net/ipv4/ip_output.c | 13 +++-- net/ipv6/ip6_output.c | 13 +++-- 5 files changed, 45 insertions(+), 26 deletions(-) --- net-2.6.19.orig/include/linux/netdevice.h +++ net-2.6.19/include/linux/netdevice.h @@ -193,7 +193,7 @@ struct hh_cache */ int hh_len; /* length of header */ int (*hh_output)(struct sk_buff *skb); - rwlock_thh_lock; + seqlock_t hh_lock; /* cached hardware header; allow for machine alignment needs.*/ #define HH_DATA_MOD16 @@ -217,6 +217,7 @@ struct hh_cache #define LL_RESERVED_SPACE_EXTRA(dev,extra) \ dev)-hard_header_len+extra)~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + /* These flag bits are private to the generic network queueing * layer, they may not be explicitly referenced by any other * code. --- net-2.6.19.orig/net/core/neighbour.c +++ net-2.6.19/net/core/neighbour.c @@ -591,9 +591,11 @@ void neigh_destroy(struct neighbour *nei while ((hh = neigh-hh) != NULL) { neigh-hh = hh-hh_next; hh-hh_next = NULL; - write_lock_bh(hh-hh_lock); + + write_seqlock_bh(hh-hh_lock); hh-hh_output = neigh_blackhole; - write_unlock_bh(hh-hh_lock); + write_sequnlock_bh(hh-hh_lock); + if (atomic_dec_and_test(hh-hh_refcnt)) kfree(hh); } @@ -912,9 +914,9 @@ static void neigh_update_hhs(struct neig if (update) { for (hh = neigh-hh; hh; hh = hh-hh_next) { - write_lock_bh(hh-hh_lock); + write_seqlock_bh(hh-hh_lock); update(hh, neigh-dev, neigh-ha); - write_unlock_bh(hh-hh_lock); + write_sequnlock_bh(hh-hh_lock); } } } @@ -1105,7 +1107,7 @@ static void neigh_hh_init(struct neighbo break; if (!hh (hh = kzalloc(sizeof(*hh), GFP_ATOMIC)) != NULL) { - rwlock_init(hh-hh_lock); + seqlock_init(hh-hh_lock); hh-hh_type = protocol; atomic_set(hh-hh_refcnt, 0); hh-hh_next = NULL; @@ -1128,6 +1130,33 @@ static void neigh_hh_init(struct neighbo } } + +/* 
+ * Add header to skb from hard header cache + * Handle case where cache gets changed. + */ +int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb) +{ + int len, alen; + unsigned seq; + int (*output)(struct sk_buff *); + + for(;;) { + seq = read_seqbegin(hh-hh_lock); + len = hh-hh_len; + alen = HH_DATA_ALIGN(len); + output = hh-hh_output; + memcpy(skb-data - alen, hh-hh_data, alen); + skb_push(skb, len); + + if (likely(!read_seqretry(hh-hh_lock, seq))) + return output(skb); + + /* undo and try again */ + __skb_pull(skb, len); + } +} + /* This function can be used in contexts, where only old dev_queue_xmit worked, f.e. if you want to override normal output path (eql, shaper), but resolution is not made yet. @@ -2767,6 +2796,7 @@ EXPORT_SYMBOL(neigh_delete); EXPORT_SYMBOL(neigh_destroy); EXPORT_SYMBOL(neigh_dump_info); EXPORT_SYMBOL(neigh_event_ns); +EXPORT_SYMBOL(neigh_hh_output); EXPORT_SYMBOL(neigh_ifdown); EXPORT_SYMBOL(neigh_lookup); EXPORT_SYMBOL(neigh_lookup_nodev); --- net-2.6.19.orig/net/ipv4/ip_output.c +++ net-2.6.19/net/ipv4/ip_output.c @@ -182,16 +182,9 @@ static inline int ip_finish_output2(stru skb = skb2; } - if (hh) { - int hh_alen; - - read_lock_bh(hh-hh_lock); - hh_alen = HH_DATA_ALIGN(hh-hh_len); - memcpy(skb-data - hh_alen, hh-hh_data, hh_alen); - read_unlock_bh(hh-hh_lock); - skb_push(skb, hh-hh_len); - return hh-hh_output(skb); - } else if (dst-neighbour) + if (hh) + return neigh_hh_output(hh, skb); + else if (dst-neighbour) return dst-neighbour-output(skb); if (net_ratelimit()) --- net-2.6.19.orig/net/ipv6/ip6_output.c +++ net-2.6.19/net/ipv6/ip6_output.c @@ -76,16 +76,9 @@ static inline int ip6_output_finish(stru struct dst_entry *dst = skb-dst; struct hh_cache *hh = dst-hh; - if (hh) { - int hh_alen; - - read_lock_bh(hh-hh_lock); - hh_alen = HH_DATA_ALIGN(hh-hh_len); -
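The retry loop in neigh_hh_output() above follows the usual sequence-counter pattern. The following is a userspace sketch of that pattern using C11 atomics, not the kernel's seqlock_t; all my_* names are hypothetical. It assumes writers are already serialised by the caller (a real seqlock embeds a spinlock for that), and the plain memcpy of the payload is a simplification: strictly conforming C11 would need atomic accesses for data that can be read while it is being written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define MY_HDR_MAX 16

/* Hypothetical cached header guarded by a sequence counter. */
struct my_hh_cache {
	atomic_uint seq;		/* odd while a writer is mid-update */
	unsigned int len;
	unsigned char data[MY_HDR_MAX];
};

/* Start a read: spin past any in-progress write and return the even count. */
static unsigned int my_read_begin(struct my_hh_cache *hh)
{
	unsigned int s;

	while ((s = atomic_load_explicit(&hh->seq, memory_order_acquire)) & 1)
		;
	return s;
}

/* End a read: true means a writer ran in between and the copy must be redone. */
static bool my_read_retry(struct my_hh_cache *hh, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&hh->seq, memory_order_relaxed) != start;
}

/* Replace the cached header; the caller must exclude other writers. */
static void my_write_update(struct my_hh_cache *hh, const void *hdr,
			    unsigned int len)
{
	if (len > MY_HDR_MAX)
		len = MY_HDR_MAX;
	atomic_fetch_add_explicit(&hh->seq, 1, memory_order_relaxed);	/* odd */
	atomic_thread_fence(memory_order_release);
	memcpy(hh->data, hdr, len);
	hh->len = len;
	atomic_fetch_add_explicit(&hh->seq, 1, memory_order_release);	/* even */
}

/* Copy the header out, retrying if a writer interfered; returns its length. */
static unsigned int my_hh_copy(struct my_hh_cache *hh, unsigned char *out)
{
	unsigned int seq, len;

	do {
		seq = my_read_begin(hh);
		len = hh->len;
		memcpy(out, hh->data, len);
	} while (my_read_retry(hh, seq));
	return len;
}
```

The patch has one extra obligation that this sketch does not: because it has already done skb_push() before checking the sequence, it must undo that with __skb_pull() before retrying.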
[PATCH 5/6] neighbour: convert lookup to sequence lock
The reading of neighbour table entries can be converted from a slow reader/writer lock to a fast lockless sequence number check. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- include/net/neighbour.h |2 net/core/neighbour.c| 117 +--- net/ipv4/arp.c | 101 + net/ipv6/ndisc.c| 16 +++--- net/ipv6/route.c| 12 ++-- net/sched/sch_teql.c| 11 +++- 6 files changed, 155 insertions(+), 104 deletions(-) --- net-2.6.19.orig/include/net/neighbour.h +++ net-2.6.19/include/net/neighbour.h @@ -100,7 +100,7 @@ struct neighbour __u8type; __u8dead; atomic_tprobes; - rwlock_tlock; + seqlock_t lock; unsigned char ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))]; struct hh_cache *hh; atomic_trefcnt; --- net-2.6.19.orig/net/core/neighbour.c +++ net-2.6.19/net/core/neighbour.c @@ -143,17 +143,17 @@ static int neigh_forced_gc(struct neigh_ * - nobody refers to it. * - it is not permanent */ - write_lock(n-lock); + write_seqlock(n-lock); if (atomic_read(n-refcnt) == 1 !(n-nud_state NUD_PERMANENT)) { hlist_del_rcu(n-hlist); n-dead = 1; shrunk = 1; - write_unlock(n-lock); + write_sequnlock(n-lock); call_rcu(n-rcu, neigh_rcu_release); continue; } - write_unlock(n-lock); + write_sequnlock(n-lock); } } @@ -198,7 +198,7 @@ static void neigh_flush_dev(struct neigh continue; hlist_del_rcu(n-hlist); - write_lock(n-lock); + write_seqlock(n-lock); neigh_del_timer(n); n-dead = 1; @@ -220,7 +220,7 @@ static void neigh_flush_dev(struct neigh n-nud_state = NUD_NONE; NEIGH_PRINTK2(neigh %p is stray.\n, n); } - write_unlock(n-lock); + write_sequnlock(n-lock); neigh_release(n); } } @@ -267,7 +267,7 @@ static struct neighbour *neigh_alloc(str memset(n, 0, tbl-entry_size); skb_queue_head_init(n-arp_queue); - rwlock_init(n-lock); + seqlock_init(n-lock); n-updated= n-used = now; n-nud_state = NUD_NONE; n-output = neigh_blackhole; @@ -615,7 +615,7 @@ void neigh_destroy(struct neighbour *nei /* Neighbour state is suspicious; disable fast path. - Called with write_locked neigh. + Called with locked neigh. 
*/ static void neigh_suspect(struct neighbour *neigh) { @@ -632,7 +632,7 @@ static void neigh_suspect(struct neighbo /* Neighbour state is OK; enable fast path. - Called with write_locked neigh. + Called with locked neigh. */ static void neigh_connect(struct neighbour *neigh) { @@ -676,7 +676,7 @@ static void neigh_periodic_timer(unsigne hlist_for_each_entry_safe(n, node, tmp, head, hlist) { unsigned int state; - write_lock(n-lock); + write_seqlock(n-lock); state = n-nud_state; if (state (NUD_PERMANENT | NUD_IN_TIMER)) @@ -690,12 +690,12 @@ static void neigh_periodic_timer(unsigne time_after(now, n-used + n-parms-gc_staletime))) { hlist_del_rcu(n-hlist); n-dead = 1; - write_unlock(n-lock); + write_sequnlock(n-lock); neigh_release(n); continue; } next_elt: - write_unlock(n-lock); + write_sequnlock(n-lock); } /* Cycle through all hash buckets every base_reachable_time/2 ticks. @@ -738,7 +738,7 @@ static void neigh_timer_handler(unsigned unsigned state; int notify = 0; - write_lock(neigh-lock); + write_seqlock(neigh-lock); state = neigh-nud_state; now = jiffies; @@ -748,6 +748,7 @@ static void neigh_timer_handler(unsigned #ifndef CONFIG_SMP printk(KERN_WARNING neigh: timer !nud_in_timer\n); #endif + write_sequnlock(neigh-lock); goto out; } @@ -808,9 +809,9 @@ static void neigh_timer_handler(unsigned
[PATCH 3/6] neighbour: convert pneigh hash table to hlist
Change the pneigh_entry table to hlist from list.h to allow for easier later conversion to RCU. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- include/net/neighbour.h |6 ++-- net/core/neighbour.c| 58 2 files changed, 33 insertions(+), 31 deletions(-) --- net-2.6.19.orig/include/net/neighbour.h +++ net-2.6.19/include/net/neighbour.h @@ -124,8 +124,8 @@ struct neigh_ops struct pneigh_entry { - struct pneigh_entry *next; - struct net_device *dev; + struct hlist_node hlist; + struct net_device *dev; u8 key[0]; }; @@ -165,7 +165,7 @@ struct neigh_table unsigned inthash_mask; __u32 hash_rnd; unsigned inthash_chain_gc; - struct pneigh_entry **phash_buckets; + struct hlist_head *phash_buckets; #ifdef CONFIG_PROC_FS struct proc_dir_entry *pde; #endif --- net-2.6.19.orig/net/core/neighbour.c +++ net-2.6.19/net/core/neighbour.c @@ -455,6 +455,7 @@ struct pneigh_entry * pneigh_lookup(stru struct net_device *dev, int creat) { struct pneigh_entry *n; + struct hlist_node *tmp; int key_len = tbl-key_len; u32 hash_val = *(u32 *)(pkey + key_len - 4); @@ -465,7 +466,7 @@ struct pneigh_entry * pneigh_lookup(stru read_lock_bh(tbl-lock); - for (n = tbl-phash_buckets[hash_val]; n; n = n-next) { + hlist_for_each_entry(n, tmp, tbl-phash_buckets[hash_val], hlist) { if (!memcmp(n-key, pkey, key_len) (n-dev == dev || !n-dev)) { read_unlock_bh(tbl-lock); @@ -495,8 +496,7 @@ struct pneigh_entry * pneigh_lookup(stru } write_lock_bh(tbl-lock); - n-next = tbl-phash_buckets[hash_val]; - tbl-phash_buckets[hash_val] = n; + hlist_add_head(n-hlist, tbl-phash_buckets[hash_val]); write_unlock_bh(tbl-lock); out: return n; @@ -506,7 +506,8 @@ out: int pneigh_delete(struct neigh_table *tbl, const void *pkey, struct net_device *dev) { - struct pneigh_entry *n, **np; + struct pneigh_entry *n; + struct hlist_node *tmp; int key_len = tbl-key_len; u32 hash_val = *(u32 *)(pkey + key_len - 4); @@ -516,10 +517,9 @@ int pneigh_delete(struct neigh_table *tb hash_val = PNEIGH_HASHMASK; write_lock_bh(tbl-lock); 
- for (np = tbl-phash_buckets[hash_val]; (n = *np) != NULL; -np = n-next) { + hlist_for_each_entry(n, tmp, tbl-phash_buckets[hash_val], hlist) { if (!memcmp(n-key, pkey, key_len) n-dev == dev) { - *np = n-next; + hlist_del(n-hlist); write_unlock_bh(tbl-lock); if (tbl-pdestructor) tbl-pdestructor(n); @@ -535,22 +535,21 @@ int pneigh_delete(struct neigh_table *tb static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev) { - struct pneigh_entry *n, **np; u32 h; for (h = 0; h = PNEIGH_HASHMASK; h++) { - np = tbl-phash_buckets[h]; - while ((n = *np) != NULL) { + struct pneigh_entry *n; + struct hlist_node *tmp, *nxt; + + hlist_for_each_entry_safe(n, tmp, nxt, tbl-phash_buckets[h], hlist) { if (!dev || n-dev == dev) { - *np = n-next; + hlist_del(n-hlist); if (tbl-pdestructor) tbl-pdestructor(n); if (n-dev) dev_put(n-dev); kfree(n); - continue; } - np = n-next; } } return -ENOENT; @@ -1332,7 +1331,6 @@ void neigh_parms_destroy(struct neigh_pa void neigh_table_init_no_netlink(struct neigh_table *tbl) { unsigned long now = jiffies; - unsigned long phsize; atomic_set(tbl-parms.refcnt, 1); INIT_RCU_HEAD(tbl-parms.rcu_head); @@ -1359,8 +1357,8 @@ void neigh_table_init_no_netlink(struct tbl-hash_mask = 1; tbl-hash_buckets = neigh_hash_alloc(tbl-hash_mask + 1); - phsize = (PNEIGH_HASHMASK + 1) * sizeof(struct pneigh_entry *); - tbl-phash_buckets = kzalloc(phsize, GFP_KERNEL); + tbl-phash_buckets = kcalloc(PNEIGH_HASHMASK + 1, sizeof(struct hlist_head), +GFP_KERNEL); if (!tbl-hash_buckets || !tbl-phash_buckets) panic(cannot allocate neighbour cache hashes); @@ -2188,18 +2186,18 @@ static struct pneigh_entry *pneigh_get_f { struct neigh_seq_state *state =
[PATCH 4/6] net neighbour: convert to RCU
Use RCU to allow for lock less access to the neighbour table. This should speedup the send path because no atomic operations will be needed to lookup ARP entries, etc. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- include/net/neighbour.h |4 - net/core/neighbour.c| 158 +--- 2 files changed, 87 insertions(+), 75 deletions(-) --- net-2.6.19.orig/include/net/neighbour.h +++ net-2.6.19/include/net/neighbour.h @@ -108,6 +108,7 @@ struct neighbour struct sk_buff_head arp_queue; struct timer_list timer; struct neigh_ops*ops; + struct rcu_head rcu; u8 primary_key[0]; }; @@ -126,6 +127,7 @@ struct pneigh_entry { struct hlist_node hlist; struct net_device *dev; + struct rcu_head rcu; u8 key[0]; }; @@ -157,7 +159,7 @@ struct neigh_table struct timer_list proxy_timer; struct sk_buff_head proxy_queue; atomic_tentries; - rwlock_tlock; + spinlock_t lock; unsigned long last_rand; kmem_cache_t*kmem_cachep; struct neigh_statistics *stats; --- net-2.6.19.orig/net/core/neighbour.c +++ net-2.6.19/net/core/neighbour.c @@ -67,9 +67,10 @@ static struct file_operations neigh_stat #endif /* - Neighbour hash table buckets are protected with rwlock tbl-lock. + Neighbour hash table buckets are protected with lock tbl-lock. - - All the scans/updates to hash buckets MUST be made under this lock. + - All the scans of hash buckes must be made with RCU read lock (nopreempt) + - updates to hash buckets MUST be made under this lock. - NOTHING clever should be made under this lock: no callbacks to protocol backends, no attempts to send something to network. 
It will result in deadlocks, if backend/driver wants to use neighbour @@ -117,6 +118,13 @@ unsigned long neigh_rand_reach_time(unsi } +static void neigh_rcu_release(struct rcu_head *head) +{ + struct neighbour *neigh = container_of(head, struct neighbour, rcu); + + neigh_release(neigh); +} + static int neigh_forced_gc(struct neigh_table *tbl) { int shrunk = 0; @@ -124,7 +132,7 @@ static int neigh_forced_gc(struct neigh_ NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs); - write_lock_bh(tbl-lock); + spin_lock_bh(tbl-lock); for (i = 0; i = tbl-hash_mask; i++) { struct neighbour *n; struct hlist_node *node, *tmp; @@ -138,11 +146,11 @@ static int neigh_forced_gc(struct neigh_ write_lock(n-lock); if (atomic_read(n-refcnt) == 1 !(n-nud_state NUD_PERMANENT)) { - hlist_del(n-hlist); + hlist_del_rcu(n-hlist); n-dead = 1; shrunk = 1; write_unlock(n-lock); - neigh_release(n); + call_rcu(n-rcu, neigh_rcu_release); continue; } write_unlock(n-lock); @@ -151,7 +159,7 @@ static int neigh_forced_gc(struct neigh_ tbl-last_flush = jiffies; - write_unlock_bh(tbl-lock); + spin_unlock_bh(tbl-lock); return shrunk; } @@ -189,7 +197,7 @@ static void neigh_flush_dev(struct neigh if (dev n-dev != dev) continue; - hlist_del(n-hlist); + hlist_del_rcu(n-hlist); write_lock(n-lock); neigh_del_timer(n); n-dead = 1; @@ -220,17 +228,17 @@ static void neigh_flush_dev(struct neigh void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev) { - write_lock_bh(tbl-lock); + spin_lock_bh(tbl-lock); neigh_flush_dev(tbl, dev); - write_unlock_bh(tbl-lock); + spin_unlock_bh(tbl-lock); } int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev) { - write_lock_bh(tbl-lock); + spin_lock_bh(tbl-lock); neigh_flush_dev(tbl, dev); pneigh_ifdown(tbl, dev); - write_unlock_bh(tbl-lock); + spin_unlock_bh(tbl-lock); del_timer_sync(tbl-proxy_timer); pneigh_queue_purge(tbl-proxy_queue); @@ -326,8 +334,8 @@ static void neigh_hash_grow(struct neigh unsigned int hash_val = tbl-hash(n-primary_key, n-dev); hash_val = 
new_hash_mask; - hlist_del(n-hlist); - hlist_add_head(n-hlist, new_hash[hash_val]); + __hlist_del(n-hlist); + hlist_add_head_rcu(n-hlist,
[PATCH 0/5] skge update
Several non-critical bug fixes for skge driver.

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 2/5] skge: pci bus post fixes
At the end of a critical section, we need to force the PCI write to
complete by doing a read.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -2747,7 +2747,7 @@ static int skge_poll(struct net_device *
 	spin_lock_irq(&hw->hw_lock);
 	hw->intr_mask |= rxirqmask[skge->port];
 	skge_write32(hw, B0_IMSK, hw->intr_mask);
-	mmiowb();
+	skge_read32(hw, B0_IMSK);
 	spin_unlock_irq(&hw->hw_lock);
 
 	return 0;
@@ -2881,6 +2881,7 @@ static void skge_extirq(void *arg)
 	spin_lock_irq(&hw->hw_lock);
 	hw->intr_mask |= IS_EXT_REG;
 	skge_write32(hw, B0_IMSK, hw->intr_mask);
+	skge_read32(hw, B0_IMSK);
 	spin_unlock_irq(&hw->hw_lock);
 }
 
@@ -2955,6 +2956,7 @@ static irqreturn_t skge_intr(int irq, vo
 		skge_error_irq(hw);
 
 	skge_write32(hw, B0_IMSK, hw->intr_mask);
+	skge_read32(hw, B0_IMSK);
 	spin_unlock(&hw->hw_lock);
 
 	return IRQ_HANDLED;
@@ -3424,6 +3426,7 @@ static void __devexit skge_remove(struct
 	spin_lock_irq(&hw->hw_lock);
 	hw->intr_mask = 0;
 	skge_write32(hw, B0_IMSK, 0);
+	skge_read32(hw, B0_IMSK);
 	spin_unlock_irq(&hw->hw_lock);
 
 	skge_write16(hw, B0_LED, LED_STAT_OFF);

--
Stephen Hemminger [EMAIL PROTECTED]
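The reasoning behind the read-back can be modelled in a few lines. This is a toy simulation only, assuming a bridge that may hold a single posted write; struct toy_dev and the toy_* helpers are hypothetical and do not touch real hardware or any kernel API. It illustrates why a write alone gives no guarantee that the device has seen the new interrupt mask by the time the lock is dropped, while a read of the same register does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one device register behind a bridge that may hold one write. */
struct toy_dev {
	uint32_t imsk;		/* value the device actually sees */
	uint32_t posted_val;	/* write still sitting in the bridge */
	bool posted;
};

/* Posted write: returns immediately; the device has not seen it yet. */
static void toy_write32(struct toy_dev *d, uint32_t val)
{
	d->posted_val = val;
	d->posted = true;
}

/* A read must return device state, so pending writes are delivered first. */
static uint32_t toy_read32(struct toy_dev *d)
{
	if (d->posted) {
		d->imsk = d->posted_val;
		d->posted = false;
	}
	return d->imsk;
}

/* Write the interrupt mask and force it out to the device before returning. */
static void toy_set_imsk_flushed(struct toy_dev *d, uint32_t mask)
{
	toy_write32(d, mask);
	(void)toy_read32(d);	/* read back: flushes the posted write */
}
```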
[PATCH 4/5] skge: use ethX for irq assigments
The user level irq balance daemon uses eth as a way to distinguish
ethernet devices. Also, by using the device name it is possible to
distinguish different boards.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -3343,23 +3343,16 @@ static int __devinit skge_probe(struct p
 		goto err_out_free_hw;
 	}
 
-	err = request_irq(pdev->irq, skge_intr, IRQF_SHARED, DRV_NAME, hw);
-	if (err) {
-		printk(KERN_ERR PFX "%s: cannot assign irq %d\n",
-		       pci_name(pdev), pdev->irq);
-		goto err_out_iounmap;
-	}
-	pci_set_drvdata(pdev, hw);
-
 	err = skge_reset(hw);
 	if (err)
-		goto err_out_free_irq;
+		goto err_out_iounmap;
 
 	printk(KERN_INFO PFX DRV_VERSION " addr 0x%llx irq %d chip %s rev %d\n",
 	       (unsigned long long)pci_resource_start(pdev, 0), pdev->irq,
 	       skge_board_name(hw), hw->chip_rev);
 
-	if ((dev = skge_devinit(hw, 0, using_dac)) == NULL)
+	dev = skge_devinit(hw, 0, using_dac);
+	if (!dev)
 		goto err_out_led_off;
 
 	if (!is_valid_ether_addr(dev->dev_addr)) {
@@ -3369,7 +3362,6 @@ static int __devinit skge_probe(struct p
 		goto err_out_free_netdev;
 	}
 
-
 	err = register_netdev(dev);
 	if (err) {
 		printk(KERN_ERR PFX "%s: cannot register net device\n",
@@ -3377,6 +3369,12 @@ static int __devinit skge_probe(struct p
 		goto err_out_free_netdev;
 	}
 
+	err = request_irq(pdev->irq, skge_intr, IRQF_SHARED, dev->name, hw);
+	if (err) {
+		printk(KERN_ERR PFX "%s: cannot assign irq %d\n",
+		       dev->name, pdev->irq);
+		goto err_out_unregister;
+	}
 	skge_show_addr(dev);
 
 	if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {
@@ -3389,15 +3387,16 @@ static int __devinit skge_probe(struct p
 			free_netdev(dev1);
 		}
 	}
+	pci_set_drvdata(pdev, hw);
 
 	return 0;
 
+err_out_unregister:
+	unregister_netdev(dev);
 err_out_free_netdev:
 	free_netdev(dev);
 err_out_led_off:
 	skge_write16(hw, B0_LED, LED_STAT_OFF);
-err_out_free_irq:
-	free_irq(pdev->irq, hw);
 err_out_iounmap:
 	iounmap(hw->regs);
 err_out_free_hw:

--
Stephen Hemminger [EMAIL PROTECTED]
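Moving request_irq() after register_netdev() changes the order in which the error labels have to unwind. As a rough illustration of that goto-based pattern, here is a self-contained sketch; the toy_* names, the three steps, and the fail_at knob are all hypothetical and stand in for the real probe calls. Each label undoes exactly the steps that succeeded before the failure, in reverse order.

```c
#include <assert.h>
#include <string.h>

#define TOY_LOG_MAX 128

/* Hypothetical probe state: a trace of what ran, and which step should fail. */
struct toy_probe {
	char log[TOY_LOG_MAX];
	int fail_at;		/* 0 means no step fails */
};

/* Append a word to the trace, never overflowing the buffer. */
static void toy_log(struct toy_probe *p, const char *s)
{
	strncat(p->log, s, TOY_LOG_MAX - strlen(p->log) - 1);
}

/* Run one setup step; returns -1 without logging if it is the failing step. */
static int toy_step(struct toy_probe *p, int step, const char *name)
{
	if (p->fail_at == step)
		return -1;
	toy_log(p, name);
	return 0;
}

/* Acquire in order (map, register, irq); on failure release in reverse. */
static int toy_probe_run(struct toy_probe *p)
{
	int err;

	err = toy_step(p, 1, "map ");
	if (err)
		goto out;
	err = toy_step(p, 2, "register ");
	if (err)
		goto err_unmap;
	err = toy_step(p, 3, "irq ");	/* last, so the device name exists */
	if (err)
		goto err_unregister;
	return 0;

err_unregister:
	toy_log(p, "unregister ");
err_unmap:
	toy_log(p, "unmap ");
out:
	return err;
}
```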
[PATCH 3/5] skge: use dev_alloc_skb
To avoid problems with buggy protocols that assume extra header space,
use dev_alloc_skb() when allocating receive buffers.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -827,7 +827,8 @@ static int skge_rx_fill(struct skge_port
 	do {
 		struct sk_buff *skb;
 
-		skb = alloc_skb(skge->rx_buf_size + NET_IP_ALIGN, GFP_KERNEL);
+		skb = __dev_alloc_skb(skge->rx_buf_size + NET_IP_ALIGN,
+				      GFP_KERNEL);
 		if (!skb)
 			return -ENOMEM;
 
@@ -2609,7 +2610,7 @@ static inline struct sk_buff *skge_rx_ge
 		goto error;
 
 	if (len < RX_COPY_THRESHOLD) {
-		skb = alloc_skb(len + 2, GFP_ATOMIC);
+		skb = dev_alloc_skb(len + 2);
 		if (!skb)
 			goto resubmit;
 
@@ -2624,7 +2625,7 @@ static inline struct sk_buff *skge_rx_ge
 		skge_rx_reuse(e, skge->rx_buf_size);
 	} else {
 		struct sk_buff *nskb;
-		nskb = alloc_skb(skge->rx_buf_size + NET_IP_ALIGN, GFP_ATOMIC);
+		nskb = dev_alloc_skb(skge->rx_buf_size + NET_IP_ALIGN);
 		if (!nskb)
 			goto resubmit;
 

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH] pcnet32: fix user visible typo
Also, final dot removed and singular form fixed.

The cause of #6428 is still to be found.

Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
---

 drivers/net/pcnet32.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/pcnet32.c
+++ b/drivers/net/pcnet32.c
@@ -2986,7 +2986,8 @@ static int __init pcnet32_init_module(vo
 		pcnet32_probe_vlbus(pcnet32_portlist);
 
 	if (cards_found && (pcnet32_debug & NETIF_MSG_PROBE))
-		printk(KERN_INFO PFX "%d cards_found.\n", cards_found);
+		printk(KERN_INFO PFX "%d card%s found\n",
+			cards_found, cards_found > 1 ? "s" : "");
 
 	return (pcnet32_have_pci + cards_found) ? 0 : -ENODEV;
 }
[PATCH 5/5] skge: version 1.7
Increase version.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -43,7 +43,7 @@
 #include "skge.h"
 
 #define DRV_NAME		"skge"
-#define DRV_VERSION		"1.6"
+#define DRV_VERSION		"1.7"
 #define PFX			DRV_NAME " "
 
 #define DEFAULT_TX_RING_SIZE	128

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 1/5] skge: cleanup suspend/resume code
The code for suspend/resume needs several fixes. The hardware lock
should be set up in probe only, not in resume. Interrupts should be
disabled during suspend, etc.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -3106,7 +3106,6 @@ static int skge_reset(struct skge_hw *hw
 	else
 		hw->ram_size = t8 * 4096;
 
-	spin_lock_init(&hw->hw_lock);
 	hw->intr_mask = IS_HW_ERR | IS_EXT_REG | IS_PORT_1;
 	if (hw->ports > 1)
 		hw->intr_mask |= IS_PORT_2;
@@ -3332,6 +3331,7 @@ static int __devinit skge_probe(struct p
 	hw->pdev = pdev;
 	mutex_init(&hw->phy_mutex);
 	INIT_WORK(&hw->phy_work, skge_extirq, hw);
+	spin_lock_init(&hw->hw_lock);
 
 	hw->regs = ioremap_nocache(pci_resource_start(pdev, 0), 0x4000);
 	if (!hw->regs) {
@@ -3449,26 +3449,25 @@ static int skge_suspend(struct pci_dev *
 	struct skge_hw *hw = pci_get_drvdata(pdev);
 	int i, wol = 0;
 
-	for (i = 0; i < 2; i++) {
+	pci_save_state(pdev);
+	for (i = 0; i < hw->ports; i++) {
 		struct net_device *dev = hw->dev[i];
 
-		if (dev) {
+		if (netif_running(dev)) {
 			struct skge_port *skge = netdev_priv(dev);
-			if (netif_running(dev)) {
-				netif_carrier_off(dev);
-				if (skge->wol)
-					netif_stop_queue(dev);
-				else
-					skge_down(dev);
-			}
-			netif_device_detach(dev);
+
+			netif_carrier_off(dev);
+			if (skge->wol)
+				netif_stop_queue(dev);
+			else
+				skge_down(dev);
 			wol |= skge->wol;
 		}
+		netif_device_detach(dev);
 	}
 
-	pci_save_state(pdev);
+	skge_write32(hw, B0_IMSK, 0);
 	pci_enable_wake(pdev, pci_choose_state(pdev, state), wol);
-	pci_disable_device(pdev);
 	pci_set_power_state(pdev, pci_choose_state(pdev, state));
 
 	return 0;
@@ -3477,23 +3476,33 @@ static int skge_suspend(struct pci_dev *
 static int skge_resume(struct pci_dev *pdev)
 {
 	struct skge_hw *hw = pci_get_drvdata(pdev);
-	int i;
+	int i, err;
 
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
 	pci_enable_wake(pdev, PCI_D0, 0);
 
-	skge_reset(hw);
+	err = skge_reset(hw);
+	if (err)
+		goto out;
 
-	for (i = 0; i < 2; i++) {
+	for (i = 0; i < hw->ports; i++) {
 		struct net_device *dev = hw->dev[i];
-		if (dev) {
-			netif_device_attach(dev);
-			if (netif_running(dev) && skge_up(dev))
+
+		netif_device_attach(dev);
+		if (netif_running(dev)) {
+			err = skge_up(dev);
+
+			if (err) {
+				printk(KERN_ERR PFX "%s: could not up: %d\n",
+				       dev->name, err);
 				dev_close(dev);
+				goto out;
+			}
 		}
 	}
-	return 0;
+out:
+	return err;
 }
 #endif
 

--
Stephen Hemminger [EMAIL PROTECTED]
Re: IPSec kernel oops on ppc64
From: Joy Latten [EMAIL PROTECTED]
Date: Mon, 28 Aug 2006 17:25:15 -0500

> I can try patch-2.6.18-rc1, etc... to see which one it stops working
> on to narrow it down.

If you could do this in the meanwhile, it would help us out a lot.

Thanks.
Re: [PATCH] pcnet32: fix user visible typo
The cause of #6428 has already been fixed in v1.32 of the pcnet32 driver.

To be correct, the printk should be:

	printk(KERN_INFO PFX "%d card%s found\n",
		cards_found, cards_found != 1 ? "s" : "");

So that zero cards also says 'pcnet32: 0 cards found.'

Why delete the period from the end of the sentence?

On Tue, Aug 29, 2006 at 03:32:49AM +0400, Alexey Dobriyan wrote:
> Also, final dot removed and single form fixed.
>
> The cause of #6428 is still to be found.
>
> Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
> ---
>
>  drivers/net/pcnet32.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> --- a/drivers/net/pcnet32.c
> +++ b/drivers/net/pcnet32.c
> @@ -2986,7 +2986,8 @@ static int __init pcnet32_init_module(vo
>  		pcnet32_probe_vlbus(pcnet32_portlist);
>
>  	if (cards_found && (pcnet32_debug & NETIF_MSG_PROBE))
> -		printk(KERN_INFO PFX "%d cards_found.\n", cards_found);
> +		printk(KERN_INFO PFX "%d card%s found\n",
> +			cards_found, cards_found > 1 ? "s" : "");
>
>  	return (pcnet32_have_pci + cards_found) ? 0 : -ENODEV;
>  }

--
Don Fry [EMAIL PROTECTED]
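The difference between the two pluralisation tests is easy to check in userspace. A small sketch follows, assuming snprintf in place of printk and dropping the KERN_INFO and PFX prefixes; format_cards_found() is a hypothetical helper, not part of the driver. It uses the `!= 1` condition suggested in the reply, so zero is treated as plural.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the probe summary into buf. Only a count of exactly 1 is
 * singular, so 0 yields "0 cards found" rather than "0 card found".
 */
static int format_cards_found(char *buf, size_t len, int cards_found)
{
	return snprintf(buf, len, "%d card%s found", cards_found,
			cards_found != 1 ? "s" : "");
}
```

In the driver itself the message is only printed when cards_found is non-zero, so the two conditions differ in output only if that guard is ever relaxed.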
Re: [PATCH 1/4] net: VM deadlock avoidance framework
On Mon, August 28, 2006 19:32, Peter Zijlstra said:
> Also, I'm really past caring what the thing is called ;-) But if ppl object I guess it's easy enough to run yet another sed command over the patches.

True, same here.

>>>> You can get rid of the memalloc_reserve and vmio_request_queues variables if you want, they aren't really needed for anything. If using them reduces the total code size I'd keep them though.
>>>
>>> I find my version easier to read, but that might just be the way my brain works.
>>
>> Maybe true, but I believe my version is more natural in the sense that it makes more clear what the code is doing. Less bookkeeping, more real work, so to speak.
>
> Ok, I'll have another look at it, perhaps my gray matter has shifted ;-)

I don't care either way, just providing an alternative. I'd compile both and see which one is smaller.

> Ah, no accident there, I'm fully aware that there would need to be a spinlock in adjust_memalloc_reserve() if there were another caller. (I even had it there for some time) - added comment.

Good that you're aware of it. Thing is, how much sense does the split-up into adjust_memalloc_reserve() and sk_adjust_memalloc() make at this point? Why not merge the code of adjust_memalloc_reserve() with sk_adjust_memalloc() and only add adjust_memalloc_reserve() when it's really needed? It saves an export.

Feedback on the 28-Aug-2006 19:24 version from programming.kicks-ass.net/kernel-patches/vm_deadlock/current/

> +void setup_per_zone_pages_min(void)
> +{
> +	static DEFINE_SPINLOCK(lock);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&lock, flags);
> +	__setup_per_zone_pages_min();
> +	spin_unlock_irqrestore(&lock, flags);
> +}

Better to put the lock next to min_free_kbytes, both for readability and cache behaviour. And it satisfies the "lock data, not code" mantra.

> +static inline void * emergency_rx_alloc(size_t size, gfp_t gfp_mask)
> +{
> +	void * page = NULL;
> +
> +	if (size > PAGE_SIZE)
> +		return page;
> +
> +	if (atomic_add_unless(&emergency_rx_pages_used, 1, RX_RESERVE_PAGES)) {
> +		page = (void *)__get_free_page(gfp_mask);
> +		if (!page) {
> +			WARN_ON(1);
> +			atomic_dec(&emergency_rx_pages_used);
> +		}
> +	}
> +
> +	return page;
> +}

If you prefer to avoid cmpxchg (which is often used in atomic_add_unless and can be expensive) then you can use something like:

static inline void * emergency_rx_alloc(size_t size, gfp_t gfp_mask)
{
	void * page;

	if (size > PAGE_SIZE)
		return NULL;

	if (atomic_inc_return(&emergency_rx_pages_used) == RX_RESERVE_PAGES)
		goto out;

	page = (void *)__get_free_page(gfp_mask);
	if (page)
		return page;

	WARN_ON(1);
out:
	atomic_dec(&emergency_rx_pages_used);
	return NULL;
}

The tiny race should be totally harmless. Both versions are a bit big to inline though.

> @@ -195,6 +196,86 @@ __u32 sysctl_rmem_default = SK_RMEM_MAX;
>  /* Maximal space eaten by iovec or ancilliary data plus some space */
>  int sysctl_optmem_max = sizeof(unsigned long)*(2*UIO_MAXIOV + 512);
>
> +static DEFINE_SPINLOCK(memalloc_lock);
> +static int memalloc_reserve;
> +static unsigned int vmio_request_queues;
> +
> +atomic_t vmio_socks;
> +atomic_t emergency_rx_pages_used;
> +EXPORT_SYMBOL_GPL(vmio_socks);

Is this export needed? It's only used in net/core/skbuff.c and net/core/sock.c, which are compiled into one module.

> +EXPORT_SYMBOL_GPL(emergency_rx_pages_used);

Same here. It's only used by code in sock.c and skbuff.c, and no external code calls emergency_rx_alloc(), nor emergency_rx_free().

--

I think I depleted my usefulness, there isn't much left to say for me. It's up to the big guys to decide about the merit of this patch. If Evgeniy's network allocator fixes all deadlocks and also has other advantages, then great.

IMHO:

- This patch isn't really a framework, more a minimal fix for one specific, though important problem. But it's small and doesn't have much impact (numbers would be nice, e.g. vmlinux/modules size before and after, and some network benchmark results).

- If Evgeniy's network allocator is as good as it looks, then why can't it replace the existing one? Just adding private subsystem specific memory allocators seems wrong. I might be missing the big picture, but it looks like memory allocator things should be at least synchronized and discussed with Christoph Lameter and his modular slab allocator patch.

All in all it seems it will take a while until Evgeniy's code is merged, so I think applying Peter's patch soonish and removing it again the moment it becomes unnecessary is reasonable.

Greetings,

Indan
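The two ways of taking a slot from a bounded reserve that this message compares can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the patch's code: RESERVE_LIMIT is a hypothetical stand-in for RX_RESERVE_PAGES, the function names are made up, and the increment-first variant compares with ">" so that exactly RESERVE_LIMIT takes succeed, which is a choice of this sketch rather than something taken from the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RESERVE_LIMIT 4 /* hypothetical stand-in for RX_RESERVE_PAGES */

static atomic_int reserve_used;

/* Variant A: compare-and-swap loop; the counter never exceeds the limit. */
static bool reserve_take_cas(void)
{
	int cur = atomic_load(&reserve_used);

	while (cur < RESERVE_LIMIT) {
		/* On failure, cur is refreshed with the current counter value. */
		if (atomic_compare_exchange_weak(&reserve_used, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Variant B: increment first, then undo the increment if over the limit. */
static bool reserve_take_inc(void)
{
	if (atomic_fetch_add(&reserve_used, 1) + 1 > RESERVE_LIMIT) {
		atomic_fetch_sub(&reserve_used, 1);
		return false;
	}
	return true;
}

/* Returns one slot to the reserve. */
static void reserve_put(void)
{
	int prev = atomic_fetch_sub(&reserve_used, 1);

	assert(prev > 0); /* a put without a matching take is a bug */
	(void)prev;
}
```

Variant B lets the counter overshoot the limit for a moment when several callers race, so a concurrent caller may be refused although a slot is about to be handed back. That appears to be the "tiny race" the message considers harmless.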
Re: [PATCH v4 7/7] AMSO1100 Makefiles and Kconfig changes.
I'm finally getting around to merging this up, and:

> --- /dev/null
> +++ b/drivers/infiniband/hw/amso1100/README
> @@ -0,0 +1,11 @@
> +This is the OpenFabrics provider driver for the
> +AMSO1100 1Gb RNIC adapter.
> +
> +This adapter is available in limited quantities
> +for development purposes from Open Grid Computing.
> +
> +This driver requires the IWCM and CMA mods necessary
> +to support iWARP.
> +
> +Contact [EMAIL PROTECTED] for more information.
> +

I don't think this belongs in the drivers directory. In fact, is it worth having this in the kernel at all? How about if I just add a MAINTAINERS entry for amso1100 pointing at [EMAIL PROTECTED] ?

 - R.
Re: [PATCH v4 7/7] AMSO1100 Makefiles and Kconfig changes.
Sounds good to me.

- Original Message -
From: Roland Dreier [EMAIL PROTECTED]
To: Steve Wise [EMAIL PROTECTED]
Cc: openib-general@openib.org; netdev@vger.kernel.org
Sent: Monday, August 28, 2006 6:07 PM
Subject: Re: [PATCH v4 7/7] AMSO1100 Makefiles and Kconfig changes.

> I'm finally getting around to merging this up, and:
>
> > --- /dev/null
> > +++ b/drivers/infiniband/hw/amso1100/README
> > @@ -0,0 +1,11 @@
> > +This is the OpenFabrics provider driver for the
> > +AMSO1100 1Gb RNIC adapter.
> > +
> > +This adapter is available in limited quantities
> > +for development purposes from Open Grid Computing.
> > +
> > +This driver requires the IWCM and CMA mods necessary
> > +to support iWARP.
> > +
> > +Contact [EMAIL PROTECTED] for more information.
> > +
>
> I don't think this belongs in the drivers directory. In fact, is it worth having this in the kernel at all? How about if I just add a MAINTAINERS entry for amso1100 pointing at [EMAIL PROTECTED] ?
>
>  - R.
Re: IPSec kernel oops on ppc64
On Mon, Aug 28, 2006 at 05:25:15PM -0500, Joy Latten wrote:
> A straight 2.6.17 kernel does not crash and my pings work.
> A 2.6.17 + patch-2.6.18-rc4 does crash and my pings do not work.
> The above tests were done on a ppc64.

Thanks for that info. This does sound like a bug. Could you please generate a dump of the stack/register contents and a disassembly of the code around the crash?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] net/*: use SLAB_PANIC
On Mon, Aug 28, 2006 at 01:36:37PM -0700, David Miller wrote:
> > ipv6 can be modular, so panicking on an initialization failure is wrong.
>
> That may be the case, but he merely translated the code as it existed; he didn't change it to start panic()'ing, it already did.
>
> It would be a separate change to undo the panic() in the ipv6 code.

That separate change turned into a big cleanup of the IPv6 init/exit codepaths to fix the panic properly. It will be posted soon.
[PATCH] ethtool.c: fix buffer overflow when devname is too long
The ifr_name field of struct ifreq is IFNAMSIZ (16) bytes long, as defined in the header file /usr/include/net/if.h, so copying a devname that is too long overflows the buffer. I changed strcpy to strncpy so that at most IFNAMSIZ bytes are copied into struct ifreq, and added a check to parse_cmdline that rejects a devname whose length is invalid.

diff -Nrup ethtool-4.orig/ethtool.c ethtool-4/ethtool.c
--- ethtool-4.orig/ethtool.c	2006-07-18 21:21:38.0 -0500
+++ ethtool-4/ethtool.c	2006-08-27 22:32:12.0 -0500
@@ -626,6 +626,9 @@ static void parse_cmdline(int argc, char

 	if (devname == NULL) {
 		show_usage(1);
+	} else if (strlen(devname) > IFNAMSIZ) {
+		fprintf(stderr, "Device name is too long. Should be less than %d!\n", IFNAMSIZ);
+		show_usage(1);
 	}
 }

@@ -1139,7 +1142,7 @@ static int doit(void)

 	/* Setup our control structures. */
 	memset(&ifr, 0, sizeof(ifr));
-	strcpy(ifr.ifr_name, devname);
+	strncpy(ifr.ifr_name, devname, IFNAMSIZ);

 	/* Open control socket. */
 	fd = socket(AF_INET, SOCK_DGRAM, 0);
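One point worth keeping in mind with this kind of fix: strncpy() does not write a terminating NUL when the source is IFNAMSIZ bytes or longer, so a bounded copy is only safe together with a length check that leaves room for the terminator. The following is a minimal sketch under that assumption; set_ifr_name() is a hypothetical helper, not part of ethtool, and it rejects any name of IFNAMSIZ characters or more, which is stricter than the check in the patch above.

```c
#define _DEFAULT_SOURCE /* exposes struct ifreq and IFNAMSIZ on glibc */
#include <assert.h>
#include <net/if.h>
#include <string.h>

/*
 * Hypothetical helper: copies devname into ifr->ifr_name.
 * Returns 0 on success, -1 if the name does not fit with its NUL.
 */
static int set_ifr_name(struct ifreq *ifr, const char *devname)
{
	size_t len;

	assert(ifr != NULL && devname != NULL);

	len = strlen(devname);
	if (len >= IFNAMSIZ)
		return -1; /* no room left for the terminating NUL */

	memset(ifr, 0, sizeof(*ifr));
	memcpy(ifr->ifr_name, devname, len + 1); /* includes the NUL */
	return 0;
}
```

A caller would report the error and show the usage text when the helper returns -1, and only then go on to open the control socket.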
[PATCH -net-2.6.19] net/*: don't panic
IPv6 can be modular, and panicking on module loading is the last thing you want. Two SLAB_PANIC cases are converted to error propagation, as well as one panic() call.

Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
---

I recall release is near, so error handling continues to suck. It needs a big revamp anyway: init functions returning void, functions simply dropping -E... Partly shared with IPv4 :-(

One more question: how can one unload ipv6? It seems to immediately get 8 users here no matter what.

 include/net/ip6_fib.h   |    2 +-
 include/net/ip6_route.h |    2 +-
 include/net/transp_v6.h |    2 +-
 net/ipv6/af_inet6.c     |    6 +-
 net/ipv6/ip6_fib.c      |    8 +---
 net/ipv6/route.c        |   14 +++---
 net/ipv6/tcp_ipv6.c     |   19 ++-
 7 files changed, 38 insertions(+), 15 deletions(-)

--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -210,7 +210,7 @@ extern void			fib6_run_gc(unsigned long

 extern void			fib6_gc_cleanup(void);

-extern void			fib6_init(void);
+extern int			fib6_init(void);

 extern void			fib6_rules_init(void);
 extern void			fib6_rules_cleanup(void);
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -59,7 +59,7 @@ extern struct dst_entry *	ip6_route_outp

 extern int			ip6_route_me_harder(struct sk_buff *skb);

-extern void			ip6_route_init(void);
+extern int			ip6_route_init(void);
 extern void			ip6_route_cleanup(void);

 extern int			ipv6_route_ioctl(unsigned int cmd, void __user *arg);
--- a/include/net/transp_v6.h
+++ b/include/net/transp_v6.h
@@ -24,7 +24,7 @@ extern void ipv6_destopt_init(void);
 /* transport protocols */
 extern void			rawv6_init(void);
 extern void			udpv6_init(void);
-extern void			tcpv6_init(void);
+extern int			tcpv6_init(void);

 extern int			udpv6_connect(struct sock *sk,
 					      struct sockaddr *uaddr,
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -863,13 +863,17 @@ #endif
 	/* Init v6 transport protocols. */
 	udpv6_init();
-	tcpv6_init();
+	err = tcpv6_init();
+	if (err)
+		goto tcpv6_init_fail;

 	ipv6_packet_init();
 	err = 0;
 out:
 	return err;

+tcpv6_init_fail:
+	addrconf_cleanup();
 addrconf_fail:
 	ip6_flowlabel_cleanup();
 	ip6_route_cleanup();
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1468,14 +1468,16 @@ void fib6_run_gc(unsigned long dummy)
 	spin_unlock_bh(&fib6_gc_lock);
 }

-void __init fib6_init(void)
+int __init fib6_init(void)
 {
 	fib6_node_kmem = kmem_cache_create("fib6_nodes",
 					   sizeof(struct fib6_node),
-					   0, SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+					   0, SLAB_HWCACHE_ALIGN,
 					   NULL, NULL);
-
+	if (!fib6_node_kmem)
+		return -ENOMEM;
 	fib6_tables_init();
+	return 0;
 }

 void fib6_gc_cleanup(void)
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2411,14 +2411,21 @@ ctl_table ipv6_route_table[] = {

 #endif

-void __init ip6_route_init(void)
+int __init ip6_route_init(void)
 {
 	struct proc_dir_entry *p;
+	int rv;

 	ip6_dst_ops.kmem_cachep = kmem_cache_create("ip6_dst_cache",
 						     sizeof(struct rt6_info), 0,
-						     SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);
-	fib6_init();
+						     SLAB_HWCACHE_ALIGN, NULL, NULL);
+	if (!ip6_dst_ops.kmem_cachep)
+		return -ENOMEM;
+	rv = fib6_init();
+	if (rv < 0) {
+		kmem_cache_destroy(ip6_dst_ops.kmem_cachep);
+		return rv;
+	}
 #ifdef CONFIG_PROC_FS
 	p = proc_net_create("ipv6_route", 0, rt6_proc_info);
 	if (p)
@@ -2432,6 +2439,7 @@ #endif
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 	fib6_rules_init();
 #endif
+	return 0;
 }

 void ip6_route_cleanup(void)
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1644,14 +1644,23 @@ static struct inet_protosw tcpv6_protosw
 			  INET_PROTOSW_ICSK,
 };

-void __init tcpv6_init(void)
+int __init tcpv6_init(void)
 {
+	int rv;
+
 	/* register inet6 protocol */
-	if (inet6_add_protocol(&tcpv6_protocol, IPPROTO_TCP) < 0)
+	rv = inet6_add_protocol(&tcpv6_protocol, IPPROTO_TCP);
+	if (rv < 0) {
 		printk(KERN_ERR "tcpv6_init: Could not register protocol\n");
+		return rv;
+	}
 	inet6_register_protosw(&tcpv6_protosw);

-	if
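
The pattern this patch applies, returning an error from each init step and undoing the earlier steps on failure instead of panicking, can be sketched outside the kernel. This is a userspace illustration only: malloc()/free() stand in for kmem_cache_create()/kmem_cache_destroy(), and cache_create(), fail_at and the shortened function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static void *dst_cache;  /* stands in for ip6_dst_ops.kmem_cachep */
static void *node_cache; /* stands in for fib6_node_kmem */
static int fail_at;      /* hypothetical test knob: 1 or 2 forces a failure */

/* Pretend allocator that fails at the requested step. */
static void *cache_create(int step, size_t size)
{
	return fail_at == step ? NULL : malloc(size);
}

static int fib_init(void)
{
	node_cache = cache_create(2, 32);
	if (!node_cache)
		return -ENOMEM;
	return 0;
}

static int route_init(void)
{
	int rv;

	dst_cache = cache_create(1, 64);
	if (!dst_cache)
		return -ENOMEM;

	rv = fib_init();
	if (rv < 0)
		goto free_dst;
	return 0;

free_dst:
	/* Undo the step that succeeded before reporting the failure. */
	free(dst_cache);
	dst_cache = NULL;
	return rv;
}

static void route_cleanup(void)
{
	assert(dst_cache && node_cache); /* only valid after a successful init */
	free(node_cache);
	free(dst_cache);
	node_cache = dst_cache = NULL;
}
```

The caller then checks the return value and jumps to its own unwind label, which is the shape of the tcpv6_init_fail label added to af_inet6.c.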
Re: myri10ge conversion to non-contiguous skb
Jesse Brandeburg wrote:
> On 8/24/06, Brice Goglin [EMAIL PROTECTED] wrote:
>> During the submission of the myri10ge driver, some people raised the question of using pages (or any kind of non-contiguous skb) instead of our current 16kB contiguous skb. We are looking at this right now and it is not clear what solution is the best.
>>
>> From what we understand, Linux provides two mostly redundant mechanisms to handle discontinuous skb, the skb->frags and the skb->frag_list, s2io using the latter while e1000 uses the former. Is one or the other recommended? What is the purpose of having them both in the net core?
>
> you really only have one option, to use PAGE_SIZE pages and frags[] w/nr_frags. e1000 tried the frag_list option but that is used by ip reassembly and badly conflicts with driver generated frag_list.

Ok, thanks for the clarification, we'll use frags then. Is s2io going to be converted from frag_list to frags then?

Brice