Re: [patch] pegasus.c
This is another attempt on sending the pegasus patch with Alpine. I know it's even worse, but i've attached gzip-ed version just in case. Here goes a brief description of what's changed: --- This patch is fixing a driver bug triggered when malformed string is passed to the 'devid' module parameter. The expected format is: device_name:vendor_id:device_id:flags but it turned out people often type: somename::0 instead of: somename:::0 which typically ends up dereferencing null pointer. Signed-off-by: Petko Manolov [EMAIL PROTECTED] --- cheers, Petko On Mon, 11 Feb 2008, Jeff Garzik wrote: Petko Manolov wrote: Hi Jeff, Attached you'll find a patch that is fixing a driver bug triggered when malformed string is passed to the 'devid' module parameter. The expected format is: device_name:vendor_id:device_id:flags but it turned out people often type: somename::0 instead of: somename:::0 ACK but two process problems preventing application: * patch is base64-encoded * no signed-off-by included pegasus.c.patch.gz Description: Binary data --- drivers/net/usb/pegasus.c.orig 2008-01-09 12:16:52.0 +0200 +++ drivers/net/usb/pegasus.c 2008-01-09 12:16:58.0 +0200 @@ -1461,12 +1461,24 @@ static void parse_id(char *id) if ((token = strsep(id, :)) != NULL) name = token; + else + goto err; /* name now points to a null terminated string*/ if ((token = strsep(id, :)) != NULL) vendor_id = simple_strtoul(token, NULL, 16); + else + goto err; + if ((token = strsep(id, :)) != NULL) device_id = simple_strtoul(token, NULL, 16); - flags = simple_strtoul(id, NULL, 16); + else + goto err; + + if (id != NULL) + flags = simple_strtoul(id, NULL, 16); + else + goto err; + pr_info(%s: new device %s, vendor ID 0x%04x, device ID 0x%04x, flags: 0x%x\n, driver_name, name, vendor_id, device_id, flags); @@ -1476,6 +1488,7 @@ static void parse_id(char *id) return; for (i=0; usb_dev_id[i].name; i++); + usb_dev_id[i].name = name; usb_dev_id[i].vendor = vendor_id; usb_dev_id[i].device = device_id; @@ -1483,6 +1496,11 @@ static void parse_id(char *id) pegasus_ids[i].match_flags = USB_DEVICE_ID_MATCH_DEVICE; pegasus_ids[i].idVendor = vendor_id; pegasus_ids[i].idProduct = device_id; + + return; + +err: + pr_info(malformed 'devid' module parameter\n); } static int __init pegasus_init(void)
Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?
On Feb 18, 2008 9:06 PM, Stephen Hemminger [EMAIL PROTECTED] wrote: On Mon, 18 Feb 2008 19:42:25 + (GMT) Chris Rankin [EMAIL PROTECTED] wrote: --- Stephen Hemminger [EMAIL PROTECTED] wrote: sysfs: duplicate filename 'bridge' can not be created WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() Pid: 1, comm: swapper Not tainted 2.6.24.1 #1 [c0105020] show_trace_log_lvl+0x1a/0x2f [c0105990] show_trace+0x12/0x14 [c010613d] dump_stack+0x6c/0x72 [c01991bf] sysfs_add_one+0x57/0xbc [c0199e41] sysfs_create_link+0xc2/0x10d [c01bae9a] pci_bus_add_devices+0xbd/0x103 [c034016c] pci_legacy_init+0x56/0xe3 [c03274e1] kernel_init+0x157/0x2c3 [c0104c83] kernel_thread_helper+0x7/0x10 === pci :00:01.0: Error creating sysfs bridge symlink, continuing... sysfs: duplicate filename 'bridge' can not be created WARNING: at fs/sysfs/dir.c:424 sysfs_add_one() Pid: 1, comm: swapper Not tainted 2.6.24.1 #1 [c0105020] show_trace_log_lvl+0x1a/0x2f [c0105990] show_trace+0x12/0x14 [c010613d] dump_stack+0x6c/0x72 [c01991bf] sysfs_add_one+0x57/0xbc [c0199e41] sysfs_create_link+0xc2/0x10d [c01bae9a] pci_bus_add_devices+0xbd/0x103 [c01bae82] pci_bus_add_devices+0xa5/0x103 [c034016c] pci_legacy_init+0x56/0xe3 [c03274e1] kernel_init+0x157/0x2c3 [c0104c83] kernel_thread_helper+0x7/0x10 === I have a vague feeling that this was fixed, perhaps in 2.6.24.x? Never heard of this, what is the initialization script that causes this? Also do you have the SYSFS_DEPRECATED option configured? that caused issues with regular network drivers. Yes, SYSFS_DEPRECATED is enabled. And the init scripts are from Fedora 8. There was a bug (fixed in 2.6.24) that had to do with sysfs_create_link and SYSFS_DEPRECATED probably there is a similar problem with directories. Chris, could you enable CONFIG_DEBUG_KOBJECT=y, it might show what objects try to claim the same name. Thanks, Kay -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
AW: Problem receiving multicast/promiscuous-mode with kernel.2.6.24
Visit AVD on prolight+sound in Frankfurt from 12.-15. March 2008 - Hall 8.0, Stand G16 Hi, ok, i managed to shrink down my code to show the behaviour in small size ;-) It shows the same effect as my app, sometimes i get all the packets after a mc_join, sometimes i get only the timeouts. I find no predictive behaviour, i'm really desperate ... Hope you can see the effect too .. Do u need an app sending the multicast VLAN packets ? Robert - #include stdint.h #include sys/select.h #include errno.h #include stdio.h #include string.h #include time.h #include sys/types.h #include sys/socket.h #include netdb.h #include netinet/in.h #include stdio.h #include stdlib.h #include unistd.h #include fcntl.h #include arpa/inet.h #include net/if.h #include errno.h #include netdb.h //#include linux/if.h #include linux/sockios.h #include sys/ioctl.h //#define JOIN_SINGLE_IF #undef JOIN_SINGLE_IF #define PORT 10500 /* ** Try to receive VLAN-tagged UDP multicast packets ** VLAN-ID:3, VLAN_PRI=6, Addr: 224.1.9.0, dstnport 10500 ** ** Kernel used: 2.6.24 ** ** HW: VIA Epia 5000, VIA Epia LT (VIA Rhine II and VIA VT6107) ** ** Additional network settings: ** vconfig add eth0 3 ** vconfig set_egress_map eth0.3 6 6 ** ifconfig eth0.3 10.0.0.1 ** route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0.3 ** */ #define MAX_PAYLOAD 1000 /* * Network representation of the rtp header. */ struct rtp_header { #ifdef WORDS_BIGENDIAN uint16_t version:2; uint16_t padbit:1; uint16_t extbit:1; uint16_t cc:4; uint16_t markbit:1; uint16_t paytype:7; #else uint16_t cc:4; uint16_t extbit:1; uint16_t padbit:1; uint16_t version:2; uint16_t paytype:7; uint16_t markbit:1; #endif uint16_t seq_number; uint32_t timestamp; uint32_t ssrc; /* In fact we rely on rtp_header being not larger than * the minimum header length. So, there's no contributing * sync source field here. */ }; struct rtp_packet { struct rtp_header header; unsigned char payload[MAX_PAYLOAD]; }; #ifdef WORDS_BIGENDIAN #error Sorry this is not supported on bigendian targets. #endif #define BYTE0(x) ((x) 0xff) #define BYTE1(x) (((x) 8) 0xff) #define BYTE2(x) (((x) 16) 0xff) #define BYTE3(x) (((x) 24) 0xff) static char *my_inet_ntoa (char *buf, size_t n, struct in_addr in) { if (snprintf (buf, n, %u.%u.%u.%u, BYTE0(in.s_addr), BYTE1(in.s_addr), BYTE2(in.s_addr), BYTE3(in.s_addr)) = n) { return NULL; } return buf; } static int set_non_blocking (int fd) { int flags; if ((flags = fcntl (fd, F_GETFL)) 0) { perror (fcntl getflags ()); return -1; } if (fcntl (fd, F_SETFL, flags | O_NONBLOCK) 0) { perror (fcntl setflags ()); return -1; } return fd; } int main() { int fd; fd_set read_fds; struct timespec timeout; while(1) { int i, ret; if ((fd = subscribe_udp (PORT)) 0) { printf(Could not register UDP port.\n); return 1; } if (join_multicast_group (fd, 224.1.9.0) 0) { printf(Error joining multicast !\n); return 1; } for(i=0;i10;i++) { FD_ZERO (read_fds); FD_SET (fd, read_fds); timeout.tv_sec = 1; timeout.tv_nsec = 0; ret = pselect (fd+1, read_fds, NULL, NULL, timeout, NULL); if (ret 0) { if (errno == EINTR) { continue; } else { printf(select failed.\n); return 1; } } if (FD_ISSET(fd, read_fds)) { ssize_t len; struct rtp_packet p; struct sockaddr_in from; socklen_t fromlen; char from_ip[20]; len = recvfrom (fd, p, sizeof (p), 0, (struct sockaddr*) from, fromlen); my_inet_ntoa (from_ip, sizeof (from_ip), from.sin_addr); printf(Received data packet from %s, length %u\n, from_ip, len); } else { printf(Got timeout receiving Multicast packets !\n); } } /* loop 10 */ if (leave_multicast_group(fd, 224.1.9.0) 0) { printf(Error leaving multicast !\n); return 1; } close(fd); } } int subscribe_udp (int port) { int s; s = create_listening_socket (port, 1); return s; } int create_listening_socket (int listen_port, int udp) { struct sockaddr_in a; int s; int yes; if ((s = socket (AF_INET, udp ? SOCK_DGRAM : SOCK_STREAM, 0))
Re: tbench regression in 2.6.25-rc1
On Tue, 2008-02-19 at 08:35 +0100, Eric Dumazet wrote: Zhang, Yanmin a �crit : On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote: On Mon, 18 Feb 2008 16:12:38 +0800 Zhang, Yanmin [EMAIL PROTECTED] wrote: On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 15 Feb 2008 15:21:48 +0100 On linux-2.6.25-rc1 x86_64 : offsetof(struct dst_entry, lastuse)=0xb0 offsetof(struct dst_entry, __refcnt)=0xb8 offsetof(struct dst_entry, __use)=0xbc offsetof(struct dst_entry, next)=0xc0 So it should be optimal... I dont know why tbench prefers __refcnt being on 0xc0, since in this case lastuse will be on a different cache line... Each incoming IP packet will need to change lastuse, __refcnt and __use, so keeping them in the same cache line is a win. I suspect then that even this patch could help tbench, since it avoids writing lastuse... I think your suspicions are right, and even moreso it helps to keep __refcnt out of the same cache line as input/output/ops which are read-almost-entirely :- I think you are right. The issue is these three variables sharing the same cache line with input/output/ops. ) I haven't done an exhaustive analysis, but it seems that the write traffic to lastuse and __refcnt are about the same. However if we find that __refcnt gets hit more than lastuse in this workload, it explains the regression. I also think __refcnt is the key. I did a new testing by adding 2 unsigned long pading before lastuse, so the 3 members are moved to next cache line. The performance is recovered. How about below patch? Almost all performance is recovered with the new patch. Signed-off-by: Zhang Yanmin [EMAIL PROTECTED] --- --- linux-2.6.25-rc1/include/net/dst.h2008-02-21 14:33:43.0 +0800 +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-21 14:36:22.0 +0800 @@ -52,11 +52,10 @@ struct dst_entry unsigned short header_len; /* more space at head required */ unsigned short trailer_len;/* space to reserve at tail */ - u32 metrics[RTAX_MAX]; - struct dst_entry*path; - - unsigned long rate_last; /* rate limiting for ICMP */ unsigned intrate_tokens; + unsigned long rate_last; /* rate limiting for ICMP */ + + struct dst_entry*path; #ifdef CONFIG_NET_CLS_ROUTE __u32 tclassid; @@ -70,10 +69,12 @@ struct dst_entry int (*output)(struct sk_buff*); struct dst_ops *ops; - - unsigned long lastuse; + + u32 metrics[RTAX_MAX]; + atomic_t__refcnt; /* client references*/ int __use; + unsigned long lastuse; union { struct dst_entry *next; struct rtable*rt_next; Well, after this patch, we grow dst_entry by 8 bytes : With my .config, it doesn't grow. Perhaps because of CONFIG_NET_CLS_ROUTE, I don't enable it. I will move tclassid under ops. sizeof(struct dst_entry)=0xd0 offsetof(struct dst_entry, input)=0x68 offsetof(struct dst_entry, output)=0x70 offsetof(struct dst_entry, __refcnt)=0xb4 offsetof(struct dst_entry, lastuse)=0xc0 offsetof(struct dst_entry, __use)=0xb8 sizeof(struct rtable)=0x140 So we dirty two cache lines instead of one, unless your cpu have 128 bytes cache lines ? I am quite suprised that my patch to not change lastuse if already set to jiffies changes nothing... If you have some time, could you also test this (unrelated) patch ? We can avoid dirty all the time a cache line of loopback device. diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c index f2a6e71..0a4186a 100644 --- a/drivers/net/loopback.c +++ b/drivers/net/loopback.c @@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev) return 0; } #endif - dev-last_rx = jiffies; +#ifdef CONFIG_SMP + if (dev-last_rx != jiffies) +#endif + dev-last_rx = jiffies; /* it's OK to use per_cpu_ptr() because BHs are off */ pcpu_lstats = netdev_priv(dev); Although I didn't test it, I don't think it's ok. The key is __refcnt shares the same cache line with ops/input/output. Note it was unrelated to struct dst, but dirtying of one cache line of 'loopback netdevice' I tested it, and tbench result was better with this patch : 890 MB/s instead of 870 MB/s on a bi dual core machine. I tested your new patch and it doesn't help tbench. On my 8-core stoakley machine, the regression is only 5%, but it's 30% on 16-core tigerton. It looks like the scalability is poor. I was curious of the potential gain on your 16 cores
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
David Miller wrote: From: James Chapman [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 22:09:24 + Here's a new version of the patch. The patch avoids disabling irqs and fixes the sk_dst_get() usage that DaveM mentioned. But even with this patch, lockdep still complains if hundreds of ppp sessions are inserted into a tunnel as rapidly as possible (lockdep trace is below). I can stop these errors by wrapping the call to ppp_input() in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a better fix? Firstly, let's fix one thing at a time. Leave the sk_dst_get() thing alone until we can prove that it's part of the lockdep traces. In reproducing the problem, I obtained several lockdep traces that implicated sk_dst_get(). I changed the code to use __sk_dst_check() as you suggested and they went away. At that point, I was hopeful the locking issues were fixed. But after several minutes of creating/deleting hundreds of ppp sessions, lockdep dumped another error. It is that error that I posted yesterday. Next, I can't see why ppp_input() needs to be invoked with interrupts disabled. There are many other things that invoke that in software interrupt context, such as pppoe. I agree. I'm seeking advice on what the underlying cause is of this new trace. Please provide the lockdep traces without the ppp_input() IRQ disabling so this can be properly analyzed. The trace _was_ without ppp_input IRQ disabling. The trace doesn't occur if I disable IRQs around the ppp_input() call. The patch I sent showed the changes I made before running the tests that created the new lockdep trace. I'm sorry this wasn't clear. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.
On Tue, 2008-02-19 at 10:14 +0100, Daniel Lezcano wrote: Denis V. Lunev wrote: Default ARP parameters should be findable regardless of the context. Required to make inetdev_event working. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/core/neighbour.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index c895ad4..45ed620 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1275,9 +1275,7 @@ static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl, struct neigh_parms *p; for (p = tbl-parms; p; p = p-next) { - if (p-net != net) - continue; - if ((p-dev p-dev-ifindex == ifindex) || + if ((p-dev p-dev-ifindex == ifindex p-net == net) || (!p-dev !ifindex)) return p; } If the values are: p-dev == NULL ifindex == 0 p-net != net The parms should not be taken into account and the looping must continue. But with this modification it is not the case, if we specify parms ifindex == 0, the first parms with the dev field set to NULL will be taken belonging or not to the right net. They should be taken. In the other case inetdev_event will fail for sure in the middle. You could check. These are ARP defaults and I do not see a problem for now to get them. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
Jarek Poplawski wrote: On Mon, Feb 18, 2008 at 10:09:24PM +, James Chapman wrote: Jarek Poplawski wrote: Hi, It seems, this nice report is still uncomplete: could you check if there could have been something more yet? Unfortunately the ISP's syslog stops. But I've been able to borrow two Quad Xeon boxes and have reproduced the problem. Here's a new version of the patch. The patch avoids disabling irqs and fixes the sk_dst_get() usage that DaveM mentioned. But even with this patch, lockdep still complains if hundreds of ppp sessions are inserted into a tunnel as rapidly as possible (lockdep trace is below). I can stop these errors by wrapping the call to ppp_input() in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a better fix? Hmm... This is a really long report and quite a bit different from the previous one. I need some time for this. BTW: you sent before a lockdep report with hlist_lock problem. I think this could be fixed in some independent patch to make this all more readable. Are all the other changes in this current patch only because of this or previous lockdep report or for some other reasons (or reports) yet? As I mentioned in my reply to davem, modifying the pppol2tp driver as described in the patch I sent made the original lockdep problems go away. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.
Denis V. Lunev wrote: On Tue, 2008-02-19 at 10:14 +0100, Daniel Lezcano wrote: Denis V. Lunev wrote: Default ARP parameters should be findable regardless of the context. Required to make inetdev_event working. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/core/neighbour.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index c895ad4..45ed620 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1275,9 +1275,7 @@ static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl, struct neigh_parms *p; for (p = tbl-parms; p; p = p-next) { - if (p-net != net) - continue; - if ((p-dev p-dev-ifindex == ifindex) || + if ((p-dev p-dev-ifindex == ifindex p-net == net) || (!p-dev !ifindex)) return p; } If the values are: p-dev == NULL ifindex == 0 p-net != net The parms should not be taken into account and the looping must continue. But with this modification it is not the case, if we specify parms ifindex == 0, the first parms with the dev field set to NULL will be taken belonging or not to the right net. They should be taken. In the other case inetdev_event will fail for sure in the middle. You could check. These are ARP defaults and I do not see a problem for now to get them. Because there is a parms default per namespace. So several instances of them per nd table. That was the initial approach with Eric's patchset. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.
On Tue, 2008-02-19 at 10:51 +0100, Daniel Lezcano wrote: Denis V. Lunev wrote: On Tue, 2008-02-19 at 10:14 +0100, Daniel Lezcano wrote: Denis V. Lunev wrote: Default ARP parameters should be findable regardless of the context. Required to make inetdev_event working. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/core/neighbour.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index c895ad4..45ed620 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1275,9 +1275,7 @@ static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl, struct neigh_parms *p; for (p = tbl-parms; p; p = p-next) { - if (p-net != net) - continue; - if ((p-dev p-dev-ifindex == ifindex) || + if ((p-dev p-dev-ifindex == ifindex p-net == net) || (!p-dev !ifindex)) return p; } If the values are: p-dev == NULL ifindex == 0 p-net != net The parms should not be taken into account and the looping must continue. But with this modification it is not the case, if we specify parms ifindex == 0, the first parms with the dev field set to NULL will be taken belonging or not to the right net. They should be taken. In the other case inetdev_event will fail for sure in the middle. You could check. These are ARP defaults and I do not see a problem for now to get them. Because there is a parms default per namespace. So several instances of them per nd table. That was the initial approach with Eric's patchset. These changes are not in mainstream and I do not want to touch ARP as this is not a simple thing. In reality ARP will be needed only when we'll have a real device inside a namespace. Right now I prefer to have minimal set of working changes to finish IP and upper layers. Regards, Den -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.
Denis V. Lunev wrote: On Tue, 2008-02-19 at 10:51 +0100, Daniel Lezcano wrote: Denis V. Lunev wrote: On Tue, 2008-02-19 at 10:14 +0100, Daniel Lezcano wrote: Denis V. Lunev wrote: Default ARP parameters should be findable regardless of the context. Required to make inetdev_event working. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/core/neighbour.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index c895ad4..45ed620 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1275,9 +1275,7 @@ static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl, struct neigh_parms *p; for (p = tbl-parms; p; p = p-next) { - if (p-net != net) - continue; - if ((p-dev p-dev-ifindex == ifindex) || + if ((p-dev p-dev-ifindex == ifindex p-net == net) || (!p-dev !ifindex)) return p; } If the values are: p-dev == NULL ifindex == 0 p-net != net The parms should not be taken into account and the looping must continue. But with this modification it is not the case, if we specify parms ifindex == 0, the first parms with the dev field set to NULL will be taken belonging or not to the right net. They should be taken. In the other case inetdev_event will fail for sure in the middle. You could check. These are ARP defaults and I do not see a problem for now to get them. Because there is a parms default per namespace. So several instances of them per nd table. That was the initial approach with Eric's patchset. These changes are not in mainstream and I do not want to touch ARP as this is not a simple thing. In reality ARP will be needed only when we'll have a real device inside a namespace. Right now I prefer to have minimal set of working changes to finish IP and upper layers. core/neighbour.c is a common part between several protocols, especially ipv4 and ipv6. If you modify this function just to fit your need in the arp that will block me for ipv6 until you make parms default per namespace. So please, find another way to do that, perhaps just add a helper function. I suggest you do parms default per namespace first, it is quite small and easy :) Just let me the time to send the copy-parms-default function. Is it ok ? -- Daniel -- Daniel -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Tue, Feb 19, 2008 at 10:30:47AM +, Jarek Poplawski wrote: ... IMHO, just like I wrote earlier, the main problem is in ppp_generic(), ...or maybe ppp_generic.c? Whatever... Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Tue, Feb 19, 2008 at 09:03:12AM +, James Chapman wrote: David Miller wrote: From: James Chapman [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 22:09:24 + Here's a new version of the patch. The patch avoids disabling irqs and fixes the sk_dst_get() usage that DaveM mentioned. But even with this patch, lockdep still complains if hundreds of ppp sessions are inserted into a tunnel as rapidly as possible (lockdep trace is below). I can stop these errors by wrapping the call to ppp_input() in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a better fix? Firstly, let's fix one thing at a time. Leave the sk_dst_get() thing alone until we can prove that it's part of the lockdep traces. In reproducing the problem, I obtained several lockdep traces that implicated sk_dst_get(). As a matter of fact I missed just that kind information on previous lockdep report, so if you could send them too this should be still helpful. ... I agree. I'm seeking advice on what the underlying cause is of this new trace. IMHO, just like I wrote earlier, the main problem is in ppp_generic(), especially ppp_connect_channel(), where main tx rx locks are used. I didn't know enough about this sk_dst_lock traces yet. I hope I could help with this, but after these changes I need some time to figure this out again. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.
Denis V. Lunev wrote: Default ARP parameters should be findable regardless of the context. Required to make inetdev_event working. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/core/neighbour.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index c895ad4..45ed620 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1275,9 +1275,7 @@ static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl, struct neigh_parms *p; for (p = tbl-parms; p; p = p-next) { - if (p-net != net) - continue; - if ((p-dev p-dev-ifindex == ifindex) || + if ((p-dev p-dev-ifindex == ifindex p-net == net) || (!p-dev !ifindex)) return p; } If the values are: p-dev == NULL ifindex == 0 p-net != net The parms should not be taken into account and the looping must continue. But with this modification it is not the case, if we specify parms ifindex == 0, the first parms with the dev field set to NULL will be taken belonging or not to the right net. IMO the right test is: if (p-net == net ((p-dev p-dev-ifindex == ifindex) || !p-dev !ifindex))) I definitively prefer the first notation :) - if (p-net != net) - continue; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6.26 1/5][SYSCTL]: Merge equal code in sysctl proc handlers.
The -read and -write callbacks act in a very similar way, so merge these paths to reduce the number of places to patch later. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- fs/proc/proc_sysctl.c | 50 ++-- 1 files changed, 11 insertions(+), 39 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 614c34b..5e31585 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -165,8 +165,8 @@ out: return err; } -static ssize_t proc_sys_read(struct file *filp, char __user *buf, - size_t count, loff_t *ppos) +static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf, + size_t count, loff_t *ppos, int write) { struct dentry *dentry = filp-f_dentry; struct ctl_table_header *head; @@ -190,12 +190,12 @@ static ssize_t proc_sys_read(struct file *filp, char __user *buf, * and won't be until we finish. */ error = -EPERM; - if (sysctl_perm(table, MAY_READ)) + if (sysctl_perm(table, write ? MAY_WRITE : MAY_READ)) goto out; /* careful: calling conventions are nasty here */ res = count; - error = table-proc_handler(table, 0, filp, buf, res, ppos); + error = table-proc_handler(table, write, filp, buf, res, ppos); if (!error) error = res; out: @@ -204,44 +204,16 @@ out: return error; } -static ssize_t proc_sys_write(struct file *filp, const char __user *buf, +static ssize_t proc_sys_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos) { - struct dentry *dentry = filp-f_dentry; - struct ctl_table_header *head; - struct ctl_table *table; - ssize_t error; - size_t res; - - table = do_proc_sys_lookup(dentry-d_parent, dentry-d_name, head); - /* Has the sysctl entry disappeared on us? */ - error = -ENOENT; - if (!table) - goto out; - - /* Has the sysctl entry been replaced by a directory? */ - error = -EISDIR; - if (!table-proc_handler) - goto out; - - /* -* At this point we know that the sysctl was not unregistered -* and won't be until we finish. -*/ - error = -EPERM; - if (sysctl_perm(table, MAY_WRITE)) - goto out; - - /* careful: calling conventions are nasty here */ - res = count; - error = table-proc_handler(table, 1, filp, (char __user *)buf, - res, ppos); - if (!error) - error = res; -out: - sysctl_head_finish(head); + return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 0); +} - return error; +static ssize_t proc_sys_write(struct file *filp, const char __user *buf, + size_t count, loff_t *ppos) +{ + return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 1); } -- 1.5.3.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9920] New: kernel panic when using ebtables redirect target
David Miller wrote: From: Joonwoo Park [EMAIL PROTECTED] Date: Tue, 19 Feb 2008 11:53:24 +0900 [PATCH] netfilter: fix incorrect use of skb_make_writable http://bugzilla.kernel.org/show_bug.cgi?id=9920 The function skb_make_writable returns true or false. Signed-off-by: Joonwoo Park [EMAIL PROTECTED] I'll let Patrick pull this in, thanks! Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6.26 2/5][SYSCTL]: Clean sysctls from unneeded extern and forward declarations.
The do_sysctl_strategy can be static since it's used in kernel/sysctl.c only. Besides, move it and parse_table above their callers and drop the forward declarations. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- include/linux/sysctl.h |5 -- kernel/sysctl.c| 144 +++- 2 files changed, 68 insertions(+), 81 deletions(-) diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 571f01d..8e50196 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -981,11 +981,6 @@ extern int do_sysctl (int __user *name, int nlen, void __user *oldval, size_t __user *oldlenp, void __user *newval, size_t newlen); -extern int do_sysctl_strategy (struct ctl_table *table, - int __user *name, int nlen, - void __user *oldval, size_t __user *oldlenp, - void __user *newval, size_t newlen); - extern ctl_handler sysctl_data; extern ctl_handler sysctl_string; extern ctl_handler sysctl_intvec; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 740e144..c224cc5 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -145,12 +145,6 @@ extern int no_unaligned_warning; extern int max_lock_depth; #endif -#ifdef CONFIG_SYSCTL_SYSCALL -static int parse_table(int __user *, int, void __user *, size_t __user *, - void __user *, size_t, struct ctl_table *); -#endif - - #ifdef CONFIG_PROC_SYSCTL static int proc_do_cad_pid(struct ctl_table *table, int write, struct file *filp, void __user *buffer, size_t *lenp, loff_t *ppos); @@ -1459,6 +1453,74 @@ void register_sysctl_root(struct ctl_table_root *root) } #ifdef CONFIG_SYSCTL_SYSCALL +/* Perform the actual read/write of a sysctl table entry. */ +static int do_sysctl_strategy (struct ctl_table *table, + int __user *name, int nlen, + void __user *oldval, size_t __user *oldlenp, + void __user *newval, size_t newlen) +{ + int op = 0, rc; + + if (oldval) + op |= 004; + if (newval) + op |= 002; + if (sysctl_perm(table, op)) + return -EPERM; + + if (table-strategy) { + rc = table-strategy(table, name, nlen, oldval, oldlenp, +newval, newlen); + if (rc 0) + return rc; + if (rc 0) + return 0; + } + + /* If there is no strategy routine, or if the strategy returns +* zero, proceed with automatic r/w */ + if (table-data table-maxlen) { + rc = sysctl_data(table, name, nlen, oldval, oldlenp, +newval, newlen); + if (rc 0) + return rc; + } + return 0; +} + +static int parse_table(int __user *name, int nlen, + void __user *oldval, size_t __user *oldlenp, + void __user *newval, size_t newlen, + struct ctl_table *table) +{ + int n; +repeat: + if (!nlen) + return -ENOTDIR; + if (get_user(n, name)) + return -EFAULT; + for ( ; table-ctl_name || table-procname; table++) { + if (!table-ctl_name) + continue; + if (n == table-ctl_name) { + int error; + if (table-child) { + if (sysctl_perm(table, 001)) + return -EPERM; + name++; + nlen--; + table = table-child; + goto repeat; + } + error = do_sysctl_strategy(table, name, nlen, + oldval, oldlenp, + newval, newlen); + return error; + } + } + return -ENOTDIR; +} + int do_sysctl(int __user *name, int nlen, void __user *oldval, size_t __user *oldlenp, void __user *newval, size_t newlen) { @@ -1531,76 +1593,6 @@ int sysctl_perm(struct ctl_table *table, int op) return test_perm(table-mode, op); } -#ifdef CONFIG_SYSCTL_SYSCALL -static int parse_table(int __user *name, int nlen, - void __user *oldval, size_t __user *oldlenp, - void __user *newval, size_t newlen, - struct ctl_table *table) -{ - int n; -repeat: - if (!nlen) - return -ENOTDIR; - if (get_user(n, name)) - return -EFAULT; - for ( ; table-ctl_name || table-procname; table++) { - if (!table-ctl_name) - continue; - if
[PATCH net-2.6.26 4/5][SYSCTL]: Create the net sysctl root for RO tables.
This root keeps ctl tables in one global list, but doesn't allow for non-init namespaces to write into tables, stored in it. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- include/net/net_namespace.h |2 ++ net/sysctl_net.c| 33 + 2 files changed, 35 insertions(+), 0 deletions(-) diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 28738b7..2930ae3 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -173,6 +173,8 @@ struct ctl_table; struct ctl_table_header; extern struct ctl_table_header *register_net_sysctl_table(struct net *net, const struct ctl_path *path, struct ctl_table *table); +extern struct ctl_table_header *register_init_net_ctl_table( + struct ctl_path *path, struct ctl_table *table); extern void unregister_net_sysctl_table(struct ctl_table_header *header); #endif /* __NET_NET_NAMESPACE_H */ diff --git a/net/sysctl_net.c b/net/sysctl_net.c index 665e856..42c99e6 100644 --- a/net/sysctl_net.c +++ b/net/sysctl_net.c @@ -40,6 +40,30 @@ static struct ctl_table_root net_sysctl_root = { .lookup = net_ctl_header_lookup, }; +static LIST_HEAD(net_ro_headers); + +static struct list_head *net_ctl_ro_header_lookup(struct ctl_table_root *root, + struct nsproxy *namespaces) +{ + return net_ro_headers; +} + +static int net_ctl_ro_permissions(struct ctl_table_root *root, + struct nsproxy *ns, struct ctl_table *table) +{ + int mode; + + mode = table-mode; + if (ns-net_ns != init_net) + mode = ~0222; + return mode; +} + +static struct ctl_table_root net_sysctl_ro_root = { + .lookup = net_ctl_ro_header_lookup, + .permissions = net_ctl_ro_permissions, +}; + static int sysctl_net_init(struct net *net) { INIT_LIST_HEAD(net-sysctl_table_headers); @@ -64,6 +88,7 @@ static __init int sysctl_init(void) if (ret) goto out; register_sysctl_root(net_sysctl_root); + register_sysctl_root(net_sysctl_ro_root); out: return ret; } @@ -80,6 +105,14 @@ struct ctl_table_header *register_net_sysctl_table(struct net *net, } EXPORT_SYMBOL_GPL(register_net_sysctl_table); +struct ctl_table_header *register_init_net_ctl_table(struct ctl_path *path, + struct ctl_table *table) +{ + return __register_sysctl_paths(net_sysctl_ro_root, + init_nsproxy, path, table); +} +EXPORT_SYMBOL_GPL(register_net_ro_ctl_table); + void unregister_net_sysctl_table(struct ctl_table_header *header) { return unregister_sysctl_table(header); -- 1.5.3.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6.26 5/5][SYSCTL]: Move some net.core sysctls to RO root.
There are many tables in net/core/sysctl_net_core.c that are to be read-only. Current implementation duplicates this array for each namespace just to clear the write bits in the permissions mask. Keep the writable tables to per-net ctl root and move the others to the read-only one. This saves some memory in run time and removes the... ugly code, that prepared the tables. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- net/core/sysctl_net_core.c | 35 +-- 1 files changed, 17 insertions(+), 18 deletions(-) diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 130338f..4e530ce 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -125,14 +125,6 @@ static struct ctl_table net_core_table[] = { #endif /* CONFIG_XFRM */ #endif /* CONFIG_NET */ { - .ctl_name = NET_CORE_SOMAXCONN, - .procname = somaxconn, - .data = init_net.sysctl_somaxconn, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec - }, - { .ctl_name = NET_CORE_BUDGET, .procname = netdev_budget, .data = netdev_budget, @@ -151,6 +143,18 @@ static struct ctl_table net_core_table[] = { { .ctl_name = 0 } }; +static struct ctl_table netns_core_table[] = { + { + .ctl_name = NET_CORE_SOMAXCONN, + .procname = somaxconn, + .data = init_net.sysctl_somaxconn, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec + }, + { .ctl_name = 0 } +}; + static __net_initdata struct ctl_path net_core_path[] = { { .procname = net, .ctl_name = CTL_NET, }, { .procname = core, .ctl_name = NET_CORE, }, @@ -159,23 +163,17 @@ static __net_initdata struct ctl_path net_core_path[] = { static __net_init int sysctl_core_net_init(struct net *net) { - struct ctl_table *tbl, *tmp; + struct ctl_table *tbl; net-sysctl_somaxconn = SOMAXCONN; - tbl = net_core_table; + tbl = netns_core_table; if (net != init_net) { - tbl = kmemdup(tbl, sizeof(net_core_table), GFP_KERNEL); + tbl = kmemdup(tbl, sizeof(netns_core_table), GFP_KERNEL); if (tbl == NULL) goto err_dup; - for (tmp = tbl; tmp-procname; tmp++) { - if (tmp-data = (void *)init_net - tmp-data (void *)(init_net + 1)) - tmp-data += (char *)net - (char *)init_net; - else - tmp-mode = ~0222; - } + tbl[0].data = net-sysctl_somaxconn; } net-sysctl_core_hdr = register_net_sysctl_table(net, @@ -209,6 +207,7 @@ static __net_initdata struct pernet_operations sysctl_core_ops = { static __init int sysctl_core_init(void) { + register_init_net_ctl_table(net_core_path, net_core_table); return register_pernet_subsys(sysctl_core_ops); } -- 1.5.3.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
On the Top of the World~『Textile Industry』
http://www.taiwannews.com.tw/static/express/080219-01.pdf http://www.taiwannews.com.tw/static/express/080219-02.pdf http://www.taiwannews.com.tw/static/express/080219-03.pdf http://www.taiwannews.com.tw/static/express/080219-04.pdf -- Powered by PHPlist, www.phplist.com -- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac
David Miller wrote: From: David Miller [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 16:43:05 -0800 (PST) I think we can fix this easily by using __attribute_const_ on the print_mac() declaration. Let me play with that. Actually it seems the 'pure' attribute is more important here. Although it's not semantically a perfect match, what we need to tell the compiler is basically that: 1) the return value depends upon the inputs 2) if the input is not used, it's safe to avoid the call and 'pure' accomplishes that without any unwanted side-effects. I think this will not result in any unwanted over-optimization. Because if the inputs change in any way GCC has to emit the call. Any objections? This seems fine to me, thanks Dave. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] cls_u32 u32_classify()
On Mon, 2008-18-02 at 21:46 -0800, David Miller wrote: Can some u32 expert review this? http://marc.info/?l=linux-netdevm=120178638323045w=2 cheers, jamal -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6.26 0/5][SYSCTL]: Make some sysctl RO in net namespaces.
Hi, David. Some time ago, when I made the net.core.somaxconn ctl per-namespace, you told that the approach I used to make some ctl tables read-only in namespace was not very good and said to improve it. After looking at other code, I decided, that many ctl variables will have to be read-only in namespace, so we need some generic way to do this. So, here's the patchset, that allows to create ctl tables, that are read-only in some namespace in general (and in some net namespace in particular). I tried to make it work the way not to consume extra memory at run time. This patchset is related to net namespaces only, but on the other hand it affects the core sysctl engine. What is your opinion about this set: should I send these patches (or some of them) to Andrew instead and wait till it appears in mainline (and sequentially in net tree) or will you accept this one in net-2.6.26? Thanks, Pavel -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6.26 3/5][SYSCTL]: Add the -permissions callback on the ctl_table_root.
When the table came from some other root, this root may affect the table's permissions, depending on who is working with the table. The core hunk is at the bottom of this patch. All the rest is just pushing the ctl_table_root argument up to the sysctl_perm function. This will be mostly (only?) used in the net sysctls. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- fs/proc/proc_sysctl.c |4 ++-- include/linux/sysctl.h |7 ++- kernel/sysctl.c| 25 ++--- 3 files changed, 26 insertions(+), 10 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 5e31585..5acc001 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -190,7 +190,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf, * and won't be until we finish. */ error = -EPERM; - if (sysctl_perm(table, write ? MAY_WRITE : MAY_READ)) + if (sysctl_perm(head-root, table, write ? MAY_WRITE : MAY_READ)) goto out; /* careful: calling conventions are nasty here */ @@ -388,7 +388,7 @@ static int proc_sys_permission(struct inode *inode, int mask, struct nameidata * goto out; /* Use the permissions on the sysctl table entry */ - error = sysctl_perm(table, mask); + error = sysctl_perm(head-root, table, mask); out: sysctl_head_finish(head); return error; diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 8e50196..3239561 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -945,11 +945,14 @@ enum /* For the /proc/sys support */ struct ctl_table; struct nsproxy; +struct ctl_table_root; + extern struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev); extern struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces, struct ctl_table_header *prev); extern void sysctl_head_finish(struct ctl_table_header *prev); -extern int sysctl_perm(struct ctl_table *table, int op); +extern int sysctl_perm(struct ctl_table_root *root, + struct ctl_table *table, int op); typedef struct ctl_table ctl_table; @@ -1049,6 +1052,8 @@ struct ctl_table_root { struct list_head header_list; struct list_head *(*lookup)(struct ctl_table_root *root, struct nsproxy *namespaces); + int (*permissions)(struct ctl_table_root *root, + struct nsproxy *namespaces, struct ctl_table *table); }; /* struct ctl_table_header is used to maintain dynamic lists of diff --git a/kernel/sysctl.c b/kernel/sysctl.c index c224cc5..8b8b582 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1454,7 +1454,8 @@ void register_sysctl_root(struct ctl_table_root *root) #ifdef CONFIG_SYSCTL_SYSCALL /* Perform the actual read/write of a sysctl table entry. */ -static int do_sysctl_strategy (struct ctl_table *table, +static int do_sysctl_strategy (struct ctl_table_root *root, + struct ctl_table *table, int __user *name, int nlen, void __user *oldval, size_t __user *oldlenp, void __user *newval, size_t newlen) @@ -1465,7 +1466,7 @@ static int do_sysctl_strategy (struct ctl_table *table, op |= 004; if (newval) op |= 002; - if (sysctl_perm(table, op)) + if (sysctl_perm(root, table, op)) return -EPERM; if (table-strategy) { @@ -1491,6 +1492,7 @@ static int do_sysctl_strategy (struct ctl_table *table, static int parse_table(int __user *name, int nlen, void __user *oldval, size_t __user *oldlenp, void __user *newval, size_t newlen, + struct ctl_table_root *root, struct ctl_table *table) { int n; @@ -1505,14 +1507,14 @@ repeat: if (n == table-ctl_name) { int error; if (table-child) { - if (sysctl_perm(table, 001)) + if (sysctl_perm(root, table, 001)) return -EPERM; name++; nlen--; table = table-child; goto repeat; } - error = do_sysctl_strategy(table, name, nlen, + error = do_sysctl_strategy(root, table, name, nlen, oldval, oldlenp, newval, newlen); return error; @@ -1538,7 +1540,8 @@ int do_sysctl(int __user *name, int nlen, void __user *oldval, size_t __user *ol for (head = sysctl_head_next(NULL); head;
[PATCH] [NETNS]: Namespace leak in pneigh_lookup.
release_net is missed on the error path in pneigh_lookup. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/core/neighbour.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 7bb6a9a..174e29e 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -507,6 +507,7 @@ struct pneigh_entry * pneigh_lookup(struct neigh_table *tbl, if (tbl-pconstructor tbl-pconstructor(n)) { if (dev) dev_put(dev); + release_net(net); kfree(n); n = NULL; goto out; -- 1.5.3.rc5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [NETFILTER]: Introduce nf_inet_address
On Tue, 2008-01-29 at 13:16 +, Linux Kernel Mailing List wrote: Commit: 643a2c15a407faf08101a20e1a3461160711899d [NETFILTER]: Introduce nf_inet_address A few netfilter modules provide their own union of IPv4 and IPv6 address storage. Will unify that in this patch series. (1/4): Rename union nf_conntrack_address to union nf_inet_addr and move it to x_tables.h. Signed-off-by: Jan Engelhardt [EMAIL PROTECTED] Signed-off-by: Patrick McHardy [EMAIL PROTECTED] Signed-off-by: David S. Miller [EMAIL PROTECTED] ... --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -48,6 +48,12 @@ enum nf_inet_hooks { NF_INET_NUMHOOKS }; +union nf_inet_addr { + u_int32_t all[4]; + __be32 ip; + __be32 ip6[4]; +}; + #ifdef __KERNEL__ #ifdef CONFIG_NETFILTER This breaks the busybox build: CC ipsvd/tcpudp.o In file included from /usr/include/linux/netfilter_ipv4.h:8, from ipsvd/tcpudp.c:33: /usr/include/linux/netfilter.h:40: error: expected specifier-qualifier-list before 'u_int32_t' What is this 'u_int32_t' nonsense anyway? If a user-visible header is likely to be included by libc directly from a 'standard' header, it may not require stdint.h. Therefore it should use the system-specific types such as '__u32'. If it isn't likely to be included by libc, which is the case for netfilter, then it might as well just use the proper C types. Those who are stuck on C89 or earlier might still prefer to use '__u32' even when there's no need for it, but 'u_int32_t' is just silly. I suspect we should eradicate it. I couldn't make busybox work with it -- __BIT_TYPES_DEFINED__ is defined in sys/types.h and prevents the definitions of u_int32_t et al from appearing in linux/types.h. And if I include linux/types.h first, other things break. A later commit adds struct in_addr and struct in6_addr to this union too, which breaks busybox even harder. How is this supposed to be used in userspace? Or is it even supposed to be exposed? -- dwmw2 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/17 net-2.6.26] [NETFILTER]: Consolidate masq_inet_event and masq_device_event.
Denis V. Lunev wrote: They do exactly the same job. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/ipv4/netfilter/ipt_MASQUERADE.c | 14 ++ 1 files changed, 2 insertions(+), 12 deletions(-) Looks fine. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: protocol 0300 is buggy spam in dmesg when injectingcapturing on same interface
On Mon, Feb 18, 2008 at 05:39:03PM +0200, Pekka Pietikainen wrote: When playing with some L2 level fuzzing I started getting lots of protocol 0300 is buggy, dev eth3 spew in dmesg. That interface is also capturing the traffic that's being sent, that's probably why the dev_queue_xmit_nit codepath is getting called in the first place. Any ideas? Add a If it came from AF_PACKET, don't print out anything to that if-statement? I'm probably just plastering over a bug in af_packet.c with this one, but the following patch should make it shut up. The printk definately needs a ntohs for skb2-protocol, took me a while to figure out where the 0300 even came from :-) Signed-off-by: Pekka Pietikainen [EMAIL PROTECTED] diff -up linux-2.6.24.i686/net/core/dev.c.orig linux-2.6.24.i686/net/core/dev.c --- linux-2.6.24.i686/net/core/dev.c.orig 2008-02-19 15:22:12.0 +0200 +++ linux-2.6.24.i686/net/core/dev.c2008-02-19 15:29:37.0 +0200 @@ -1262,10 +1262,11 @@ static void dev_queue_xmit_nit(struct sk if (skb_network_header(skb2) skb2-data || skb2-network_header skb2-tail) { - if (net_ratelimit()) - printk(KERN_CRIT protocol %04x is - buggy, dev %s\n, - skb2-protocol, dev-name); + if (skb2-protocol != htons(ETH_P_ALL) + net_ratelimit()) + pr_crit(protocol %04x is buggy, dev %s\n, + ntohs(skb2-protocol), + dev-name); skb_reset_network_header(skb2); } -- Pekka Pietikainen -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [NETLABEL] Minor cleanup: remove unused method definition
Hi, This patch removes definition of netlbl_cfg_cipsov4_del() method in netlabel/netlabel_kapi.c and in include/net/netlabel.h as it is not used. Regards, Rami Rosen Signed-off-by: Rami Rosen [EMAIL PROTECTED] diff --git a/include/net/netlabel.h b/include/net/netlabel.h index 0ca67d7..911d8c6 100644 --- a/include/net/netlabel.h +++ b/include/net/netlabel.h @@ -350,7 +350,6 @@ int netlbl_cfg_cipsov4_add(struct cipso_v4_doi *doi_def, int netlbl_cfg_cipsov4_add_map(struct cipso_v4_doi *doi_def, const char *domain, struct netlbl_audit *audit_info); -int netlbl_cfg_cipsov4_del(u32 doi, struct netlbl_audit *audit_info); /* * LSM security attribute operations @@ -408,11 +407,6 @@ static inline int netlbl_cfg_cipsov4_add_map(struct cipso_v4_doi *doi_def, { return -ENOSYS; } -static inline int netlbl_cfg_cipsov4_del(u32 doi, -struct netlbl_audit *audit_info) -{ - return -ENOSYS; -} static inline int netlbl_secattr_catmap_walk( struct netlbl_lsm_secattr_catmap *catmap, u32 offset) diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c index 39793a1..02b268d 100644 --- a/net/netlabel/netlabel_kapi.c +++ b/net/netlabel/netlabel_kapi.c @@ -203,21 +203,6 @@ cfg_cipsov4_add_map_failure: return ret_val; } -/** - * netlbl_cfg_cipsov4_del - Removean existing CIPSOv4 DOI definition - * @doi: the CIPSO DOI value - * @audit_info: NetLabel audit information - * - * Description: - * Removes an existing CIPSOv4 DOI definition from the NetLabel subsystem. - * Returns zero on success, negative values on failure. - * - */ -int netlbl_cfg_cipsov4_del(u32 doi, struct netlbl_audit *audit_info) -{ - return cipso_v4_doi_remove(doi, audit_info, netlbl_cipsov4_doi_free); -} - /* * Security Attribute Functions */
Re: [NETFILTER]: Introduce nf_inet_address
On Tue, 2008-02-19 at 15:01 +0100, Patrick McHardy wrote: David Woodhouse wrote: +union nf_inet_addr { + u_int32_t all[4]; + __be32 ip; + __be32 ip6[4]; +}; + #ifdef __KERNEL__ #ifdef CONFIG_NETFILTER This breaks the busybox build: CC ipsvd/tcpudp.o In file included from /usr/include/linux/netfilter_ipv4.h:8, from ipsvd/tcpudp.c:33: /usr/include/linux/netfilter.h:40: error: expected specifier-qualifier-list before 'u_int32_t' What is this 'u_int32_t' nonsense anyway? If a user-visible header is likely to be included by libc directly from a 'standard' header, it may not require stdint.h. Therefore it should use the system-specific types such as '__u32'. Right, I queued this patch to fix it. That does the trick -- but are we using u_int32_t elsewhere in user-visible headers? Does it work there? How? A later commit adds struct in_addr and struct in6_addr to this union too, which breaks busybox even harder. Thats odd, the iptables headers have always used struct in_addr and struct in6_addr in struct ipt_ip/struct ip6t_ip6, which are also used by userspace. What is ipsvd/tcpudp.c? I couldn't find it in the Debian busybox source. It's in busybox 1.9.1. Just including netinet/in.h seems to be sufficient to make it happy again. I wonder if netfilter.h should include that for itself? -- dwmw2 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AW: AW: Problem receiving multicast with kernel.2.6.24 #3
Reither Robert wrote: Visit AVD on prolight+sound in Frankfurt from 12.-15. March 2008 - Hall 8.0, Stand G16 OK, found one solution (but i think, this should not the normal way) MC joining first the VLAN device (eth0.3) and than the physical (eth0) does the trick, 100% success ... Doing it the opposite way, gives the same problem as before ... So somehow joining the VLAN interface disrupts the join to the physical one ... Ideas ? Does this patch help? commit 6548b91f39381b2c5f02f99c14734546354bff89 Author: Jorge Boncompte [DTI2] [EMAIL PROTECTED] Date: Mon Feb 18 16:02:01 2008 +0100 [NET]: Messed multicast lists after dev_mc_sync/unsync Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 from you introduced a new field da_synced to struct dev_addr_list that is not properly initialized to 0. So when any of the current users (8021q, macvlan, mac80211) calls dev_mc_sync/unsync they mess the address list for both devices. The attached patch fixed it for me and avoid future problems. Signed-off-by: Jorge Boncompte [DTI2] [EMAIL PROTECTED] Signed-off-by: Patrick McHardy [EMAIL PROTECTED] diff --git a/net/core/dev.c b/net/core/dev.c index 6cfc123..9516105 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2900,7 +2900,7 @@ int __dev_addr_add(struct dev_addr_list **list, int *count, } } - da = kmalloc(sizeof(*da), GFP_ATOMIC); + da = kzalloc(sizeof(*da), GFP_ATOMIC); if (da == NULL) return -ENOMEM; memcpy(da-da_addr, addr, alen);
Re: [NETFILTER]: Introduce nf_inet_address
David Woodhouse wrote: +union nf_inet_addr { + u_int32_t all[4]; + __be32 ip; + __be32 ip6[4]; +}; + #ifdef __KERNEL__ #ifdef CONFIG_NETFILTER This breaks the busybox build: CC ipsvd/tcpudp.o In file included from /usr/include/linux/netfilter_ipv4.h:8, from ipsvd/tcpudp.c:33: /usr/include/linux/netfilter.h:40: error: expected specifier-qualifier-list before 'u_int32_t' What is this 'u_int32_t' nonsense anyway? If a user-visible header is likely to be included by libc directly from a 'standard' header, it may not require stdint.h. Therefore it should use the system-specific types such as '__u32'. Right, I queued this patch to fix it. If it isn't likely to be included by libc, which is the case for netfilter, then it might as well just use the proper C types. Those who are stuck on C89 or earlier might still prefer to use '__u32' even when there's no need for it, but 'u_int32_t' is just silly. I suspect we should eradicate it. Yes, some more consitency would be nice. So far the consensus was to not use it in new code, but keep using it in subsystems like netfilter that (almost) consistently use it everywhere. I couldn't make busybox work with it -- __BIT_TYPES_DEFINED__ is defined in sys/types.h and prevents the definitions of u_int32_t et al from appearing in linux/types.h. And if I include linux/types.h first, other things break. A later commit adds struct in_addr and struct in6_addr to this union too, which breaks busybox even harder. Thats odd, the iptables headers have always used struct in_addr and struct in6_addr in struct ipt_ip/struct ip6t_ip6, which are also used by userspace. What is ipsvd/tcpudp.c? I couldn't find it in the Debian busybox source. How is this supposed to be used in userspace? Or is it even supposed to be exposed? Yes, its meant to replace many self-made AF-independant address representations. commit 6f2e68f81457d72bdb14a7ead305a1da06e78893 Author: Patrick McHardy [EMAIL PROTECTED] Date: Tue Feb 19 14:54:36 2008 +0100 [NETFILTER]: Use __u32 in struct nf_inet_addr As reported by David Woodhouse [EMAIL PROTECTED], using u_int32_t in struct nf_inet_addr breaks the busybox build. Fix by using __u32. Signed-off-by: Patrick McHardy [EMAIL PROTECTED] diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index d74e79b..b74b615 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -51,7 +51,7 @@ enum nf_inet_hooks { }; union nf_inet_addr { - u_int32_t all[4]; + __u32 all[4]; __be32 ip; __be32 ip6[4]; struct in_addr in;
Re: [ANNOUNCE] ESFQ -- SFQ patches for Linux 2.6.24
David Miller wrote: From: Brock Noland [EMAIL PROTECTED] Date: Sat, 9 Feb 2008 20:30:58 -0600 Is this going to be merged anytime soon? If it gets submitted to the proper mailing list, it might. 'linux-net' is for user questions, it is not where the networking developers hang out, 'netdev' is. And you have to post patches for review, not URL's point to the patches. It has to be int he email, in an applyable form so people can review the thing properly. Since SFQ is not exactly simple and I needed something like this myself, I followed Paul's suggestion and added a new scheduler (DRR) for this with more flexible limits. I'll rediff against net-2.6.26 within the next days and send a final version for review (anyone interested is welcome to already review this version of course :). commit 13d0cc64d0f7fed945c357cf4ca43330c8f95ad2 Author: Patrick McHardy [EMAIL PROTECTED] Date: Mon Feb 18 22:21:55 2008 +0100 [NET_SCHED]: Add DRR scheduler Signed-off-by: Patrick McHardy [EMAIL PROTECTED] diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index dbb7ac3..2fca9c4 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -482,4 +482,20 @@ struct tc_netem_corrupt #define NETEM_DIST_SCALE 8192 +/* DRR */ + +enum +{ + TCA_DRR_UNSPEC, + TCA_DRR_QUANTUM, + __TCA_DRR_MAX +}; + +#define TCA_DRR_MAX(__TCA_DRR_MAX - 1) + +struct tc_drr_stats +{ + s32 deficit; +}; + #endif diff --git a/net/sched/Kconfig b/net/sched/Kconfig index 82adfe6..7e1ab99 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -196,6 +196,9 @@ config NET_SCH_NETEM If unsure, say N. +config NET_SCH_DRR + tristate DRR scheduler + config NET_SCH_INGRESS tristate Ingress Qdisc depends on NET_CLS_ACT diff --git a/net/sched/Makefile b/net/sched/Makefile index 1d2b0f7..b055f74 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -28,6 +28,7 @@ obj-$(CONFIG_NET_SCH_TEQL)+= sch_teql.o obj-$(CONFIG_NET_SCH_PRIO) += sch_prio.o obj-$(CONFIG_NET_SCH_ATM) += sch_atm.o obj-$(CONFIG_NET_SCH_NETEM)+= sch_netem.o +obj-$(CONFIG_NET_SCH_DRR) += sch_drr.o obj-$(CONFIG_NET_CLS_U32) += cls_u32.o obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o obj-$(CONFIG_NET_CLS_FW) += cls_fw.o diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c new file mode 100644 index 000..aa241b5 --- /dev/null +++ b/net/sched/sch_drr.c @@ -0,0 +1,534 @@ +/* + * net/sched/sch_drr.c Deficit Round Robin scheduler + * + * Copyright (c) 2008 Patrick McHardy [EMAIL PROTECTED] + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + */ + +#include linux/module.h +#include linux/init.h +#include linux/errno.h +#include linux/netdevice.h +#include linux/pkt_sched.h +#include net/sch_generic.h +#include net/pkt_sched.h +#include net/pkt_cls.h + +struct drr_class { + struct hlist_node hlist; + u32 classid; + unsigned intrefcnt; + + struct gnet_stats_basic bstats; + struct gnet_stats_queue qstats; + struct gnet_stats_rate_est rate_est; + struct list_headalist; + struct Qdisc * qdisc; + + u32 quantum; + s32 deficit; +}; + +#define DRR_HSIZE 16 + +struct drr_sched { + struct list_headactive; + struct tcf_proto * filter_list; + unsigned intfilter_cnt; + struct hlist_head clhash[DRR_HSIZE]; + struct sk_buff *requeue; +}; + +static unsigned int drr_hash(u32 h) +{ + h ^= h 8; + h ^= h 4; + + return h (DRR_HSIZE - 1); +} + +static struct drr_class *drr_find_class(struct Qdisc *sch, u32 classid) +{ + struct drr_sched *q = qdisc_priv(sch); + struct drr_class *cl; + struct hlist_node *n; + + hlist_for_each_entry(cl, n, q-clhash[drr_hash(classid)], hlist) { + if (cl-classid == classid) + return cl; + } + return NULL; +} + +static void drr_purge_queue(struct drr_class *cl) +{ + unsigned int len = cl-qdisc-q.qlen; + + qdisc_reset(cl-qdisc); + qdisc_tree_decrease_qlen(cl-qdisc, len); +} + +static const struct nla_policy drr_policy[TCA_DRR_MAX + 1] = { + [TCA_DRR_QUANTUM] = { .type = NLA_U32 }, +}; + +static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid, + struct nlattr **tca, unsigned long *arg) +{ + struct drr_sched *q = qdisc_priv(sch); + struct drr_class *cl = (struct drr_class *)*arg; + struct nlattr
AW: AW: Problem receiving multicast with kernel.2.6.24 #3
Visit AVD on prolight+sound in Frankfurt from 12.-15. March 2008 - Hall 8.0, Stand G16 OK, found one solution (but i think, this should not the normal way) MC joining first the VLAN device (eth0.3) and than the physical (eth0) does the trick, 100% success ... Doing it the opposite way, gives the same problem as before ... So somehow joining the VLAN interface disrupts the join to the physical one ... Ideas ? Robert -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [NETFILTER]: Introduce nf_inet_address
David Woodhouse wrote: On Tue, 2008-02-19 at 15:01 +0100, Patrick McHardy wrote: David Woodhouse wrote: +union nf_inet_addr { + u_int32_t all[4]; + __be32 ip; + __be32 ip6[4]; +}; + #ifdef __KERNEL__ #ifdef CONFIG_NETFILTER This breaks the busybox build: CC ipsvd/tcpudp.o In file included from /usr/include/linux/netfilter_ipv4.h:8, from ipsvd/tcpudp.c:33: /usr/include/linux/netfilter.h:40: error: expected specifier-qualifier-list before 'u_int32_t' What is this 'u_int32_t' nonsense anyway? If a user-visible header is likely to be included by libc directly from a 'standard' header, it may not require stdint.h. Therefore it should use the system-specific types such as '__u32'. Right, I queued this patch to fix it. That does the trick -- but are we using u_int32_t elsewhere in user-visible headers? Does it work there? How? Its used in nearly every ip_tables header file. Those are most likely not included by glibc and iptables includes sys/types.h. Besides iptables I'm only aware of a perl module that uses these files, which is probably also including sys/types.h. A later commit adds struct in_addr and struct in6_addr to this union too, which breaks busybox even harder. Thats odd, the iptables headers have always used struct in_addr and struct in6_addr in struct ipt_ip/struct ip6t_ip6, which are also used by userspace. What is ipsvd/tcpudp.c? I couldn't find it in the Debian busybox source. It's in busybox 1.9.1. Just including netinet/in.h seems to be sufficient to make it happy again. I wonder if netfilter.h should include that for itself? That would break iptables compilation, which already includes linux/in.h in some files. I guess the best fix for now is to include netinet/in.h in busybox and long-term clean this up properly. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
AW: AW: AW: Problem receiving multicast with kernel.2.6.24, REAL solution
Visit AVD on prolight+sound in Frankfurt from 12.-15. March 2008 - Hall 8.0, Stand G16 Hurray, your patch does the trick for me !!! Now join/leave und receiving of multicasts work like it should !! May i kiss you Patrick, this saves my week ;-) Hope we see a minor kernel release with this patch very soon ... Greetings Robert -Ursprüngliche Nachricht- Von: Patrick McHardy [mailto:[EMAIL PROTECTED] Gesendet: Dienstag, 19. Februar 2008 15:03 An: Reither Robert Cc: David Stevens; netdev@vger.kernel.org Betreff: Re: AW: AW: Problem receiving multicast with kernel.2.6.24 #3 OK, found one solution (but i think, this should not the normal way) MC joining first the VLAN device (eth0.3) and than the physical (eth0) does the trick, 100% success ... Doing it the opposite way, gives the same problem as before ... So somehow joining the VLAN interface disrupts the join to the physical one ... Ideas ? Does this patch help? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.
Daniel Lezcano wrote: Denis V. Lunev wrote: On Tue, 2008-02-19 at 10:51 +0100, Daniel Lezcano wrote: Denis V. Lunev wrote: On Tue, 2008-02-19 at 10:14 +0100, Daniel Lezcano wrote: Denis V. Lunev wrote: Default ARP parameters should be findable regardless of the context. Required to make inetdev_event working. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/core/neighbour.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index c895ad4..45ed620 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1275,9 +1275,7 @@ static inline struct neigh_parms *lookup_neigh_params(struct neigh_table *tbl, struct neigh_parms *p; for (p = tbl-parms; p; p = p-next) { -if (p-net != net) -continue; -if ((p-dev p-dev-ifindex == ifindex) || +if ((p-dev p-dev-ifindex == ifindex p-net == net) || (!p-dev !ifindex)) return p; } If the values are: p-dev == NULL ifindex == 0 p-net != net The parms should not be taken into account and the looping must continue. But with this modification it is not the case, if we specify parms ifindex == 0, the first parms with the dev field set to NULL will be taken belonging or not to the right net. They should be taken. In the other case inetdev_event will fail for sure in the middle. You could check. These are ARP defaults and I do not see a problem for now to get them. Because there is a parms default per namespace. So several instances of them per nd table. That was the initial approach with Eric's patchset. These changes are not in mainstream and I do not want to touch ARP as this is not a simple thing. In reality ARP will be needed only when we'll have a real device inside a namespace. Right now I prefer to have minimal set of working changes to finish IP and upper layers. core/neighbour.c is a common part between several protocols, especially ipv4 and ipv6. If you modify this function just to fit your need in the arp that will block me for ipv6 until you make parms default per namespace. So please, find another way to do that, perhaps just add a helper function. I suggest you do parms default per namespace first, it is quite small and easy :) Just let me the time to send the copy-parms-default function. Ok, so after a long discussion with Denis about this patch, I will change the ipv6 code to share the neigh-parms. It is not a problem. Having the behavior of the neighbour subsystem per namespace is not a must-have. Acked-by: Daniel Lezcano [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[NET]: Messed multicast lists after dev_mc_sync/unsync
commit 6548b91f39381b2c5f02f99c14734546354bff89 Author: Jorge Boncompte [DTI2] [EMAIL PROTECTED] Date: Mon Feb 18 16:02:01 2008 +0100 [NET]: Messed multicast lists after dev_mc_sync/unsync Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 from you introduced a new field da_synced to struct dev_addr_list that is not properly initialized to 0. So when any of the current users (8021q, macvlan, mac80211) calls dev_mc_sync/unsync they mess the address list for both devices. The attached patch fixed it for me and avoid future problems. Signed-off-by: Jorge Boncompte [DTI2] [EMAIL PROTECTED] Signed-off-by: Patrick McHardy [EMAIL PROTECTED] diff --git a/net/core/dev.c b/net/core/dev.c index 6cfc123..9516105 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2900,7 +2900,7 @@ int __dev_addr_add(struct dev_addr_list **list, int *count, } } - da = kmalloc(sizeof(*da), GFP_ATOMIC); + da = kzalloc(sizeof(*da), GFP_ATOMIC); if (da == NULL) return -ENOMEM; memcpy(da-da_addr, addr, alen);
Re: [net-2.6][DRIVER][VETH] fix dev refcount race
Daniel Lezcano wrote: Subject: veth fix dev refcount race From: Daniel Lezcano [EMAIL PROTECTED] When deleting the veth driver, veth_close calls netif_carrier_off for the two extremities of the network device. netif_carrier_off on the peer device will fire an event and hold a reference on the peer device. Just after, the peer is unregistered taking the rtnl_lock while the linkwatch_event is scheduled. If __linkwatch_run_queue does not occurs before the unregistering, unregister_netdevice will wait for the dev refcount to reach zero holding the rtnl_lock and linkwatch_event Brr... AFAIS the unregister process waits for refcount without the rtnl_lock. The run-todo was invented for this. No? will wait for the rtnl_lock and hold the dev refcount. Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] --- drivers/net/veth.c | 53 - 1 file changed, 40 insertions(+), 13 deletions(-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][SCTP]: Pick up an orphaned sctp_sockets_allocated counter.
This counter is currently write-only. Drawing an analogy with the similar tcp counter, I think that this one should be pointed by the sockets_allocated members of sctp_prot and sctpv6_prot. Or should it be instead removed at all? Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- diff --git a/net/sctp/socket.c b/net/sctp/socket.c index d47d578..44797ad 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6488,6 +6488,7 @@ struct proto sctp_prot = { .memory_pressure = sctp_memory_pressure, .enter_memory_pressure = sctp_enter_memory_pressure, .memory_allocated = sctp_memory_allocated, + .sockets_allocated = sctp_sockets_allocated, REF_PROTO_INUSE(sctp) }; @@ -6521,6 +6522,7 @@ struct proto sctpv6_prot = { .memory_pressure = sctp_memory_pressure, .enter_memory_pressure = sctp_enter_memory_pressure, .memory_allocated = sctp_memory_allocated, + .sockets_allocated = sctp_sockets_allocated, REF_PROTO_INUSE(sctpv6) }; #endif /* defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) */ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Lksctp-developers] [PATCH][SCTP]: Pick up an orphaned sctp_sockets_allocated counter.
Pavel Emelyanov wrote: This counter is currently write-only. Drawing an analogy with the similar tcp counter, I think that this one should be pointed by the sockets_allocated members of sctp_prot and sctpv6_prot. Or should it be instead removed at all? Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] Ack. Looks like it got missed. It should be added. -vlad --- diff --git a/net/sctp/socket.c b/net/sctp/socket.c index d47d578..44797ad 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6488,6 +6488,7 @@ struct proto sctp_prot = { .memory_pressure = sctp_memory_pressure, .enter_memory_pressure = sctp_enter_memory_pressure, .memory_allocated = sctp_memory_allocated, + .sockets_allocated = sctp_sockets_allocated, REF_PROTO_INUSE(sctp) }; @@ -6521,6 +6522,7 @@ struct proto sctpv6_prot = { .memory_pressure = sctp_memory_pressure, .enter_memory_pressure = sctp_enter_memory_pressure, .memory_allocated = sctp_memory_allocated, + .sockets_allocated = sctp_sockets_allocated, REF_PROTO_INUSE(sctpv6) }; #endif /* defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) */ - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ Lksctp-developers mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/lksctp-developers -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6.25-rc2] e100: Trying to free already-free IRQ 11 during suspend ...
Andrew Morton wrote: On Sun, 17 Feb 2008 15:36:50 +0300 Andrey Borzenkov [EMAIL PROTECTED] wrote: ... and possibly reboot/poweroff (it flows by too fast to be legible). [ 8803.850634] ACPI: Preparing to enter system sleep state S3 [ 8803.853141] Suspending console(s) [ 8805.287505] serial 00:09: disabled [ 8805.291564] Trying to free already-free IRQ 11 [ 8805.291579] Pid: 6920, comm: pm-suspend Not tainted 2.6.25-rc2-1avb #2 [ 8805.291628] [c0152127] free_irq+0xb7/0x130 [ 8805.291675] [c024bd80] e100_suspend+0xc0/0x100 [ 8805.291724] [c01eaa36] pci_device_suspend+0x26/0x70 [ 8805.291747] [c0243674] suspend_device+0x94/0xd0 [ 8805.291763] [c02439a3] device_suspend+0x153/0x240 [ 8805.291784] [c014314f] suspend_devices_and_enter+0x4f/0xf0 [ 8805.291808] [c0143a5f] ? freeze_processes+0x3f/0x80 [ 8805.291825] [c01432fa] enter_state+0xaa/0x140 [ 8805.291840] [c014341f] state_store+0x8f/0xd0 [ 8805.291852] [c0143390] ? state_store+0x0/0xd0 [ 8805.291866] [c01d3404] kobj_attr_store+0x24/0x30 [ 8805.291901] [c01b547b] sysfs_write_file+0xbb/0x110 [ 8805.291936] [c0177d79] vfs_write+0x99/0x130 [ 8805.291963] [c01b53c0] ? sysfs_write_file+0x0/0x110 [ 8805.291979] [c01782fd] sys_write+0x3d/0x70 [ 8805.291998] [c010409a] sysenter_past_esp+0x5f/0xa5 [ 8805.292038] === [ 8805.347640] ACPI: PCI interrupt for device :00:06.0 disabled [ 8805.361128] ACPI: PCI interrupt for device :00:02.0 disabled [ 8805.376670] hwsleep-0322 [00] enter_sleep_state : Entering sleep state [S3] [ 8805.376670] Back to C! Interface is unused normally (only for netconsole sometimes). dmesg and config attached. Does reverting this: commit 8543da6672b0994921f014f2250e27ae81645580 Author: Auke Kok [EMAIL PROTECTED] Date: Wed Dec 12 16:30:42 2007 -0800 e100: free IRQ to remove warningwhenrebooting with this patch: --- a/drivers/net/e100.c~revert-1 +++ a/drivers/net/e100.c @@ -2804,9 +2804,8 @@ static int e100_suspend(struct pci_dev * pci_enable_wake(pdev, PCI_D3cold, 0); } - free_irq(pdev-irq, netdev); - pci_disable_device(pdev); + free_irq(pdev-irq, netdev); pci_set_power_state(pdev, PCI_D3hot); return 0; @@ -2848,8 +2847,6 @@ static void e100_shutdown(struct pci_dev pci_enable_wake(pdev, PCI_D3cold, 0); } - free_irq(pdev-irq, netdev); - pci_disable_device(pdev); pci_set_power_state(pdev, PCI_D3hot); } _ fix it? Hmm ... after resume device has disappeared at all ... {pts/1}% cat /proc/interrupts CPU0 0:1290492XT-PIC-XTtimer 1: 6675XT-PIC-XTi8042 2: 0XT-PIC-XTcascade 3: 2XT-PIC-XT 4: 2XT-PIC-XT 5: 3XT-PIC-XT 7: 4XT-PIC-XTirda0 8: 0XT-PIC-XTrtc0 9:583XT-PIC-XTacpi 10: 2XT-PIC-XT 11: 31483XT-PIC-XTyenta, yenta, yenta, ohci_hcd:usb1, ALI 5451, pcmcia0.0 12: 28070XT-PIC-XTi8042 14: 21705XT-PIC-XTide0 15: 82123XT-PIC-XTide1 NMI: 0 Non-maskable interrupts TRM: 0 Thermal event interrupts SPU: 0 Spurious interrupts ERR: 0 I hope that's not a separate bug... I'll take a look at this as well. thanks for reporting. Auke -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [2.6.25-rc2] igb: Fix panic with NICs with 1000BASE-SX PHY
From: Bill Hayes [EMAIL PROTECTED] Auke, Jeff and David, This patch eliminates a kernel panic with the igb driver in 2.6.25-rc2 when running on a Intel 82575 Ethernet controller with a 1000BASE-SX PHY. The panic does not happen with the 1000BASE-T PHY, only with a SX connection. Signed-off-by: Bill Hayes [EMAIL PROTECTED] Signed-off-by: Andy Gospodarek [EMAIL PROTECTED] drivers/net/igb/igb_main.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c index d4eb8e2..666c8f0 100644 --- a/drivers/net/igb/igb_main.c +++ b/drivers/net/igb/igb_main.c @@ -843,7 +843,8 @@ void igb_reset(struct igb_adapter *adapter) wr32(E1000_VET, ETHERNET_IEEE_VLAN_TYPE); igb_reset_adaptive(adapter-hw); - adapter-hw.phy.ops.get_phy_info(adapter-hw); + if (adapter-hw.phy.ops.get_phy_info) + adapter-hw.phy.ops.get_phy_info(adapter-hw); igb_release_manageability(adapter); } @@ -2083,7 +2084,8 @@ static void igb_set_multi(struct net_device *netdev) static void igb_update_phy_info(unsigned long data) { struct igb_adapter *adapter = (struct igb_adapter *) data; - adapter-hw.phy.ops.get_phy_info(adapter-hw); + if (adapter-hw.phy.ops.get_phy_info) + adapter-hw.phy.ops.get_phy_info(adapter-hw); } /** -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6.25-rc2, 2.6.24-rc8] page allocation failure...
Andrew Morton wrote: On Sun, 17 Feb 2008 13:20:59 + Daniel J Blueman [EMAIL PROTECTED] wrote: I'm still hitting this with e1000e on 2.6.25-rc2, 10 times again. are you sure? I don't think that's the case and you're seeing e1000 dumps here... It's clearly non-fatal, but then do we expect it to occur? Daniel --- [dmesg] [ 1250.822786] swapper: page allocation failure. order:3, mode:0x4020 [ 1250.822786] Pid: 0, comm: swapper Not tainted 2.6.25-rc2-119 #2 [ 1250.822786] [ 1250.822786] Call Trace: [ 1250.822786] IRQ [8025fe9e] __alloc_pages+0x34e/0x3a0 [ 1250.822786] [8048c6df] ? __netdev_alloc_skb+0x1f/0x40 [ 1250.822786] [8027acc2] __slab_alloc+0x102/0x3d0 [ 1250.822786] [8048c6df] ? __netdev_alloc_skb+0x1f/0x40 [ 1250.822786] [8027b8cb] __kmalloc_track_caller+0x7b/0xc0 [ 1250.822786] [8048b74f] __alloc_skb+0x6f/0x160 [ 1250.822786] [8048c6df] __netdev_alloc_skb+0x1f/0x40 [ 1250.822786] [8042652d] e1000_alloc_rx_buffers+0x1ed/0x260 [ 1250.822786] [80426b5a] e1000_clean_rx_irq+0x22a/0x330 [ 1250.822786] [80422981] e1000_clean+0x1e1/0x540 [ 1250.822786] [8024b7a5] ? tick_program_event+0x45/0x70 [ 1250.822786] [804930ba] net_rx_action+0x9a/0x150 [ 1250.822786] [802336b4] __do_softirq+0x74/0xf0 [ 1250.822786] [8020c5fc] call_softirq+0x1c/0x30 [ 1250.822786] [8020eaad] do_softirq+0x3d/0x80 [ 1250.822786] [80233635] irq_exit+0x85/0x90 [ 1250.822786] [8020eba5] do_IRQ+0x85/0x100 [ 1250.822786] [8020a5b0] ? mwait_idle+0x0/0x50 [ 1250.822786] [8020b981] ret_from_intr+0x0/0xa [ 1250.822786] EOI [8020a5f5] ? mwait_idle+0x45/0x50 [ 1250.822786] [80209a92] ? enter_idle+0x22/0x30 [ 1250.822786] [8020a534] ? cpu_idle+0x74/0xa0 [ 1250.822786] [80527825] ? rest_init+0x55/0x60 They're regularly reported with e1000 too - I don't think aything really changed. e1000 has this crazy problem where because of a cascade of follies (mainly borked hardware) it has to do a 32kb allocation for a 9kb(?) packet. It would be sad if that was carried over into e1000e? can't be, I personally removed that code. for MTU 1500 e1000e uses a plain normal sized SKB. for anything bigger e1000e uses pages. so I don't see how this bug could still be showing up for e1000e at all. The large skb receive code is all gone (literally, removed). *please* rmmod e1000; modprobe e1000e and show the dumps again so we know for sure that we're not looking at e1000 dumps. short fix: increase ring size for e1000 with `modprobe e1000 RxDescriptors=4096` (or use ethtool) and `echo -n 8192 /proc/sys/vm/min_free_kbytes` or something like that. what nic hardware is this on? lspci? Auke -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e1000: Detected Tx Unit Hang
Bernd Schubert wrote: On Saturday 16 February 2008, Kok, Auke wrote: Bernd Schubert wrote: Hello, I can't login to one of our servers and just got this in an ipmi sol session: [18169.209181] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang [18169.209183] Tx Queue 0 [18169.209184] TDH e3 [18169.209185] TDT e3 [18169.209186] next_to_use e3 [18169.209187] next_to_cleanbd [18169.209188] buffer_info[next_to_clean] [18169.209189] time_stamp 10043e4d2 [18169.209190] next_to_watchbe [18169.209191] jiffies 10043e6f6 [18169.209192] next_to_watch.status 1 [18169.256978] e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang [18169.256979] Tx Queue 0 [18169.256980] TDH de [18169.256982] TDT de [18169.256983] next_to_use de [18169.256984] next_to_cleanbc [18169.256985] buffer_info[next_to_clean] [18169.256986] time_stamp 10043e511 [18169.256987] next_to_watchbd [18169.256988] jiffies 10043e701 [18169.256989] next_to_watch.status 1 This is with 2.6.22.18. Is there any chance to recover the system? For some reasons I would prefer not to reboot now. if that's all you have then it was false alarm. there should be a 'netdev timeout - link reset' following those messages. can you send some more context on those messages? All I presently know is that there are 20 servers and login doesn't work any more - sysrq+t does show me it hangs in fuse, which is accessing the underlying nfs (we are using unionfs-fuse). While I checked the sysrq-t output suddenly these e1000 messages appeared. Thinking a bit about it, it either could be 2.6.22.18 has an e1000 bug, which 2.6.22.X didn't have (X=16, I think, but I'm not sure) or someone mis-configured the switch/network environment today. Hmm, now that I think about the last part, there already had been other networking problems today, which were supposed to be fixed several hours ago. Seems they didn't fix it properly. in real tx hang cases, the hardware is reset within 2 seconds, and everything continues as normal. Thanks, this gives me hope I don't need to reboot the serves (reboot would mean I would need to start 60 md-raid rebuilds...). my first thought after I read this e-mail is that the tx-hang message is just a symptom of your system not responding or being spinlocked all the time. These TX hang issues normally completely do not interfere with normal system operation and unless you have continuous TX resets you would be able to logon perfectly fine. I think you might have hit another kernel bug here... perhaps even unionfs/fuse related and that certainly looks plausible from your problem description. looking at the changelog for 2.6.22.16-2.6.22.18 I can't see anything relevant (see http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.22.y.git;a=shortlog), but there are definately no e1000 driver changes in that range anyway. I don't suppose you can do a git-bisect? that would certainly help. I don't think we can rule out anything just yet here. At least try to revert some of your systems to the previous kernel version and see if the problem goes away... Auke -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Support arbitrary initial TCP timestamps
Adding yet another member to the already bloated tcp_sock structure to implement this is too high a cost. Yes, I was worried that would be deemed too high of a cost, but it was the most efficient way I could think to accomplish what I wanted. I would instead prefer that there be some global random number calculated when the first TCP socket is created, and use that as a global offset. You can even recompute it every few hours if you like. That would work fine if my mine purpose was to randomize the tcp timestamp to mitigate the leak in information regarding uptime, but despite the brief description, that's only a side effect of what I intended to do. What I wanted was a way to be able to choose an initial tcp timestamp for a particular connection that was not tied directly to jiffies. The two patches following this show my intended use case. I intend to enhance syncookie support to allow it to support advanced tcp options (sack and window scaling). Normally syncookies encode the bare minimum state of a connection in the ISN they choose, but the 32bit ISN isn't enough to encode advanced tcp options so you are left with a working but crippled tcp stack during a synflood attack. If in addition to choosing an ISN we are able to choose an initial tcp timestamp, we are then able to encode an additional 32 bits of information that can contain more of the advanced tcp options. Perhaps I should clarify. I don't see a way to implement the additional syncookie enhancements without storing an offset in some type of per-socket structure. Given that, is it still deemed too high of a cost? Is enhancing syncookies not deemed important enough to warrant the additional 4 bytes? What if there was an additional config variable to support arbitrary/random tcp timestamps, and then syncookies only used the additional options when that setting was chosen? Is there some possiblity I'm missing that could get this feature in a form suitable for inclusion? Thanks. --Glenn -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
Michael Chan wrote: On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote: One consequence of Herbert's change is that the chip will see a different datastream. The initial skb-data linear area will be smaller, and the transition to the fragmented area of pages will be quicker. I see. Perhaps when we get to the end of the data-stream, there is a tiny frag that the chip cannot handle. That's the only thing I can think of. Please try this patch to see if the problem goes away. This will disable SG on 5701 so we always get linear SKBs. diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c index db606b6..bb37e76 100644 --- a/drivers/net/tg3.c +++ b/drivers/net/tg3.c @@ -12717,6 +12717,9 @@ static int __devinit tg3_init_one(struct pci_dev *pdev, } else tp-tg3_flags = ~TG3_FLAG_RX_CHECKSUMS; + if (GET_ASIC_REV(tp-pci_chip_rev_id) == ASIC_REV_5701) + dev-features = ~(NETIF_F_IP_CSUM | NETIF_F_SG); + /* flow control autonegotiation is default behavior */ tp-tg3_flags |= TG3_FLAG_PAUSE_AUTONEG; tp-link_config.flowctrl = TG3_FLOW_CTRL_TX | TG3_FLOW_CTRL_RX; This patch does appear to fix the data corruption (tested with 2.6.24.2). However, it results in performance problems with the iSCSI application that I am trying to run on this machine. The test program that I described in the previous message still gets good performance in both directions. iperf -r gets good performance in both directions (940 Mbits/s or 117 MB/s). However, my target-mode iSCSI application (which obviously generates rx/tx traffic patterns more complicated than the synthetic tests) gets very poor performance in one direction but good performance in the other direction. iSCSI performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx with light tx, but remains at a decent 115 MB/s when the 3Com NIC is doing heavy tx with light rx. When I revert Herbert's patch instead of applying the patch above, I get 115 MB/s in both cases. (With a stock unpatched kernel, the test fails almost immediately because the iSCSI control PDUs are corrupted, causing the TCP connection to be dropped.) The SysKonnect NIC that does not exhibit this problem has a chip that says BCM5411KQM TT0128 P2Q and 56975E. Tony -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[net-2.6][DRIVER][VETH] fix dev refcount race
Subject: veth fix dev refcount race From: Daniel Lezcano [EMAIL PROTECTED] When deleting the veth driver, veth_close calls netif_carrier_off for the two extremities of the network device. netif_carrier_off on the peer device will fire an event and hold a reference on the peer device. Just after, the peer is unregistered taking the rtnl_lock while the linkwatch_event is scheduled. If __linkwatch_run_queue does not occurs before the unregistering, unregister_netdevice will wait for the dev refcount to reach zero holding the rtnl_lock and linkwatch_event will wait for the rtnl_lock and hold the dev refcount. Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] --- drivers/net/veth.c | 53 - 1 file changed, 40 insertions(+), 13 deletions(-) Index: net-2.6/drivers/net/veth.c === --- net-2.6.orig/drivers/net/veth.c +++ net-2.6/drivers/net/veth.c @@ -244,18 +244,6 @@ static int veth_open(struct net_device * return 0; } -static int veth_close(struct net_device *dev) -{ - struct veth_priv *priv; - - if (netif_carrier_ok(dev)) { - priv = netdev_priv(dev); - netif_carrier_off(dev); - netif_carrier_off(priv-peer); - } - return 0; -} - static int veth_dev_init(struct net_device *dev) { struct veth_net_stats *stats; @@ -286,13 +274,50 @@ static void veth_setup(struct net_device dev-hard_start_xmit = veth_xmit; dev-get_stats = veth_get_stats; dev-open = veth_open; - dev-stop = veth_close; dev-ethtool_ops = veth_ethtool_ops; dev-features |= NETIF_F_LLTX; dev-init = veth_dev_init; dev-destructor = veth_dev_free; } +static void veth_change_state(struct net_device *dev) +{ + struct net_device *peer; + struct veth_priv *priv; + + priv = netdev_priv(dev); + peer = priv-peer; + + if (netif_carrier_ok(peer)) { + if (!netif_carrier_ok(dev)) + netif_carrier_on(dev); + } else { + if (netif_carrier_ok(dev)) + netif_carrier_off(dev); + } +} + +static int veth_device_event(struct notifier_block *unused, + unsigned long event, void *ptr) +{ + struct net_device *dev = ptr; + + if (dev-open != veth_open) + goto out; + + switch (event) { + case NETDEV_CHANGE: + veth_change_state(dev); + break; + } +out: + return NOTIFY_DONE; +} + +static struct notifier_block veth_notifier_block __read_mostly = { + .notifier_call = veth_device_event, +}; + /* * netlink interface */ @@ -454,12 +479,14 @@ static struct rtnl_link_ops veth_link_op static __init int veth_init(void) { + register_netdevice_notifier(veth_notifier_block); return rtnl_link_register(veth_link_ops); } static __exit void veth_exit(void) { rtnl_link_unregister(veth_link_ops); + unregister_netdevice_notifier(veth_notifier_block); } module_init(veth_init);
Re: AW: AW: AW: Problem receiving multicast with kernel.2.6.24, REAL solution
Reither Robert wrote: Visit AVD on prolight+sound in Frankfurt from 12.-15. March 2008 - Hall 8.0, Stand G16 Hurray, your patch does the trick for me !!! Now join/leave und receiving of multicasts work like it should !! May i kiss you Patrick, this saves my week ;-) The credit goes to Jorge Boncompte, I only broke it :| Hope we see a minor kernel release with this patch very soon ... Yes, I'll send it to -stable as soon as it hits upstream. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [NETLABEL] Minor cleanup: remove unused method definition
On Tuesday 19 February 2008 9:25:31 am Rami Rosen wrote: Hi, This patch removes definition of netlbl_cfg_cipsov4_del() method in netlabel/netlabel_kapi.c and in include/net/netlabel.h as it is not used. Regards, Rami Rosen Signed-off-by: Rami Rosen [EMAIL PROTECTED] This was added for use by Smack (and any other LSMs which want to configure NetLabel directly) and since this is an area that is undergoing a lot of churn at this point I'd prefer if this function was left in place for the time being. At a later date if this function is still unused, I'll gladly ack it's removal or do so myself. -- paul moore linux security @ hp -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] igb: Fix panic with NICs with 1000BASE-SX PHY
From: Bill Hayes [EMAIL PROTECTED] This patch eliminates a kernel panic with the igb driver in 2.6.25-rc2 when running on a Intel 82575 Ethernet controller with a 1000BASE-SX PHY. The panic does not happen with the 1000BASE-T PHY, only with a SX connection. Signed-off-by: Bill Hayes [EMAIL PROTECTED] Signed-off-by: Andy Gospodarek [EMAIL PROTECTED] Signed-off-by: Auke Kok [EMAIL PROTECTED] --- drivers/net/igb/igb_main.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c index 3480cc7..6a1f230 100644 --- a/drivers/net/igb/igb_main.c +++ b/drivers/net/igb/igb_main.c @@ -816,7 +816,8 @@ void igb_reset(struct igb_adapter *adapter) wr32(E1000_VET, ETHERNET_IEEE_VLAN_TYPE); igb_reset_adaptive(adapter-hw); - adapter-hw.phy.ops.get_phy_info(adapter-hw); + if (adapter-hw.phy.ops.get_phy_info) + adapter-hw.phy.ops.get_phy_info(adapter-hw); } /** @@ -2052,7 +2053,8 @@ static void igb_set_multi(struct net_device *netdev) static void igb_update_phy_info(unsigned long data) { struct igb_adapter *adapter = (struct igb_adapter *) data; - adapter-hw.phy.ops.get_phy_info(adapter-hw); + if (adapter-hw.phy.ops.get_phy_info) + adapter-hw.phy.ops.get_phy_info(adapter-hw); } /** -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote: iSCSI performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx with light tx, That's strange. The patch should only affect TX performance slightly since we are just turning off SG for TX. Please take an ethereal trace to see what's happening and compare with a good trace. but remains at a decent 115 MB/s when the 3Com NIC is doing heavy tx with light rx. When I revert Herbert's patch instead of applying the patch above, I get 115 MB/s in both cases. (With a stock unpatched kernel, the test fails almost immediately because the iSCSI control PDUs are corrupted, causing the TCP connection to be dropped.) The SysKonnect NIC that does not exhibit this problem has a chip that says BCM5411KQM TT0128 P2Q and 56975E. I think this is the 5700, but please send me the tg3 output that identifies the chip and the revision. Something like this: eth2: Tigon3 [partno(BCM95705) rev 3003 PHY(5705)] (PCI:66MHz:32-bit) 10/100/1000Base-T Ethernet 00:10:18:04:57:0d eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[0] TSOcap[1] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
Michael Chan wrote: The SysKonnect NIC that does not exhibit this problem has a chip that says BCM5411KQM TT0128 P2Q and 56975E. I think this is the 5700, but please send me the tg3 output that identifies the chip and the revision. Something like this: eth2: Tigon3 [partno(BCM95705) rev 3003 PHY(5705)] (PCI:66MHz:32-bit) 10/100/1000Base-T Ethernet 00:10:18:04:57:0d eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[0] TSOcap[1] Here is the dmesg output for the SysKonnect NIC: eth0: Tigon3 [partno(SK-9D21) rev 7104 PHY(5411)] (PCI:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:00:5a:9d:0c:4a eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0] eth0: dma_rwctrl[76ff000f] dma_mask[64-bit] Tony -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RESEND/patch 2.6.25-rc2-git] net/enc28j60: low power mode
Keep enc28j60 chips in low-power mode when they're not in use. At typically 120 mA, these chips run hot even when idle; this low power mode cuts that power usage by a factor of around 100. This version provides a generic routine to poll a register until its masked value equals some value ... e.g. bit set or cleared. It's basically what the previous wait_phy_ready() did, but this version is generalized to support the handshaking needed to enter and exit low power mode. Signed-off-by: David Brownell [EMAIL PROTECTED] Signed-off-by: Claudio Lanconelli [EMAIL PROTECTED] --- drivers/net/enc28j60.c | 82 ++--- 1 files changed, 58 insertions(+), 24 deletions(-) --- a/drivers/net/enc28j60.c +++ b/drivers/net/enc28j60.c @@ -400,26 +400,31 @@ enc28j60_packet_write(struct enc28j60_ne mutex_unlock(priv-lock); } -/* - * Wait until the PHY operation is complete. - */ -static int wait_phy_ready(struct enc28j60_net *priv) +static unsigned long msec20_to_jiffies; + +static int poll_ready(struct enc28j60_net *priv, u8 reg, u8 mask, u8 val) { - unsigned long timeout = jiffies + 20 * HZ / 1000; - int ret = 1; + unsigned long timeout = jiffies + msec20_to_jiffies; /* 20 msec timeout read */ - while (nolock_regb_read(priv, MISTAT) MISTAT_BUSY) { + while ((nolock_regb_read(priv, reg) mask) != val) { if (time_after(jiffies, timeout)) { if (netif_msg_drv(priv)) - printk(KERN_DEBUG DRV_NAME - : PHY ready timeout!\n); - ret = 0; - break; + dev_dbg(priv-spi-dev, + reg %02x ready timeout!\n, reg); + return -ETIMEDOUT; } cpu_relax(); } - return ret; + return 0; +} + +/* + * Wait until the PHY operation is complete. + */ +static int wait_phy_ready(struct enc28j60_net *priv) +{ + return poll_ready(priv, MISTAT, MISTAT_BUSY, 0) ? 0 : 1; } /* @@ -594,6 +599,32 @@ static void nolock_txfifo_init(struct en nolock_regw_write(priv, ETXNDL, end); } +/* + * Low power mode shrinks power consumption about 100x, so we'd like + * the chip to be in that mode whenever it's inactive. (However, we + * can't stay in lowpower mode during suspend with WOL active.) + */ +static void enc28j60_lowpower(struct enc28j60_net *priv, bool is_low) +{ + if (netif_msg_drv(priv)) + dev_dbg(priv-spi-dev, %s power...\n, + is_low ? low : high); + + mutex_lock(priv-lock); + if (is_low) { + nolock_reg_bfclr(priv, ECON1, ECON1_RXEN); + poll_ready(priv, ESTAT, ESTAT_RXBUSY, 0); + poll_ready(priv, ECON1, ECON1_TXRTS, 0); + /* ECON2_VRPS was set during initialization */ + nolock_reg_bfset(priv, ECON2, ECON2_PWRSV); + } else { + nolock_reg_bfclr(priv, ECON2, ECON2_PWRSV); + poll_ready(priv, ESTAT, ESTAT_CLKRDY, ESTAT_CLKRDY); + /* caller sets ECON1_RXEN */ + } + mutex_unlock(priv-lock); +} + static int enc28j60_hw_init(struct enc28j60_net *priv) { u8 reg; @@ -612,8 +643,8 @@ static int enc28j60_hw_init(struct enc28 priv-tx_retry_count = 0; priv-max_pk_counter = 0; priv-rxfilter = RXFILTER_NORMAL; - /* enable address auto increment */ - nolock_regb_write(priv, ECON2, ECON2_AUTOINC); + /* enable address auto increment and voltage regulator powersave */ + nolock_regb_write(priv, ECON2, ECON2_AUTOINC | ECON2_VRPS); nolock_rxfifo_init(priv, RXSTART_INIT, RXEND_INIT); nolock_txfifo_init(priv, TXSTART_INIT, TXEND_INIT); @@ -690,7 +721,7 @@ static int enc28j60_hw_init(struct enc28 static void enc28j60_hw_enable(struct enc28j60_net *priv) { - /* enable interrutps */ + /* enable interrupts */ if (netif_msg_hw(priv)) printk(KERN_DEBUG DRV_NAME : %s() enabling interrupts.\n, __FUNCTION__); @@ -726,15 +757,12 @@ enc28j60_setlink(struct net_device *ndev int ret = 0; if (!priv-hw_enable) { - if (autoneg == AUTONEG_DISABLE speed == SPEED_10) { + /* link is in low power mode now; duplex setting +* will take effect on next enc28j60_hw_init(). +*/ + if (autoneg == AUTONEG_DISABLE speed == SPEED_10) priv-full_duplex = (duplex == DUPLEX_FULL); - if (!enc28j60_hw_init(priv)) { - if (netif_msg_drv(priv)) - dev_err(ndev-dev, - hw_reset() failed\n); - ret = -EINVAL; -
[RESEND/patch 2.6.25-rc2-git] net/enc28j60: section fix
Minor bugfixes to the enc28j60 driver ... wrong section marking, indentation, and bogus use of spi_bus_type. There's still major overuse of printk; essentially every place it's used should switch to dev_*() messaging primitives. Signed-off-by: David Brownell [EMAIL PROTECTED] Acked-by: Claudio Lanconelli [EMAIL PROTECTED] --- drivers/net/enc28j60.c |5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) --- a/drivers/net/enc28j60.c2008-01-29 11:07:14.0 -0800 +++ b/drivers/net/enc28j60.c2008-01-29 12:43:18.0 -0800 @@ -1555,7 +1555,7 @@ error_alloc: return ret; } -static int enc28j60_remove(struct spi_device *spi) +static int __devexit enc28j60_remove(struct spi_device *spi) { struct enc28j60_net *priv = dev_get_drvdata(spi-dev); @@ -1572,9 +1572,8 @@ static int enc28j60_remove(struct spi_de static struct spi_driver enc28j60_driver = { .driver = { .name = DRV_NAME, - .bus = spi_bus_type, .owner = THIS_MODULE, - }, +}, .probe = enc28j60_probe, .remove = __devexit_p(enc28j60_remove), }; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RESEND/patch 2.6.25-rc2-git] net/enc28j60: oops fix
Prevent oops on enc28j60 packet RX: make sure buffers are aligned. Not all architectures support unaligned accesses in kernel space. Signed-off-by: David Brownell [EMAIL PROTECTED] Acked-by: Claudio Lanconelli [EMAIL PROTECTED] --- Seems the enc28j60 patches didn't get picked up, ergo resend... they're not in net-2.6.git, even this oops fix. drivers/net/enc28j60.c |3 ++- 1 files changed, 2 insertions(+), 1 deletion(-) --- a/drivers/net/enc28j60.c2008-02-06 09:29:00.0 -0800 +++ b/drivers/net/enc28j60.c2008-02-06 09:30:03.0 -0800 @@ -900,7 +900,7 @@ static void enc28j60_hw_rx(struct net_de if (RSV_GETBIT(rxstat, RSV_LENCHECKERR)) ndev-stats.rx_frame_errors++; } else { - skb = dev_alloc_skb(len); + skb = dev_alloc_skb(len + NET_IP_ALIGN); if (!skb) { if (netif_msg_rx_err(priv)) dev_err(ndev-dev, @@ -908,6 +908,7 @@ static void enc28j60_hw_rx(struct net_de ndev-stats.rx_dropped++; } else { skb-dev = ndev; + skb_reserve(skb, NET_IP_ALIGN); /* copy the packet from the receive buffer */ enc28j60_mem_read(priv, priv-next_pk_ptr + sizeof(rsv), len, skb_put(skb, len)); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bonding support for eth1394?
Roland Dreier wrote on 2007-10-13: The bonding sources have a few occurrences of EOPNOTSUPP. Unless I missed something, they are all related to setting the hardware address of the interface. AFAICS this is impossible with IP over FireWire. If it is crucial to bonding to be able to change the slaves' hardware addresses, then you are out of luck. There are a few changes to the bonding driver pending that will add support for bonding IP-over-InfiniBand interfaces. IPoIB also cannot change its HW address, so the patches address that issue. Once those patches land, bonding eth1394 interfaces may just work. Bill Fink wrote on 2007-10-14: While that might allow multiple eth1394 interfaces to be bonded, I believe the user wanted to bond an eth1394 interface with a normal Ethernet interface, and I don't think that will work even with the IPoIB bonding changes, since bonding of different fundamental types of network interfaces still won't be supported, and I'm pretty sure eth1394 is not considered a standard Ethernet interface (different MAC address format for one thing). Karl, you could try kernel 2.6.24(.2) which AFAIU features the mentioned changes. -- Stefan Richter -=-==--- --=- =--== http://arcgraph.de/sr/ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
Michael Chan wrote: On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote: iSCSI performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx with light tx, That's strange. The patch should only affect TX performance slightly since we are just turning off SG for TX. Please take an ethereal trace to see what's happening and compare with a good trace. Update: when I revert Herbert's patch in addition to applying your patch, the iSCSI performance goes back up to 115 MB/s again in both directions. So it looks like turning off SG for TX didn't itself cause the performance drop, but rather that the performance drop is just another manifestation of whatever bug is causing the data corruption. I do not regularly use wireshark or look at network packet dumps, so I am not really sure what to look for. Given the above information, do you still believe that there is value in examining the packet dump? Tony -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [NET]: Messed multicast lists after dev_mc_sync/unsync
From: Patrick McHardy [EMAIL PROTECTED] Date: Tue, 19 Feb 2008 15:56:59 +0100 [NET]: Messed multicast lists after dev_mc_sync/unsync Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 from you introduced a new field da_synced to struct dev_addr_list that is not properly initialized to 0. So when any of the current users (8021q, macvlan, mac80211) calls dev_mc_sync/unsync they mess the address list for both devices. The attached patch fixed it for me and avoid future problems. Signed-off-by: Jorge Boncompte [DTI2] [EMAIL PROTECTED] Signed-off-by: Patrick McHardy [EMAIL PROTECTED] I've added the headline description of the referenced commit to this commit message, and will queue this up to -stable. Thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[2.6.25 patch] fix broken error handling in ieee80211_sta_process_addba_request()
The Coverity checker spotted this buggy error handling added by commit 07db218396650933abff3c5c1ad1e2a6e0cfedeb. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] --- 6003e3d899a8fd6425ff509363b776f8807df25d diff --git a/net/mac80211/ieee80211_sta.c b/net/mac80211/ieee80211_sta.c index 2019b4f..094565e 100644 --- a/net/mac80211/ieee80211_sta.c +++ b/net/mac80211/ieee80211_sta.c @@ -1116,9 +1116,10 @@ static void ieee80211_sta_process_addba_request(struct net_device *dev, /* prepare reordering buffer */ tid_agg_rx-reorder_buf = kmalloc(buf_size * sizeof(struct sk_buf *), GFP_ATOMIC); - if ((!tid_agg_rx-reorder_buf) net_ratelimit()) { - printk(KERN_ERR can not allocate reordering buffer - to tid %d\n, tid); + if (!tid_agg_rx-reorder_buf) { + if (net_ratelimit()) + printk(KERN_ERR can not allocate reordering buffer + to tid %d\n, tid); goto end; } memset(tid_agg_rx-reorder_buf, 0, -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 18, 2008 at 10:09:24PM +, James Chapman wrote: ... Unfortunately the ISP's syslog stops. But I've been able to borrow two Quad Xeon boxes and have reproduced the problem. Here's a new version of the patch. The patch avoids disabling irqs and fixes the sk_dst_get() usage that DaveM mentioned. But even with this patch, lockdep still complains if hundreds of ppp sessions are inserted into a tunnel as rapidly as possible (lockdep trace is below). I can stop these errors by wrapping the call to ppp_input() in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a better fix? I send here my proposal: it's intended for testing and to check one of possible solutions here. IMHO your lockdep reports show there is no use to change anything around sk_dst_lock: it would need the global change of this lock to fix this problem. So the fix should be done around pch-upl lock and this means changing ppp_generic. In the patch below I've used trylock in places which seem to allow for skipping some things (while config is changed only) or simply don't need this lock because there is no ppp struct. This could be modified to add some waiting loop if necessary. Another option is to change the write side of this lock: it looks like more vulnerable if something missed because there are more locks involved, but probably should be enough to solve this problem too. I think pppol2tp need to be first checked only with hlist_lock bh patch, unless there were some lockdep reports on these other locks too. (BTW, I added ppp maintainer to CC - I hope we get Paul's opinion on this.) Regards, Jarek P. (testing patch #1) --- drivers/net/ppp_generic.c | 33 +++-- 1 files changed, 23 insertions(+), 10 deletions(-) diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c index 4dc5b4b..5cbc534 100644 --- a/drivers/net/ppp_generic.c +++ b/drivers/net/ppp_generic.c @@ -1473,7 +1473,7 @@ void ppp_input(struct ppp_channel *chan, struct sk_buff *skb) { struct channel *pch = chan-ppp; - int proto; + int proto, locked; if (!pch || skb-len == 0) { kfree_skb(skb); @@ -1481,8 +1481,13 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb) } proto = PPP_PROTO(skb); - read_lock_bh(pch-upl); - if (!pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) { + /* +* We use trylock to avoid dependency between soft-irq-safe upl lock +* and soft-irq-unsafe sk_dst_lock. +*/ + local_bh_disable(); + locked = read_trylock(pch-upl); + if (!locked || !pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) { /* put it on the channel queue */ skb_queue_tail(pch-file.rq, skb); /* drop old frames if queue too long */ @@ -1493,7 +1498,10 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb) } else { ppp_do_recv(pch-ppp, skb, pch); } - read_unlock_bh(pch-upl); + + if (locked) + read_unlock(pch-upl); + local_bh_enable(); } /* Put a 0-length skb in the receive queue as an error indication */ @@ -1506,16 +1514,18 @@ ppp_input_error(struct ppp_channel *chan, int code) if (!pch) return; - read_lock_bh(pch-upl); - if (pch-ppp) { + /* a trylock comment in ppp_input() */ + local_bh_disable(); + if (read_trylock(pch-upl) pch-ppp) { skb = alloc_skb(0, GFP_ATOMIC); if (skb) { skb-len = 0; /* probably unnecessary */ skb-cb[0] = code; ppp_do_recv(pch-ppp, skb, pch); } + read_unlock(pch-upl); } - read_unlock_bh(pch-upl); + local_bh_enable(); } /* @@ -2044,10 +2054,13 @@ int ppp_unit_number(struct ppp_channel *chan) int unit = -1; if (pch) { - read_lock_bh(pch-upl); - if (pch-ppp) + /* a trylock comment in ppp_input() */ + local_bh_disable(); + if (read_trylock(pch-upl) pch-ppp) { unit = pch-ppp-file.index; - read_unlock_bh(pch-upl); + read_unlock(pch-upl); + } + local_bh_enable(); } return unit; } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC: 2.6.25 patch] ipv4/fib_hash.c: fix NULL dereference
Unless I miss a guaranteed relation between between f and new_fa-fa_info this patch is required for fixing a NULL dereference introduced by commit a6501e080c318f8d4467679d17807f42b3a33cd5 and spotted by the Coverity checker. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] --- net/ipv4/fib_hash.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) --- linux-2.6/net/ipv4/fib_hash.c.old 2008-02-19 23:23:14.0 +0200 +++ linux-2.6/net/ipv4/fib_hash.c 2008-02-19 23:38:28.0 +0200 @@ -367,17 +367,18 @@ static struct fib_node *fib_find_node(st } return NULL; } static int fn_hash_insert(struct fib_table *tb, struct fib_config *cfg) { struct fn_hash *table = (struct fn_hash *) tb-tb_data; - struct fib_node *new_f, *f; + struct fib_node *new_f = NULL; + struct fib_node *f; struct fib_alias *fa, *new_fa; struct fn_zone *fz; struct fib_info *fi; u8 tos = cfg-fc_tos; __be32 key; int err; if (cfg-fc_dst_len 32) @@ -491,33 +492,32 @@ static int fn_hash_insert(struct fib_tab } err = -ENOENT; if (!(cfg-fc_nlflags NLM_F_CREATE)) goto out; err = -ENOBUFS; - new_f = NULL; if (!f) { new_f = kmem_cache_zalloc(fn_hash_kmem, GFP_KERNEL); if (new_f == NULL) goto out; INIT_HLIST_NODE(new_f-fn_hash); INIT_LIST_HEAD(new_f-fn_alias); new_f-fn_key = key; f = new_f; } new_fa = f-fn_embedded_alias; if (new_fa-fa_info != NULL) { new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL); if (new_fa == NULL) - goto out_free_new_f; + goto out; } new_fa-fa_info = fi; new_fa-fa_tos = tos; new_fa-fa_type = cfg-fc_type; new_fa-fa_scope = cfg-fc_scope; new_fa-fa_state = 0; /* @@ -535,19 +535,19 @@ static int fn_hash_insert(struct fib_tab if (new_f) fz-fz_nent++; rt_cache_flush(-1); rtmsg_fib(RTM_NEWROUTE, key, new_fa, cfg-fc_dst_len, tb-tb_id, cfg-fc_nlinfo, 0); return 0; -out_free_new_f: - kmem_cache_free(fn_hash_kmem, new_f); out: + if (new_f) + kmem_cache_free(fn_hash_kmem, new_f); fib_release_info(fi); return err; } static int fn_hash_delete(struct fib_table *tb, struct fib_config *cfg) { struct fn_hash *table = (struct fn_hash*)tb-tb_data; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC: 2.6.25 patch] ipv4/fib_hash.c: fix NULL dereference
Adrian Bunk a écrit : Unless I miss a guaranteed relation between between f and new_fa-fa_info this patch is required for fixing a NULL dereference introduced by commit a6501e080c318f8d4467679d17807f42b3a33cd5 and spotted by the Coverity checker. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] --- net/ipv4/fib_hash.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) --- linux-2.6/net/ipv4/fib_hash.c.old 2008-02-19 23:23:14.0 +0200 +++ linux-2.6/net/ipv4/fib_hash.c 2008-02-19 23:38:28.0 +0200 @@ -367,17 +367,18 @@ static struct fib_node *fib_find_node(st } return NULL; } static int fn_hash_insert(struct fib_table *tb, struct fib_config *cfg) { struct fn_hash *table = (struct fn_hash *) tb-tb_data; - struct fib_node *new_f, *f; + struct fib_node *new_f = NULL; + struct fib_node *f; struct fib_alias *fa, *new_fa; struct fn_zone *fz; struct fib_info *fi; u8 tos = cfg-fc_tos; __be32 key; int err; if (cfg-fc_dst_len 32) @@ -491,33 +492,32 @@ static int fn_hash_insert(struct fib_tab } err = -ENOENT; if (!(cfg-fc_nlflags NLM_F_CREATE)) goto out; err = -ENOBUFS; - new_f = NULL; if (!f) { new_f = kmem_cache_zalloc(fn_hash_kmem, GFP_KERNEL); if (new_f == NULL) goto out; INIT_HLIST_NODE(new_f-fn_hash); INIT_LIST_HEAD(new_f-fn_alias); new_f-fn_key = key; f = new_f; } new_fa = f-fn_embedded_alias; if (new_fa-fa_info != NULL) { new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL); if (new_fa == NULL) - goto out_free_new_f; + goto out; } new_fa-fa_info = fi; new_fa-fa_tos = tos; new_fa-fa_type = cfg-fc_type; new_fa-fa_scope = cfg-fc_scope; new_fa-fa_state = 0; /* @@ -535,19 +535,19 @@ static int fn_hash_insert(struct fib_tab if (new_f) fz-fz_nent++; rt_cache_flush(-1); rtmsg_fib(RTM_NEWROUTE, key, new_fa, cfg-fc_dst_len, tb-tb_id, cfg-fc_nlinfo, 0); return 0; -out_free_new_f: - kmem_cache_free(fn_hash_kmem, new_f); out: + if (new_f) + kmem_cache_free(fn_hash_kmem, new_f); fib_release_info(fi); return err; } static int fn_hash_delete(struct fib_table *tb, struct fib_config *cfg) { struct fn_hash *table = (struct fn_hash*)tb-tb_data; Hum, you are right, kmem_cache_free() doesnt allow a NULL object, like kfree() does. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Wed, Feb 20, 2008 at 12:06:40AM +0100, Jarek Poplawski wrote: ... (testing patch #1) SORRY!!! take 2 (unlocking fixed) --- drivers/net/ppp_generic.c | 39 +-- 1 files changed, 29 insertions(+), 10 deletions(-) diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c index 4dc5b4b..c4e3808 100644 --- a/drivers/net/ppp_generic.c +++ b/drivers/net/ppp_generic.c @@ -1473,7 +1473,7 @@ void ppp_input(struct ppp_channel *chan, struct sk_buff *skb) { struct channel *pch = chan-ppp; - int proto; + int proto, locked; if (!pch || skb-len == 0) { kfree_skb(skb); @@ -1481,8 +1481,13 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb) } proto = PPP_PROTO(skb); - read_lock_bh(pch-upl); - if (!pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) { + /* +* We use trylock to avoid dependency between soft-irq-safe upl lock +* and soft-irq-unsafe sk_dst_lock. +*/ + local_bh_disable(); + locked = read_trylock(pch-upl); + if (!locked || !pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) { /* put it on the channel queue */ skb_queue_tail(pch-file.rq, skb); /* drop old frames if queue too long */ @@ -1493,7 +1498,10 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb) } else { ppp_do_recv(pch-ppp, skb, pch); } - read_unlock_bh(pch-upl); + + if (locked) + read_unlock(pch-upl); + local_bh_enable(); } /* Put a 0-length skb in the receive queue as an error indication */ @@ -1502,12 +1510,14 @@ ppp_input_error(struct ppp_channel *chan, int code) { struct channel *pch = chan-ppp; struct sk_buff *skb; + int locked; if (!pch) return; - read_lock_bh(pch-upl); - if (pch-ppp) { + /* a trylock comment in ppp_input() */ + local_bh_disable(); + if ((locked = read_trylock(pch-upl)) pch-ppp) { skb = alloc_skb(0, GFP_ATOMIC); if (skb) { skb-len = 0; /* probably unnecessary */ @@ -1515,7 +1525,10 @@ ppp_input_error(struct ppp_channel *chan, int code) ppp_do_recv(pch-ppp, skb, pch); } } - read_unlock_bh(pch-upl); + + if (locked) + read_unlock(pch-upl); + local_bh_enable(); } /* @@ -2044,10 +2057,16 @@ int ppp_unit_number(struct ppp_channel *chan) int unit = -1; if (pch) { - read_lock_bh(pch-upl); - if (pch-ppp) + int locked; + + /* a trylock comment in ppp_input() */ + local_bh_disable(); + if ((locked = read_trylock(pch-upl)) pch-ppp) unit = pch-ppp-file.index; - read_unlock_bh(pch-upl); + + if (locked) + read_unlock(pch-upl); + local_bh_enable(); } return unit; } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RTNL]: Add missing link netlink attribute policy definitions
From: Thomas Graf [EMAIL PROTECTED] Date: Wed, 20 Feb 2008 00:46:06 +0100 IFLA_LINK is no longer a write-only attribute on the kernel side and must thus be validated. Same goes for the newly introduced IFLA_LINKINFO. Fixes undefined behaviour if either of the attributes are not well formed. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Applied, thanks Thomas. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[2.6 patch] net/phy/mdio_bus.c: fix a check-after-use
This patch fixes a check-after-use spotted by the Coverity checker. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] --- 6beeb3ac577d74d72b2f91bd654eecb904c3c17e diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index 6e9f619..963630c 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -49,13 +49,13 @@ int mdiobus_register(struct mii_bus *bus) int i; int err = 0; - mutex_init(bus-mdio_lock); - if (NULL == bus || NULL == bus-name || NULL == bus-read || NULL == bus-write) return -EINVAL; + mutex_init(bus-mdio_lock); + if (bus-reset) bus-reset(bus); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
kernel BUG at net/core/skbuff.c:95!
Hi I have a reproducible crash that can be triggered remotely at the layer2 level. Example crash outputs is as follows kernel: skb_over_panic: text:c0541fc7 len:1000 put:997 head:c166ac00 data:c166ac2f tail:0xc166b017 end:0xc166ac80 dev:eth0 kernel: [ cut here ] kernel: kernel BUG at net/core/skbuff.c:95! kernel: invalid opcode: [#1] kernel: SMP kernel: Modules linked in: kernel: CPU:0 kernel: EIP:0060:[c052753e]Not tainted VLI kernel: EFLAGS: 00010286 (2.6.23.16 #1) kernel: EIP is at skb_over_panic+0x5e/0x70 kernel: eax: 0076 ebx: df600460 ecx: 0086 edx: kernel: esi: 03e5 edi: 03e8 ebp: dcf652e0 esp: c087fed0 kernel: ds: 007b es: 007b fs: 00d8 gs: ss: 0068 kernel: Process swapper (pid: 0, ti=c087f000 task=c076e0e0 task.ti=c081) kernel: Stack: c0746150 c0541fc7 03e8 03e5 c166ac00 c166ac2f c166b017 c166ac80 kernel: c1588000 df600460 deb48418 c0541fcc d246 5429a9b7 c07f6468 c07f645c kernel: dcf652e0 c054209b dcf652e0 dcf652e0 0003 deb48033 c0542105 kernel: Call Trace: kernel: [c0541fc7] llc_station_ac_send_test_r+0x137/0x190 kernel: [c0541fcc] llc_station_ac_send_test_r+0x13c/0x190 kernel: [c054209b] llc_station_next_state+0x7b/0xc0 kernel: [c0542105] llc_station_state_process+0x25/0x40 kernel: [c053b37a] llc_rcv+0x24a/0x290 kernel: [c05308e2] netif_receive_skb+0x242/0x360 kernel: [c0358563] e100_poll+0x193/0x4a0 kernel: [c0118312] run_rebalance_domains+0x112/0x410 kernel: [c052d6f4] net_rx_action+0x74/0x110 kernel: [c011ff65] __do_softirq+0x75/0xf0 kernel: [c010580b] do_softirq+0x5b/0xb0 kernel: [c011c738] profile_tick+0x38/0x60 kernel: [c013d680] handle_fasteoi_irq+0x0/0xd0 kernel: [c01058da] do_IRQ+0x7a/0xc0 kernel: [c0101750] default_idle+0x0/0x50 kernel: [c01038c3] common_interrupt+0x23/0x30 kernel: [c0101750] default_idle+0x0/0x50 kernel: [c0101784] default_idle+0x34/0x50 kernel: [c01017d4] cpu_idle+0x34/0x80 kernel: [c0817962] start_kernel+0x272/0x350 kernel: [c0817390] unknown_bootoption+0x0/0x1d0 kernel: === skb_over_panic: text:c055be07 len:1000 put:997 head:de32a800 data:de32a82f tail:0xde32ac17 end:0xde32a880 dev:eth0 [ cut here ] kernel BUG at net/core/skbuff.c:95! invalid opcode: [#1] SMP Modules linked in: Pid: 0, comm: swapper Not tainted (2.6.24.2 #1) EIP: 0060:[c054078e] EFLAGS: 00010282 CPU: 0 EIP is at skb_over_panic+0x5e/0x70 EAX: 0076 EBX: de32daa0 ECX: 0092 EDX: ESI: 03e5 EDI: 03e8 EBP: df3cf200 ESP: c08a3ec0 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 Process swapper (pid: 0, ti=c08a3000 task=c0794180 task.ti=c0833000) Stack: c076ae10 c055be07 03e8 03e5 de32a800 de32a82f de32ac17 de32a880 df9f0800 de32daa0 df199418 c055be0c d246 5429a9b7 c081a8f4 c081a8e8 df3cf200 c055bedb df3cf200 df3cf200 0003 df199033 c055bf45 Call Trace: [c055be07] llc_station_ac_send_test_r+0x137/0x190 [c055be0c] llc_station_ac_send_test_r+0x13c/0x190 [c055bedb] llc_station_next_state+0x7b/0xc0 [c055bf45] llc_station_state_process+0x25/0x40 [c055515a] llc_rcv+0x24a/0x2c0 [c054a0bb] netif_receive_skb+0x27b/0x3a0 [c03673c9] e100_poll+0x179/0x410 [c05470f1] net_rx_action+0x91/0x150 [c0120275] __do_softirq+0x75/0xf0 [c0105a3b] do_softirq+0x5b/0xb0 [c011ca78] profile_tick+0x38/0x60 [c013ea80] handle_fasteoi_irq+0x0/0xd0 [c0105b0a] do_IRQ+0x7a/0xc0 [c0101730] default_idle+0x0/0x50 [c0103903] common_interrupt+0x23/0x30 [c0101730] default_idle+0x0/0x50 [c0101764] default_idle+0x34/0x50 [c01017c8] cpu_idle+0x48/0xa0 [c083a942] start_kernel+0x272/0x350 [c083a370] unknown_bootoption+0x0/0x1d0 === Code: 00 00 89 44 24 14 8b 83 98 00 00 00 89 44 24 10 8b 43 50 89 74 24 0c 89 4c 24 04 89 44 24 08 c7 04 24 10 ae 76 c0 e8 42 ba bd ff 0f 0b eb fe 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 56 53 83 EIP: [c054078e] skb_over_panic+0x5e/0x70 SS:ESP 0068:c08a3ec0 Kernel panic - not syncing: Fatal exception in interrupt The issue was originally noticed on 2.6.20.21 machines, when we had ~60 servers all panic at the same time after a switch went crazy. To reproduce download linkloop http://freshmeat.net/projects/linkloop/ then run linkloop -d -s 1000 target mac thanks jim # # Automatically generated make config: don't edit # Linux kernel version: 2.6.24.2 # Tue Feb 19 15:55:53 2008 # # CONFIG_64BIT is not set CONFIG_X86_32=y # CONFIG_X86_64 is not set CONFIG_X86=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y # CONFIG_RWSEM_GENERIC_SPINLOCK is not set
Re: [RFC: 2.6.25 patch] ipv4/fib_hash.c: fix NULL dereference
From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 20 Feb 2008 00:06:14 +0100 Adrian Bunk a écrit : Unless I miss a guaranteed relation between between f and new_fa-fa_info this patch is required for fixing a NULL dereference introduced by commit a6501e080c318f8d4467679d17807f42b3a33cd5 and spotted by the Coverity checker. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] ... Hum, you are right, kmem_cache_free() doesnt allow a NULL object, like kfree() does. Applied, thanks everyone. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote: Update: when I revert Herbert's patch in addition to applying your patch, the iSCSI performance goes back up to 115 MB/s again in both directions. So it looks like turning off SG for TX didn't itself cause the performance drop, but rather that the performance drop is just another manifestation of whatever bug is causing the data corruption. I do not regularly use wireshark or look at network packet dumps, so I am not really sure what to look for. Given the above information, do you still believe that there is value in examining the packet dump? Can you confirm whether you're getting TCP checksum errors on the other side that is receiving packets from the 5701? You can just check statistics using netstat -s. I suspect that after we turn off SG, checksum is no longer offloaded and we are getting lots of TCP checksum errors instead that are slowing the performance. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RTNL]: Add missing link netlink attribute policy definitions
IFLA_LINK is no longer a write-only attribute on the kernel side and must thus be validated. Same goes for the newly introduced IFLA_LINKINFO. Fixes undefined behaviour if either of the attributes are not well formed. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.26/net/core/rtnetlink.c === --- net-2.6.26.orig/net/core/rtnetlink.c2008-02-19 20:30:08.0 +0100 +++ net-2.6.26/net/core/rtnetlink.c 2008-02-20 00:39:54.0 +0100 @@ -693,10 +693,12 @@ [IFLA_BROADCAST]= { .type = NLA_BINARY, .len = MAX_ADDR_LEN }, [IFLA_MAP] = { .len = sizeof(struct rtnl_link_ifmap) }, [IFLA_MTU] = { .type = NLA_U32 }, + [IFLA_LINK] = { .type = NLA_U32 }, [IFLA_TXQLEN] = { .type = NLA_U32 }, [IFLA_WEIGHT] = { .type = NLA_U32 }, [IFLA_OPERSTATE]= { .type = NLA_U8 }, [IFLA_LINKMODE] = { .type = NLA_U8 }, + [IFLA_LINKINFO] = { .type = NLA_NESTED }, [IFLA_NET_NS_PID] = { .type = NLA_U32 }, }; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] bluetooth : put hci dev after del conn
On Feb 19, 2008 12:44 PM, David Miller [EMAIL PROTECTED] wrote: From: Dave Young [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 15:55:55 +0800 Move hci_dev_put to del_conn to avoid hci dev going away before hci conn. This looks correct so I have applied it. Signed-off-by: Dave Young [EMAIL PROTECTED] Please remove the extraneous space at the end of your signoff line next time :-) Will do :) Also, I reworked the loop in del_conn() so that it no longer generates a compile warning, so I had to apply your patch by hand. Thanks a lot. Regards dave -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATHCH 1/16] ServerEngines 10Gb NIC driver
On Mon, 18 Feb 2008 10:00:51 -0800 Subbu Seetharaman [EMAIL PROTECTED] wrote: I have one question about bit fields. Several of headers in the common code are generated by srcgen from f/w source files. Some of the structures in these headers have bit fields (with separate definitions for little endian and big endian hosts). Are these un-acceptable in Linux driver submissions ? The netdev maintainer, Jeff Garzik (to whom you should submit your driver), frowns upon the use of bitfields, and for good reason. See, for example, http://marc.info/?l=linux-kernelm=118444531031506w=2 http://lkml.org/lkml/2005/8/5/361 http://lkml.org/lkml/2006/6/20/470 http://en.wikipedia.org/wiki/Bit_field. You should avoid them if at all possible. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote: Michael Chan wrote: On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote: iSCSI performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx with light tx, That's strange. The patch should only affect TX performance slightly since we are just turning off SG for TX. Please take an ethereal trace to see what's happening and compare with a good trace. Update: when I revert Herbert's patch in addition to applying your patch, the iSCSI performance goes back up to 115 MB/s again in both directions. So it looks like turning off SG for TX didn't itself cause the performance drop, but rather that the performance drop is just another manifestation of whatever bug is causing the data corruption. I do not regularly use wireshark or look at network packet dumps, so I am not really sure what to look for. Given the above information, do you still believe that there is value in examining the packet dump? Tony Hi Tony. Can you give us the output of : sudo lspci -vvv - -s 03:01.0' (assuming that is still the correct address of the 3Com NIC.) Also, after some digging, I found that the 5701 can run into trouble if a 64-bit DMA read terminates early and then completes as a 32-bit transfer. The problem is reportedly very rare, but the failure mode looks like a match. Can you apply the following patch and see if it helps your performance / corruption problems? diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c index db606b6..7ad08ce 100644 --- a/drivers/net/tg3.c +++ b/drivers/net/tg3.c @@ -11409,6 +11409,8 @@ static int __devinit tg3_get_invariants(struct tg3 *tp) tp-tg3_flags |= TG3_FLAG_PCI_HIGH_SPEED; if ((pci_state_reg PCISTATE_BUS_32BIT) != 0) tp-tg3_flags |= TG3_FLAG_PCI_32BIT; + else if (GET_ASIC_REV(tp-pci_chip_rev_id) == ASIC_REV_5701) + tp-grc_mode |= GRC_MODE_FORCE_PCI32BIT; /* Chip-specific fixup from Broadcom driver */ if ((tp-pci_chip_rev_id == CHIPREV_ID_5704_A0) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote: Update: when I revert Herbert's patch in addition to applying your patch, the iSCSI performance goes back up to 115 MB/s again in both directions. So it looks like turning off SG for TX didn't itself cause the performance drop, but rather that the performance drop is just another manifestation of whatever bug is causing the data corruption. Interesting. So the workload that regressed is mostly RX with a little TX traffic? Can you try to reproduce this with something like netperf to eliminate other variables? This is all very puzzling since the patch in question shouldn't change an RX load at all. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6.25 patch] fix broken error handling in ieee80211_sta_process_addba_request()
On 19-02-2008 23:58, Adrian Bunk wrote: ... --- a/net/mac80211/ieee80211_sta.c +++ b/net/mac80211/ieee80211_sta.c @@ -1116,9 +1116,10 @@ static void ieee80211_sta_process_addba_request(struct net_device *dev, ... + printk(KERN_ERR can not allocate reordering buffer + printk(KERN_ERR cannot allocate reordering buffer Probably this can be fixed during the commit. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tbench regression in 2.6.25-rc1
On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote: Zhang, Yanmin a �crit : On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said: I also think __refcnt is the key. I did a new testing by adding 2 unsigned long pading before lastuse, so the 3 members are moved to next cache line. The performance is recovered. How about below patch? Almost all performance is recovered with the new patch. Signed-off-by: Zhang Yanmin [EMAIL PROTECTED] Could you add a comment someplace that says refcnt wants to be on a different cache line from input/output/ops or performance tanks badly, to warn some future kernel hacker who starts adding new fields to the structure? Ok. Below is the new patch. 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So sizeof(dst_entry)=200 no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core tigerton by moving tclassid to different place. It looks like tclassid could also have impact on performance. If moving tclassid before metrics, or just don't move tclassid, the performance isn't good. So I move it behind metrics. 2) Add comments before __refcnt. If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than the one without the patch. If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than the one without the patch. Signed-off-by: Zhang Yanmin [EMAIL PROTECTED] --- --- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.0 +0800 +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 +0800 @@ -52,15 +52,10 @@ struct dst_entry unsigned short header_len; /* more space at head required */ unsigned short trailer_len;/* space to reserve at tail */ - u32 metrics[RTAX_MAX]; - struct dst_entry*path; - - unsigned long rate_last; /* rate limiting for ICMP */ unsigned intrate_tokens; + unsigned long rate_last; /* rate limiting for ICMP */ -#ifdef CONFIG_NET_CLS_ROUTE - __u32 tclassid; -#endif + struct dst_entry*path; struct neighbour*neighbour; struct hh_cache *hh; @@ -70,10 +65,20 @@ struct dst_entry int (*output)(struct sk_buff*); struct dst_ops *ops; - - unsigned long lastuse; + + u32 metrics[RTAX_MAX]; + +#ifdef CONFIG_NET_CLS_ROUTE + __u32 tclassid; +#endif + + /* +* __refcnt wants to be on a different cache line from +* input/output/ops or performance tanks badly +*/ atomic_t__refcnt; /* client references*/ int __use; + unsigned long lastuse; union { struct dst_entry *next; struct rtable*rt_next; I prefer this patch, but unfortunatly your perf numbers are for 64 bits kernels. Could you please test now with 32 bits one ? I tested it with 32bit 2.6.25-rc1 on 8-core stoakley. The result almost has no difference between pure kernel and patched kernel. New update: On 8-core stoakley, the regression becomes 2~3% with kernel 2.6.25-rc2. On tigerton, the regression is still 30% with 2.6.25-rc2. On Tulsa( 8 cores+hyperthreading), the regression is still 4% with 2.6.25-rc2. With my patch, on tigerton, almost all regression disappears. On tulsa, only about 2% regression disappears. So this issue is triggerred with multiple-cpu. Perhaps process scheduler is another factor causing the issue to happen, but it's very hard to change scheduler. Eric, I tested your new patch in function loopback_xmit. It has no improvement, while it doesn't introduce new issues. As you tested it on dual-core machine and got improvement, how about merging your patch with mine? -yanmin -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tbench regression in 2.6.25-rc1
Zhang, Yanmin a écrit : On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote: Zhang, Yanmin a �crit : On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said: I also think __refcnt is the key. I did a new testing by adding 2 unsigned long pading before lastuse, so the 3 members are moved to next cache line. The performance is recovered. How about below patch? Almost all performance is recovered with the new patch. Signed-off-by: Zhang Yanmin [EMAIL PROTECTED] Could you add a comment someplace that says refcnt wants to be on a different cache line from input/output/ops or performance tanks badly, to warn some future kernel hacker who starts adding new fields to the structure? Ok. Below is the new patch. 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So sizeof(dst_entry)=200 no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core tigerton by moving tclassid to different place. It looks like tclassid could also have impact on performance. If moving tclassid before metrics, or just don't move tclassid, the performance isn't good. So I move it behind metrics. 2) Add comments before __refcnt. If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than the one without the patch. If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than the one without the patch. Signed-off-by: Zhang Yanmin [EMAIL PROTECTED] --- --- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.0 +0800 +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 +0800 @@ -52,15 +52,10 @@ struct dst_entry unsigned short header_len; /* more space at head required */ unsigned short trailer_len;/* space to reserve at tail */ - u32 metrics[RTAX_MAX]; - struct dst_entry*path; - - unsigned long rate_last; /* rate limiting for ICMP */ unsigned intrate_tokens; + unsigned long rate_last; /* rate limiting for ICMP */ -#ifdef CONFIG_NET_CLS_ROUTE - __u32 tclassid; -#endif + struct dst_entry*path; struct neighbour *neighbour; struct hh_cache *hh; @@ -70,10 +65,20 @@ struct dst_entry int (*output)(struct sk_buff*); struct dst_ops *ops; - - unsigned long lastuse; + + u32 metrics[RTAX_MAX]; + +#ifdef CONFIG_NET_CLS_ROUTE + __u32 tclassid; +#endif + + /* +* __refcnt wants to be on a different cache line from +* input/output/ops or performance tanks badly +*/ atomic_t__refcnt; /* client references*/ int __use; + unsigned long lastuse; union { struct dst_entry *next; struct rtable*rt_next; I prefer this patch, but unfortunatly your perf numbers are for 64 bits kernels. Could you please test now with 32 bits one ? I tested it with 32bit 2.6.25-rc1 on 8-core stoakley. The result almost has no difference between pure kernel and patched kernel. New update: On 8-core stoakley, the regression becomes 2~3% with kernel 2.6.25-rc2. On tigerton, the regression is still 30% with 2.6.25-rc2. On Tulsa( 8 cores+hyperthreading), the regression is still 4% with 2.6.25-rc2. With my patch, on tigerton, almost all regression disappears. On tulsa, only about 2% regression disappears. So this issue is triggerred with multiple-cpu. Perhaps process scheduler is another factor causing the issue to happen, but it's very hard to change scheduler. Thanks very much Yanmin, I think we can apply your patch as is, if no regression was found for 32bits. Eric, I tested your new patch in function loopback_xmit. It has no improvement, while it doesn't introduce new issues. As you tested it on dual-core machine and got improvement, how about merging your patch with mine? No, thank you, that was an experiment and is not related to your findings on dst_entry. I am currently working on a 'distributed refcount' infrastructure, to be able to spread on several nodes (for NUMA machines) or several cache lines (normal SMP machines) the high pressure we currently have on some refcnt (struct dst_entry, struct net_device, and many more refcnts ...) Instead of NR_CPUS allocations, goal is to be able to restrict to a small value like 4, 8 or 16 the number of 32bits entities used to store one refcnt, even if NR_CPUS=4096 or so. atomic_inc(p-refcnt) - distref_inc(p-refcnt) distref_inc(struct distref *p) { atomic_inc(myptr[p-offset]); } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e1000: Question about polling
On 18-02-2008 10:18, Badalian Vyacheslav wrote: Hello all. Hi, Interesting think: Have PC that do NAT. Bandwidth about 600 mbs. Have 4 CPU (2xCoRe 2 DUO HT OFF 3.2 HZ). irqbalance in kernel is off. nat2 ~ # cat /proc/irq/217/smp_affinity 0001 nat2 ~ # cat /proc/irq/218/smp_affinity 0003 Load SI on CPU0 and CPU1 is about 90% Good... try do echo /proc/irq/217/smp_affinity echo /proc/irq/218/smp_affinity Get 100% SI at CPU0 Question Why? I think you should show here /proc/interrupts in all these cases. I listen that if use IRQ from 1 netdevice to 1 CPU i can get 30% perfomance... but i have 4 CPU... i must get more perfomance if i cat to smp_affinity. picture looks liks this: 0-3 CPU get over 50% SI bandwith up 55% SI... bandwith up... 100% SI on CPU0 I remember patch to fix problem like it... patched function e1000_clean... kernel on pc have this patch (2.6.24-rc7-git2)... e1000 driver work much better (i up to 1.5-2x bandwidth before i get 100% SI), but i think that it not get 100% that it can =) If some patch works for you, and you can show here its advantages, you should probably add here some link and request for merging. BTW, I wonder if you tried to check if changing CONFIG_HZ makes any difference here? Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html