Re: wireless vs. alignment requirements
> > Now, the IP stack actually assumes that its header is four-byte aligned (see comment at NET_IP_ALIGN, although it is not said explicitly that the alignment requirement for an IP header is four) so that is actually something for the hardware/firmware (!) to do, for example Broadcom
>
> Good point. In fact IIRC we've always had the policy that drivers should do their best to generate aligned packets but it is not a requirement since on some platforms it's more important for the DMA to be aligned.

We still require four-byte alignment, no?

> So it's up to the platform code to fix up any exceptions should they show up. Daniel, what's the specific case that you had in mind with this patch?

Well. This goes back to a user reporting unaligned accesses on sparc64. Davem thought this came from the ether addr comparisons, but the user later reported that the patch from davem didn't fix it, and I think Daniel just made a sweep over all ether addr comparisons, replacing them with unaligned ones.

johannes

signature.asc
Description: This is a digitally signed message part
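A minimal sketch of what an alignment-agnostic Ethernet address comparison could look like, in the spirit of the "unaligned" replacements mentioned above. The helper names `load16_any` and `ether_addr_differs_any` are hypothetical and are not kernel APIs; the sketch only assumes that `memcpy` is safe at any address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load a 16-bit word from any address, aligned or not. */
static uint16_t load16_any(const uint8_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Returns 0 when the two 6-byte addresses are equal, non-zero otherwise. */
static unsigned ether_addr_differs_any(const uint8_t *a, const uint8_t *b)
{
	return ((load16_any(a) ^ load16_any(b)) |
		(load16_any(a + 2) ^ load16_any(b + 2)) |
		(load16_any(a + 4) ^ load16_any(b + 4))) != 0;
}
```

The trade-off is that every caller pays for the safe loads, even when the address is already two-byte aligned.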
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Sat, Nov 24, 2007 at 03:53:34PM +1100, Rusty Russell wrote:
> So, you're saying that there's a problem with in-tree modules using symbols they shouldn't? Can you give an example?
>
> > I believe that is fairly important in tree too because the kernel has become so big now that review cannot be the only enforcement mechanism for this anymore.
>
> If people aren't reviewing, this won't make them review. I don't think the

With millions of LOC the primary maintainers cannot review everything. It's not that anybody is doing a bad job -- it is just so much code that explicit mechanisms are better than implicit contracts.

> problem is that people are conniving to avoid review.

No of course not -- it is just too much code to let everything be reviewed by the core subsystem maintainers. But with explicit marking of internal symbols they would need to look at it because the relationship will be clearly spelled out in the code.

> > Several distributions have policies that require keeping the changes to these exported interfaces minimal and that is very hard with thousands of exported symbols. With name spaces the number of truly publicly exported symbols will hopefully shrink to a much smaller, more manageable set.
>
> *This* makes sense. But it's not clear that the burden should be placed on kernel coders. You can create a list yourself.

How do I tell the difference between truly publicly exported symbols and others? Out of tree solutions generally do not scale. Nobody else can keep up with 2+ Million changes each merge window.

> If a symbol has more than one in-tree user, it's hard to argue against an

There are still classes of drivers. e.g. for the SCSI example: SD, SG, SR etc. are more internal while low level drivers like aic7xxx are clearly external drivers.

> out-of-tree module using the symbol, unless you're arguing against *all* out-of-tree modules.

No, actually namespaces kind of help out of tree modules. Once they only use interfaces that are really generic driver interfaces and fairly stable, their authors will have much less pain forward porting to newer kernel versions. But currently the authors cannot even know what is an unstable internal interface and what is a generic, relatively stable driver level interface. Namespaces are a mechanism to make this all explicit.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: wireless vs. alignment requirements
On Sat, Nov 24, 2007 at 09:33:36AM +0100, Johannes Berg wrote:
> We still require four-byte alignment, no?

Not at all. If NET_IP_ALIGN is zero then it won't be four-byte aligned (since the Ethernet header is 14 bytes long).

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
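The arithmetic behind this reply can be sketched as follows. The only facts taken from the thread are the 14-byte Ethernet header and the role of NET_IP_ALIGN as padding in front of it; the function names are illustrative, and the buffer start is assumed to be at least four-byte aligned.

```c
#include <stddef.h>

#define ETH_HEADER_LEN 14u	/* Ethernet header length, as stated in the thread */

/* Offset of the IP header from an aligned buffer start, for a given pad. */
static size_t ip_header_offset(size_t net_ip_align)
{
	return net_ip_align + ETH_HEADER_LEN;
}

/* Non-zero when the IP header lands on a four-byte boundary. */
static int ip_header_is_4byte_aligned(size_t net_ip_align)
{
	return ip_header_offset(net_ip_align) % 4 == 0;
}
```

With a pad of 2 the IP header starts at offset 16; with a pad of 0 it starts at offset 14, which is only two-byte aligned.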
Re: wireless vs. alignment requirements
Johannes,

> Hence, going back to the 802.11 header and the IP header alignment requirement, if we get the IP header alignment requirement right now I cannot possibly see any way we would use compare_ether_addr() on an address that is not at least two-byte aligned as required.

ACK. I agree completely.

The problem with the zd1211rw driver is that it copies the complete frame received from the device into the SKB and later pulls the five bytes ZD1211 uses for the PLCP information. This causes the 802.11 header to be on an odd address. The reported problems are caused by this. The zd1211rw-mac80211 is not affected, because the PLCP header is not copied into the skb and this way the 802.11 header becomes correctly aligned.

Here is a patch, which should solve the zd1211rw alignment issues. It compiles, but it is not tested right now, because I got the idea while writing this e-mail. An official submission will follow.

Uli

diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c b/drivers/net/wireless/zd1211rw/zd_mac.c
index a903645..fb54cd7 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -1166,15 +1166,22 @@ static void do_rx(unsigned long mac_ptr)
 int zd_mac_rx_irq(struct zd_mac *mac, const u8 *buffer, unsigned int length)
 {
 	struct sk_buff *skb;
+	unsigned int length_to_reserve;
 
-	skb = dev_alloc_skb(sizeof(struct zd_rt_hdr) + length);
+	/* This ensures that there is enough room for the radiotap header
+	 * and that the 802.11 header is aligned by four following the
+	 * five-byte ZD1211-specific PLCP header.
+	 */
+	length_to_reserve = ((sizeof(struct zd_rt_hdr) + 3) & ~3) + 3;
+
+	skb = dev_alloc_skb(length_to_reserve + length);
 	if (!skb) {
 		struct ieee80211_device *ieee = zd_mac_to_ieee80211(mac);
 		dev_warn(zd_mac_dev(mac), "Could not allocate skb.\n");
 		ieee->stats.rx_dropped++;
 		return -ENOMEM;
 	}
-	skb_reserve(skb, sizeof(struct zd_rt_hdr));
+	skb_reserve(skb, length_to_reserve);
 	memcpy(__skb_put(skb, length), buffer, length);
 	skb_queue_tail(&mac->rx_queue, skb);
 	tasklet_schedule(&mac->rx_tasklet);

-- 
Uli Kunitz
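A small sketch to check the reserve expression used in Uli's patch: round the radiotap header size up to a multiple of four, then add three, so that the five-byte PLCP header pushes the 802.11 header onto a four-byte boundary. The radiotap header size is a parameter here because the thread does not state it, and the buffer start is assumed to be suitably aligned.

```c
#include <stddef.h>

#define PLCP_HEADER_LEN 5u	/* five-byte ZD1211 PLCP header, per the message */

/* Same expression as in the patch: round up to a multiple of 4, then add 3. */
static size_t length_to_reserve(size_t radiotap_hdr_len)
{
	return ((radiotap_hdr_len + 3) & ~(size_t)3) + 3;
}

/* Offset of the 802.11 header once the PLCP header has been pulled. */
static size_t ieee80211_hdr_offset(size_t radiotap_hdr_len)
{
	return length_to_reserve(radiotap_hdr_len) + PLCP_HEADER_LEN;
}
```

For a hypothetical 13-byte radiotap header this reserves 19 bytes and places the 802.11 header at offset 24.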
Re: wireless vs. alignment requirements
On Sat, 2007-11-24 at 21:32 +0800, Herbert Xu wrote:
> On Sat, Nov 24, 2007 at 09:33:36AM +0100, Johannes Berg wrote:
> > We still require four-byte alignment, no?
>
> Not at all. If NET_IP_ALIGN is zero then it won't be four-byte aligned (since the Ethernet header is 14 bytes long).

Right. I just didn't think that would be a valid value for an architecture to set.

johannes
Re: wireless vs. alignment requirements
From: Johannes Berg [EMAIL PROTECTED]
Date: Sat, 24 Nov 2007 14:49:36 +0100

> On Sat, 2007-11-24 at 21:32 +0800, Herbert Xu wrote:
> > On Sat, Nov 24, 2007 at 09:33:36AM +0100, Johannes Berg wrote:
> > > We still require four-byte alignment, no?
> >
> > Not at all. If NET_IP_ALIGN is zero then it won't be four-byte aligned (since the Ethernet header is 14 bytes long).
>
> Right. I just didn't think that would be a valid value for an architecture to set.

It is, and explicitly used by powerpc to get more of the DMA transfers 64-byte aligned which is critical for performance on some powerpc boxes.
Re: wireless vs. alignment requirements
On Sat, Nov 24, 2007 at 02:49:36PM +0100, Johannes Berg wrote:
> Right. I just didn't think that would be a valid value for an architecture to set.

OK. Let me clarify this a bit more. We require at least one of the following rules to be met:

* the IPv4/IPv6 header is aligned by 8 bytes on reception;
* or the platform provides unaligned exception handlers.

So if your platform violates both rules then it won't work with the IP stack, simple as that. Fortunately I don't think such a platform exists currently on Linux.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
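For contrast with the two rules above, this is roughly what header access looks like when code can rely on neither alignment nor exception handlers. It is only an illustration of the cost involved, not something the thread says the IP stack does; `read_be32_any` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian field from any address. */
static uint32_t read_be32_any(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Byte-wise assembly never issues a wide load, so it works at odd addresses, at the price of four loads per field.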
Re: ZD1211RW unaligned accesses...
On Wed, Nov 21, 2007 at 01:00:44PM +, Shaddy Baddah wrote:
> It hasn't seemed to. I patched the source (confirming the patched lines are in), compiled, installed and rebooted to effect the changes. My zd1211rw modules timestamp indicates that I have an updated module:

Thanks for your quick response and sorry for my late answer :)

I think Dave's patch is definitely on the right track but there are subsequent unaligned accesses of a similar kind which is why it still appears to be broken if you look at the kernel messages. But there is definitely progress because those addresses are now bigger (0x394/0x39c/0x3a8 vs. 0x2** earlier).

So please try the following patch (instead of the original one) which should fix all the unaligned accesses in do_rx.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c b/drivers/net/wireless/zd1211rw/zd_mac.c
index a903645..d06b05b 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -1166,15 +1166,16 @@ static void do_rx(unsigned long mac_ptr)
 int zd_mac_rx_irq(struct zd_mac *mac, const u8 *buffer, unsigned int length)
 {
 	struct sk_buff *skb;
+	unsigned int hlen = ALIGN(sizeof(struct zd_rt_hdr), 16);
 
-	skb = dev_alloc_skb(sizeof(struct zd_rt_hdr) + length);
+	skb = dev_alloc_skb(hlen + length);
 	if (!skb) {
 		struct ieee80211_device *ieee = zd_mac_to_ieee80211(mac);
 		dev_warn(zd_mac_dev(mac), "Could not allocate skb.\n");
 		ieee->stats.rx_dropped++;
 		return -ENOMEM;
 	}
-	skb_reserve(skb, sizeof(struct zd_rt_hdr));
+	skb_reserve(skb, hlen - ZD_PLCP_HEADER_SIZE);
 	memcpy(__skb_put(skb, length), buffer, length);
 	skb_queue_tail(&mac->rx_queue, skb);
 	tasklet_schedule(&mac->rx_tasklet);
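A sketch of the offsets produced by Herbert's variant: round the radiotap header size up to 16, reserve that amount minus the PLCP header, and the 802.11 header ends up on a 16-byte boundary once the PLCP bytes are pulled. `ALIGN_UP` is a local stand-in for the kernel's `ALIGN()`, the 5-byte PLCP size comes from Uli's description, and the radiotap size is a parameter assumed to be non-zero.

```c
#include <stddef.h>

/* Local stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
#define ALIGN_UP(x, a)	(((x) + ((a) - 1)) & ~((size_t)(a) - 1))
#define PLCP_HEADER_LEN	5u

/* Headroom reserved in front of the received frame (PLCP header first). */
static size_t reserve_len(size_t radiotap_hdr_len)
{
	return ALIGN_UP(radiotap_hdr_len, 16) - PLCP_HEADER_LEN;
}

/* Offset of the 802.11 header after the PLCP header has been pulled. */
static size_t hdr80211_offset(size_t radiotap_hdr_len)
{
	return reserve_len(radiotap_hdr_len) + PLCP_HEADER_LEN;
}
```

Compared with the round-to-four variant, the headroom left in front of the 802.11 header is the rounded-up size itself, so the radiotap header always fits.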
Re: ZD1211RW unaligned accesses...
Herbert Xu wrote:
> So please try the following patch (instead of the original one) which should fix all the unaligned accesses in do_rx.
>
> [...]

ACK. This patch should solve it and is better than my patch.

-- 
Uli Kunitz
[SKY2] Problems (2.6.24-rc3-git1)
Hi,

A little while ago, something went horribly wrong. I could still use my mouse and the desktop was still alive more or less... everything using networking was dead AND the keyboard was dead... So I composed commands using existing text on the screen.

The device:
sky2 :02:00.0: v1.20 addr 0xdbffc000 irq 17 Yukon-EC (0xb6) rev 2
sky2 :02:00.0: PCI Express Advanced Error Reporting not configured or MMCONFIG problem?
sky2 :02:00.0: No interrupt generated using MSI, switching to INTx mode.
sky2 eth0: addr 00:15:f2:aa:8b:3e

From dmesg:
sky2 eth0: hung mac 124:39 fifo 195 (185:180)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442

And it continues until I press the reset button.

-- 
Ian Kumlien pomac () vapor ! com -- http://pomac.netswarm.net
RE: [PATCH 2.6.24 2/2]S2io: Fix to aggregate vlan tagged packets
Jeff,

Does this patch still fail?

Ram

-----Original Message-----
From: Jeff Garzik [mailto:[EMAIL PROTECTED]
Sent: Friday, November 23, 2007 7:05 PM
To: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org; support
Subject: Re: [PATCH 2.6.24 2/2]S2io: Fix to aggregate vlan tagged packets

Ramkrishna Vepa wrote:
> - Fix to aggregate vlan packets. IP offset is incremented by 4 bytes if the packet contains vlan header.
>
> Signed-off-by: Santoshkumar Rastapur [EMAIL PROTECTED]
> Signed-off-by: Ramkrishna Vepa [EMAIL PROTECTED]
> ---

ACK but cannot apply due to dropped patches
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
Simon, can you test this patch? I think it's the most straightforward 2.6.24 fix.

diff -r c60016ba6237 net/core/netpoll.c
--- a/net/core/netpoll.c	Tue Nov 13 09:09:36 2007 -0800
+++ b/net/core/netpoll.c	Fri Nov 23 13:10:28 2007 -0600
@@ -203,6 +203,12 @@ static void refill_skbs(void)
 	spin_unlock_irqrestore(&skb_pool.lock, flags);
 }
 
+/* used to mark an skb as owned by netpoll */
+static void netpoll_skb_destroy(struct sk_buff *skb)
+{
+	return;
+}
+
 static void zap_completion_queue(void)
 {
 	unsigned long flags;
@@ -219,10 +225,12 @@ static void zap_completion_queue(void)
 		while (clist != NULL) {
 			struct sk_buff *skb = clist;
 			clist = clist->next;
-			if (skb->destructor)
+			if (skb->destructor == netpoll_skb_destroy) {
+				skb->destructor = NULL;
+				__kfree_skb(skb);
+			}
+			else
 				dev_kfree_skb_any(skb); /* put this one back */
-			else
-				__kfree_skb(skb);
 		}
 	}
 
@@ -252,6 +260,7 @@ repeat:
 
 	atomic_set(&skb->users, 1);
 	skb_reserve(skb, reserve);
+	skb->destructor = netpoll_skb_destroy;
 	return skb;
 }
 
-- 
Mathematics is the supreme nostalgia of our time.
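The idea in this patch, using the address of an empty function as an ownership tag instead of testing whether any destructor is set, can be sketched outside the kernel as follows. `struct buf`, `pool_owned_marker` and `release` are illustrative names, and the two flags merely stand in for the two free paths.

```c
#include <stddef.h>

struct buf {
	void (*destructor)(struct buf *);
	int freed_directly;	/* set when the owner's own path reclaimed it */
	int freed_via_generic;	/* set when the generic path reclaimed it */
};

/* Empty marker: only its address matters, it tags a buffer as pool-owned. */
static void pool_owned_marker(struct buf *b)
{
	(void)b;
}

static void release(struct buf *b)
{
	if (b->destructor == pool_owned_marker) {
		b->destructor = NULL;	/* clear the tag before the direct free */
		b->freed_directly = 1;
	} else {
		/* Anything else, with or without a destructor, goes back. */
		b->freed_via_generic = 1;
	}
}
```

The comparison against a specific function address avoids misclassifying buffers that carry some other destructor, or none at all.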
Re: wireless vs. alignment requirements
Herbert Xu wrote:
> On Sat, Nov 24, 2007 at 02:49:36PM +0100, Johannes Berg wrote:
> > Right. I just didn't think that would be a valid value for an architecture to set.
>
> OK. Let me clarify this a bit more. We require at least one of the following rules to be met:
>
> * the IPv4/IPv6 header is aligned by 8 bytes on reception;
> * or the platform provides unaligned exception handlers.
>
> So if your platform violates both rules then it won't work with the IP stack, simple as that. Fortunately I don't think such a platform exists currently on Linux.
>
> Cheers,

Then what about hardware that can't DMA Ethernet to a non-aligned address? Sky2 hardware breaks if DMA is not 8-byte aligned.

IMHO the IP stack should handle any alignment, and do the appropriate memmove if the CPU requires alignment.
Re: wireless vs. alignment requirements
> OK. Let me clarify this a bit more. We require at least one of the following rules to be met:
>
> * the IPv4/IPv6 header is aligned by 8 bytes on reception;
> * or the platform provides unaligned exception handlers.
>
> So if your platform violates both rules then it won't work with the IP stack, simple as that. Fortunately I don't think such a platform exists currently on Linux.

Ok, thanks for the clarification. Eight bytes really sucks for wireless, many things are multiples of four and QoS vs. non-QoS frames have a multiple of four and common hardware only adds two padding bytes to get it aligned on four bytes so there's no easy way to get hardware to align it properly. Hmm.

johannes
Re: wireless vs. alignment requirements
> Then what about hardware that can't dma ethernet to non-aligned address. Sky2 hardware breaks if DMA is not 8 byte aligned.
>
> IMHO the IP stack should handle any alignment, and do the appropriate memmove if the CPU requires alignment.

Wouldn't that better be handled in the driver rather than having the test in the generic RX path?

johannes
[CFT][PATCH] proc_net: Remove userspace visible changes.
Ok. I have kicked around a lot of implementation ideas and took a good hard look at my /proc/net implementation. The patch below should close all of the holes with /proc/net that I am aware of.

Bind mounts work and properly capture /proc/net/
stat of /proc/net and /proc/net/ return the same information.
cd /proc/net/ ; ls .. works
The dentry has the proper parent and no longer appears deleted.

As well as a few more theoretical cases I have been able to imagine, like
open("/proc/net", O_NOFOLLOW | O_DIRECTORY)
getdents...

Please take a look and kick this patch around. I don't expect anyone to find any issues but a few more eyeballs before I send this along to Linus would be appreciated.

Thanks.

From: Eric W. Biederman [EMAIL PROTECTED]
Subject: [PATCH] proc_net: Remove userspace visible changes.

This patch fixes some bugs in corner cases of the /proc/net implementation.

In proc_net_shadow_dentry.
- Set the parent dentry properly.
- Make the dentry appear hashed so .. works.

Remove the unreachable proc_net_lookup.

Implement proc_net_getattr to complete the set of implemented inode operations.

Implement proc_net_open which changes the directory we are opening to remove the need to implement any other file operations.

Add a big fat comment on how /proc/net works to make it easier for someone else to look at and understand this code.

This patch should remove the last of the accidental user visible artifacts that arose from adding network namespace support to /proc/net.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 fs/proc/proc_net.c | 116 +--
 1 files changed, 93 insertions(+), 23 deletions(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 131f9c6..b0b4b3f 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -50,24 +50,69 @@ struct net *get_proc_net(const struct inode *inode)
 }
 EXPORT_SYMBOL_GPL(get_proc_net);
 
+/*
+ * The contents of the files under /proc/net depend on which network
+ * namespace you are in.
+ *
+ * This implementation relies on the following properties.
+ *
+ * - Each network namespace has its own /proc/net dcache tree.
+ * - A directory with a follow_link method never calls lookup
+ * - It is possible in ->open to completely change which underlying
+ *   filesystem, path, and inode the struct file refers to.
+ * - A dcache entry with DCACHE_UNHASHED clear and pprev set
+ *   appears hashed (and thus valid) to the dcache.
+ *
+ * To give each network namespace its own /proc/net directory
+ * in a manner transparent to user space (and not requiring /proc)
+ * be remounted we do the following things:
+ *
+ * Keep a different dentry tree for each network namespace under
+ * /proc/net.
+ *
+ * Have the root of the /proc/net dentry tree be a ``unhashed''
+ * dentry with its root pointing at the /proc dentry.  Making
+ * it appear in parallel with the normal /proc/net.
+ *
+ * Redirect all opens of the normal /proc/net to the one appropriate
+ * for the opening process in ->open.
+ *
+ * Redirect all directory traversals onto the appropriate /proc/net
+ * with a follow_link method.
+ *
+ * Wrap all other applicable inode operations so they appear to
+ * happen not on the normal /proc/net but on the network namespace
+ * specific one.
+ *
+ * Currently we can use a bind mount inside a network namespace
+ * to /proc/net visible to processes outside that network namespace.
+ * Long term /proc/net should migrate to /proc/pid/net removing
+ * the need for the bind mount for monitoring processes.
+ */
+
 static struct proc_dir_entry *proc_net_shadow;
 
-static struct dentry *proc_net_shadow_dentry(struct dentry *parent,
-						struct proc_dir_entry *de)
+static struct dentry *proc_net_shadow_dentry(struct net *net,
+						struct dentry *dentry)
 {
+	struct proc_dir_entry *de = net->proc_net;
 	struct dentry *shadow = NULL;
 	struct inode *inode;
 	if (!de)
 		goto out;
 	de_get(de);
-	inode = proc_get_inode(parent->d_inode->i_sb, de->low_ino, de);
+	inode = proc_get_inode(dentry->d_sb, de->low_ino, de);
 	if (!inode)
 		goto out_de_put;
-	shadow = d_alloc_name(parent, de->name);
+	shadow = d_alloc(dentry->d_parent, &dentry->d_name);
 	if (!shadow)
 		goto out_iput;
-	shadow->d_op = parent->d_op; /* proc_dentry_operations */
+	shadow->d_op = dentry->d_op; /* proc_dentry_operations */
 	d_instantiate(shadow, inode);
+
+	/* Make the dentry look hashed */
+	shadow->d_hash.pprev = &shadow->d_hash.next;
+	shadow->d_flags &= ~DCACHE_UNHASHED;
 out:
 	return shadow;
 out_iput:
@@ -77,36 +122,36 @@ out_de_put:
 	goto out;
 }
 
-static void *proc_net_follow_link(struct dentry *parent, struct nameidata *nd)
+static void *proc_net_follow_link(struct dentry *dentry, struct nameidata
2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
Hi,

I have recently assembled a Core 2 Duo system with 4GB RAM and I believe there might be a bug in the r8169 driver in 4GB RAM configurations.

Initially I can use one of two active r8169 NICs on the motherboard with this quantity of RAM with other devices, without issue. But after some amount of data (generally about 50MB), no more network packets are sent/received.

The choke affects other devices on the system too, notably libata, which does not recover gracefully. In my logs, I see a stream of:

DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0

The device :04:00.0 corresponds to one of the r8169s.

The reason I believe r8169 is at fault is that I was doing a rebuild of my RAID5 across 3 SATA drives via libata's ahci driver, and transferring over the network. When the choke occurred the RAID sync stopped, libata errors were seen, and I simply did an ifconfig br0 down (the bridge which contained the r8169) and the messages went away. Bringing the NIC up again would see some initial functionality, then very rapidly it would go back to the same error messages.

The Intel chipset I am using does not support any kind of hardware IOMMU, so I am forced to use swiotlb in a 4GB RAM configuration. In an attempt to delay the failures, I used the swiotlb option to increase the swiotlb's page allocation with swiotlb=65536 (which seems to correspond to a 256MB bounce buffer). This option delays the failure for some time but it will happen eventually, which makes me suspicious that maybe the driver is somehow pinning an area of the buffer and not releasing it.

Assuming both libata and r8169 use the swiotlb, and both systems are impaired when these messages appear, removing r8169 would appear to be key. Indeed, if there is no significant libata activity, the problem still occurs on the NIC within approximately the same amount of transfer.

(I hunted bugzilla for reports similar to this one, but couldn't find anything.)

Having tested the r8169 driver on an AMD system I did not experience the same problems with 4GB RAM, so this could be a bug specific to swiotlb. I would have added more people to CC but I have no idea who might be responsible.

Andrew, I've added you just in case you're aware of other similar reports (maybe r8169 on big iron) and have anybody from the sw-iommu camp that could be added to CC.

-- 
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.
Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
Alistair John Strachan [EMAIL PROTECTED] :
[...]
> The choke affects other devices on the system too, notably libata, which does not recover gracefully. In my logs, I see a stream of:
>
> DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
> DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0

You are using jumbo frames, aren't you ?

-- 
Ueimor
Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
> when these messages appear, removing r8169 would appear to be key. Indeed, if there is no significant libata activity, the problem still occurs on the NIC within approximately the same amount of transfer.

You seem to have a leak, which actually isn't surprising.

rtl8169_xmit_frags allocates a set of maps for a fragmented packet.
rtl8169_start_xmit allocates a buffer.

When we finish the transmit we free the main buffer (always using skb->len when sometimes it's skb->headlen). We don't seem to free the fragment buffers at all.

Looks like the unmap path for fragmented packets is broken with any kind of IOMMU.

Alan
Re: wireless vs. alignment requirements
On Sat, Nov 24, 2007 at 12:11:08PM -0800, Stephen Hemminger wrote:
> Then what about hardware that can't dma ethernet to non-aligned address. Sky2 hardware breaks if DMA is not 8 byte aligned.
>
> IMHO the IP stack should handle any alignment, and do the appropriate memmove if the CPU requires alignment.

Luckily all sky2 users have been on x86 so far :)

Here's an idea. Put the data of the packet into the page frags where alignment is not an issue but copy the header so that it is aligned. Would that work?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
Alan Cox [EMAIL PROTECTED] :
[...]
> You seem to have a leak, which actually isn't surprising.
>
> rtl8169_xmit_frags allocates a set of maps for a fragmented packet.
> rtl8169_start_xmit allocates a buffer.
>
> When we finish the transmit we free the main buffer (always using skb->len when sometimes it's skb->headlen). We don't seem to free the fragment buffers at all.
>
> Looks like the unmap path for fragmented packets is broken with any kind of IOMMU.

Are you referring to the pci_unmap part ?

There is a 1:1 correspondence between a Tx descriptor entry and {an unfragmented skb or a fragment of a skb}. As far as I can see, rtl8169_unmap_tx_skb() is issued for each Tx descriptor entry, be it after a Tx completion irq or a general Tx ring cleanup.

I'll read it again after some sleep but the leak does not seem clear to me.

-- 
Ueimor
Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
Francois Romieu [EMAIL PROTECTED] :
> Alistair John Strachan [EMAIL PROTECTED] :
> [...]
> > The choke affects other devices on the system too, notably libata, which does not recover gracefully. In my logs, I see a stream of:
> >
> > DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
> > DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
>
> You are using jumbo frames, aren't you ?

See below for my late night crap. At least it should avoid the driver issuing Rx/Tx DMA with the single static buffer of lib/swiotlb.c (io_tlb_overflow_buffer).

Ghee.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 1f647b9..72a7370 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2262,10 +2262,16 @@ static struct sk_buff *rtl8169_alloc_rx_skb(struct pci_dev *pdev,
 	mapping = pci_map_single(pdev, skb->data, rx_buf_sz,
 				 PCI_DMA_FROMDEVICE);
 
+	if (pci_dma_mapping_error(mapping))
+		goto err_kfree_skb;
+
 	rtl8169_map_to_asic(desc, mapping, rx_buf_sz);
 out:
 	return skb;
 
+err_kfree_skb:
+	dev_kfree_skb(skb);
+	skb = NULL;
 err_out:
 	rtl8169_make_unusable_by_asic(desc);
 	goto out;
@@ -2486,6 +2492,7 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
 		dma_addr_t mapping;
 		u32 status, len;
 		void *addr;
+		int rc;
 
 		entry = (entry + 1) % NUM_TX_DESC;
 
@@ -2493,6 +2500,22 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
 		len = frag->size;
 		addr = ((void *) page_address(frag->page)) + frag->page_offset;
 		mapping = pci_map_single(tp->pci_dev, addr, len, PCI_DMA_TODEVICE);
+		rc = pci_dma_mapping_error(mapping);
+		if (unlikely(rc < 0)) {
+			while (cur_frag-- > 0) {
+				frag = info->frags + cur_frag;
+				entry = (entry - 1) % NUM_TX_DESC;
+				txd = tp->TxDescArray + entry;
+				len = frag->size;
+				mapping = le64_to_cpu(txd->addr);
+				pci_unmap_single(tp->pci_dev, mapping, len,
+						 PCI_DMA_TODEVICE);
+				txd->opts1 = 0x00;
+				txd->opts2 = 0x00;
+				txd->addr = 0x00;
+			}
+			return rc;
+		}
 
 		/* anti gcc 2.95.3 bugware (sic) */
 		status = opts1 | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
@@ -2534,13 +2557,13 @@ static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
 static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
-	unsigned int frags, entry = tp->cur_tx % NUM_TX_DESC;
+	unsigned int entry = tp->cur_tx % NUM_TX_DESC;
 	struct TxDesc *txd = tp->TxDescArray + entry;
 	void __iomem *ioaddr = tp->mmio_addr;
 	dma_addr_t mapping;
 	u32 status, len;
 	u32 opts1;
-	int ret = NETDEV_TX_OK;
+	int frags, ret = NETDEV_TX_OK;
 
 	if (unlikely(TX_BUFFS_AVAIL(tp) < skb_shinfo(skb)->nr_frags)) {
 		if (netif_msg_drv(tp)) {
@@ -2557,7 +2580,11 @@ static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	opts1 = DescOwn | rtl8169_tso_csum(skb, dev);
 
 	frags = rtl8169_xmit_frags(tp, skb, opts1);
-	if (frags) {
+	if (frags < 0) {
+		printk(KERN_ERR "%s: PCI mapping failure (%d).\n", dev->name,
+		       frags);
+		goto err_busy;
+	} else if (frags > 0) {
 		len = skb_headlen(skb);
 		opts1 |= FirstFrag;
 	} else {
@@ -2605,6 +2632,7 @@ out:
 
 err_stop:
 	netif_stop_queue(dev);
+err_busy:
 	ret = NETDEV_TX_BUSY;
 err_update_stats:
 	dev->stats.tx_dropped++;
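The unwind logic added to rtl8169_xmit_frags above follows a common pattern: map fragments into consecutive ring slots and, if one mapping fails, walk back over the slots already filled and undo them. A self-contained sketch of that pattern, with a mock mapping function in place of the PCI DMA calls (all names here are hypothetical, and the ring size is arbitrary):

```c
#include <stddef.h>

#define RING_SIZE 8u

struct slot {
	int mapped;
};

/* Hypothetical stand-in for a DMA mapping call; fails for fragment fail_at. */
static int mock_map(struct slot *s, unsigned int frag, unsigned int fail_at)
{
	if (frag == fail_at)
		return -1;
	s->mapped = 1;
	return 0;
}

/*
 * Map nr_frags fragments into the ring slots following 'entry'.
 * Returns the number of fragments mapped, or -1 after undoing partial work.
 */
static int map_frags(struct slot *ring, unsigned int entry,
		     unsigned int nr_frags, unsigned int fail_at)
{
	unsigned int cur;

	for (cur = 0; cur < nr_frags; cur++) {
		entry = (entry + 1) % RING_SIZE;
		if (mock_map(&ring[entry], cur, fail_at) < 0) {
			/* Walk back over the slots mapped so far. */
			while (cur-- > 0) {
				entry = (entry + RING_SIZE - 1) % RING_SIZE;
				ring[entry].mapped = 0;
			}
			return -1;
		}
	}
	return (int)nr_frags;
}
```

The sketch steps backwards with `entry + RING_SIZE - 1` so the wrap-around does not depend on unsigned overflow and a power-of-two ring size.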
Re: wireless vs. alignment requirements
On Sat, Nov 24, 2007 at 10:13:19PM +0100, Johannes Berg wrote:
> Eight bytes really sucks for wireless, many things are multiples of four
> and QoS vs. non-QoS frames have a multiple of four and common hardware
> only adds two padding bytes to get it aligned on four bytes so there's
> no easy way to get hardware to align it properly.

Hmm. Sorry I was wrong about the 8 bytes requirement. Although the IPv6 protocol does try to maintain an 8-byte alignment the Linux stack never does anything that requires that. So 4 bytes is enough.

However, the wireless core is definitely not out of the woods. It needs to support variable hardware header lengths that are not always a multiple of 4.

So here's my suggestion. Modify the wireless core to fix up any packets which aren't aligned correctly. That should make it work albeit in a way that's less than optimal.

Then for each driver where you care about this performance (seriously I wouldn't for the speeds these things run at :), pick the most common wireless hardware header length and have the IP (or any other upper-level protocol) header aligned to at least 4 bytes.

Or better if you know what hardware header length that you're going to get (e.g., based on what mode you're in) then do the skb_reserve accordingly.

It's a good thing these things aren't running at 10Gb :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
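To make the "do the skb_reserve accordingly" suggestion concrete, here is a small sketch of the arithmetic involved. The helper name `rx_align_pad` is hypothetical, and the sketch assumes the receive buffer itself starts on a 4-byte boundary; it only computes how many bytes would have to be reserved in front of the frame so that whatever follows the link-level header begins 4-byte aligned.

```c
#define IP_ALIGN_BYTES 4u  /* alignment the IP stack expects, per the thread */

/*
 * Hypothetical helper: bytes to reserve ahead of a frame whose link-level
 * header is hw_hdr_len bytes long, so that the next header starts on a
 * 4-byte boundary. Assumes the buffer start is itself 4-byte aligned.
 */
static unsigned int rx_align_pad(unsigned int hw_hdr_len)
{
	return (IP_ALIGN_BYTES - hw_hdr_len % IP_ALIGN_BYTES) % IP_ALIGN_BYTES;
}
```

For a 14-byte Ethernet header this gives 2, the customary NET_IP_ALIGN value. For 802.11 data frames followed by an 8-byte LLC/SNAP header, a 24-byte non-QoS header gives 32 bytes in total and needs no padding, while a 26-byte QoS header gives 34 bytes and needs 2. That difference is why a single fixed reserve cannot suit both frame types, and why the reserve would have to follow the expected header length.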
Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
On Sunday 25 November 2007 00:25:10 Francois Romieu wrote:
> Alistair John Strachan [EMAIL PROTECTED] :
> [...]
> > The choke affects other devices on the system too, notably libata, which
> > does not recover gracefully. In my logs, I see a stream of:
> >
> > DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
> > DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
>
> You are using jumbo frames, aren't you ?

Yes, 7200 byte frames. I'll certainly try out your patch and report back.

-- 
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.
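The 7222 in the log lines up with the 7200-byte frames mentioned here once link-level overhead is added. The sketch below shows one breakdown that is consistent with the numbers: a 14-byte Ethernet header plus 8 further bytes, taken here to be a 4-byte VLAN tag and a 4-byte FCS. That split, the 2 KB swiotlb slab size, and the helper names are assumptions for illustration, not values read from the driver.

```c
#define ETH_HDR_LEN  14u    /* Ethernet header */
#define EXTRA_LEN     8u    /* assumed: 4-byte VLAN tag + 4-byte FCS */
#define SLAB_BYTES 2048u    /* assumed swiotlb slab size */

/* Hypothetical: size of one Rx buffer for a given MTU. */
static unsigned int rx_buf_bytes(unsigned int mtu)
{
	return mtu + ETH_HDR_LEN + EXTRA_LEN;
}

/* Hypothetical: bounce-buffer slabs consumed by one mapping, rounded up. */
static unsigned int slabs_needed(unsigned int bytes)
{
	return (bytes + SLAB_BYTES - 1) / SLAB_BYTES;
}
```

Under these assumptions a 7200-byte MTU yields a 7222-byte buffer that occupies four slabs per mapping, where a standard 1500-byte MTU would need only one.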
Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
On Sunday 25 November 2007 01:27:54 Francois Romieu wrote:
> Francois Romieu [EMAIL PROTECTED] :
> > Alistair John Strachan [EMAIL PROTECTED] :
> > [...]
> > > The choke affects other devices on the system too, notably libata, which
> > > does not recover gracefully. In my logs, I see a stream of:
> > >
> > > DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
> > > DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
> >
> > You are using jumbo frames, aren't you ?
>
> See below for my late night crap. At least it should avoid the driver
> issuing Rx/Tx DMA with the single static buffer of lib/swiotlb.c
> (io_tlb_overflow_buffer). Ghee.

No improvement. It might be possible to reproduce the problem on your end if you add iommu support and force enable the swiotlb (which should be possible even with 4GB RAM).

> [...]

-- 
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.
Re: sky2: eth0: hung mac 7:69 fifo 0 (165:176)
Paul Collins wrote:
> Hi Stephen,
>
> Running amd64 kernel built from 2ffbb8377c7a0713baf6644e285adc27a5654582
> after about three days of uptime, this morning I found the network
> dead and the following in dmesg:
>
> sky2 eth0: hung mac 7:69 fifo 0 (165:176)
> sky2 eth0: receiver hang detected
> sky2 eth0: disabling interface
> NETDEV WATCHDOG: eth0: transmit timed out
> sky2 eth0: tx timeout
> sky2 eth0: transmit ring 26 .. 26 report=26 done=26
> NETDEV WATCHDOG: eth0: transmit timed out
> sky2 eth0: tx timeout
> sky2 eth0: transmit ring 26 .. 26 report=26 done=26
>
> The watchdog had been blorping for about three hours when I discovered
> it and rebooted the machine.

Hello,

I have exactly the same problem with my 88E8053 on 2.6.24-rc3 here. While there have always been issues with sky2 on that particular board, now the situation is worse than ever. Netdev watchdog goes into an endless loop reporting timeouts and the whole machine goes down to the point that I'm forced to reset (not even SysRq works).

Here's the snippet from the log:

sky2 eth0: hung mac 123:3 fifo 194 (150:144)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 178 .. 188 report=178 done=178
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 178 .. 188 report=178 done=178
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 178 .. 188 report=178 done=178
NETDEV WATCHDOG: eth0: transmit timed out

The board is identical to Paul's. While mac hangs were common in 2.6.23 and earlier, it was possible to recover the interface (either automatically, or by manual rmmod/modprobe).

I can't reliably reproduce the issue, but it consistently comes up a couple of times a day during high network load.

Any hints, patches are highly appreciated.

Thanks,
-- 
Elvis
Re: forcedeth ethernet driver Low power state
On Sun, 25 Nov 2007 03:52:33 +0100 Jeroen [EMAIL PROTECTED] wrote:
> Hi,
>
> I'm migrating my server from Windows 2003 Server to Ubuntu, but I am
> stumbling over the Low Power State Link Speed option for my NIC
> (forcedeth). I need to disable this option in my Windows driver,
> otherwise the throughput is horrible because the link fluctuates
> constantly between 100 and 1000.
>
> Anyway, my question is where and how can I turn off this feature for
> the forcedeth driver? I've looked in the source and as far as I can
> tell there is no boot option for this. There are some references noted
> in the code, but AFAIK no setting.
>
> Any ideas? Thanks in advance!

(cc's added)