Re: wireless vs. alignment requirements

2007-11-24 Thread Johannes Berg

  Now, the IP stack actually assumes that its header is four-byte aligned
  (see comment at NET_IP_ALIGN, although it is not said explicitly that
  the alignment requirement for an IP header is four) so that is actually
  something for the hardware/firmware (!) to do, for example Broadcom
 
 Good point.  In fact IIRC we've always had the policy that drivers
 should do their best to generate aligned packets but it is not a
 requirement since on some platforms it's more important for the DMA
 to be aligned.

We still require four-byte alignment, no?

 So it's up to the platform code to fix up any exceptions should they
 show up.
 
 Daniel, what's the specific case that you had in mind with this
 patch?

Well. This goes back to a user reporting unaligned accesses on sparc64.
Davem thought this came from the ether addr comparisons but the user
later reported that the patch from davem didn't fix it, and I think
Daniel just made a sweep over all ether addr comparisons replacing them
with unaligned ones.
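
The distinction matters because a comparison built on 16-bit loads needs both addresses to be at least two-byte aligned, while one that goes through memcpy() works anywhere. A minimal userspace sketch of the alignment-agnostic variant follows; the helper names are hypothetical and this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6 /* bytes in an Ethernet (MAC) address */

/* Load a 16-bit word from any address. memcpy() makes no alignment
 * assumption, so this is safe even on an odd address. */
static uint16_t load16_any(const uint8_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical helper: returns 0 if the two addresses are equal and
 * nonzero otherwise, whatever their alignment. */
unsigned int ether_addr_differs_any(const uint8_t *a, const uint8_t *b)
{
	return (unsigned int)((load16_any(a) ^ load16_any(b)) |
			      (load16_any(a + 2) ^ load16_any(b + 2)) |
			      (load16_any(a + 4) ^ load16_any(b + 4)));
}

/* A compare built on direct 16-bit loads would need this to hold. */
int is_two_byte_aligned(const void *p)
{
	return ((uintptr_t)p & 1) == 0;
}

void example_ether_compare(void)
{
	/* buf and buf + 1 cannot both be two-byte aligned. */
	uint8_t buf[1 + ETH_ALEN] = { 0, 0x00, 0x15, 0xf2, 0xaa, 0x8b, 0x3e };
	const uint8_t ref[ETH_ALEN] = { 0x00, 0x15, 0xf2, 0xaa, 0x8b, 0x3e };

	assert(is_two_byte_aligned(buf) != is_two_byte_aligned(buf + 1));
	assert(ether_addr_differs_any(buf + 1, ref) == 0);
	buf[ETH_ALEN] ^= 0x01; /* flip one bit in the last byte */
	assert(ether_addr_differs_any(buf + 1, ref) != 0);
}
```

The cost of the safe variant is a few extra byte moves per compare, which is why one would only want it where the alignment really cannot be guaranteed.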

johannes


signature.asc
Description: This is a digitally signed message part


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-24 Thread Andi Kleen
On Sat, Nov 24, 2007 at 03:53:34PM +1100, Rusty Russell wrote:
 So, you're saying that there's a problem with in-tree modules using symbols 
 they shouldn't?  Can you give an example?
 
  I believe that is fairly important in tree too because the
  kernel has become so big now that review cannot be the only
  enforcement mechanism for this anymore.
 
 If people aren't reviewing, this won't make them review.  I don't think the 

With millions of LOC the primary maintainers cannot review everything.
It's not that anybody is doing a bad job -- it is just so much code
that explicit mechanisms are better than implicit contracts.

 problem is that people are conniving to avoid review.

No of course not -- it is just too much code to let everything
be reviewed by the core subsystem maintainers. But with explicit
marking of internal symbols they would need to look at it because
the relationship will be clearly spelled out in the code.

  Several distributions have policies that require changes to these
  exported interfaces to be kept minimal, and that is very hard with
  thousands of exported symbols.  With namespaces the number of truly
  publicly exported symbols will hopefully shrink to a much smaller,
  more manageable set.
 
 *This* makes sense.  But it's not clear that the burden should be placed on 
 kernel coders.  You can create a list yourself.  How do I tell the difference 
 between truly publicly exported symbols and others?

Out of tree solutions generally do not scale.  Nobody else can 
keep up with 2+ Million changes each merge window.

 
 If a symbol has more than one in-tree user, it's hard to argue against an 

There are still classes of drivers, e.g. in the SCSI example: SD, SG, SR etc.
are more internal while low level drivers like aic7xxx are clearly external
drivers.

 out-of-tree module using the symbol, unless you're arguing against *all* 
 out-of-tree modules.

No, actually namespaces kind of help out-of-tree modules. Once they only
use interfaces that are really generic driver interfaces and fairly stable,
their authors will have much less pain forward porting to newer kernel
versions. But currently the authors cannot even know what is an unstable
internal interface and what is a generic, relatively stable driver-level
interface. Namespaces are a mechanism to make this all explicit.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: wireless vs. alignment requirements

2007-11-24 Thread Herbert Xu
On Sat, Nov 24, 2007 at 09:33:36AM +0100, Johannes Berg wrote:

 We still require four-byte alignment, no?

Not at all.  If NET_IP_ALIGN is zero then it won't be four-byte
aligned (since the Ethernet header is 14 bytes long).
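
The arithmetic behind this can be sketched in a few lines of userspace C. It assumes the receive buffer itself starts on an aligned address and that NET_IP_ALIGN bytes are reserved in front of the Ethernet header; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14 /* Ethernet header length in bytes */

/* Offset of the IP header from the aligned buffer start, given the
 * padding reserved in front of the Ethernet header. */
size_t ip_header_offset(size_t net_ip_align)
{
	return net_ip_align + ETH_HLEN;
}

int ip_header_is_aligned(size_t net_ip_align, size_t boundary)
{
	return ip_header_offset(net_ip_align) % boundary == 0;
}

void example_net_ip_align(void)
{
	assert(ip_header_is_aligned(2, 4));  /* 2 + 14 = 16 */
	assert(!ip_header_is_aligned(0, 4)); /* 0 + 14 = 14 */
}
```

With the usual value of 2 the IP header lands on offset 16; with 0 the Ethernet header itself is aligned instead, which is what favors the DMA engine.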

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: wireless vs. alignment requirements

2007-11-24 Thread Ulrich Kunitz
Johannes,

 Hence, going back to the 802.11 header and the IP header alignment
 requirement, if we get the IP header alignment requirement right now I
 cannot possibly see any way we would use compare_ether_addr() on an
 address that is not at least two-byte aligned as required.

ACK. I agree completely.

The problem with the zd1211rw driver is that it copies the complete
frame received from the device into the skb and only later pulls the
five bytes the ZD1211 uses for the PLCP information. This puts the
802.11 header on an odd address, which causes the reported problems.

The zd1211rw-mac80211 driver is not affected, because the PLCP header
is not copied into the skb, so the 802.11 header ends up correctly
aligned.

Here is a patch that should solve the zd1211rw alignment issues.
It compiles, but it is not tested yet, because I got the idea while
writing this e-mail. An official submission will follow.
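
The offsets involved can be checked with a small userspace sketch. It assumes the skb data area starts on an address that is at least four-byte aligned before skb_reserve() is applied; the 12-byte radiotap header length used in the last check is a hypothetical value, not sizeof(struct zd_rt_hdr).

```c
#include <assert.h>
#include <stddef.h>

#define ZD_PLCP_HEADER_SIZE 5 /* bytes the ZD1211 puts before the 802.11 header */

/* Reserve used before the patch: just the radiotap header. */
size_t reserve_before(size_t rt_hdr_len)
{
	return rt_hdr_len;
}

/* Reserve computed by the patch: round up to four, then add three. */
size_t reserve_patched(size_t rt_hdr_len)
{
	return ((rt_hdr_len + 3) & ~(size_t)3) + 3;
}

/* Offset of the 802.11 header from the (aligned) buffer start. */
size_t dot11_offset(size_t reserve)
{
	return reserve + ZD_PLCP_HEADER_SIZE;
}

void example_zd_reserve(void)
{
	size_t n;

	for (n = 1; n <= 64; n++) {
		/* round_up(n, 4) + 3 + 5 is always a multiple of four */
		assert(dot11_offset(reserve_patched(n)) % 4 == 0);
		/* and there is still room for the radiotap header */
		assert(reserve_patched(n) >= n);
	}
	/* hypothetical 12-byte radiotap header: 12 + 5 = 17, an odd offset */
	assert(dot11_offset(reserve_before(12)) % 2 == 1);
}
```

The extra three bytes are what turns the five-byte PLCP header into an eight-byte step, so the 802.11 header keeps the alignment of the rounded-up reserve.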

Uli

diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c b/drivers/net/wireless/zd1211rw/zd_mac.c
index a903645..fb54cd7 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -1166,15 +1166,22 @@ static void do_rx(unsigned long mac_ptr)
 int zd_mac_rx_irq(struct zd_mac *mac, const u8 *buffer, unsigned int length)
 {
 	struct sk_buff *skb;
+	unsigned int length_to_reserve;
 
-	skb = dev_alloc_skb(sizeof(struct zd_rt_hdr) + length);
+	/* This ensures that there is enough place for the radiotap header
+	 * and the 802.11 header is aligned by four following the
+	 * five-byte ZD1211-specific PLCP header.
+	 */
+	length_to_reserve = ((sizeof(struct zd_rt_hdr) + 3) & ~3) + 3;
+
+	skb = dev_alloc_skb(length_to_reserve + length);
 	if (!skb) {
 		struct ieee80211_device *ieee = zd_mac_to_ieee80211(mac);
 		dev_warn(zd_mac_dev(mac), "Could not allocate skb.\n");
 		ieee->stats.rx_dropped++;
 		return -ENOMEM;
 	}
-	skb_reserve(skb, sizeof(struct zd_rt_hdr));
+	skb_reserve(skb, length_to_reserve);
 	memcpy(__skb_put(skb, length), buffer, length);
 	skb_queue_tail(&mac->rx_queue, skb);
 	tasklet_schedule(&mac->rx_tasklet);

-- 
Uli Kunitz


Re: wireless vs. alignment requirements

2007-11-24 Thread Johannes Berg

On Sat, 2007-11-24 at 21:32 +0800, Herbert Xu wrote:
 On Sat, Nov 24, 2007 at 09:33:36AM +0100, Johannes Berg wrote:
 
  We still require four-byte alignment, no?
 
 Not at all.  If NET_IP_ALIGN is zero then it won't be four-byte
 aligned (since the Ethernet header is 14 bytes long).

Right. I just didn't think that would be a valid value for an
architecture to set.

johannes




Re: wireless vs. alignment requirements

2007-11-24 Thread David Miller
From: Johannes Berg [EMAIL PROTECTED]
Date: Sat, 24 Nov 2007 14:49:36 +0100

 
 On Sat, 2007-11-24 at 21:32 +0800, Herbert Xu wrote:
  On Sat, Nov 24, 2007 at 09:33:36AM +0100, Johannes Berg wrote:
  
   We still require four-byte alignment, no?
  
  Not at all.  If NET_IP_ALIGN is zero then it won't be four-byte
  aligned (since the Ethernet header is 14 bytes long).
 
 Right. I just didn't think that would be a valid value for an
 architecture to set.

It is, and explicitly used by powerpc to get more of the
DMA transfers 64-byte aligned which is critical for
performance on some powerpc boxes.


Re: wireless vs. alignment requirements

2007-11-24 Thread Herbert Xu
On Sat, Nov 24, 2007 at 02:49:36PM +0100, Johannes Berg wrote:
 
 Right. I just didn't think that would be a valid value for an
 architecture to set.

OK.  Let me clarify this a bit more.  We require at least one
of the following rules to be met:

* the IPv4/IPv6 header is aligned by 8 bytes on reception;
* or the platform provides unaligned exception handlers.

So if your platform violates both rules then it won't work with
the IP stack, simple as that.  Fortunately I don't think such a
platform exists currently on Linux.
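
For code that cannot rely on either rule, the portable fallback is to avoid word-sized loads on header fields altogether. A minimal userspace sketch, assuming a big-endian 32-bit field such as an IPv4 address; the helper name is hypothetical and this is not how the stack itself reads headers.

```c
#include <assert.h>
#include <stdint.h>

/* Read a big-endian 32-bit value from any address, one byte at a time,
 * so no alignment is required of p. */
uint32_t read_be32_any(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

void example_unaligned_read(void)
{
	/* 192.0.2.1 stored one byte into the buffer, so it is misaligned
	 * relative to the start of buf. */
	const uint8_t buf[8] = { 0, 192, 0, 2, 1, 0, 0, 0 };

	assert(read_be32_any(buf + 1) == 0xC0000201u);
}
```

Byte-wise access trades a single load for four loads and some shifts, which is the cost the two rules above let the fast path avoid.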

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: ZD1211RW unaligned accesses...

2007-11-24 Thread Herbert Xu
On Wed, Nov 21, 2007 at 01:00:44PM +, Shaddy Baddah wrote:

 It hasn't seemed to. I patched the source (confirming the patched lines 
 are in), compiled, installed and rebooted to effect the changes. My 
 zd1211rw modules timestamp indicates that I have an updated module:

Thanks for your quick response and sorry for my late answer :)

I think Dave's patch is definitely on the right track but there
are subsequent unaligned accesses of a similar kind, which is
why it still appears to be broken if you look at the kernel
messages.

But there is definitely progress because those addresses are now
bigger (0x394/0x39c/0x3a8 vs. 0x2** earlier).

So please try the following patch (instead of the original one)
which should fix all the unaligned accesses in do_rx.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c b/drivers/net/wireless/zd1211rw/zd_mac.c
index a903645..d06b05b 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -1166,15 +1166,16 @@ static void do_rx(unsigned long mac_ptr)
 int zd_mac_rx_irq(struct zd_mac *mac, const u8 *buffer, unsigned int length)
 {
 	struct sk_buff *skb;
+	unsigned int hlen = ALIGN(sizeof(struct zd_rt_hdr), 16);
 
-	skb = dev_alloc_skb(sizeof(struct zd_rt_hdr) + length);
+	skb = dev_alloc_skb(hlen + length);
 	if (!skb) {
 		struct ieee80211_device *ieee = zd_mac_to_ieee80211(mac);
 		dev_warn(zd_mac_dev(mac), "Could not allocate skb.\n");
 		ieee->stats.rx_dropped++;
 		return -ENOMEM;
 	}
-	skb_reserve(skb, sizeof(struct zd_rt_hdr));
+	skb_reserve(skb, hlen - ZD_PLCP_HEADER_SIZE);
 	memcpy(__skb_put(skb, length), buffer, length);
 	skb_queue_tail(&mac->rx_queue, skb);
 	tasklet_schedule(&mac->rx_tasklet);


Re: ZD1211RW unaligned accesses...

2007-11-24 Thread Ulrich Kunitz
Herbert Xu wrote:

 So please try the following patch (instead of the original one)
 which should fix all the unaligned accesses in do_rx.
 
 Cheers,
 -- 
 Visit Openswan at http://www.openswan.org/
 Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
 Home Page: http://gondor.apana.org.au/~herbert/
 PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
 --
 diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c b/drivers/net/wireless/zd1211rw/zd_mac.c
 index a903645..d06b05b 100644
 --- a/drivers/net/wireless/zd1211rw/zd_mac.c
 +++ b/drivers/net/wireless/zd1211rw/zd_mac.c
 @@ -1166,15 +1166,16 @@ static void do_rx(unsigned long mac_ptr)
  int zd_mac_rx_irq(struct zd_mac *mac, const u8 *buffer, unsigned int length)
  {
  	struct sk_buff *skb;
 +	unsigned int hlen = ALIGN(sizeof(struct zd_rt_hdr), 16);
  
 -	skb = dev_alloc_skb(sizeof(struct zd_rt_hdr) + length);
 +	skb = dev_alloc_skb(hlen + length);
  	if (!skb) {
  		struct ieee80211_device *ieee = zd_mac_to_ieee80211(mac);
  		dev_warn(zd_mac_dev(mac), "Could not allocate skb.\n");
  		ieee->stats.rx_dropped++;
  		return -ENOMEM;
  	}
 -	skb_reserve(skb, sizeof(struct zd_rt_hdr));
 +	skb_reserve(skb, hlen - ZD_PLCP_HEADER_SIZE);
  	memcpy(__skb_put(skb, length), buffer, length);
  	skb_queue_tail(&mac->rx_queue, skb);
  	tasklet_schedule(&mac->rx_tasklet);

ACK. This patch should solve it and is better than my patch.
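
A userspace sketch of the ALIGN()-based reserve, assuming the skb data area starts on a 16-byte aligned address; ALIGN_UP is a stand-in for the kernel's ALIGN() macro and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ZD_PLCP_HEADER_SIZE 5

/* Stand-in for the kernel's ALIGN(): round x up to a power-of-two a. */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

size_t hlen_for(size_t rt_hdr_len)
{
	return ALIGN_UP(rt_hdr_len, 16);
}

/* Bytes handed to skb_reserve(): hlen minus the PLCP header. */
size_t reserve_for(size_t rt_hdr_len)
{
	return hlen_for(rt_hdr_len) - ZD_PLCP_HEADER_SIZE;
}

void example_zd_align16(void)
{
	size_t n;

	for (n = 1; n <= 64; n++) {
		/* The 802.11 header follows the five PLCP bytes. */
		size_t dot11 = reserve_for(n) + ZD_PLCP_HEADER_SIZE;

		assert(dot11 % 16 == 0);
		/* Headroom once the PLCP bytes are pulled still fits
		 * the radiotap header. */
		assert(dot11 >= n);
	}
}
```

Because the PLCP bytes are pulled before the radiotap header is pushed, the usable headroom is hlen, not hlen minus five, so the allocation of hlen + length is enough.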

-- 
Uli Kunitz


[SKY2] Problems (2.6.24-rc3-git1)

2007-11-24 Thread Ian Kumlien
Hi,

A little while ago, something went horribly wrong.

I could still use my mouse and the desktop was still more or less alive,
but everything using networking was dead AND the keyboard was dead, so
I composed commands using existing text on the screen.

The device:
sky2 :02:00.0: v1.20 addr 0xdbffc000 irq 17 Yukon-EC (0xb6) rev 2
sky2 :02:00.0: PCI Express Advanced Error Reporting not configured or MMCONFIG problem?
sky2 :02:00.0: No interrupt generated using MSI, switching to INTx mode.
sky2 eth0: addr 00:15:f2:aa:8b:3e

From dmesg:
sky2 eth0: hung mac 124:39 fifo 195 (185:180)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 442 .. 461 report=442 done=442

And it continues until I press the reset button.


-- 
Ian Kumlien pomac () vapor ! com -- http://pomac.netswarm.net




RE: [PATCH 2.6.24 2/2]S2io: Fix to aggregate vlan tagged packets

2007-11-24 Thread Ramkrishna Vepa
Jeff,

Does this patch still fail?

Ram

 -Original Message-
 From: Jeff Garzik [mailto:[EMAIL PROTECTED]
 Sent: Friday, November 23, 2007 7:05 PM
 To: [EMAIL PROTECTED]
 Cc: netdev@vger.kernel.org; support
 Subject: Re: [PATCH 2.6.24 2/2]S2io: Fix to aggregate vlan tagged
packets
 
 Ramkrishna Vepa wrote:
  - Fix to aggregate vlan packets. IP offset is incremented by
4 bytes if the packet contains vlan header.
 
  Signed-off-by: Santoshkumar Rastapur [EMAIL PROTECTED]
  Signed-off-by: Ramkrishna Vepa [EMAIL PROTECTED]
  ---
 
 ACK but cannot apply due to dropped patches
 



Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-24 Thread Matt Mackall
Simon, can you test this patch? I think it's the most straightforward
2.6.24 fix.

diff -r c60016ba6237 net/core/netpoll.c
--- a/net/core/netpoll.c	Tue Nov 13 09:09:36 2007 -0800
+++ b/net/core/netpoll.c	Fri Nov 23 13:10:28 2007 -0600
@@ -203,6 +203,12 @@ static void refill_skbs(void)
 	spin_unlock_irqrestore(&skb_pool.lock, flags);
 }
 
+/* used to mark an skb as owned by netpoll */
+static void netpoll_skb_destroy(struct sk_buff *skb)
+{
+	return;
+}
+
 static void zap_completion_queue(void)
 {
 	unsigned long flags;
@@ -219,10 +225,12 @@ static void zap_completion_queue(void)
 		while (clist != NULL) {
 			struct sk_buff *skb = clist;
 			clist = clist->next;
-			if (skb->destructor)
+			if (skb->destructor == netpoll_skb_destroy) {
+				skb->destructor = NULL;
+				__kfree_skb(skb);
+			}
+			else
 				dev_kfree_skb_any(skb); /* put this one back */
-			else
-				__kfree_skb(skb);
 		}
 	}
 
@@ -252,6 +260,7 @@ repeat:
 
 	atomic_set(&skb->users, 1);
 	skb_reserve(skb, reserve);
+	skb->destructor = netpoll_skb_destroy;
 	return skb;
 }
 

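The idea in the patch above, using the address of an empty destructor as an ownership tag, can be modelled in userspace. This is only a toy sketch: struct toy_buf and all function names are hypothetical, and the counters stand in for the two kernel free paths.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf {
	void (*destructor)(struct toy_buf *);
	int destructor_calls; /* how often a real destructor ran */
	int freed_direct;     /* freed by the owner without callbacks */
	int freed_generic;    /* handed to the generic free path */
};

/* Empty on purpose: only its address matters, as an ownership tag. */
static void pool_owned_marker(struct toy_buf *b)
{
	(void)b;
}

static void foreign_destructor(struct toy_buf *b)
{
	b->destructor_calls++;
}

void toy_mark_owned(struct toy_buf *b)
{
	b->destructor = pool_owned_marker;
}

void toy_zap(struct toy_buf *b)
{
	if (b->destructor == pool_owned_marker) {
		b->destructor = NULL; /* drop the tag before freeing */
		b->freed_direct = 1;
	} else {
		if (b->destructor)
			b->destructor(b);
		b->freed_generic = 1;
	}
}

void example_owner_tag(void)
{
	struct toy_buf own = { 0 }, other = { 0 }, plain = { 0 };

	toy_mark_owned(&own);
	other.destructor = foreign_destructor;

	toy_zap(&own);
	toy_zap(&other);
	toy_zap(&plain);

	assert(own.freed_direct && !own.freed_generic);
	assert(other.freed_generic && other.destructor_calls == 1);
	/* No destructor no longer implies "ours": generic path. */
	assert(plain.freed_generic && !plain.freed_direct);
}
```

The last assertion shows the behavioural change: a buffer without any destructor is no longer assumed to belong to the pool.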
-- 
Mathematics is the supreme nostalgia of our time.


Re: wireless vs. alignment requirements

2007-11-24 Thread Stephen Hemminger

Herbert Xu wrote:

On Sat, Nov 24, 2007 at 02:49:36PM +0100, Johannes Berg wrote:
  

Right. I just didn't think that would be a valid value for an
architecture to set.



OK.  Let me clarify this a bit more.  We require at least one
of the following rules to be met:

* the IPv4/IPv6 header is aligned by 8 bytes on reception;
* or the platform provides unaligned exception handlers.

So if your platform violates both rules then it won't work with
the IP stack, simple as that.  Fortunately I don't think such a
platform exists currently on Linux.

Cheers,
  


Then what about hardware that can't DMA Ethernet frames to a non-aligned
address? Sky2 hardware breaks if DMA is not 8-byte aligned.  IMHO the IP stack
should handle any alignment, and do the appropriate memmove if the CPU
requires alignment.
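
A minimal userspace sketch of such a fix-up, assuming the frame was received at the aligned start of a buffer that has a couple of spare bytes at its end; realign_frame is a hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN 14 /* Ethernet header length in bytes */

/* Shift a frame forward inside its buffer so that the IP header, which
 * follows the 14-byte Ethernet header, lands on a four-byte boundary
 * relative to the buffer start. Returns the new frame start, or NULL
 * if the buffer has no room for the shift. */
uint8_t *realign_frame(uint8_t *buf, size_t frame_len, size_t buf_len)
{
	size_t pad = (4 - (ETH_HLEN % 4)) % 4; /* 2 for Ethernet */

	if (frame_len + pad > buf_len)
		return NULL;
	/* Source and destination overlap, so memmove(), not memcpy(). */
	memmove(buf + pad, buf, frame_len);
	return buf + pad;
}

void example_realign(void)
{
	uint8_t buf[34];
	size_t i, frame_len = 32;
	uint8_t *frame;

	for (i = 0; i < frame_len; i++)
		buf[i] = (uint8_t)i;

	frame = realign_frame(buf, frame_len, sizeof(buf));
	assert(frame == buf + 2);
	assert(((size_t)(frame - buf) + ETH_HLEN) % 4 == 0);
	for (i = 0; i < frame_len; i++)
		assert(frame[i] == (uint8_t)i);

	/* A 33-byte frame plus 2 bytes of padding does not fit in 34. */
	assert(realign_frame(buf, 33, sizeof(buf)) == NULL);
}
```

The copy touches every byte of the frame, so where it happens (driver or generic receive path) decides who pays for it and on which platforms.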



Re: wireless vs. alignment requirements

2007-11-24 Thread Johannes Berg

 OK.  Let me clarify this a bit more.  We require at least one
 of the following rules to be met:
 
 * the IPv4/IPv6 header is aligned by 8 bytes on reception;
 * or the platform provides unaligned exception handlers.
 
 So if your platform violates both rules then it won't work with
 the IP stack, simple as that.  Fortunately I don't think such a
 platform exists currently on Linux.

Ok, thanks for the clarification.

Eight bytes really sucks for wireless, many things are multiples of four
and QoS vs. non-QoS frames have a multiple of four and common hardware
only adds two padding bytes to get it aligned on four bytes so there's
no easy way to get hardware to align it properly. Hmm.
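
The offsets can be worked out in a short userspace sketch. It assumes three-address data frames (24-byte header, 26 bytes with the QoS control field), an 8-byte LLC/SNAP header in front of the IP header, and a frame that starts on an eight-byte boundary; the names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DOT11_HDR_LEN     24 /* three-address 802.11 data header */
#define DOT11_QOS_CTL_LEN  2 /* extra QoS control field */
#define LLC_SNAP_LEN       8 /* LLC/SNAP header before the IP header */

/* Offset of the IP header from the start of the 802.11 frame. */
size_t ip_offset_80211(int qos, int hw_pads_to_four)
{
	size_t hdr = DOT11_HDR_LEN + (qos ? DOT11_QOS_CTL_LEN : 0);

	if (hw_pads_to_four)
		hdr = (hdr + 3) & ~(size_t)3;
	return hdr + LLC_SNAP_LEN;
}

void example_80211_offsets(void)
{
	assert(ip_offset_80211(0, 1) == 32); /* non-QoS */
	assert(ip_offset_80211(1, 0) == 34); /* QoS, no padding */
	assert(ip_offset_80211(1, 1) == 36); /* QoS, padded to four */

	/* Both padded cases are four-byte aligned ... */
	assert(ip_offset_80211(0, 1) % 4 == 0);
	assert(ip_offset_80211(1, 1) % 4 == 0);
	/* ... but only one of them can be eight-byte aligned. */
	assert(ip_offset_80211(0, 1) % 8 == 0);
	assert(ip_offset_80211(1, 1) % 8 == 4);
}
```

Since the two frame types differ by four bytes after padding, no single start alignment puts the IP header on an eight-byte boundary for both.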

johannes




Re: wireless vs. alignment requirements

2007-11-24 Thread Johannes Berg

 Then what about hardware that can't dma ethernet to non-aligned address.
 Sky2 hardware breaks if DMA is not 8 byte aligned.  IMHO the IP stack should
 handle any alignment, and do the appropriate memmove if the CPU requires 
 alignment.

Wouldn't that better be handled in the driver rather than having the
test in the generic RX path?

johannes




[CFT][PATCH] proc_net: Remove userspace visible changes.

2007-11-24 Thread Eric W. Biederman

Ok.  I have kicked around a lot of implementation ideas and took a good hard
look at my /proc/net implementation.  The patch below should close all
of the holes with /proc/net that I am aware of.

- Bind mounts work and properly capture /proc/net/.
- stat of /proc/net and /proc/net/ return the same information.
- cd /proc/net/ ; ls .. works.
- The dentry has the proper parent and no longer appears deleted.

The same goes for a few more theoretical cases I have been able to imagine,
like open("/proc/net", O_NOFOLLOW | O_DIRECTORY) getdents...

Please take a look and kick this patch around.  I don't expect anyone
to find any issues but a few more eyeballs before I send this
along to Linus would be appreciated.  Thanks.


From: Eric W. Biederman [EMAIL PROTECTED]
Subject: [PATCH] proc_net: Remove userspace visible changes.

This patch fixes some bugs in corner cases of the /proc/net
implementation.

In proc_net_shadow_dentry.
- Set the parent dentry properly.
- Make the dentry appear hashed so .. works.

Remove the unreachable proc_net_lookup.

Implement proc_net_getattr to complete the
set of implemented inode operations.

Implement proc_net_open which changes the directory we
are opening to remove the need to implement any other
file operations.

Add a big fat comment on how /proc/net works to make it
easier for someone else to look at and understand this code.

This patch should remove the last of the accidental user visible artifacts
that arose from adding network namespace support to /proc/net.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 fs/proc/proc_net.c |  116 +--
 1 files changed, 93 insertions(+), 23 deletions(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 131f9c6..b0b4b3f 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -50,24 +50,69 @@ struct net *get_proc_net(const struct inode *inode)
 }
 EXPORT_SYMBOL_GPL(get_proc_net);
 
+/*
+ * The contents of the files under /proc/net depend on which network
+ * namespace you are in.
+ *
+ * This implementation relies on the following properties.
+ *
+ * - Each network namespace has its own /proc/net dcache tree.
+ * - A directory with a follow_link method never calls lookup.
+ * - It is possible in ->open to completely change which underlying
+ *   filesystem, path, and inode the struct file refers to.
+ * - A dcache entry with DCACHE_UNHASHED clear and pprev set
+ *   appears hashed (and thus valid) to the dcache.
+ *
+ * To give each network namespace its own /proc/net directory
+ * in a manner transparent to user space (and not requiring /proc
+ * to be remounted) we do the following things:
+ *
+ *   Keep a different dentry tree for each network namespace under
+ *   /proc/net.
+ *
+ *   Have the root of the /proc/net dentry tree be a ``unhashed''
+ *   dentry with its root pointing at the /proc dentry, making
+ *   it appear in parallel with the normal /proc/net.
+ *
+ *   Redirect all opens of the normal /proc/net to the one appropriate
+ *   for the opening process in ->open.
+ *
+ *   Redirect all directory traversals onto the appropriate /proc/net
+ *   with a follow_link method.
+ *
+ *   Wrap all other applicable inode operations so they appear to
+ *   happen not on the normal /proc/net but on the network namespace
+ *   specific one.
+ *
+ * Currently we can use a bind mount inside a network namespace
+ * to make /proc/net visible to processes outside that network namespace.
+ * Long term /proc/net should migrate to /proc/pid/net removing
+ * the need for the bind mount for monitoring processes.
+ */
+
 static struct proc_dir_entry *proc_net_shadow;
 
-static struct dentry *proc_net_shadow_dentry(struct dentry *parent,
-						struct proc_dir_entry *de)
+static struct dentry *proc_net_shadow_dentry(struct net *net,
+					     struct dentry *dentry)
 {
+	struct proc_dir_entry *de = net->proc_net;
 	struct dentry *shadow = NULL;
 	struct inode *inode;
 	if (!de)
 		goto out;
 	de_get(de);
-	inode = proc_get_inode(parent->d_inode->i_sb, de->low_ino, de);
+	inode = proc_get_inode(dentry->d_sb, de->low_ino, de);
 	if (!inode)
 		goto out_de_put;
-	shadow = d_alloc_name(parent, de->name);
+	shadow = d_alloc(dentry->d_parent, &dentry->d_name);
 	if (!shadow)
 		goto out_iput;
-	shadow->d_op = parent->d_op; /* proc_dentry_operations */
+	shadow->d_op = dentry->d_op; /* proc_dentry_operations */
 	d_instantiate(shadow, inode);
+
+	/* Make the dentry look hashed */
+	shadow->d_hash.pprev = &shadow->d_hash.next;
+	shadow->d_flags &= ~DCACHE_UNHASHED;
 out:
 	return shadow;
 out_iput:
@@ -77,36 +122,36 @@ out_de_put:
 	goto out;
 }
 
-static void *proc_net_follow_link(struct dentry *parent, struct nameidata *nd)
+static void *proc_net_follow_link(struct dentry *dentry, struct nameidata 

2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space

2007-11-24 Thread Alistair John Strachan
Hi,

I have recently assembled a Core 2 Duo system with 4GB RAM and I believe there 
might be a bug in the r8169 driver in 4GB RAM configurations.

Initially I can use one of two active r8169 NICs on the motherboard with this 
quantity of RAM with other devices, without issue. But after some amount of 
data (generally about 50MB), no more network packets are sent/received.

The choke affects other devices on the system too, notably libata, which 
does not recover gracefully. In my logs, I see a stream of:

DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0

The device :04:00.0 corresponds to one of the r8169s.

The reason I believe r8169 is at fault is that I was doing a rebuild of my 
RAID5 across 3 SATA drives via libata's ahci driver, and transferring over the 
network. When the choke occurred the RAID sync stopped, libata errors were 
seen, and I simply did an "ifconfig br0 down" (br0 contained the r8169) and 
the messages went away. Bringing the NIC up again would see some initial 
functionality, then very rapidly it would go back to the same error messages.

The Intel chipset I am using does not support any kind of hardware IOMMU, so I 
am forced to use swiotlb in a 4GB RAM configuration. In an attempt to delay 
the failures, I used the swiotlb option to increase the swiotlb's page 
allocation with swiotlb=65536 (which seems to correspond to a 256MB bounce 
buffer).
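
The size that a given swiotlb= value corresponds to depends on the unit it counts. The sketch below only does the arithmetic for two candidate units: 4 KiB pages, which yields the 256MB figure above, and 2 KiB slabs, which is an assumption here based on IO_TLB_SHIFT being 11 in lib/swiotlb.c and would halve it. Which one applies should be checked against the kernel in use.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes covered by nunits units of (1 << unit_shift) bytes each. */
uint64_t bounce_bytes(uint64_t nunits, unsigned int unit_shift)
{
	return nunits << unit_shift;
}

void example_swiotlb_size(void)
{
	const uint64_t MiB = 1024 * 1024;

	/* If the unit were a 4 KiB page: */
	assert(bounce_bytes(65536, 12) == 256 * MiB);
	/* If the unit is a 2 KiB slab (assumed IO_TLB_SHIFT of 11): */
	assert(bounce_bytes(65536, 11) == 128 * MiB);
}
```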

Assuming both libata and r8169 use the swiotlb, and both systems are impaired 
when these messages appear, removing r8169 would appear to be key. Indeed, if 
there is no significant libata activity, the problem still occurs on the NIC 
within approximately the same amount of transfer.

This option delays the failure for some time but it will happen eventually, 
which makes me suspicious that maybe the driver is somehow pinning an area of 
the buffer and not releasing it. (I hunted bugzilla for reports similar to 
this one, but couldn't find anything.)

Having tested the r8169 driver on an AMD system I did not experience the same 
problems with 4GB RAM, so this could be a bug specific to swiotlb. I would 
have added more people to CC but I have no idea who might be responsible.

Andrew, I've added you just in case you're aware of other similar reports 
(maybe r8169 on big iron) and have anybody from the sw-iommu camp that could 
be added to CC.

-- 
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.


Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space

2007-11-24 Thread Francois Romieu
Alistair John Strachan [EMAIL PROTECTED] :
[...]
 The choke affects other devices on the system too, notably libata, which 
 does not recover gracefully. In my logs, I see a stream of:
 
 DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
 DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0

You are using jumbo frames, aren't you ?

-- 
Ueimor


Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space

2007-11-24 Thread Alan Cox
 when these messages appear, removing r8169 would appear to be key. Indeed, if 
 there is no significant libata activity, the problem still occurs on the NIC 
 within approximately the same amount of transfer.

You seem to have a leak, which actually isn't surprising.

rtl8169_xmit_frags allocates a set of maps for a fragmented packet.

rtl8169_start_xmit allocates a buffer.

When we finish the transmit we free the main buffer (always using skb->len
when sometimes it's skb_headlen(skb)). We don't seem to free the fragment
buffers at all.

Looks like the unmap path for fragmented packets is broken with any kind
of IOMMU.

Alan


Re: wireless vs. alignment requirements

2007-11-24 Thread Herbert Xu
On Sat, Nov 24, 2007 at 12:11:08PM -0800, Stephen Hemminger wrote:

 Then what about hardware that can't dma ethernet to non-aligned address.
 Sky2 hardware breaks if DMA is not 8 byte aligned.  IMHO the IP stack should
 handle any alignment, and do the appropriate memmove if the CPU requires 
 alignment.

Luckily all sky2 users have been on x86 so far :)

Here's an idea.  Put the data of the packet into the page frags
where alignment is not an issue but copy the header so that it
is aligned.

Would that work?
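
A userspace sketch of the idea, assuming a minimal 20-byte IPv4 header and C11 for _Alignas; struct rx_view and rx_view_init are hypothetical names, and real page frags are replaced here by a plain pointer into the receive buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPV4_MIN_HLEN 20 /* IPv4 header without options */

struct rx_view {
	_Alignas(8) uint8_t hdr[IPV4_MIN_HLEN]; /* aligned private copy */
	const uint8_t *payload;                 /* left where DMA put it */
	size_t payload_len;
};

/* Copy only the header into aligned storage; the payload is referenced
 * in place. Returns 0 on success, -1 if the packet is too short. */
int rx_view_init(struct rx_view *v, const uint8_t *pkt, size_t len)
{
	if (len < IPV4_MIN_HLEN)
		return -1;
	memcpy(v->hdr, pkt, IPV4_MIN_HLEN);
	v->payload = pkt + IPV4_MIN_HLEN;
	v->payload_len = len - IPV4_MIN_HLEN;
	return 0;
}

void example_rx_view(void)
{
	uint8_t dma_buf[1 + 28] = { 0 };
	uint8_t *pkt = dma_buf + 1; /* deliberately offset by one byte */
	struct rx_view v;

	pkt[0] = 0x45;              /* version 4, header length 5 words */
	pkt[IPV4_MIN_HLEN] = 0xab;  /* first payload byte */

	assert(rx_view_init(&v, pkt, 28) == 0);
	assert((uintptr_t)v.hdr % 8 == 0);
	assert(v.hdr[0] == 0x45);
	assert(v.payload == pkt + IPV4_MIN_HLEN); /* not copied */
	assert(v.payload_len == 8 && v.payload[0] == 0xab);
	assert(rx_view_init(&v, pkt, 10) == -1);
}
```

Only the header bytes are copied, so the per-packet cost stays constant regardless of the payload size.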

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space

2007-11-24 Thread Francois Romieu
Alan Cox [EMAIL PROTECTED] :
[...]
 You seem to have a leak, which actually isn't surprising.
 
   rtl8169_xmit_frags allocates a set of maps for a fragmented packet.
 
   rtl8169_start_xmit allocates a buffer.
 
 When we finish the transmit we free the main buffer (always using skb->len
 when sometimes it's skb_headlen(skb)). We don't seem to free the fragment
 buffers at all.
 Looks like the unmap path for fragmented packets is broken with any kind
 of IOMMU.

Are you referring to the pci_unmap part ?

There is a 1:1 correspondence between a Tx descriptor entry and
{an unfragmented skb or a fragment of a skb}. As far as I can see,
rtl8169_unmap_tx_skb() is issued for each Tx descriptor entry, be it after
a Tx completion irq or a general Tx ring cleanup.

I'll read it again after some sleep but the leak does not seem clear to me.

-- 
Ueimor


Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space

2007-11-24 Thread Francois Romieu
Francois Romieu [EMAIL PROTECTED] :
 Alistair John Strachan [EMAIL PROTECTED] :
 [...]
  The choke affects other devices on the system too, notably libata, which 
  does not recover gracefully. In my logs, I see a stream of:
  
  DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
  DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
 
 You are using jumbo frames, aren't you ?

See below for my late night crap. At least it should avoid the driver
issuing Rx/Tx DMA with the single static buffer of lib/swiotlb.c
(io_tlb_overflow_buffer). Ghee.
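
The unwinding that the fragment path needs when a mapping fails halfway can be modelled in userspace. This is a toy sketch only: struct toy_ring, toy_map and toy_unmap are hypothetical stand-ins for the descriptor ring and the PCI mapping calls, and fail_after simulates the bounce buffer running out.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TX_DESC 64

struct toy_ring {
	int mapped[NUM_TX_DESC]; /* 1 while a descriptor holds a mapping */
	int outstanding;         /* live mappings in total */
	int fail_after;          /* successful maps left; negative = never fail */
};

static int toy_map(struct toy_ring *r, unsigned int entry)
{
	if (r->fail_after == 0)
		return -1;
	if (r->fail_after > 0)
		r->fail_after--;
	r->mapped[entry] = 1;
	r->outstanding++;
	return 0;
}

static void toy_unmap(struct toy_ring *r, unsigned int entry)
{
	if (r->mapped[entry]) {
		r->mapped[entry] = 0;
		r->outstanding--;
	}
}

/* Map nr_frags fragments into the entries following 'entry'. Returns
 * nr_frags on success, or -1 after releasing every mapping made so far. */
int toy_xmit_frags(struct toy_ring *r, unsigned int entry, int nr_frags)
{
	int cur;

	for (cur = 0; cur < nr_frags; cur++) {
		entry = (entry + 1) % NUM_TX_DESC;
		if (toy_map(r, entry) < 0) {
			/* Undo the earlier mappings, newest first. */
			while (cur-- > 0) {
				entry = (entry + NUM_TX_DESC - 1) % NUM_TX_DESC;
				toy_unmap(r, entry);
			}
			return -1;
		}
	}
	return nr_frags;
}

void example_unwind(void)
{
	struct toy_ring ok = { .fail_after = -1 };
	struct toy_ring bad = { .fail_after = 2 };

	assert(toy_xmit_frags(&ok, 62, 3) == 3);
	assert(ok.outstanding == 3);
	assert(ok.mapped[63] && ok.mapped[0] && ok.mapped[1]);

	/* Third mapping fails: entries 63 and 0 must be released again. */
	assert(toy_xmit_frags(&bad, 62, 3) == -1);
	assert(bad.outstanding == 0);
}
```

The sketch steps backwards with (entry + NUM_TX_DESC - 1) so that the wrap from entry 0 to the last descriptor does not depend on unsigned overflow.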

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 1f647b9..72a7370 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2262,10 +2262,16 @@ static struct sk_buff *rtl8169_alloc_rx_skb(struct pci_dev *pdev,
 	mapping = pci_map_single(pdev, skb->data, rx_buf_sz,
 				 PCI_DMA_FROMDEVICE);
 
+	if (pci_dma_mapping_error(mapping))
+		goto err_kfree_skb;
+
 	rtl8169_map_to_asic(desc, mapping, rx_buf_sz);
 out:
 	return skb;
 
+err_kfree_skb:
+	dev_kfree_skb(skb);
+	skb = NULL;
 err_out:
 	rtl8169_make_unusable_by_asic(desc);
 	goto out;
@@ -2486,6 +2492,7 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
 		dma_addr_t mapping;
 		u32 status, len;
 		void *addr;
+		int rc;
 
 		entry = (entry + 1) % NUM_TX_DESC;
 
@@ -2493,6 +2500,22 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
 		len = frag->size;
 		addr = ((void *) page_address(frag->page)) + frag->page_offset;
 		mapping = pci_map_single(tp->pci_dev, addr, len, PCI_DMA_TODEVICE);
+		rc = pci_dma_mapping_error(mapping);
+		if (unlikely(rc < 0)) {
+			while (cur_frag-- > 0) {
+				frag = info->frags + cur_frag;
+				entry = (entry - 1) % NUM_TX_DESC;
+				txd = tp->TxDescArray + entry;
+				len = frag->size;
+				mapping = le64_to_cpu(txd->addr);
+				pci_unmap_single(tp->pci_dev, mapping, len,
+						 PCI_DMA_TODEVICE);
+				txd->opts1 = 0x00;
+				txd->opts2 = 0x00;
+				txd->addr = 0x00;
+			}
+			return rc;
+		}
 
 		/* anti gcc 2.95.3 bugware (sic) */
 		status = opts1 | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
@@ -2534,13 +2557,13 @@ static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
 static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
-	unsigned int frags, entry = tp->cur_tx % NUM_TX_DESC;
+	unsigned int entry = tp->cur_tx % NUM_TX_DESC;
 	struct TxDesc *txd = tp->TxDescArray + entry;
 	void __iomem *ioaddr = tp->mmio_addr;
 	dma_addr_t mapping;
 	u32 status, len;
 	u32 opts1;
-	int ret = NETDEV_TX_OK;
+	int frags, ret = NETDEV_TX_OK;
 
 	if (unlikely(TX_BUFFS_AVAIL(tp) < skb_shinfo(skb)->nr_frags)) {
 		if (netif_msg_drv(tp)) {
@@ -2557,7 +2580,11 @@ static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	opts1 = DescOwn | rtl8169_tso_csum(skb, dev);
 
 	frags = rtl8169_xmit_frags(tp, skb, opts1);
-	if (frags) {
+	if (frags < 0) {
+		printk(KERN_ERR "%s: PCI mapping failure (%d).\n", dev->name,
+		       frags);
+		goto err_busy;
+	} else if (frags > 0) {
 		len = skb_headlen(skb);
 		opts1 |= FirstFrag;
 	} else {
@@ -2605,6 +2632,7 @@ out:
 
 err_stop:
 	netif_stop_queue(dev);
+err_busy:
 	ret = NETDEV_TX_BUSY;
 err_update_stats:
 	dev->stats.tx_dropped++;
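
The unwind-on-failure idea in the patch above can be tried outside the kernel. What follows is a minimal userspace sketch, not the driver's code: RING_SIZE, struct slot, fake_map(), fake_unmap(), map_frags() and live_mappings() are all hypothetical names, and the DMA mapping is simulated with a flag. It only shows the ring walk: fragments go into consecutive slots after the starting entry, and when one mapping fails the slots already filled are released in reverse order.

```c
#include <assert.h>

#define RING_SIZE 8	/* hypothetical ring size */

struct slot {
	int mapped;	/* nonzero while a (simulated) DMA mapping is live */
};

/* Stand-in for a DMA mapping call: fails for fragment index fail_at. */
static int fake_map(struct slot *s, unsigned int frag, unsigned int fail_at)
{
	if (frag == fail_at)
		return -1;
	s->mapped = 1;
	return 0;
}

/* Stand-in for the matching unmap call. */
static void fake_unmap(struct slot *s)
{
	s->mapped = 0;
}

/*
 * Map nr_frags fragments into the slots following 'entry', wrapping at
 * the end of the ring. Returns nr_frags on success. On failure, unmaps
 * every slot mapped so far, newest first, and returns -1.
 */
static int map_frags(struct slot *ring, unsigned int entry,
		     unsigned int nr_frags, unsigned int fail_at)
{
	unsigned int cur;

	assert(nr_frags < RING_SIZE);

	for (cur = 0; cur < nr_frags; cur++) {
		entry = (entry + 1) % RING_SIZE;
		if (fake_map(&ring[entry], cur, fail_at) < 0) {
			/* The failing slot was never mapped; step back
			 * over the 'cur' slots that were. */
			while (cur-- > 0) {
				entry = (entry + RING_SIZE - 1) % RING_SIZE;
				fake_unmap(&ring[entry]);
			}
			return -1;
		}
	}
	return (int)nr_frags;
}

/* Count the slots that still hold a mapping. */
static unsigned int live_mappings(const struct slot *ring)
{
	unsigned int i, n = 0;

	for (i = 0; i < RING_SIZE; i++)
		n += ring[i].mapped != 0;
	return n;
}
```

Starting from entry 6 with three fragments exercises the wrap-around (slots 7, 0 and 1). If the third mapping is made to fail, live_mappings() should report zero afterwards, which is the property the error path is after.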


Re: wireless vs. alignment requirements

2007-11-24 Thread Herbert Xu
On Sat, Nov 24, 2007 at 10:13:19PM +0100, Johannes Berg wrote:

 Eight bytes really sucks for wireless, many things are multiples of four
 and QoS vs. non-QoS frames have a multiple of four and common hardware
 only adds two padding bytes to get it aligned on four bytes so there's
 no easy way to get hardware to align it properly. Hmm.

Sorry I was wrong about the 8 bytes requirement.  Although the
IPv6 protocol does try to maintain an 8-byte alignment the Linux
stack never does anything that requires that.

So 4 bytes is enough.

However, the wireless core is definitely not out of the woods.
It needs to support variable hardware header lengths that are
not always a multiple of 4.

So here's my suggestion.  Modify the wireless core to fix up any
packets which aren't aligned correctly.  That should make it
work albeit in a way that's less than optimal.

Then for each driver where you care about this performance
(seriously I wouldn't for the speeds these things run at :),
pick the most common wireless hardware header length and have
the IP (or any other upper-level protocol) header aligned to
at least 4 bytes.  Or better if you know what hardware header
length that you're going to get (e.g., based on what mode you're
in) then do the skb_reserve accordingly.
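
As a rough illustration of the skb_reserve suggestion, the pad to reserve follows directly from the hardware header length. This is a standalone sketch rather than kernel code: reserve_pad() is a hypothetical helper, and it assumes the receive buffer itself starts on an aligned address.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of bytes to reserve in front of a hardware
 * header of hdr_len bytes so that whatever follows the header (e.g. the
 * IP header) starts on an 'align'-byte boundary. Assumes the buffer
 * start is already aligned.
 */
static unsigned int reserve_pad(unsigned int hdr_len, unsigned int align)
{
	assert(align != 0);
	return (align - hdr_len % align) % align;
}
```

For a 14-byte Ethernet header and 4-byte alignment this gives 2, which matches the usual NET_IP_ALIGN value. Taking, as an assumption for the example, a 24-byte 802.11 header plus 8 bytes of LLC/SNAP (32 bytes) gives 0, while a 26-byte QoS header plus the same 8 bytes (34 bytes) gives 2, so the pad depends on the frame type.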

It's a good thing these things aren't running at 10Gb :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space

2007-11-24 Thread Alistair John Strachan
On Sunday 25 November 2007 00:25:10 Francois Romieu wrote:
 Alistair John Strachan [EMAIL PROTECTED] :
 [...]

  The choke affects other devices on the system too, notably libata,
  which does not recover gracefully. In my logs, I see a stream of:
 
  DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
  DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0

 You are using jumbo frames, aren't you ?

Yes, 7200 byte frames. I'll certainly try out your patch and report back.

-- 
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.


Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space

2007-11-24 Thread Alistair John Strachan
On Sunday 25 November 2007 01:27:54 Francois Romieu wrote:
 Francois Romieu [EMAIL PROTECTED] :
  Alistair John Strachan [EMAIL PROTECTED] :
  [...]
 
   The choke affects other devices on the system too, notably libata,
   which does not recover gracefully. In my logs, I see a stream of:
  
   DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
   DMA: Out of SW-IOMMU space for 7222 bytes at device :04:00.0
 
  You are using jumbo frames, aren't you ?

 See below for my late night crap. At least it should avoid the driver
 issuing Rx/Tx DMA with the single static buffer of lib/swiotlb.c
 (io_tlb_overflow_buffer). Ghee.

No improvement. It might be possible to reproduce the problem on your end if 
you add iommu support and force enable the swiotlb (which should be possible 
even with 4GB RAM).

 diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
 index 1f647b9..72a7370 100644
 --- a/drivers/net/r8169.c
 +++ b/drivers/net/r8169.c
 @@ -2262,10 +2262,16 @@ static struct sk_buff *rtl8169_alloc_rx_skb(struct pci_dev *pdev,
  	mapping = pci_map_single(pdev, skb->data, rx_buf_sz,
  				 PCI_DMA_FROMDEVICE);

 +	if (pci_dma_mapping_error(mapping))
 +		goto err_kfree_skb;
 +
  	rtl8169_map_to_asic(desc, mapping, rx_buf_sz);
  out:
  	return skb;

 +err_kfree_skb:
 +	dev_kfree_skb(skb);
 +	skb = NULL;
  err_out:
  	rtl8169_make_unusable_by_asic(desc);
  	goto out;
 @@ -2486,6 +2492,7 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
  		dma_addr_t mapping;
  		u32 status, len;
  		void *addr;
 +		int rc;

  		entry = (entry + 1) % NUM_TX_DESC;

 @@ -2493,6 +2500,22 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
  		len = frag->size;
  		addr = ((void *) page_address(frag->page)) + frag->page_offset;
  		mapping = pci_map_single(tp->pci_dev, addr, len, PCI_DMA_TODEVICE);
 +		rc = pci_dma_mapping_error(mapping);
 +		if (unlikely(rc < 0)) {
 +			while (cur_frag-- > 0) {
 +				frag = info->frags + cur_frag;
 +				entry = (entry - 1) % NUM_TX_DESC;
 +				txd = tp->TxDescArray + entry;
 +				len = frag->size;
 +				mapping = le64_to_cpu(txd->addr);
 +				pci_unmap_single(tp->pci_dev, mapping, len,
 +						 PCI_DMA_TODEVICE);
 +				txd->opts1 = 0x00;
 +				txd->opts2 = 0x00;
 +				txd->addr = 0x00;
 +			}
 +			return rc;
 +		}

  		/* anti gcc 2.95.3 bugware (sic) */
  		status = opts1 | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
 @@ -2534,13 +2557,13 @@ static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
  static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
  {
  	struct rtl8169_private *tp = netdev_priv(dev);
 -	unsigned int frags, entry = tp->cur_tx % NUM_TX_DESC;
 +	unsigned int entry = tp->cur_tx % NUM_TX_DESC;
  	struct TxDesc *txd = tp->TxDescArray + entry;
  	void __iomem *ioaddr = tp->mmio_addr;
  	dma_addr_t mapping;
  	u32 status, len;
  	u32 opts1;
 -	int ret = NETDEV_TX_OK;
 +	int frags, ret = NETDEV_TX_OK;

  	if (unlikely(TX_BUFFS_AVAIL(tp) < skb_shinfo(skb)->nr_frags)) {
  		if (netif_msg_drv(tp)) {
 @@ -2557,7 +2580,11 @@ static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
  	opts1 = DescOwn | rtl8169_tso_csum(skb, dev);

  	frags = rtl8169_xmit_frags(tp, skb, opts1);
 -	if (frags) {
 +	if (frags < 0) {
 +		printk(KERN_ERR "%s: PCI mapping failure (%d).\n", dev->name,
 +		       frags);
 +		goto err_busy;
 +	} else if (frags > 0) {
  		len = skb_headlen(skb);
  		opts1 |= FirstFrag;
  	} else {
 @@ -2605,6 +2632,7 @@ out:

  err_stop:
  	netif_stop_queue(dev);
 +err_busy:
  	ret = NETDEV_TX_BUSY;
  err_update_stats:
  	dev->stats.tx_dropped++;

-- 
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.


Re: sky2: eth0: hung mac 7:69 fifo 0 (165:176)

2007-11-24 Thread Elvis Pranskevichus
Paul Collins wrote:

 Hi Stephen,
 
 Running amd64 kernel built from 2ffbb8377c7a0713baf6644e285adc27a5654582
 after about three days of uptime, this morning I found the network dead
 and the following in dmesg:
 
   sky2 eth0: hung mac 7:69 fifo 0 (165:176)
   sky2 eth0: receiver hang detected
   sky2 eth0: disabling interface
   NETDEV WATCHDOG: eth0: transmit timed out
   sky2 eth0: tx timeout
   sky2 eth0: transmit ring 26 .. 26 report=26 done=26
   NETDEV WATCHDOG: eth0: transmit timed out
   sky2 eth0: tx timeout
   sky2 eth0: transmit ring 26 .. 26 report=26 done=26
 
 The watchdog had been blorping for about three hours when I discovered
 it and rebooted the machine.
 

Hello,

I have exactly the same problem with my 88E8053 on 2.6.24-rc3 here. While
there have always been issues with sky2 on that particular board, now the
situation is worse than ever. Netdev watchdog goes into an endless loop
reporting timeouts and the whole machine goes down to the point that I'm
forced to reset (not even SysRq works).

Here's the snippet from the log:

sky2 eth0: hung mac 123:3 fifo 194 (150:144)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 178 .. 188 report=178 done=178
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 178 .. 188 report=178 done=178
NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 178 .. 188 report=178 done=178
NETDEV WATCHDOG: eth0: transmit timed out

The board is identical to Paul's.

While mac hangs were common in 2.6.23 and earlier, it was possible to
recover the interface (either automatically, or by manual rmmod/modprobe). 
I can't reliably reproduce the issue, but it consistently comes up a couple
of times a day during high network load.

Any hints or patches are highly appreciated.

Thanks,

--
Elvis



Re: forcedeth ethernet driver Low power state

2007-11-24 Thread Andrew Morton
On Sun, 25 Nov 2007 03:52:33 +0100 Jeroen [EMAIL PROTECTED] wrote:

 Hi,
 
 I'm migrating my server from windows 2003 server to Ubuntu, but I am
 stumbling over the Low Power State Link Speed option for my NIC
 (forcedeth)
 
 I need to disable this option in my windows driver, otherwise the throughput
 is horrible because the link fluctuates constantly between 100 and 1000.
 
 Anyway, my question is where and how can I turn off this feature for the
 forcedeth driver? I've looked in the source and as far as I can tell there is
 no boot option for this. There are some references noted in the code, but AFAIK
 no setting.
 
 Any ideas? Thanks in advance!
 

(cc's added)