[PATCH] bonding: fix deadlock on high loads in bond_alb_monitor()

2006-09-30 Thread Jay Vosburgh

From: Karsten Keil [EMAIL PROTECTED]

In bond_alb_monitor() the bond->curr_slave_lock write lock is taken
and then dev_set_promiscuity() may be called, which can take some time,
depending on the network HW. If a network IRQ for this card comes in,
the softirq handler may try to deliver more packets, which ends up in
a request for the read lock of bond->curr_slave_lock: deadlock.
This issue was found by a test lab during network stress tests. This patch
disables the softirq handler for this case and solved the issue.

Signed-off-by: Karsten Keil [EMAIL PROTECTED]
Acked-by: Jay Vosburgh [EMAIL PROTECTED]

---

 drivers/net/bonding/bond_alb.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
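
The scenario in the description can be pictured with a toy, single-CPU
model in userspace C. This is only a sketch under stated assumptions and
not kernel code: every demo_* name is hypothetical, and the booleans merely
stand in for the state that write_lock()/write_lock_bh() and the softirq
machinery manage in the real kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy state of one CPU; every field is an illustrative stand-in. */
struct demo_cpu {
	bool write_locked;	/* curr_slave_lock held for writing */
	bool bh_disabled;	/* bottom halves (softirqs) disabled */
	bool softirq_pending;	/* a softirq is waiting to run */
	bool deadlocked;	/* softirq spun on a lock its own CPU holds */
};

/* Models the softirq handler that needs the read lock. */
static void demo_run_softirq(struct demo_cpu *c)
{
	assert(c != NULL);
	c->softirq_pending = false;
	if (c->write_locked)
		c->deadlocked = true;	/* the writer can never resume to unlock */
}

/* Models a network IRQ: it raises a softirq, which runs at once
 * unless bottom halves are disabled. */
static void demo_irq(struct demo_cpu *c)
{
	assert(c != NULL);
	c->softirq_pending = true;
	if (!c->bh_disabled)
		demo_run_softirq(c);
}

/* Models the critical section; returns true if the model deadlocks. */
static bool demo_monitor(bool use_bh_variant)
{
	struct demo_cpu c = { false, false, false, false };

	c.bh_disabled = use_bh_variant;	/* write_lock_bh() vs. write_lock() */
	c.write_locked = true;
	demo_irq(&c);			/* IRQ arrives while the lock is held */
	c.write_locked = false;
	c.bh_disabled = false;		/* unlock; deferred softirqs may run now */
	if (c.softirq_pending)
		demo_run_softirq(&c);
	return c.deadlocked;
}
```

In this model demo_monitor(false) reports a deadlock, while
demo_monitor(true) defers the handler until after the unlock and does not.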

921ada40bf96d7dc9c4258821af3d4616fb3cbae
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index e83bc82..3292316 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1433,7 +1433,7 @@ void bond_alb_monitor(struct bonding *bo
 		 * write lock to protect from other code that also
 		 * sets the promiscuity.
 		 */
-		write_lock(&bond->curr_slave_lock);
+		write_lock_bh(&bond->curr_slave_lock);
 
 		if (bond_info->primary_is_promisc &&
 		    (++bond_info->rlb_promisc_timeout_counter >=
 		     RLB_PROMISC_TIMEOUT)) {
@@ -1448,7 +1448,7 @@ void bond_alb_monitor(struct bonding *bo
 			bond_info->primary_is_promisc = 0;
 		}
 
-		write_unlock(&bond->curr_slave_lock);
+		write_unlock_bh(&bond->curr_slave_lock);
 
 		if (bond_info->rlb_rebalance) {
 			bond_info->rlb_rebalance = 0;


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[IPROUTE2][PATCH] Add missing macros which was removed from kernel header. (Re: [GIT PATCH] NET: Fixes for net-2.6.19)

2006-09-30 Thread Masahide NAKAMURA
 David Miller wrote:
  commit 0844565fb8a9418f5a860aa480c1aef70319c9a2
  Author: Thomas Graf [EMAIL PROTECTED]
  Date:   Fri Aug 4 23:05:56 2006 -0700
 
  [NET]: Move netlink interface bits to linux/if.h
  
  Signed-off-by: Thomas Graf [EMAIL PROTECTED]
  Signed-off-by: David S. Miller [EMAIL PROTECTED]
  
  Stephen, we just removed the troublesome bits from linux/if.h when I
  put in Yoshifuji's patches last night, it should explicitly remove
  this problem.
  
  You will thus see that linux/rtnetlink.h no longer includes
  linux/if.h, which is why your errors were completely perplexing
  to me.  Instead, it includes linux/if_link.h
  
  It's been in my tree since last night, and if you had used
  the rtnetlink.h from my current tree you wouldn't have seen
  the error.
 
 Yes, as David mentioned, you need to copy the latest rtnetlink.h
 first. if_{link,addr}.h and neighbour.h also need to be added
 to the iproute2 tree. Some macros may be needed in libnetlink.h, too.
 I'll send the patch to you if you haven't started on it.

Stephen, this patch is for iproute2. Please check and apply it after syncing
the kernel headers (e.g. rtnetlink.h) with David's tree. Please also remember
to add the new ones (i.e. include/linux/{if_link.h,if_addr.h,neighbour.h}),
which were split out of rtnetlink.h.


[PATCH] Add missing macros which was removed from kernel header.

The {IFA,IFLA,NDA,NDTA}_{RTA,PAYLOAD} macros have been removed from the
kernel headers as of net-2.6.19 because they are not used by kernel code.

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
---
 include/libnetlink.h |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)
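
For readers unfamiliar with these macros, here is a small userspace sketch
of how IFA_RTA() and IFA_PAYLOAD() are typically used to walk the attributes
of an address message. It assumes a Linux system with the kernel UAPI headers
installed. The demo_* helpers are hypothetical and exist only for this
illustration; the message is built in a local buffer rather than received
from a netlink socket.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

/* Same fallbacks the patch adds, in case the headers lack them. */
#ifndef IFA_RTA
#define IFA_RTA(r) \
	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct ifaddrmsg))))
#endif
#ifndef IFA_PAYLOAD
#define IFA_PAYLOAD(n) NLMSG_PAYLOAD(n, sizeof(struct ifaddrmsg))
#endif

/* Hypothetical helper: build an RTM_NEWADDR message carrying a single
 * IFA_LOCAL attribute with a 4-byte IPv4 address. */
static struct nlmsghdr *demo_build_newaddr(void *buf, size_t buflen,
					   const unsigned char addr[4])
{
	struct nlmsghdr *nlh = buf;
	struct ifaddrmsg *ifa;
	struct rtattr *rta;

	assert(buflen >= NLMSG_SPACE(sizeof(*ifa)) + RTA_SPACE(4));
	memset(buf, 0, buflen);
	nlh->nlmsg_type = RTM_NEWADDR;
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifa));

	ifa = NLMSG_DATA(nlh);
	ifa->ifa_family = AF_INET;
	ifa->ifa_prefixlen = 24;

	rta = IFA_RTA(ifa);		/* first attribute follows ifaddrmsg */
	rta->rta_type = IFA_LOCAL;
	rta->rta_len = RTA_LENGTH(4);
	memcpy(RTA_DATA(rta), addr, 4);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
	return nlh;
}

/* Hypothetical helper: return the payload of the first attribute of the
 * given type, or NULL if the message does not carry one. */
static const void *demo_find_attr(struct nlmsghdr *nlh, unsigned short type)
{
	struct ifaddrmsg *ifa = NLMSG_DATA(nlh);
	struct rtattr *rta = IFA_RTA(ifa);
	int len = IFA_PAYLOAD(nlh);	/* bytes of attributes after ifaddrmsg */

	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
		if (rta->rta_type == type)
			return RTA_DATA(rta);
	return NULL;
}
```

The IFLA_*, NDA_* and NDTA_* variants follow the same pattern with
struct ifinfomsg, struct ndmsg and struct ndtmsg respectively.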

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 63cc3c8..9de3a0b 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -4,6 +4,9 @@ #define __LIBNETLINK_H__ 1
 #include <asm/types.h>
 #include <linux/netlink.h>
 #include <linux/rtnetlink.h>
+#include <linux/if_link.h>
+#include <linux/if_addr.h>
+#include <linux/neighbour.h>
 
 struct rtnl_handle
 {
@@ -53,5 +56,37 @@ extern int rtnl_from_file(FILE *, rtnl_f
 #define NLMSG_TAIL(nmsg) \
 	((struct rtattr *) (((void *) (nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))
 
+#ifndef IFA_RTA
+#define IFA_RTA(r) \
+	((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ifaddrmsg))))
+#endif
+#ifndef IFA_PAYLOAD
+#define IFA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifaddrmsg))
+#endif
+
+#ifndef IFLA_RTA
+#define IFLA_RTA(r) \
+	((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ifinfomsg))))
+#endif
+#ifndef IFLA_PAYLOAD
+#define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg))
+#endif
+
+#ifndef NDA_RTA
+#define NDA_RTA(r) \
+	((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ndmsg))))
+#endif
+#ifndef NDA_PAYLOAD
+#define NDA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ndmsg))
+#endif
+
+#ifndef NDTA_RTA
+#define NDTA_RTA(r) \
+	((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ndtmsg))))
+#endif
+#ifndef NDTA_PAYLOAD
+#define NDTA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ndtmsg))
+#endif
+
 #endif /* __LIBNETLINK_H__ */
 
-- 
1.4.2




-- 
Masahide NAKAMURA


Re: Is TCP over IPsec broken in 2.6.18?

2006-09-30 Thread James Morris
On Sat, 30 Sep 2006, James Morris wrote:

 SELinux enabled is changed to permissive mode.


Ok, in the case where unencrypted packets are leaking, the problem is that 
xfrm_lookup() is returning a false zero on a polmatch denial like:

  avc:  denied  { polmatch } for  scontext=system_u:system_r:ftpd_t:s0 
  tcontext=system_u:object_r:unlabeled_t:s0 tclass=association


Follow the call back up from selinux_xfrm_policy_lookup(), when:

{
	rc = avc_has_perm(fl_secid, sel_sid, SECCLASS_ASSOCIATION,
			  ASSOCIATION__POLMATCH,
			  NULL);

	return rc;	<-- -EACCES
}


Which is propagated back via 

   xfrm_policy_match()
   xfrm_policy_lookup_bytype()
   xfrm_policy_lookup()
   
to

int xfrm_lookup() 
{
	...

	if (!policy) {
		/* To accelerate a bit...  */
		if ((dst_orig->flags & DST_NOXFRM) ||
		    !xfrm_policy_count[XFRM_POLICY_OUT])
			return 0;

		policy = flow_cache_lookup(fl, dst_orig->ops->family,
					   dir, xfrm_policy_lookup);
	}

	if (!policy)
		return 0;	<-- returns

	...
}

and the callers then allow the packet to proceed unencrypted.

It seems that some logic needs to be reworked to ensure that the real 
error value is propagated back and returned via xfrm_lookup().
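
To illustrate the kind of rework meant here (not the actual fix, which is
not part of this message), the sketch below shows one common way to let a
lookup distinguish "no policy" from "lookup denied": encoding a negative
errno in the returned pointer, modelled on the kernel's ERR_PTR()/IS_ERR()/
PTR_ERR() helpers. It is plain userspace C; all demo_* names and the flow
numbering are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins modelled on the kernel's ERR_PTR() family. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

static inline long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct demo_policy {
	int id;
};

static struct demo_policy demo_pol = { 1 };

/* Hypothetical lookup: flow 1 matches a policy, flow 2 is denied by the
 * security module, anything else has no policy at all. */
static struct demo_policy *demo_policy_lookup(int flow)
{
	switch (flow) {
	case 1:
		return &demo_pol;
	case 2:
		return demo_err_ptr(-EACCES);
	default:
		return NULL;
	}
}

/* Returns 0 if the packet may proceed, or a negative errno if it must be
 * dropped. The denial is no longer folded into the "no policy" case. */
static int demo_xfrm_lookup(int flow)
{
	struct demo_policy *policy = demo_policy_lookup(flow);

	assert(flow >= 0);
	if (demo_is_err(policy))
		return (int)demo_ptr_err(policy);	/* propagate the denial */
	if (!policy)
		return 0;	/* genuinely no policy: nothing to enforce */
	return 0;		/* policy found: transforms would apply here */
}
```

With this shape a caller sees -EACCES for the denied flow and can drop the
packet instead of sending it in the clear.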

I was also seeing these AVCs when receiving ping requests:

  avc:  denied { sendto } for scontext=system_u:object_r:unlabeled_t:s0 
  tcontext=system_u:object_r:unlabeled_t:s0 tclass=association

Not sure if there are any deeper issues in this case: the callers need to 
be audited.




- James
-- 
James Morris [EMAIL PROTECTED]


Re: 2.6.18-mm2 - oops in cache_alloc_refill()

2006-09-30 Thread Valdis . Kletnieks
On Fri, 29 Sep 2006 23:31:07 EDT, [EMAIL PROTECTED] said:
 Fair enough,  I'm going to try reverting the 2 commits and see if things
 behave better.

OK, it's definitely something in those 2 commits - I reverted them and the
resulting 2.6.18-mm2 kernel has been up and stable for 4 hours, even with
the problem gkrellm updating once a second the whole time.

I'm not *seeing* how those changes can cause trouble - unless it's this:

diff --git a/drivers/net/wireless/orinoco.c b/drivers/net/wireless/orinoco.c
index 1840b69..9e19a96 100644
--- a/drivers/net/wireless/orinoco.c
+++ b/drivers/net/wireless/orinoco.c
@@ -3037,7 +3037,7 @@ static int orinoco_ioctl_getessid(struct
 	}
 
 	erq->flags = 1;
-	erq->length = strlen(essidbuf) + 1;
+	erq->length = strlen(essidbuf);

Does some other code go batshit if length ==0?  My current config doesn't
try to actually ifup the wireless if I also have connectivity via copper (in
order to avoid chewing up a DHCP lease in crowded address space if not needed).

% iwconfig eth5
eth5  IEEE 802.11b  ESSID:""  Nickname:"HERMES I"
  Mode:Managed  Frequency:2.457 GHz  Access Point: Not-Associated   
  Bit Rate:11 Mb/s   Sensitivity:1/3  
  Retry limit:4   RTS thr:off   Fragment thr:off
  Power Management:off
  Link Quality=0/92  Signal level=134/153  Noise level=134/153
  Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
  Tx excessive retries:0  Invalid misc:0   Missed beacon:0

That ESSID the source of the trouble?





Re: 2.6.18-mm2 - oops in cache_alloc_refill()

2006-09-30 Thread Andrew Morton
On Sat, 30 Sep 2006 03:50:43 -0400
[EMAIL PROTECTED] wrote:

 On Fri, 29 Sep 2006 23:31:07 EDT, [EMAIL PROTECTED] said:
  Fair enough,  I'm going to try reverting the 2 commits and see if things
  behave better.
 
 OK, it's definitely something in those 2 commits - I reverted them and the
 resulting 2.6.18-mm2 kernel has been up and stable for 4 hours, even with
 the problem gkrellm updating once a second the whole time.
 
 I'm not *seeing* how those changes can cause trouble - unless it's this:
 
 diff --git a/drivers/net/wireless/orinoco.c b/drivers/net/wireless/orinoco.c
 index 1840b69..9e19a96 100644
 --- a/drivers/net/wireless/orinoco.c
 +++ b/drivers/net/wireless/orinoco.c
 @@ -3037,7 +3037,7 @@ static int orinoco_ioctl_getessid(struct
  	}
  
  	erq->flags = 1;
 -	erq->length = strlen(essidbuf) + 1;
 +	erq->length = strlen(essidbuf);

You know what the next question is ;)

Did reverting just that line fix it?

 Does some other code go batshit if length ==0? My current config doesn't
 try to actually ifup the wireless if I also have connectivity via copper (in
 order to avoid chewing up a DHCP lease in crowded address space if not 
 needed).
 
 % iwconfig eth5
 eth5  IEEE 802.11b  ESSID:""  Nickname:"HERMES I"
   Mode:Managed  Frequency:2.457 GHz  Access Point: Not-Associated   
   Bit Rate:11 Mb/s   Sensitivity:1/3  
   Retry limit:4   RTS thr:off   Fragment thr:off
   Power Management:off
   Link Quality=0/92  Signal level=134/153  Noise level=134/153
   Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
   Tx excessive retries:0  Invalid misc:0   Missed beacon:0
 
 That ESSID the source of the trouble?
 

Might be.  I can't immediately spot a problem with it, but perhaps
length==0 causes the driver to not allocate a buffer and to then write to
the not-allocated buffer.  Not sure..


Re: Makefile for linux modules

2006-09-30 Thread Sam Ravnborg
Hi Robert.

 I have a makefile to make several driver modules:
 obj-$(CONFIG_FUSION_SPI)  += mptbase.o mptscsih.o mptspi.o
 obj-$(CONFIG_FUSION_FC)   += mptbase.o mptscsih.o mptfc.o
 obj-m += mptbase.o mptscsih.o mptsas.o
 obj-$(CONFIG_FUSION_LAN)  += mptlan.o
 obj-m += mptctl.o
 obj-m   += mptcfg.o
 obj-m   += mptstm.o

The above kbuild file snippet tells us that you are creating
a number of modules:
mptbase.ko mptscsih.ko mptsas.ko mptlan.ko mptctl.ko mptcfg.ko and mptstm.ko
They are each built from a single .c file.

 mptbase-objs := comfunc.o

Now you try to include comfunc.o in every module.
To do so you need to tell kbuild that you are dealing with a module
based on composite .o files.
That would look like:
obj-$(CONFIG_FUSION_PCI) += mptbase-foo.o
mptbase-foo-y := comfunc.o mptbase.o

This will result in a module named mptbase-foo.ko, which is hardly what
you are trying to achieve. Likewise you will have duplicate symbols in the
modules due to comfunc.o being included more than once.

The only sane approach here is to compile comfunc.o as an independent
module and let the modutils pull in the comfunc (deserves a more
specific name) module as needed.

So what you need to do is simply:
obj-m += comfunc.o

And accept that this is a module, so all symbols that you need must be
properly exported using EXPORT_SYMBOL*

Sam


Re: [PATCH 0/3] myri10ge Large Receive Offload

2006-09-30 Thread Evgeniy Polyakov
On Sat, Sep 30, 2006 at 12:16:44AM +0200, Brice Goglin ([EMAIL PROTECTED]) 
wrote:
 Jeff Garzik wrote:
  Brice Goglin wrote:
  This is a complete rework of the myri10ge receive path. The first
  patch converts skb allocation to use physical pages. The second one
  adds a software implementation of Large Receive Offload. The third
  one updates the driver version to 1.1.0.
 
  The complete driver code in our CVS actually also supports high-order
  allocations instead of single physical pages since it significantly
  increases the performance. Order=2 allows us to receive standard frames
  at line rate even on low-end hardware such as an AMD Athlon(tm) 64 X2
  Dual Core Processor 3800+ (2.0GHz). Some customers might not care a lot
  about memory fragmentation if the performance is better.
 
  But, since high-order allocations are generally considered a bad idea,
  we do not include the relevant code in the following patch for inclusion
  in Linux. Here, we simply pass order=0 to all page allocation routines.
  If necessary, I could drop the remaining reference to high-order
  (especially replace alloc_pages() with alloc_page()) but I'd rather
  keep it as is.
 
  If high-order allocations are ever considered OK under some
  circumstances, we could send an additional patch (a module parameter
  would be used to switch from single physical pages to high-order pages).
 
  As Herbert's already done, I would rather let the net core people
  comment on this.  The code implementation doesn't look scary, but we
  may want to be smarter about this in the core net stack, rather than
  implementing it inside multiple drivers.
 
 Ok, makes sense. We look forward to see this.
 
 Could we get patch #1 merged anyway (page-based skb allocation)?
 
 Any comments about what I was saying about high-order allocations above?

It is quite strange that you see a very noticeable speed degradation after
switching from higher-order to order-0 allocations. Could you specify where
the observed bottleneck in the network stack is?

 thanks,
 Brice

-- 
Evgeniy Polyakov


Re: [PATCH][BNX2]: Disable MSI on 5706 if AMD 8132 bridge is present

2006-09-30 Thread Brice Goglin
Michael Chan wrote:
 AMD believes this incompatibility is unique to the 5706, and
 prefers to locally disable MSI rather than globally disabling it
 using pci_msi_quirk.
   

FYI, pci_msi_quirk is the extreme solution; there is something in the
middle :) It is possible to disable MSI only for devices that are behind
this bridge (by setting the NO_MSI flag in its bus flags). We just merged
a couple of patches related to this NO_MSI flag and pci_msi_quirk (there are
very few cases where MSI must be disabled globally; most of the
time a subset behind a bridge is enough), so I am very glad that you
didn't use it :)
Anyway, disabling locally is better here.

 +	if (CHIP_NUM(bp) == CHIP_NUM_5706 && disable_msi == 0) {
 +		struct pci_dev *amd_8132 = NULL;
 +
 +		while ((amd_8132 = pci_get_device(PCI_VENDOR_ID_AMD,
 +						  PCI_DEVICE_ID_AMD_8132_BRIDGE,
 +						  amd_8132))) {
   

What if the machine has such a bridge and board, but the board is not
actually located somewhere behind the bridge? I would rather walk the
PCI hierarchy from the board to the top and check whether we find an
AMD 8132. Probably something like:

struct pci_dev *bridge = /* the bnx2 pci_dev */;
while (bridge->bus && bridge->bus->self)
	bridge = bridge->bus->self;
if (bridge->vendor == PCI_VENDOR_ID_AMD
    && bridge->device == PCI_DEVICE_ID_AMD_8132_BRIDGE)
	/* do your stuff to disable MSI on your board */
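
The hierarchy walk can be tried out in userspace with simplified stand-ins
for struct pci_dev and struct pci_bus. The sketch below is an assumption-laden
variant, not driver code: the demo_* types, the helper name and the ID values
are illustrative only, and unlike the snippet above it tests every ancestor
on the way up, so it also finds a bridge that is not the topmost one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's struct pci_dev / struct pci_bus. */
struct demo_pci_bus;

struct demo_pci_dev {
	unsigned short vendor;
	unsigned short device;
	struct demo_pci_bus *bus;	/* bus this device sits on */
};

struct demo_pci_bus {
	struct demo_pci_dev *self;	/* bridge leading to this bus, NULL at the root */
};

/* Illustrative ID values; real code takes them from the kernel's pci_ids.h. */
#define DEMO_VENDOR_AMD		0x1022
#define DEMO_DEVICE_AMD_8132	0x7458

/* Walk from dev towards the root and report whether any upstream bridge
 * matches the given vendor/device pair. */
static bool demo_behind_bridge(const struct demo_pci_dev *dev,
			       unsigned short vendor, unsigned short device)
{
	assert(dev != NULL);
	while (dev->bus && dev->bus->self) {
		dev = dev->bus->self;
		if (dev->vendor == vendor && dev->device == device)
			return true;
	}
	return false;
}
```

A driver would call such a helper once at probe time and skip enabling MSI
when it returns true.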


Brice



Re: [PATCH 2/6]: powerpc/cell spidernet low watermark patch.

2006-09-30 Thread Arnd Bergmann
On Saturday 30 September 2006 01:17, Linas Vepstas wrote:
 Implement basic low-watermark support for the transmit queue.
 Hardware low-watermarks allow a properly configured kernel
 to continuously stream data to a device and not have to handle
 any interrupts at all in doing so. Correct zero-interrupt
 operation can be actually observed for this driver, when the
 socket buffer is made large enough.

Acked-by: Arnd Bergmann [EMAIL PROTECTED]


Re: [PATCH 5/6]: powerpc/cell spidernet ethtool -i version number

2006-09-30 Thread Arnd Bergmann
On Saturday 30 September 2006 01:26, Linas Vepstas wrote:
 This patch moves transmit queue cleanup code out of the
 interrupt context, and into the NAPI polling routine.

 Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
 Cc: James K Lewis [EMAIL PROTECTED]
 Cc: Arnd Bergmann [EMAIL PROTECTED]

The subject of this mail should be "spidernet: use NAPI poll for
TX interrupts"; otherwise

Acked-by: Arnd Bergmann [EMAIL PROTECTED]


Re: [PATCH 3/6]: powerpc/cell spidernet stop error printing patch.

2006-09-30 Thread Arnd Bergmann
On Saturday 30 September 2006 01:19, Linas Vepstas wrote:
 Turn off mis-interpretation of the queue-empty interrupt
 status bit as an error. This bit is set as a part of
 the previous low-watermark patch.

 Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
 Signed-off-by: James K Lewis [EMAIL PROTECTED]
 Cc: Arnd Bergmann [EMAIL PROTECTED]

Acked-by: Arnd Bergmann [EMAIL PROTECTED]


Re: [PATCH 4/6]: powerpc/cell spidernet ethtool -i version number info.

2006-09-30 Thread Arnd Bergmann
On Saturday 30 September 2006 01:21, Linas Vepstas wrote:
 This patch adds version information as reported by
 ethtool -i to the Spidernet driver.

 From: James K Lewis [EMAIL PROTECTED]
 Signed-off-by: James K Lewis [EMAIL PROTECTED]
 Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
 Cc: Arnd Bergmann [EMAIL PROTECTED]

Same comment as last time this was submitted:

 @@ -2303,6 +2304,8 @@ static struct pci_driver spider_net_driv
   */
  static int __init spider_net_init(void)
  {
 +	printk("spidernet Version %s.\n",VERSION);
 +
  	if (rx_descriptors < SPIDER_NET_RX_DESCRIPTORS_MIN) {
  		rx_descriptors = SPIDER_NET_RX_DESCRIPTORS_MIN;
  		pr_info("adjusting rx descriptors to %i.\n",

This is missing a printk level and a space character. More importantly,
you should not print the presence of the driver to the syslog, but
the presence of a device driven by it.

Arnd 


Re: [PATCH 1/6]: powerpc/cell spidernet burst alignment patch.

2006-09-30 Thread Arnd Bergmann
On Saturday 30 September 2006 01:15, Linas Vepstas wrote:
 This patch increases the Burst Address alignment from 64 to 1024 in the
 Spidernet driver. This improves transmit performance for large packets.

 From: James K Lewis [EMAIL PROTECTED]
 Signed-off-by: James K Lewis [EMAIL PROTECTED]
 Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
 Cc: Arnd Bergmann [EMAIL PROTECTED]

Acked-by: Arnd Bergmann [EMAIL PROTECTED]


Re: [PATCH 0/6]: powerpc/cell spidernet ethernet patches

2006-09-30 Thread Arnd Bergmann
On Saturday 30 September 2006 01:05, Linas Vepstas wrote:
 Although these patches have not been baking in
 any -mm tree, they have been tested and are
 generally available as a part of the Cell SDK 2.0
 overseen by Arnd Bergmann. (Arnd, if you want
 to lend a voice of authority here, or to correct
 me, please do so...)

 The following sequence of six patches implement a
 series of changes to the transmit side of the
 spidernet ethernet device driver, significantly
 improving performance for large packets.

 This series of patches is almost identical to
 those previously mailed on 18-20 August, with one
 critical change: NAPI polling is used instead of
 homegrown polling.

 Although these patches improve things, I am not
 satisfied with how this driver behaves, and so
 plan to do additional work next week.


I'm not sure if I have missed a patch in here, but I
don't see anything reintroducing the 'netif_stop_queue'
that is missing from the transmit path.

Do you have an extra patch for that?

Arnd 


Re: [PATCH] tcp-lp: prevent chance for oops

2006-09-30 Thread Wong Edison

Oh, I see. I have learned from this and will change it in the upcoming version.

On 9/30/06, David Miller [EMAIL PROTECTED] wrote:

From: Wong Edison [EMAIL PROTECTED]
Date: Sat, 30 Sep 2006 03:27:00 +0800

 I do this since I have a sourceforge homepage for it. I update the
 CVS version there, test, and then submit to netdev. As I sync the code
 on both sides, this $Id$ is kept. Could you just let it be? I think
 it will make my maintenance simpler...

You could not even generate a clean patch to me, it just
gets in the way and causes more work for me and anyone
else who tries to use your patches.

If you want to use versioning, use MODULE_VERSION() just like
everyone else in the kernel does.




[PATCH 2.6.18] 8139too: force media setting cleanup

2006-09-30 Thread Bernard Lee
From: Bernard Lee [EMAIL PROTECTED]

Setting bits 4 & 5 alone in the 8139too module media option does not really
force 100Mbps full-duplex mode. When media option bits 0-3 are cleared, the
8139too module does not force the media setting. Therefore, at least one of
bits 0-3 must be set for bits 4 & 5 to take effect. This hidden dependency
on bits 0-3 is not stated in the module description.

It can be fixed by widening the default_port bitfield of the rtl8139_private
structure from 4 bits to 6 bits.

Besides, module media bit 9 is a duplicate of bit 4 (full-duplex), so it
is suggested that bit 9 be freed. A remark is added to the module
description that bit 0 can be used to force the setting. This helps to
clarify 10Mbps half-duplex mode.

Signed-off-by: Bernard Lee [EMAIL PROTECTED]
---

 8139too.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
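
The truncation described above is easy to reproduce in userspace. The sketch
below is only an illustration: the demo_* structs and functions are
hypothetical and mirror nothing but the width of the default_port field
before and after the change. Storing a value into an unsigned bitfield keeps
only its low bits, so with 4 bits an option of 0x30 (bits 4 and 5) collapses
to 0 and the later "if (tp->default_port)" test never fires.

```c
#include <assert.h>	/* only needed for checks such as those in the note below */

/* Field widths before and after the change; names are illustrative. */
struct demo_port4 {
	unsigned int default_port : 4;
};

struct demo_port6 {
	unsigned int default_port : 6;
};

/* Old behaviour: mask 0xFF, but the 4-bit field keeps bits 0-3 only. */
static unsigned int demo_store4(unsigned int option)
{
	struct demo_port4 p;

	p.default_port = option & 0xFF;
	return p.default_port;
}

/* New behaviour: mask 0x3F fits the 6-bit field, so bits 4 and 5 survive. */
static unsigned int demo_store6(unsigned int option)
{
	struct demo_port6 p;

	p.default_port = option & 0x3F;
	return p.default_port;
}
```

For example, demo_store4(0x30) yields 0 while demo_store6(0x30) yields 0x30;
with bit 0 also set, demo_store4(0x31) yields 1, which is why the old code
only forced the media setting when one of bits 0-3 was set.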

--- linux-2.6.18/drivers/net/8139too.c	2006-09-20 11:42:06.0 +0800
+++ linux-2.6.18-fixed/drivers/net/8139too.c	2006-09-30 16:30:32.0 +0800
@@ -588,3 +588,3 @@ struct rtl8139_private {
 	unsigned int watchdog_fired : 1;
-	unsigned int default_port : 4;	/* Last dev->if_port value. */
+	unsigned int default_port : 6;	/* Last dev->if_port value. */
 	unsigned int have_thread : 1;
@@ -614,3 +614,3 @@ MODULE_PARM_DESC (debug, "8139too bitmap
 MODULE_PARM_DESC (multicast_filter_limit, "8139too maximum number of filtered multicast addresses");
-MODULE_PARM_DESC (media, "8139too: Bits 4+9: force full duplex, bit 5: 100Mbps");
+MODULE_PARM_DESC (media, "8139too: bit 0: force setting, bit 4: full duplex, bit 5: 100Mbps");
 MODULE_PARM_DESC (full_duplex, "8139too: Force full duplex for board(s) (1)");
@@ -1071,4 +1071,4 @@ static int __devinit rtl8139_init_one (s
 	if (option > 0) {
-		tp->mii.full_duplex = (option & 0x210) ? 1 : 0;
-		tp->default_port = option & 0xFF;
+		tp->mii.full_duplex = (option & 0x10) ? 1 : 0;
+		tp->default_port = option & 0x3F;
 		if (tp->default_port)


Re: Is TCP over IPsec broken in 2.6.18?

2006-09-30 Thread Evgeniy Polyakov
On Sat, Sep 30, 2006 at 01:14:27AM -0400, James Morris ([EMAIL PROTECTED]) 
wrote:
 On Sat, 30 Sep 2006, James Morris wrote:
 
  I've just seen something similar and can recreate it with static keying 
  via setkey.
 
 It's SELinux related.  Things work when the one system in this setup with 
 SELinux enabled is changed to permissive mode.
 
 No audit messages or AVCs, and it's not the /selinux/compat_net setting.

I can confirm that the broken system in my setup does have SELinux enabled 
in enforcing mode.
I've changed it to permissive mode and that fixed the setup (I do not see any 
warnings in dmesg).
 
 - James
 -- 
 James Morris
 [EMAIL PROTECTED]

-- 
Evgeniy Polyakov


/proc/net/tcp6 missing entries

2006-09-30 Thread James Cloos
Has anything changed recently that would affect /proc/net/tcp6?

None of the outgoing tcp/ip6 sockets show up there on my laptop.

I'm currently at Linus' tree as of f164c42161d21368d9cd4d6d6efc158baa2618db
with the then-upstream branch of libata (since merged into Linus' tree) and
Ingo's 18-rt3 patch.

I'd prefer not to waste time bisecting if it is already known and fixed.

-JimC
-- 
James Cloos [EMAIL PROTECTED] OpenPGP: 0xED7DAEA6


Re: [PATCH 0/3] myri10ge Large Receive Offload

2006-09-30 Thread Brice Goglin
Brice Goglin wrote:
 Could we get patch #1 merged anyway (page-based skb allocation)?
   

Oops, actually, one hunk has to be dropped (it reverts the const'ification
of ethtool_ops by mistake). If it is ok to merge the patch apart from
this hunk, I will resend an updated patch.

Brice



Re: Is TCP over IPsec broken in 2.6.18?

2006-09-30 Thread James Morris
On Sat, 30 Sep 2006, Evgeniy Polyakov wrote:

 I can confirm that the broken system in my setup does have SELinux enabled 
 in enforcing mode.
 I've changed it to permissive mode and that fixed the setup (I do not see any 
 warnings in dmesg).

Something better in your case would likely be to rebuild the kernel with 
CONFIG_SECURITY_NETWORK_XFRM=n until it's fixed.


- James
-- 
James Morris
[EMAIL PROTECTED]


Re: Is TCP over IPsec broken in 2.6.18?

2006-09-30 Thread Evgeniy Polyakov
On Sat, Sep 30, 2006 at 10:36:29AM -0400, James Morris ([EMAIL PROTECTED]) 
wrote:
 On Sat, 30 Sep 2006, Evgeniy Polyakov wrote:
 
  I can confirm that the broken system in my setup does have SELinux enabled 
  in enforcing mode.
  I've changed it to permissive mode and that fixed the setup (I do not see any 
  warnings in dmesg).
 
 Something better in your case would likely be to rebuild the kernel with 
 CONFIG_SECURITY_NETWORK_XFRM=n until it's fixed.

Well, it is acrypto test machine and I do not care about security there,
so I can even disable selinux completely, but it will not help to resolve
the issue, right?

So if you have some patches I'm more than happy to test them.
 
 - James
 -- 
 James Morris
 [EMAIL PROTECTED]

-- 
Evgeniy Polyakov


Re: Is TCP over IPsec broken in 2.6.18?

2006-09-30 Thread Evgeniy Polyakov
On Sat, Sep 30, 2006 at 06:40:18PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
 On Sat, Sep 30, 2006 at 10:36:29AM -0400, James Morris ([EMAIL PROTECTED]) 
 wrote:
  On Sat, 30 Sep 2006, Evgeniy Polyakov wrote:
  
   I can confirm that the broken system in my setup does have SELinux enabled 
   in enforcing mode.
   I've changed it to permissive mode and that fixed the setup (I do not see any 
   warnings in dmesg).
  
  Something better in your case would likely be to rebuild the kernel with 
  CONFIG_SECURITY_NETWORK_XFRM=n until it's fixed.
 
 Well, it is acrypto test machine and I do not care about security there,
 so I can even disable selinux completely, but it will not help to resolve
 the issue, right?
 
 So if you have some patches I'm more than happy to test them.

And to confirm the theory about CONFIG_SECURITY_NETWORK_XFRM, I'm compiling
a kernel without it (and then one without SELinux at all).

-- 
Evgeniy Polyakov


Re: Is TCP over IPsec broken in 2.6.18?

2006-09-30 Thread James Morris
On Sat, 30 Sep 2006, Evgeniy Polyakov wrote:

 On Sat, Sep 30, 2006 at 10:36:29AM -0400, James Morris ([EMAIL PROTECTED]) 
 wrote:
  On Sat, 30 Sep 2006, Evgeniy Polyakov wrote:
  
   I can confirm that the broken system in my setup does have SELinux enabled 
   in enforcing mode.
   I've changed it to permissive mode and that fixed the setup (I do not see any 
   warnings in dmesg).
  
  Something better in your case would likely be to rebuild the kernel with 
  CONFIG_SECURITY_NETWORK_XFRM=n until it's fixed.
 
 Well, it is acrypto test machine and I do not care about security there,
 so I can even disable selinux completely, but it will not help to resolve
 the issue, right?

Yes, it is a workaround.

 
 So if you have some patches I'm more than happy to test them.

Ok, coming soon.


-- 
James Morris
[EMAIL PROTECTED]


[NET_SCHED]: Remove old estimator implementation

2006-09-30 Thread Patrick McHardy
[NET_SCHED]: Remove old estimator implementation

Remove unused file, estimators live in net/core/gen_estimator.c now.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 45cb5c100bbb42077eaab1ad6da7976bbe627603
tree e701f213eb170a3ceafbb17c807461b9b821f827
parent bf603625660b1742004bf86432ce3c210d14d4fd
author Patrick McHardy [EMAIL PROTECTED] Sat, 30 Sep 2006 18:32:27 +0200
committer Patrick McHardy [EMAIL PROTECTED] Sat, 30 Sep 2006 18:32:27 +0200

 net/sched/estimator.c |  196 -
 1 files changed, 0 insertions(+), 196 deletions(-)
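
For reference, the EWMA update documented in the header comment of the
removed file, avrate = avrate*(1-W) + rate*W with W = 2^(-ewma_log), can be
written in integer arithmetic as avrate + (rate - avrate)*W. The userspace
sketch below illustrates that formula only and is not the kernel's code:
demo_ewma_step is a hypothetical name, and it uses a truncating division
where kernel code would more likely use a shift.

```c
#include <assert.h>
#include <stdint.h>

/* One EWMA step with weight W = 2^(-ewma_log).
 * avrate*(1-W) + rate*W is rewritten as avrate + (rate - avrate)*W so that
 * a single division by 2^ewma_log is enough. */
static int64_t demo_ewma_step(int64_t avrate, int64_t rate,
			      unsigned int ewma_log)
{
	assert(ewma_log < 32);
	return avrate + (rate - avrate) / ((int64_t)1 << ewma_log);
}
```

With ewma_log = 2 (W = 1/4), a constant input rate of 1024 moves the average
from 0 to 256 and then to 448, approaching 1024 geometrically.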

diff --git a/net/sched/estimator.c b/net/sched/estimator.c
deleted file mode 100644
index 0ebc98e..0000000
--- a/net/sched/estimator.c
+++ /dev/null
@@ -1,196 +0,0 @@
-/*
- * net/sched/estimator.c	Simple rate estimator.
- *
- *		This program is free software; you can redistribute it and/or
- *		modify it under the terms of the GNU General Public License
- *		as published by the Free Software Foundation; either version
- *		2 of the License, or (at your option) any later version.
- *
- * Authors:	Alexey Kuznetsov, [EMAIL PROTECTED]
- */
-
-#include <asm/uaccess.h>
-#include <asm/system.h>
-#include <linux/bitops.h>
-#include <linux/module.h>
-#include <linux/types.h>
-#include <linux/kernel.h>
-#include <linux/jiffies.h>
-#include <linux/string.h>
-#include <linux/mm.h>
-#include <linux/socket.h>
-#include <linux/sockios.h>
-#include <linux/in.h>
-#include <linux/errno.h>
-#include <linux/interrupt.h>
-#include <linux/netdevice.h>
-#include <linux/skbuff.h>
-#include <linux/rtnetlink.h>
-#include <linux/init.h>
-#include <net/sock.h>
-#include <net/pkt_sched.h>
-
-/*
-   This code is NOT intended to be used for statistics collection,
-   its purpose is to provide a base for statistical multiplexing
-   for controlled load service.
-   If you need only statistics, run a user level daemon which
-   periodically reads byte counters.
-
-   Unfortunately, rate estimation is not a very easy task.
-   F.e. I did not find a simple way to estimate the current peak rate
-   and even failed to formulate the problem 8)8)
-
-   So I preferred not to built an estimator into the scheduler,
-   but run this task separately.
-   Ideally, it should be kernel thread(s), but for now it runs
-   from timers, which puts apparent top bounds on the number of rated
-   flows, has minimal overhead on small, but is enough
-   to handle controlled load service, sets of aggregates.
-
-   We measure rate over A=(1<<interval) seconds and evaluate EWMA:
-
-   avrate = avrate*(1-W) + rate*W
-
-   where W is chosen as negative power of 2: W = 2^(-ewma_log)
-
-   The resulting time constant is:
-
-   T = A/(-ln(1-W))
-
-
-   NOTES.
-
-   * The stored value for avbps is scaled by 2^5, so that maximal
- rate is ~1Gbit, avpps is scaled by 2^10.
-
-   * Minimal interval is HZ/4=250msec (it is the greatest common divisor
- for HZ=100 and HZ=1024 8)), maximal interval
- is (HZ*2^EST_MAX_INTERVAL)/4 = 8sec. Shorter intervals
- are too expensive, longer ones can be implemented
- at user level painlessly.
- */
-
-#define EST_MAX_INTERVAL	5
-
-struct qdisc_estimator
-{
-	struct qdisc_estimator	*next;
-	struct tc_stats		*stats;
-	spinlock_t		*stats_lock;
-	unsigned		interval;
-	int			ewma_log;
-	u64			last_bytes;
-	u32			last_packets;
-	u32			avpps;
-	u32			avbps;
-};
-
-struct qdisc_estimator_head
-{
-	struct timer_list	timer;
-	struct qdisc_estimator	*list;
-};
-
-static struct qdisc_estimator_head elist[EST_MAX_INTERVAL+1];
-
-/* Estimator array lock */
-static DEFINE_RWLOCK(est_lock);
-
-static void est_timer(unsigned long arg)
-{
-	int idx = (int)arg;
-	struct qdisc_estimator *e;
-
-	read_lock(&est_lock);
-	for (e = elist[idx].list; e; e = e->next) {
-		struct tc_stats *st = e->stats;
-		u64 nbytes;
-		u32 npackets;
-		u32 rate;
-
-		spin_lock(e->stats_lock);
-		nbytes = st->bytes;
-		npackets = st->packets;
-		rate = (nbytes - e->last_bytes)<<(7 - idx);
-		e->last_bytes = nbytes;
-		e->avbps += ((long)rate - (long)e->avbps) >> e->ewma_log;
-		st->bps = (e->avbps+0xF)>>5;
-
-		rate = (npackets - e->last_packets)<<(12 - idx);
-		e->last_packets = npackets;
-		e->avpps += ((long)rate - (long)e->avpps) >> e->ewma_log;
-		e->stats->pps = (e->avpps+0x1FF)>>10;
-		spin_unlock(e->stats_lock);
-	}
-
-	mod_timer(&elist[idx].timer, jiffies + ((HZ<<idx)/4));
-	read_unlock(&est_lock);
-}
-
-int qdisc_new_estimator(struct tc_stats *stats, spinlock_t *stats_lock, struct rtattr *opt)
-{
-	struct qdisc_estimator *est;
-	struct tc_estimator *parm = RTA_DATA(opt);
-
-	if (RTA_PAYLOAD(opt) < sizeof(*parm))
-		return -EINVAL;
-
-	if (parm->interval < -2 || parm->interval > 3)
-		return -EINVAL;
-
-	est = kzalloc(sizeof(*est), GFP_KERNEL);
-	if (est == NULL)
-		return -ENOBUFS;
-
-	est->interval = parm->interval + 2;
-	est->stats = stats;
-	est->stats_lock = stats_lock;
-	est->ewma_log = parm->ewma_log;
-	est->last_bytes = stats->bytes;
-	est->avbps = stats->bps<<5;
-	est->last_packets = stats->packets;
-	est->avpps = stats->pps<<10;
-
-	est->next = 
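
(The patch text above is cut off in the archive at this point.)

For readers following the header comment of the removed file, the EWMA update
it describes (avrate = avrate*(1-W) + rate*W with W = 2^(-ewma_log), the byte
rate scaled by 2^5) can be sketched in user space roughly as follows. This is
only an illustration under stated assumptions: struct ewma_est and ewma_tick()
are hypothetical names that do not appear in the kernel, and the sketch divides
by 2^ewma_log (rounding toward zero) where the kernel code uses a right shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of one estimator slot; not the kernel struct. */
struct ewma_est {
	uint64_t last_bytes;	/* byte counter seen at the previous tick */
	uint32_t avbps;		/* averaged byte rate, scaled by 2^5 */
	int ewma_log;		/* W = 2^(-ewma_log) */
};

/*
 * One timer tick for interval index idx, where the period is 2^idx / 4
 * seconds (idx 0 = 250 ms, idx 2 = 1 s).  Returns the unscaled average
 * in bytes per second.
 */
static uint32_t ewma_tick(struct ewma_est *e, uint64_t nbytes, int idx)
{
	int64_t rate, diff;

	assert(idx >= 0 && idx <= 5);
	assert(e->ewma_log >= 0 && e->ewma_log <= 30);

	/*
	 * Bytes per period -> bytes per second, scaled by 2^5:
	 * delta * 4 / 2^idx * 32 == delta << (7 - idx).
	 */
	rate = (int64_t)((nbytes - e->last_bytes) << (7 - idx));
	e->last_bytes = nbytes;

	/* avrate += (rate - avrate) * W, which equals avrate*(1-W) + rate*W */
	diff = rate - (int64_t)e->avbps;
	e->avbps = (uint32_t)((int64_t)e->avbps + diff / ((int64_t)1 << e->ewma_log));

	/* Drop the 2^5 scaling, rounding as the removed code does. */
	return (e->avbps + 0xF) >> 5;
}
```

With ewma_log = 1 (W = 1/2) and a one-second period (idx = 2), a steady
1000 bytes per tick moves the average from 0 to 500 and then to 750 bytes per
second, which shows the smoothing the comment refers to.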

Re: [PATCH][BNX2]: Disable MSI on 5706 if AMD 8132 bridge is present

2006-09-30 Thread Michael Chan
On Sat, 2006-09-30 at 12:13 +0200, Brice Goglin wrote:

> What if the machine has such a bridge and board, but the board is not
> actually located somewhere behind the bridge? I would rather walk the
> PCI hierarchy from the board to the top and check whether we find an
> AMD8132. Probably something like:
>

I have considered that.  I have PCI walking code like in tg3 to
determine if certain tg3 devices are behind some ICH or ServerWorks EPB
bridges.  The workarounds needed in those cases have a big impact on
performance and therefore it is important to determine exactly when to
apply those workarounds.

Here in this case, since the difference in performance between MSI and
INTA is very minor and almost negligible in a lot of cases, I decided to
keep it simple and just check for the presence of the 8132.
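
The difference between the two approaches discussed here (checking whether the
bridge exists anywhere in the machine versus walking up from the board) can be
sketched with a toy device tree. This is a user-space illustration only:
struct toy_pci_dev, bridge_present() and is_behind_bridge() are hypothetical
names, and the device IDs in the usage note are placeholders, not the real IDs
of the 8132 or the 5706.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified device record; not the kernel's struct pci_dev. */
struct toy_pci_dev {
	unsigned short vendor;
	unsigned short device;
	const struct toy_pci_dev *upstream;	/* parent bridge, NULL at the root */
};

static int id_matches(const struct toy_pci_dev *d,
		      unsigned short vendor, unsigned short device)
{
	return d->vendor == vendor && d->device == device;
}

/* Simple approach: is such a bridge present anywhere in the system? */
static int bridge_present(const struct toy_pci_dev *const *all, size_t n,
			  unsigned short vendor, unsigned short device)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (id_matches(all[i], vendor, device))
			return 1;
	return 0;
}

/* Precise approach: walk from the board towards the root. */
static int is_behind_bridge(const struct toy_pci_dev *dev,
			    unsigned short vendor, unsigned short device)
{
	const struct toy_pci_dev *p;

	assert(dev != NULL);
	for (p = dev->upstream; p != NULL; p = p->upstream)
		if (id_matches(p, vendor, device))
			return 1;
	return 0;
}
```

For a board that sits under a different root, bridge_present() still reports 1
while is_behind_bridge() reports 0, which is the case Brice raises. The reply
above argues that the extra precision is not worth it here because the
MSI/INTA performance difference is minor.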



[PATCH] Revert [NET_SCHED]: HTB: fix incorrect use of RB_EMPTY_NODE

2006-09-30 Thread Ismail Donmez
Hi,

With commit 10fd48f2376db52f08bf0420d2c4f580e39269e1 [1] ,  RB_EMPTY_NODE 
changed behaviour so it returns false when the node is empty as expected. 
Hence Herbert's fix for sched_htb.c should be reverted.

[1] 
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=10fd48f2376db52f08bf0420d2c4f580e39269e1;hp=9817064b68fef7e4580c6df1ea597e106b9ff88b

Regards,
ismail

-- 
They that can give up essential liberty to obtain a little temporary safety 
deserve neither liberty nor safety.
-- Benjamin Franklin
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 6c058e3..1f1360e 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -391,7 +391,7 @@ static inline void htb_add_class_to_row(
 /* If this triggers, it is a bug in this code, but it need not be fatal */
 static void htb_safe_rb_erase(struct rb_node *rb, struct rb_root *root)
 {
-	if (!RB_EMPTY_NODE(rb)) {
+	if (RB_EMPTY_NODE(rb)) {
 		WARN_ON(1);
 	} else {
 		rb_erase(rb, root);


Re: [PATCH] Revert [NET_SCHED]: HTB: fix incorrect use of RB_EMPTY_NODE

2006-09-30 Thread Ismail Donmez
On Saturday 30 September 2006 22:23, you wrote:
> Hi,
>
> With commit 10fd48f2376db52f08bf0420d2c4f580e39269e1 [1], RB_EMPTY_NODE
> changed behaviour so it returns false when the node is empty as expected.
                                  ^^^^^
Make that: "so it returns true when the node is empty as expected".
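
To make the corrected semantics concrete, here is a small user-space sketch.
It assumes, as I understand the rbtree change, that a node counts as "empty"
(not linked into any tree) when its parent pointer refers to the node itself.
The toy_ names are hypothetical and the struct is a simplification; the real
struct rb_node packs the parent pointer and colour into one word.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct rb_node (assumption, see above). */
struct toy_rb_node {
	struct toy_rb_node *parent;
	struct toy_rb_node *left;
	struct toy_rb_node *right;
};

/* Mark a node as unlinked by pointing its parent at itself. */
static void toy_rb_clear_node(struct toy_rb_node *n)
{
	n->parent = n;
}

/* True when the node is NOT linked into a tree. */
static int toy_rb_empty_node(const struct toy_rb_node *n)
{
	return n->parent == n;
}

/*
 * Mirrors the shape of htb_safe_rb_erase() after the patch: erasing an
 * unlinked node is the warning case.  Returns 1 if the node was erased,
 * 0 if it was already unlinked.
 */
static int toy_safe_erase(struct toy_rb_node *n)
{
	assert(n != NULL);
	if (toy_rb_empty_node(n))
		return 0;		/* the WARN_ON(1) branch */
	/* A real tree would unlink and rebalance here. */
	toy_rb_clear_node(n);
	return 1;
}
```

With this meaning, the test in htb_safe_rb_erase() has to be
"if (RB_EMPTY_NODE(rb))" without the negation, which is what the patch
restores.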

/ismail
-- 
They that can give up essential liberty to obtain a little temporary safety 
deserve neither liberty nor safety.
-- Benjamin Franklin


Re: 2.6.18-mm2

2006-09-30 Thread Andrew Morton
On Sat, 30 Sep 2006 15:37:06 +0200
Tobias Diedrich [EMAIL PROTECTED] wrote:

> Andrew Morton wrote:
>
> > - More updates to the MSI code.  If your machine has Message Signalled
> >   Interrupts, please enable it and give it a try.
>
> I'm happy to report that with 2.6.18-mm2 suspend to disk works for
> me without additional patches, tested both with MSI interrupts
> disabled and enabled (forcedeth driver).

Thanks.

Which kernel version(s) didn't work?  -mm1?  Mainline?


[BUG 2.6.18] unaligned access in ipvv6_rcv, nf_ip6_checksum, tcp_error, __ipv6_addr_type, fib6_lookup_1

2006-09-30 Thread Harald Welte
Hi!

I've just built 2.6.18 on a sparc64 box (Ultra 5) using gcc-3.3.5
(debian woody).  After booting the kernel, I get tons of unaligned
access messages related to various bits of the IPv6 code:

Kernel unaligned access at TPC[10022cf0] ipv6_rcv+0xb8/0x320 [ipv6]
Kernel unaligned access at TPC[10023800] __ipv6_addr_type+0x8/0x140 [ipv6]
Kernel unaligned access at TPC[1002fd64] fib6_lookup_1+0x2c/0x120 [ipv6]
Kernel unaligned access at TPC[10093878] tcp_error+0x40/0x2c0 [nf_conntrack]
Kernel unaligned access at TPC[1004ce54] nf_ip6_checksum+0x13c/0x1c0 [ipv6]
Kernel unaligned access at TPC[1004ce58] nf_ip6_checksum+0x140/0x1c0 [ipv6]
Kernel unaligned access at TPC[1004ce60] nf_ip6_checksum+0x148/0x1c0 [ipv6]

As I'm not really familiar with sparc assembly, the usual strategy of
looking at the objdump output, working out what it does, and matching it
to the corresponding source code doesn't work for me in this case.

There could be 7 different bugs, but I think it's more likely that some
common data structure is misaligned and thus causes unaligned accesses
all over the place.
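
As general background on what such a message means (not a diagnosis of this
report): an unaligned access happens when code loads a multi-byte value
through a pointer that is not a multiple of the value's size, which some
architectures trap on. A user-space sketch of the two usual ways to read a
32-bit field at an arbitrary offset without such a load is shown below;
load_u32_unaligned() and load_be32() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value in host byte order from a possibly unaligned
 * address.  memcpy() lets the compiler pick a safe access sequence.
 */
static uint32_t load_u32_unaligned(const unsigned char *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Read a 32-bit big-endian (network order) value one byte at a time. */
static uint32_t load_be32(const unsigned char *p)
{
	assert(p != NULL);
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

For example, reading the bytes 20 01 0d b8 at offset 1 of a buffer with
load_be32() gives 0x20010db8 regardless of the buffer's alignment, whereas
casting buf + 1 to a uint32_t pointer and dereferencing it is the kind of load
that can trap.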

I've put the ipv6.ko and nf_conntrack.ko modules online at
http://people.netfilter.org/laforge/tmp/2618_bug/

This could be a known issue, but I couldn't find any reference to it on
netdev, sparclinux or via google. 

Any assistance is appreciated, thanks!

-- 
- Harald Welte [EMAIL PROTECTED]  http://gnumonks.org/

We all know Linux is great...it does infinite loops in 5 seconds. -- Linus




Re: [BUG 2.6.18] unaligned access in ipvv6_rcv, nf_ip6_checksum, tcp_error, __ipv6_addr_type, fib6_lookup_1

2006-09-30 Thread David Miller
From: Harald Welte [EMAIL PROTECTED]
Date: Sat, 30 Sep 2006 22:20:40 +0200

> Kernel unaligned access at TPC[10022cf0] ipv6_rcv+0xb8/0x320 [ipv6]
> Kernel unaligned access at TPC[10023800] __ipv6_addr_type+0x8/0x140 [ipv6]
> Kernel unaligned access at TPC[1002fd64] fib6_lookup_1+0x2c/0x120 [ipv6]
> Kernel unaligned access at TPC[10093878] tcp_error+0x40/0x2c0 [nf_conntrack]
> Kernel unaligned access at TPC[1004ce54] nf_ip6_checksum+0x13c/0x1c0 [ipv6]
> Kernel unaligned access at TPC[1004ce58] nf_ip6_checksum+0x140/0x1c0 [ipv6]
> Kernel unaligned access at TPC[1004ce60] nf_ip6_checksum+0x148/0x1c0 [ipv6]

Thanks Harald, I'll take a look when I get back from Seattle
Sunday night.  For the time being you can simply comment
out the printk statement in arch/sparc64/kernel/unaligned.c
if this stuff is filling up your logs.


Re: [PATCH 0/3] myri10ge Large Receive Offload

2006-09-30 Thread Brice Goglin
Evgeniy Polyakov wrote:
> On Sat, Sep 30, 2006 at 12:16:44AM +0200, Brice Goglin ([EMAIL PROTECTED]) wrote:
> > Jeff Garzik wrote:
> > > Brice Goglin wrote:
> > > > The complete driver code in our CVS actually also supports high-order
> > > > allocations instead of single physical pages since it significantly
> > > > increases performance. Order=2 allows us to receive standard frames
> > > > at line rate even on low-end hardware such as an AMD Athlon(tm) 64 X2
> > > > Dual Core Processor 3800+ (2.0GHz). Some customers might not care a lot
> > > > about memory fragmentation if the performance is better.
> > > >
> > > > But, since high-order allocations are generally considered a bad idea,
> > > > we do not include the relevant code in the following patch for inclusion
> > > > in Linux. Here, we simply pass order=0 to all page allocation routines.
> > > > If necessary, I could drop the remaining reference to high-order
> > > > (especially replace alloc_pages() with alloc_page()) but I'd rather
> > > > keep it as is.
> > > >
> > > > If high-order allocations are ever considered OK under some
> > > > circumstances, we could send an additional patch (a module parameter
> > > > would be used to switch from single physical pages to high-order pages).
> >
> > Any comments about what I was saying about high-order allocations above?
>
> It is quite strange that you see a very noticeable speed degradation after
> switching from higher-order to order-0 allocations. Could you specify where
> the observed bottleneck in the network stack is?


The bottleneck is not in the network stack; it is simply related to the
number of page allocations that are required. Since we store multiple
fragments in the same page, if MTU=1500, we need one order-0 allocation
every 2 fragments, while we need one order-2 allocation every 8
fragments. IIRC, we observed about 20% higher throughput on the receive
side when switching from order=0 to order=2 (7.5Gbit/s -> 9.3Gbit/s with
roughly the same CPU usage).
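
The allocation counts quoted above can be reproduced with a few lines of
arithmetic. The sketch below assumes 4 KiB pages and a 2 KiB slot per
MTU-1500 frame; both values are assumptions chosen to match the "2 fragments
per order-0 page" figure, not numbers taken from the driver, and the function
names are hypothetical.

```c
#include <assert.h>

/* Assumed sizes, chosen to match the figures in the message above. */
#define TOY_PAGE_BYTES	4096u
#define TOY_SLOT_BYTES	2048u

/* Number of receive fragments that fit in one allocation of this order. */
static unsigned int frags_per_alloc(unsigned int order)
{
	assert(order <= 10);
	return (TOY_PAGE_BYTES << order) / TOY_SLOT_BYTES;
}

/* Page allocations needed to receive nframes frames, rounded up. */
static unsigned long allocs_needed(unsigned long nframes, unsigned int order)
{
	unsigned int per = frags_per_alloc(order);

	return (nframes + per - 1) / per;
}
```

Under these assumptions one million frames need 500000 order-0 allocations
but only 125000 order-2 allocations, a fourfold reduction in calls to the
page allocator.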

Brice



Re: [PATCH] Revert [NET_SCHED]: HTB: fix incorrect use of RB_EMPTY_NODE

2006-09-30 Thread Herbert Xu
On Sat, Sep 30, 2006 at 10:23:46PM +0300, Ismail Donmez wrote:
 
> With commit 10fd48f2376db52f08bf0420d2c4f580e39269e1 [1], RB_EMPTY_NODE
> changed behaviour so it returns false when the node is empty as expected.
> Hence Herbert's fix for sched_htb.c should be reverted.

I've fixed sched_htb.c? That's news to me :)

I fully agree with your patch though.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[patch 2/2] bluetooth: use GFP_ATOMIC in *_sock_create's sk_alloc

2006-09-30 Thread akpm
From: Frederik Deweerdt [EMAIL PROTECTED]

I think that the bluetooth-guard-bt_proto-with-rwlock.patch introduced the
following BUG:
[   43.232000] BUG: sleeping function called from invalid context at mm/slab.c:2903
[   43.232000] in_atomic():1, irqs_disabled():0
[   43.232000]  [<c0104114>] show_trace_log_lvl+0x197/0x1ba
[   43.232000]  [<c010415e>] show_trace+0x27/0x29
[   43.232000]  [<c010426e>] dump_stack+0x26/0x28
[   43.232000]  [<c011ad1c>] __might_sleep+0xa2/0xaa
[   43.232000]  [<c0173085>] __kmalloc+0x9c/0xb3
[   43.232000]  [<c02f9295>] sk_alloc+0x1bc/0x1de
[   43.232000]  [<c036d689>] hci_sock_create+0x42/0x8a
[   43.236000]  [<c0366f40>] bt_sock_create+0xb5/0x154
[   43.236000]  [<c02f69dc>] __sock_create+0x131/0x356
[   43.236000]  [<c02f6c2f>] sock_create+0x2e/0x30
[   43.236000]  [<c02f6c88>] sys_socket+0x27/0x53
[   43.240000]  [<c02f7db5>] sys_socketcall+0xa9/0x277
[   43.240000]  [<c0103131>] sysenter_past_esp+0x56/0x8d
[   43.240000]  [<b7f38410>] 0xb7f38410

This patch makes the socket allocation GFP_ATOMIC in the following
functions, because they are called with bt_proto_rwlock held:
- bnep_sock_create
- cmtp_sock_create
- hci_sock_create
- hidp_sock_create
- l2cap_sock_create
- rfcomm_sock_create
- sco_sock_create

Signed-off-by: Frederik Deweerdt [EMAIL PROTECTED]
Cc: Masatake YAMATO [EMAIL PROTECTED]
Cc: Marcel Holtmann [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 net/bluetooth/bnep/sock.c   |2 +-
 net/bluetooth/cmtp/sock.c   |2 +-
 net/bluetooth/hci_sock.c|2 +-
 net/bluetooth/hidp/sock.c   |2 +-
 net/bluetooth/l2cap.c   |2 +-
 net/bluetooth/rfcomm/sock.c |2 +-
 net/bluetooth/sco.c |2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff -puN net/bluetooth/bnep/sock.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc net/bluetooth/bnep/sock.c
--- a/net/bluetooth/bnep/sock.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc
+++ a/net/bluetooth/bnep/sock.c
@@ -181,7 +181,7 @@ static int bnep_sock_create(struct socke
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
 
-	sk = sk_alloc(PF_BLUETOOTH, GFP_KERNEL, &bnep_proto, 1);
+	sk = sk_alloc(PF_BLUETOOTH, GFP_ATOMIC, &bnep_proto, 1);
 	if (!sk)
 		return -ENOMEM;
 
diff -puN net/bluetooth/cmtp/sock.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc net/bluetooth/cmtp/sock.c
--- a/net/bluetooth/cmtp/sock.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc
+++ a/net/bluetooth/cmtp/sock.c
@@ -172,7 +172,7 @@ static int cmtp_sock_create(struct socke
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
 
-	sk = sk_alloc(PF_BLUETOOTH, GFP_KERNEL, &cmtp_proto, 1);
+	sk = sk_alloc(PF_BLUETOOTH, GFP_ATOMIC, &cmtp_proto, 1);
 	if (!sk)
 		return -ENOMEM;
 
diff -puN net/bluetooth/hci_sock.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc net/bluetooth/hci_sock.c
--- a/net/bluetooth/hci_sock.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc
+++ a/net/bluetooth/hci_sock.c
@@ -618,7 +618,7 @@ static int hci_sock_create(struct socket
 
 	sock->ops = &hci_sock_ops;
 
-	sk = sk_alloc(PF_BLUETOOTH, GFP_KERNEL, &hci_sk_proto, 1);
+	sk = sk_alloc(PF_BLUETOOTH, GFP_ATOMIC, &hci_sk_proto, 1);
 	if (!sk)
 		return -ENOMEM;
 
diff -puN net/bluetooth/hidp/sock.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc net/bluetooth/hidp/sock.c
--- a/net/bluetooth/hidp/sock.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc
+++ a/net/bluetooth/hidp/sock.c
@@ -178,7 +178,7 @@ static int hidp_sock_create(struct socke
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
 
-	sk = sk_alloc(PF_BLUETOOTH, GFP_KERNEL, &hidp_proto, 1);
+	sk = sk_alloc(PF_BLUETOOTH, GFP_ATOMIC, &hidp_proto, 1);
 	if (!sk)
 		return -ENOMEM;
 
diff -puN net/bluetooth/l2cap.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc net/bluetooth/l2cap.c
--- a/net/bluetooth/l2cap.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc
+++ a/net/bluetooth/l2cap.c
@@ -559,7 +559,7 @@ static int l2cap_sock_create(struct sock
 
 	sock->ops = &l2cap_sock_ops;
 
-	sk = l2cap_sock_alloc(sock, protocol, GFP_KERNEL);
+	sk = l2cap_sock_alloc(sock, protocol, GFP_ATOMIC);
 	if (!sk)
 		return -ENOMEM;
 
diff -puN net/bluetooth/rfcomm/sock.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc net/bluetooth/rfcomm/sock.c
--- a/net/bluetooth/rfcomm/sock.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc
+++ a/net/bluetooth/rfcomm/sock.c
@@ -336,7 +336,7 @@ static int rfcomm_sock_create(struct soc
 
 	sock->ops = &rfcomm_sock_ops;
 
-	if (!(sk = rfcomm_sock_alloc(sock, protocol, GFP_KERNEL)))
+	if (!(sk = rfcomm_sock_alloc(sock, protocol, GFP_ATOMIC)))
 		return -ENOMEM;
 
 	rfcomm_sock_init(sk, NULL);
diff -puN net/bluetooth/sco.c~bluetooth-use-gfp_atomic-in-_sock_creates-sk_alloc 

[patch 1/2] bluetooth: guard bt_proto with rwlock

2006-09-30 Thread akpm
From: Masatake YAMATO [EMAIL PROTECTED]

I found that bt_proto, which is manipulated in bt_sock_register, is not
protected against race conditions.

Look at net/bluetooth/af_bluetooth.c:

static struct net_proto_family *bt_proto[BT_MAX_PROTO];

int bt_sock_register(int proto, struct net_proto_family *ops)
{
	if (proto < 0 || proto >= BT_MAX_PROTO)
		return -EINVAL;

	if (bt_proto[proto])
		return -EEXIST;

	bt_proto[proto] = ops;
	return 0;
}

Here bt_proto[proto] is set.

On the other hand,

static int bt_sock_create(struct socket *sock, int proto)
{
	int err = 0;

	if (proto < 0 || proto >= BT_MAX_PROTO)
		return -EINVAL;

#if defined(CONFIG_KMOD)
	if (!bt_proto[proto]) {
		request_module("bt-proto-%d", proto);
	}
#endif
	err = -EPROTONOSUPPORT;
	if (bt_proto[proto] && try_module_get(bt_proto[proto]->owner)) {
		err = bt_proto[proto]->create(sock, proto);
		module_put(bt_proto[proto]->owner);
	}
	return err;
}

Here bt_proto[proto] is read.

So I wrote a patch which guards bt_proto with an rwlock.

Signed-off-by: Masatake YAMATO [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 net/bluetooth/af_bluetooth.c |   33 +
 1 file changed, 25 insertions(+), 8 deletions(-)

diff -puN net/bluetooth/af_bluetooth.c~bluetooth-guard-bt_proto-with-rwlock net/bluetooth/af_bluetooth.c
--- a/net/bluetooth/af_bluetooth.c~bluetooth-guard-bt_proto-with-rwlock
+++ a/net/bluetooth/af_bluetooth.c
@@ -26,6 +26,7 @@
 
 #include <linux/module.h>
 
+#include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/list.h>
 #include <linux/errno.h>
@@ -53,30 +54,44 @@
 /* Bluetooth sockets */
 #define BT_MAX_PROTO   8
 static struct net_proto_family *bt_proto[BT_MAX_PROTO];
+static DEFINE_RWLOCK(bt_proto_rwlock);
 
 int bt_sock_register(int proto, struct net_proto_family *ops)
 {
+	int err;
+
 	if (proto < 0 || proto >= BT_MAX_PROTO)
 		return -EINVAL;
 
-	if (bt_proto[proto])
-		return -EEXIST;
+	err = -EEXIST;
 
-	bt_proto[proto] = ops;
-	return 0;
+	write_lock(&bt_proto_rwlock);
+	if (bt_proto[proto] == NULL) {
+		err = 0;
+		bt_proto[proto] = ops;
+	}
+	write_unlock(&bt_proto_rwlock);
+
+	return err;
 }
 EXPORT_SYMBOL(bt_sock_register);
 
 int bt_sock_unregister(int proto)
 {
+	int err;
+
 	if (proto < 0 || proto >= BT_MAX_PROTO)
 		return -EINVAL;
 
-	if (!bt_proto[proto])
-		return -ENOENT;
+	err = -ENOENT;
+	write_lock(&bt_proto_rwlock);
+	if (bt_proto[proto]) {
+		err = 0;
+		bt_proto[proto] = NULL;
+	}
+	write_unlock(&bt_proto_rwlock);
 
-	bt_proto[proto] = NULL;
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(bt_sock_unregister);
 
@@ -93,10 +108,12 @@ static int bt_sock_create(struct socket 
 	}
 #endif
 	err = -EPROTONOSUPPORT;
+	read_lock(&bt_proto_rwlock);
 	if (bt_proto[proto] && try_module_get(bt_proto[proto]->owner)) {
 		err = bt_proto[proto]->create(sock, proto);
 		module_put(bt_proto[proto]->owner);
 	}
+	read_unlock(&bt_proto_rwlock);
 	return err; 
 }
 
_


netdev-2.6.git frozen

2006-09-30 Thread Jeff Garzik
Similar to David's recent announcement, netdev-2.6.git is now closed to 
new features.


Though really, [my fault] I should have posted this as soon as the merge 
window opened.  The stuff that goes into each new release, when the 
merge window opens, should be stuff that has already been through a 
round of testing in -mm.


The updates presented in netdev -- e100, e1000, ixgb and sky2 -- will go 
upstream tomorrow.  Everything else will be fixed.


And the message for the future is:  don't wait until the merge window 
opens, to submit your test.  Doing so means it doesn't get the normal 
amount of testing and review, and is thus discouraged.


Jeff




Re: netdev-2.6.git frozen

2006-09-30 Thread Jeff Garzik

Jeff Garzik wrote:
> Similar to David's recent announcement, netdev-2.6.git is now closed to
> new features.
>
> Though really, [my fault] I should have posted this as soon as the merge
> window opened.  The stuff that goes into each new release, when the
> merge window opens, should be stuff that has already been through a
> round of testing in -mm.
>
> The updates presented in netdev -- e100, e1000, ixgb and sky2 -- will go
> upstream tomorrow.  Everything else will be fixed.

s/fixed/fixes/

I wish everything was fixed, after tomorrow :)

> And the message for the future is:  don't wait until the merge window
> opens, to submit your test.  Doing so means it doesn't get the normal

s/test/code/

Apparently I need my morning Pepsi.

> amount of testing and review, and is thus discouraged.

