Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates

2007-09-12 Thread Bill Fink
On Fri, 07 Sep 2007, jamal wrote:

 On Fri, 2007-07-09 at 10:31 +0100, James Chapman wrote:
  Not really. I used 3-year-old, single CPU x86 boxes with e100 
  interfaces. 
  The idle poll change keeps them in polled mode. Without idle 
  poll, I get twice as many interrupts as packets, one for txdone and one 
  for rx. NAPI is continuously scheduled in/out.
 
 Certainly faster than the machine in the paper (which was about 2 years
 old in 2005).
 I could never get ping -f to do that for me - so things must be getting
 worse with newer machines then.
 
  No. Since I did a flood ping from the machine under test, the improved 
  latency meant that the ping response was handled more quickly, causing 
  the next packet to be sent sooner. So more packets were transmitted in 
  the allotted time (10 seconds).
 
 ok.
 
  With current NAPI:
  rtt min/avg/max/mdev = 0.902/1.843/101.727/4.659 ms, pipe 9, ipg/ewma 
  1.611/1.421 ms
  
  With idle poll changes:
  rtt min/avg/max/mdev = 0.898/1.117/28.371/0.689 ms, pipe 3, ipg/ewma 
  1.175/1.236 ms
 
 Not bad in terms of latency. The deviation certainly looks better.
 
  But the CPU has done more work. 
 
 I am going to be the devil's advocate[1]:

So let me be the angel's advocate.  :-)

 If the problem i am trying to solve is reduce cpu use at lower rate,
 then this is not the right answer because your cpu use has gone up.
 Your latency numbers have not improved that much (looking at the avg)
 and your throughput is not that much higher. Will i be willing to pay
 more cpu (of an already piggish cpu use by NAPI at that rate with 2
 interrupts per packet)?

I view his results much more favorably.  With current NAPI, the average
RTT is 104% higher than the minimum, the deviation is 4.659 ms, and the
maximum RTT is 101.727 ms.  With his patch, the average RTT is only 24%
higher than the minimum, the deviation is only 0.689 ms, and the maximum
RTT is 28.371 ms.  The average RTT improved by 39%, the deviation was
6.8 times smaller, and the maximum RTT was 3.6 times smaller.  So in
every respect the latency was significantly better.

The throughput increased from 6200 packets to 8510 packets or an increase
of 37%.  The only negative is that the CPU utilization increased from
62% to 100% or an increase of 61%, so the CPU increase was greater than
the increase in the amount of work performed (17.6% greater than what
one would expect purely from the increased amount of work).
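
The percentages above can be re-derived from the quoted ping output with
a few lines of C.  This is only a sketch to check the arithmetic: the
helper names pct_change() and ratio() are made up for illustration, and
the inputs are the figures quoted earlier in this thread.

```c
/* Sketch: re-derive the comparison figures from the quoted ping results.
 * pct_change() and ratio() are illustrative helpers, not from any patch. */

/* Percentage by which `to` exceeds `from` (negative if it is lower). */
double pct_change(double from, double to)
{
	return (to - from) / from * 100.0;
}

/* How many times smaller `after` is than `before`. */
double ratio(double before, double after)
{
	return before / after;
}
```

For example, pct_change(0.902, 1.843) gives roughly 104 (average versus
minimum RTT with current NAPI), pct_change(6200, 8510) roughly 37, and
ratio(4.659, 0.689) roughly 6.8.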

You can't always improve every metric of a workload.  Sometimes there
are tradeoffs, and the decision belongs to the user, based on what's most
important to that user and his specific workload.  The suggested
ethtool option (defaulting to current behavior) would enable the user
to make that decision.

-Bill

P.S.  I agree that some tests run in parallel with some CPU hogs also
  running might be beneficial and enlightening.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add IP1000A Driver

2007-09-12 Thread Stephen Hemminger
On Wed, 12 Sep 2007 13:35:43 +0800
黃建興-Jesse [EMAIL PROTECTED] wrote:

 
  -Original Message-
  From: Stephen Hemminger [mailto:[EMAIL PROTECTED] 
  Sent: Tuesday, September 11, 2007 10:42 PM
  To: Jesse Huang
  Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; netdev@vger.kernel.org;
 [EMAIL PROTECTED]
  Subject: Re: [PATCH] Add IP1000A Driver
 
 
  Who will be listed as maintainer of this device?
  A good way to show that is to add an entry to MAINTAINERS file.
 
 
 OK, should I generate a patch to modify the MAINTAINERS file?

Yes, can be included with patch or separate, it doesn't matter.

 
  + * Current Maintainer:
  + *
  + *   Sorbica Shieh.
  + *   10F, No.47, Lane 2, Kwang-Fu RD.
  + *   Sec. 2, Hsin-Chu, Taiwan, R.O.C.
  + *   http://www.icplus.com.tw
  + *   [EMAIL PROTECTED]
  + */
 
  Names only, no physical addresses please.
 
 Should I remove those two lines?
 10F, No.47, Lane 2, Kwang-Fu RD.
 Sec. 2, Hsin-Chu, Taiwan, R.O.C.

It is your option, but many times people and companies move locations
and this gets out of date.


[PATCH 1/2] dgrs: remove from build, config, and maintainer list

2007-09-12 Thread Nathanael Nerode
From: Nathanael Nerode

Stop building and configuring driver for Digi RightSwitch, which was 
never actually sold to anyone, and remove it from MAINTAINERS.

In response to an investigation into the firmware of the Digi Rightswitch 
driver, Andres Salomon discovered:

 Dear Andres:

 After further research, we found that this product was killed in place
 and never reached the market.  We would like to request that this not be
 included.  

Since the product never reached market, clearly nobody is using this orphaned 
driver.

Signed-off-by: Nathanael Nerode [EMAIL PROTECTED]

---

This is patch 1 of 2 for removing the Digi Rightswitch (dgrs).

Patch 2 would be the patch to remove the actual files.  However, that would
be around 400K, which doesn't seem suitable for a mailing list -- and this 
length seems quite unnecessary, given that it would consist solely of full-file 
deletions.  I'm not quite sure what to do about this.  Please advise.

These are the files to be deleted:
./Documentation/networking/dgrs.txt
./drivers/net/dgrs.c
./drivers/net/dgrs.h
./drivers/net/dgrs_asstruct.h
./drivers/net/dgrs_bcomm.h
./drivers/net/dgrs_es4h.h
./drivers/net/dgrs_ether.h
./drivers/net/dgrs_firmware.c (this is the very large one)
./drivers/net/dgrs_i82596.h
./drivers/net/dgrs_plx9060.h

diff -upr linux-2.6.22.6/drivers/net/Kconfig linux-2.6-deleted/drivers/net/Kconfig
--- linux-2.6.22.6/drivers/net/Kconfig  2007-08-31 02:21:01.0 -0400
+++ linux-2.6-deleted/drivers/net/Kconfig   2007-09-12 03:28:11.0 -0400
@@ -1447,21 +1447,6 @@ config TC35815
 	depends on NET_PCI && PCI && MIPS
 	select MII
 
-config DGRS
-	tristate "Digi Intl. RightSwitch SE-X support"
-	depends on NET_PCI && (PCI || EISA)
-	---help---
-	  This is support for the Digi International RightSwitch series of
-	  PCI/EISA Ethernet switch cards. These include the SE-4 and the SE-6
-	  models.  If you have a network card of this type, say Y and read the
-	  Ethernet-HOWTO, available from
-	  <http://www.tldp.org/docs.html#howto>.  More specific
-	  information is contained in <file:Documentation/networking/dgrs.txt>.
-
-	  To compile this driver as a module, choose M here and read
-	  <file:Documentation/networking/net-modules.txt>.  The module
-	  will be called dgrs.
-
 config EEPRO100
 	tristate "EtherExpressPro/100 support (eepro100, original Becker driver)"
 	depends on NET_PCI && PCI
diff -upr linux-2.6.22.6/drivers/net/Makefile linux-2.6-deleted/drivers/net/Makefile
--- linux-2.6.22.6/drivers/net/Makefile 2007-08-31 02:21:01.0 -0400
+++ linux-2.6-deleted/drivers/net/Makefile  2007-09-12 03:28:31.0 -0400
@@ -38,7 +38,6 @@ obj-$(CONFIG_CASSINI) += cassini.o
 obj-$(CONFIG_MACE) += mace.o
 obj-$(CONFIG_BMAC) += bmac.o
 
-obj-$(CONFIG_DGRS) += dgrs.o
 obj-$(CONFIG_VORTEX) += 3c59x.o
 obj-$(CONFIG_TYPHOON) += typhoon.o
 obj-$(CONFIG_NE2K_PCI) += ne2k-pci.o 8390.o
diff -upr linux-2.6.22.6/MAINTAINERS linux-2.6-deleted/MAINTAINERS
--- linux-2.6.22.6/MAINTAINERS  2007-08-31 02:21:01.0 -0400
+++ linux-2.6-deleted/MAINTAINERS   2007-09-12 03:27:26.0 -0400
@@ -1234,12 +1234,6 @@ L:   [EMAIL PROTECTED]
 W: http://www.digi.com
 S: Orphaned
 
-DIGI RIGHTSWITCH NETWORK DRIVER
-P: Rick Richardson
-L: netdev@vger.kernel.org
-W: http://www.digi.com
-S: Orphaned
-
 DIRECTORY NOTIFICATION
 P: Stephen Rothwell
 M: [EMAIL PROTECTED]

-- 
Nathanael Nerode  [EMAIL PROTECTED]

[Insert famous quote here]


Re: [PATCH 0/2] Clean up owner field in sock_lock_t

2007-09-12 Thread David Miller
From: John Heffner [EMAIL PROTECTED]
Date: Tue, 11 Sep 2007 14:01:31 -0400

 I don't know why the owner field is a (struct sock_iocb *).  I'm assuming
 it's historical.  Can someone check this out?  Did I miss some alternate
 usage?

AIO used it somehow in net/socket.c and I believe there was some
intention to access this sock_iocb deeper in the call chain.

None of that materialized of course :)

 These patches are against net-2.6.24.

Thanks a lot, I'll add these patches.


Re: [PATCH]: xfrm audit calls

2007-09-12 Thread David Miller
From: Joy Latten [EMAIL PROTECTED]
Date: Tue, 11 Sep 2007 19:03:14 -0500

 This patch modifies the current ipsec audit layer
 by breaking it up into purpose driven audit calls.
 
 So far, the only audit calls made are when adding/deleting
 an SA/policy. It had been discussed to give each
 key manager its own calls to do this, but I found
 there to be much redundancy since they did the exact
 same things, except for how they got auid and sid, so I 
 combined them. The below audit calls can be made by any 
 key manager. Hopefully, this is ok.
 
 I compiled and tested with CONFIG_AUDITSYSCALLS on and off.
 
 Signed-off-by: Joy Latten [EMAIL PROTECTED]

Patch applied, thanks!


Re: [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue

2007-09-12 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Tue, 11 Sep 2007 14:56:13 +0200

 When the periodic IP route cache flush is done (every 600 seconds on 
 default configuration), some hosts suffer a lot and eventually trigger
 the soft lockup message.
 
 dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
 eventually freeing some (less than 1%) of them, while holding the 
 dst_lock spinlock for the whole scan.
 
 Then it rearms a timer to redo the full thing 1/10 s later...
 The slowdown can last one minute or so, depending on how active
 the TCP sessions are.
 
 This second version of the patch converts the processing from a softirq
 based one to a workqueue.
 
 Even if the list of entries in garbage_list is huge, host is still
 responsive to softirqs and can make progress.
 
 Instead of resetting gc timer to 0.1 second if one entry was freed in a
 gc run, we do this if more than 10% of entries were freed.
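
The change of rule described in the last paragraph can be sketched as
follows.  This is a hedged illustration, not the code from the patch:
the function names are hypothetical, and it assumes that "entries" means
the entries examined during one gc run.

```c
/* Sketch of the two "reset the gc timer to its minimum" rules.
 * Names are hypothetical; `scanned` is assumed to be the number of
 * entries examined in one gc run, `freed` the number released. */

/* Old rule: a single freed entry is enough to reset the timer. */
int reset_gc_timer_old(unsigned long freed, unsigned long scanned)
{
	(void)scanned;
	return freed > 0;
}

/* New rule: reset only if more than 10% of the entries were freed. */
int reset_gc_timer_new(unsigned long freed, unsigned long scanned)
{
	return freed > scanned / 10;
}
```

With a garbage list of 1000 entries of which one is freed, the old rule
rearms the short timer while the new rule lets the interval grow.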

I like this patch a lot, some minor fix is needed though:

 + __builtin_prefetch(next->next, 1, 0);

Please use prefetch() instead of a direct explicit
call to a gcc-specific routine :-)


Re: [PATCH 01/16] appletalk: In notifier handlers convert the void pointer to a netdevice

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:09:36 -0600

 
 This slightly improves code safety and clarity.
 
 Later network namespace patches touch this code so this is a
 preliminary cleanup.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24


Re: [PATCH 02/16] net: Don't implement dev_ifname32 inline

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:13:04 -0600

 
 The current implementation of dev_ifname makes maintenance difficult
 because updates to the implementation of the ioctl have to be made in two
 places.  So this patch updates dev_ifname32 to do a classic 32/64
 structure conversion and call sys_ioctl like the rest of the
 compat calls do.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.


Re: [PATCH 03/16] net: Basic network namespace infrastructure.

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:15:34 -0600

 
 This is the basic infrastructure needed to support network
 namespaces.  This infrastructure is:
 - Registration functions to support initializing per network
   namespace data when a network namespace is created or destroyed.
 
 - struct net.  The network namespace data structure.
   This structure will grow as variables are made per network
   namespace but this is the minimal starting point.
 
 - Functions to grab a reference to the network namespace.
   I provide both get/put functions that keep a network namespace
   from being freed, and hold/release functions that serve as weak
   references and will warn if their count is not zero when the data structure
   is freed.  Useful for dealing with more complicated data structures
   like the ipv4 route cache.
 
 - A list of all of the network namespaces so we can iterate over them.
 
 - A slab for the network namespace data structure allowing leaks
   to be spotted.
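
 The difference between the get/put and hold/release pairs described
 above can be sketched in user space.  Everything here is hypothetical
 (struct net_sketch, the *_sketch helpers, plain int counters); the real
 struct net would use atomic counters and actual freeing.

```c
#include <stdio.h>

/* User-space sketch of the two reference styles.  All names are
 * illustrative and not taken from the patch. */
struct net_sketch {
	int count;      /* strong references: keep the namespace alive */
	int use_count;  /* weak references: only checked at free time */
	int freed;      /* set once the namespace has been torn down */
	int warned;     /* set if weak references were still outstanding */
};

void get_net_sketch(struct net_sketch *net)     { net->count++; }
void hold_net_sketch(struct net_sketch *net)    { net->use_count++; }
void release_net_sketch(struct net_sketch *net) { net->use_count--; }

/* Dropping the last strong reference frees the namespace.  Leftover
 * weak references do not prevent that; they only trigger a warning. */
void put_net_sketch(struct net_sketch *net)
{
	if (--net->count != 0)
		return;
	if (net->use_count != 0) {
		net->warned = 1;
		fprintf(stderr, "namespace freed with %d holds\n",
			net->use_count);
	}
	net->freed = 1;
}
```

 A holder that forgets to call release before the last put shows up as
 a warning rather than as a namespace that silently never goes away.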
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

I realize there are some discussions about naming and fixing
some races, but I applied this anyways so we can make some
forward progress.

We can make name changes and fixes on top of this initial work.


Re: [PATCH 04/16] net: Add a network namespace parameter to tasks

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:17:03 -0600

 
 This is the network namespace from which all sockets
 and anything else under user control ultimately get their network
 namespace parameters.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/16] net: Add a network namespace tag to struct net_device

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:18:12 -0600

 
 Please note that network devices do not increase the
 count on the network namespace.  They are inside the network
 namespace and so the network namespace tag is in the nature
 of a back pointer and so getting and putting the network namespace
 is unnecessary.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.


Re: [PATCH 06/16] net: Add a network namespace parameter to struct sock

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:21:37 -0600

 
 Sockets need to get a reference to their network namespace,
 or possibly a simple hold if someone registers on the network
 namespace notifier and will free the sockets when the namespace
 is going to be destroyed.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied, thanks.


[PATCH] RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.

2007-09-12 Thread Steve Wise

RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.

Calling arp_send() to initiate neighbour discovery (ND) doesn't do the
full ND protocol.  Namely, it doesn't handle retransmitting the arp
request if it is dropped. The function neigh_event_send() does all this.
Without doing full ND, rdma address resolution fails in the presence of
dropped arp bcast packets.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/core/addr.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index c5c33d3..5381c80 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -161,8 +161,7 @@ static void addr_send_arp(struct sockadd
 	if (ip_route_output_key(&rt, &fl))
 		return;
 
-	arp_send(ARPOP_REQUEST, ETH_P_ARP, rt->rt_gateway, rt->idev->dev,
-		 rt->rt_src, NULL, rt->idev->dev->dev_addr, NULL);
+	neigh_event_send(rt->u.dst.neighbour, NULL);
 	ip_rt_put(rt);
 }
 


Re: [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue

2007-09-12 Thread Christoph Hellwig
This looks nice in general, getting things out of softirq context is
always good.

On Tue, Sep 11, 2007 at 02:56:13PM +0200, Eric Dumazet wrote:
  #if RT_CACHE_DEBUG >= 2
  static atomic_t   dst_total = ATOMIC_INIT(0);
  #endif
 -static unsigned long dst_gc_timer_expires;
 -static unsigned long dst_gc_timer_inc = DST_GC_MAX;
 -static void dst_run_gc(unsigned long);
 +static struct {
 +	spinlock_t		lock;
 +	struct dst_entry	*list;
 +	unsigned long		timer_inc;
 +	unsigned long		timer_expires;
 +} dst_garbage = {
 +	.lock = __SPIN_LOCK_UNLOCKED(dst_garbage.lock),
 +	.timer_inc = DST_GC_MAX,
 +};

Can you please get rid of this useless struct?  It just complicates
the code and means we can't use the proper DEFINE_SPINLOCK initializer.

 +DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);

This should be static.



Re: [PATCH 07/16] net: Make /proc/net per network namespace

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:20:36 -0600

 
 This patch makes /proc/net per network namespace.  It modifies the global
 variables proc_net and proc_net_stat to be per network namespace.
 The proc_net file helpers are modified to take a network namespace argument,
 and all of their callers are fixed to pass init_net for that argument.
 This ensures that all of the /proc/net files are only visible and
 usable in the initial network namespace until the code behind them
 has been updated to handle multiple network namespaces.
 
 Making /proc/net per namespace is necessary as at least some files
 in /proc/net depend upon the set of network devices which is per
 network namespace, and even more files in /proc/net have contents
 that are relevant to a single network namespace.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Patch applied, thanks.


Re: [PATCH 08/16] net: Make socket creation namespace safe.

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:23:01 -0600

 
 This patch passes in the namespace a new socket should be created in
 and has the socket code do the appropriate reference counting.  By
 virtue of this all socket create methods are touched.  In addition
 the socket create methods are modified so that they will fail if
 you attempt to create a socket in a non-default network namespace.
 
 Failing if we attempt to create a socket outside of the default
 network namespace ensures that as we incrementally make the network stack
 network namespace aware we will not export functionality that someone
 has not audited and made certain is network namespace safe.
 Allowing us to partially enable network namespaces before all of the
 exotic protocols are supported.
 
 Any protocol layers I have missed will fail to compile because I now
 pass an extra parameter into the socket creation code.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Patch applied, thanks.


Re: [PATCH 1/2] dgrs: remove from build, config, and maintainer list

2007-09-12 Thread maximilian attems
On Wed, 12 Sep 2007, Nathanael Nerode wrote:

 From: Nathanael Nerode
 
 Stop building and configuring driver for Digi RightSwitch, which was 
 never actually sold to anyone, and remove it from MAINTAINERS.
 
 In response to an investigation into the firmware of the Digi Rightswitch 
 driver, Andres Salomon discovered:

search the netdev archive for this month before sending
out duplicate patches.

jgarzik was on the kernel summit, so i'm waiting on his reply
to the complete removal patch.

-- 
maks


Re: [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue

2007-09-12 Thread Eric Dumazet
On Wed, 12 Sep 2007 02:05:25 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

 From: Eric Dumazet [EMAIL PROTECTED]
 Date: Tue, 11 Sep 2007 14:56:13 +0200
 
  When the periodic IP route cache flush is done (every 600 seconds on 
  default configuration), some hosts suffer a lot and eventually trigger
  the soft lockup message.
  
  dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
  eventually freeing some (less than 1%) of them, while holding the 
  dst_lock spinlock for the whole scan.
  
  Then it rearms a timer to redo the full thing 1/10 s later...
  The slowdown can last one minute or so, depending on how active
  the TCP sessions are.
  
  This second version of the patch converts the processing from a softirq
  based one to a workqueue.
  
  Even if the list of entries in garbage_list is huge, host is still
  responsive to softirqs and can make progress.
  
  Instead of resetting gc timer to 0.1 second if one entry was freed in a
  gc run, we do this if more than 10% of entries were freed.
 
 I like this patch a lot, some minor fix is needed though:

Thank you

I also spotted a missing static before
DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);
 no need to stress Adrian on this :)

 
  +   __builtin_prefetch(next->next, 1, 0);
 
 Please use prefetch() instead of a direct explicit
 call to a gcc-specific routine :-)

Unfortunately, there is no equivalent for this one.
On my Opterons this gives a nice prefetchnta.

prefetch(addr) is more like __builtin_prefetch(addr, 0, 3)

I would like to avoid zapping the L2 cache with useless data.

__builtin_prefetch() is included from gcc 3.1 (2002), so every 
platform should support it, as linux-2.6 requires gcc 3.2 at least.
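
For readers unfamiliar with the builtin, here is a minimal user-space
sketch of the pattern under discussion: walking a list and prefetching
the node after next for writing, with no temporal locality.  The struct
and function names are invented for illustration; only the
__builtin_prefetch(addr, rw, locality) call is the GCC builtin itself.
Which instruction it becomes depends on the target (prefetchnta is what
is reported above for Opterons).

```c
#include <stddef.h>

/* Illustrative list node; not a kernel structure. */
struct node {
	struct node *next;
	long value;
};

/* Touch every node and return how many were visited.
 * rw = 1 asks for a prefetch in anticipation of a write,
 * locality = 0 says the data need not stay in cache afterwards. */
long touch_list(struct node *head)
{
	long visited = 0;

	for (struct node *n = head; n; n = n->next) {
		if (n->next)
			__builtin_prefetch(n->next->next, 1, 0);
		n->value++;
		visited++;
	}
	return visited;
}
```

The second and third arguments must be compile-time constants, and a
prefetch of an invalid address such as NULL does not fault.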

I guess you are going to tell me to first publish a patch to lkml :)

Thank you

Eric


[PATCH 0/6] NET_SCHED: Rate table fixes

2007-09-12 Thread Jesper Dangaard Brouer

This set of patches aims at fixing an issue with the rate table used
by the rate based schedulers.

Currently we use the lower-boundary value, which results in
under-estimating the actual bandwidth usage.  The patches change
this to use the upper-boundary L2T (length to time) value.

The patches include changes to both the kernel and the iproute2 userspace
utility. The kernel changes only add flexibility to allow userspace
to do the rate table alignment. The patches have been split up into
cleanup patches and actual functional change patches.

The patches also move the overhead calculation (currently only used
by HTB) into the kernel, which makes it more precise (as it won't misalign
the contents of the rate table).

This should raise some questions:
 1. What does the current/old rate table mapping look like?
 2. What does the new aligned rate table mapping look like?
 3. What happens when only the TC util is changed and used on an old kernel?

Let's look at the layout of the rate tables:

Illustrating the rate table array:
  Legend description
rtab[x]   : Array index x of rtab[x]
xmit_sz   : Transmit size contained in rtab[x] (normally transmit time)
maps[a-b] : Packet sizes from a to b, will map into rtab[x]

(1) Current/old rate table mapping (cell_log:3):
  rtab[0]:=xmit_sz:0  maps[0-7]
  rtab[1]:=xmit_sz:8  maps[8-15]
  rtab[2]:=xmit_sz:16 maps[16-23]
  rtab[3]:=xmit_sz:24 maps[24-31]
  rtab[4]:=xmit_sz:32 maps[32-39]
  rtab[5]:=xmit_sz:40 maps[40-47]
  rtab[6]:=xmit_sz:48 maps[48-55]

The above illustrates that we are using the lower-boundary transmit
size (xmit_sz).

(2) New iproute rate table mapping, with kernel cell_align support.
  rtab[0]:=xmit_sz:8  maps[0-8]
  rtab[1]:=xmit_sz:16 maps[9-16]
  rtab[2]:=xmit_sz:24 maps[17-24]
  rtab[3]:=xmit_sz:32 maps[25-32]
  rtab[4]:=xmit_sz:40 maps[33-40]
  rtab[5]:=xmit_sz:48 maps[41-48]
  rtab[6]:=xmit_sz:56 maps[49-56]

The above illustrates that we are using the upper-boundary transmit
size (xmit_sz) when mapping packet sizes.

The interesting question is compatibility.  If an old
iproute utility is used on a new kernel, we simply get the old rate
table (lower-bound) alignment. The interesting case is what happens
with a new iproute util on an old kernel. The table below shows that
we then use the upper-bound+1byte.  I believe that
this is a good and acceptable solution.

(3) New TC util on a kernel WITHOUT support for cell_align
  rtab[0]:=xmit_sz:8  maps[0-7]
  rtab[1]:=xmit_sz:16 maps[8-15]
  rtab[2]:=xmit_sz:24 maps[16-23]
  rtab[3]:=xmit_sz:32 maps[24-31]
  rtab[4]:=xmit_sz:40 maps[32-39]
  rtab[5]:=xmit_sz:48 maps[40-47]
  rtab[6]:=xmit_sz:56 maps[48-55]
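
The three tables can be reproduced with a small sketch of the index
calculation.  This is an illustration under assumptions, not the kernel
code from this series: the function names are made up, and a cell_align
value of -1 is assumed because it reproduces table (2).  Only
cell_log = 3 is taken from the tables above.

```c
/* Sketch of the slot lookup behind tables (1), (2) and (3). */

/* Kernel without cell_align: the slot is the length shifted by cell_log. */
int slot_unaligned(int pktlen, int cell_log)
{
	return pktlen >> cell_log;
}

/* Kernel with cell_align: adjust the length first, never below zero. */
int slot_aligned(int pktlen, int cell_log, int cell_align)
{
	int len = pktlen + cell_align;

	if (len < 0)
		len = 0;
	return len >> cell_log;
}

/* Old userspace fills rtab[slot] from the cell's lower boundary. */
int xmit_sz_lower(int slot, int cell_log)
{
	return slot << cell_log;
}

/* New userspace fills rtab[slot] from the cell's upper boundary. */
int xmit_sz_upper(int slot, int cell_log)
{
	return (slot + 1) << cell_log;
}
```

Case (1) is slot_unaligned() with xmit_sz_lower(), case (2) is
slot_aligned() with xmit_sz_upper(), and case (3) is slot_unaligned()
with xmit_sz_upper(), where a 15 byte packet is charged as 16 bytes.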

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk


[PATCH 1/6] [NET_SCHED]: Cleanup L2T macros and handle oversized packets

2007-09-12 Thread Jesper Dangaard Brouer
commit a28343c933f6cfc3df1be86e0ebe8d99fa8d5f77
Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
Date:   Wed Sep 12 10:01:00 2007 +0200

[NET_SCHED]: Cleanup L2T macros and handle oversized packets

Change L2T (length to time) macros, in all rate based schedulers, to
call a common function qdisc_l2t() that does the rate table lookup.
This function also handles the case where the packet size is larger than
the rate table covers, which often occurs with TSO enabled.

Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 8a67f24..4ebd615 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -302,4 +302,16 @@ drop:
return NET_XMIT_DROP;
 }
 
+/* Length to Time (L2T) lookup in a qdisc_rate_table, to determine how
+   long it will take to send a packet given its size.
+ */
+static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, unsigned int pktlen)
+{
+	int slot = pktlen;
+	slot >>= rtab->rate.cell_log;
+	if (slot > 255)
+		return (rtab->data[255]*(slot >> 8) + rtab->data[slot & 0xFF]);
+	return rtab->data[slot];
+}
+
 #endif
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 6085be5..46deb5f 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -21,8 +21,8 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 
-#define L2T(p,L)   ((p)->tcfp_R_tab->data[(L)>>(p)->tcfp_R_tab->rate.cell_log])
-#define L2T_P(p,L) ((p)->tcfp_P_tab->data[(L)>>(p)->tcfp_P_tab->rate.cell_log])
+#define L2T(p,L)   qdisc_l2t((p)->tcfp_R_tab, L)
+#define L2T_P(p,L) qdisc_l2t((p)->tcfp_P_tab, L)
 
 #define POL_TAB_MASK 15
 static struct tcf_common *tcf_police_ht[POL_TAB_MASK + 1];
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index e38c283..aed2af2 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -175,7 +175,7 @@ struct cbq_sched_data
 };
 
 
-#define L2T(cl,len)	((cl)->R_tab->data[(len)>>(cl)->R_tab->rate.cell_log])
+#define L2T(cl,len)	qdisc_l2t((cl)->R_tab,len)
 
 
 static __inline__ unsigned cbq_hash(u32 h)
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 246a2f9..5e608a6 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -132,10 +132,8 @@ struct htb_class {
 static inline long L2T(struct htb_class *cl, struct qdisc_rate_table *rate,
   int size)
 {
-	int slot = size >> rate->rate.cell_log;
-	if (slot > 255)
-		return (rate->data[255]*(slot >> 8) + rate->data[slot & 0xFF]);
-	return rate->data[slot];
+   long result = qdisc_l2t(rate, size);
+   return result;
 }
 
 struct htb_sched {
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 8c2639a..b0d8109 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -115,8 +115,8 @@ struct tbf_sched_data
struct qdisc_watchdog watchdog; /* Watchdog timer */
 };
 
-#define L2T(q,L)   ((q)->R_tab->data[(L)>>(q)->R_tab->rate.cell_log])
-#define L2T_P(q,L) ((q)->P_tab->data[(L)>>(q)->P_tab->rate.cell_log])
+#define L2T(q,L)   qdisc_l2t((q)->R_tab,L)
+#define L2T_P(q,L) qdisc_l2t((q)->P_tab,L)
 
 static int tbf_enqueue(struct sk_buff *skb, struct Qdisc* sch)
 {
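
As a rough user-space illustration of the lookup this patch centralizes,
the sketch below mirrors the oversized-packet handling with a stand-in
table type.  struct rate_table_sketch and l2t_sketch() are hypothetical
names, not kernel API; the arithmetic follows the qdisc_l2t() body in
the diff above.

```c
#include <stdint.h>

/* Stand-in for the kernel's rate table: 256 transmit times plus the
 * shift that turns a packet length into a table index. */
struct rate_table_sketch {
	uint32_t data[256];
	unsigned char cell_log;
};

/* Length-to-time lookup that also copes with lengths beyond the table:
 * the out-of-range part is charged as multiples of the last entry. */
uint32_t l2t_sketch(const struct rate_table_sketch *rtab, unsigned int pktlen)
{
	unsigned int slot = pktlen >> rtab->cell_log;

	if (slot > 255)
		return rtab->data[255] * (slot >> 8) + rtab->data[slot & 0xFF];
	return rtab->data[slot];
}
```

With cell_log = 3 the table covers lengths up to 2047 bytes, so a large
TSO frame falls into the slot > 255 branch instead of reading past the
end of data[].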



[PATCH 3/6] [IPROUTE2]: Update pkt_sched.h (to resemble the kernel one)

2007-09-12 Thread Jesper Dangaard Brouer
commit ef065a43b8900fbc0763eac0fa0a9a8a00c8aaa2
Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
Date:   Tue Sep 11 16:17:46 2007 +0200

[IPROUTE2]: Update pkt_sched.h (to resemble the kernel one)

 Extend the tc_ratespec struct with two parameters: 1) cell_align,
 which allows adjusting the alignment of the rate table, and 2) overhead,
 which allows adding a packet overhead before the lookup in the kernel.

 This is done in order to add support for changing the rate table to
 use the upper-boundary L2T (length to time) value. Currently we use the
 lower-boundary, which results in under-estimating the actual bandwidth
 usage.

Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 268c515..919af93 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -77,8 +77,8 @@ struct tc_ratespec
 {
unsigned char   cell_log;
unsigned char   __reserved;
-   unsigned short  feature;
-   short   addend;
+   unsigned short  overhead;
+   short   cell_align;
unsigned short  mpu;
__u32   rate;
 };



[PATCH 4/6] [IPROUTE2]: Overhead calculation is now done in the kernel

2007-09-12 Thread Jesper Dangaard Brouer
commit 07a74a2613440fc1a68d0faa7235ed7027532d78
Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
Date:   Tue Sep 11 16:59:58 2007 +0200

[IPROUTE2]: Overhead calculation is now done in the kernel.

The only current user is HTB. HTB overhead argument is now passed on
to the kernel (in the struct tc_ratespec). Also correct the data
types.

Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

diff --git a/tc/q_htb.c b/tc/q_htb.c
index 53e3f78..310d36d 100644
--- a/tc/q_htb.c
+++ b/tc/q_htb.c
@@ -107,8 +107,9 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
    __u32 rtab[256],ctab[256];
    unsigned buffer=0,cbuffer=0;
    int cell_log=-1,ccell_log = -1;
-   unsigned mtu, mpu;
-   unsigned char mpu8 = 0, overhead = 0;
+   unsigned mtu;
+   unsigned short mpu = 0;
+   unsigned short overhead = 0;
    struct rtattr *tail;
 
    memset(&opt, 0, sizeof(opt)); mtu = 1600; /* eth packet len */
@@ -127,12 +128,12 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
            }
        } else if (matches(*argv, "mpu") == 0) {
            NEXT_ARG();
-           if (get_u8(&mpu8, *argv, 10)) {
+           if (get_u16(&mpu, *argv, 10)) {
                explain1("mpu"); return -1;
            }
        } else if (matches(*argv, "overhead") == 0) {
            NEXT_ARG();
-           if (get_u8(&overhead, *argv, 10)) {
+           if (get_u16(&overhead, *argv, 10)) {
                explain1("overhead"); return -1;
            }
        } else if (matches(*argv, "quantum") == 0) {
@@ -206,9 +207,11 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
    if (!buffer) buffer = opt.rate.rate / get_hz() + mtu;
    if (!cbuffer) cbuffer = opt.ceil.rate / get_hz() + mtu;
 
-/* encode overhead and mpu, 8 bits each, into lower 16 bits */
-   mpu = (unsigned)mpu8 | (unsigned)overhead << 8;
-   opt.ceil.mpu = mpu; opt.rate.mpu = mpu;
+   opt.ceil.overhead = overhead;
+   opt.rate.overhead = overhead;
+
+   opt.ceil.mpu = mpu;
+   opt.rate.mpu = mpu;
 
    if ((cell_log = tc_calc_rtable(opt.rate.rate, rtab, cell_log, mtu, mpu)) < 0) {
        fprintf(stderr, "htb: failed to calculate rate table.\n");
diff --git a/tc/tc_core.c b/tc/tc_core.c
index 58155fb..1ab0ba0 100644
--- a/tc/tc_core.c
+++ b/tc/tc_core.c
@@ -73,8 +73,6 @@ int tc_calc_rtable(unsigned bps, __u32 *rtab, int cell_log, unsigned mtu,
                   unsigned mpu)
 {
    int i;
-   unsigned overhead = (mpu >> 8) & 0xFF;
-   mpu = mpu & 0xFF;
 
    if (mtu == 0)
        mtu = 2047;
@@ -86,8 +84,6 @@ int tc_calc_rtable(unsigned bps, __u32 *rtab, int cell_log, unsigned mtu,
    }
    for (i=0; i<256; i++) {
        unsigned sz = (i<<cell_log);
-       if (overhead)
-           sz += overhead;
        if (sz < mpu)
            sz = mpu;
        rtab[i] = tc_calc_xmittime(bps, sz);
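
For readers who have not seen the old encoding, here is a small sketch of what the removed lines did, assuming only the expressions visible in the two diffs above. The function names and the struct ratespec_fields are hypothetical helpers for illustration, not iproute2 API. The old scheme squeezed mpu and overhead into one 16-bit field, 8 bits each, so neither value could exceed 255; the new scheme gives each its own 16-bit field.

```c
#include <assert.h>

/* Old scheme: mpu in the low byte, overhead in the next byte. */
static unsigned pack_mpu_overhead(unsigned char mpu8, unsigned char overhead)
{
	return (unsigned)mpu8 | ((unsigned)overhead << 8);
}

/* Decoding as done by the lines removed from tc_calc_rtable(). */
static unsigned unpack_overhead(unsigned packed)
{
	return (packed >> 8) & 0xFF;
}

static unsigned unpack_mpu(unsigned packed)
{
	return packed & 0xFF;
}

/* New scheme: two independent fields, named after the patched struct. */
struct ratespec_fields {
	unsigned short overhead;
	unsigned short mpu;
};
```

With separate fields there is nothing to decode in tc_calc_rtable(), which is why the patch can drop those lines.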



[PATCH 5/6] [IPROUTE2]: Cleanup: tc_calc_rtable()

2007-09-12 Thread Jesper Dangaard Brouer
commit e3bad6e344303fec9916d1420aade98a2e6c79cc
Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
Date:   Wed Sep 5 10:47:47 2007 +0200

[IPROUTE2]: Cleanup: tc_calc_rtable().

Change tc_calc_rtable() to take a tc_ratespec struct as an
argument. (cell_log still needs to be passed on as a parameter,
because -1 indicates that the cell_log needs to be computed by the
function.)

Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

diff --git a/tc/m_police.c b/tc/m_police.c
index 5d2528b..acdfd22 100644
--- a/tc/m_police.c
+++ b/tc/m_police.c
@@ -263,22 +263,20 @@ int act_parse_police(struct action_util *a,int *argc_p, char ***argv_p, int tca_
    }
 
    if (p.rate.rate) {
-       if ((Rcell_log = tc_calc_rtable(p.rate.rate, rtab, Rcell_log, mtu, mpu)) < 0) {
+       p.rate.mpu = mpu;
+       if (tc_calc_rtable(&p.rate, rtab, Rcell_log, mtu) < 0) {
            fprintf(stderr, "TBF: failed to calculate rate table.\n");
            return -1;
        }
        p.burst = tc_calc_xmittime(p.rate.rate, buffer);
-       p.rate.cell_log = Rcell_log;
-       p.rate.mpu = mpu;
    }
    p.mtu = mtu;
    if (p.peakrate.rate) {
-       if ((Pcell_log = tc_calc_rtable(p.peakrate.rate, ptab, Pcell_log, mtu, mpu)) < 0) {
+       p.peakrate.mpu = mpu;
+       if (tc_calc_rtable(&p.peakrate, ptab, Pcell_log, mtu) < 0) {
            fprintf(stderr, "POLICE: failed to calculate peak rate table.\n");
            return -1;
        }
-       p.peakrate.cell_log = Pcell_log;
-       p.peakrate.mpu = mpu;
    }
 
    tail = NLMSG_TAIL(n);
diff --git a/tc/q_cbq.c b/tc/q_cbq.c
index f2b4ce8..df98312 100644
--- a/tc/q_cbq.c
+++ b/tc/q_cbq.c
@@ -137,12 +137,11 @@ static int cbq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl
    if (allot < (avpkt*3)/2)
        allot = (avpkt*3)/2;
 
-   if ((cell_log = tc_calc_rtable(r.rate, rtab, cell_log, allot, mpu)) < 0) {
+   r.mpu = mpu;
+   if (tc_calc_rtable(&r, rtab, cell_log, allot) < 0) {
        fprintf(stderr, "CBQ: failed to calculate rate table.\n");
        return -1;
    }
-   r.cell_log = cell_log;
-   r.mpu = mpu;
 
    if (ewma_log < 0)
        ewma_log = TC_CBQ_DEF_EWMA;
@@ -336,12 +335,11 @@ static int cbq_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
        unsigned pktsize = wrr.allot;
        if (wrr.allot < (lss.avpkt*3)/2)
            wrr.allot = (lss.avpkt*3)/2;
-       if ((cell_log = tc_calc_rtable(r.rate, rtab, cell_log, pktsize, mpu)) < 0) {
+       r.mpu = mpu;
+       if (tc_calc_rtable(&r, rtab, cell_log, pktsize) < 0) {
            fprintf(stderr, "CBQ: failed to calculate rate table.\n");
            return -1;
        }
-       r.cell_log = cell_log;
-       r.mpu = mpu;
    }
    if (ewma_log < 0)
        ewma_log = TC_CBQ_DEF_EWMA;
diff --git a/tc/q_htb.c b/tc/q_htb.c
index 310d36d..e24ad6d 100644
--- a/tc/q_htb.c
+++ b/tc/q_htb.c
@@ -213,19 +213,17 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
    opt.ceil.mpu = mpu;
    opt.rate.mpu = mpu;
 
-   if ((cell_log = tc_calc_rtable(opt.rate.rate, rtab, cell_log, mtu, mpu)) < 0) {
+   if (tc_calc_rtable(&opt.rate, rtab, cell_log, mtu) < 0) {
        fprintf(stderr, "htb: failed to calculate rate table.\n");
        return -1;
    }
    opt.buffer = tc_calc_xmittime(opt.rate.rate, buffer);
-   opt.rate.cell_log = cell_log;
 
-   if ((ccell_log = tc_calc_rtable(opt.ceil.rate, ctab, cell_log, mtu, mpu)) < 0) {
+   if (tc_calc_rtable(&opt.ceil, ctab, ccell_log, mtu) < 0) {
        fprintf(stderr, "htb: failed to calculate ceil rate table.\n");
        return -1;
    }
    opt.cbuffer = tc_calc_xmittime(opt.ceil.rate, cbuffer);
-   opt.ceil.cell_log = ccell_log;
 
    tail = NLMSG_TAIL(n);
    addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
diff --git a/tc/q_tbf.c b/tc/q_tbf.c
index 1fc05f4..c7b4f0f 100644
--- a/tc/q_tbf.c
+++ b/tc/q_tbf.c
@@ -170,21 +170,20 @@ static int tbf_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl
        opt.limit = lim;
    }
 
-   if ((Rcell_log = tc_calc_rtable(opt.rate.rate, rtab, Rcell_log, mtu, mpu)) < 0) {
+   opt.rate.mpu = mpu;
+   if (tc_calc_rtable(&opt.rate, rtab, Rcell_log, mtu) < 0) {
        fprintf(stderr, "TBF: failed to calculate rate table.\n");
        return -1;
    }
    opt.buffer = tc_calc_xmittime(opt.rate.rate, buffer);
-   opt.rate.cell_log = Rcell_log;
-   opt.rate.mpu = mpu;
+
    if (opt.peakrate.rate) {
-       if ((Pcell_log = 

[PATCH 6/6] [IPROUTE2]: Change the rate table calc of transmit cost to use upper bound value

2007-09-12 Thread Jesper Dangaard Brouer
commit 2e3edbef7913ac43899c8258ee59d9032778cee1
Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
Date:   Wed Sep 5 15:24:51 2007 +0200

[IPROUTE2]: Change the rate table calc of transmit cost to use upper bound 
value.

Patrick McHardy, Cite: 'its better to overestimate than underestimate
to stay in control of the queue'.

Illustrating the rate table array:
 Legend description
   rtab[x]   : Array index x of rtab[x]
   xmit_sz   : Transmit size contained in rtab[x] (normally transmit time)
   maps[a-b] : Packet sizes from a to b, will map into rtab[x]

Current/old rate table mapping (cell_log:3):
 rtab[0]:=xmit_sz:0  maps[0-7]
 rtab[1]:=xmit_sz:8  maps[8-15]
 rtab[2]:=xmit_sz:16 maps[16-23]
 rtab[3]:=xmit_sz:24 maps[24-31]
 rtab[4]:=xmit_sz:32 maps[32-39]
 rtab[5]:=xmit_sz:40 maps[40-47]
 rtab[6]:=xmit_sz:48 maps[48-55]

New rate table mapping, with kernel cell_align support.
 rtab[0]:=xmit_sz:8  maps[0-8]
 rtab[1]:=xmit_sz:16 maps[9-16]
 rtab[2]:=xmit_sz:24 maps[17-24]
 rtab[3]:=xmit_sz:32 maps[25-32]
 rtab[4]:=xmit_sz:40 maps[33-40]
 rtab[5]:=xmit_sz:48 maps[41-48]
 rtab[6]:=xmit_sz:56 maps[49-56]

New TC util on a kernel WITHOUT support for cell_align
 rtab[0]:=xmit_sz:8 maps[0-7]
 rtab[1]:=xmit_sz:16 maps[8-15]
 rtab[2]:=xmit_sz:24 maps[16-23]
 rtab[3]:=xmit_sz:32 maps[24-31]
 rtab[4]:=xmit_sz:40 maps[32-39]
 rtab[5]:=xmit_sz:48 maps[40-47]
 rtab[6]:=xmit_sz:56 maps[48-55]

Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

diff --git a/tc/tc_core.c b/tc/tc_core.c
index c713a18..752b07c 100644
--- a/tc/tc_core.c
+++ b/tc/tc_core.c
@@ -84,11 +84,12 @@ int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab, int cell_log, unsigned mt
        cell_log++;
    }
    for (i=0; i<256; i++) {
-       unsigned sz = (i<<cell_log);
+       unsigned sz = ((i+1)<<cell_log);
        if (sz < mpu)
            sz = mpu;
        rtab[i] = tc_calc_xmittime(bps, sz);
    }
+   r->cell_align=-1; // Due to the sz calc
    r->cell_log=cell_log;
    return cell_log;
 }
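
The three mapping tables in the commit message can be reproduced with a few lines of C. This is only a sketch: cell_size() mirrors the sz expression in the hunk above, while slot_for_len() is an assumed model of the kernel-side lookup (add cell_align to the packet length, clamp at zero, shift by cell_log), chosen because it yields exactly the maps[a-b] ranges listed. Both function names are hypothetical.

```c
#include <assert.h>

/*
 * Size used to fill rtab[i]: the lower edge of the cell (old code) or
 * the upper edge (new code, upper_bound != 0).
 */
static unsigned cell_size(int i, int cell_log, int upper_bound)
{
	return (unsigned)(i + (upper_bound ? 1 : 0)) << cell_log;
}

/*
 * Table slot a packet length lands in. cell_align is 0 on a kernel
 * without cell_align support and -1 with this patch applied.
 */
static int slot_for_len(int len, int cell_log, int cell_align)
{
	int slot = len + cell_align;

	if (slot < 0)
		slot = 0;
	return slot >> cell_log;
}
```

With cell_log 3, a packet of 8 bytes lands in slot 1 without cell_align and in slot 0 with cell_align -1, which is the difference between the second and third tables.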



net/bluetooth/hci_sock.c:352: error: storage size of 'ctv' isn't known

2007-09-12 Thread Robert P. J. Day

  latest git pull, make allyesconfig on i386:

  ...
  CC  net/bluetooth/hci_sock.o
net/bluetooth/hci_sock.c: In function ‘hci_sock_cmsg’:
net/bluetooth/hci_sock.c:352: error: storage size of ‘ctv’ isn’t known
net/bluetooth/hci_sock.c:352: warning: unused variable ‘ctv’
make[2]: *** [net/bluetooth/hci_sock.o] Error 1
make[1]: *** [net/bluetooth] Error 2
make: *** [net] Error 2


rday

p.s. dumb question -- what locale should i be using to get those
quotes to not make such a mess of my screen?  thanks.
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://crashcourse.ca


Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue

2007-09-12 Thread Eric Dumazet
On Wed, 12 Sep 2007 11:00:54 +0100
Christoph Hellwig [EMAIL PROTECTED] wrote:

 This looks nice in general, getting things out of softirq context is
 always good.

I am preparing a patch to net/ipv4/route.c to migrate rt_check_expire()
as well.

 
 On Tue, Sep 11, 2007 at 02:56:13PM +0200, Eric Dumazet wrote:
   #if RT_CACHE_DEBUG >= 2
   static atomic_t dst_total = ATOMIC_INIT(0);
   #endif
  -static unsigned long dst_gc_timer_expires;
  -static unsigned long dst_gc_timer_inc = DST_GC_MAX;
  -static void dst_run_gc(unsigned long);
  +static struct {
  +   spinlock_t      lock;
  +   struct dst_entry *list;
  +   unsigned long   timer_inc;
  +   unsigned long   timer_expires;
  +} dst_garbage = {
  +   .lock = __SPIN_LOCK_UNLOCKED(dst_garbage.lock),
  +   .timer_inc = DST_GC_MAX,
  +};
 
 Can you please get rid of this useless struct?  It just complicates
 the code and means we can't use the proper DEFINE_SPINLOCK initializer.

When using the standard DEFINE_SPINLOCK initializer, the lock is in the
data section, while list is in the bss section.

This 'useless struct' puts lock and list on the same cache line, which
reduces the latency of __dst_free(). I wish more structures were used in
the kernel instead of relying on random placement by the linker...
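
A user-space sketch of the placement argument, under stated assumptions: a 64-byte cache line, an int standing in for spinlock_t, and an opaque struct entry standing in for struct dst_entry. The names garbage_state, ASSUMED_CACHE_LINE and same_cache_line() are made up for the illustration. Separate globals may be placed anywhere by the linker; members of one aligned struct are guaranteed to be adjacent.

```c
#include <assert.h>
#include <stddef.h>

#define ASSUMED_CACHE_LINE 64	/* assumption: common x86 line size */

struct entry;			/* stand-in for struct dst_entry */

/* Grouping the fields fixes their relative placement in memory. */
struct garbage_state {
	_Alignas(ASSUMED_CACHE_LINE) int lock;	/* stand-in for spinlock_t */
	struct entry *list;
	unsigned long timer_inc;
	unsigned long timer_expires;
};

/* Returns 1 if two byte offsets fall within the same cache line. */
static int same_cache_line(size_t a, size_t b)
{
	return a / ASSUMED_CACHE_LINE == b / ASSUMED_CACHE_LINE;
}
```

Because the struct itself is aligned to the assumed line size, comparing member offsets is enough to tell whether lock and list share a line.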

 
  +DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);
 
 This should be static.

Yes I agree



Re: [PATCH 0/6] NET_SCHED: Rate table fixes

2007-09-12 Thread Patrick McHardy

Jesper Dangaard Brouer wrote:

This set of patches, aim at fixing an issue with the rate table used
by the rate based schedulers.



ACK for all the patches :)




Re: [PATCH 09/16] net: Initialize the network namespace of network devices.

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:24:21 -0600

 
 Except for carefully selected pseudo devices all network
 interfaces should start out in the initial network namespace.
 Ultimately it will be register_netdev that examines what
 dev->nd_net is set to and places a device in a network namespace.
 
 This patch modifies alloc_netdev to initialize the network
 namespace a device is in with the initial network namespace.
 This gets it right for the vast majority of devices so their
 drivers need not be modified and for those few pseudo devices
 that need something different they can change this parameter
 before calling register_netdevice.
 
 The network namespace parameter on a network device is not
 reference counted as the devices are inside of a network namespace
 and cannot remain in that namespace past the lifetime of the
 network namespace.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.


Re: [PATCH] Move the definition of pr_err() into kernel.h

2007-09-12 Thread Stephen Hemminger
On Tue, 11 Sep 2007 09:56:05 -0500
Emil Medve [EMAIL PROTECTED] wrote:

 Other pr_*() macros are already defined in kernel.h, but pr_err() was defined
 multiple times in several other places
 
 Signed-off-by: Emil Medve [EMAIL PROTECTED]

pr_error seems better than pr_err

Please add the full set:
pr_alert
pr_critical
pr_error
pr_warn
pr_notice




Re: [PATCH 10/16] net: Make packet reception network namespace safe

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:25:43 -0600

 
 This patch modifies every packet receive function
 registered with dev_add_pack() to drop packets if they
 are not from the initial network namespace.
 
 This should ensure that the various network stacks do
 not receive packets in anything but the initial network
 namespace until the code has been converted and is ready
 for them.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.


Re: [PATCH 3/6] [IPROUTE2]: Update pkt_sched.h (to resemble the kernel one)

2007-09-12 Thread Stephen Hemminger
On Wed, 12 Sep 2007 12:14:14 +0200
Jesper Dangaard Brouer [EMAIL PROTECTED] wrote:

 commit ef065a43b8900fbc0763eac0fa0a9a8a00c8aaa2
 Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
 Date:   Tue Sep 11 16:17:46 2007 +0200
 
 [IPROUTE2]: Update pkt_sched.h (to resemble the kernel one)
 
  Extend the tc_ratespec struct with two parameters: 1) cell_align,
  which allows adjusting the alignment of the rate table; 2) overhead,
  which allows adding a per-packet overhead before the lookup in the kernel.
 
  This is done in order to add support for changing the rate table to
  use the upper-boundary L2T (length to time) value. Currently we use the
  lower boundary, which results in under-estimating the actual bandwidth
  usage.
 
 Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]
 

Okay, but you don't need a special patch to do it. I periodically sync
up the headers before each release.


Re: [PATCH 11/16] net: Make device event notification network namespace safe

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:27:11 -0600

 
 Every user of the network device notifiers is either a protocol
 stack or a pseudo device.  If a protocol stack that does not have
 support for multiple network namespaces receives an event for a
 device that is not in the initial network namespace it quite possibly
 can get confused and do the wrong thing.
 
 To avoid problems until all of the protocol stacks are converted
 this patch modifies all netdev event handlers to ignore events on
 devices that are not in the initial network namespace.
 
 As the rest of the code is made network namespace aware these
 checks can be removed.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied, thanks.


Re: [PATCH 4/6] [IPROUTE2]: Overhead calculation is now done in the kernel

2007-09-12 Thread Stephen Hemminger
On Wed, 12 Sep 2007 12:14:39 +0200
Jesper Dangaard Brouer [EMAIL PROTECTED] wrote:

 commit 07a74a2613440fc1a68d0faa7235ed7027532d78
 Author: Jesper Dangaard Brouer [EMAIL PROTECTED]
 Date:   Tue Sep 11 16:59:58 2007 +0200
 
 [IPROUTE2]: Overhead calculation is now done in the kernel.
 
 The only current user is HTB. HTB overhead argument is now passed on
 to the kernel (in the struct tc_ratespec). Also correct the data
 types.
 
 Signed-off-by: Jesper Dangaard Brouer [EMAIL PROTECTED]

How is this binary compatible with older kernels?


Re: [PATCH 12/16] net: Support multiple network namespaces with netlink

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:28:27 -0600

 
 Each netlink socket will live in exactly one network namespace,
 this includes the controlling kernel sockets.
 
 This patch updates all of the existing netlink protocols
 to only support the initial network namespace.  Requests
 by clients in other namespaces will get -ECONNREFUSED,
 as they would if the kernel did not have support for
 that netlink protocol compiled in.
 
 As each netlink protocol is updated to be multiple network
 namespace safe it can register multiple kernel sockets
 to acquire a presence in the rest of the network namespaces.
 
 The implementation in af_netlink is a simple filter implementation
 at hash table insertion and hash table look up time.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.


[PATCH] [POWERPC] ucc_geth: fix module removal

2007-09-12 Thread Anton Vorontsov
- uccf should be set to NULL to not double-free memory on
  subsequent calls;
- ind_hash_q and group_hash_q lists should be initialized in the
  probe() function, instead of struct_init() (called by open()),
  otherwise there will be an oops if ucc_geth_driver is removed
  prior to 'ifconfig ethX up';
- add unregister_netdev();
- reorder geth_remove() steps.

Signed-off-by: Anton Vorontsov [EMAIL PROTECTED]
---
 drivers/net/ucc_geth.c |   17 ++++++++++-------
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 9a38dfe..bc2b3bf 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -2080,8 +2080,10 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
    if (!ugeth)
        return;
 
-   if (ugeth->uccf)
+   if (ugeth->uccf) {
        ucc_fast_free(ugeth->uccf);
+       ugeth->uccf = NULL;
+   }
 
    if (ugeth->p_thread_data_tx) {
        qe_muram_free(ugeth->thread_dat_tx_offset);
@@ -2312,10 +2314,6 @@ static int ucc_struct_init(struct ucc_geth_private *ugeth)
    ug_info = ugeth->ug_info;
    uf_info = &ug_info->uf_info;
 
-   /* Create CQs for hash tables */
-   INIT_LIST_HEAD(&ugeth->group_hash_q);
-   INIT_LIST_HEAD(&ugeth->ind_hash_q);
-
    if (!((uf_info->bd_mem_part == MEM_PART_SYSTEM) ||
          (uf_info->bd_mem_part == MEM_PART_MURAM))) {
        if (netif_msg_probe(ugeth))
@@ -3949,6 +3947,10 @@ static int ucc_geth_probe(struct of_device* ofdev, const struct of_device_id *ma
    ugeth = netdev_priv(dev);
    spin_lock_init(&ugeth->lock);
 
+   /* Create CQs for hash tables */
+   INIT_LIST_HEAD(&ugeth->group_hash_q);
+   INIT_LIST_HEAD(&ugeth->ind_hash_q);
+
    dev_set_drvdata(device, dev);
 
    /* Set the dev->base_addr to the gfar reg region */
@@ -4002,9 +4004,10 @@ static int ucc_geth_remove(struct of_device* ofdev)
    struct net_device *dev = dev_get_drvdata(device);
    struct ucc_geth_private *ugeth = netdev_priv(dev);
 
-   dev_set_drvdata(device, NULL);
-   ucc_geth_memclean(ugeth);
+   unregister_netdev(dev);
    free_netdev(dev);
+   ucc_geth_memclean(ugeth);
+   dev_set_drvdata(device, NULL);
 
    return 0;
 }
-- 
1.5.0.6
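
The first bullet of the description (clear the pointer so a later call does not free it again) is a general pattern. Below is a minimal user-space sketch of it, with malloc/free standing in for the driver's allocator; struct priv, priv_memclean() and the free_calls counter are hypothetical names used only for this illustration, not driver code.

```c
#include <assert.h>
#include <stdlib.h>

struct priv {
	void *uccf;	/* stand-in for the ugeth->uccf resource */
};

static int free_calls;	/* counts real frees, for demonstration only */

/* Safe to call repeatedly: the pointer is cleared once it is released. */
static void priv_memclean(struct priv *p)
{
	if (!p)
		return;

	if (p->uccf) {
		free(p->uccf);
		free_calls++;
		p->uccf = NULL;
	}
}
```

Calling priv_memclean() twice on the same object releases the memory once; without the NULL assignment the second call would free the same pointer again.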



[PATCH] phy: implement release function

2007-09-12 Thread Anton Vorontsov
Lately I've got this nice badness on mdio bus removal:

Device 'e0103120:06' does not have a release() function, it is broken and must be fixed.
------------[ cut here ]------------
Badness at drivers/base/core.c:107
NIP: c015c1a8 LR: c015c1a8 CTR: c0157488
REGS: c34bdcf0 TRAP: 0700   Not tainted  (2.6.23-rc5-g9ebadfbb-dirty)
MSR: 00029032 <EE,ME,IR,DR>  CR: 24088422  XER:
...
[c34bdda0] [c015c1a8] device_release+0x78/0x80 (unreliable)
[c34bddb0] [c01354cc] kobject_cleanup+0x80/0xbc
[c34bddd0] [c01365f0] kref_put+0x54/0x6c
[c34bdde0] [c013543c] kobject_put+0x24/0x34
[c34bddf0] [c015c384] put_device+0x1c/0x2c
[c34bde00] [c0180e84] mdiobus_unregister+0x2c/0x58
...

Actually nothing is broken; the device subsystem core just expects
a different pattern of resource management.

This patch implements the phy device's release function, which
gets rid of this badness.

It also fixes a small hidden bug; hopefully no others are introduced. ;-)

Signed-off-by: Anton Vorontsov [EMAIL PROTECTED]
---
 drivers/net/phy/mdio_bus.c   |    9 +++++----
 drivers/net/phy/phy_device.c |   13 +++++++++++++
 include/linux/phy.h          |    1 +
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index fc2f0e6..c30196d 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -91,9 +91,12 @@ int mdiobus_register(struct mii_bus *bus)
 
            err = device_register(&phydev->dev);
 
-           if (err)
+           if (err) {
                printk(KERN_ERR "phy %d failed to register\n",
                        i);
+               phy_device_free(phydev);
+               phydev = NULL;
+           }
        }
 
        bus->phy_map[i] = phydev;
@@ -110,10 +113,8 @@ void mdiobus_unregister(struct mii_bus *bus)
    int i;
 
    for (i = 0; i < PHY_MAX_ADDR; i++) {
-       if (bus->phy_map[i]) {
+       if (bus->phy_map[i])
            device_unregister(&bus->phy_map[i]->dev);
-           kfree(bus->phy_map[i]);
-       }
    }
 }
 EXPORT_SYMBOL(mdiobus_unregister);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index e275df8..80c283c 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -44,6 +44,17 @@ static struct phy_driver genphy_driver;
 extern int mdio_bus_init(void);
 extern void mdio_bus_exit(void);
 
+void phy_device_free(struct phy_device *phydev)
+{
+   kfree(phydev);
+}
+EXPORT_SYMBOL(phy_device_free);
+
+static void phy_device_release(struct device *dev)
+{
+   phy_device_free(to_phy_device(dev));
+}
+
 struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id)
 {
    struct phy_device *dev;
@@ -54,6 +65,8 @@ struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id)
    if (NULL == dev)
        return (struct phy_device*) PTR_ERR((void*)-ENOMEM);
 
+   dev->dev.release = phy_device_release;
+
    dev->speed = 0;
    dev->duplex = -1;
    dev->pause = dev->asym_pause = 0;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 2a65978..9ec1363 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -398,6 +398,7 @@ int phy_mii_ioctl(struct phy_device *phydev,
 int phy_start_interrupts(struct phy_device *phydev);
 void phy_print_status(struct phy_device *phydev);
 struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id);
+void phy_device_free(struct phy_device *phydev);
 
 extern struct bus_type mdio_bus_type;
 #endif /* __PHY_H */
-- 
1.5.0.6
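
The "pattern of resource management" the device core expects can be sketched in plain C. Everything below is a toy model written for this illustration: toy_device, toy_phy, toy_get() and toy_put() are hypothetical stand-ins for struct device, struct phy_device, get_device() and put_device(), and the released counter exists only so the behaviour can be checked. The point is that the object is freed by the release hook when the last reference is dropped, never by a direct kfree() in the unregister path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for struct device: a refcount plus a release hook. */
struct toy_device {
	int refcount;
	void (*release)(struct toy_device *dev);
};

struct toy_phy {
	int addr;
	struct toy_device dev;	/* embedded, like phydev->dev */
};

static int released;	/* counts release calls, for demonstration only */

static void toy_phy_release(struct toy_device *dev)
{
	/* Recover the containing object, the role to_phy_device() plays. */
	struct toy_phy *phy =
		(struct toy_phy *)((char *)dev - offsetof(struct toy_phy, dev));

	released++;
	free(phy);
}

static struct toy_phy *toy_phy_create(int addr)
{
	struct toy_phy *phy = calloc(1, sizeof(*phy));

	if (!phy)
		return NULL;
	phy->addr = addr;
	phy->dev.refcount = 1;
	phy->dev.release = toy_phy_release;
	return phy;
}

static void toy_get(struct toy_device *dev)
{
	dev->refcount++;
}

/* The last put frees the object through the hook. */
static void toy_put(struct toy_device *dev)
{
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

If another holder still has a reference when the bus drops its own, the memory stays valid until that holder calls toy_put(), which is the case a direct free in the unregister path would get wrong.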


Re: [PATCH] Fix e100 on systems that have cache incoherent DMA

2007-09-12 Thread James Chapman

David Acker wrote:

Jeff Garzik wrote:

David Acker wrote:
Let me know if there is any other information I can provide you.  I 
will look through the code to see what could be going on with your 
machine.  I will also look into reproducing these results with a 
newer kernel.  This may be tricky since compulab's patches are pretty 
stale and don't always apply easily.



pktgen outputs for the various cases modified/unmodified[/others?] 
would be nice, if you have a spot of time.


Jeff


I am not familiar with pktgen but I seem to have it working for a simple 
test.
I edited the 1-1 example from 
ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/examples/ .  
The results with and without the patch are below. 


It looks like you ran pktgen on the embedded system, which exercised 
only the transmit path. Auke indicated that the lockup was in the RU. 
Have you run pktgen on a test system to fire packets at the embedded 
system at max rate? Also test what happens when you fire packets in both 
directions simultaneously.


--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development



[PATCH 2/3] sk98lin: ethtool perm_addr build fix

2007-09-12 Thread Stephen Hemminger
Deal with API changes while sk98lin was removed.
ethtool_ops no longer has a perm_addr hook.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 drivers/net/sk98lin/skethtool.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skethtool.c b/drivers/net/sk98lin/skethtool.c
index 3646069..5a6da89 100644
--- a/drivers/net/sk98lin/skethtool.c
+++ b/drivers/net/sk98lin/skethtool.c
@@ -616,7 +616,6 @@ const struct ethtool_ops SkGeEthtoolOps = {
.get_pauseparam = getPauseParams,
.set_pauseparam = setPauseParams,
.get_link   = ethtool_op_get_link,
-   .get_perm_addr  = ethtool_op_get_perm_addr,
.get_sg = ethtool_op_get_sg,
.set_sg = setScatterGather,
.get_tx_csum= ethtool_op_get_tx_csum,
-- 
1.5.2.5



[PATCH 3/3]: sk98lin: neuter device to only SysKonnect boards

2007-09-12 Thread Stephen Hemminger
The skge driver works better for all boards except older SysKonnect
boards.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 drivers/net/sk98lin/skge.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index bf21862..7dc9c9e 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5168,10 +5168,17 @@ err_out:
 #endif
 
 static struct pci_device_id skge_pci_tbl[] = {
+#ifdef SK98LIN_ALL_DEVICES
{ PCI_VENDOR_ID_3COM, 0x1700, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
{ PCI_VENDOR_ID_3COM, 0x80eb, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
+#endif
+#ifdef GENESIS
+   /* Generic SysKonnect SK-98xx Gigabit Ethernet Server Adapter */
{ PCI_VENDOR_ID_SYSKONNECT, 0x4300, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
+#endif
+   /* Generic SysKonnect SK-98xx V2.0 Gigabit Ethernet Adapter */  
{ PCI_VENDOR_ID_SYSKONNECT, 0x4320, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
+#ifdef SK98LIN_ALL_DEVICES
 /* DLink card does not have valid VPD so this driver gags
  * { PCI_VENDOR_ID_DLINK, 0x4c00, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
  */
@@ -5180,6 +5187,7 @@ static struct pci_device_id skge_pci_tbl[] = {
{ PCI_VENDOR_ID_CNET, 0x434e, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
{ PCI_VENDOR_ID_LINKSYS, 0x1032, PCI_ANY_ID, 0x0015, },
{ PCI_VENDOR_ID_LINKSYS, 0x1064, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
+#endif
{ 0 }
 };
 
-- 
1.5.2.5



Re: [PATCH 13/16] net: Make the device list and device lookups per namespace.

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:35:46 -0600

 
 This patch makes most of the generic device layer network
 namespace safe.  This patch makes dev_base_head a
 network namespace variable, and then it picks up
 a few associated variables.  The functions:
 dev_getbyhwaddr
 dev_getfirsthwbytype
 dev_get_by_flags
 dev_get_by_name
 __dev_get_by_name
 dev_get_by_index
 __dev_get_by_index
 dev_ioctl
 dev_ethtool
 dev_load
 wireless_process_ioctl
 
 were modified to take a network namespace argument, and
 deal with it.
 
 vlan_ioctl_set and brioctl_set were modified so their
 hooks will receive a network namespace argument.
 
 So basically anything in the core of the network stack that was
 affected by the change of dev_base was modified to handle
 multiple network namespaces.  The rest of the network stack was
 simply modified to explicitly use init_net the initial network
 namespace.  This can be fixed when those components of the network
 stack are modified to handle multiple network namespaces.
 
 For now the ifindex generator is left global.
 
 Fundamentally ifindex numbers are per namespace, or else
 we will have corner case problems with migration when
 we get that far.
 
 At the same time there are assumptions in the network stack
 that the ifindex of a network device won't change.  Making
 the ifindex number global seems a good compromise until
 the network stack can cope with ifindex changes when
 you change namespaces, and the like.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.



Re: [PATCH 14/16] net: Factor out __dev_alloc_name from dev_alloc_name

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:36:56 -0600

 
 When forcibly changing the network namespace of a device
 I need something that can generate a name for the device
 in the new namespace without overwriting the old name.
 
 __dev_alloc_name provides me that functionality.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.


Re: [PATCH 15/16] net: Implement network device movement between namespaces

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:38:46 -0600

 
 This patch introduces NETIF_F_NETNS_LOCAL, a flag to indicate that
 a network device is local to a single network namespace and
 should never be moved.  Useful for pseudo devices for which we
 need an instance in each network namespace (like the loopback
 device) and for any device we find that cannot handle multiple
 network namespaces, so we may trap them in the initial network
 namespace.
 
 This patch introduces the function dev_change_net_namespace,
 a function used to move a network device from one network
 namespace to another.  To the network device nothing
 special appears to happen; to the components of the network
 stack it appears as if the network device was unregistered
 in the network namespace it is in, and a new device
 was registered in the network namespace the device
 was moved to.
 
 This patch sets up a namespace device destructor that
 upon the exit of a network namespace moves all of the
 movable network devices  to the initial network namespace
 so they are not lost.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24


Re: [PATCH 16/16] net: netlink support for moving devices between network namespaces.

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:43:44 -0600

 
 The simplest thing to implement is moving network devices between
 namespaces.  However with the same attribute IFLA_NET_NS_PID we can
 easily implement creating devices in the destination network
 namespace as well.  However that is a little bit trickier so this
 patch sticks to what is simple and easy.
 
 A pid is used to identify a process that happens to be a member
 of the network namespace we want to move the network device to.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.


Re: [PATCH 17/16] net: Disable netfilter sockopts when not in the initial network namespace

2007-09-12 Thread David Miller

I added the following patch to net-2.6.24 to kill a warning
since net_alloc() has no users (yet).

commit f444fa9b5d70b3d431e1554e0975e012514c39f3
Author: David S. Miller [EMAIL PROTECTED](none)
Date:   Wed Sep 12 14:01:08 2007 +0200

[NET]: #if 0 out net_alloc() for now.

We will undo this once it is actually used.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f259a9b..1fc513c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -32,10 +32,12 @@ void net_unlock(void)
 	mutex_unlock(&net_list_mutex);
 }
 
+#if 0
 static struct net *net_alloc(void)
 {
 	return kmem_cache_alloc(net_cachep, GFP_KERNEL);
 }
+#endif
 
 static void net_free(struct net *net)
 {


[PATCH 1/3] sk98lin: restore driver

2007-09-12 Thread Stephen Hemminger
This reverts commit e1abecc48938fbe1966ea6e78267fc673fa59295.

The driver works on some hardware that skge doesn't handle yet.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
Patch too large for mailing list. Download from:

http://developer.osdl.org/shemminger/patches/sk98lin-2.6.23-restore.patch


[patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread Wolfgang Walter
Hello,

as already described old temporary sockets (client is gone) of lockd aren't
closed after some time. So, with enough clients and some time gone, there
are 80 open dangling sockets and you start getting messages of the form:

lockd: too many open TCP sockets, consider increasing the number of nfsd 
threads.

If I understand the code then the intention was that the server closes
temporary sockets after about 6 to 12 minutes:

a timer is started which calls svc_age_temp_sockets every 6 minutes.

svc_age_temp_sockets:
if a socket is marked OLD it gets closed.
sockets which are not marked as OLD are marked OLD

every time the socket receives something, OLD is cleared.

But svc_age_temp_sockets never actually closes any socket, because it only
closes sockets with svsk->sk_inuse == 0. This seems to be a bug.

Here is a patch against 2.6.22.6 which changes the test to
svsk->sk_inuse <= 0, which was probably meant. The patched kernel runs fine
here. Unused sockets get closed (after 6 to 12 minutes).

Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]

--- ../linux-2.6.22.6/net/sunrpc/svcsock.c  2007-08-27 18:10:14.0 +0200
+++ net/sunrpc/svcsock.c    2007-09-11 11:07:13.0 +0200
@@ -1572,7 +1575,7 @@
 
 		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
 			continue;
-		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
+		if (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
 			continue;
 		atomic_inc(&svsk->sk_inuse);
 		list_move(le, &to_be_aged);


As svc_age_temp_sockets did not do anything before, this change may trigger
hidden bugs.

To be honest, I don't see why this check

(atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))

is needed at all (it can only be an optimization), as these fields can change
after the check. In svc_tcp_accept there is no such check when a temporary
socket is closed.
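
The intended aging scheme described at the top of this mail can be sketched
in user space roughly as follows. This is only an illustration of the
"mark OLD on one pass, close on the next" idea; struct temp_sock and both
function names are hypothetical and are not the kernel's svc_sock code, and
the sk_inuse/SK_BUSY checks are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a temporary server socket (not struct svc_sock). */
struct temp_sock {
	bool old;	/* set by the aging pass, cleared when data arrives */
	bool closed;	/* set once the aging pass decides to close it */
};

/* Called whenever the socket receives something: it is no longer idle. */
static void temp_sock_received(struct temp_sock *s)
{
	assert(s != NULL);
	s->old = false;
}

/*
 * One aging pass, meant to run periodically (every 6 minutes in the mail).
 * A socket that is still marked old has been idle for a whole period and
 * gets closed; any other open socket is marked old for the next pass.
 * Returns the number of sockets closed by this pass.
 */
static size_t age_temp_socks(struct temp_sock *socks, size_t n)
{
	size_t closed = 0;

	assert(socks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (socks[i].closed)
			continue;
		if (socks[i].old) {
			socks[i].closed = true;
			closed++;
		} else {
			socks[i].old = true;
		}
	}
	return closed;
}
```

With a fixed period P between passes, an idle socket is closed somewhere
between P and 2*P after its last activity, which matches the "6 to 12
minutes" mentioned above.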


Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates

2007-09-12 Thread jamal
On Wed, 2007-12-09 at 03:04 -0400, Bill Fink wrote:
 On Fri, 07 Sep 2007, jamal wrote:

  I am going to be the devil's advocate[1]:
 
 So let me be the angel's advocate.  :-)

I think this would make you God's advocate ;->
(http://en.wikipedia.org/wiki/God%27s_advocate)

 I view his results much more favorably.  

The challenge is, under _low traffic_: bad bad CPU use.
That's what is at stake, correct?

Let's bury the stats for a sec ...

1) Has that CPU situation improved? No, it has gotten worse.
2) Was there a throughput problem? No. 
Remember, this is _low traffic_ and the complaint is not "NAPI doesn't do
high throughput". I am not willing to spend 34% more cpu to get a few
hundred pps (under low traffic!). 
3) Latency improvement is good. But is a 34% cost worthwhile for the corner
case of low traffic?

Here's an analogy:
I went to buy bread and complained that 66 cents was too much for such
a tiny sliced loaf.
You tell me you have solved my problem: asking me to pay a dollar
because you made the bread slices crispier. I was complaining about the _66
cents price_, not about the crispiness of the slices ;-> Crispier slices are
good - but am I, the person who was complaining about price, willing to
pay 40-50% more? People are bitching about NAPI abusing CPU; is the 
answer to abuse more CPU than NAPI? ;->
The answer could be "I am not solving that problem anymore" - at least
that's what James is saying ;->

Note: I am not saying there's no problem - just saying the result is not
addressing the problem.

 You can't always improve on all metrics of a workload.  

But you gotta try to be consistent. 
If, for example, one packet size/rate got negative results but the next
got positive results - that's lacking consistency. 

 Sometimes there
 are tradeoffs to be made to be decided by the user based on what's most
 important to that user and his specific workload.  And the suggested
 ethtool option (defaulting to current behavior) would enable the user
 to make that decision.

And the challenge is:
What workload is willing to invest that much cpu for low traffic?
Can you name one? One that may come close is database benchmarks for
latency - but those folks wouldn't touch this with a mile-long pole if
you told them their cpu use is going to get worse than what NAPI (that
big bad CPU hog under low traffic) is giving them.

 
 P.S.  I agree that some tests run in parallel with some CPU hogs also
   running might be beneficial and enlightening.

indeed.

cheers,
jamal



Re: [PATCH 07/16] net: Make /proc/net per network namespace

2007-09-12 Thread Daniel Lezcano

David Miller wrote:

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:20:36 -0600


This patch makes /proc/net per network namespace.  It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]


Patch applied, thanks.



Hi Dave,

it seems the fs/proc/proc_net.c was not added to the git repository.

Regards.

-- Daniel


Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue

2007-09-12 Thread Eric Dumazet
On Wed, 12 Sep 2007 04:12:00 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

 From: Eric Dumazet [EMAIL PROTECTED]
 Date: Wed, 12 Sep 2007 12:08:45 +0200
 
  Unfortunately, there is no equivalent for this one. 
  This gives on my Opterons a nice prefetchnta
  
  prefetch(addr) is more like __builtin_prefetch(addr, 0, 3)
  
  I would like to avoid to zap L2 cache with useless data.
  
  __builtin_prefetch() is included from gcc 3.1 (2002), so every 
  platform should support it, as linux-2.6 requires gcc 3.2 at least.
  
  I guess you are going to tell me to first publish a patch to lkml :)
 
 Basically, yes :-)  You won't be the only person to find this
 useful.

OK, let's try a normal prefetch(). I'll change it later when/if a 
new generic macro is added. I added the missing 'static' and a comment
about the struct {} dst_garbage. I also corrected the spelling error in
the patch title (collection).
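
For readers who have not used the builtin discussed above, here is a small
user-space sketch of a list walk that prefetches the next node with the
lowest temporal-locality hint. It assumes GCC or a compatible compiler;
struct entry and sum_list() are made-up names for illustration and are not
taken from the patch. Whether the hint actually becomes a prefetchnta
depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node, standing in for a long chain of cache entries. */
struct entry {
	struct entry *next;
	long value;
};

/*
 * Walk the list once and sum the values, storing the node count in *count.
 * Each node is touched exactly once, so the next node is prefetched with
 * rw = 0 (read) and locality = 0 (no need to keep it cached afterwards).
 * By contrast, the mail above describes prefetch(addr) as being closer to
 * __builtin_prefetch(addr, 0, 3).
 */
static long sum_list(const struct entry *head, size_t *count)
{
	long sum = 0;

	assert(count != NULL);
	*count = 0;
	for (const struct entry *e = head; e != NULL; e = e->next) {
		/* Prefetching a NULL or invalid address does not fault. */
		__builtin_prefetch(e->next, 0, 0);
		sum += e->value;
		(*count)++;
	}
	return sum;
}
```

The second and third arguments must be compile-time constants, which is
why a generic kernel macro would have to fix them per call site.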

Thank you

[PATCH] NET : convert IP route cache garbage collection from softirq processing 
to a workqueue

When the periodic IP route cache flush is done (every 600 seconds on 
default configuration), some hosts suffer a lot and eventually trigger
the soft lockup message.

dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
eventually freeing some (less than 1%) of them, while holding the 
dst_lock spinlock for the whole scan.

Then it rearms a timer to redo the full thing 1/10 s later...
The slowdown can last one minute or so, depending on how active
the TCP sessions are.

This second version of the patch converts the processing from a softirq
based one to a workqueue.

Even if the list of entries in garbage_list is huge, host is still
responsive to softirqs and can make progress.

Instead of resetting gc timer to 0.1 second if one entry was freed in a
gc run, we do this if more than 10% of entries were freed.
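
The rearm rule in the last paragraph can be sketched as a small pure
function. Only the 0.1 second minimum and the "more than 10% freed"
threshold come from the description above; the doubling back-off, the upper
bound and the name next_gc_delay_ms() are assumptions made for this
illustration and are not taken from the patch itself.

```c
#include <assert.h>

#define GC_MIN_DELAY_MS	100u	/* 0.1 s, from the description above */
#define GC_MAX_DELAY_MS	120000u	/* assumption: arbitrary cap for this sketch */

/*
 * Decide how long to wait before the next gc run.
 * scanned: entries looked at in this run; freed: entries actually freed.
 * If more than 10% were freed, the list is draining quickly, so come back
 * after the minimum delay. Otherwise back off (here: double the delay,
 * capped), since another full scan would mostly be wasted work.
 */
static unsigned int next_gc_delay_ms(unsigned int scanned, unsigned int freed,
				     unsigned int cur_delay_ms)
{
	unsigned long long next;

	assert(freed <= scanned);
	if ((unsigned long long)freed * 10 > scanned)
		return GC_MIN_DELAY_MS;

	next = (unsigned long long)cur_delay_ms * 2;
	if (next > GC_MAX_DELAY_MS)
		next = GC_MAX_DELAY_MS;
	return (unsigned int)next;
}
```

Under the old rule a single freed entry was enough to come back after
0.1 s, which is what kept the full scan running back to back.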


Before patch :

Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0!
Aug 16 06:21:37 SRV1 kernel: 
Aug 16 06:21:37 SRV1 kernel: Call Trace:
Aug 16 06:21:37 SRV1 kernel:  IRQ  [802286f0] 
wake_up_process+0x10/0x20
Aug 16 06:21:37 SRV1 kernel:  [80251e09] softlockup_tick+0xe9/0x110
Aug 16 06:21:37 SRV1 kernel:  [803cd380] dst_run_gc+0x0/0x140
Aug 16 06:21:37 SRV1 kernel:  [802376f3] run_local_timers+0x13/0x20
Aug 16 06:21:37 SRV1 kernel:  [802379c7] 
update_process_times+0x57/0x90
Aug 16 06:21:37 SRV1 kernel:  [80216034] 
smp_local_timer_interrupt+0x34/0x60
Aug 16 06:21:37 SRV1 kernel:  [802165cc] 
smp_apic_timer_interrupt+0x5c/0x80
Aug 16 06:21:37 SRV1 kernel:  [8020a816] 
apic_timer_interrupt+0x66/0x70
Aug 16 06:21:37 SRV1 kernel:  [803cd3d3] dst_run_gc+0x53/0x140
Aug 16 06:21:37 SRV1 kernel:  [803cd3c6] dst_run_gc+0x46/0x140
Aug 16 06:21:37 SRV1 kernel:  [80237148] run_timer_softirq+0x148/0x1c0
Aug 16 06:21:37 SRV1 kernel:  [8023340c] __do_softirq+0x6c/0xe0
Aug 16 06:21:37 SRV1 kernel:  [8020ad6c] call_softirq+0x1c/0x30
Aug 16 06:21:37 SRV1 kernel:  EOI  [8020cb34] do_softirq+0x34/0x90
Aug 16 06:21:37 SRV1 kernel:  [802331cf] local_bh_enable_ip+0x3f/0x60
Aug 16 06:21:37 SRV1 kernel:  [80422913] _spin_unlock_bh+0x13/0x20
Aug 16 06:21:37 SRV1 kernel:  [803dfde8] 
rt_garbage_collect+0x1d8/0x320
Aug 16 06:21:37 SRV1 kernel:  [803cd4dd] dst_alloc+0x1d/0xa0
Aug 16 06:21:37 SRV1 kernel:  [803e1433] 
__ip_route_output_key+0x573/0x800
Aug 16 06:21:37 SRV1 kernel:  [803c02e2] sock_common_recvmsg+0x32/0x50
Aug 16 06:21:37 SRV1 kernel:  [803e16dc] 
ip_route_output_flow+0x1c/0x60
Aug 16 06:21:37 SRV1 kernel:  [80400160] tcp_v4_connect+0x150/0x610
Aug 16 06:21:37 SRV1 kernel:  [803ebf07] 
inet_bind_bucket_create+0x17/0x60
Aug 16 06:21:37 SRV1 kernel:  [8040cd16] 
inet_stream_connect+0xa6/0x2c0
Aug 16 06:21:37 SRV1 kernel:  [80422981] _spin_lock_bh+0x11/0x30
Aug 16 06:21:37 SRV1 kernel:  [803c0bdf] lock_sock_nested+0xcf/0xe0
Aug 16 06:21:37 SRV1 kernel:  [80422981] _spin_lock_bh+0x11/0x30
Aug 16 06:21:37 SRV1 kernel:  [803be551] sys_connect+0x71/0xa0
Aug 16 06:21:37 SRV1 kernel:  [803eee3f] tcp_setsockopt+0x1f/0x30
Aug 16 06:21:37 SRV1 kernel:  [803c030f] 
sock_common_setsockopt+0xf/0x20
Aug 16 06:21:37 SRV1 kernel:  [803be4bd] sys_setsockopt+0x9d/0xc0
Aug 16 06:21:37 SRV1 kernel:  [8028881e] sys_ioctl+0x5e/0x80
Aug 16 06:21:37 SRV1 kernel:  [80209c4e] system_call+0x7e/0x83

After patch : (RT_CACHE_DEBUG set to 2 to get following traces)

dst_total: 75469 delayed: 74109 work_perf: 141 expires: 150 elapsed: 8092 us
dst_total: 78725 delayed: 73366 work_perf: 743 expires: 400 elapsed: 8542 us
dst_total: 86126 delayed: 71844 work_perf: 1522 expires: 775 elapsed: 8849 us
dst_total: 100173 delayed: 68791 work_perf: 3053 expires: 1256 elapsed: 9748 us
dst_total: 121798 delayed: 64711 work_perf: 4080 expires: 1997 elapsed: 10146 us

Re: [PATCH 17/16] net: Disable netfilter sockopts when not in the initial network namespace

2007-09-12 Thread Eric W. Biederman
David Miller [EMAIL PROTECTED] writes:

 I added the following patch to net-2.6.24 to kill a warning
 since net_alloc() has no users (yet).

Reasonable, and thanks for merging these.

Having a solid place to start helps a lot.

I will see if I can get the /proc races fixed shortly.

Eric


Re: [PATCH 07/16] net: Make /proc/net per network namespace

2007-09-12 Thread David Miller
From: Daniel Lezcano [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:12:04 +0200

 it seems the fs/proc/proc_net.c was not added to the git repository.

Fixed, thanks for catching that.


Re: [PATCH 4/6] [IPROUTE2]: Overhead calculation is now done in the kernel

2007-09-12 Thread Jesper Dangaard Brouer
On Wed, 2007-09-12 at 13:05 +0200, Stephen Hemminger wrote:

 How is this binary compatible with older kernels?

It will be binary compatible, as I use/rename some unused variables in
struct tc_ratespec.

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk


Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue

2007-09-12 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:16:56 +0200

 OK, let's try a normal prefetch(), I'll change it later when/if a 
 new generic macro is added. I added the missing 'static' and a comment
 about the struct {} dst_garbage. I also corrected spelling error on
 patch title (collection)

I sorted out the conflicts with the network namespace stuff
I just checked in and added your patch to net-2.6.24

Thanks!


Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-12 Thread David Miller
From: Johannes Berg [EMAIL PROTECTED]
Date: Thu, 06 Sep 2007 17:19:55 +0200

 
 Oh btw. Can we stick a might_sleep() into dev_close() *before* the test
 whether the device is up? That way, we'd have seen the bug, but
 apparently nobody before Florian ever did a 'ip link set wmaster0 down'
 while the other interfaces were still open.

I've added this to net-2.6.24


[NETLINK]: Introduce nested and byteorder flag to netlink attribute

2007-09-12 Thread Thomas Graf
This change allows the generic attribute interface to be used within
the netfilter subsystem where this flag was initially introduced.

The byte-order flag is as yet unused; its intended use is to
allow automatic byte order conversions for all atomic types.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/include/linux/netlink.h
===
--- net-2.6.24.orig/include/linux/netlink.h 2007-09-12 13:29:49.0 +0200
+++ net-2.6.24/include/linux/netlink.h  2007-09-12 13:59:41.0 +0200
@@ -129,6 +129,20 @@
 	__u16           nla_type;
 };
 
+/*
+ * nla_type (16 bits)
+ * +---+---+-------------------------------+
+ * | N | O | Attribute Type                |
+ * +---+---+-------------------------------+
+ * N := Carries nested attributes
+ * O := Payload stored in network byte order
+ *
+ * Note: The N and O flag are mutually exclusive.
+ */
+#define NLA_F_NESTED		(1 << 15)
+#define NLA_F_NET_BYTEORDER	(1 << 14)
+#define NLA_TYPE_MASK		~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
+
 #define NLA_ALIGNTO		4
 #define NLA_ALIGN(len)		(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
 #define NLA_HDRLEN		((int) NLA_ALIGN(sizeof(struct nlattr)))
Index: net-2.6.24/include/net/netlink.h
===
--- net-2.6.24.orig/include/net/netlink.h   2007-09-12 13:29:50.0 +0200
+++ net-2.6.24/include/net/netlink.h    2007-09-12 14:17:56.0 +0200
@@ -667,6 +667,15 @@
 }
 
 /**
+ * nla_type - attribute type
+ * @nla: netlink attribute
+ */
+static inline int nla_type(const struct nlattr *nla)
+{
+	return nla->nla_type & NLA_TYPE_MASK;
+}
+
+/**
  * nla_data - head of payload
  * @nla: netlink attribute
  */
Index: net-2.6.24/net/ipv4/fib_frontend.c
===
--- net-2.6.24.orig/net/ipv4/fib_frontend.c 2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv4/fib_frontend.c  2007-09-12 13:59:41.0 +0200
@@ -487,7 +487,7 @@
 	}
 
 	nlmsg_for_each_attr(attr, nlh, sizeof(struct rtmsg), remaining) {
-		switch (attr->nla_type) {
+		switch (nla_type(attr)) {
 		case RTA_DST:
 			cfg->fc_dst = nla_get_be32(attr);
 			break;
Index: net-2.6.24/net/ipv4/fib_semantics.c
===
--- net-2.6.24.orig/net/ipv4/fib_semantics.c    2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv4/fib_semantics.c 2007-09-12 13:59:41.0 +0200
@@ -743,7 +743,7 @@
 	int remaining;
 
 	nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
-		int type = nla->nla_type;
+		int type = nla_type(nla);
 
 		if (type) {
 			if (type > RTAX_MAX)
Index: net-2.6.24/net/ipv6/route.c
===
--- net-2.6.24.orig/net/ipv6/route.c    2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv6/route.c 2007-09-12 13:59:41.0 +0200
@@ -1278,7 +1278,7 @@
 	int remaining;
 
 	nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
-		int type = nla->nla_type;
+		int type = nla_type(nla);
 
 		if (type) {
 			if (type > RTAX_MAX) {
Index: net-2.6.24/net/netlabel/netlabel_cipso_v4.c
===
--- net-2.6.24.orig/net/netlabel/netlabel_cipso_v4.c    2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/netlabel/netlabel_cipso_v4.c 2007-09-12 13:59:41.0 +0200
@@ -130,7 +130,7 @@
 		return -EINVAL;
 
 	nla_for_each_nested(nla, info->attrs[NLBL_CIPSOV4_A_TAGLST], nla_rem)
-		if (nla->nla_type == NLBL_CIPSOV4_A_TAG) {
+		if (nla_type(nla) == NLBL_CIPSOV4_A_TAG) {
 			if (iter >= CIPSO_V4_TAG_MAXCNT)
 				return -EINVAL;
 			doi_def->tags[iter++] = nla_get_u8(nla);
@@ -192,13 +192,13 @@
 	nla_for_each_nested(nla_a,
 			    info->attrs[NLBL_CIPSOV4_A_MLSLVLLST],
 			    nla_a_rem)
-		if (nla_a->nla_type == NLBL_CIPSOV4_A_MLSLVL) {
+		if (nla_type(nla_a) == NLBL_CIPSOV4_A_MLSLVL) {
 			if (nla_validate_nested(nla_a,
 					    NLBL_CIPSOV4_A_MAX,
 					    netlbl_cipsov4_genl_policy) != 0)
 				goto add_std_failure;
 			nla_for_each_nested(nla_b, nla_a, nla_b_rem)
-				switch (nla_b->nla_type) {
+   switch 

Re: [NETLINK]: Introduce nested and byteorder flag to netlink attribute

2007-09-12 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:41:45 +0200

 This change allows the generic attribute interface to be used within
 the netfilter subsystem where this flag was initially introduced.
 
 The byte-order flag is as yet unused; its intended use is to
 allow automatic byte order conversions for all atomic types.
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied to net-2.6.24, thanks Thomas.
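
To make the flag layout from the patch concrete, here is a small user-space
sketch of how the two top bits are separated from the attribute type. The
three macros mirror the ones quoted in the patch; struct nlattr_sketch and
the attr_*() helpers are hypothetical names used only here so that the
example does not depend on kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit assignments as in the patch: top two bits of the 16-bit type. */
#define NLA_F_NESTED		(1 << 15)
#define NLA_F_NET_BYTEORDER	(1 << 14)
#define NLA_TYPE_MASK		~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)

/* Hypothetical stand-in for the attribute header (length, then type). */
struct nlattr_sketch {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Attribute type with both flag bits masked off (what nla_type() returns). */
static int attr_type(const struct nlattr_sketch *nla)
{
	assert(nla != NULL);
	return nla->nla_type & NLA_TYPE_MASK;
}

/* Non-zero if the attribute is flagged as carrying nested attributes. */
static int attr_is_nested(const struct nlattr_sketch *nla)
{
	assert(nla != NULL);
	return (nla->nla_type & NLA_F_NESTED) != 0;
}
```

This is why callers that switch on the raw nla_type field had to be
converted: once a flag bit is set, the raw field no longer compares equal to
the plain attribute number.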


[net-2.6.24][NETNS][patch 0/3] fixes for the core network namespace

2007-09-12 Thread dlezcano
The following patches fix some compilation errors and boot problems
related to the network namespace patchset.

They apply to net-2.6.24


[net-2.6.24][NETNS][patch 3/3] fix bad macro definition

2007-09-12 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

The macro definition is bad. When next_net_device is called with a
parameter named dev, the resulting code is:
  struct net_device *dev = dev
and that leads to unexpected behavior. In particular, when llc_core is
compiled in, the kernel panics at boot time.
This patch replaces the macro definitions with static inline functions, as
they were defined before.
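
The name capture described above can be reproduced outside the kernel. The
sketch below only stringifies the first statement of the old macro body to
show what the preprocessor generates when the caller's variable is also
called dev; it never executes the self-initialized declaration. BAD_FIRST_STMT,
struct item and next_item() are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STR(x)	#x
#define XSTR(x)	STR(x)	/* expand the argument first, then stringify it */

/* First statement of the old macro body: its local is always named "dev". */
#define BAD_FIRST_STMT(d) struct net_device *dev = d;

/* Text produced by the preprocessor when the argument is also named dev. */
static const char *bad_expansion(void)
{
	return XSTR(BAD_FIRST_STMT(dev));
}

/* Hypothetical list element used to show the function-based alternative. */
struct item {
	int id;
	struct item *next;
};

/*
 * With a function, the parameter is an ordinary, already-initialized value,
 * so the caller's variable name cannot collide with anything in the body.
 */
static inline struct item *next_item(struct item *dev)
{
	assert(dev != NULL);
	return dev->next;
}
```

In the expanded declaration the new local dev is in scope in its own
initializer, so it is initialized from itself (an indeterminate value)
rather than from the caller's pointer.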

Signed-off-by: Benjamin Thery [EMAIL PROTECTED]
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/linux/netdevice.h |   35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

Index: net-2.6.24/include/linux/netdevice.h
===
--- net-2.6.24.orig/include/linux/netdevice.h
+++ net-2.6.24/include/linux/netdevice.h
@@ -41,7 +41,8 @@
 #include <linux/dmaengine.h>
 #include <linux/workqueue.h>
 
-struct net;
+#include <net/net_namespace.h>
+
 struct vlan_group;
 struct ethtool_ops;
 struct netpoll_info;
@@ -739,23 +740,21 @@
 		list_for_each_entry_continue(d, &(net)->dev_base_head, dev_list)
 #define net_device_entry(lh)	list_entry(lh, struct net_device, dev_list)
 
-#define next_net_device(d)						\
-({									\
-	struct net_device *dev = d;					\
-	struct list_head *lh;						\
-	struct net *net;						\
-									\
-	net = dev->nd_net;						\
-	lh = dev->dev_list.next;					\
-	lh == &net->dev_base_head ? NULL : net_device_entry(lh);	\
-})
-
-#define first_net_device(N)						\
-({									\
-	struct net *NET = (N);						\
-	list_empty(&NET->dev_base_head) ? NULL :			\
-		net_device_entry(NET->dev_base_head.next);		\
-})
+static inline struct net_device *next_net_device(struct net_device *dev)
+{
+	struct list_head *lh;
+	struct net *net;
+
+	net = dev->nd_net;
+	lh = dev->dev_list.next;
+	return lh == &net->dev_base_head ? NULL : net_device_entry(lh);
+}
+
+static inline struct net_device *first_net_device(struct net *net)
+{
+	return list_empty(&net->dev_base_head) ? NULL :
+		net_device_entry(net->dev_base_head.next);
+}
 
 extern int netdev_boot_setup_check(struct net_device *dev);
 extern unsigned long   netdev_boot_base(const char *prefix, int unit);



[net-2.6.24][NETNS][patch 2/3] fix loopback network namespace initialization

2007-09-12 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

The core patchset of the network namespace sent by 
Eric Biederman does not do dynamic loopback creation.
So there is no call to alloc_netdev_mq which fills the
network namespace field of the netdevice.

This patch assigns the loopback to the init network namespace.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 drivers/net/loopback.c |1 +
 1 file changed, 1 insertion(+)

Index: net-2.6.24/drivers/net/loopback.c
===
--- net-2.6.24.orig/drivers/net/loopback.c
+++ net-2.6.24/drivers/net/loopback.c
@@ -225,6 +225,7 @@
 				  | NETIF_F_LLTX
 				  | NETIF_F_NETNS_LOCAL,
 	.ethtool_ops		= &loopback_ethtool_ops,
+	.nd_net                 = &init_net,
 };
 
 /* Setup and register the loopback device. */



[net-2.6.24][NETNS][patch 1/3] fix export symbols

2007-09-12 Thread dlezcano
From: Daniel Lezcano [EMAIL PROTECTED]

Add the appropriate EXPORT_SYMBOLS for proc_net_create,
proc_net_fops_create and proc_net_remove to fix errors when
compiling allmodconfig

Signed-off-by: Mark Nelson [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]
---
 fs/proc/proc_net.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: net-2.6.24/fs/proc/proc_net.c
===
--- net-2.6.24.orig/fs/proc/proc_net.c
+++ net-2.6.24/fs/proc/proc_net.c
@@ -31,6 +31,7 @@
 {
 	return create_proc_info_entry(name,mode, net->proc_net, get_info);
 }
+EXPORT_SYMBOL_GPL(proc_net_create);
 
 struct proc_dir_entry *proc_net_fops_create(struct net *net,
 	const char *name, mode_t mode, const struct file_operations *fops)
@@ -42,12 +43,13 @@
 		res->proc_fops = fops;
 	return res;
 }
+EXPORT_SYMBOL_GPL(proc_net_fops_create);
 
 void proc_net_remove(struct net *net, const char *name)
 {
 	remove_proc_entry(name, net->proc_net);
 }
-
+EXPORT_SYMBOL_GPL(proc_net_remove);
 
 static struct proc_dir_entry *proc_net_shadow;
 



Re: [PATCH net-2.6.23-rc5] ipsec interfamily route handling fix

2007-09-12 Thread David Miller
From: Joakim Koskela [EMAIL PROTECTED]
Date: Thu, 6 Sep 2007 19:00:10 +0300

 This patch addresses a couple of issues related to interfamily ipsec
 modes. The problem is that the structure of the routing info changes
 with the family during the __xfrmX_bundle_create, which hasn't been
 taken properly into account. Seems that by coincidence it hasn't
 caused problems on 32bit platforms, but crashes for example on x86_64
 in 6-4 around line 209 of xfrm6_policy.c as rt doesn't point to a
 rt6_info anymore, but actually a struct rtable. With 64bit pointers,
 the rt-rt6i_node pointer seems to hit something usually not null in
 the rtable that rt now points to, making it go for the path_cookie
 assignment and subsequently crashing.
 
 Tested on both 32/64bit with all four (44/46/64/66) combinations of
 transformation. I'm still a bit worried about how for example nested
 transformations work with all of this and would appreciate if someone
 more familiar with the details of these structs could comment.
 
 Signed-off-by: Joakim Koskela [EMAIL PROTECTED]

This fix basically looks fine to me, but I'd like at least one
other person to review it too.


Re: new NAPI interface broken

2007-09-12 Thread David Miller
From: Jan-Bernd Themann [EMAIL PROTECTED]
Date: Fri, 7 Sep 2007 11:37:02 +0200

 2) On SMP systems: after netif_rx_complete has been called on CPU1
(+interrupts enabled), netif_rx_schedule could be called on CPU2 
(irq handler) before net_rx_action on CPU1 has checked NAPI_STATE_SCHED. 
In that case the device would be added to poll lists of CPU1 and CPU2
as net_rx_action would see NAPI_STATE_SCHED set.
This must not happen. It will be caught when netif_rx_complete is
called the second time (BUG() called)
 
 This would mean we have a problem on all SMP machines right now.

This is not a correct statement.

Only on your platform do network device interrupts get moved
around, no other platform does this.

Sparc64 doesn't; all interrupts stay in one location after
the cpu is initially chosen.

x86 and x86_64 specifically do not move around network
device interrupts, even though other device types do
get dynamic IRQ cpu distribution.

That's why you are the only person seeing this problem.

I agree that it should be fixed, but we should also fix the IRQ
distribution scheme used on powerpc platforms which is totally
broken in these cases.


Re: [net-2.6.24][NETNS][patch 1/3] fix export symbols

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:38:12 +0200

 From: Daniel Lezcano [EMAIL PROTECTED]
 
 Add the appropriate EXPORT_SYMBOLS for proc_net_create,
 proc_net_fops_create and proc_net_remove to fix errors when
 compiling allmodconfig
 
 Signed-off-by: Mark Nelson [EMAIL PROTECTED]
 Acked-by: Benjamin Thery [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.

Why aren't you signing off on these patches?  Please
do so in the future.

Because From:  usually means you are the patch author, and I can't
tell who wrote these patches, you or these other people listed in the
signoff area.


Re: [net-2.6.24][NETNS][patch 2/3] fix loopback network namespace initialization

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:38:13 +0200

 From: Daniel Lezcano [EMAIL PROTECTED]
 
 The core patchset of the network namespace sent by 
 Eric Biederman does not do dynamic loopback creation.
 So there is no call to alloc_netdev_mq which fills the
 network namespace field of the netdevice.
 
 This patch assigns the loopback to the init network namespace.
 
 Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

Applied, thanks.


[PATCH 1/4] [IPROUTE2] Revert Make ip utility veth driver aware

2007-09-12 Thread Eric W. Biederman

Stephen it looks like you weren't cc'd on the latest version
of the veth support.  So this patchset first reverts the old
version of the veth support you merged.  Then merges a tested
version of the veth support.

This reverts commit 4ed390ce43d1ec7c881721f312260df901d8390d.

Conflicts:

ip/ip.c
---
 ip/Makefile |2 +-
 ip/ip.c |4 +-
 ip/veth.c   |  196 ---
 ip/veth.h   |   17 -
 4 files changed, 2 insertions(+), 217 deletions(-)
 delete mode 100644 ip/veth.c
 delete mode 100644 ip/veth.h

diff --git a/ip/Makefile b/ip/Makefile
index 209c5c8..9a5bfe3 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -1,7 +1,7 @@
 IPOBJ=ip.o ipaddress.o iproute.o iprule.o \
 rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
 ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
-ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o veth.o
+ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/ip.c b/ip/ip.c
index 829fc64..4bdb83b 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -27,7 +27,6 @@
 #include SNAPSHOT.h
 #include utils.h
 #include ip_common.h
-#include veth.h
 
 int preferred_family = AF_UNSPEC;
 int show_stats = 0;
@@ -48,7 +47,7 @@ static void usage(void)
 Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n
ip [ -force ] [-batch filename\n
 where  OBJECT := { link | addr | route | rule | neigh | ntable | tunnel |\n
-   maddr | mroute | monitor | xfrm | veth }\n
+   maddr | mroute | monitor | xfrm }\n
OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n
 -f[amily] { inet | inet6 | ipx | dnet | link } |\n
 -o[neline] | -t[imestamp] }\n);
@@ -78,7 +77,6 @@ static const struct cmd {
{ monitor,do_ipmonitor },
{ xfrm,   do_xfrm },
{ mroute, do_multiroute },
-   { veth,   do_veth },
{ help,   do_help },
{ 0 }
 };
diff --git a/ip/veth.c b/ip/veth.c
deleted file mode 100644
index d4eecc8..000
--- a/ip/veth.c
+++ /dev/null
@@ -1,196 +0,0 @@
-/*
- * veth.c ethernet tunnel
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *
- * Authors:Pavel Emelianov, [EMAIL PROTECTED]
- *
- */
-
-#include stdio.h
-#include string.h
-#include unistd.h
-#include sys/types.h
-#include sys/socket.h
-#include linux/genetlink.h
-
-#include utils.h
-#include veth.h
-
-#define GENLMSG_DATA(glh)   ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN))
-#define NLA_DATA(na)((void *)((char*)(na) + NLA_HDRLEN))
-
-static int do_veth_help(void)
-{
-   fprintf(stderr, Usage: ip veth add DEVICE PEER_NAME\n);
-   fprintf(stderr,del DEVICE\n);
-   exit(-1);
-}
-
-static int genl_ctrl_resolve_family(const char *family)
-{
-   struct rtnl_handle rth;
-   struct nlmsghdr *nlh;
-   struct genlmsghdr *ghdr;
-   int ret = 0;
-   struct {
-   struct nlmsghdr n;
-   charbuf[4096];
-   } req;
-
-   memset(req, 0, sizeof(req));
-
-   nlh = req.n;
-   nlh-nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
-   nlh-nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
-   nlh-nlmsg_type = GENL_ID_CTRL;
-
-   ghdr = NLMSG_DATA(req.n);
-   ghdr-cmd = CTRL_CMD_GETFAMILY;
-
-   if (rtnl_open_byproto(rth, 0, NETLINK_GENERIC)  0) {
-   fprintf(stderr, Cannot open generic netlink socket\n);
-   exit(1);
-   }
-
-   addattr_l(nlh, 128, CTRL_ATTR_FAMILY_NAME, family, strlen(family) + 1);
-
-   if (rtnl_talk(rth, nlh, 0, 0, nlh, NULL, NULL)  0) {
-   fprintf(stderr, Error talking to the kernel\n);
-   goto errout;
-   }
-
-   {
-   struct rtattr *tb[CTRL_ATTR_MAX + 1];
-   struct genlmsghdr *ghdr = NLMSG_DATA(nlh);
-   int len = nlh-nlmsg_len;
-   struct rtattr *attrs;
-
-   if (nlh-nlmsg_type !=  GENL_ID_CTRL) {
-   fprintf(stderr, Not a controller message, nlmsg_len=%d 

-   nlmsg_type=0x%x\n, nlh-nlmsg_len, 
nlh-nlmsg_type);
-   goto errout;
-   }
-
-   if (ghdr-cmd != CTRL_CMD_NEWFAMILY) {
-   fprintf(stderr, Unkown controller command %d\n, 
ghdr-cmd);
-   goto errout;
-   }
-
-   len -= NLMSG_LENGTH(GENL_HDRLEN);
-
-   if (len  0) {
-   fprintf(stderr, wrong controller message len %d\n, 
len);
-   return -1;
-   }
-
-   attrs = (struct rtattr 

Re: [PATCH 1/4] [IPROUTE2] Revert Make ip utility veth driver aware

2007-09-12 Thread Pavel Emelyanov
Eric W. Biederman wrote:
 Stephen it looks like you weren't cc'd on the latest version
 of the veth support.  So this patchset first reverts the old

He was. The latest version looks completely different from what
is reversed in this patch.

 version of the veth support you merged.  Then merges a tested
 version of the veth support.
 
 This reverts commit 4ed390ce43d1ec7c881721f312260df901d8390d.
 
 Conflicts:
 
   ip/ip.c
 ---
  ip/Makefile |2 +-
  ip/ip.c |4 +-
  ip/veth.c   |  196 
 ---
  ip/veth.h   |   17 -
  4 files changed, 2 insertions(+), 217 deletions(-)
  delete mode 100644 ip/veth.c
  delete mode 100644 ip/veth.h
 
 diff --git a/ip/Makefile b/ip/Makefile
 index 209c5c8..9a5bfe3 100644
 --- a/ip/Makefile
 +++ b/ip/Makefile
 @@ -1,7 +1,7 @@
  IPOBJ=ip.o ipaddress.o iproute.o iprule.o \
  rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
  ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
 -ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o veth.o
 +ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o
  
  RTMONOBJ=rtmon.o
  
 diff --git a/ip/ip.c b/ip/ip.c
 index 829fc64..4bdb83b 100644
 --- a/ip/ip.c
 +++ b/ip/ip.c
 @@ -27,7 +27,6 @@
  #include SNAPSHOT.h
  #include utils.h
  #include ip_common.h
 -#include veth.h
  
  int preferred_family = AF_UNSPEC;
  int show_stats = 0;
 @@ -48,7 +47,7 @@ static void usage(void)
  Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n
 ip [ -force ] [-batch filename\n
  where  OBJECT := { link | addr | route | rule | neigh | ntable | tunnel |\n
 -   maddr | mroute | monitor | xfrm | veth }\n
 +   maddr | mroute | monitor | xfrm }\n
 OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] 
 |\n
  -f[amily] { inet | inet6 | ipx | dnet | link } |\n
  -o[neline] | -t[imestamp] }\n);
 @@ -78,7 +77,6 @@ static const struct cmd {
   { monitor,do_ipmonitor },
   { xfrm,   do_xfrm },
   { mroute, do_multiroute },
 - { veth,   do_veth },
   { help,   do_help },
   { 0 }
  };
 diff --git a/ip/veth.c b/ip/veth.c
 deleted file mode 100644
 index d4eecc8..000
 --- a/ip/veth.c
 +++ /dev/null
 @@ -1,196 +0,0 @@
 -/*
 - * veth.c   ethernet tunnel
 - *
 - *   This program is free software; you can redistribute it and/or
 - *   modify it under the terms of the GNU General Public License
 - *   as published by the Free Software Foundation; either version
 - *   2 of the License, or (at your option) any later version.
 - *
 - * Authors:  Pavel Emelianov, [EMAIL PROTECTED]
 - *
 - */
 -
 -#include stdio.h
 -#include string.h
 -#include unistd.h
 -#include sys/types.h
 -#include sys/socket.h
 -#include linux/genetlink.h
 -
 -#include utils.h
 -#include veth.h
 -
 -#define GENLMSG_DATA(glh)   ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN))
 -#define NLA_DATA(na)((void *)((char*)(na) + NLA_HDRLEN))
 -
 -static int do_veth_help(void)
 -{
 - fprintf(stderr, Usage: ip veth add DEVICE PEER_NAME\n);
 - fprintf(stderr,del DEVICE\n);
 - exit(-1);
 -}
 -
 -static int genl_ctrl_resolve_family(const char *family)
 -{
 - struct rtnl_handle rth;
 - struct nlmsghdr *nlh;
 - struct genlmsghdr *ghdr;
 - int ret = 0;
 - struct {
 - struct nlmsghdr n;
 - charbuf[4096];
 - } req;
 -
 - memset(req, 0, sizeof(req));
 -
 - nlh = req.n;
 - nlh-nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
 - nlh-nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
 - nlh-nlmsg_type = GENL_ID_CTRL;
 -
 - ghdr = NLMSG_DATA(req.n);
 - ghdr-cmd = CTRL_CMD_GETFAMILY;
 -
 - if (rtnl_open_byproto(rth, 0, NETLINK_GENERIC)  0) {
 - fprintf(stderr, Cannot open generic netlink socket\n);
 - exit(1);
 - }
 -
 - addattr_l(nlh, 128, CTRL_ATTR_FAMILY_NAME, family, strlen(family) + 1);
 -
 - if (rtnl_talk(rth, nlh, 0, 0, nlh, NULL, NULL)  0) {
 - fprintf(stderr, Error talking to the kernel\n);
 - goto errout;
 - }
 -
 - {
 - struct rtattr *tb[CTRL_ATTR_MAX + 1];
 - struct genlmsghdr *ghdr = NLMSG_DATA(nlh);
 - int len = nlh-nlmsg_len;
 - struct rtattr *attrs;
 -
 - if (nlh-nlmsg_type !=  GENL_ID_CTRL) {
 - fprintf(stderr, Not a controller message, nlmsg_len=%d 
 
 - nlmsg_type=0x%x\n, nlh-nlmsg_len, 
 nlh-nlmsg_type);
 - goto errout;
 - }
 -
 - if (ghdr-cmd != CTRL_CMD_NEWFAMILY) {
 - fprintf(stderr, Unkown controller command %d\n, 
 ghdr-cmd);
 - goto errout;
 - }
 -
 - len -= NLMSG_LENGTH(GENL_HDRLEN);
 -
 - if (len  0) {
 -  

Re: [net-2.6.24][NETNS][patch 3/3] fix bad macro definition

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:38:14 +0200

 From: Daniel Lezcano [EMAIL PROTECTED]
 
 The macro definition is bad. When next_net_device is called with a
 parameter named dev, the resulting code is
 struct net_device *dev = dev, which leads to unexpected behavior.
 In particular, when llc_core is compiled in, the kernel panics
 at boot time.
 The patch replaces the macro definitions with static inline functions,
 as they were defined before.
 
 Signed-off-by: Benjamin Thery [EMAIL PROTECTED]
 Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

Applied, thanks.
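
For readers following along, here is a minimal userspace sketch of the
pitfall described in the commit message. The struct node type and the
next_node() name are hypothetical stand-ins, not the kernel definitions.
The macro form is kept under #if 0 because its expansion with an argument
named dev would initialize a variable from itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device: a singly linked node. */
struct node {
	int id;
	struct node *next;
};

#if 0
/*
 * Buggy pattern (GNU statement expression). With a caller variable
 * that is also named dev, next_node(dev) expands to
 *     struct node *dev = (dev);
 * so the inner dev is initialized from itself, not from the caller's.
 */
#define next_node(d) ({ struct node *dev = (d); dev->next; })
#endif

/*
 * A static inline function has its own parameter scope, so the name
 * the caller uses for its variable cannot collide with anything here.
 */
static inline struct node *next_node(struct node *dev)
{
	return dev ? dev->next : NULL;
}
```

Calling next_node(dev) with a caller variable named dev is safe with the
inline version, which is the property the fix relies on.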


Re: [net-2.6.24][NETNS][patch 1/3] fix export symbols

2007-09-12 Thread Daniel Lezcano

David Miller wrote:

From: [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:38:12 +0200


From: Daniel Lezcano [EMAIL PROTECTED]

Add the appropriate EXPORT_SYMBOLS for proc_net_create,
proc_net_fops_create and proc_net_remove to fix errors when
compiling allmodconfig

Signed-off-by: Mark Nelson [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]


Applied to net-2.6.24, thanks.

Why aren't you signing off on these patches?  Please
do so in the future.

Because From:  usually means you are the patch author, and I can't
tell who wrote these patches, you or these other people listed in the
signoff area.



Sorry for that, I will take care of that next time. Thanks.


[PATCH 2/6] [IPROUTE2] Introduce iplink_parse() routine

2007-09-12 Thread Eric W. Biederman
From: Pavel Emelyanov [EMAIL PROTECTED]
Date: Thu, 19 Jul 2007 13:32:31 +0400

This routine parses CLI attributes, describing generic link
parameters such as name, address, etc.

This is mostly copy-pasted from iplink_modify().

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
Acked-by: Patrick McHardy [EMAIL PROTECTED]
---
 include/utils.h |3 +
 ip/iplink.c |  127 +++---
 2 files changed, 76 insertions(+), 54 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index a3fd335..3fd851d 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -146,4 +146,7 @@ extern int cmdlineno;
 extern size_t getcmdline(char **line, size_t *len, FILE *in);
 extern int makeargs(char *line, char *argv[], int maxargs);
 
+struct iplink_req;
+int iplink_parse(int argc, char **argv, struct iplink_req *req,
+   char **name, char **type, char **link, char **dev);
 #endif /* __UTILS_H__ */
diff --git a/ip/iplink.c b/ip/iplink.c
index 4060845..64989b2 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -142,140 +142,159 @@ static int iplink_have_newlink(void)
 }
 #endif /* ! IPLINK_IOCTL_COMPAT */
 
-static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
+struct iplink_req {
+   struct nlmsghdr n;
+   struct ifinfomsgi;
+   charbuf[1024];
+};
+
+int iplink_parse(int argc, char **argv, struct iplink_req *req,
+   char **name, char **type, char **link, char **dev)
 {
+   int ret, len;
+   char abuf[32];
int qlen = -1;
int mtu = -1;
-   int len;
-   char abuf[32];
-   char *dev = NULL;
-   char *name = NULL;
-   char *link = NULL;
-   char *type = NULL;
-   struct link_util *lu = NULL;
-   struct {
-   struct nlmsghdr n;
-   struct ifinfomsgi;
-   charbuf[1024];
-   } req;
 
-   memset(req, 0, sizeof(req));
-
-   req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
-   req.n.nlmsg_flags = NLM_F_REQUEST|flags;
-   req.n.nlmsg_type = cmd;
-   req.i.ifi_family = preferred_family;
+   ret = argc;
 
while (argc  0) {
if (strcmp(*argv, up) == 0) {
-   req.i.ifi_change |= IFF_UP;
-   req.i.ifi_flags |= IFF_UP;
+   req-i.ifi_change |= IFF_UP;
+   req-i.ifi_flags |= IFF_UP;
} else if (strcmp(*argv, down) == 0) {
-   req.i.ifi_change |= IFF_UP;
-   req.i.ifi_flags = ~IFF_UP;
+   req-i.ifi_change |= IFF_UP;
+   req-i.ifi_flags = ~IFF_UP;
} else if (strcmp(*argv, name) == 0) {
NEXT_ARG();
-   name = *argv;
+   *name = *argv;
} else if (matches(*argv, link) == 0) {
NEXT_ARG();
-   link = *argv;
+   *link = *argv;
} else if (matches(*argv, address) == 0) {
NEXT_ARG();
len = ll_addr_a2n(abuf, sizeof(abuf), *argv);
-   addattr_l(req.n, sizeof(req), IFLA_ADDRESS, abuf, len);
+   addattr_l(req-n, sizeof(*req), IFLA_ADDRESS, abuf, 
len);
} else if (matches(*argv, broadcast) == 0 ||
-  strcmp(*argv, brd) == 0) {
+   strcmp(*argv, brd) == 0) {
NEXT_ARG();
len = ll_addr_a2n(abuf, sizeof(abuf), *argv);
-   addattr_l(req.n, sizeof(req), IFLA_BROADCAST, abuf, 
len);
+   addattr_l(req-n, sizeof(*req), IFLA_BROADCAST, abuf, 
len);
} else if (matches(*argv, txqueuelen) == 0 ||
-  strcmp(*argv, qlen) == 0 ||
-  matches(*argv, txqlen) == 0) {
+   strcmp(*argv, qlen) == 0 ||
+   matches(*argv, txqlen) == 0) {
NEXT_ARG();
if (qlen != -1)
duparg(txqueuelen, *argv);
if (get_integer(qlen,  *argv, 0))
invarg(Invalid \txqueuelen\ value\n, *argv);
-   addattr_l(req.n, sizeof(req), IFLA_TXQLEN, qlen, 4);
+   addattr_l(req-n, sizeof(*req), IFLA_TXQLEN, qlen, 4);
} else if (strcmp(*argv, mtu) == 0) {
NEXT_ARG();
if (mtu != -1)
duparg(mtu, *argv);
if (get_integer(mtu, *argv, 0))
invarg(Invalid \mtu\ value\n, *argv);
-   addattr_l(req.n, sizeof(req), IFLA_MTU, mtu, 4);
+   

[PATCH 3/4] [IPROUTE2] Module for ip utility to support veth device

2007-09-12 Thread Eric W. Biederman
From: Pavel Emelyanov [EMAIL PROTECTED]
Date: Thu, 19 Jul 2007 13:33:56 +0400

The link_veth.so itself.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
Acked-by: Patrick McHardy [EMAIL PROTECTED]
---
 ip/Makefile|6 -
 ip/link_veth.c |   63 
 ip/veth.h  |   12 ++
 3 files changed, 80 insertions(+), 1 deletions(-)
 create mode 100644 ip/link_veth.c
 create mode 100644 ip/veth.h

diff --git a/ip/Makefile b/ip/Makefile
index 9a5bfe3..b46bce3 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -8,8 +8,9 @@ RTMONOBJ=rtmon.o
 ALLOBJ=$(IPOBJ) $(RTMONOBJ)
 SCRIPTS=ifcfg rtpr routel routef
 TARGETS=ip rtmon
+LIBS=link_veth.so
 
-all: $(TARGETS) $(SCRIPTS)
+all: $(TARGETS) $(SCRIPTS) $(LIBS)
 
 ip: $(IPOBJ) $(LIBNETLINK) $(LIBUTIL)
 
@@ -24,3 +25,6 @@ clean:
 
 LDLIBS += -ldl
 LDFLAGS+= -Wl,-export-dynamic
+
+%.so: %.c
+	$(CC) $(CFLAGS) -shared $< -o $@
diff --git a/ip/link_veth.c b/ip/link_veth.c
new file mode 100644
index 000..ded2cdd
--- /dev/null
+++ b/ip/link_veth.c
@@ -0,0 +1,63 @@
+/*
+ * link_veth.c veth driver module
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Pavel Emelianov [EMAIL PROTECTED]
+ *
+ */
+
+#include string.h
+
+#include utils.h
+#include ip_common.h
+#include veth.h
+
+#defineIFNAMSIZ16
+
+static void usage(void)
+{
+   printf(Usage: ip link add ... type veth 
+   [peer peer-name] [mac mac] [peer_mac mac]\n);
+}
+
+static int veth_parse_opt(struct link_util *lu, int argc, char **argv,
+   struct nlmsghdr *hdr)
+{
+   char *name, *type, *link, *dev;
+   int err, len;
+   struct rtattr * data;
+
+   if (strcmp(argv[0], peer) != 0) {
+   usage();
+   return -1;
+   }
+
+   data = NLMSG_TAIL(hdr);
+   addattr_l(hdr, 1024, VETH_INFO_PEER, NULL, 0);
+
+   hdr-nlmsg_len += sizeof(struct ifinfomsg);
+
+   err = iplink_parse(argc - 1, argv + 1, (struct iplink_req *)hdr,
+   name, type, link, dev);
+   if (err  0)
+   return err;
+
+   if (name) {
+   len = strlen(name) + 1;
+   if (len  IFNAMSIZ)
+   invarg(\name\ too long\n, *argv);
+   addattr_l(hdr, 1024, IFLA_IFNAME, name, len);
+   }
+
+   data-rta_len = (void *)NLMSG_TAIL(hdr) - (void *)data;
+   return argc - 1 - err;
+}
+
+struct link_util veth_link_util = {
+   .id = veth,
+   .parse_opt = veth_parse_opt,
+};
diff --git a/ip/veth.h b/ip/veth.h
new file mode 100644
index 000..aa2e6f9
--- /dev/null
+++ b/ip/veth.h
@@ -0,0 +1,12 @@
+#ifndef __NET_VETH_H__
+#define __NET_VETH_H__
+
+enum {
+   VETH_INFO_UNSPEC,
+   VETH_INFO_PEER,
+
+   __VETH_INFO_MAX
+#define VETH_INFO_MAX  (__VETH_INFO_MAX - 1)
+};
+
+#endif
-- 
1.5.3.rc6.17.g1911



[PATCH 4/4] [IPROUTE2] iproute2: link_veth support bug fixes.

2007-09-12 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED]
Date: Sat, 8 Sep 2007 10:17:43 -0600

This patch contains small compile and implementation
bug fixes for link_veth.c.

The compile fixes stop trying to build a shared object
when we can just as easily compile the code in, which makes
supporting architectures other than arch/i386 easier.

The documentation is fixed so that it no longer documents the
previous version of the veth support.

The code now initializes its pointers before calling
iplink_parse, and we now set name = dev if name is not
passed.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 ip/Makefile|8 +++-
 ip/link_veth.c |   12 +---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/ip/Makefile b/ip/Makefile
index b46bce3..a98e1f3 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -3,14 +3,15 @@ IPOBJ=ip.o ipaddress.o iproute.o iprule.o \
 ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
 ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o
 
+IPOBJ += link_veth.o
+
 RTMONOBJ=rtmon.o
 
 ALLOBJ=$(IPOBJ) $(RTMONOBJ)
 SCRIPTS=ifcfg rtpr routel routef
 TARGETS=ip rtmon
-LIBS=link_veth.so
 
-all: $(TARGETS) $(SCRIPTS) $(LIBS)
+all: $(TARGETS) $(SCRIPTS)
 
 ip: $(IPOBJ) $(LIBNETLINK) $(LIBUTIL)
 
@@ -25,6 +26,3 @@ clean:
 
 LDLIBS += -ldl
 LDFLAGS+= -Wl,-export-dynamic
-
-%.so: %.c
-	$(CC) $(CFLAGS) -shared $< -o $@
diff --git a/ip/link_veth.c b/ip/link_veth.c
index ded2cdd..6f3931c 100644
--- a/ip/link_veth.c
+++ b/ip/link_veth.c
@@ -20,14 +20,16 @@
 
 static void usage(void)
 {
-   printf(Usage: ip link add ... type veth 
-   [peer peer-name] [mac mac] [peer_mac mac]\n);
+   printf(Usage: ip link add ... type veth peer { ... }\n);
 }
 
 static int veth_parse_opt(struct link_util *lu, int argc, char **argv,
struct nlmsghdr *hdr)
 {
-   char *name, *type, *link, *dev;
+   char *dev = NULL;
+   char *name = NULL;
+   char *link = NULL;
+   char *type = NULL;
int err, len;
struct rtattr * data;
 
@@ -46,6 +48,10 @@ static int veth_parse_opt(struct link_util *lu, int argc, char **argv,
 	if (err < 0)
 		return err;
 
+	/* Allow "ip link add dev" and "ip link add name" */
+	if (!name)
+		name = dev;
+
 	if (name) {
 		len = strlen(name) + 1;
 		if (len > IFNAMSIZ)
-- 
1.5.3.rc6.17.g1911
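
The two implementation fixes in the patch above can be sketched in
userspace as follows. This is only an illustration under assumptions:
parse_link_args() and pick_peer_name() are hypothetical helpers, not
iproute2 functions. The parser writes an out-parameter only when the
matching keyword is present, so the caller has to start from NULL, and
the dev value is used when no name is given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, much simplified parser: each out-parameter is written
 * only if its keyword appears, mirroring why the caller must not pass
 * in uninitialized pointers.
 */
static int parse_link_args(int argc, char **argv, char **name, char **dev)
{
	int i;

	for (i = 0; i < argc; i++) {
		if (strcmp(argv[i], "name") == 0 && i + 1 < argc)
			*name = argv[++i];
		else if (strcmp(argv[i], "dev") == 0 && i + 1 < argc)
			*dev = argv[++i];
		else
			return -1;	/* unknown keyword or missing value */
	}
	return 0;
}

/* Returns the chosen peer name, or NULL if none was given or on error. */
static const char *pick_peer_name(int argc, char **argv)
{
	char *name = NULL;	/* initialized before the parser runs */
	char *dev = NULL;

	if (parse_link_args(argc, argv, &name, &dev) < 0)
		return NULL;

	/* Accept either "dev X" or "name X"; an explicit name wins. */
	if (!name)
		name = dev;
	return name;
}
```

With the pointers left uninitialized, the final if (name) test in the
caller would read an indeterminate value whenever the keyword was absent.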



[PATCH] [IPROUTE2] Basic documentation for dynamic link creation/destruction.

2007-09-12 Thread Eric W. Biederman

This updates the usage to indicate that we now support link creation
and destruction in addition to just setting link parameters.

It's not really great documentation of the new netlink support
for link creation and removal, but it is a start.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 ip/iplink.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 64989b2..541f3d6 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -38,7 +38,8 @@ static void usage(void) __attribute__((noreturn));
 
 void iplink_usage(void)
 {
-   fprintf(stderr, Usage: ip link set DEVICE { up | down |\n);
+   fprintf(stderr, Usage: ip link { set | add | replace | delete } DEVICE 
{\n);
+   fprintf(stderr, up | down |\n);
fprintf(stderr, arp { on | off } |\n);
fprintf(stderr, dynamic { on | off } |\n);
fprintf(stderr, multicast { on | off } 
|\n);
@@ -48,7 +49,9 @@ void iplink_usage(void)
fprintf(stderr, txqueuelen PACKETS |\n);
fprintf(stderr, name NEWNAME |\n);
fprintf(stderr, address LLADDR | broadcast 
LLADDR |\n);
-   fprintf(stderr, mtu MTU }\n);
+   fprintf(stderr, mtu MTU | \n);
+   fprintf(stderr, type TYPE [ TYPE specifc 
options]\n);
+   fprintf(stderr, }\n);
fprintf(stderr,ip link show [ DEVICE ]\n);
exit(-1);
 }
-- 
1.5.3.rc6.17.g1911



[PATCH] [IPROUTE2] Add support for moving links between network namespaces

2007-09-12 Thread Eric W. Biederman

This adds support for setting the IFLA_NET_NS_PID attribute
on links allowing them to be moved between network namespaces.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/if_link.h |1 +
 ip/iplink.c |9 +
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 23b3a8e..c948395 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -78,6 +78,7 @@ enum
IFLA_LINKMODE,
IFLA_LINKINFO,
 #define IFLA_LINKINFO IFLA_LINKINFO
+   IFLA_NET_NS_PID,
__IFLA_MAX
 };
 
diff --git a/ip/iplink.c b/ip/iplink.c
index 541f3d6..624c784 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -158,6 +158,7 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
char abuf[32];
int qlen = -1;
int mtu = -1;
+   pid_t netns_pid = -1;
 
ret = argc;
 
@@ -255,6 +256,14 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 			} else
 				return on_off("dynamic");
 #endif
+		} else if (matches(*argv, "netnspid") == 0) {
+			NEXT_ARG();
+			if (netns_pid != -1)
+				duparg("netnspid", *argv);
+			if (get_integer(&netns_pid, *argv, 0))
+				invarg("Invalid \"netnspid\" value\n", *argv);
+			addattr_l(&req->n, sizeof(*req), IFLA_NET_NS_PID,
+					&netns_pid, sizeof(netns_pid));
 		} else if (matches(*argv, "type") == 0) {
 			NEXT_ARG();
 			*type = *argv;
-- 
1.5.3.rc6.17.g1911



Re: [PATCH 1/4] [IPROUTE2] Revert Make ip utility veth driver aware

2007-09-12 Thread Eric W. Biederman
Pavel Emelyanov [EMAIL PROTECTED] writes:

 Eric W. Biederman wrote:
 Stephen it looks like you weren't cc'd on the latest version
 of the veth support.  So this patchset first reverts the old

 He was. The latest version looks completely different from what
 is reversed in this patch.

This is against the latest snapshot I could find.  My apologies
if I missed some of the communication.

Eric


Re: [-mm patch] really unexport do_softirq

2007-09-12 Thread David Miller
From: Adrian Bunk [EMAIL PROTECTED]
Date: Sun, 9 Sep 2007 22:25:40 +0200

 On Fri, Aug 31, 2007 at 09:58:22PM -0700, Andrew Morton wrote:
 ...
  Changes since 2.6.23-rc3-mm1:
 ...
   git-net.patch
 ...
   git trees
 ...
 
 This hydra had more than one head...
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Applied, thanks.


Re: [-mm patch] unexport raise_softirq_irqoff

2007-09-12 Thread David Miller
From: Christoph Hellwig [EMAIL PROTECTED]
Date: Sun, 9 Sep 2007 21:41:53 +0100

 On Sun, Sep 09, 2007 at 10:25:44PM +0200, Adrian Bunk wrote:
  On Fri, Aug 31, 2007 at 09:58:22PM -0700, Andrew Morton wrote:
  ...
   Changes since 2.6.23-rc3-mm1:
  ...
git-net.patch
  ...
git trees
  ...
  
  raise_softirq_irqoff no longer has any modular user.
  
  Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
 
 This should probably go in through Dave's tree as it's removing this
 rather annoying user.

Yep, I've just tossed it into my tree.

Thanks.


Re: [2.6 patch] make sctp_addto_param() static

2007-09-12 Thread David Miller
From: Adrian Bunk [EMAIL PROTECTED]
Date: Sun, 9 Sep 2007 22:25:50 +0200

 sctp_addto_param() can become static.
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Applied, thanks.


Re: [-mm patch] net/sctp/socket.c: make 3 variables static

2007-09-12 Thread David Miller
From: Adrian Bunk [EMAIL PROTECTED]
Date: Sun, 9 Sep 2007 22:25:54 +0200

 This patch makes the following needlessly global variables static:
 - sctp_memory_pressure
 - sctp_memory_allocated
 - sctp_sockets_allocated
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Applied, thanks.


[PATCH] veth: Cleanly handle a missing peer_tb argument on creation.

2007-09-12 Thread Eric W. Biederman

I was getting strange kernel crashes when attempting to
create veth devices when I did not specify a peer argument
to /bin/ip.

So this patch defaults peer_tb to all zeros and doesn't attempt to
reuse the netlink attributes for the primary link to create the
secondary link. With it applied I can't reproduce the failures.

Given that reusing some of the most interesting netlink attributes,
like a MAC address or a network device name, is generally
the wrong thing to do, this seems like the right approach.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 drivers/net/veth.c |   16 +++-
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 9e6a746..d49bd2c 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -313,7 +313,7 @@ static int veth_newlink(struct net_device *dev,
 	struct net_device *peer;
 	struct veth_priv *priv;
 	char ifname[IFNAMSIZ];
-	struct nlattr *peer_tb[IFLA_MAX + 1], **tbp;
+	struct nlattr *peer_tb[IFLA_MAX + 1];
 
 	/*
 	 * create and register peer first
@@ -322,6 +322,7 @@ static int veth_newlink(struct net_device *dev,
 	 * skip it since no info from it is useful yet
 	 */
 
+	memset(peer_tb, 0, sizeof(peer_tb));
 	if (data != NULL && data[VETH_INFO_PEER] != NULL) {
 		struct nlattr *nla_peer;
 
@@ -336,21 +337,18 @@ static int veth_newlink(struct net_device *dev,
 		err = veth_validate(peer_tb, NULL);
 		if (err < 0)
 			return err;
+	}
 
-		tbp = peer_tb;
-	} else
-		tbp = tb;
-
-	if (tbp[IFLA_IFNAME])
-		nla_strlcpy(ifname, tbp[IFLA_IFNAME], IFNAMSIZ);
+	if (peer_tb[IFLA_IFNAME])
+		nla_strlcpy(ifname, peer_tb[IFLA_IFNAME], IFNAMSIZ);
 	else
 		snprintf(ifname, IFNAMSIZ, DRV_NAME "%%d");
 
-	peer = rtnl_create_link(dev->nd_net, ifname, &veth_link_ops, tbp);
+	peer = rtnl_create_link(dev->nd_net, ifname, &veth_link_ops, peer_tb);
 	if (IS_ERR(peer))
 		return PTR_ERR(peer);
 
-	if (tbp[IFLA_ADDRESS] == NULL)
+	if (peer_tb[IFLA_ADDRESS] == NULL)
 		random_ether_addr(peer->dev_addr);
 
 	err = register_netdevice(peer);
-- 
1.5.3.rc6.17.g1911
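
The approach taken by this patch can be illustrated with a small
userspace sketch. It is not the kernel code: the ATTR_* constants,
the string-valued attribute table and choose_peer_name() are all
hypothetical. The point is only that the peer's table starts out all
zero, is filled solely from the nested peer attributes, and every later
lookup falls back to a default when an entry is missing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical attribute indices standing in for IFLA_* values. */
enum { ATTR_UNSPEC, ATTR_IFNAME, ATTR_ADDRESS, ATTR_MAX = 4 };

/*
 * nested may be NULL (no peer argument given) or point to a table of
 * ATTR_MAX + 1 entries. The primary link's attributes are never used.
 */
static void choose_peer_name(const char *const *nested, char *out, size_t outlen)
{
	const char *peer_tb[ATTR_MAX + 1];
	int i;

	/* Start from an all-zero table, as the patch does with memset(). */
	memset(peer_tb, 0, sizeof(peer_tb));

	if (nested != NULL) {
		for (i = 0; i <= ATTR_MAX; i++)
			peer_tb[i] = nested[i];
	}

	if (peer_tb[ATTR_IFNAME])
		snprintf(out, outlen, "%s", peer_tb[ATTR_IFNAME]);
	else
		snprintf(out, outlen, "veth%%d");	/* name template */
}
```

Without a peer argument the sketch yields the "veth%d" template rather
than whatever name was requested for the primary device.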



Re: [-mm patch] make tcp_splice_data_recv() static

2007-09-12 Thread David Miller
From: Adrian Bunk [EMAIL PROTECTED]
Date: Sun, 9 Sep 2007 22:25:58 +0200

 On Fri, Aug 31, 2007 at 09:58:22PM -0700, Andrew Morton wrote:
 ...
  Changes since 2.6.23-rc3-mm1:
 ...
   git-block.patch
 ...
   git trees
 ...
 
 tcp_splice_data_recv() can become static.
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

I'll let Jens or similar pick this one up since it
obviously won't apply to my tree.



Re: new NAPI interface broken for POWER architecture?

2007-09-12 Thread Christoph Raisch


David Miller [EMAIL PROTECTED] wrote on 12.09.2007 14:50:04:

 From: Jan-Bernd Themann [EMAIL PROTECTED]
 Date: Fri, 7 Sep 2007 11:37:02 +0200

  2) On SMP systems: after netif_rx_complete has been called on CPU1
     (+interrupts enabled), netif_rx_schedule could be called on CPU2
     (irq handler) before net_rx_action on CPU1 has checked
     NAPI_STATE_SCHED.
     In that case the device would be added to the poll lists of both
     CPU1 and CPU2, as net_rx_action would see NAPI_STATE_SCHED set.
     This must not happen. It will be caught when netif_rx_complete is
     called the second time (BUG() called)
 
  This would mean we have a problem on all SMP machines right now.

 This is not a correct statement.

 Only on your platform do network device interrupts get moved
 around, no other platform does this.

 Sparc64 doesn't, all interrupts stay in one location after
 the cpu is initially chosen.

 x86 and x86_64 specifically do not move around network
 device interrupts, even though other device types do
 get dynamic IRQ cpu distribution.

 That's why you are the only person seeing this problem.

 I agree that it should be fixed, but we should also fix the IRQ
 distribution scheme used on powerpc platforms which is totally
 broken in these cases.

This is definitely not something we can change in the HEA device driver
alone.
It could also affect any other networking cards on POWER (e1000,s2io...).

Paul, Michael, Arndt, what is your opinion here?

Gruss / Regards
Christoph Raisch



Re: new NAPI interface broken for POWER architecture?

2007-09-12 Thread David Miller
From: Christoph Raisch [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 15:10:08 +0200

 This is definitely not something we can change in the HEA device driver
 alone.

And it shouldn't be. x86 implements the policy in the irqbalance
daemon; powerpc should do it wherever it would be appropriate
there.

 Paul, Michael, Arndt, what is your opinion here?

I'm all ears too :)


dscc4.c tests for #ifndef MODULE even though it must be modular

2007-09-12 Thread Robert P. J. Day

  from drivers/net/wan/dscc4.c:

=
#ifndef MODULE
static int __init dscc4_setup(char *str)
{
	int *args[] = { &debug, &quartz, NULL }, **p = args;

	while (*p && (get_option(&str, *p) == 2))
		p++;
	return 1;
}

__setup("dscc4.setup=", dscc4_setup);
#endif
=

  but from drivers/net/wan/Kconfig:

...
config DSCC4
	tristate "Etinc PCISYNC serial board support"
	depends on HDLC && PCI && m
...

  if i read this correctly, doesn't the "depends on" of "&& m" mean that
that Kconfig selection can be *at most* modular, so that that
preprocessor conditional can never be satisfied?  a quick test under
"make menuconfig" seems to confirm that.

  besides, the kernel parm being defined in that call to __setup()
really violates the spirit of defining kernel parms. :-)
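
  for reference, here is a rough userspace sketch of what the setup
handler quoted above would do if it were ever compiled in.  it is an
assumption-laden illustration: parse_int_opt() is a hypothetical
stand-in that only mimics the return values the loop depends on
(0 = no integer, 1 = integer with nothing after it, 2 = integer
followed by a comma), and parse_setup() is likewise made up.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the option parser used by the handler.
 * Returns 0 if no integer was found, 1 if an integer was parsed and
 * no comma follows, 2 if an integer was parsed and a comma follows.
 */
static int parse_int_opt(const char **str, int *out)
{
	char *end;
	long v;

	if (!*str || !**str)
		return 0;
	v = strtol(*str, &end, 0);
	if (end == *str)
		return 0;
	*out = (int)v;
	if (*end == ',') {
		*str = end + 1;	/* skip the comma for the next call */
		return 2;
	}
	*str = end;
	return 1;
}

/* Fill debug, then quartz, from a string such as "3,16384". */
static void parse_setup(const char *str, int *debug, int *quartz)
{
	int *args[] = { debug, quartz, NULL }, **p = args;

	/* Advance only while more comma-separated values follow. */
	while (*p && parse_int_opt(&str, *p) == 2)
		p++;
}
```

  the last value is still stored even though the loop stops on it,
because the store happens before the return value is checked.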

rday
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://crashcourse.ca



Re: [PATCH] veth: Cleanly handle a missing peer_tb argument on creation.

2007-09-12 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Wed, 12 Sep 2007 07:19:56 -0600

 
> I was getting strange kernel crashes when attempting to
> create veth devices when I did not specify a peer argument
> to /bin/ip.
> 
> So this patch defaults peer_tb to all zeros and doesn't attempt to
> reuse the netlink attributes for the primary link to create the
> secondary link and now I can't reproduce the failures.
> 
> Given that some of the most interesting netlink attributes to specify,
> like a mac address or a network device name, are generally
> the wrong thing to do, this seems like the right approach.
> 
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

This looks mostly fine, can someone else who knows
veth a bit review this as well?


[PATCH] IPV4 : convert rt_check_expire() from softirq processing to workqueue

2007-09-12 Thread Eric Dumazet
On loaded/big hosts, rt_check_expire() is of little use, because it
generally breaks out of its main loop because of a jiffies change.

It can take a long time (read : timer invocations) to actually
scan the whole hash table, freeing unused entries.

Converting it to use a workqueue instead of softirq is a nice
move because we can allow rt_check_expire() to do the scan
it is supposed to do, without hogging the CPU.

This has an impact on the average number of entries in the cache,
reducing RAM usage. The cache is also more responsive to parameter
changes (/proc/sys/net/ipv4/route/gc_timeout and
/proc/sys/net/ipv4/route/gc_interval).

Note: Maybe the default value of gc_interval (60 seconds)
is too high, since this means we actually need 5 (300/60)
invocations to scan the whole table.
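
The arithmetic in the note can be checked with a small userspace sketch. It
follows the goal computation in rt_check_expire() from the patch below
((interval << hash_log) / timeout, capped at the table size). The function
name buckets_per_run() and the 65536-bucket table are assumptions for
illustration; the real values are in jiffies, but HZ cancels out, so plain
seconds are used here.

```c
#include <assert.h>
#include <stdint.h>

/* Buckets examined per invocation, mirroring the patch's goal computation. */
static unsigned int buckets_per_run(unsigned int gc_interval,
				    unsigned int gc_timeout,
				    unsigned int hash_log)
{
	uint64_t mult = (uint64_t)gc_interval << hash_log;
	unsigned int hash_mask = (1u << hash_log) - 1;
	unsigned int goal;

	if (gc_timeout > 1)
		mult /= gc_timeout;
	goal = (unsigned int)mult;
	/* Never scan more than the whole table in one run. */
	if (goal > hash_mask)
		goal = hash_mask + 1;
	return goal;
}
```

With gc_interval = 60, gc_timeout = 300 and hash_log = 16, each run covers
13107 of the 65536 buckets, i.e. roughly one fifth of the table, which matches
the "5 (300/60) invocations" figure above.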

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 396c631..006d605 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -81,6 +81,7 @@
 #include <linux/netdevice.h>
 #include <linux/proc_fs.h>
 #include <linux/init.h>
+#include <linux/workqueue.h>
 #include <linux/skbuff.h>
 #include <linux/inetdevice.h>
 #include <linux/igmp.h>
@@ -136,7 +137,8 @@ static unsigned long rt_deadline;
 #define RTprint(a...)  printk(KERN_DEBUG a)
 
 static struct timer_list rt_flush_timer;
-static struct timer_list rt_periodic_timer;
+static void rt_check_expire(struct work_struct *work);
+static DECLARE_DELAYED_WORK(expires_work, rt_check_expire);
 static struct timer_list rt_secret_timer;
 
 /*
@@ -572,20 +574,19 @@ static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
 		(fl1->iif ^ fl2->iif)) == 0;
 }
 
-/* This runs via a timer and thus is always in BH context. */
-static void rt_check_expire(unsigned long dummy)
+static void rt_check_expire(struct work_struct *work)
 {
 	static unsigned int rover;
 	unsigned int i = rover, goal;
 	struct rtable *rth, **rthp;
-	unsigned long now = jiffies;
 	u64 mult;
 
 	mult = ((u64)ip_rt_gc_interval) << rt_hash_log;
 	if (ip_rt_gc_timeout > 1)
 		do_div(mult, ip_rt_gc_timeout);
 	goal = (unsigned int)mult;
-	if (goal > rt_hash_mask) goal = rt_hash_mask + 1;
+	if (goal > rt_hash_mask)
+		goal = rt_hash_mask + 1;
 	for (; goal > 0; goal--) {
 		unsigned long tmo = ip_rt_gc_timeout;
 
@@ -594,11 +595,11 @@ static void rt_check_expire(unsigned long dummy)
 
 		if (*rthp == 0)
 			continue;
-		spin_lock(rt_hash_lock_addr(i));
+		spin_lock_bh(rt_hash_lock_addr(i));
 		while ((rth = *rthp) != NULL) {
 			if (rth->u.dst.expires) {
 				/* Entry is expired even if it is in use */
-				if (time_before_eq(now, rth->u.dst.expires)) {
+				if (time_before_eq(jiffies, rth->u.dst.expires)) {
 					tmo >>= 1;
 					rthp = &rth->u.dst.rt_next;
 					continue;
@@ -613,14 +614,10 @@ static void rt_check_expire(unsigned long dummy)
 			*rthp = rth->u.dst.rt_next;
 			rt_free(rth);
 		}
-		spin_unlock(rt_hash_lock_addr(i));
-
-		/* Fallback loop breaker. */
-		if (time_after(jiffies, now))
-			break;
+		spin_unlock_bh(rt_hash_lock_addr(i));
 	}
 	rover = i;
-	mod_timer(&rt_periodic_timer, jiffies + ip_rt_gc_interval);
+	schedule_delayed_work(&expires_work, ip_rt_gc_interval);
 }
 
 /* This can run from both BH and non-BH contexts, the latter
@@ -2993,17 +2990,14 @@ int __init ip_rt_init(void)
 
 	init_timer(&rt_flush_timer);
 	rt_flush_timer.function = rt_run_flush;
-	init_timer(&rt_periodic_timer);
-	rt_periodic_timer.function = rt_check_expire;
 	init_timer(&rt_secret_timer);
 	rt_secret_timer.function = rt_secret_rebuild;
 
 	/* All the timers, started at system startup tend
 	   to synchronize. Perturb it a bit.
 	 */
-	rt_periodic_timer.expires = jiffies + net_random() % ip_rt_gc_interval +
-					ip_rt_gc_interval;
-	add_timer(&rt_periodic_timer);
+	schedule_delayed_work(&expires_work,
+		net_random() % ip_rt_gc_interval + ip_rt_gc_interval);
 
 	rt_secret_timer.expires = jiffies + net_random() % ip_rt_secret_interval +
 		ip_rt_secret_interval;


Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread J. Bruce Fields
On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
> as already described old temporary sockets (client is gone) of lockd aren't
> closed after some time. So, with enough clients and some time gone, there
> are 80 open dangling sockets and you start getting messages of the form:
> 
> lockd: too many open TCP sockets, consider increasing the number of nfsd threads.

Thanks for working on this problem!

> If I understand the code then the intention was that the server closes
> temporary sockets after about 6 to 12 minutes:
> 
>   a timer is started which calls svc_age_temp_sockets every 6 minutes.
> 
>   svc_age_temp_sockets:
>   if a socket is marked OLD it gets closed.
>   sockets which are not marked as OLD are marked OLD
> 
>   every time the sockets receives something OLD is cleared.
> 
> But svc_age_temp_sockets never closes any socket though because it only
> closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
> 
> Here is a patch against 2.6.22.6 which changes the test to
> svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs fine
> here. Unused sockets get closed (after 6 to 12 minutes)

So the fact that this changes the behavior means that sk_inuse is taking
on negative values.  This can't be right--how can something like
svc_sock_put() (which does an atomic_dec_and_test) work in that case?

I wish I had time today to figure out what's going on in this case.  But
from a quick look through svcsock.c for sk_inuse, it looks odd; I'm suspicious
of anything without the stereotyped behavior--initializing to one,
atomic_inc()ing whenever someone takes a reference, and
atomic_dec_and_test()ing whenever someone drops it.
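
For reference, the stereotyped pattern described above can be sketched in
userspace with C11 atomics. This is an illustration only, not svcsock.c code:
struct obj, obj_init(), obj_get() and obj_put() are hypothetical names, and
atomic_fetch_sub() stands in for the kernel's atomic_dec_and_test().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_int refs;	/* starts at 1: the creator's reference */
	bool freed;		/* stands in for the real destructor */
};

static void obj_init(struct obj *o)
{
	atomic_init(&o->refs, 1);
	o->freed = false;
}

/* Take a reference; the caller must already hold one. */
static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refs, 1);
}

/* Drop a reference; returns true when this was the last one. */
static bool obj_put(struct obj *o)
{
	/* fetch_sub returns the old value, so old == 1 means we reached 0 */
	if (atomic_fetch_sub(&o->refs, 1) == 1) {
		o->freed = true;
		return true;
	}
	return false;
}
```

With this pattern the count never goes negative, which is why a test for
"<= 0" changing behavior looks suspicious.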

--b.


Re: [PATCH] [IPROUTE2] Add support for moving links between network namespaces

2007-09-12 Thread Stephen Hemminger
On Wed, 12 Sep 2007 07:05:42 -0600
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

 
> This adds support for setting the IFLA_NET_NS_PID attribute
> on links allowing them to be moved between network namespaces.
> 
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
> ---
>  include/linux/if_link.h |    1 +
>  ip/iplink.c             |    9 +++++++++
>  2 files changed, 10 insertions(+), 0 deletions(-)

Please don't mix header file updates with command changes.
As a first step, I always install the standard sanitized kernel headers.


Re: [PATCH 1/4] [IPROUTE2] Revert Make ip utility veth driver aware

2007-09-12 Thread Eric W. Biederman
[EMAIL PROTECTED] (Eric W. Biederman) writes:

> Pavel Emelyanov [EMAIL PROTECTED] writes:
>
>> Eric W. Biederman wrote:
>>> Stephen it looks like you weren't cc'd on the latest version
>>> of the veth support.  So this patchset first reverts the old
>>
>> He was. The latest version looks completely different from what
>> is reversed in this patch.
>
> This is against the latest snapshot I could find.  My apologies
> if I missed some of the communication.

I was working against:
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

And the last I could find of the conversation about veth support was
in the thread announcing iproute-2-2.6.23-rc3, and Stephen Hemminger
asking for the latest version of the veth support to be sent on
Sept 1st.

So it is quite possible this has been resolved in private email,
and nothing public has been updated yet.

I just don't have a copy of anything newer, and I don't know where else
I would look for something newer.  So since I'm starting to use veth
I sent the patches I had to make it work.

The last round of veth support for iproute2 I could find was sent
on the 19th of July and David Miller, Patrick McHardy, and netdev
were copied but Stephen Hemminger wasn't.  Which is where my
assertion that Stephen hadn't been sent the latest version came from.

If you guys have already sorted this out and I just can't find the
code, I'm overjoyed.  Otherwise the patches I sent should be enough
to get things sorted out, if I have figured out the current state of
confusion.

Eric


Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates

2007-09-12 Thread James Chapman

jamal wrote:

> On Wed, 2007-12-09 at 03:04 -0400, Bill Fink wrote:
>> On Fri, 07 Sep 2007, jamal wrote:
>>
>>> I am going to be the devil's advocate[1]:
>>
>> So let me be the angel's advocate.  :-)
>
> I think this would make you God's advocate ;->
> (http://en.wikipedia.org/wiki/God%27s_advocate)
>
>> I view his results much more favorably.  
>
> The challenge is, under _low traffic_: bad bad CPU use.
> Thats what is at stake, correct?

By low traffic, I assume you mean a rate at which the NAPI driver 
doesn't stay in polled mode. The problem is that that rate is getting 
higher all the time, as interface and CPU speeds increase. This results 
in too many interrupts and NAPI thrashing in/out of polled mode very 
quickly.
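
To make the thrashing concrete, here is a toy userspace model, not NAPI code.
It assumes the driver takes one rx interrupt whenever a packet arrives outside
polled mode, and that an idle-poll scheme keeps the driver in polled mode for
hold_us microseconds after the last packet. count_interrupts() and hold_us are
hypothetical names, and txdone interrupts are ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count rx interrupts for packets arriving at the given times
 * (microseconds, ascending).  hold_us == 0 models leaving polled
 * mode as soon as the queue is empty.
 */
static size_t count_interrupts(const unsigned long *arrival_us, size_t n,
			       unsigned long hold_us)
{
	size_t irqs = 0;
	unsigned long polled_until = 0;
	int polling = 0;

	for (size_t i = 0; i < n; i++) {
		/* Interrupts are enabled unless we are still polling. */
		if (!polling || arrival_us[i] > polled_until)
			irqs++;
		polling = 1;
		polled_until = arrival_us[i] + hold_us;
	}
	return irqs;
}
```

In this model, four packets spaced 100 us apart cost four interrupts with no
hold time, but only one if the driver keeps polling for 150 us after each
packet. The price is the CPU time spent polling an empty queue.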



> Lets bury the stats for a sec ...


Yes please. We need an analysis of what happens to cpu usage, latency, 
pps etc when various factors are changed, e.g. input pps, NAPI busy-idle 
delay etc. The main purpose of my RFC wasn't to push a patch into the 
kernel right now, it was to highlight the issue and to find out if 
others were already working on it. The feedback has been good so far. I 
just need to find some time to do some testing. :)


> People are bitching about NAPI abusing CPU, is the 
> answer to abuse more CPU than NAPI?;->


Jamal, do you have more details? Are people saying NAPI gets too much of 
the CPU pie because they profiled it? Are they complaining that system 
behavior degrades too much under certain network traffic conditions? 
Mouse cursor movement jittery? Real-time apps such as music/video 
players starved of CPU? Is it possible they blame NAPI because they see 
tangible effects on their system, not because measured CPU usage is 
high? I say this because my music/video player and mouse cursor behave 
_much_ better with my NAPI changes during general use, despite the 
increase in measured cpu load. Even ftp can make my system's mouse 
cursor jitter...



> The answer could be "I am not solving that problem anymore" - at least
> thats what James is saying;->


I'm investigating whether the symptoms I describe above can be reduced 
or eliminated without resorting to hardware interrupt mitigation. 
Specifically, I want to do more testing on the idle polling scheme which 
seems to improve system behavior in my tests. This will involve more 
than doing a flood ping or two. :)



>> Sometimes there
>> are tradeoffs to be made to be decided by the user based on what's most
>> important to that user and his specific workload.  And the suggested
>> ethtool option (defaulting to current behavior) would enable the user
>> to make that decision.
>
> And the challenge is:
> What workload is willing to invest that much cpu for low traffic?
> Can you name one? One that may come close is database benchmarks for
> latency - but those folks wouldnt touch this with a mile-long pole if
> you told them their cpu use is going to get worse than what NAPI (that
> big bad CPU hog under low traffic) is giving them.


I agree with both of you. But we need more test results first to know 
whether it will be useful to offer NAPI idle polling as an _option_.


--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development



Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates

2007-09-12 Thread Stephen Hemminger
On Wed, 12 Sep 2007 14:50:01 +0100
James Chapman [EMAIL PROTECTED] wrote:

> jamal wrote:
> > On Wed, 2007-12-09 at 03:04 -0400, Bill Fink wrote:
> >> On Fri, 07 Sep 2007, jamal wrote:
> >> 
> >>> I am going to be the devil's advocate[1]:
> >> So let me be the angel's advocate.  :-)
> > 
> > I think this would make you God's advocate ;->
> > (http://en.wikipedia.org/wiki/God%27s_advocate)
> > 
> >> I view his results much more favorably.  
> > 
> > The challenge is, under _low traffic_: bad bad CPU use.
> > Thats what is at stake, correct?
> 
> By low traffic, I assume you mean a rate at which the NAPI driver 
> doesn't stay in polled mode. The problem is that that rate is getting 
> higher all the time, as interface and CPU speeds increase. This results 
> in too many interrupts and NAPI thrashing in/out of polled mode very 
> quickly.

But if you compare this to a non-NAPI driver, the same softirq
overhead happens. The problem is that for many older devices, disabling IRQs
requires an expensive non-cached PCI access. Smarter, newer devices
all use MSI, which is purely edge-triggered, and with proper register
usage NAPI should be no worse than non-NAPI.


Re: [PATCH] [IPROUTE2] Add support for moving links between network namespaces

2007-09-12 Thread Eric W. Biederman
Stephen Hemminger [EMAIL PROTECTED] writes:

> On Wed, 12 Sep 2007 07:05:42 -0600
> [EMAIL PROTECTED] (Eric W. Biederman) wrote:
>
>> 
>> This adds support for setting the IFLA_NET_NS_PID attribute
>> on links allowing them to be moved between network namespaces.
>> 
>> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
>> ---
>>  include/linux/if_link.h |    1 +
>>  ip/iplink.c             |    9 +++++++++
>>  2 files changed, 10 insertions(+), 0 deletions(-)
>
> Please don't mix header file updates with command changes.
> As a first step, I always install standard kernel sanitized headers.

Sorry I didn't know.  Those changes are now in net-2.6.24
so installing sanitized headers should not change anything.

Please feel free to drop the if_link.h part, and if you want
I can resend that patch with those few lines deleted.

Eric


net-2.6.24 build problem

2007-09-12 Thread Stephen Hemminger
ERROR: xfrm_audit_state_delete [net/key/af_key.ko] undefined!
ERROR: xfrm_audit_state_add [net/key/af_key.ko] undefined!
ERROR: xfrm_audit_policy_add [net/key/af_key.ko] undefined!
ERROR: xfrm_audit_policy_delete [net/key/af_key.ko] undefined

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc5
# Wed Sep 12 15:12:02 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION="-net"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
# CONFIG_TASK_DELAY_ACCT is not set
# CONFIG_TASK_XACCT is not set
# CONFIG_USER_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
CONFIG_BLK_DEV_BSG=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
CONFIG_MPENTIUMM=y
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=m
CONFIG_X86_MCE_P4THERMAL=y
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
CONFIG_X86_REBOOTFIXUPS=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HIGHMEM=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_POPULATES_NODE_MAP=y

Re: [PATCH] [IPROUTE2] Add support for moving links between network namespaces

2007-09-12 Thread Stephen Hemminger
On Wed, 12 Sep 2007 08:06:08 -0600
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> Stephen Hemminger [EMAIL PROTECTED] writes:
> 
> > On Wed, 12 Sep 2007 07:05:42 -0600
> > [EMAIL PROTECTED] (Eric W. Biederman) wrote:
> >
> >> 
> >> This adds support for setting the IFLA_NET_NS_PID attribute
> >> on links allowing them to be moved between network namespaces.
> >> 
> >> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
> >> ---
> >>  include/linux/if_link.h |    1 +
> >>  ip/iplink.c             |    9 +++++++++
> >>  2 files changed, 10 insertions(+), 0 deletions(-)
> >
> > Please don't mix header file updates with command changes.
> > As a first step, I always install standard kernel sanitized headers.
> 
> Sorry I didn't know.  Those changes are now in net-2.6.24
> so installing sanitized headers should not change anything.
> 
> Please feel free to drop the if_link.h part, and if you want
> I can resend that patch with those few lines deleted.
> 
> Eric

I take care of fixing patches (as long as they aren't really damaged).


Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread Neil Brown
On Wednesday September 12, [EMAIL PROTECTED] wrote:
> Hello,
> 
> as already described old temporary sockets (client is gone) of lockd aren't
> closed after some time. So, with enough clients and some time gone, there
> are 80 open dangling sockets and you start getting messages of the form:
> 
> lockd: too many open TCP sockets, consider increasing the number of nfsd threads.
> 
> If I understand the code then the intention was that the server closes
> temporary sockets after about 6 to 12 minutes:
> 
>   a timer is started which calls svc_age_temp_sockets every 6 minutes.
> 
>   svc_age_temp_sockets:
>   if a socket is marked OLD it gets closed.
>   sockets which are not marked as OLD are marked OLD
> 
>   every time the sockets receives something OLD is cleared.
> 
> But svc_age_temp_sockets never closes any socket though because it only
> closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
> 
> Here is a patch against 2.6.22.6 which changes the test to
> svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs fine
> here. Unused sockets get closed (after 6 to 12 minutes)
> 
> Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]
> 
> --- ../linux-2.6.22.6/net/sunrpc/svcsock.c	2007-08-27 18:10:14.000000000 +0200
> +++ net/sunrpc/svcsock.c	2007-09-11 11:07:13.000000000 +0200
> @@ -1572,7 +1575,7 @@
>  
>  		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
>  			continue;
> -		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> +		if (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
>  			continue;
>  		atomic_inc(&svsk->sk_inuse);
>  		list_move(le, &to_be_aged);
> 
> 
> As svc_age_temp_sockets did not do anything before this change may trigger
> hidden bugs.
> 
> To be true I don't see why this check
> 
> (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
> 
> is needed at all (it can only be an optimization) as these fields change after
> the check. In svc_tcp_accept there is no such check when a temporary socket
> is closed.

Thanks for looking into this.

I think the correct change is to test
	if (atomic_read(&svsk->sk_inuse) > 1 || test_bit(SK_BUSY, &svsk->sk_flags))
or even
	if (atomic_read(&svsk->sk_inuse) != 1 || test_bit(SK_BUSY, &svsk->sk_flags))

sk_inuse contains a bias of '1' until SK_DEAD is set.  So a normal,
active socket will have an inuse count of 1 or more.  If it is exactly
1, then either it is SK_DEAD (in which case there is nothing for this
code to do), or it has no users, in which case it is appropriate to
close the socket if it is old.
Note that this test is for "the socket should not be closed", so we
test if it is *not* 1, or > 1.

The tests are needed because we don't want to close a socket that
might be in use elsewhere.  The SK_BUSY bit and the sk_inuse
count combine to check whether the socket is in use at all.

Your change effectively disabled the test, as sk_inuse is never <= 0
(except when SK_DEAD is set).
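
The difference between the old and the corrected test can be seen in a small
userspace sketch of one aging pass. It is a simplified model under stated
assumptions, not the svcsock.c code: struct temp_sock and both functions are
hypothetical, plain ints replace the atomics and flag bits, and SK_DEAD is
left out.

```c
#include <assert.h>
#include <stdbool.h>

struct temp_sock {
	int inuse;	/* reference count; holds a bias of 1 while alive */
	bool busy;	/* stand-in for SK_BUSY */
	bool old;	/* stand-in for SK_OLD */
};

/* One aging pass with the corrected test; true means "close it". */
static bool age_should_close(struct temp_sock *s)
{
	/* First pass only marks the socket; a later pass may close it. */
	if (!s->old) {
		s->old = true;
		return false;
	}
	/* Skip sockets held by someone else (count above the bias) or busy. */
	if (s->inuse > 1 || s->busy)
		return false;
	return true;
}

/* The pre-fix test: any non-zero count skips the socket. */
static bool age_should_close_buggy(struct temp_sock *s)
{
	if (!s->old) {
		s->old = true;
		return false;
	}
	if (s->inuse || s->busy)
		return false;
	return true;
}
```

Because an idle socket still carries the bias of 1, the pre-fix test never
returns true for it, which matches the reported symptom that old sockets were
never closed.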

This bug has been present since 
commit aaf68cfbf2241d24d46583423f6bff5c47e088b3
Author: NeilBrown [EMAIL PROTECTED]
Date:   Thu Feb 8 14:20:30 2007 -0800

(i.e. it is my fault).
So it is in 2.6.21 and later and should probably go to .stable for .21
and .22.

Bruce:  for you :-)
---
Correctly close old nfsd/lockd sockets.

From: NeilBrown [EMAIL PROTECTED]

Commit aaf68cfbf2241d24d46583423f6bff5c47e088b3 added a bias
to sk_inuse, so this test for an unused socket now fails.  So no
sockets get closed because they are old (they might get closed
if the client closed them).

This bug has existed since 2.6.21-rc1.

Thanks to Wolfgang Walter for finding and reporting the bug.


Cc: Wolfgang Walter [EMAIL PROTECTED]
Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./net/sunrpc/svcsock.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c	2007-09-12 16:05:23.000000000 +0200
+++ ./net/sunrpc/svcsock.c	2007-09-12 16:06:01.000000000 +0200
@@ -1592,7 +1592,8 @@ svc_age_temp_sockets(unsigned long closu
 
 		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
 			continue;
-		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
+		if (atomic_read(&svsk->sk_inuse) > 1
+		    || test_bit(SK_BUSY, &svsk->sk_flags))
 			continue;
 		atomic_inc(&svsk->sk_inuse);
 		list_move(le, &to_be_aged);


Re: [PATCH] veth: Cleanly handle a missing peer_tb argument on creation.

2007-09-12 Thread Pavel Emelyanov
Eric W. Biederman wrote:
> I was getting strange kernel crashes when attempting to
> create veth devices when I did not specify a peer argument
> to /bin/ip.
> 
> So this patch defaults peer_tb to all zeros and doesn't attempt to
> reuse the netlink attributes for the primary link to create the
> secondary link and now I can't reproduce the failures.
> 
> Given that some of the most interesting netlink attributes to specify,
> like a mac address or a network device name, are generally
> the wrong thing to do, this seems like the right approach.
> 
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
> ---
>  drivers/net/veth.c |   16 +++++++---------
>  1 files changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 9e6a746..d49bd2c 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -313,7 +313,7 @@ static int veth_newlink(struct net_device *dev,
>  	struct net_device *peer;
>  	struct veth_priv *priv;
>  	char ifname[IFNAMSIZ];
> -	struct nlattr *peer_tb[IFLA_MAX + 1], **tbp;
> +	struct nlattr *peer_tb[IFLA_MAX + 1];
>  
>  	/*
>  	 * create and register peer first
> @@ -322,6 +322,7 @@ static int veth_newlink(struct net_device *dev,
>  	 * skip it since no info from it is useful yet
>  	 */
>  
> +	memset(peer_tb, 0, sizeof(peer_tb));
>  	if (data != NULL && data[VETH_INFO_PEER] != NULL) {
>  		struct nlattr *nla_peer;
>  
> @@ -336,21 +337,18 @@ static int veth_newlink(struct net_device *dev,
>  		err = veth_validate(peer_tb, NULL);
>  		if (err < 0)
>  			return err;
> +	}
>  
> -		tbp = peer_tb;
> -	} else
> -		tbp = tb;

The intention of this part was to get the same parameters for
peer as for the first device if no peer argument was specified
for ip utility. Does it still work?

> -
> -	if (tbp[IFLA_IFNAME])
> -		nla_strlcpy(ifname, tbp[IFLA_IFNAME], IFNAMSIZ);
> +	if (peer_tb[IFLA_IFNAME])
> +		nla_strlcpy(ifname, peer_tb[IFLA_IFNAME], IFNAMSIZ);
>  	else
>  		snprintf(ifname, IFNAMSIZ, DRV_NAME "%%d");
>  
> -	peer = rtnl_create_link(dev->nd_net, ifname, &veth_link_ops, tbp);
> +	peer = rtnl_create_link(dev->nd_net, ifname, &veth_link_ops, peer_tb);
>  	if (IS_ERR(peer))
>  		return PTR_ERR(peer);
>  
> -	if (tbp[IFLA_ADDRESS] == NULL)
> +	if (peer_tb[IFLA_ADDRESS] == NULL)
>  		random_ether_addr(peer->dev_addr);
>  
>  	err = register_netdevice(peer);



Re: net-2.6.24 build problem

2007-09-12 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 16:08:33 +0200

> ERROR: xfrm_audit_state_delete [net/key/af_key.ko] undefined!
> ERROR: xfrm_audit_state_add [net/key/af_key.ko] undefined!
> ERROR: xfrm_audit_policy_add [net/key/af_key.ko] undefined!
> ERROR: xfrm_audit_policy_delete [net/key/af_key.ko] undefined

I just checked in the following fix for this:

From 2c2d4ef06a1bdb25b721372ab63adde1523e34ec Mon Sep 17 00:00:00 2001
From: David S. Miller [EMAIL PROTECTED](none)
Date: Wed, 12 Sep 2007 16:17:36 +0200
Subject: [PATCH] [XFRM]: Add missing auditing symbol exports.

Signed-off-by: David S. Miller [EMAIL PROTECTED]
---
 net/xfrm/xfrm_policy.c |2 ++
 net/xfrm/xfrm_state.c  |2 ++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index de0ff51..50682d3 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2341,6 +2341,7 @@ xfrm_audit_policy_add(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
 	xfrm_audit_common_policyinfo(xp, audit_buf);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL_GPL(xfrm_audit_policy_add);
 
 void
 xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
@@ -2357,6 +2358,7 @@ xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
 	xfrm_audit_common_policyinfo(xp, audit_buf);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL_GPL(xfrm_audit_policy_delete);
 #endif
 
 #ifdef CONFIG_XFRM_MIGRATE
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f64621c..15734ad 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1865,6 +1865,7 @@ xfrm_audit_state_add(struct xfrm_state *x, int result, u32 auid, u32 sid)
 		 (unsigned long)x->id.spi, (unsigned long)x->id.spi);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL_GPL(xfrm_audit_state_add);
 
 void
 xfrm_audit_state_delete(struct xfrm_state *x, int result, u32 auid, u32 sid)
@@ -1883,4 +1884,5 @@ xfrm_audit_state_delete(struct xfrm_state *x, int result, u32 auid, u32 sid)
 		 (unsigned long)x->id.spi, (unsigned long)x->id.spi);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL_GPL(xfrm_audit_state_delete);
 #endif /* CONFIG_AUDITSYSCALL */
-- 
1.5.2.4



[RFT] sky2: receive hang check

2007-09-12 Thread Stephen Hemminger
Would some of the users of 2.6.23-rc5 or later who still experience
hangs please test this?
IT IS EXPERIMENTAL AND NOT TESTED YET.

I am sending it out to see if it detects anything.

--- a/drivers/net/sky2.c2007-09-12 14:52:18.0 +0200
+++ b/drivers/net/sky2.c2007-09-12 15:53:16.0 +0200
@@ -1304,6 +1304,7 @@ static int sky2_up(struct net_device *de
/* Register is number of 4K blocks on internal RAM buffer. */
ramsize = sky2_read8(hw, B2_E_0) * 4;
printk(KERN_INFO PFX %s: ram buffer %dK\n, dev-name, ramsize);
+   memset(sky2-check, 0, sizeof(sky2-check));
 
if (ramsize  0) {
u32 rxspace;
@@ -2446,11 +2447,42 @@ static void sky2_le_error(struct sky2_hw
sky2_write32(hw, Q_ADDR(q, Q_CSR), BMU_CLR_IRQ_CHK);
 }
 
-/* Check for lost IRQ once a second */
+static void sky2_rx_check(struct net_device *dev)
+{
+   struct sky2_port *sky2 = netdev_priv(dev);
+   struct sky2_hw *hw = sky2->hw;
+   unsigned port = sky2->port;
+   unsigned rxq = rxqaddr[port];
+   u32 mac_rp = sky2_read32(hw, SK_REG(port, RX_GMF_RP));
+   u8 mac_lev = sky2_read8(hw, SK_REG(port, RX_GMF_RLEV));
+   u8 fifo_rp = sky2_read8(hw, Q_ADDR(rxq, Q_RP));
+   u8 fifo_lev = sky2_read8(hw, Q_ADDR(rxq, Q_RL));
+
+   /* If not idle and MAC or PCI is stuck */
+   if (sky2->check.last != dev->last_rx &&
+       ((mac_rp == sky2->check.mac_rp &&
+         mac_lev != 0 && mac_lev >= sky2->check.mac_lev) ||
+        /* Check if the PCI RX hang */
+        (fifo_rp == sky2->check.fifo_rp &&
+         fifo_lev != 0 && fifo_lev >= sky2->check.fifo_lev))) {
+
+       pr_info(PFX "%s: receiver hang detected\n", dev->name);
+       schedule_work(&hw->restart_work);
+   }
+
+   sky2->check.last = dev->last_rx;
+   sky2->check.mac_rp = mac_rp;
+   sky2->check.mac_lev = mac_lev;
+   sky2->check.fifo_rp = fifo_rp;
+   sky2->check.fifo_lev = fifo_lev;
+}
+
 static void sky2_watchdog(unsigned long arg)
 {
struct sky2_hw *hw = (struct sky2_hw *) arg;
+   int i;
 
+   /* Check for lost IRQ */
if (sky2_read32(hw, B0_ISRC)) {
struct net_device *dev = hw->dev[0];
 
@@ -2458,6 +2490,13 @@ static void sky2_watchdog(unsigned long 
__netif_rx_schedule(dev);
}
 
+   /* Check for stuck receiver */
+   if (sky2_read8(hw, B2_E_0) != 0)
+       for (i = 0; i < hw->ports; i++)
+           if (netif_running(hw->dev[i]))
+               sky2_rx_check(hw->dev[i]);
+
+
if (hw->active > 0)
mod_timer(&hw->watchdog_timer, round_jiffies(jiffies + HZ));
 }
--- a/drivers/net/sky2.h	2007-09-05 14:21:59.0 +0200
+++ b/drivers/net/sky2.h	2007-09-12 15:36:59.0 +0200
@@ -2017,6 +2017,14 @@ struct sky2_port {
u16  rx_tag;
struct vlan_group *vlgrp;
 #endif
+   struct {
+   unsigned long last;
+   u32 mac_rp;
+   u8  mac_lev;
+   u8  fifo_rp;
+   u8  fifo_lev;
+   } check;
+
 
dma_addr_t   rx_le_map;
dma_addr_t   tx_le_map;


Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread J. Bruce Fields
On Wed, Sep 12, 2007 at 09:37:29AM -0400, bfields wrote:
> So the fact that this changes the behavior means that sk_inuse is taking
> on negative values.

Uh, no, I misread the tests, sorry.  I'm not awake.

--b.


Re: New NAPI API: Need for netif_napi_remove() ?!

2007-09-12 Thread David Miller
From: Kok, Auke [EMAIL PROTECTED]
Date: Mon, 10 Sep 2007 17:27:33 -0700

> hm, I spoke too soon, I think I can get by for now by just modifying 
> adapter->napi.poll when needed, and this would be clean enough for now. This 
> might change as I enable multiqueue in this driver later though.

Ok, let me know if things change.

The only reason it doesn't exist was the lack of any need.


Re: [PATCH][2/2] Add ICMPMsgStats MIB (RFC 4293)

2007-09-12 Thread David Miller
From: David Stevens [EMAIL PROTECTED]
Date: Tue, 11 Sep 2007 08:21:54 -0700

> So maybe it's not so bad -- I'll roll another per-interface version
> to see.

Let us know how it goes.

