Re: [patch 1/4] - Potential performance bottleneck for Linux TCP

2006-11-30 Thread Lee Revell
On Thu, 2006-11-30 at 09:33 +0000, Christoph Hellwig wrote:
> On Wed, Nov 29, 2006 at 07:56:58PM -0600, Wenji Wu wrote:
> > Yes, when CONFIG_PREEMPT is disabled, the problem won't happen. That is
> > why I put "for 2.6 desktop, low-latency desktop" in the uploaded paper.
> > This problem happens in the 2.6 Desktop and Low-latency Desktop.
>
> CONFIG_PREEMPT is only for people that are in for the feeling.  There is no
> real world advantage to it and we should probably remove it again.

There certainly is a real world advantage for many applications.  Of
course it would be better if the latency requirements could be met
without kernel preemption but that's not the case now.

Lee

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 6735] New: network connection does not survive APM suspend and resume

2006-06-22 Thread Lee Revell
How in the heck did I get on the CC list for this? ;-)

Lee

On Thu, 2006-06-22 at 16:54 -0700, Andrew Morton wrote:
> [EMAIL PROTECTED] wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=6735
> >
> > Summary: network connection does not survive APM suspend and resume
> >
> > ...
> >
> > Steps to reproduce: Push button to suspend (APM), push button again to
> > resume, network connection is lost.
> >
> > Problem Description: The network connection will not survive suspend and
> > resume. Works correctly on 2.6.16.1. /etc/rc.d/init.d/network restart
> > will restore the network. I tried compiling with the file via-rhine.c
> > from 2.6.16.1 but it still fails. No error messages.
>
> This is a post-2.6.16 regression.
>
> It's probably unrelated to the device driver itself.
>
> Can anyone suggest where we should be looking?
>
> Thanks.



Re: [PATCH 4/6] myri10ge - First half of the driver

2006-05-15 Thread Lee Revell
On Fri, 2006-05-12 at 01:53 +0200, Brice Goglin wrote:
> Francois Romieu wrote:
>
> > +  spin_lock(&mgp->cmd_lock);
> > +  response->result = 0x;
> > +  mb();
> > +  myri10ge_pio_copy((void __iomem *) cmd_addr, buf, sizeof (*buf));
> > +
> > +  /* wait up to 2 seconds */
> >
> > You must not hold a spinlock for up to 2 seconds.
>
> We are working on reducing the delay to about 15ms. It only occurs when
> the driver is loaded or the link brought up.

I think 15ms is quite a long time to hold a spinlock also - most
spinlocks in the kernel are held for less than 500 microseconds.

Can't you use a mutex?

Lee



Re: 2.6.16, sk98lin out of date

2006-02-13 Thread Lee Revell
On Mon, 2006-02-13 at 12:06 +0100, Mws wrote:
> hi,
> as i do have the same problem i may help you out.
>
> at first, syskonnect did send their kernel diffs/patches but they
> were rejected because of coding style, indentation and some people
> thinking that things can be done better.

Haha, they didn't like the LKML code review so they just stopped sending
patches?  Classic.  Remind me not to buy their gear.

Lee



Re: 2.6.16, sk98lin out of date

2006-02-13 Thread Lee Revell
On Mon, 2006-02-13 at 21:34 +0100, Willy Tarreau wrote:
> On Mon, Feb 13, 2006 at 02:03:14PM -0500, Lee Revell wrote:
> > On Mon, 2006-02-13 at 12:06 +0100, Mws wrote:
> > > hi,
> > > as i do have the same problem i may help you out.
> > >
> > > at first, syskonnect did send their kernel diffs/patches but they
> > > were rejected because of coding style, indentation and some people
> > > thinking that things can be done better.
> >
> > Haha, they didn't like the LKML code review so they just stopped sending
> > patches?  Classic.  Remind me not to buy their gear.
>
> Lee, it's not always that simple. When you submit one driver, sometimes
> reviewers tell you that for whatever reason your driver's structure is
> wrong and it has to be changed a lot (and sometimes they're right of
> course). But when you don't have enough resources to do the job twice,
> the best you can do is to maintain it out of tree, which is already a
> pain. I'm not saying that it is what happened with their driver, I don't
> know the history. However, I found your reaction somewhat hasty. I
> personally would prefer to offer time and help before deciding that
> I don't want anyone's products on this basis. It's not as if they
> did not release their driver's source!

True, that was a little harsh.  I just find it a little sad that all
these vendors insist on throwing away months of work rather than simply
researching what the linux kernel coding standards are ahead of time.

Lee



Re: [RFC: 2.6 patch] CONFIG_FORCEDETH updates

2006-02-12 Thread Lee Revell
On Sun, 2006-02-12 at 18:52 +0100, Adrian Bunk wrote:
> This patch contains the following possible updates:
> - let FORCEDETH no longer depend on EXPERIMENTAL
> - remove the "Reverse Engineered" from the option text:
>   for the user it's important which hardware the driver supports, not
>   how it was developed

Is this driver as stable as one that was developed with proper
documentation?  I prefer to know that something as elementary as a fast
ethernet controller had to be reverse engineered so I can avoid
supporting a vendor so hostile to Linux.

Lee



Re: RCU latency regression in 2.6.16-rc1

2006-01-25 Thread Lee Revell
On Wed, 2006-01-25 at 23:56 +0100, Ingo Molnar wrote:
 
> yes, that would be a nice test. (I'm busy now with mutex stuff to be
> able to do a working softirq-preemption patch, but i sent you my
> current patches off-list - if you want to give it a shot. Be warned
> though, there will likely be quite some merging work to do, so it's
> definitely not for the faint hearted.)
 

OK, I probably won't have time to test it this week either.

In the meantime can anyone explain briefly why such a heavy fix is
needed?  It seems like it would be simpler to make the route cache
flushing operate in batches of 100 routes, rather than invalidating the
whole thing in one shot.  This does seem to be the only softirq that
regularly runs for much more than 1ms.

Would this require major surgery on the networking subsystem?
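
To make the batching idea concrete, here is a toy userspace model in Python. It is only a sketch of the proposal above, not kernel code: the cache layout, the function name flush_in_batches and the batch size are all hypothetical, and a real change to rt_run_flush would also have to handle locking and routes being inserted while the flush is in progress.

```python
from typing import Iterator, List


def flush_in_batches(cache: List[List[str]], batch_size: int = 100) -> Iterator[int]:
    """Drop every cached route, pausing after each batch of batch_size entries.

    Each yield marks a point where the caller could let other work run
    before continuing, instead of freeing the whole cache in one go.
    The value yielded is the running total of routes freed so far.
    """
    freed = 0
    for bucket in cache:
        while bucket:
            bucket.pop()
            freed += 1
            if freed % batch_size == 0:
                yield freed  # batch boundary: other work could be scheduled here
    if freed % batch_size:
        yield freed  # final, partial batch


# Hypothetical cache: 128 hash buckets holding 2 routes each (256 routes).
cache = [[f"route-{b}-{i}" for i in range(2)] for b in range(128)]
batches = list(flush_in_batches(cache, batch_size=100))
print(batches)
```

With 256 routes and a batch size of 100, the flush is split into three steps rather than one long uninterrupted run.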

Lee



Re: My vote against eepro* removal

2006-01-20 Thread Lee Revell
On Fri, 2006-01-20 at 12:55 +0300, Evgeniy Polyakov wrote:
> > Analysis of e100:
> > * If I comment out the whole body of e100_watchdog except for the
> >   timer re-registration, the delays are gone (so it is really the
> >   body of e100_watchdog). However, this makes eth0 non-functional.
> > * Commenting out parts of it, I found out that most of the time
> >   goes into its first half: The code from mii_ethtool_gset to
> >   mii_check_link (inclusive) makes the big difference, as far as
> >   I can tell especially mii_ethtool_gset.
>
> Each MDIO read can take up to 2 msecs (!) and at least 20 usecs in
> e100, and this runs in the timer handler.
> Consider the attached (only compile tested) patch which moves the e100
> watchdog into a workqueue.

Seems like the important question is, why does e100 need a watchdog if
eepro100 works fine without one?  Isn't the point of a watchdog in this
context to work around other bugs in the driver (or the hardware)?

Lee





RE: My vote against eepro* removal

2006-01-20 Thread Lee Revell
On Fri, 2006-01-20 at 11:19 +0100, kus Kusche Klaus wrote:
> For a non-full preemption kernel, your patch moves the 500 us
> piece of code from kernel to thread context, so it really
> improves things. But is 500 us something to worry about in a
> non-full preemption kernel?

Yes, absolutely.  Once exit_mmap (a latency regression which was
introduced in 2.6.14) and rt_run_flush/rt_garbage_collect (which have
always been problematic) are fixed, 500usecs will stick out like a sore
thumb even on a regular PREEMPT kernel.

Also, you should be able to capture this latency in /proc/latency_trace
by configuring an -rt kernel with PREEMPT_DESKTOP and hard/softirq
preemption disabled.

Lee



Re: My vote against eepro* removal

2006-01-20 Thread Lee Revell
On Fri, 2006-01-20 at 17:19 -0800, John Ronciak wrote:
> On 1/20/06, Lee Revell [EMAIL PROTECTED] wrote:
> > Seems like the important question is, why does e100 need a watchdog if
> > eepro100 works fine without one?  Isn't the point of a watchdog in this
> > context to work around other bugs in the driver (or the hardware)?
> There are a number of things that the watchdog in e100 does.  It
> checks link (up, down), reads the hardware stats, adjusts the adaptive
> IFS and checks for 3 known hang conditions based on certain types of
> the hardware.  You might be able to get around without doing the
> work-arounds (as long as you don't see hangs happening with the
> hardware being used) but the checking of the link and the stats are
> probably needed.

Why don't these cause excessive scheduling delays in eepro100 then?
Can't we just copy the eepro100 behavior?

Lee



Re: My vote against eepro* removal

2006-01-20 Thread Lee Revell
On Fri, 2006-01-20 at 18:01 -0800, John Ronciak wrote:
> There is a timer routine in the eepro100 driver which does the check
> for link as well as a check for one of the hang conditions (with
> work-around).  It does the check for link in a different way than
> e100.  e100 uses the mii call whereas eepro100 does it manually.  Another
> difference is that eepro100 doesn't get stats unless called by the
> system.  It's not in the timer routine at all.
>
> Can we try a couple of things? 1) just comment out all the check for
> link code in the e100 driver and give that a try and 2) just comment
> out the update stats call and see if that works.  These seem to be the
> differences and we need to know which one is causing the problem.

Heh, FWIW, Microsoft found this exact same bug with 2 different chipsets
in their latency testing (see section 4.4):

http://research.microsoft.com/~mbj/papers/tr-98-29.html

Lee



Re: [patch] networking ipv4: remove total socket usage count from /proc/net/sockstat

2006-01-16 Thread Lee Revell
On Mon, 2006-01-16 at 15:04 -0500, Andy Gospodarek wrote:
> Printing the total number of sockets used in /proc/net/sockstat is out
> of place in a file that is supposed to contain information related to
> ipv4 sockets.  Removed output for total socket usage.
 

Um, you can't do that, it will break userspace.

Lee



Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Lee Revell
On Fri, 2006-01-06 at 13:58 +0100, Andi Kleen wrote:
> Another CPU might be stuck in a long running interrupt

Shouldn't a long running interrupt be considered a bug?

Lee



Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Lee Revell
On Fri, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote:
> I have some servers that once in a while crash when the ip route
> cache is flushed. After raising
> /proc/sys/net/ipv4/route/secret_interval (so that *no* flush is done),
> I got better uptime for these servers.

Argh, where is that documented?  I have been banging my head against
this for weeks - how do I keep the kernel from flushing 4096 routes at
once in softirq context causing huge (~8-20ms) latency problems?

I tried all the route related sysctls I could find and nothing worked...

Lee



Re: [patch] latency tracer, 2.6.15-rc7

2006-01-01 Thread Lee Revell
On Fri, 2005-12-30 at 20:15 -0500, Lee Revell wrote: 
> On Fri, 2005-12-30 at 17:02 -0800, Linus Torvalds wrote:
> >
> > On Fri, 30 Dec 2005, Lee Revell wrote:
> > >
> > > It seems that the networking code's use of RCU can cause 10ms+
> > > latencies:
> >
> > Hmm. Is there a big jump at the 10ms mark? Do you have a 100Hz timer
> > source?
> >
> > A latency jump at 10ms would tend to indicate a missed wakeup that
> > was picked up by the next timer tick.
>
> No, there are no large jumps; it really seems that this was the network
> code causing an RCU callback to drop ~2K routes at once.  Specifically
> RCU invokes dst_rcu_free 2085 times in a single batch
> (call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free) is only called from
> rt_free() and rt_drop()).
>
> I have found that many of the paths in the network stack that can cause
> high latencies can be tuned via sysctls (for example
> net.ipv4.route.gc_thresh); this one may be the same.

On a related topic:

I thought I had solved the route cache flushing problem by tuning these
sysctls, but it does not seem to help.

The short version is that rt_run_flush and rt_garbage_collect can cause
15ms+ latencies when a lot of routes (up to 4096 it seems) are flushed
in one go:

$ grep 'local_bh_enable (rt_run_flush)' /proc/latency_trace | wc -l
4096

$ grep 'local_bh_enable (rt_garbage_collect)' /proc/latency_trace | wc -l
4096

(rt_run_flush and rt_garbage_collect call spin_lock_bh/spin_unlock_bh
once for each flushed route so the above grep effectively counts the
number of routes flushed at once)
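
The same counting can be done for every "function (caller)" pair at once rather than one grep per pair. The following Python sketch is only an illustration: the helper name count_call_sites and the regular expression are my own, the sample lines are copied from the trace further down, and it assumes trace lines end in "function (caller)" as they do there.

```python
import re
from collections import Counter

# Matches the tail of a trace line, e.g. "...   16us : ioread16 (rhine_interrupt)".
# Lines whose parenthesised part contains spaces (raw arguments) are skipped.
TRACE_LINE = re.compile(r":\s+(\S+) \((\S+)\)\s*$")


def count_call_sites(lines):
    """Count how often each (function, caller) pair appears in trace lines."""
    counts = Counter()
    for line in lines:
        match = TRACE_LINE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "epiphany-8742  0d.h.   12us : handle_IRQ_event (__do_IRQ)",
    "epiphany-8742  0d.h.   16us : ioread16 (rhine_interrupt)",
    "epiphany-8742  0d.h.   17us : ioread8 (rhine_interrupt)",
    "epiphany-8742  0d.h.   22us : ioread16 (rhine_interrupt)",
]
counts = count_call_sites(sample)
for (function, caller), n in counts.most_common():
    print(n, function, caller)
```

On a kernel that provides /proc/latency_trace, passing the open file to count_call_sites and looking up ("local_bh_enable", "rt_run_flush") should give the same number as the first grep above.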

I reported this a year or so ago and it led to Ingo adding an option to
-rt (then called the voluntary preempt patch) to always run softirqs in
threads which makes rt_run_flush preemptible.

Anyway I thought I could work around this with mainline by tuning the
network stack to minimize the effect of route cache flushing on
scheduler latency using these sysctls to cause the route cache to be
flushed more often and/or limit the maximum size of the route cache:

$ sudo /sbin/sysctl -a | grep route
net.ipv4.route.gc_elasticity = 8
net.ipv4.route.gc_interval = 60
net.ipv4.route.gc_timeout = 300
net.ipv4.route.gc_min_interval_ms = 20
net.ipv4.route.gc_min_interval = 0
net.ipv4.route.max_size = 65536
net.ipv4.route.gc_thresh = 256
net.ipv4.route.max_delay = 10
net.ipv4.route.min_delay = 2

I tried lowering gc_min_interval_ms, gc_timeout, max_size, and gc_thresh
but rt_run_flush will still process up to 4096 (the size of the route
hash table?) routes at once.

(stop here if you don't care to interpret long latency_trace reports)

17ms+ latency caused by rt_run_flush then rt_garbage_collect processing
4096 routes:

preemption latency trace v1.1.5 on 2.6.15-rc7
--------------------------------------------------------------------
 latency: 17286 us, #19154/19154, CPU#0 | (M:rt VP:0, KP:0, SP:0 HP:0)
    -----------------
    | task: gtk-gnutella-8581 (uid:1000 nice:0 policy:0 rt_prio:0)
    -----------------

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
epiphany-8742  0dns8    0us : __trace_start_sched_wakeup (try_to_wake_up)
epiphany-8742  0dns8    1us : __trace_start_sched_wakeup ...-8581 (73 0)
epiphany-8742  0dns7    2us : preempt_schedule (__trace_start_sched_wakeup)
epiphany-8742  0dns6    2us : preempt_schedule (try_to_wake_up)
epiphany-8742  0dns5    3us : preempt_schedule (__wake_up)
epiphany-8742  0dns5    4us : preempt_schedule (ep_poll_safewake)
epiphany-8742  0dnH5    6us : do_IRQ (c011223e b 0)
epiphany-8742  0d.h.    6us : __do_IRQ (do_IRQ)
epiphany-8742  0d.h1    7us+: mask_and_ack_8259A (__do_IRQ)
epiphany-8742  0d.h.   12us : handle_IRQ_event (__do_IRQ)
epiphany-8742  0d.h.   13us : usb_hcd_irq (handle_IRQ_event)
epiphany-8742  0d.h.   13us : uhci_irq (usb_hcd_irq)
epiphany-8742  0d.h.   14us : via_driver_irq_handler (handle_IRQ_event)
epiphany-8742  0d.h.   16us : rhine_interrupt (handle_IRQ_event)
epiphany-8742  0d.h.   16us : ioread16 (rhine_interrupt)
epiphany-8742  0d.h.   17us : ioread8 (rhine_interrupt)
epiphany-8742  0d.h.   18us : iowrite16 (rhine_interrupt)
epiphany-8742  0d.h.   19us : ioread8 (rhine_interrupt)
epiphany-8742  0d.h.   20us : rhine_tx (rhine_interrupt)
epiphany-8742  0d.h1   21us : raise_softirq_irqoff (rhine_tx)
epiphany-8742  0d.h.   22us : ioread16 (rhine_interrupt)
epiphany-8742  0d.h.   23us : ioread8 (rhine_interrupt)
epiphany-8742  0d.h1   25us : note_interrupt (__do_IRQ)
epiphany-8742  0d.h1   25us : end_8259A_irq (__do_IRQ)
epiphany-8742  0d.h1   26us : enable_8259A_irq (end_8259A_irq)
epiphany-8742  0dnH5   28us : irq_exit (do_IRQ)
epiphany-8742  0dns5   28us < (2097760)
epiphany-8742  0dns4   29us : preempt_schedule