Re: possible latency issues in seq_read

2007-07-23 Thread Chris Friesen

Eric Dumazet wrote:

> The problem is in established_get_next() and established_get_first() not
> allowing softirq processing, while scanning a possibly huge hash table,
> even if few sockets are hashed in.
>
> As cond_resched_softirq() was added in linux-2.6.11, you probably *need*
> to check the diffs between linux-2.6.10 & linux-2.6.11

Thanks for the pointers to the likely culprits.

Chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: possible latency issues in seq_read

2007-07-20 Thread Eric Dumazet

Chris Friesen wrote:

> Lee Revell wrote:
>
>> On 7/20/07, Chris Friesen <[EMAIL PROTECTED]> wrote:
>>
>>> We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
>>> packets on our server.  Preempt is disabled, and NAPI is enabled.
>>
>> Can you reproduce with a recent kernel?  Lots of latency issues have
>> been fixed since then.
>
> Unfortunately I have to fix it on this version (the bug was found on
> shipped product), so if there was a difference I'd have to isolate the
> changes and backport them.  Also, I can't run the software that triggers
> the problem on a newer kernel as it has dependencies on various patches
> that are not in mainline.
>
> Basically what I'd like to know is whether calling schedule() in
> seq_read() is safe or whether it would break assumptions made by
> seq_file users.




It won't help much. seq_read() is fine in itself.

The problem is in established_get_next() and established_get_first() not 
allowing softirq processing, while scanning a possibly huge hash table, even 
if few sockets are hashed in.


As cond_resched_softirq() was added in linux-2.6.11, you probably *need* to 
check the diffs between linux-2.6.10 & linux-2.6.11


Files:

include/linux/sched.h
net/core/sock.c  (__release_sock() latency)
net/ipv4/tcp_ipv4.c  (/proc/net/tcp latency)
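
For illustration only, a minimal userspace sketch of the pattern those
2.6.11 changes point at: a yield point at each bucket boundary, taken
before the bucket lock, so that pending softirqs can run even when most
buckets are empty. The names struct bucket, struct scan_stats,
model_cond_resched_softirq() and scan_ehash() are hypothetical stand-ins,
not kernel code; the real change should be read from the 2.6.10 to 2.6.11
diff of net/ipv4/tcp_ipv4.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one hash bucket and its socket chain. */
struct bucket {
    int nsockets;
};

/* Counters that let the sketch be checked without a kernel. */
struct scan_stats {
    size_t visited;        /* buckets walked */
    size_t resched_points; /* yield points reached */
    size_t sockets;        /* sockets found */
};

/* Models the yield point; in the kernel this is where
 * cond_resched_softirq() would be called (assumption). */
static void model_cond_resched_softirq(struct scan_stats *st)
{
    st->resched_points++;
}

/* Walk every bucket, empty or not, offering a yield point before
 * each bucket lock would be taken. */
static struct scan_stats scan_ehash(const struct bucket *tbl, size_t n)
{
    struct scan_stats st = {0, 0, 0};

    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* No lock is held here, so yielding is safe. */
        model_cond_resched_softirq(&st);

        /* read_lock() on the bucket would go here. */
        st.sockets += (size_t)tbl[i].nsockets;
        st.visited++;
        /* read_unlock() on the bucket would go here. */
    }
    return st;
}
```

The point of the sketch is that the number of yield points tracks the
table size, not the number of hashed sockets, which is what matters when
the table is huge and nearly empty.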




Re: possible latency issues in seq_read

2007-07-20 Thread Chris Friesen

Lee Revell wrote:

> On 7/20/07, Chris Friesen <[EMAIL PROTECTED]> wrote:
>
>> We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
>> packets on our server.  Preempt is disabled, and NAPI is enabled.
>
> Can you reproduce with a recent kernel?  Lots of latency issues have
> been fixed since then.


Unfortunately I have to fix it on this version (the bug was found on 
shipped product), so if there was a difference I'd have to isolate the 
changes and backport them.  Also, I can't run the software that triggers 
the problem on a newer kernel as it has dependencies on various patches 
that are not in mainline.


Basically what I'd like to know is whether calling schedule() in 
seq_read() is safe or whether it would break assumptions made by 
seq_file users.


Chris


Re: possible latency issues in seq_read

2007-07-20 Thread Lee Revell

On 7/20/07, Chris Friesen <[EMAIL PROTECTED]> wrote:

> We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
> packets on our server.  Preempt is disabled, and NAPI is enabled.



Can you reproduce with a recent kernel?  Lots of latency issues have
been fixed since then.

Lee


possible latency issues in seq_read

2007-07-20 Thread Chris Friesen


We've run into an issue (on 2.6.10) where calling "lsof" triggers lost 
packets on our server.  Preempt is disabled, and NAPI is enabled.


It appears that for some reason the networking softirq is not being 
handled in a timely fashion, which means that the rx ring buffer fills 
up and packets overflow.


It appears that the problem path is:

seq_read
tcp_seq_next
established_get_next
read_lock/read_unlock

The issue appears to be related to the amount of time that this syscall 
takes.  While we're in the syscall we cannot run the ksoftirqd thread,
and so the rx buffer is not being cleaned.


The fact that there are kmalloc(GFP_KERNEL) calls in seq_read() seems to 
indicate that sleeping is safe, so would it be reasonable to call 
schedule() periodically (maybe based on elapsed time) to ensure that 
system latency is kept under control?
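
A minimal userspace sketch of that idea, assuming a count-based budget
instead of elapsed time so that the behaviour is deterministic.
scan_with_yield() is a hypothetical helper and sched_yield() stands in
for the in-kernel schedule(); this is not seq_read() code.

```c
#include <assert.h>
#include <sched.h>   /* sched_yield(): userspace stand-in for schedule() */
#include <stddef.h>

/* Hypothetical helper: walk nbuckets entries and give up the CPU after
 * every `budget` entries. Returns how many times the CPU was yielded. */
static size_t scan_with_yield(size_t nbuckets, size_t budget)
{
    size_t since_yield = 0;
    size_t yields = 0;

    assert(budget > 0);

    for (size_t i = 0; i < nbuckets; i++) {
        /* Per-bucket work would go here; any lock taken for it is
         * released before the yield point below. */
        if (++since_yield == budget) {
            sched_yield();
            since_yield = 0;
            yields++;
        }
    }
    return yields;
}
```

The yield point has to sit where no spinlock or rwlock is held, which is
why it is placed between buckets and not inside the per-bucket work.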


Thanks,

Chris