Re: posible latency issues in seq_read
Eric Dumazet wrote: The problem is in established_get_next() and established_get_first() not allowing softirq processing, while scanning a possibly huge hash table, even if few sockets are hashed in. As cond_resched_softirq() was added in linux-2.6.11, you probably *need* to check the diffs between linux-2.6.10 & linux-2.6.11 Thanks for the pointers to the likely culprits. Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: posible latency issues in seq_read
Chris Friesen a écrit : Lee Revell wrote: On 7/20/07, Chris Friesen <[EMAIL PROTECTED]> wrote: We've run into an issue (on 2.6.10) where calling "lsof" triggers lost packets on our server. Preempt is disabled, and NAPI is enabled. Can you reproduce with a recent kernel? Lots of latency issues have been fixed since then. Unfortunately I have to fix it on this version (the bug was found on shipped product), so if there was a difference I'd have to isolate the changes and backport them. Also, I can't run the software that triggers the problem on a newer kernel as it has dependencies on various patches that are not in mainline. Basically what I'd like to know is whether calling schedule() in seq_read() is safe or whether it would break assumptions made by seq_file users. It wont help much. seq_read() is fine in itself. The problem is in established_get_next() and established_get_first() not allowing softirq processing, while scanning a possibly huge hash table, even if few sockets are hashed in. As cond_resched_softirq() was added in linux-2.6.11, you probably *need* to check the diffs between linux-2.6.10 & linux-2.6.11 files : include/linux/sched.h net/core/sock.c (__release_sock() latency) net/ipv4/tcp_ipv4.c (/proc/net/tcp latency) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: posible latency issues in seq_read
Lee Revell wrote: On 7/20/07, Chris Friesen <[EMAIL PROTECTED]> wrote: We've run into an issue (on 2.6.10) where calling "lsof" triggers lost packets on our server. Preempt is disabled, and NAPI is enabled. Can you reproduce with a recent kernel? Lots of latency issues have been fixed since then. Unfortunately I have to fix it on this version (the bug was found on shipped product), so if there was a difference I'd have to isolate the changes and backport them. Also, I can't run the software that triggers the problem on a newer kernel as it has dependencies on various patches that are not in mainline. Basically what I'd like to know is whether calling schedule() in seq_read() is safe or whether it would break assumptions made by seq_file users. Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: posible latency issues in seq_read
On 7/20/07, Chris Friesen <[EMAIL PROTECTED]> wrote: We've run into an issue (on 2.6.10) where calling "lsof" triggers lost packets on our server. Preempt is disabled, and NAPI is enabled. Can you reproduce with a recent kernel? Lots of latency issues have been fixed since then. Lee - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
posible latency issues in seq_read
We've run into an issue (on 2.6.10) where calling "lsof" triggers lost packets on our server. Preempt is disabled, and NAPI is enabled. It appears that for some reason the networking softirq is not being handled in a timely fashion, which means that the rx ring buffer fills up and packets overflow. It appears that the problem path is: seq_read tcp_seq_next established_get_next read_lock/read_unlock The issue appears to be related to the amount of time that this syscall takes. While we're in the syscall we cannot run the softirqd thread, and so the rx buffer is not being cleaned. The fact that there are kmalloc(GFP_KERNEL) calls in seq_read() seems to indicate that sleeping is safe, so would it be reasonable to call schedule() periodically (maybe based on elapsed time) to ensure that system latency is kept under control? Thanks, Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/