from:"Chris Friesen"

Re: where is the ethernet device number determined?

2016-05-12 Thread Chris Friesen


On 05/12/2016 02:19 PM, Cong Wang wrote:
> On Thu, May 12, 2016 at 1:05 PM, Chris Friesen <cbf...@mail.usask.ca> wrote:
>> I hope this is a simple question...with legacy naming ethernet devices are
>> named ethX.  Where is that X determined?  I've been looking in
>> alloc_netdev_mqs() and friends, but haven't found it yet.
>
> __dev_alloc_name()


Much appreciated.

Chris
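
For readers following along: __dev_alloc_name() picks X by collecting the unit numbers already taken by names matching the pattern (e.g. "eth%d") and then using the lowest free one. The user-space sketch below illustrates only that idea. lowest_free_unit() and MAX_UNITS are made-up names for this example; the kernel function walks the device list of the network namespace and tracks used units in a bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_UNITS 1024  /* hypothetical cap, chosen for this sketch only */

/* Return the lowest unit number not used by any name matching fmt
 * (for example "eth%d"), or -1 if every unit below MAX_UNITS is taken. */
int lowest_free_unit(const char *fmt, const char *const *names, size_t n)
{
    unsigned char used[MAX_UNITS] = {0};
    char buf[32];
    size_t i;
    int unit;

    assert(fmt != NULL);
    for (i = 0; i < n; i++) {
        if (sscanf(names[i], fmt, &unit) != 1)
            continue;                   /* name does not match the pattern */
        if (unit < 0 || unit >= MAX_UNITS)
            continue;
        /* Re-render and compare so that "eth1abc" is not counted as eth1. */
        snprintf(buf, sizeof(buf), fmt, unit);
        if (strcmp(buf, names[i]) == 0)
            used[unit] = 1;
    }
    for (unit = 0; unit < MAX_UNITS; unit++)
        if (!used[unit])
            return unit;
    return -1;
}
```

With existing names "eth0", "eth2" and "lo", the sketch returns 1, so a new device requested as "eth%d" would become eth1.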

where is the ethernet device number determined?

2016-05-12 Thread Chris Friesen



Hi,

I hope this is a simple question...with legacy naming ethernet devices are named 
ethX.  Where is that X determined?  I've been looking in alloc_netdev_mqs() and 
friends, but haven't found it yet.


Thanks,
Chris

[PATCH v4] route: do not cache fib route info on local routes with oif

2016-04-08 Thread Chris Friesen

For local routes that require a particular output interface we do not want
to cache the result.  Caching the result causes incorrect behaviour when
there are multiple source addresses on the interface.  The end result
being that if the intended recipient is waiting on that interface for the
packet he won't receive it because it will be delivered on the loopback
interface and the IP_PKTINFO ipi_ifindex will be set to the loopback
interface as well.

This can be tested by running a program such as "dhcp_release" which
attempts to inject a packet on a particular interface so that it is
received by another program on the same board.  The receiving process
should see an IP_PKTINFO ipi_ifindex value of the source interface
(e.g., eth1) instead of the loopback interface (e.g., lo).  The packet
will still appear on the loopback interface in tcpdump but the important
aspect is that the CMSG info is correct.

Sample dhcp_release command line:

   dhcp_release eth1 192.168.204.222 02:11:33:22:44:66
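
On the receiving side, the check described above amounts to reading ipi_ifindex out of the control messages returned by recvmsg(). A minimal sketch, assuming a UDP socket on which IP_PKTINFO was enabled beforehand with setsockopt(fd, IPPROTO_IP, IP_PKTINFO, ...); pktinfo_ifindex() is a helper name invented for this example:

```c
#define _GNU_SOURCE             /* exposes struct in_pktinfo on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Return the interface index carried in the IP_PKTINFO control message
 * of a received datagram, or -1 if the message carries none. */
int pktinfo_ifindex(struct msghdr *msg)
{
    struct cmsghdr *c;

    assert(msg != NULL);
    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo info;

            /* Copy out rather than cast: CMSG_DATA may be unaligned. */
            memcpy(&info, CMSG_DATA(c), sizeof(info));
            return info.ipi_ifindex;
        }
    }
    return -1;
}
```

The result can be compared against if_nametoindex("eth1") and if_nametoindex("lo") to see which interface the kernel reported for the injected packet.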

Signed-off-by: Allain Legacy <allain.leg...@windriver.com>
Signed-off-by: Chris Friesen <chris.frie...@windriver.com>
---
 net/ipv4/route.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 02c6229..b050cf9 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2045,6 +2045,18 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 		 */
 		if (fi && res->prefixlen < 4)
 			fi = NULL;
+	} else if ((type == RTN_LOCAL) && (orig_oif != 0) &&
+		   (orig_oif != dev_out->ifindex)) {
+		/* For local routes that require a particular output interface
+		 * we do not want to cache the result.  Caching the result
+		 * causes incorrect behaviour when there are multiple source
+		 * addresses on the interface, the end result being that if the
+		 * intended recipient is waiting on that interface for the
+		 * packet he won't receive it because it will be delivered on
+		 * the loopback interface and the IP_PKTINFO ipi_ifindex will
+		 * be set to the loopback interface as well.
+		 */
+		fi = NULL;
 	}
 
 	fnhe = NULL;

Re: [PATCH v2] route: do not cache fib route info on local routes with oif

2016-04-08 Thread Chris Friesen


On 04/08/2016 01:14 PM, Julian Anastasov wrote:


Your patch is corrupted. I was in the same trap
some time ago but with different client:

 From Documentation/email-clients.txt:

Don't send patches with "format=flowed".  This can cause unexpected
and unwanted line breaks.

Anyways, the change looks good to me and I'll add my
Reviewed-by tag the next time.



Doh...forgot to turn off word wrapping.  New patch coming.

Chris

[PATCH v3] route: do not cache fib route info on local routes with oif

2016-04-08 Thread Chris Friesen

For local routes that require a particular output interface we do not want to
cache the result.  Caching the result causes incorrect behaviour when there are
multiple source addresses on the interface.  The end result being that if the
intended recipient is waiting on that interface for the packet he won't receive
it because it will be delivered on the loopback interface and the IP_PKTINFO
ipi_ifindex will be set to the loopback interface as well.

This can be tested by running a program such as "dhcp_release" which attempts
to inject a packet on a particular interface so that it is received by another
program on the same board.  The receiving process should see an IP_PKTINFO
ipi_ifindex value of the source interface (e.g., eth1) instead of the loopback
interface (e.g., lo).  The packet will still appear on the loopback interface
in tcpdump but the important aspect is that the CMSG info is correct.

Sample dhcp_release command line:

   dhcp_release eth1 192.168.204.222 02:11:33:22:44:66

Signed-off-by: Allain Legacy <allain.leg...@windriver.com>
Signed-off-by: Chris Friesen <chris.frie...@windriver.com>
---
 net/ipv4/route.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 02c6229..437a377 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2045,6 +2045,18 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 		 */
 		if (fi && res->prefixlen < 4)
 			fi = NULL;
+	} else if ((type == RTN_LOCAL) && (orig_oif != 0) &&
+		   (orig_oif != dev_out->ifindex)) {
+		/* For local routes that require a particular output interface
+		 * we do not want to cache the result.  Caching the result
+		 * causes incorrect behaviour when there are multiple source
+		 * addresses on the interface, the end result being that if the
+		 * intended recipient is waiting on that interface for the
+		 * packet he won't receive it because it will be delivered on
+		 * the loopback interface and the IP_PKTINFO ipi_ifindex will
+		 * be set to the loopback interface as well.
+		 */
+		fi = NULL;
 	}
 
 	fnhe = NULL;

[PATCH v2] route: do not cache fib route info on local routes with oif

2016-04-08 Thread Chris Friesen


For local routes that require a particular output interface we do not want to
cache the result.  Caching the result causes incorrect behaviour when there are
multiple source addresses on the interface.  The end result being that if the
intended recipient is waiting on that interface for the packet he won't receive
it because it will be delivered on the loopback interface and the IP_PKTINFO
ipi_ifindex will be set to the loopback interface as well.

This can be tested by running a program such as "dhcp_release" which attempts
to inject a packet on a particular interface so that it is received by another
program on the same board.  The receiving process should see an IP_PKTINFO
ipi_ifindex value of the source interface (e.g., eth1) instead of the loopback
interface (e.g., lo).  The packet will still appear on the loopback interface
in tcpdump but the important aspect is that the CMSG info is correct.

Sample dhcp_release command line:

   dhcp_release eth1 192.168.204.222 02:11:33:22:44:66

Signed-off-by: Allain Legacy <allain.leg...@windriver.com>
Signed-off-by: Chris Friesen <chris.frie...@windriver.com>
---
 net/ipv4/route.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 02c6229..437a377 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2045,6 +2045,18 @@ static struct rtable *__mkroute_output(const struct 
fib_result *res,

 */
if (fi && res->prefixlen < 4)
fi = NULL;
+   } else if ((type == RTN_LOCAL) && (orig_oif != 0) &&
+  (orig_oif != dev_out->ifindex)) {
+   /* For local routes that require a particular output interface
+* we do not want to cache the result.  Caching the result
+* causes incorrect behaviour when there are multiple source
+* addresses on the interface, the end result being that if the
+* intended recipient is waiting on that interface for the
+* packet he won't receive it because it will be delivered on
+* the loopback interface and the IP_PKTINFO ipi_ifindex will
+* be set to the loopback interface as well.
+*/
+   fi = NULL;
}

fnhe = NULL;

Re: [RFC PATCH] possible bug in handling of ipv4 route caching

2016-04-08 Thread Chris Friesen


On 04/07/2016 03:20 PM, Julian Anastasov wrote:
> On Thu, 7 Apr 2016, Chris Friesen wrote:
>> Hi,
>>
>> We think we may have found a bug in the handling of ipv4 route caching,
>> and are curious what you think.
>>
>> For local routes that require a particular output interface we do not
>> want to cache the result.  Caching the result causes incorrect behaviour
>> when there are multiple source addresses on the interface.  The end
>> result being that if the intended recipient is waiting on that interface
>> for the packet he won't receive it because it will be delivered on the
>> loopback interface and the IP_PKTINFO ipi_ifindex will be set to the
>> loopback interface as well.
>>
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index 02c6229..e965d4b 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -2045,6 +2045,17 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
>>  		 */
>>  		if (fi && res->prefixlen < 4)
>>  			fi = NULL;
>> +	} else if ((type == RTN_LOCAL) && (orig_oif != 0)) {
>
> So, we can be more specific. Can this work?:
>
> 	} else if ((type == RTN_LOCAL) && (orig_oif != 0) &&
> 		   (orig_oif != dev_out->ifindex)) {
>
> I.e. we should allow to cache orig_oif=LOOPBACK_IFINDEX
> but eth1 should not be cached.


Yes, we think that will work.  New patch to follow.

Chris
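
The difference between the two conditions is easiest to see as a pure predicate. The sketch below is not kernel code: skip_fib_cache() and RTN_LOCAL_SKETCH are names invented here, and the parameters stand in for the type, orig_oif and dev_out->ifindex values used in __mkroute_output(). As Julian's comment implies, dev_out for a local route is the loopback device, so a request that names loopback itself can still be cached.

```c
#include <assert.h>
#include <stdbool.h>

#define RTN_LOCAL_SKETCH 2  /* mirrors the value of RTN_LOCAL in <linux/rtnetlink.h> */

/* True when the refined condition would set fi = NULL, i.e. when the
 * route lookup result should not be cached. */
bool skip_fib_cache(int type, int orig_oif, int dev_out_ifindex)
{
    assert(orig_oif >= 0);
    return type == RTN_LOCAL_SKETCH &&
           orig_oif != 0 &&
           orig_oif != dev_out_ifindex;
}
```

With loopback at ifindex 1 and eth1 at, say, ifindex 3: a local route requested via eth1 is not cached, while one requested via loopback (or with no oif at all) is.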

[RFC PATCH] possible bug in handling of ipv4 route caching

2016-04-07 Thread Chris Friesen

Hi,

We think we may have found a bug in the handling of ipv4 route caching,
and are curious what you think.

For local routes that require a particular output interface we do not
want to cache the result.  Caching the result causes incorrect behaviour
when there are multiple source addresses on the interface.  The end
result being that if the intended recipient is waiting on that interface
for the packet he won't receive it because it will be delivered on the
loopback interface and the IP_PKTINFO ipi_ifindex will be set to the
loopback interface as well.

This can be tested by running a program such as "dhcp_release" which
attempts to inject a packet on a particular interface so that it is
received by another program on the same board.  The receiving process
should see an IP_PKTINFO ipi_ifindex value of the source interface
(e.g., eth1) instead of the loopback interface (e.g., lo).  The packet
will still appear on the loopback interface in tcpdump but the important
aspect is that the CMSG info is correct.

For what it's worth, here's a patch that we've applied locally to deal
with the issue.

Chris



Signed-off-by: Allain Legacy <allain.leg...@windriver.com>
Signed-off-by: Chris Friesen <chris.frie...@windriver.com>

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 02c6229..e965d4b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2045,6 +2045,17 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 		 */
 		if (fi && res->prefixlen < 4)
 			fi = NULL;
+	} else if ((type == RTN_LOCAL) && (orig_oif != 0)) {
+		/* For local routes that require a particular output interface
+		 * we do not want to cache the result.  Caching the result
+		 * causes incorrect behaviour when there are multiple source
+		 * addresses on the interface, the end result being that if the
+		 * intended recipient is waiting on that interface for the
+		 * packet he won't receive it because it will be delivered on
+		 * the loopback interface and the IP_PKTINFO ipi_ifindex will
+		 * be set to the loopback interface as well.
+		 */
+		fi = NULL;
 	}
 
 	fnhe = NULL;

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Chris Friesen

I've done some further digging, and it appears that one of the problems 
we may be facing is very high instantaneous traffic rates.


Instrumentation showed up to 222K packets/sec for short periods (at 
least 1.1 ms, possibly longer), although the long-term average is down 
around 14-16K packets/sec.


If I bump the rx ring size up to 4096, we can handle all the packets and 
we still have 44% idle on cpu0 and 27% idle on cpu1.


Is there anything else we can do to minimize the latency of network 
packet processing and avoid having to crank the rx ring size up so high?


Thanks,

Chris
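
A rough worst-case check on these numbers, assuming the ring is not drained at all while a burst lasts (a simplification, since in practice the driver drains it concurrently): a ring of N descriptors fills in N / rate seconds. ring_fill_time_us() is a name made up for this sketch.

```c
#include <assert.h>

/* Microseconds until a ring of ring_entries descriptors is full, assuming
 * packets arrive at packets_per_sec and none are removed in the meantime. */
double ring_fill_time_us(unsigned ring_entries, double packets_per_sec)
{
    assert(packets_per_sec > 0.0);
    return (double)ring_entries * 1e6 / packets_per_sec;
}
```

At 222K packets/sec the 256-entry ring from the original report fills in about 1.15 ms, and a 4096-entry ring in about 18.4 ms. That is how long softirq processing can be held off during such a burst before the hardware starts dropping.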
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Chris Friesen


Ben Greear wrote:

Chris Friesen wrote:

Is there anything else we can do to minimize the latency of network 
packet processing and avoid having to crank the rx ring size up so high?



Why is it such a big deal to crank up the rx queue length?  Seems like
a perfectly normal way to handle bursts like this...


It means that the latency for handling those packets is higher than it 
could be.  Draining 4096 packets from the queue will take a while.


Ideally we'd like to bring the latency down as much as possible, and 
then bump up the rx queue length to handle the rest.


Chris

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Chris Friesen


Eric Dumazet wrote:
> Chris Friesen wrote:
>> I've done some further digging, and it appears that one of the
>> problems we may be facing is very high instantaneous traffic rates.
>>
>> Instrumentation showed up to 222K packets/sec for short periods (at
>> least 1.1 ms, possibly longer), although the long-term average is down
>> around 14-16K packets/sec.
>
> Instrumentation done where exactly?

I added some code to e1000_clean_rx_irq() to track rx_fifo drops, total 
packets received, and an accurate timestamp.

If rx_fifo errors changed, it would dump the information.

>> Is there anything else we can do to minimize the latency of network
>> packet processing and avoid having to crank the rx ring size up so high?
>
> You have some tasks that disable softirqs too long. Sometimes, bumping
> RX ring size is OK (but you will still have delays), sometimes it is not
> an option, since 4096 is the limit on current hardware.


I added some instrumentation to take timestamps in __do_softirq() as 
well.  Based on these timestamps, I can see the following code sequence:


2374604616 usec, start processing softirqs in __do_softirq()
2374610337 usec, log values in e1000_clean_rx_irq()
2374611411 usec, log values in e1000_clean_rx_irq()

In between the successive calls to e1000_clean_rx_irq() the rx_fifo 
counts went up.
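
To put the three timestamps in perspective, the sketch below computes the gaps between them and the number of packets that would arrive during a gap at a given rate. Applying the 222K packets/sec peak to these particular gaps is an assumption; the log does not say what the arrival rate was at that moment. gap_us() and packets_during() are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed microseconds between two timestamps taken from the same clock. */
uint64_t gap_us(uint64_t start_us, uint64_t end_us)
{
    assert(end_us >= start_us);
    return end_us - start_us;
}

/* Packets arriving during a gap of the given length at a steady rate. */
double packets_during(double packets_per_sec, uint64_t gap_in_us)
{
    assert(packets_per_sec >= 0.0);
    return packets_per_sec * (double)gap_in_us / 1e6;
}
```

The gap from the start of __do_softirq() to the first e1000_clean_rx_irq() log is 5721 usec, and the gap between the two e1000_clean_rx_irq() logs is 1074 usec. At the peak rate, about 1270 packets would arrive in 5721 usec, far more than a 256-entry ring can hold.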


Does anyone have any patchsets to track down what softirqs are taking a 
long time, and/or who's disabling softirqs?


Chris

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Chris Friesen


Jarek Poplawski wrote:


IMHO, checking this with a current stable, which probably you are going
to do some day, anyway, should be 100% acceptable: giving some input to
netdev, while still working for yourself.


While I would love to do this, it's not that simple.

Some of our hardware is not supported on mainline, so we need per-kernel 
version patches to even bring up the blade.  The blades netboot via a 
jumbo-frame network, so kernel extensions are needed to handle setting 
the MTU before mounting the rootfs.  The blade in question uses CKRM 
which doesn't exist for newer kernels so the problem may simply be 
hidden by scheduling differences.  The userspace application uses other 
kernel features that are not in mainline (and likely some of them won't 
ever be in mainline--I know because I've tried).


Basically, the number of changes required for our environment makes it 
difficult to just boot up the latest kernel.


Chris

Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Chris Friesen


Radoslaw Szkodzinski (AstralStorm) wrote:

On Tue, 15 Jan 2008 08:47:07 -0600
Chris Friesen [EMAIL PROTECTED] wrote:


Some of our hardware is not supported on mainline, so we need per-kernel 
version patches to even bring up the blade.  The blades netboot via a 
jumbo-frame network, so kernel extensions are needed to handle setting 
the MTU before mounting the rootfs.



Why? Can't you use a small initramfs to set it up?


Sure, if we changed our build system, engineered a suitable small 
userspace, etc.  At this point it's easier to maintain a small patch to 
the kernel dhcp parsing code so that we can specify mtu.


The blade in question uses CKRM 
which doesn't exist for newer kernels so the problem may simply be 
hidden by scheduling differences.



Current spiritual successor is Ingo's realtime patchset I guess.


I think the group scheduling stuff for CFS is closer, but there are 
design and API differences that would require us to rework our system.


The userspace application uses other 
kernel features that are not in mainline (and likely some of them won't 
ever be in mainline--I know because I've tried).



What would these be? Some proc or sysfs files that were removed or
renamed?
Maybe they can be worked around w/o changing the application at all, or
very minor changes.


No, more than proc/sysfs.  Things like notification of process state 
change (think like SIGCHLD but sent to arbitrary processes), additional 
messaging protocol families, oom-killer protection, dirty page 
monitoring, backwards compatibility for dcbz on the ppc970, nested 
alternate signal stacks, and many others.  Between our kernel vendor's 
patches and our own, there are something like 600 patches applied to the 
mainline kernel.



Also, be sure to check if there are pauses with other CPU hogs.


With the sctp hash patch applied we're now sitting with 45% cpu free on 
one cpu, and about 25% free on the other, and we're still seeing 
periodic bursts of rx fifo loss.  It's weird.  Still trying to figure 
out what's going on.


Chris

Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Chris Friesen


Ray Lee wrote:

On Jan 10, 2008 9:24 AM, Chris Friesen [EMAIL PROTECTED] wrote:



After a recent userspace app change, we've started seeing packets being
dropped by the ethernet hardware (e1000, NAPI is enabled).  The
error/dropped/fifo counts are going up in ethtool:



Can you reproduce it with a simple userspace cpu hog? (Two, really,
one per cpu.)
Can you reproduce it with the newer e1000?


Hmm...good questions and I haven't checked either.  The first one is 
relatively straightforward.  The second is a bit trickier...last time I 
tried the latest e1000 driver the card wouldn't boot (we use netboot).



Can you reproduce it with git head?


Unfortunately, I don't think I'll be able to try this.  We require 
kernel mods for our userspace to run, and I doubt I'd be able to get the 
time to port all the changes forward to git head.



If the answer to the first one is yes, the last no, then bisect until
you get a kernel that doesn't show the problem. Backport the fix,
unless the fix happens to be CFS. However, I suspect that your
userspace app is just starving the system from time to time.


It's conceivable that userspace is starving the kernel, but we do have 
about 45% idle on one cpu, and 7-10% idle on the other.


We also have an odd situation where on an initial test run after bootup 
we have 18-24% idle on cpu1, but resetting the test tool drops that to 
the 7-10% I mentioned above.


Based on profiling and instrumentation it seems like the cost of 
sctp_endpoint_lookup_assoc() more than triples, which means that the 
amount of time that bottom halves are disabled in that function also 
triples.


Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Chris Friesen

David Miller wrote:

From: Chris Friesen [EMAIL PROTECTED]
Date: Fri, 11 Jan 2008 08:59:26 -0600

I'd love to work on newer kernels, but we have a commitment to our 
customers to support multiple releases for a significant amount of time.

And by asking here for people to dig into it for you, you are asking
people for free help providing that support.

I'm not asking people to spend significant amounts of time...more like 
if anyone has any ideas off the top of their heads.

That's why there is such negative backlash to asking questions about
such ancient kernel here, you're asking us to do your work, for free.

I hadn't realized that you felt this strongly about asking questions 
regarding old kernels.

How close to bleeding edge do we need to be for it to be considered 
acceptable to ask questions on netdev?

Given that the embedded space tends to be perpetually stuck on older 
kernels (our current release is based on 2.6.14) do you have any 
suggestion on how we can obtain information that would be available on 
netdev if we were using newer kernels?

Chris

Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Chris Friesen


Eric Dumazet wrote:
> Chris Friesen wrote:
>> Based on profiling and instrumentation it seems like the cost of
>> sctp_endpoint_lookup_assoc() more than triples, which means that the
>> amount of time that bottom halves are disabled in that function also
>> triples.
>
> Any idea of the sctp hash size you have?
> (your dmesg probably includes a message starting with SCTP: Hash tables
> configured...)
>
> How many concurrent sctp sockets are handled?

Our lab is currently rebooting, but I'll try and get this once it's back up.

> Maybe sctp_assoc_hashfn() is too weak for your use, and some chains are
> *really* long.


Based on the profiling information we're spending time in 
sctp_endpoint_lookup_assoc() which doesn't actually use hashes, so I 
can't see how the hash would be related.  I'm pretty new to SCTP though, 
so I may be missing something.


Here's the top results from readprofile, unfortunately these are 
aggregated across both cpus so they don't really show what's going on. 
The key thing is that sctp_endpoint_lookup_assoc() is the most expensive 
kernel routine on this entire system.


  3147 .power4_idle                  22.4786
  1712 .native_idle                  20.3810
  1234 .sctp_endpoint_lookup_assoc    2.1725
  1212 ._spin_unlock_irqrestore       6.4468
   778 .do_futex                      0.3791
   447 ._spin_unlock_irq              4.2981
   313 .fget                          1.7784
   277 .fput                          3.8472
   275 .kfree                         0.7473
   234 .__kmalloc                     0.5571
   131 SystemCall_common              0.3411
   130 .sctp_assoc_is_match           0.6373
   123 .lock_sock                     0.4155
   119 .find_vma                      0.6919
   116 .kmem_cache_alloc              0.3580
   111 .kmem_cache_free               0.3343
   106 .skb_release_data              0.4907
   102 .__copy_tofrom_user            0.0724
   100 .exit_elf_binfmt               1.9231
   100 .do_select                     0.0820


Chris

Re: questions on NAPI processing latency and dropped network packets

2008-01-14 Thread Chris Friesen


Eric Dumazet wrote:
> Chris Friesen wrote:
>> Based on the profiling information we're spending time in
>> sctp_endpoint_lookup_assoc() which doesn't actually use hashes, so I
>> can't see how the hash would be related.  I'm pretty new to SCTP
>> though, so I may be missing something.
>
> Well, it does use hashes :)
>
>    hash = sctp_assoc_hashfn(ep->base.bind_addr.port, rport);
>    head = &sctp_assoc_hashtable[hash];
>    read_lock(&head->lock);
>    sctp_for_each_hentry(epb, node, &head->chain) {
>        /* maybe your machine is traversing here a *really* long chain */
>    }



The latest released kernel doesn't have this code, it was only added in 
November.  The SCTP maintainer just pointed me to the patch, and made 
some other suggestions as well.
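
Eric's point about long chains can be illustrated in user space. The sketch below is a toy, not the SCTP code: toy_hash(), TOY_BUCKETS and struct toy_assoc are invented for this example, and the hash only resembles the general shape of a port-pair hash. It reports the longest chain a set of (local port, remote port) pairs would produce, which bounds how many entries a single lookup may have to walk while holding the bucket lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 16u  /* hypothetical table size; must be a power of two */

struct toy_assoc {
    uint16_t lport;
    uint16_t rport;
};

/* Toy port-pair hash, for illustration only. */
static unsigned toy_hash(uint16_t lport, uint16_t rport)
{
    uint32_t h = ((uint32_t)lport << 16) + rport;

    h ^= h >> 8;
    return h & (TOY_BUCKETS - 1u);
}

/* Length of the longest chain after inserting all n associations. */
size_t longest_chain(const struct toy_assoc *a, size_t n)
{
    size_t count[TOY_BUCKETS] = {0};
    size_t worst = 0;
    size_t i;

    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++) {
        size_t c = ++count[toy_hash(a[i].lport, a[i].rport)];

        if (c > worst)
            worst = c;
    }
    return worst;
}
```

If every association uses the same port pair, as can happen when many peers talk to one well-known port from a fixed source port, all entries land in one bucket and the lookup degrades to a linear scan no matter how large the table is.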


Chris

Re: questions on NAPI processing latency and dropped network packets

2008-01-11 Thread Chris Friesen


David Miller wrote:


You have to be kidding, coming here for help with a nearly
4 year old kernel.


I figured it couldn't hurt to ask...if I can't ask the original authors, 
who else is there?


I'd love to work on newer kernels, but we have a commitment to our 
customers to support multiple releases for a significant amount of time.


Chris

questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Chris Friesen


Hi all,

I've got an issue that's popped up with a deployed system running 
2.6.10.  I'm looking for some help figuring out why incoming network 
packets aren't being processed fast enough.


After a recent userspace app change, we've started seeing packets being 
dropped by the ethernet hardware (e1000, NAPI is enabled).  The 
error/dropped/fifo counts are going up in ethtool:


 rx_packets: 32180834
 rx_bytes: 5480756958
 rx_errors: 862506
 rx_dropped: 771345
 rx_length_errors: 0
 rx_over_errors: 0
 rx_crc_errors: 0
 rx_frame_errors: 0
 rx_fifo_errors: 91161
 rx_missed_errors: 91161

This link is receiving roughly 13K packets/sec, and we're dropping 
roughly 51 packets/sec due to fifo errors.


Increasing the rx descriptor ring size from 256 up to around 3000 or so 
seems to make the problem stop, but it seems to me that this is just a 
workaround for the latency in processing the incoming packets.


So, I'm looking for some suggestions on how to fix this or to figure out 
where the latency is coming from.


Some additional information:


1) Interrupts are being processed on both cpus:

[EMAIL PROTECTED]:/root cat /proc/interrupts
   CPU0   CPU1
 30:17037564530785  U3-MPIC Level eth0




2) top shows a fair amount of time processing softirqs, but very 
little time in ksoftirqd (or is that a sampling artifact?).



Tasks: 79 total, 1 running, 78 sleeping, 0 stopped, 0 zombie
Cpu0: 23.6% us, 30.9% sy, 0.0% ni, 36.9% id, 0.0% wa, 0.3% hi, 8.3% si
Cpu1: 30.4% us, 24.1% sy, 0.0% ni, 5.9% id, 0.0% wa, 0.7% hi, 38.9% si
Mem:  4007812k total, 2199148k used,  1808664k free, 0k buffers
Swap:   0k total,   0k used,  0k free,   219844k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5375 root      15   0 2682m 1.8g 6640 S 99.9 46.7  31:17.68 SigtranServices
 7696 root      17   0  6952 3212 1192 S  7.3  0.1   0:15.75 schedmon.ppc210
 7859 root      16   0  2688 1228  964 R  0.7  0.0   0:00.04 top
 2956 root       8  -8 18940 7436 5776 S  0.3  0.2   0:01.35 blademtc
    1 root      16   0  1660  620  532 S  0.0  0.0   0:30.62 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/0
    3 root      15   0     0    0    0 S  0.0  0.0   0:00.55 ksoftirqd/0
    4 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/1
    5 root      15   0     0    0    0 S  0.0  0.0   0:00.43 ksoftirqd/1


3) /proc/sys/net/core/netdev_max_backlog is set to the default of 300


So...anyone have any ideas/suggestions?

Thanks,

Chris

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Chris Friesen


Kok, Auke wrote:

> You're using 2.6.10... you can always replace the e1000 module with the
> out-of-tree version from e1000.sf.net, this might help a bit - the version
> in the 2.6.10 kernel is very very old.

Do you have any reason to believe this would improve things?  It seems 
like the problem lies in the NAPI/softirq code rather than in the e1000 
driver itself, no?

> it also appears that your app is eating up CPU time. perhaps setting the
> app to a nicer nice level might mitigate things a bit.

If we're not handling the softirq work from ksoftirqd how would changing 
scheduler settings affect anything?

> Also turn off the in-kernel irq mitigation, it just causes cache misses
> and you really need the network irq to sit on a single cpu at most (if not
> all) the time to get the best performance. Use the userspace irqbalance
> daemon instead to achieve this.


Using userspace irqbalance would be some effort to test and deploy 
properly.  However, as a quick test I tried setting the irq affinity for 
this device and it didn't help.


One thing that might be of interest is that it seems to be bursty rather 
than gradual.  Here are some timestamps (in seconds) along with the 
number of overruns on eth0:


6552.15  overruns:260097
6552.69  overruns:260097
6553.32  overruns:260097
6553.83  overruns:260097
6554.35  overruns:260097
6554.87  overruns:260097
6555.41  overruns:260097
6555.94  overruns:260097
6556.51  overruns:260097
6557.07  overruns:260282
6557.58  overruns:260282
6558.23  overruns:260282


Chris

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Chris Friesen


James Chapman wrote:

> What's changed in your application? Any real-time threads in there?
>
> From the top output below, looks like SigtranServices is consuming all
> your CPU...

There are two cpus, and SigtranServices is multithreaded with many 
threads.  Most of these threads are affined to cpu0, a couple to cpu1. 
None of the threads are realtime.


Top is showing 37% idle on cpu0, and 6% idle on cpu1, so not all the cpu 
is being consumed.  However, I'm wondering if we're hitting bursty bits 
and we're just running out of time.


I'm going to try a system with MAX_SOFTIRQ_RESTART bumped up a bit, and 
also enable profiling.


Chris

Re: ip neigh show not showing arp cache entries?

2007-12-18 Thread Chris Friesen


Herbert Xu wrote:

Chris Friesen [EMAIL PROTECTED] wrote:

However, if I specifically try to print out one of the missing entries, 
it shows up:


[EMAIL PROTECTED]:/root /tmp/ip neigh show 192.168.24.81
192.168.24.81 dev bond2 lladdr 00:01:af:14:e9:8a REACHABLE



What about

ip -4 neigh show


Looks like that did it.  Why does specifying the family make a difference?

[EMAIL PROTECTED]:/root ip neigh show
10.41.18.1 dev eth6 lladdr 00:00:5e:00:01:01 STALE
172.24.137.0 dev bond0 lladdr 00:c0:8b:08:e4:88 REACHABLE
172.24.0.9 dev bond0 lladdr 00:07:e9:41:4b:b4 REACHABLE
10.41.18.101 dev eth6 lladdr 00:0e:0c:5e:95:bd REACHABLE
172.24.0.11 dev bond0 lladdr 00:03:cc:51:06:5e STALE
172.24.132.1 dev bond0 lladdr 00:01:af:14:e9:88 REACHABLE
172.24.0.15 dev bond0 lladdr 00:0e:0c:85:fd:d2 STALE
172.24.0.3 dev bond0 lladdr 00:01:af:14:c8:cc REACHABLE
172.24.0.5 dev bond0 lladdr 00:01:af:15:e0:6a STALE

[EMAIL PROTECTED]:/root ip -4 neigh show
192.168.24.81 dev bond2 lladdr 00:01:af:14:e9:8a REACHABLE
172.24.132.2 dev bond0  FAILED
172.24.136.0 dev bond0 lladdr 00:c0:8b:07:b3:7e REACHABLE
10.41.18.1 dev eth6 lladdr 00:00:5e:00:01:01 STALE
172.24.137.0 dev bond0 lladdr 00:c0:8b:08:e4:88 REACHABLE
172.24.0.9 dev bond0 lladdr 00:07:e9:41:4b:b4 REACHABLE
10.41.18.101 dev eth6 lladdr 00:0e:0c:5e:95:bd REACHABLE
172.24.0.11 dev bond0 lladdr 00:03:cc:51:06:5e STALE
172.24.132.1 dev bond0 lladdr 00:01:af:14:e9:88 REACHABLE
172.24.0.15 dev bond0 lladdr 00:0e:0c:85:fd:d2 STALE
172.24.0.3 dev bond0 lladdr 00:01:af:14:c8:cc REACHABLE
172.24.0.5 dev bond0 lladdr 00:01:af:15:e0:6a STALE
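
For reference, the difference between the two listings above can be checked mechanically. The following is only a sketch: it copies the two outputs from this message into strings and compares the first column. The names `addresses` and `missing_entries` are made up for illustration and are not part of iproute2.

```python
# Outputs copied verbatim from the two commands in this message.
IP_NEIGH_SHOW = """\
10.41.18.1 dev eth6 lladdr 00:00:5e:00:01:01 STALE
172.24.137.0 dev bond0 lladdr 00:c0:8b:08:e4:88 REACHABLE
172.24.0.9 dev bond0 lladdr 00:07:e9:41:4b:b4 REACHABLE
10.41.18.101 dev eth6 lladdr 00:0e:0c:5e:95:bd REACHABLE
172.24.0.11 dev bond0 lladdr 00:03:cc:51:06:5e STALE
172.24.132.1 dev bond0 lladdr 00:01:af:14:e9:88 REACHABLE
172.24.0.15 dev bond0 lladdr 00:0e:0c:85:fd:d2 STALE
172.24.0.3 dev bond0 lladdr 00:01:af:14:c8:cc REACHABLE
172.24.0.5 dev bond0 lladdr 00:01:af:15:e0:6a STALE
"""

IP_4_NEIGH_SHOW = """\
192.168.24.81 dev bond2 lladdr 00:01:af:14:e9:8a REACHABLE
172.24.132.2 dev bond0  FAILED
172.24.136.0 dev bond0 lladdr 00:c0:8b:07:b3:7e REACHABLE
""" + IP_NEIGH_SHOW


def addresses(listing: str) -> set:
    """Return the set of neighbour addresses (first column) in a listing."""
    return {line.split()[0] for line in listing.splitlines() if line.strip()}


def missing_entries() -> list:
    """Addresses shown by 'ip -4 neigh show' but not by plain 'ip neigh show'."""
    return sorted(addresses(IP_4_NEIGH_SHOW) - addresses(IP_NEIGH_SHOW))


if __name__ == "__main__":
    for addr in missing_entries():
        print(addr)
```

Run against the data above, this lists the three entries that only appear when the family is given explicitly.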

Chris

Re: ip neigh show not showing arp cache entries?

2007-12-17 Thread Chris Friesen


YOSHIFUJI Hideaki / 吉藤英明 wrote:

In article [EMAIL PROTECTED] (at Wed, 12 Dec 2007
15:57:08 -0600), Chris Friesen [EMAIL PROTECTED] says:



You may try other versions of this command

http://devresources.linux-foundation.org/dev/iproute2/download/


They appear to be numbered by kernel version, and the above version
is the most recent one for 2.6.14.  Will more recent ones (for
newer kernels) work with my kernel?



It should work; if it doesn't, please make a report.  Thanks.


I downloaded iproute2-2.6.23 and built it for my kernel.

I'm compiling for a different kernel than is actually running on the 
build system, so I had to add a line defining KERNEL_INCLUDE to the 
Makefile, and I had to add -I${KERNEL_INCLUDE} to the CFLAGS 
definition.  Someone might want to do something about that...


Anyways, the arp entry issue is still there.  The arp command gives a 
bunch of entries:


[EMAIL PROTECTED]:/root arp -n
Address          HWtype  HWaddress           Flags Mask  Iface
192.168.24.81    ether   00:01:AF:14:E9:8A   C           bond2
172.24.132.2             (incomplete)                    bond0
172.24.136.0     ether   00:C0:8B:07:B3:7E   C           bond0
172.24.137.0             (incomplete)                    bond0
172.24.0.9       ether   00:07:E9:41:4B:B4   C           bond0
10.41.18.101     ether   00:0E:0C:5E:95:BD   C           eth6
172.24.0.11      ether   00:03:CC:51:06:5E   C           bond0
172.24.132.1     ether   00:01:AF:14:E9:88   C           bond0
172.24.0.15      ether   00:0E:0C:85:FD:D2   C           bond0
172.24.0.3       ether   00:01:AF:14:C8:CC   C           bond0
172.24.0.5       ether   00:01:AF:15:E0:6A   C           bond0

The original ip command and the new one (/tmp/ip) both give the same 
results--some of the entries are missing.


[EMAIL PROTECTED]:/root ip neigh show all
172.24.137.0 dev bond0  FAILED
172.24.0.9 dev bond0 lladdr 00:07:e9:41:4b:b4 REACHABLE
10.41.18.101 dev eth6 lladdr 00:0e:0c:5e:95:bd REACHABLE
172.24.0.11 dev bond0 lladdr 00:03:cc:51:06:5e STALE
172.24.132.1 dev bond0 lladdr 00:01:af:14:e9:88 REACHABLE
172.24.0.15 dev bond0 lladdr 00:0e:0c:85:fd:d2 STALE
172.24.0.3 dev bond0 lladdr 00:01:af:14:c8:cc REACHABLE
172.24.0.5 dev bond0 lladdr 00:01:af:15:e0:6a STALE

[EMAIL PROTECTED]:/root /tmp/ip neigh show all
172.24.137.0 dev bond0  FAILED
172.24.0.9 dev bond0 lladdr 00:07:e9:41:4b:b4 REACHABLE
10.41.18.101 dev eth6 lladdr 00:0e:0c:5e:95:bd REACHABLE
172.24.0.11 dev bond0 lladdr 00:03:cc:51:06:5e STALE
172.24.132.1 dev bond0 lladdr 00:01:af:14:e9:88 REACHABLE
172.24.0.15 dev bond0 lladdr 00:0e:0c:85:fd:d2 STALE
172.24.0.3 dev bond0 lladdr 00:01:af:14:c8:cc REACHABLE
172.24.0.5 dev bond0 lladdr 00:01:af:15:e0:6a STALE


However, if I specifically try to print out one of the missing entries, 
it shows up:


[EMAIL PROTECTED]:/root /tmp/ip neigh show 192.168.24.81
192.168.24.81 dev bond2 lladdr 00:01:af:14:e9:8a REACHABLE


Chris

Re: ip neigh show not showing arp cache entries?

2007-12-17 Thread Chris Friesen


Patrick McHardy wrote:


From a kernel perspective there are only complete dumps, the
filtering is done by iproute. So the fact that it shows them
when querying specifically implies there is a bug in the
iproute neighbour filter. Does it work if you omit "all"
from the "ip neigh show" command?


Omitting "all" gives identical results.  It is still missing entries 
when compared with the output of arp.


Chris

Re: ip neigh show not showing arp cache entries?

2007-12-12 Thread Chris Friesen


I retested it on an x86 machine and am seeing similar problems.

First, arp gives the arp table as expected:

[EMAIL PROTECTED]:/tftpboot/cnp/0-0-5-0/0-0-5-0 arp -n
Address          HWtype  HWaddress           Flags Mask  Iface
172.24.0.9       ether   00:03:CC:51:06:5E   C           bond0
10.41.18.101     ether   00:0E:0C:5E:95:BD   C           eth6
172.24.137.0     ether   00:C0:8B:08:E4:88   C           bond0
172.24.136.0     ether   00:C0:8B:07:B3:7E   C           bond0
10.41.18.1       ether   00:00:5E:00:01:01   C           eth6
172.24.0.5       ether   00:01:AF:15:E0:6A   C           bond0
172.24.0.13      ether   00:0E:0C:85:FD:D2   C           bond0
172.24.0.3       ether   00:01:AF:14:C8:CC   C           bond0
172.24.132.1     ether   00:01:AF:14:E9:88   C           bond0
172.24.0.7       ether   00:07:E9:41:4B:B4   C           bond0
192.168.24.81    ether   00:01:AF:14:E9:8A   C           bond2


ip neigh show gives nothing, but if I search for specific addresses 
from the arp table listing they show up:


[EMAIL PROTECTED]:/tftpboot/cnp/0-0-5-0/0-0-5-0 ip neigh show
[EMAIL PROTECTED]:/tftpboot/cnp/0-0-5-0/0-0-5-0 ip neigh show 172.24.0.9
172.24.0.9 dev bond0 lladdr 00:03:cc:51:06:5e DELAY
[EMAIL PROTECTED]:/tftpboot/cnp/0-0-5-0/0-0-5-0 ip neigh show 10.41.18.101
10.41.18.101 dev eth6 lladdr 00:0e:0c:5e:95:bd REACHABLE
[EMAIL PROTECTED]:/tftpboot/cnp/0-0-5-0/0-0-5-0 ip neigh show 172.24.137.0
172.24.137.0 dev bond0 lladdr 00:c0:8b:08:e4:88 REACHABLE


Is this expected behaviour?

Thanks,

Chris

Re: ip neigh show not showing arp cache entries?

2007-12-12 Thread Chris Friesen


Eric Dumazet wrote:

Chris Friesen a écrit :

Is this expected behaviour?


Probably not... Still a 2.6.14 kernel ?


Yep.  Embedded hardware, so I'm unable to test with a more recent kernel.


Could you send the result of :

strace ip neigh show


I've attached two strace runs, one of ip neigh show and one of ip 
neigh show 10.41.18.101.


Chris
[EMAIL PROTECTED]:/tftpboot/cnp/0-0-5-0/0-0-5-0 strace ip neigh show
execve(/sbin/ip, [ip, neigh, show], [/* 14 vars */]) = 0
uname({sys=Linux, node=typhoon-base-unit0, ...}) = 0
brk(0)  = 0x806b000
access(/etc/ld.so.preload, R_OK)  = -1 ENOENT (No such file or directory)
open(/etc/ld.so.cache, O_RDONLY)  = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=58478, ...}) = 0
mmap2(NULL, 58478, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f56000
close(3)= 0
open(/lib/libresolv.so.2, O_RDONLY)   = 3
read(3, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320\\0..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=75541, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f55000
mmap2(NULL, 71816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7f43000
mmap2(0xb7f51000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe) = 0xb7f51000
mmap2(0xb7f53000, 6280, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f53000
close(3)= 0
open(/lib/libc.so.6, O_RDONLY)= 3
read(3, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\216O\1..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1407983, ...}) = 0
mmap2(NULL, 1146076, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e2b000
mmap2(0xb7f39000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x10d) = 0xb7f39000
mmap2(0xb7f41000, 7388, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f41000
close(3)= 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e2a000
mprotect(0xb7f39000, 20480, PROT_READ)  = 0
set_thread_area({entry_number:-1 - 6, base_addr:0xb7e2a6b0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7f56000, 58478)   = 0
socket(PF_NETLINK, SOCK_RAW, 0) = 3
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [32768], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [32768], 4) = 0
bind(3, {sa_family=AF_NETLINK, pid=0, groups=}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, pid=6150, groups=}, [12]) = 0
time(NULL)  = 1197465643
sendto(3, \24\0\0\0\22\0\1\3,\340_G\0\0\0\0\0\0\0\0, 20, 0, {sa_family=AF_NETLINK, pid=0, groups=}, 12) = 20
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=}, msg_iov(1)=[{\364\0\0\0\20\0\2\0,\340_G\6\30\0\0\0\0\1\0\1\0\0\0C\30..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3544
brk(0)  = 0x806b000
brk(0x808c000)  = 0x808c000
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=}, msg_iov(1)=[{\24\0\0\0\3\0\2\0,\340_G\6\30\0\0\0\0\0\0\1\0\0\0C\30\0..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 20
sendto(3, \24\0\0\0\36\0\1\3-\340_G\0\0\0\0\0\0\0\0, 20, 0, {sa_family=AF_NETLINK, pid=0, groups=}, 12) = 20
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=}, msg_iov(1)=[{[EMAIL PROTECTED]..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 264
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=}, msg_iov(1)=[{[EMAIL PROTECTED]..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 20
exit_group(0)   = ?
Process 6150 detached






[EMAIL PROTECTED]:/tftpboot/cnp/0-0-5-0/0-0-5-0 ip neigh show 10.41.18.101
10.41.18.101 dev eth6 lladdr 00:0e:0c:5e:95:bd REACHABLE
[EMAIL PROTECTED]:/tftpboot/cnp/0-0-5-0/0-0-5-0 strace ip neigh show 10.41.18.101
execve(/sbin/ip, [ip, neigh, show, 10.41.18.101], [/* 14 vars */]) = 0
uname({sys=Linux, node=typhoon-base-unit0, ...}) = 0
brk(0)  = 0x806b000
access(/etc/ld.so.preload, R_OK)  = -1 ENOENT (No such file or directory)
open(/etc/ld.so.cache, O_RDONLY)  = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=58478, ...}) = 0
mmap2(NULL, 58478, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fd5000
close(3)= 0
open(/lib/libresolv.so.2, O_RDONLY)   = 3
read(3, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320\\0..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=75541, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fd4000
mmap2(NULL, 71816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7fc2000
mmap2(0xb7fd, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe) = 0xb7fd
mmap2(0xb7fd2000, 6280, PROT_READ|PROT_WRITE

Re: ip neigh show not showing arp cache entries?

2007-12-12 Thread Chris Friesen


Eric Dumazet wrote:


And what is the version of ip command you have on this machine ?

ip -V


iproute2-ss051107


You may try other versions of this command

http://devresources.linux-foundation.org/dev/iproute2/download/


They appear to be numbered by kernel version, and the above version is 
the most recent one for 2.6.14.  Will more recent ones (for newer 
kernels) work with my kernel?


Chris

ip neigh show not showing arp cache entries?

2007-12-10 Thread Chris Friesen



I'm seeing some strange behaviour on a 2.6.14 ppc64 system.  If I run 
"ip neigh show" it prints out nothing, but if I run "arp" then I see the 
other nodes on the local network.



[EMAIL PROTECTED]:/root ip neigh show
[EMAIL PROTECTED]:/root arp -n
Address          HWtype  HWaddress           Flags Mask  Iface
172.24.132.0     ether   00:01:AF:14:E8:DA   C           bond0
172.24.132.1             (incomplete)                    bond0
172.24.136.0     ether   00:C0:8B:07:B3:7E   C           bond0
172.24.132.4     ether   00:01:AF:14:E8:DA   C           bond0
172.24.132.2     ether   00:01:AF:14:E8:DA   C           bond0



Any ideas what's going on here?

Thanks,

Chris

Re: ip neigh show not showing arp cache entries?

2007-12-10 Thread Chris Friesen


Chris Friesen wrote:


I'm seeing some strange behaviour on a 2.6.14 ppc64 system.  If I run 
ip neigh show it prints out nothing, but if I run arp then I see the 
other nodes on the local network.



[EMAIL PROTECTED]:/root ip neigh show
[EMAIL PROTECTED]:/root arp -n
Address          HWtype  HWaddress           Flags Mask  Iface
172.24.132.0     ether   00:01:AF:14:E8:DA   C           bond0
172.24.132.1             (incomplete)                    bond0
172.24.136.0     ether   00:C0:8B:07:B3:7E   C           bond0
172.24.132.4     ether   00:01:AF:14:E8:DA   C           bond0
172.24.132.2     ether   00:01:AF:14:E8:DA   C           bond0


Any ideas what's going on here?


I've got some further information.  If I look for a specific address, it 
seems to work:


[EMAIL PROTECTED]:/root ip neigh show 172.24.136.0
172.24.136.0 dev bond0 lladdr 00:c0:8b:07:b3:7e REACHABLE


In the above scenario, the arp cache lists the device as reachable via 
bond0.  If I search the arp cache to see whether the address is 
reachable from one of bond0's slave devices, should it come back 
positive or negative?


Chris

when using arp monitoring with bonding, why use broadcast arps?

2007-12-04 Thread Chris Friesen



We have a network with a number of nodes using bonding with arp 
monitoring.  The arp interval is set to 100ms.


Unfortunately, the bonding code sends the arp packets to the hardware 
broadcast address, which means that the number of these arp packets seen 
by each node goes up with the number of nodes on the network.
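
To put rough numbers on that scaling, here is a back-of-the-envelope sketch. It assumes every node runs the same ARP monitor and sends one broadcast request per target per interval; the function name and the one-target default are illustrative assumptions, not taken from the bonding code.

```python
def broadcast_arps_seen_per_sec(nodes: int, interval_ms: float = 100.0,
                                targets: int = 1) -> float:
    """Estimate broadcast ARP requests one node receives per second.

    Assumes each of the other (nodes - 1) senders emits `targets`
    broadcast requests every `interval_ms` milliseconds.
    """
    per_sender = targets * 1000.0 / interval_ms  # requests/sec from one sender
    return (nodes - 1) * per_sender


if __name__ == "__main__":
    for n in (2, 10, 50, 100):
        print(n, broadcast_arps_seen_per_sec(n))
```

Under these assumptions the load on any one node grows linearly with the node count, which is the concern for the low-powered node described next. With unicast requests only the target would see each packet.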


One of the nodes has a fairly low-powered cpu and handles most things in 
microengine code, but arp packets get handled in software.  All these 
broadcast arps slow this node down noticeably.


Is there any particular reason why the bonding code couldn't use unicast 
arp packets if the arp_ip_target has a valid entry in the sender's arp 
table?


Thanks,

Chris

Re: [PATCH 2/2] [e1000 VLAN] Disable vlan hw accel when promiscuous mode

2007-11-12 Thread Chris Friesen


David Miller wrote:


When you select VLAN, you by definition are asking for non-VLAN
traffic to be elided.  It is like plugging the ethernet cable
into one switch or another.


For max functionality it seems like the raw eth device should show 
everything on the wire in promiscuous mode.


If we want to sniff only the traffic for a specific vlan, we can sniff 
the vlan device.


Chris

question on sky2 driver panic

2007-10-15 Thread Chris Friesen



Hi,

We're using Yukon-XL (0xb3) rev 3 hardware with a vendor-supplied 
2.6.14.  Based on suggestions here, I backported the sky2 driver (v1.10 
from 2.6.20.6) to 2.6.14.


Unfortunately, when I booted this I got the following:


skb_over_panic: text:d00d4e14 len:60 put:60 head:c00264920770 data:c00264920720 tail:c00264920720 end:c002649207a0 dev:NULL
kernel BUG in skb_over_panic at /usr/local/src/2.6.14/gd/linux/net/core/skbuff.c:94!

Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2
Modules linked in: tipc bond1 bond0 ipmi_devintf ipmi_msghandler
NIP: C020D7E8 XER:  LR: C020D7E4 CTR: C01C210C
REGS: c0025c2aefe0 TRAP: 0700   Not tainted  (2.6.14-pne)
MSR: 90029032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 28008022
DAR:  DSISR: c0025c2af1c0
TASK: cfec2940[2107] 'insmod' THREAD: c0025c2ac000 CPU: 0
GPR00: C020D7E4 C0025C2AF300 C041DA08 009C
GPR04: 90009032  0030 C037C428
GPR08:  C037EEF0 C043AD68 C043AC88
GPR12: 0010 C0374000  100D47E8
GPR16: 100D55A0   
GPR20:    0001
GPR24:  B6AA7FFF  
GPR28: 003C CFEF5B10 C03BB778 003C
NIP [c020d7e8] .skb_over_panic+0x50/0x68
LR [c020d7e4] .skb_over_panic+0x4c/0x68
Call Trace:
[c0025c2af300] [c020d7e4] .skb_over_panic+0x4c/0x68 (unreliable)
[c0025c2af390] [d00d4e20] .named_prepare_buf+0x298/0x2a8 [tipc]
[c0025c2af450] [d00d4e90] .named_publish+0x60/0xe4 [tipc]
[c0025c2af4e0] [d00d80a8] .nametbl_publish+0x128/0x198 [tipc]
[c0025c2af590] [d00de7dc] .tipc_publish+0xe8/0x188 [tipc]
[c0025c2af650] [d00d7f4c] .nametbl_publish_rsv+0x30/0x64 [tipc]
[c0025c2af6e0] [d00d2600] .cfg_init+0x120/0x150 [tipc]
[c0025c2af7a0] [d00e31ac] .process_signal_queue+0xa4/0x100 [tipc]
[c0025c2af8/0x1ec [tipc]
[c0025c2afcf0] [c00685ec] .sys_init_module+0x28c/0x510
[c0025c2afd90] [c0009b9c] syscall_exit+0x0/0x18



Now granted it looks like this was triggered by tipc, but is there 
anything that you can think of in the sky2 driver that may have been 
related?  Maybe due to the fragmented buffer handling?  The link would 
have been using an mtu of 9KB.


Thanks,

Chris

Re: question on sky2 driver panic

2007-10-15 Thread Chris Friesen


Stephen Hemminger wrote:


Maybe TIPC can't handle fragmented receive buffers.  The sky2 driver
generates skb's with header and multiple pages if MTU is big enough.
For 9K MTU that would be 1K of data + 2 4K pages.  The protocols are
supposed to be smart enough to handle this, but TIPC is rarely used.


We already had to modify tipc to handle fragmented receive buffers when 
we had memory allocation errors on the e1000 and moved to fragmented 
skbs in that driver.


Our version of the e1000 passes 200 bytes in the initial chunk, and the 
rest in fragments.  tipc currently handles that without any difficulty.


I was just checking more to see if there were any known issues in this 
area that have been fixed in more recent driver versions.


Chris


sk98lin, jumbo frames, and memory fragmentation

2007-10-01 Thread Chris Friesen



Hi all,

We're considering some hardware that uses the sk98lin network hardware, 
and we'll be using jumbo frames.  Looking at the driver, when using a 
9KB MTU it seems like it would end up trying to atomically allocate a 
16KB buffer.


Has anyone heard of this being a problem?  It would seem like trying to 
atomically allocate four physically contiguous pages could become tricky 
after the system has been running for a while.
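
The arithmetic behind the 16KB figure can be sketched as follows. This assumes 4 KiB pages and an allocator that hands out physically contiguous blocks of 2^order pages; the helper name `alloc_order` is made up for illustration, and a real driver adds its own headroom to the frame size.

```python
def alloc_order(size_bytes: int, page_size: int = 4096) -> int:
    """Smallest order such that 2**order pages hold size_bytes.

    The 4 KiB default page size is an assumption; some platforms
    use larger pages.
    """
    pages = -(-size_bytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


if __name__ == "__main__":
    for size in (1500, 9000, 32 * 1024):
        order = alloc_order(size)
        print(size, "bytes -> order", order, "=", (1 << order) * 4096, "bytes")
```

A 9000-byte frame needs three pages, which rounds up to an order-2 block of four contiguous pages (16 KiB). A 32 KiB buffer needs an order-3 block. Higher orders get harder to satisfy atomically as memory fragments.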


The reason I ask is that we ran into this with the e1000.  Before they 
added the new jumbo frame code it was trying to atomically allocate 32KB 
buffers and we would start getting allocation failures after a month or 
so of uptime.


Any information anyone can provide would be appreciated.


Thanks,

Chris

Re: sk98lin, jumbo frames, and memory fragmentation

2007-10-01 Thread Chris Friesen


Stephen Hemminger wrote:


Adding fragmentation support to skge driver is on my list of
possible extensions. sky2 driver already supports it (yet one
more feature that the vendor sk98lin driver doesn't do).


Thanks for speaking up.  As I mentioned in my email to Jeff it looks 
like the sky2 driver is what I need (Marvell Yukon 88E8062).  However, 
I'm on 2.6.14 and it doesn't exist there...do you anticipate any issues 
if I were to backport it?


Thanks,

Chris

Re: sk98lin, jumbo frames, and memory fragmentation

2007-10-01 Thread Chris Friesen


Jeff Garzik wrote:


The sk98lin driver is going away, please don't use it.

It's unmaintained and full of known bugs.


Okay...so it looks like the proper driver for the Marvell Yukon 88E8062 
is the sky2 driver, and this one does avoid order>0 allocations.  Am I 
on track?


Chris

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Chris Friesen


Linus Torvalds wrote:

 - in other words, the *only* possible meaning for volatile is a purely 
   single-CPU meaning. And if you only have a single CPU involved in the 
   process, the volatile is by definition pointless (because even 
   without a volatile, the compiler is required to make the C code appear 
   consistent as far as a single CPU is concerned).


I assume you mean except for IO-related code and 'random' values like 
jiffies as you mention later on?  I assume other values set in 
interrupt handlers would count as random from a volatility perspective?


So anybody who argues for volatile fixing bugs is fundamentally 
incorrect. It does NO SUCH THING. By arguing that, such people only show 
that they have no idea what they are talking about.


What about reading values modified in interrupt handlers, as in your 
random case?  Or is this a bug where the user of atomic_read() is 
invalidly expecting a read each time it is called?


Chris

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Chris Friesen


Herbert Xu wrote:


But I have to say that I still don't know of a single place
where one would actually use the volatile variant.


Given that many of the existing users do currently have volatile, are 
you comfortable simply removing that behaviour from them?  Are you sure 
that you will not introduce any issues?


Forcing a re-read is only a performance penalty.  Removing it can cause 
behavioural changes.


I would be more comfortable making the default match the majority of the 
current implementations (ie: volatile semantics).  Then, if someone 
cares about performance they can explicitly validate the call path and 
convert it over to the non-volatile version.


Correctness before speed...

Chris

Re: [PATCH 24/24] document volatile atomic_read() behavior

2007-08-09 Thread Chris Friesen


Segher Boessenkool wrote:


Anyway, what's the supposed advantage of *(volatile *) vs. using
a real volatile object?  That you can access that same object in
a non-volatile way?


That's my understanding.  That way accesses where you don't care about 
volatility may be optimised.


For instance, in cases where there are already other things controlling 
visibility (as are needed for atomic increment, for example) you don't 
need to make the access itself volatile.


Chris

Re: iproute2 showing wrong number of bytes on 64bit architectures.

2007-07-10 Thread Chris Friesen


David Miller wrote:


It used unsigned long ages ago, and ifconfig gets the bits
exported from /proc/net/dev output whereas we have to use
fixed data types in whatever we use over netlink so u32
was chosen.


It's rather ironic that the new-and-improved way of doing things is 
subject to rollover while the old way is not.
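
To illustrate the rollover, here is a small sketch of what a 32-bit byte counter reports and how quickly it wraps. The function names and the 1 Gbit/s figure are assumptions chosen for illustration only.

```python
U32_RANGE = 2 ** 32


def wrap_u32(total_bytes: int) -> int:
    """Value a 32-bit counter would report for a true byte total."""
    return total_bytes % U32_RANGE


def seconds_until_wrap(bits_per_second: float, counter_bits: int = 32) -> float:
    """Time for a byte counter of the given width to roll over at a steady rate."""
    bytes_per_second = bits_per_second / 8
    return (2 ** counter_bits) / bytes_per_second


if __name__ == "__main__":
    print(wrap_u32(5 * 2 ** 30))        # 5 GiB transferred, counter shows 1 GiB
    print(seconds_until_wrap(1e9))      # 32-bit counter at a steady 1 Gbit/s
    print(seconds_until_wrap(1e9, 64))  # same rate with a 64-bit counter
```

At a sustained 1 Gbit/s a 32-bit byte counter wraps in roughly 34 seconds, whereas a 64-bit unsigned long on a 64-bit architecture would take thousands of years.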


Chris

Re: AF_PACKET how to get the original netdev from a packet received from a bonded master

2007-04-19 Thread Chris Friesen


Chris Leech wrote:


Just to give you an idea of our motivation around this, we're looking
at layer 2 configuration protocols implemented from user space.


I'd like to second the intent of this patch.  We've been maintaining a 
patch against 2.6.10 for a while now that exports the original ingress 
device to userspace via ancillary data.  We use it in combination with 
bonding.


It was never submitted to mainline because we're on an old kernel.

Chris

quick help with bonding?

2007-03-29 Thread Chris Friesen



I'm doing some experimenting with a new network driver that receives 
jumbo frames into multiple separate pages that are then joined together 
in a single sk_buff using skb_fill_page_desc().


It behaved fairly well with standard networking, but it's behaving 
strangely with bonding added to the mix.


Could someone either point me to the bonding high level design document 
(couldn't find one at the sourceforge project page) or else give me a 
quick overview of the code path followed by an incoming packet when 
bonding is involved?


Thanks,

Chris Friesen

Re: [Bonding-devel] quick help with bonding?

2007-03-29 Thread Chris Friesen


Andy Gospodarek wrote:

Can you elaborate on what isn't going well with this driver/hardware?  


I have a ppc64 blade running a customized 2.6.10.  At init time, two of 
our gigE links (eth4 and eth5) are bonded together to form bond0.  This 
link has an MTU of 9000, and uses arp monitoring.  We're using an 
ethernet driver with a modified RX path for jumbo frames[1].  With the 
stock driver, it seems to work fine.


The problem is that eth5 seems to be bouncing up and down every 15 sec 
or so (see the attached log excerpt).  Also, ifconfig shows that only 
3 packets totalling 250 bytes have gone out eth5, when I know that the 
arp monitoring code from the bond layer is sending 10 arps/sec out the link.



eth5  Link encap:Ethernet  HWaddr 00:03:CC:51:01:3E
  inet6 addr: fe80::203:ccff:fe51:13e/64 Scope:Link
  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
  RX packets:119325 errors:90283 dropped:90283 overruns:90283 frame:0
  TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:8978310 (8.5 MiB)  TX bytes:250 (250.0 b)
  Base address:0x3840 Memory:9222-9224


I had initially suspected that it might be due to the u32 jiffies 
stuff in bonding.h, but changing that doesn't seem to fix the issue.


If I boot the system and then log in and manually create the bond link 
(rather than it happening at init time) then I don't see the problem.


If it matters at all, normally the system boots from eth4.  I'm going to 
try booting from eth6 and see if the problem still occurs.



Chris




[1] I'm not sure if I'm supposed to mention the specific driver, as it 
hasn't been officially released yet, so I'll keep this high-level. 
Normally for jumbo frames you need to allocate a large physically 
contiguous buffer.  With the modified driver, rather than receiving into 
a contiguous buffer the incoming packet is split across multiple pages 
which are then reassembled into an sk_buff and passed up the link.

Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: ARP monitoring set to 100 ms with 2 target(s): 172.24.136.0 172.24.137.0
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: ARP monitoring set to 100 ms with 2 target(s): 172.25.136.0 172.25.137.0
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: Warning: failed to get speed/duplex from eth4, speed forced to 100Mbps, duplex forced to Full.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: enslaving eth4 as an active interface with an up link.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: Warning: failed to get speed/duplex from eth5, speed forced to 100Mbps, duplex forced to Full.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: enslaving eth5 as an active interface with an up link.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 3 msec.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth4 to be reset in 3 msec.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth4 is now down.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: now running without any active interface !
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: link status definitely up for interface eth5
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth4
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth4 is now up
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 3 msec.
Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:54:09 base0-0-0-5-0-11-1 kernel: bonding: interface eth4 reset delay set to 600 msec.
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 3 msec.
Mar 29 20:54:59 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth5 to be reset in 3 msec.
Mar 29 20:55:15 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now down.
Mar 29 20:55:30 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
Mar 29 20:55:30 base0-0-0-5-0-11-1 kernel: bonding: bond0: interface eth5 is now up

Re: [Bonding-devel] quick help with bonding?

2007-03-29 Thread Chris Friesen


Jay Vosburgh wrote:


2.6.10 is pretty old, and there have been a number of fixes to
the bonding ARP monitor since then, so it may be that it is simply
misbehaving (presuming that you're running the 2.6.10 bonding driver).
Are you in a position to test against a more recent kernel (and/or
bonding driver)?  Does the miimon misbehave in a similar fashion?


Testing a more recent kernel is problematic.  A new bonding driver could 
be possible, assuming the code hasn't changed too much.


I just did another experiment.  Normally we boot via eth4 (which then 
becomes part of the bond with eth5 at init time).  If I boot via eth6 
instead, it appears as though the problem doesn't show up.


Chris

Re: [Bonding-devel] quick help with bonding?

2007-03-29 Thread Chris Friesen


Andy Gospodarek wrote:


If you are looking for a decent source for patches you could consider
downloading the latest source-rpm from RHEL4/CentOS4.  The bonding
driver in those releases has been updated to much later code and I can
tell you from personal experience it works pretty well.  You may need
to do some backporting to get the latest arp-monitoring features, but
let me know if you need a hand with that, I might have some lying
around. ;)


I'm just about to load a kernel with a backport of bonding from 2.6.14.
I'll try it out and if it doesn't help I'll try the RHEL4 one.



Does eth6 use the same hardware/driver as eth4/5?  (Sorry if I missed
that in the thread, but didn't see if you indicated that it did.)


No, eth6 is an AMD-8111.

Chris

Re: [Bonding-devel] quick help with bonding?

2007-03-29 Thread Chris Friesen


Chris Friesen wrote:

Andy Gospodarek wrote:


If you are looking for a decent source for patches you could consider
downloading the latest source-rpm from RHEL4/CentOS4.  The bonding
driver in those releases has been updated to much later code and I can
tell you from personal experience it works pretty well.


I'm just about to load a kernel with a backport of bonding from 2.6.14.
I'll try it out and if it doesn't help I'll try the RHEL4 one.


No joy on the 2.6.14 backport, so I guess I'll try the RHEL4 route.

Chris

Re: [Bonding-devel] quick help with bonding?

2007-03-29 Thread Chris Friesen


Chris Friesen wrote:


No joy on the 2.6.14 backport, so I guess I'll try the RHEL4 route.


Bonding driver from 2.6.9-42.0.8.EL doesn't help at all, at least with 
the module parms I was using before.


Switching to miimon doesn't help either.

Chris

Re: [RFC] ARP notify option

2007-03-06 Thread Chris Friesen


Stephen Hemminger wrote:


+arp_notify - BOOLEAN
+   Define mode for notification of address and device changes.
+   0 - (default): do nothing
+   1 - Generate gratuitous arp replies when device is brought up
+   or hardware address changes.


Did you consider using gratuitous arp requests instead?  I remember 
reading about some hardware that updated its arp cache on gratuitous 
requests but not gratuitous replies.
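
For reference, the two variants differ mainly in the ARP opcode: both carry the sender's own IP address in the sender and target protocol fields. The sketch below builds the 28-byte ARP payload for each (Ethernet header not included). The MAC and IP values are placeholders, the function name is made up, and the choice of target hardware address (zeros for the request form, the sender's own MAC for the reply form) is one common convention, not something taken from the patch; implementations vary on that field.

```python
import socket
import struct

ARP_REQUEST = 1
ARP_REPLY = 2

def gratuitous_arp(opcode, mac, ip):
    """Build a 28-byte gratuitous ARP payload for Ethernet/IPv4."""
    sha = bytes.fromhex(mac.replace(":", ""))  # sender hardware address
    spa = socket.inet_aton(ip)                 # sender protocol address
    # Assumption: zero target MAC for requests, own MAC for replies.
    tha = b"\x00" * 6 if opcode == ARP_REQUEST else sha
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6,        # hardware address length
        4,        # protocol address length
        opcode,
        sha,
        spa,
        tha,
        spa,      # target IP equals sender IP, which makes it gratuitous
    )

# Placeholder addresses for illustration only.
request = gratuitous_arp(ARP_REQUEST, "02:00:00:00:00:01", "192.0.2.1")
reply = gratuitous_arp(ARP_REPLY, "02:00:00:00:00:01", "192.0.2.1")
```

A receiver that only learns from requests would act on the first payload but ignore the second, even though both announce the same address mapping.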


Chris

Re: jumbo frames and memory fragmentation

2006-07-17 Thread Chris Friesen


Herbert Xu wrote:

Chris Friesen [EMAIL PROTECTED] wrote:


Looking at the page-splitting code, it says 82571 and greater support 
packet-split.  We're running the 82546GB device.  Looks like it 
won't help me.



Well, time to fork out for a new card then :)


I wish.  This is an embedded ATCA board.  I'm not going to get a respin 
just because it doesn't deal well with jumbo frames.


Chris

Re: jumbo frames and memory fragmentation

2006-06-30 Thread Chris Friesen


Herbert Xu wrote:


Either upgrade your kernel or backport the page-splitting code in the
current tree.  That's really the only sane solution for jumbo packets.


Looking at the page-splitting code, it says 82571 and greater support 
packet-split.  We're running the 82546GB device.  Looks like it 
won't help me.


Chris

Re: jumbo frames and memory fragmentation

2006-06-30 Thread Chris Friesen


Evgeniy Polyakov wrote:


It definitely will.
Packet split in hardware means separating data and headers into
different pages in different reads, while software page split means that 
skb has a list of fragments where part of the packet will be DMAed, so
a jumbo frame will be converted into several pages.
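
To make "several pages" concrete, here is a small sketch of how one jumbo frame would be spread over page-sized fragments instead of one contiguous buffer. It assumes 4 KiB pages and a 9216-byte frame; the function name is made up and this is not the driver's actual code.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

def page_fragments(frame_bytes, page_size=PAGE_SIZE):
    """Return the number of bytes placed in each page-sized fragment."""
    frags = []
    remaining = frame_bytes
    while remaining > 0:
        chunk = min(remaining, page_size)
        frags.append(chunk)
        remaining -= chunk
    return frags

frags = page_fragments(9216)
```

Under those assumptions the frame occupies three single pages (4096, 4096 and 1024 bytes), none of which needs to be physically adjacent to the others.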


Maybe I'm looking at the wrong code then.  Can you point me to where 
this software page split is handled?


Chris

jumbo frames and memory fragmentation

2006-06-29 Thread Chris Friesen



I'm running a system with multiple e1000 devices, using 9KB jumbo 
frames.  I'm running a modified 2.6.10 with e1000 driver 5.5.4-k2.


I'm a bit concerned about the behaviour of this driver with jumbo 
frames.  We ask for 9KB.  The driver then bumps that up to a 
power-of-two, so it calls dev_alloc_skb(16384).  That then bumps it up a 
bit to allow for its own overhead, so it appears that we end up asking 
for 32KB of physically contiguous memory for every packet coming in.  Ouch.
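
The arithmetic can be sketched as follows. This is only an illustration: the 200-byte overhead is a stand-in for whatever headroom and bookkeeping the allocator adds (the exact figure depends on the kernel), and it assumes the final allocation is rounded up to a power-of-two size, which is what pushes 16KB-plus-a-bit to 32KB.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

def contiguous_bytes_per_packet(frame_bytes, overhead_bytes):
    """Estimate the contiguous allocation made for one receive buffer."""
    # Step 1: the driver rounds the requested frame size up to a power of two.
    driver_request = next_power_of_two(frame_bytes)
    # Step 2: the skb allocation adds its own overhead on top of that.
    total = driver_request + overhead_bytes
    # Step 3 (assumption): the allocator serves it from a power-of-two size.
    return next_power_of_two(total)

JUMBO_FRAME = 9 * 1024   # the 9KB we ask for
ASSUMED_OVERHEAD = 200   # hypothetical; any value from 1 to 16384 gives 32KB

driver_request = next_power_of_two(JUMBO_FRAME)
per_packet = contiguous_bytes_per_packet(JUMBO_FRAME, ASSUMED_OVERHEAD)
```

With 4KB pages that is eight physically contiguous pages per receive buffer, which gets harder to find as memory fragments over time.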


Add to that the fact that this version of the driver doesn't do 
copybreak, and it means that after we're up for a few days it starts 
complaining about not being able to allocate buffers.


Anyone have any suggestions on how to improve this?  Upgrading kernels 
isn't an option.  I could port back the copybreak stuff fairly easily.


Back in 2.4 some of the drivers used to retry buffer allocations using 
GFP_KERNEL once interrupts were reenabled.  I don't see many of them 
doing that anymore--would there be any benefit to that?


Thanks,

Chris

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-06-19 Thread Chris Friesen


Andi Kleen wrote:


Incoming packets are only time stamped
when someone asks for the timestamps.


Doesn't that add scheduling latency to the timestamps?  Or is it a flag 
that gets set to trigger timestamping at packet arrival?


Chris

a few questions about NAPI

2005-07-13 Thread Chris Friesen


Hi guys,

I've got a few questions about NAPI.

The help text in the kernel suggests enabling NAPI if incoming packet 
rate is over 10Kpps.


If we're under that rate currently, but could grow above that rate, what 
would be the impact of enabling NAPI?  Why would we not enable it always 
if available?


Are there any gotchas with SMP machines?

Are there any other issues of which I need to be aware?

Thanks,

Chris
