Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-14 Thread Robert Olsson
jamal writes: Essentially the approach would be the same as Robert's old recycle patch where he doesnt recycle certain skbs - the only difference being in the case of forwarding, the recycle is done asynchronously at EOT whereas this is done synchronously upon return from host path.

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-14 Thread Patrick McManus
David S. Miller wrote: From: John Ronciak [EMAIL PROTECTED] Date: Wed, 7 Dec 2005 11:48:46 -0800 Copybreak probably shouldn't be used in routing use cases. I think even this is arguable, routers route a lot more than small 64-byte frames. Unfortunately, that is what everyone uses for packet

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-13 Thread jamal
On Fri, 2005-09-12 at 15:11 -0800, David S. Miller wrote: From: jamal [EMAIL PROTECTED] Date: Fri, 09 Dec 2005 16:30:24 -0500 indeed sounds interesting until you start hitting clones ;- so dont run a sniffer or do anything of the sort if you want to see some good numbers - otherwise I

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-13 Thread Jesse Brandeburg
On Tue, 13 Dec 2005, jamal wrote: On Mon, 2005-12-12 at 20:38 +0100, Robert Olsson wrote: jamal writes: Robert, what about just #1? Maybe thats the best compromise that would work for all. I've tried that before with flow test and got contribution from #2 0 prefetch 756 kpps 1

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-13 Thread jamal
On Tue, 2005-13-12 at 10:32 -0800, Jesse Brandeburg wrote: To help allay your concerns, someone in our lab is going to test routing between two ports on some older server hardware (1Ghz Pentium 3 class) today. I hope to have some results by tomorrow. that would be great. In our

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-12 Thread Robert Olsson
jamal writes: Could the Robert/Jesse also verify this? I normally dont get excited by an extra kpps these days;- Hello! Here is a summary. It compares #12 and #125 prefetches with different load and with and without copybreak. cpybrk load prefetch tput kpps

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-12 Thread jamal
Ok, this makes things more interesting What worked for a XEON doesnt work the same way for an opteron. For me, the copybreak (in its capacity as adding extra cycles that make the prefetch look good) made things look good. Also, #125 gave a best answer. None of these were the case from

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-12 Thread Robert Olsson
jamal writes: Ok, this makes things more interesting What worked for a XEON doesnt work the same way for an opteron. For me, the copybreak (in its capacity as adding extra cycles that make the prefetch look good) made things look good. Also, #125 gave a best answer. None of

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-12 Thread Jeff Garzik
Jeff Kirsher wrote: e1000 driver update Signed-off-by: Jeff Kirsher [EMAIL PROTECTED] Signed-off-by: John Ronciak [EMAIL PROTECTED] Signed-off-by: Jesse Brandeburg [EMAIL PROTECTED] 2. Performance Enhancements - aggressive prefetch of rx_desc and skb->data just like we do for 10gig - align the

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-09 Thread jamal
On Thu, 2005-08-12 at 17:57 +0100, Eric Dumazet wrote: right, after i did this code, i realized that, and it is demonstrable that #4 hurts, if only a little. I'm sticking with my suggestion we go to #1,#2,#5 I would try another thing : #1,#2,#4bis #4bis

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-09 Thread jamal
On Thu, 2005-08-12 at 21:25 +0100, Robert Olsson wrote: David S. Miller writes: BTW, this is all related to SKB recycling. For example, if this is just a TCP ACK, we can do better than copybreak and just let the driver use the SKB again upon return from netif_receive_skb().

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Robert Olsson
David S. Miller writes: For the host bound case, copybreak is always a win due to how socket buffer accounting works. If you use a 1500 byte SKB for 64 bytes of data, this throws off all of the socket buffer accounting because you're consuming more of the socket limit per byte of data
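
The accounting argument in the snippet above can be made concrete with a little arithmetic. The following is only a sketch under stated assumptions: the per-buffer overhead constant and the function names are hypothetical, and the real kernel charge per buffer is not taken from this thread. It shows how much 64-byte payload fits under a fixed receive limit when each packet occupies a full-size buffer versus a right-sized copy.

```c
#include <stddef.h>

/* Hypothetical per-buffer bookkeeping overhead; an assumption made for
 * illustration, not a value quoted anywhere in the thread. */
#define SKETCH_SKB_OVERHEAD 256u

/* Memory charged against the socket for one received buffer. */
static size_t sketch_charge(size_t buf_size)
{
    return buf_size + SKETCH_SKB_OVERHEAD;
}

/* Payload bytes that can be queued before the receive limit is reached,
 * when every packet carries `payload` bytes in a buffer of `buf_size`. */
static size_t sketch_queued_payload(size_t rcv_limit, size_t buf_size,
                                    size_t payload)
{
    size_t packets = rcv_limit / sketch_charge(buf_size);
    return packets * payload;
}
```

With a 64 KiB limit, `sketch_queued_payload(65536, 2048, 64)` gives 1792 bytes of queued data, while `sketch_queued_payload(65536, 64, 64)` gives 13056 bytes; the ratio depends entirely on the assumed overhead.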

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread David S. Miller
From: Robert Olsson [EMAIL PROTECTED] Date: Thu, 8 Dec 2005 10:20:43 +0100 Why not remove copybreak from the drivers and do eventual copybreak after we have looked up the packet. This way we can get copybreak for all drivers and we can do this only for packets with has destination to

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Andi Kleen
On Thu, Dec 08, 2005 at 01:35:11AM -0800, David S. Miller wrote: From: Robert Olsson [EMAIL PROTECTED] Date: Thu, 8 Dec 2005 10:20:43 +0100 Why not remove copybreak from the drivers and do eventual copybreak after we have looked up the packet. This way we can get copybreak for all

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread David S. Miller
From: Andi Kleen [EMAIL PROTECTED] Date: Thu, 8 Dec 2005 10:39:25 +0100 The problem is that there can be a quite long per CPU queue already before lookup - and without copybreak a lot of memory might be wasted in there. There is no queue, we go straight from driver RX handling all the way

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Eric Dumazet
Robert Olsson wrote: David S. Miller writes: For the host bound case, copybreak is always a win due to how socket buffer accounting works. If you use a 1500 byte SKB for 64 bytes of data, this throws off all of the socket buffer accounting because you're consuming more of the socket

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Jens Laas
(05.12.08 at 10:56) Eric Dumazet wrote the following to Robert Olsson: Robert Olsson wrote: David S. Miller writes: This will lead to an extra alloc in case of copybreak but it could be possible to avoid this with some function giving copybreak feedback to driver i.e via

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread jamal
On Wed, 2005-07-12 at 16:11 -0800, David S. Miller wrote: From: John Ronciak [EMAIL PROTECTED] Date: Wed, 7 Dec 2005 16:09:21 -0800 On 12/7/05, David S. Miller [EMAIL PROTECTED] wrote: I think Jesse's data and recommendation of only keeping the #1, #2 and #5 prefetches seem like the

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread jamal
On Wed, 2005-07-12 at 23:04 +0100, Eric Dumazet wrote: David S. Miller wrote: Another try could be to do some benchmarks about NET_IP_ALIGN being a valid optimization nowadays : Maybe setting it to 0 in e1000 driver could give better results. Could somebody give it a try ? Ok, I tried

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Jesse Brandeburg
On 12/7/05, David S. Miller [EMAIL PROTECTED] wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 08 Dec 2005 04:47:05 +0100 #4#5 as proposed in the patch can not be a win + prefetch(next_skb); + prefetch(next_skb->data - NET_IP_ALIGN); because at the time

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Eric Dumazet
Jesse Brandeburg wrote: On 12/7/05, David S. Miller [EMAIL PROTECTED] wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 08 Dec 2005 04:47:05 +0100 #4#5 as proposed in the patch can not be a win + prefetch(next_skb); + prefetch(next_skb->data - NET_IP_ALIGN);

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Rick Jones
Having it off by default is a bad idea from a socket perspective. When you have 64 byte data packets consuming 1500+ bytes of data storage, which is what you get with copybreak disabled, TCP spends all of its time copying packet data around as the socket buffering limits on receive are hit quite

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread David S. Miller
From: Robert Olsson [EMAIL PROTECTED] Date: Thu, 8 Dec 2005 11:35:06 +0100 David S. Miller writes: It is not clear if we want to wait the whole netif_receive_skb() execution to get this status. That can take a long time to execute :-) The driver has to wait for full

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Andi Kleen
For example, if this is just a TCP ACK, we can do better than copybreak and just let the driver use the SKB again upon return from netif_receive_skb(). :-) That's a cool optimization. -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread David S. Miller
From: Francois Romieu [EMAIL PROTECTED] Date: Fri, 9 Dec 2005 00:09:47 +0100 Rick Jones [EMAIL PROTECTED] : [...] Does it really need to be particularly aggressive about that? How often are there great streams of small packets sitting in a socket buffer? One really only cares when the

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Rick Jones
Francois Romieu wrote: Rick Jones [EMAIL PROTECTED] : [...] Does it really need to be particularly aggressive about that? How often are there great streams of small packets sitting in a socket buffer? One really only cares when the system starts getting memory challenged right? Until then

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Rick Jones
David S. Miller wrote: From: Francois Romieu [EMAIL PROTECTED] Date: Fri, 9 Dec 2005 00:09:47 +0100 Rick Jones [EMAIL PROTECTED] : [...] Does it really need to be particularly aggressive about that? How often are there great streams of small packets sitting in a socket buffer? One really

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Jesse Brandeburg
On 12/6/05, Robert Olsson [EMAIL PROTECTED] wrote: jamal writes: Results: kernel 2.6.11.7: 446Kpps kernel 2.6.14: 452kpps kernel 2.6.14 with e1000-6.2.15: 470Kpps Kernel 2.6.14 with e1000-6.2.15 but rx copybreak commented out: 460Kpps copybreaks help you..

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread jamal
On Tue, 2005-06-12 at 23:33 -0700, Grant Grundler wrote: On Tue, Dec 06, 2005 at 06:08:35PM -0500, jamal wrote: All load goes onto CPU#0. I didnt try to tune or balance anything so the numbers could be better than those noted here ok - that's fair. I suspect the hyperthreading case is one

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread jamal
On Wed, 2005-07-12 at 00:18 -0800, Jesse Brandeburg wrote: On 12/6/05, Robert Olsson [EMAIL PROTECTED] wrote: jamal writes: Results: kernel 2.6.11.7: 446Kpps kernel 2.6.14: 452kpps kernel 2.6.14 with e1000-6.2.15: 470Kpps Kernel 2.6.14 with e1000-6.2.15

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread John Ronciak
On 12/7/05, jamal [EMAIL PROTECTED] wrote: It is possible it will help some traffic setups to turn it on, however, forever you had it as off. So people who need it know how to turn it on. The sudden change of behavior that was questionable and btw it is not documented either. Well it's

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Jeff Garzik
Grant Grundler wrote: Yes - his results indicated copybreak hurt perf on the AMD box. h... Jeff

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Grant Grundler
On Wed, Dec 07, 2005 at 07:41:29AM -0500, jamal wrote: ok - that's fair. I suspect the hyperthreading case is one where binding the IRQs to particular CPUs is necessary to reproduce the results. Note: I didnt bind anything. The p4/xeon starts with routing everything to CPU#0 - i just

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Robert Olsson
jamal writes: copybreaks help you.. Thanks for running these tests, so far it mostly validates that for the general case the copybreak and prefetch benefits things. I dont know what you would call a general case. Pick two modern boards as in these tests: I'll add some

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread jamal
Robert, Very interesting results - i would like to comment; but let me post my results first. I recompiled all kernels from scratch and made sure that flow control was off in all cases. I still test with two flows .. will get to something like 32K flows perhaps tomorrow (keeping my fingers

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Jeff Garzik
John Ronciak wrote: As far as copybreak goes, we knew it probably won't help routing type test with small packets. Robert's test shows it really only hurts where it seems to be CPU bound, which makes sense. This can be disabled at compile time by setting E1000_CB_LENGTH to 2K which means that

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread jamal
On Wed, 2005-07-12 at 14:27 -0500, Jeff Garzik wrote: John Ronciak wrote: As far as copybreak goes, we knew it probably won't help routing type test with small packets. Robert's test shows it really only hurts where it seems to be CPU bound, which makes sense. This can be disabled at

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread John Ronciak
On 12/7/05, Jeff Garzik [EMAIL PROTECTED] wrote: So... under load, copybreak causes e1000 to fall over more rapidly than no-copybreak? If so, it sounds like copybreak should be disabled by default, and/or a runtime switched added for it. I wouldn't say fall over. With small packet only

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread jamal
On Wed, 2005-07-12 at 11:48 -0800, John Ronciak wrote: On 12/7/05, Jeff Garzik [EMAIL PROTECTED] wrote: So... under load, copybreak causes e1000 to fall over more rapidly than no-copybreak? If so, it sounds like copybreak should be disabled by default, and/or a runtime switched added

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Robert Olsson
jamal writes: Kernel 2.6.14 + e1000-6.2.15 prefetch off copybreak off: 451Kpps kernel 2.6.14 + e1000-6.2.15 prefetch off copybreak on: 450Kpps This pretty close to the results I got today in the single flow test I even noticed a little win w. the copy-break on. Kernel 2.6.14 +

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread David S. Miller
From: Jeff Garzik [EMAIL PROTECTED] Date: Wed, 07 Dec 2005 14:27:50 -0500 If so, it sounds like copybreak should be disabled by default, and/or a runtime switched added for it. This logic applies to all drivers, though. If you're cpu loaded, then yes copying the packets will require more cpu

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Eric Dumazet
jamal wrote: On Wed, 2005-07-12 at 11:48 -0800, John Ronciak wrote: On 12/7/05, Jeff Garzik [EMAIL PROTECTED] wrote: So... under load, copybreak causes e1000 to fall over more rapidly than no-copybreak? If so, it sounds like copybreak should be disabled by default, and/or a runtime

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread David S. Miller
From: jamal [EMAIL PROTECTED] Date: Wed, 07 Dec 2005 14:44:53 -0500 so thats conclusion one. Copybreak oughta be off by default. People who think it is useful can turn it on. I disagree, the socket buffering side effects of non-copybreak are severe especially during loss handling where it is

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread David S. Miller
From: John Ronciak [EMAIL PROTECTED] Date: Wed, 7 Dec 2005 11:48:46 -0800 Copybreak probably shouldn't be used in routing use cases. I think even this is arguable, routers route a lot more than small 64-byte frames. Unfortunately, that is what everyone uses for packet rate tests. :-/ Assuming

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread David S. Miller
From: jamal [EMAIL PROTECTED] Date: Wed, 07 Dec 2005 15:23:57 -0500 I am no longer sure that your results on copybreak for host bound packets can be trusted anymore. All your copybreak was doing was making the prefetch look good according to my tests. For the host bound case, copybreak is

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Robert Olsson
John Ronciak writes: If so, it sounds like copybreak should be disabled by default, and/or a runtime switched added for it. I wouldn't say fall over. With small packet only tests (the ones being run for this exercise) _all_ packets are being copied which is why when the system

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread jamal
On Wed, 2005-07-12 at 21:50 +0100, Robert Olsson wrote: jamal writes: Kernel 2.6.14 + e1000-6.2.15 prefetch off copybreak off: 451Kpps kernel 2.6.14 + e1000-6.2.15 prefetch off copybreak on: 450Kpps This pretty close to the results I got today in the single flow test I even noticed

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread jamal
On Wed, 2005-07-12 at 21:59 +0100, Eric Dumazet wrote: jamal wrote: Eric Dumazet [EMAIL PROTECTED] theorized there may be some value in copybreak if you are host bound. I only seen it as an unnecessary pain really. In my case, my production servers are usually ram bounded, not

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread jamal
On Wed, 2005-07-12 at 13:06 -0800, David S. Miller wrote: From: jamal [EMAIL PROTECTED] Date: Wed, 07 Dec 2005 15:23:57 -0500 I am no longer sure that your results on copybreak for host bound packets can be trusted anymore. All your copybreak was doing was making the prefetch look good

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread John Ronciak
On 12/7/05, David S. Miller [EMAIL PROTECTED] wrote: Regardless of the decision, it's incorrect to point out e1000 specifically as many other Linux networking drivers do copybreak too and I've always public advocated for copybreak to be used by drivers due to the socket buffering issue.

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread David S. Miller
From: jamal [EMAIL PROTECTED] Date: Wed, 07 Dec 2005 16:37:10 -0500 I think there is value for prefetch - just not the way the current patch has it. Something less adventurous as suggested by Robert would make a lot more sense. Looking at the e1000 patch in question again, it may be doing a

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread John Ronciak
On 12/7/05, jamal [EMAIL PROTECTED] wrote: On the prefetch, i think would you agree now that it is problematic? Sorry, I don't agree. I just showed that if i changed the cycle of execution between the moment the prefetch gets issued to the moment the data gets used we get different

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Eric Dumazet
David S. Miller wrote: From: jamal [EMAIL PROTECTED] Date: Wed, 07 Dec 2005 16:37:10 -0500 I think there is value for prefetch - just not the way the current patch has it. Something less adventurous as suggested by Robert would make a lot more sense. Looking at the e1000 patch in

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread David S. Miller
From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 07 Dec 2005 23:04:04 +0100 Another try could be to do some benchmarks about NET_IP_ALIGN being a valid optimization nowadays : Maybe setting it to 0 in e1000 driver could give better results. Could somebody give it a try ? NET_IP_ALIGN is a

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread David S. Miller
From: John Ronciak [EMAIL PROTECTED] Date: Wed, 7 Dec 2005 13:56:29 -0800 The difference between the cases was not significant and the prefetching cases were better than no prefetching. Again, still no detriment to performance. I still think what e1000 is doing is way too aggressive. I know

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread jamal
On Wed, 2005-07-12 at 14:09 -0800, David S. Miller wrote: From: John Ronciak [EMAIL PROTECTED] Date: Wed, 7 Dec 2005 13:56:29 -0800 The difference between the cases was not significant and the prefetching cases were better than no prefetching. Again, still no detriment to performance.

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Grant Grundler
On Wed, Dec 07, 2005 at 02:17:16PM -0500, jamal wrote: ... Note, however that as soon as i turn copybreak off, the numbers go down ;- Now for some obtuse theory as to why this happens: I think the reason that prefetch + copybreak together have higher numbers is because the copybreak code

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread jamal
On Wed, 2005-07-12 at 14:11 -0800, David S. Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 07 Dec 2005 23:04:04 +0100 Another try could be to do some benchmarks about NET_IP_ALIGN being a valid optimization nowadays : Maybe setting it to 0 in e1000 driver could give

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Jesse Brandeburg
On Wed, 7 Dec 2005, David S. Miller wrote: The difference between the cases was not significant and the prefetching cases were better than no prefetching. Again, still no detriment to performance. I still think what e1000 is doing is way too aggressive. I know of at least one platform,

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread David S. Miller
From: Grant Grundler [EMAIL PROTECTED] Date: Wed, 7 Dec 2005 16:01:50 -0700 I would argue the other way around. copybreak would stall and hurt small packet routing performance if there was no prefetching. With aggressive prefetching, copybreak takes advantage of data that is already in flight

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Eric Dumazet
John Ronciak wrote: On 12/7/05, David S. Miller [EMAIL PROTECTED] wrote: Keyword, this box. We don't disagree and never have with this. It's why we were asking the question of find us a case where the prefetch shows a detriment to performance. I think Jesse's data and recommendation of

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread David S. Miller
From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 08 Dec 2005 04:47:05 +0100 #4#5 as proposed in the patch can not be a win + prefetch(next_skb); + prefetch(next_skb->data - NET_IP_ALIGN); because at the time #5 is done, the CPU dont have in its cache next_skb->data
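
The objection quoted above is that the second prefetch needs the value of the `data` pointer, which lives in the structure the first prefetch has only just requested, so computing the address has to wait for that load. One common way around such a dependent load is to stagger the two requests across loop iterations. The sketch below illustrates that idea only; it is not what the patch does and not something the thread settled on. The structure and function names are hypothetical, and `__builtin_prefetch` is the GCC/Clang builtin used as a userspace stand-in for the kernel's `prefetch()`.

```c
#include <stddef.h>

/* Hypothetical stand-in for a packet buffer header; this is not the
 * kernel's struct sk_buff. */
struct sketch_skb {
    unsigned char *data;
    size_t len;
};

/* Sum the first byte of every packet while prefetching in two stages:
 * the header two packets ahead and the payload one packet ahead.
 * Except on the first iteration, the header of packet i + 1 was
 * requested one iteration earlier, so reading its data pointer is less
 * likely to stall than when both prefetches are issued back to back. */
static unsigned long sketch_rx_loop(struct sketch_skb **ring, size_t n)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + 2 < n)
            __builtin_prefetch(ring[i + 2]);        /* header only */
        if (i + 1 < n)
            __builtin_prefetch(ring[i + 1]->data);  /* payload */
        if (ring[i]->len > 0)
            sum += ring[i]->data[0];
    }
    return sum;
}
```

Whether the staggering helps depends on the CPU and the workload, which is the point the thread keeps returning to; prefetches are hints and never change the result of the loop.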

RE: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-06 Thread Robert Olsson
Ronciak, John writes: So we still need to see a case where performance is hurt by the prefetching. We have some data coming from another group here at Intel next week which we'll share once we have it which also shows the performance gains with prefetching. Hello! Well here is another

RE: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-06 Thread jamal
On Tue, 2005-06-12 at 16:55 +0100, Robert Olsson wrote: Ronciak, John writes: So we still need to see a case where performance is hurt by the prefetching. We have some data coming from another group here at Intel next week which we'll share once we have it which also shows the

RE: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-06 Thread jamal
Ok, here are some results - unfortunately i dont have further access to the hardware until tomorrow: Hardware: - a 2Ghz dual Xeon hyperthreading (shows up as 4 processors); 512 KB L2 Cache and 1Gig Ram. Two ethernet e1000 82546EB tests: -- Forwarding tests with a single flow into

RE: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-06 Thread Robert Olsson
jamal writes: Results: kernel 2.6.11.7: 446Kpps kernel 2.6.14: 452kpps kernel 2.6.14 with e1000-6.2.15: 470Kpps Kernel 2.6.14 with e1000-6.2.15 but rx copybreak commented out: 460Kpps copybreaks help you.. And lastly to just play with different prefetch on/off as

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-05 Thread jamal
On Sat, 2005-03-12 at 14:58 -0700, Grant Grundler wrote: On Sat, Dec 03, 2005 at 02:37:59PM -0500, jamal wrote: On Sat, 2005-03-12 at 12:00 -0700, Grant Grundler wrote: On Sat, Dec 03, 2005 at 09:20:52AM -0500, jamal wrote: Ok, so you seem to be saying again that for case #b above, there

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-03 Thread Eric Dumazet
David S. Miller wrote: I agree with the analysis, but I truly hate knobs. Every new one we add means it's even more true that you need to be a wizard to get a Linux box performing optimally. [rant mode] Well, I suspect this is the reason why various hash tables (IP route cache, TCP

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-03 Thread jamal
On Fri, 2005-02-12 at 11:04 -0700, Grant Grundler wrote: On Thu, Dec 01, 2005 at 09:32:37PM -0500, jamal wrote: [..] We've already been down this path before. How and where to prefetch is quite dependent on the CPU implementation and workload. [..] At the time you did this, I read the

RE: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-03 Thread jamal
On Fri, 2005-02-12 at 16:53 -0800, Ronciak, John wrote: In this combination of hardware and in this forwarding test copybreak is bad but prefetching helps. e1000 vanilla 1150 kpps e1000 6.2.15 1084 e1000

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-03 Thread jamal
On Fri, 2005-02-12 at 20:04 -0800, David S. Miller wrote: We don't even know the _nature_ of the cases where the e1000 prefetches might want to be disabled by a platform. It's therefore impossible for us to design any kind of reasonable interface or runtime test. All evidence shows the

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-03 Thread jamal
On Sat, 2005-03-12 at 02:25 +0100, Eric Dumazet wrote: Note that on a router (ie most packets are not locally delivered), copybreak is useless and expensive. But if most packets are locally delivered (on local TCP or UDP queues), then copybreak is a win because less memory is taken by not
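
For readers unfamiliar with the term, the trade-off described above (a copy is wasted work when forwarding, but saves memory for locally delivered packets) follows from what a copybreak receive path does. The following is a minimal userspace sketch of the idea, not the e1000 code: the threshold, buffer size, and all names are hypothetical, and the caller is assumed to guarantee that `len` does not exceed the ring buffer size.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_RX_BUF_SIZE 2048u  /* assumed size of each ring buffer */
#define SKETCH_COPYBREAK    256u  /* hypothetical copy threshold */

struct sketch_rx_slot {
    unsigned char *buf;           /* full-size buffer owned by the ring */
};

/* Hand a received frame of `len` bytes up the stack.
 * Returns a buffer the caller must free(); *out_size gets its capacity.
 * Returns NULL on allocation failure and leaves the slot untouched. */
static unsigned char *sketch_rx(struct sketch_rx_slot *slot, size_t len,
                                size_t *out_size)
{
    if (len < SKETCH_COPYBREAK) {
        /* Small frame: copy into a right-sized buffer, keep the big one
         * in the ring so it can be reused immediately. */
        unsigned char *small = malloc(len ? len : 1);
        if (!small)
            return NULL;
        memcpy(small, slot->buf, len);
        *out_size = len;
        return small;
    }

    /* Large frame: pass the ring buffer up and refill the slot. */
    unsigned char *fresh = malloc(SKETCH_RX_BUF_SIZE);
    if (!fresh)
        return NULL;
    unsigned char *full = slot->buf;
    slot->buf = fresh;
    *out_size = SKETCH_RX_BUF_SIZE;
    return full;
}
```

In a forwarding test made up only of small frames, every packet takes the copy branch, which matches the observation elsewhere in the thread that all packets are copied in the small-packet benchmarks.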

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-03 Thread jamal
On Sat, 2005-03-12 at 09:39 -0500, jamal wrote: I am going to go and install Linux (running something else at the moment) on this one piece of hardware that i happen to know was problematic and try to test like the way Robert did. That will be my good deed of the day ;- I suppose no good

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-03 Thread jamal
On Sat, 2005-03-12 at 12:00 -0700, Grant Grundler wrote: On Sat, Dec 03, 2005 at 09:20:52AM -0500, jamal wrote: Ok, so you seem to be saying again that for case #b above, there is no harm in issuing the prefetch late since the CPU wont issue a second fetch for the address? Right.

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-03 Thread Grant Grundler
On Sat, Dec 03, 2005 at 02:37:59PM -0500, jamal wrote: On Sat, 2005-03-12 at 12:00 -0700, Grant Grundler wrote: On Sat, Dec 03, 2005 at 09:20:52AM -0500, jamal wrote: Ok, so you seem to be saying again that for case #b above, there is no harm in issuing the prefetch late since the CPU

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-02 Thread Robert Olsson
jamal writes: I will test with a newer piece of hardware and one of the older ones i have (both Xeons) - perhaps this weekend. Robert may have some results perhaps on this driver, Robert? It would also be nice for the intel folks to post their full results somewhere. I agree with you

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-02 Thread Grant Grundler
On Thu, Dec 01, 2005 at 09:32:37PM -0500, jamal wrote: I think until a counter case is shown, the prefetches should remain on unconditionally. Proof of detriment is the burden of the accuser, especially since the Intel folks apparently did a lot of testing :-) We've already been down this

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-02 Thread Andi Kleen
On Fri, Dec 02, 2005 at 11:04:14AM -0700, Grant Grundler wrote: At the time you did this, I read the Intel docs on P3 and P4 cache behaviors. IIRC, the P4 HW prefetches very aggressively. ie the SW prefetching just becomes noise or burns extra CPU cycles. My guess I don't think they can follow

RE: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-02 Thread Ronciak, John
In this combination of hardware and in this forwarding test copybreak is bad but prefetching helps. e1000 vanilla 1150 kpps e1000 6.2.15 1084 e1000 6.2.15 copybreak disabled 1216 e1000 6.2.15

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-02 Thread John Ronciak
On 12/2/05, Grant Grundler [EMAIL PROTECTED] wrote: Yup. We can tune for workload/load-latency of each architecture. I think tuning for all of them in one source code is the current problem. We have to come up with a way for the compiler to insert (or not) prefetching at different places for

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-02 Thread Eric Dumazet
Ronciak, John wrote: In this combination of hardware and in this forwarding test copybreak is bad but prefetching helps. e1000 vanilla 1150 kpps e1000 6.2.15 1084 e1000 6.2.15 copybreak disabled

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-02 Thread Andi Kleen
On Fri, Dec 02, 2005 at 05:01:39PM -0800, John Ronciak wrote: On 12/2/05, Grant Grundler [EMAIL PROTECTED] wrote: Yup. We can tune for workload/load-latency of each architecture. I think tuning for all of them in one source code is the current problem. We have to come up with a way for

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-02 Thread David S. Miller
From: Eric Dumazet [EMAIL PROTECTED] Date: Sat, 03 Dec 2005 02:25:53 +0100 Note that on a router (ie most packets are not locally delivered), copybreak is useless and expensive. But if most packets are locally delivered (on local TCP or UDP queues), then copybreak is a win because less

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-01 Thread jamal
On Thu, 2005-01-12 at 02:36 -0800, Jeff Kirsher wrote: e1000 driver update Signed-off-by: Jeff Kirsher [EMAIL PROTECTED] Signed-off-by: John Ronciak [EMAIL PROTECTED] Signed-off-by: Jesse Brandeburg [EMAIL PROTECTED] 2. Performance Enhancements - aggressive prefetch of rx_desc and

RE: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-01 Thread Ronciak, John
On Thu, 2005-01-12 at 02:36 -0800, Jeff Kirsher wrote: e1000 driver update Signed-off

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-01 Thread David S. Miller
From: Ronciak, John [EMAIL PROTECTED] Date: Thu, 1 Dec 2005 15:10:19 -0800 Do you have a specific example? We have tried this on Intel (x86 and IA-64), AMD (x86 and x86_64) and PPC processors. Most show gains with none showing any detriment to performance with the test cases. The gains

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-01 Thread jamal
On Thu, 2005-01-12 at 16:16 -0800, David S. Miller wrote: From: Ronciak, John [EMAIL PROTECTED] Date: Thu, 1 Dec 2005 15:10:19 -0800 Do you have a specific example? We have tried this on Intel (x86 and IA-64), AMD (x86 and x86_64) and PPC processors. Most show gains with none showing