Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Michael Chan
On Wed, 2008-02-20 at 15:08 -0800, David Miller wrote:
> From: Tony Battersby <[EMAIL PROTECTED]>
> Date: Wed, 20 Feb 2008 18:04:09 -0500
>
> > The following patch fixes the problem for me. Do we want to accept this
> > patch and call it a day or continue investigating the source of the problem?

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread David Miller
From: Tony Battersby <[EMAIL PROTECTED]>
Date: Wed, 20 Feb 2008 18:04:09 -0500
> The following patch fixes the problem for me. Do we want to accept this
> patch and call it a day or continue investigating the source of the problem?
>
> Patch applies to 2.6.24.2, but doesn't apply to 2.6.25-rc.

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
The following patch fixes the problem for me. Do we want to accept this patch and call it a day or continue investigating the source of the problem? Patch applies to 2.6.24.2, but doesn't apply to 2.6.25-rc. If everyone agrees that this is the right solution, I will resubmit with a proper

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Update: Herbert's patch alters the arguments to alloc_skb_fclone() and skb_reserve() from within sk_stream_alloc_pskb(). This changes the skb_headroom() and skb_tailroom() of the returned skb. I decided to see if I could detect the precise point at which data corruption started to happen. The

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Matt Carlson wrote:
> Hi Tony. Can you give us the output of :
>
> sudo lspci -vvv - -s 03:01.0'
>
03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
Subsystem: Compaq Computer Corporation NC7770 Gigabit Server Adapter (PCI-X,

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Herbert Xu wrote:
> On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
>
>> Update: when I revert Herbert's patch in addition to applying your
>> patch, the iSCSI performance goes back up to 115 MB/s again in both
>> directions. So it looks like turning off SG for TX didn't itself

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Michael Chan wrote:
> On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote:
>
>
>> Update: when I revert Herbert's patch in addition to applying your
>> patch, the iSCSI performance goes back up to 115 MB/s again in both
>> directions. So it looks like turning off SG for TX didn't itself

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Michael Chan wrote: On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote: Update: when I revert Herbert's patch in addition to applying your patch, the iSCSI performance goes back up to 115 MB/s again in both directions. So it looks like turning off SG for TX didn't itself cause the

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Herbert Xu wrote: On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote: Update: when I revert Herbert's patch in addition to applying your patch, the iSCSI performance goes back up to 115 MB/s again in both directions. So it looks like turning off SG for TX didn't itself cause

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Matt Carlson wrote: Hi Tony. Can you give us the output of : sudo lspci -vvv - -s 03:01.0' 03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15) Subsystem: Compaq Computer Corporation NC7770 Gigabit Server Adapter (PCI-X, 10/100/1000-T)

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread David Miller
From: Tony Battersby [EMAIL PROTECTED] Date: Wed, 20 Feb 2008 18:04:09 -0500 The following patch fixes the problem for me. Do we want to accept this patch and call it a day or continue investigating the source of the problem? Patch applies to 2.6.24.2, but doesn't apply to 2.6.25-rc. If

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Michael Chan
On Wed, 2008-02-20 at 15:08 -0800, David Miller wrote: From: Tony Battersby [EMAIL PROTECTED] Date: Wed, 20 Feb 2008 18:04:09 -0500 The following patch fixes the problem for me. Do we want to accept this patch and call it a day or continue investigating the source of the problem?

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Herbert Xu
On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
>
> Update: when I revert Herbert's patch in addition to applying your
> patch, the iSCSI performance goes back up to 115 MB/s again in both
> directions. So it looks like turning off SG for TX didn't itself cause
> the performance

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Matt Carlson
On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
> Michael Chan wrote:
> > On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:
> >
> >> iSCSI
> >> performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
> >> with light tx,
> >>
> >
> > That's strange. The

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Michael Chan
On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote:
>
> Update: when I revert Herbert's patch in addition to applying your
> patch, the iSCSI performance goes back up to 115 MB/s again in both
> directions. So it looks like turning off SG for TX didn't itself cause
> the performance drop,

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote:
> On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:
>
>> iSCSI
>> performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
>> with light tx,
>>
>
> That's strange. The patch should only affect TX performance slightly
> since we are just turning off

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote:
>> The SysKonnect NIC that does not exhibit this problem has a chip that
>> says "BCM5411KQM" "TT0128 P2Q" and "56975E".
> I think this is the 5700, but please send me the tg3 output that
> identifies the chip and the revision. Something like this:
>
> eth2: Tigon3

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Michael Chan
On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:
> iSCSI
> performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
> with light tx,
That's strange. The patch should only affect TX performance slightly since we are just turning off SG for TX. Please take an ethereal trace to

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote:
> On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote:
>
>
>> One consequence of Herbert's change is that the chip will see a
>> different datastream. The initial skb->data linear area will be
>> smaller, and the transition to the fragmented area of pages will be
>>

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote: On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote: One consequence of Herbert's change is that the chip will see a different datastream. The initial skb->data linear area will be smaller, and the transition to the fragmented area of pages will be quicker.

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Michael Chan
On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote: iSCSI performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx with light tx, That's strange. The patch should only affect TX performance slightly since we are just turning off SG for TX. Please take an ethereal trace to

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote: The SysKonnect NIC that does not exhibit this problem has a chip that says "BCM5411KQM" "TT0128 P2Q" and "56975E". I think this is the 5700, but please send me the tg3 output that identifies the chip and the revision. Something like this: eth2: Tigon3 [partno(BCM95705) rev

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote: On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote: iSCSI performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx with light tx, That's strange. The patch should only affect TX performance slightly since we are just turning off SG for TX.

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Michael Chan
On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote: Update: when I revert Herbert's patch in addition to applying your patch, the iSCSI performance goes back up to 115 MB/s again in both directions. So it looks like turning off SG for TX didn't itself cause the performance drop, but

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Matt Carlson
On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote: Michael Chan wrote: On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote: iSCSI performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx with light tx, That's strange. The patch should only

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Herbert Xu
On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote: Update: when I revert Herbert's patch in addition to applying your patch, the iSCSI performance goes back up to 115 MB/s again in both directions. So it looks like turning off SG for TX didn't itself cause the performance drop,

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Michael Chan
On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote:
> One consequence of Herbert's change is that the chip will see a
> different datastream. The initial skb->data linear area will be
> smaller, and the transition to the fragmented area of pages will be
> quicker.
>
I see. Perhaps when we

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread David Miller
From: "Michael Chan" <[EMAIL PROTECTED]>
Date: Mon, 18 Feb 2008 16:32:00 -0800
> On Mon, 2008-02-18 at 17:41 -0500, Tony Battersby wrote:
> > I am experiencing network data corruption with a 3Com 3C996B-T NIC
> > (Broadcom NetXtreme BCM5701; driver tg3.ko). I have identified the
> > following

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Michael Chan
On Mon, 2008-02-18 at 17:41 -0500, Tony Battersby wrote:
> I am experiencing network data corruption with a 3Com 3C996B-T NIC
> (Broadcom NetXtreme BCM5701; driver tg3.ko). I have identified the
> following patch as the trigger:
Assuming this problem is unique to the 5701, I'm not sure how it is

TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Tony Battersby
I am experiencing network data corruption with a 3Com 3C996B-T NIC (Broadcom NetXtreme BCM5701; driver tg3.ko). I have identified the following patch as the trigger:
commit fb93134dfc2a6e6fbedc7c270a31da03fce88db9
Author: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed Nov 14 15:45:21 2007 -0800

TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Tony Battersby
I am experiencing network data corruption with a 3Com 3C996B-T NIC (Broadcom NetXtreme BCM5701; driver tg3.ko). I have identified the following patch as the trigger:
commit fb93134dfc2a6e6fbedc7c270a31da03fce88db9
Author: Herbert Xu [EMAIL PROTECTED]
Date: Wed Nov 14 15:45:21 2007 -0800

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread David Miller
From: Michael Chan [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 16:32:00 -0800 On Mon, 2008-02-18 at 17:41 -0500, Tony Battersby wrote: I am experiencing network data corruption with a 3Com 3C996B-T NIC (Broadcom NetXtreme BCM5701; driver tg3.ko). I have identified the following patch as the

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Michael Chan
On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote: One consequence of Herbert's change is that the chip will see a different datastream. The initial skb->data linear area will be smaller, and the transition to the fragmented area of pages will be quicker. I see. Perhaps when we get to

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Michael Chan
On Mon, 2008-02-18 at 17:41 -0500, Tony Battersby wrote: I am experiencing network data corruption with a 3Com 3C996B-T NIC (Broadcom NetXtreme BCM5701; driver tg3.ko). I have identified the following patch as the trigger: Assuming this problem is unique to the 5701, I'm not sure how it is