Re: [ 00/73] 3.8.13-stable review

2013-05-10 Thread Holger Hoffstaette
On Thu, 09 May 2013 15:31:23 -0700, Greg Kroah-Hartman wrote:

> This is the start of the stable review cycle for the 3.8.13 release. There

This patchset broke my internet, with all sorts of weird effects like
Samba clients having trouble talking to the server and only partially
working DNS resolution (CDNs broken, Amazon unreachable).

After two reboots to/from .12/.13 (to rule out temporary internet
brokenness) the problem has been identified as:

> Stefan Bader 
> r8169: fix 8168evl frame padding.

After reverting only this patch (i.e. returning r8169 to its 3.8.12 state)
everything behaves as expected again with the rest of .13. No other
regressions detected so far.

This patch should probably be removed from 3.9.2-rc as well.

-h


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


drm/radeon: continued dead graphics with X on Thinkpad T60 with kernel 3.8.x/3.9.x

2013-03-28 Thread Holger Hoffstaette

My old trusty Thinkpad T60 with a Mobility Radeon X1400 runs fine with
3.7.10 - glxgears and vlc work great. However, with kernel >=3.8.x any
attempt to start X dies in various ways. While waiting for all the other
regressions in 3.8.x to settle down, I've occasionally tried newer kernels
and collected the results. As even the latest 3.8.5 and 3.9-rc2 are
affected (the two fail almost identically), I figured it is about time I
complain. :)

Captured logs with annotations and oopses are at:
http://hoho.dyndns.org/linux/radeon/, including as much relevant info as I
could collect. All kernels are built the same way (via genkernel, an
automated script) and configurations between kernel versions have not been
altered. The 3.7.10 log is just included for reference (maybe the HW
is initialized differently?); the system is rock-stable otherwise.

I'm glad to help track this down, build, and test patches from git, but I
need some pointers on how to proceed. 3.7.10 is the last kernel that works
(very well), but it is unmaintained, and it would be nice to move on to
3.8.5 to benefit from ongoing bug fixes.

thanks!
Holger


Re: [ 00/27] 3.6.10-stable review

2012-12-07 Thread Holger Hoffstaette
On Thu, 06 Dec 2012 16:58:44 -0800, Greg Kroah-Hartman wrote:

> This is the start of the stable review cycle for the 3.6.10 release.

Applied on top of 3.6.9, it builds and works fine on 3 Gentoo ~x86
machines: two generic i5/i7 boxes (one with an ATI Evergreen chip) and an
old T60 Thinkpad. No borkage detected so far.

-h


Re: [ 00/46] 3.5.3-stable review

2012-08-20 Thread Holger Hoffstaette
(CC: J. Bruce Fields)

Greg,

it looks like a fix for a regression in NFS does not appear in this
patch set, even though it was supposedly sent to -stable:
http://lkml.indiana.edu/hypermail/linux/kernel/1208.2/00594.html

I tried to locate the patch in git but could not find it anywhere. As far
as I can tell the fix works: I had repeatable hangs when trying to "move
files to the trash" in XFCE, and with the fix these hangs are gone. I have
not noticed any negative side effects either.

thanks
Holger


Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

2008-01-03 Thread Holger Hoffstaette

I got my Promise card and everything is up and running without problems,
using kernel 2.6.24-rc6 and the sata_promise driver out of the box:

00:0c.0 Mass storage controller: Promise Technology, Inc. PDC40775 (SATA 300 TX2plus) (rev 02)
	Subsystem: Promise Technology, Inc. PDC40775 (SATA 300 TX2plus)
	Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 17
	I/O ports at b400 [size=128]
	I/O ports at b800 [size=256]
	Memory at ef025000 (32-bit, non-prefetchable) [size=4K]
	Memory at ef00 (32-bit, non-prefetchable) [size=128K]
	[virtual] Expansion ROM at 9802 [disabled] [size=32K]
	Capabilities: [60] Power Management version 2
	Kernel driver in use: sata_promise

The drive is a Samsung HD321KJ 320 GB; hdparm says it does ~75 MB/s, and
with dd it does buffered writes at ~85 MB/s and reads at ~295 MB/s. So all
in all I'd say it's not the card or the driver.

hth,
Holger


Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

2007-12-31 Thread Holger Hoffstaette
On Mon, 31 Dec 2007 16:19:26 -0800, Linda Walsh wrote:

> [snip]
> Another new "problem" (not as important) -- even though SATA disks are
> called with "sdX", my ATA disks that *were* at hda-hdc are now at hde-hdg.
> Devices hda-hdd are not populated in my dev directory on bootup.  Of

I think this is because the Promise SATA card also has one or more PATA
channels, so when the card is activated it takes precedence over your old
controller. But that should only be one channel, not four?

As for the other problem - I plan to add such a card to one of my
systems during the week and might be able to contribute some findings.

Holger


Re: Reproducible data corruption with sendfile+vsftp - splice regression?

2007-12-15 Thread Holger Hoffstaette
On Thu, 06 Dec 2007 19:44:26 +0100, Francois Romieu wrote:

> Holger Hoffstaette <[EMAIL PROTECTED]> : [...]
>> Maybe turning off sendfile or NAPI just lead to random success - so far
>> it really looks like tso on the r8169 is the common cause.
> 
> TSO on the r8169 is the magic switch but the regression makes imvho more
> sense from a VM pov:
> 
> - the corrupted file has the same size as the expected file
> - the corrupted file exhibits holes which come as a multiple of 4096 bytes
>   (8*4k, 2 places, there may be more)
> - the r8169 driver does not know what a page is
> - the 8169 hardware has a small 8192 bytes Tx buffer
> 
> It would be nice if someone could do a sendfile + vsftp test with TSO on a
> different hardware. While I could not reproduce the corruption when simply
> downloading a file that I had copied on the server with scp, it triggered
> almost immediately after I copied it locally and tried to download the
> copy.

I tested 2.6.24-rc5 on my T60 (Intel e1000 built with NAPI), set up
vsftp/apache with sendfile, and enabled all offload options including TSO.
Repeated downloads of >500 MB with ftp or wget over the NIC onto a ramdisk
or a physical disk show no corruption whatsoever. Download speed to the
ramdisk is a nice continuous 125 MB/sec.
Looks like it's the r8169 or its driver after all.

thanks
Holger


Re: Reproducible data corruption with sendfile+vsftp - splice regression?

2007-12-15 Thread Holger Hoffstaette
On Thu, 13 Dec 2007 03:19:43 +0100, Holger Hoffstaette wrote:

> I have now gone back to enable TSO since vsftp with sendfile really seems
> to be the only app that causes this. I have simply set it to
> use_sendfile=NO and no corruption occurs at all; the machine is stable and
> fast.

In the good tradition of proving myself wrong: I can reliably create
corrupted files by wget-ting from apache (with sendfile enabled) as
well, so TSO is off again after all. No TSO, no corruption.
The same also happens on a different machine with an r8169 (same model).
A tickless kernel makes no difference either. Shot in the dark, but hey..

Holger


Re: Reproducible data corruption with sendfile+vsftp - splice regression?

2007-12-12 Thread Holger Hoffstaette
On Thu, 06 Dec 2007 19:44:26 +0100, Francois Romieu wrote:

> Holger Hoffstaette <[EMAIL PROTECTED]> : [...]
>> Maybe turning off sendfile or NAPI just lead to random success - so far
>> it really looks like tso on the r8169 is the common cause.
> 
> TSO on the r8169 is the magic switch but the regression makes imvho more
> sense from a VM pov:
> 
> - the corrupted file has the same size as the expected file
> - the corrupted file exhibits holes which come as a multiple of 4096 bytes
>   (8*4k, 2 places, there may be more)
> - the r8169 driver does not know what a page is
> - the 8169 hardware has a small 8192 bytes Tx buffer
> 
> It would be nice if someone could do a sendfile + vsftp test with TSO on a
> different hardware. While I could not reproduce the corruption when simply
> downloading a file that I had copied on the server with scp, it triggered
> almost immediately after I copied it locally and tried to download the
> copy.

Here's an update - sorry for the delay, but I need that machine for
everyday work.

I have now re-enabled TSO, since vsftp with sendfile really seems to be
the only app that triggers this. I simply set use_sendfile=NO in vsftp,
and no corruption occurs at all; the machine is stable and fast.
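The difference between the two vsftp settings can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not vsftpd's actual code, and all function names are hypothetical: `os.sendfile()` lets the kernel push file pages to the socket directly (roughly the use_sendfile=YES path), while the read()/sendall() loop first copies the data through a user-space buffer (roughly the use_sendfile=NO path). The receiver computes a CRC32 so both paths can be checked against the source file.

```python
import os
import socket
import tempfile
import threading
import zlib


def send_with_sendfile(sock, path):
    """Zero-copy path: the kernel moves file data straight into the socket."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # unexpected EOF (file shrank)
                break
            sent += n
    return sent


def send_with_copy(sock, path, chunk=64 * 1024):
    """Copy path: read into a user-space buffer, then write it to the socket."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            sock.sendall(buf)
            sent += len(buf)
    return sent


def transfer(path, use_sendfile):
    """Serve `path` once over loopback TCP; return (bytes_sent, crc32_received)."""
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]
    result = {}

    def serve():
        conn, _ = server.accept()
        with conn:  # closing conn signals EOF to the receiver
            sender = send_with_sendfile if use_sendfile else send_with_copy
            result["sent"] = sender(conn, path)

    thread = threading.Thread(target=serve)
    thread.start()
    crc = 0
    with socket.create_connection(("127.0.0.1", port)) as client:
        while True:
            data = client.recv(65536)
            if not data:
                break
            crc = zlib.crc32(data, crc)
    thread.join()
    server.close()
    return result["sent"], crc


def compare_paths(size=8 << 20):
    """Send the same random file via both paths; True if both arrive intact."""
    data = os.urandom(size)
    expected = zlib.crc32(data)
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(data)
        tmp.flush()
        results = [transfer(tmp.name, flag) for flag in (True, False)]
    return all(sent == size and crc == expected for sent, crc in results)
```

Over loopback neither the r8169 nor TSO is involved, so compare_paths() is expected to return True here; to exercise the NIC, the sending side would have to run on the r8169 machine with the receiver on another host.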

FWIW the corruption can still be reproduced with 2.6.24-rc5. For kicks I
have also tried -rc5 with SLAB instead of SLUB, but that didn't help
either.

The directory with the tcpdump & test data now also contains a few more
corrupted files; maybe comparing the corruption offsets gives someone a
better idea.
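Comparing the offsets can be scripted. Below is a minimal Python sketch (the function name `differing_page_runs` and the 4096-byte page size are assumptions, the latter taken from the 4096-byte holes Francois describes) that compares the expected file with a same-size corrupted copy page by page, merges adjacent bad pages into runs, and narrows each run to its first and last differing byte. Bytes that happen to match inside a hole can make a run look slightly unaligned, so the alignment flag is only a hint.

```python
import sys

PAGE = 4096  # assumed page size, matching the 4096-byte holes reported above


def differing_page_runs(expected, received, page=PAGE):
    """Return (first, end, n_pages) for each run of consecutive differing pages.

    `first` is the offset of the first differing byte in the run and `end`
    is one past the last differing byte. Both inputs must be the same size.
    """
    if len(expected) != len(received):
        raise ValueError("files differ in size; this sketch expects equal sizes")
    runs = []
    run_start = None  # page index where the current run of bad pages began
    n_pages = (len(expected) + page - 1) // page
    for p in range(n_pages + 1):  # one extra step flushes a run at EOF
        lo, hi = p * page, (p + 1) * page
        differs = p < n_pages and expected[lo:hi] != received[lo:hi]
        if differs and run_start is None:
            run_start = p
        elif not differs and run_start is not None:
            a, b = run_start * page, min(p * page, len(expected))
            first = a + next(i for i in range(b - a)
                             if expected[a + i] != received[a + i])
            end = b - next(i for i in range(b - a)
                           if expected[b - 1 - i] != received[b - 1 - i])
            runs.append((first, end, p - run_start))
            run_start = None
    return runs


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: pageruns.py <expected file> <corrupted file>
    with open(sys.argv[1], "rb") as f_exp, open(sys.argv[2], "rb") as f_rcv:
        exp, rcv = f_exp.read(), f_rcv.read()  # reads both files into memory
    for first, end, n in differing_page_runs(exp, rcv):
        aligned = first % PAGE == 0 and end % PAGE == 0
        print(f"offset {first} (page {first // PAGE}): {end - first} bytes, "
              f"{n} page(s), page-aligned={aligned}")
```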

thanks
Holger


Re: Reproducible data corruption with sendfile+vsftp - splice regression?

2007-12-06 Thread Holger Hoffstaette

On Wed, 05 Dec 2007 23:54:29 +0100, Francois Romieu wrote:

> Holger Hoffstaette <[EMAIL PROTECTED]> : [...]
>> Should I file this in bugzilla?
> 
> Yes.

Thanks for responding - will do. I verified with 2.6.24-rc4 (same bug) and
have some new information about this.
Contrary to my previous posting, the corruption is NOT triggered by NAPI. It
may be related, but even without NAPI and with tso on again I got
corruption, now also on the gbit client (Thinkpad T60). When ftp'ing to a
ramdisk at full speed (a reasonable ~77 MB/sec) it "often" works, but
intermediate writes that temporarily slow down the ftp reliably produce
corrupted files, so I guess tso gets confused when some kind of throttling
sets in during the transfer. That is probably why I first noticed it on the
slow 100mbit client.
Maybe turning off sendfile or NAPI just led to random success - so far it
really looks like tso on the r8169 is the common cause.

thank you
Holger


Re: Reproducible data corruption with sendfile+vsftp - splice regression?

2007-12-02 Thread Holger Hoffstaette
On Sun, 02 Dec 2007 17:00:03 +0100, Holger Hoffstaette wrote:

> On Fri, 30 Nov 2007 10:26:54 -0800, Rick Jones wrote:
> 
>> Could the corruption be seen in a tcpdump trace prior to transmission
>> (ie taken on the sender) or was it only seen after the data passed out
>> the NIC?
> 
> I did the following:
> 
> 1) turn on tso on the server's r8169: ethtool --offload eth0 tso on
> 2) on the server: tcpdump -i eth0 -s 0 -w <file>
> 3) ftp'ed file to 100mbit client
> 
> As expected the file was corrupted, and the various corrupted byte
> sequences also show up in the tcpdump file at the corresponding offsets.
> 
> I did this with 2.6.22.14, so it does not seem to be a recent regression
> in .23/.24.
> 
> All files can be found here:
> http://hoho.dyndns.org/~holger/dist/r8169-tso/
> 
> I will gladly try out any other tweaks but need some guidance as I don't
> know what exactly to change - maybe without NAPI for the r8169?

Ta-daa! Rebuilding 2.6.22.14 (and I suspect all other versions) without
NAPI for the r8169 but with tso enabled yields NO data corruption; the
ftp'ed file has a good crc, repeatedly.

Any suggestions on how to proceed? Should I file this in bugzilla?

thanks
Holger


Re: Reproducible data corruption with sendfile+vsftp - splice regression?

2007-12-02 Thread Holger Hoffstaette

On Fri, 30 Nov 2007 10:26:54 -0800, Rick Jones wrote:

> Could the corruption be seen in a tcpdump trace prior to transmission (ie
> taken on the sender) or was it only seen after the data passed out the
> NIC?

I did the following:

1) turn on tso on the server's r8169: ethtool --offload eth0 tso on
2) on the server: tcpdump -i eth0 -s 0 -w <file>
3) ftp'ed the file to the 100mbit client

As expected the file was corrupted, and the various corrupted byte
sequences also show up in the tcpdump file at the corresponding offsets.
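The check for corrupted sequences in the capture can be approximated with a plain byte search. A minimal Python sketch (function names and the 64-byte sample length are assumptions): it takes a sample from the corrupted region of the received file and reports every position where that sample occurs in the raw tcpdump file. It does not parse the pcap format, so a sample that straddles two TCP segments will not be found (packet headers sit in between), and a very short sample may match by coincidence.

```python
import sys


def find_all(haystack, needle):
    """Return every offset at which `needle` occurs in `haystack` (overlaps allowed)."""
    if not needle:
        raise ValueError("needle must not be empty")
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits


def locate_in_capture(received_path, capture_path, offset, length=64):
    """Search the raw capture for `length` bytes taken at `offset` of the received file."""
    with open(received_path, "rb") as f:
        f.seek(offset)
        sample = f.read(length)
    with open(capture_path, "rb") as f:
        capture = f.read()
    return find_all(capture, sample)


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: locate.py <received file> <capture file> <offset, e.g. 0x8000>
    print(locate_in_capture(sys.argv[1], sys.argv[2], int(sys.argv[3], 0)))
```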

I did this with 2.6.22.14, so it does not seem to be a recent regression
in .23/.24.

All files can be found here:
http://hoho.dyndns.org/~holger/dist/r8169-tso/

I will gladly try out any other tweaks but need some guidance, as I don't
know exactly what to change - maybe build the r8169 without NAPI?

thank you
Holger


Re: Reproducible data corruption with sendfile+vsftp - splice regression?

2007-11-30 Thread Holger Hoffstaette

Btw, the r8169 has NAPI enabled.

kernel config:
http://hoho.dyndns.org/~holger/dist/kernel-config-x86-2.6.23.9

dmesg:
http://hoho.dyndns.org/~holger/dist/dmesg

lspci -vv:
http://hoho.dyndns.org/~holger/dist/lspci

thanks
Holger


Re: Reproducible data corruption with sendfile+vsftp - splice regression?

2007-11-30 Thread Holger Hoffstaette
On Fri, 30 Nov 2007 09:07:53 +0100, Eric Dumazet wrote:

> CC to netdev, it might concern network guys

It is indeed related to network/r8169, more below.

> Could you try with a test file containing unique patterns ?

Same result; here is some new information.

- contrary to my first posting, the corruption does not reliably occur
when a second client pulls the file; sorry for that. The difference is
that the box that gets corrupted data only has a 100mbit interface, while
the one that gets working data is completely gigabit (all on the same
switch though).

- after some digging in my server changelogs I noticed that I had enabled
various r8169 offload options not too long ago (while migrating to gigabit
and perftesting the new network), and bingo! Turning off tso (leaving all
others on, except for UDP, which is apparently not implemented) made the
corruption go away while ftp'ing to the slower 100mbit client.

I have since permanently disabled tso, and everything is fine with and
without sendfile. So this seems to be either a bug in the r8169 or some
bad interaction of tso with sendfile, but maybe it's just the symptom of a
race condition/timing problem. Is tso on the r8169 known to be kaput?

lspci says:

00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 17
	I/O ports at d000 [size=256]
	Memory at f6022000 (32-bit, non-prefetchable) [size=256]
	[virtual] Expansion ROM at 6000 [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2

Further suggestions are welcome; it looks like we're getting somewhere.
I can still create broken files with TSO on and the unique patterns that
Eric suggested, if that helps track down the TSO corruption.
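Eric's point about unique patterns can be illustrated with a small Python sketch: if every line carries its own line number, the first byte where the received copy diverges pins down exactly where data went missing, which a file of identical lines cannot do. The line format below (an 8-digit line number, a colon, 40 filler digits) is a hypothetical choice, not the pattern actually used in this thread.

```python
def unique_pattern(num_lines: int) -> bytes:
    """Build test data in which every 50-byte line is distinct: a
    zero-padded line number, a colon, and 40 filler digits."""
    return b"".join(
        b"%08d:0123456789012345678901234567890123456789\n" % i
        for i in range(num_lines)
    )


def first_mismatch(expected: bytes, received: bytes):
    """Return (byte_offset, line_index) of the first byte where the two
    copies differ (0-based), or None if they are identical."""
    for offset, (a, b) in enumerate(zip(expected, received)):
        if a != b:
            return offset, expected.count(b"\n", 0, offset)
    if len(expected) != len(received):
        # One copy is a prefix of the other: the difference starts at
        # the end of the shorter one.
        offset = min(len(expected), len(received))
        return offset, expected.count(b"\n", 0, offset)
    return None
```

Because the embedded line numbers keep increasing, a skipped run of bytes also shows up as a visible jump in the numbering when the received file is inspected by eye.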

thank you!
Holger




Re: Reproducible data corruption with sendfile+vsftp - splice regression?

2007-11-30 Thread Holger Hoffstaette

Btw, the r8169 has NAPI enabled.

kernel config:
http://hoho.dyndns.org/~holger/dist/kernel-config-x86-2.6.23.9

dmesg:
http://hoho.dyndns.org/~holger/dist/dmesg

lspci -vv:
http://hoho.dyndns.org/~holger/dist/lspci

thanks
Holger




Reproducible data corruption with sendfile+vsftp - splice regression?

2007-11-29 Thread Holger Hoffstaette

Hi -

This regular Linux user and lkml lurker just noticed data corruption in
ftp'ed files and narrowed it down to vsftpd using sendfile(). This has
never caused problems before; I have not noticed it with 2.6.22.x, but I
may have missed it. I do remember reading about some changes to the
underlying splice code in .23, so that may have something to do with it.

The scenario:

- created a file with a known bit pattern on the Linux server
- fetched this file via ftp to a Windows client: the file has a bad CRC (yes, binary mode)
- verified with another client: same result
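The "bad crc" check from the steps above can be repeated on any machine with Python's `zlib.crc32`. This is a minimal sketch, assuming both copies are reachable as local paths; `crc32_of` and `same_crc` are hypothetical helper names, and plain CRC-32 may not be the same checksum the Windows client reported.

```python
import zlib


def crc32_of(path: str, chunk_size: int = 1 << 16) -> int:
    """Compute the CRC-32 of a file incrementally, chunk by chunk."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # continue from the running value
    return crc


def same_crc(original: str, received: str) -> bool:
    """True if both files have the same CRC-32."""
    return crc32_of(original) == crc32_of(received)
```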

I have thus far eliminated (to the best of my knowledge) NICs, switches,
cables, the Windows FTP clients, and the hard disk in the server (SATA,
ext3): there is nothing suspicious in any logs. The box is an AMD Sempron
2600+ with 1.5 GB RAM, an added RTL8169 card, Gentoo, and vsftpd stable
2.0.5 - nothing fancy.
Transferring the file with samba (interestingly, with sendfile enabled) and
via ftp, but from /dev/shm, repeatedly works fine; pulling it from disk
creates a bad CRC every time. The file is readable and can be copied,
verified etc. over and over, so I'm sure that I'm not falling prey to a
false positive. ifconfig indicates no dropped or otherwise corrupted packets.
I first noticed this with 2.6.24-rc3, but also just tried the latest stable
2.6.23.9 with the same config, with no change in behaviour. After setting
vsftpd to use_sendfile=NO, gigabytes can be transferred without corruption.
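To compare the two transfer paths outside vsftpd, here is a rough Python sketch: one branch uses `os.sendfile()` (a thin wrapper over sendfile(2)), the other a read()/write() loop through a user-space buffer, similar in spirit to `use_sendfile=NO`. This is an illustration only, not vsftpd's actual code; the function name `send_file` is hypothetical, and `os.sendfile` is only available on some platforms (e.g. Linux).

```python
import os
import socket


def send_file(sock: socket.socket, path: str, use_sendfile: bool,
              chunk: int = 1 << 16) -> int:
    """Send the whole file at `path` over `sock`; return bytes sent.

    use_sendfile=True: zero-copy path via sendfile(2).
    use_sendfile=False: plain read() + send() loop through a buffer.
    """
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if use_sendfile:
            while sent < size:
                n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
                if n == 0:
                    break  # EOF reached early, e.g. file truncated meanwhile
                sent += n
        else:
            while data := f.read(chunk):
                sock.sendall(data)
                sent += len(data)
    return sent
```

Running both branches against the same file and diffing what arrives on the other end is one way to check whether the corruption follows the sendfile path.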

The data corruption is sporadic, but absolutely repeatable. The file with
the known good pattern just contains multiple lines of:

012345678901234567890123456789012345678901234567890
012345678901234567890123456789012345678901234567890
012345678901234567890123456789012345678901234567890
..etc..

A corrupted file is missing random characters, so that the corrupted lines
look like this (line numbers added by me):

19785: 012345678901234567890123456789012345678901234567890
19786: 01234567890123456789012345678901234567890123678901234567890
19787: 012345678901234567890123456789012345678901234567890

or:

20074: 012345678901234567890123456789012345678901234567890
20075: 01234567890123456789012345678901234567890123012345678901234567890123456789012345678901234567890
20076: 012345678901234567890123456789012345678901234567890
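A quick way to list the damaged lines in a received copy of the pattern file described above is to compare every line against the known-good 51-digit line. This is a minimal Python sketch; `bad_lines` is a hypothetical helper, and it assumes the file uses plain `\n` line endings (binary-mode transfer).

```python
EXPECTED = b"012345678901234567890123456789012345678901234567890"


def bad_lines(data: bytes) -> list:
    """Return (line_number, length) pairs, 1-based, for every line that
    is not exactly the known-good pattern."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the file ends with a newline: drop the empty tail
    return [(n, len(line)) for n, line in enumerate(lines, start=1)
            if line != EXPECTED]
```

Because every line is identical, such a check can flag which lines came out wrong but cannot tell which bytes were lost; embedding a per-line marker in the test data removes that ambiguity.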

Again, other network or hd traffic shows no signs of gremlins; the box is
perfectly stable, and turning sendfile on or off reliably triggers or
eliminates the corruption. I will try 2.6.22.x over the weekend; before I
bother lkml with dmesg/.config etc., I wanted to fish for initial thoughts.

thanks
Holger



