Re: slab corruption in skb allocs
Richard Fuchs <[EMAIL PROTECTED]> wrote:
>
> the memory allocation debugger gives me the following messages under a
> vanilla 2.6.10 and 2.6.11 kernel when doing
>
> 1) hdparm -d0 on my hard disk
> 2) tar c / > /dev/null
> 3) sending lots of network traffic to the machine (e.g. close to 100
> mbit/s udp packets)
>

We ended up deciding that this was a bug in the e100 NAPI implementation. I have a not-very-official patch in -mm, at ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc1/2.6.12-rc1-mm1/broken-out/e100-napi-state-machine-fix.patch.

Would you be able to test that?

AFAIK there has been no official fix for this yet.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: slab corruption in skb allocs
Scott Feldman wrote:
> Would you mind giving this patch a try against 2.6.11? I think it's
> equivalent to Jesse's patch, but less intrusive to the driver.

looks good, no more memory corruption errors. thanks for this.

cheers
richard
Re: slab corruption in skb allocs
On Mar 6, 2005, at 10:40 AM, Richard Fuchs wrote:

> Scott Feldman wrote:
>
>> A bug in the driver. I have a hunch: please try this patch with
>> 2.6.9 or higher:
>>
>> http://marc.theaimsgroup.com/?l=linux-netdev&m=110726809431611&w=2
>
> bingo, that fixes it. too bad neither this patch nor the removal of the
> NAPI config option made it into 2.6.11...

Jesse Brandeburg @ Intel found the fix for the bug but I don't think it's been pushed out to Jeff's tree yet, AFAIK. Soon, I would guess.

Would you mind giving this patch a try against 2.6.11? I think it's equivalent to Jesse's patch, but less intrusive to the driver.

--- linux-2.6.11/drivers/net/e100.c.orig	Sun Mar 6 20:58:15 2005
+++ linux-2.6.11/drivers/net/e100.c	Sun Mar 6 21:01:34 2005
@@ -1471,8 +1471,12 @@ static inline int e100_rx_indicate(struc
 
 	/* If data isn't ready, nothing to indicate */
 	if(unlikely(!(rfd_status & cb_complete)))
-		return -EAGAIN;
+		return -ENODATA;
 
+	/* This allows for a fast restart without re-enabling interrupts */
+	if(le16_to_cpu(rfd->command) & cb_el)
+		nic->ru_running = 0;
+
 	/* Get actual data size */
 	actual_size = le16_to_cpu(rfd->actual_size) & 0x3FFF;
 	if(unlikely(actual_size > RFD_BUF_LEN - sizeof(struct rfd)))
@@ -1527,7 +1531,11 @@ static inline void e100_rx_clean(struct
 			break; /* Better luck next time (see watchdog) */
 	}
 
-	e100_start_receiver(nic);
+	/* NAPI: attempt to restart the receiver iff the list is
+	 * totally clean otherwise we'll race between hardware and
+	 * nic->rx_to_clean. */
+	if(!work_done || *work_done == 0)
+		e100_start_receiver(nic);
 }
 
 static void e100_rx_clean_list(struct nic *nic)

>> No. e1000 is a totally different driver/device with very similar name.
>
> too bad, i was hoping for an explanation for some unexplainable crashes
> i've been experiencing... ;)

Take the e1000 issue to linux-netdev.

-scott
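The second hunk gates `e100_start_receiver()` on the NAPI work counter. The toy model below is a minimal, hypothetical sketch of that policy, not driver code: `struct toy_nic`, `struct toy_rfd`, `toy_rx_indicate()` and `toy_rx_clean()` are invented names, only each descriptor's status word is modelled, and the hardware side of the race is not simulated at all. `CB_COMPLETE` reuses the value of the driver's `cb_complete` bit (0x8000), which is taken from e100.c rather than from the patch itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CB_COMPLETE 0x8000u /* same value as e100's cb_complete status bit */
#define TOY_RING_SIZE 4     /* hypothetical ring size, sketch only */

/* Simplified receive frame descriptor: only the status word is modelled. */
struct toy_rfd {
	unsigned short status;
};

struct toy_nic {
	struct toy_rfd ring[TOY_RING_SIZE];
	size_t rx_to_clean;    /* next descriptor the driver will examine */
	unsigned int restarts; /* how often the receive unit was (re)started */
};

/* Like the patched e100_rx_indicate(): an incomplete descriptor
 * means "nothing to indicate" and yields -ENODATA. */
static int toy_rx_indicate(struct toy_nic *nic, unsigned int *work_done)
{
	struct toy_rfd *rfd = &nic->ring[nic->rx_to_clean];

	if (!(rfd->status & CB_COMPLETE))
		return -ENODATA;

	rfd->status = 0; /* descriptor handed back to the (imaginary) NIC */
	nic->rx_to_clean = (nic->rx_to_clean + 1) % TOY_RING_SIZE;
	if (work_done)
		(*work_done)++;
	return 0;
}

/* Like the patched e100_rx_clean(): clean completed descriptors, then
 * restart the receiver only if this poll cleaned nothing. */
static void toy_rx_clean(struct toy_nic *nic, unsigned int *work_done,
			 unsigned int work_to_do)
{
	assert(nic != NULL);

	for (size_t i = 0; i < TOY_RING_SIZE; i++) {
		if (work_done && *work_done >= work_to_do)
			break; /* NAPI quota used up */
		if (toy_rx_indicate(nic, work_done) == -ENODATA)
			break; /* no more completed frames */
	}

	if (!work_done || *work_done == 0)
		nic->restarts++; /* stands in for e100_start_receiver(nic) */
}
```

With two completed descriptors, one poll cleans both and leaves the receive unit alone; the next poll finds nothing completed and performs the restart, so `restarts` goes from 0 to 1. That mirrors the patch comment's rule of restarting only once the list is totally clean.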
Re: slab corruption in skb allocs
Scott Feldman wrote:
> On Mar 5, 2005, at 11:10 AM, Richard Fuchs wrote:
>
>> looks like you are right, enabling NAPI in 2.6.7 does trigger this.
>> what exactly is this?
>
> A bug in the driver. I have a hunch: please try this patch with 2.6.9
> or higher:
>
> http://marc.theaimsgroup.com/?l=linux-netdev&m=110726809431611&w=2

bingo, that fixes it. too bad neither this patch nor the removal of the NAPI config option made it into 2.6.11...

>> also, does this affect the e1000 driver in any way?
>
> No. e1000 is a totally different driver/device with very similar name.

too bad, i was hoping for an explanation for some unexplainable crashes i've been experiencing... ;)

cheers
richard
Re: slab corruption in skb allocs
On Mar 5, 2005, at 11:10 AM, Richard Fuchs wrote:

> Scott Feldman wrote:
>
>> Was NAPI turned on for e100 in 2.6.7? If not, turn NAPI on in the
>> 2.6.7 driver and see if you get the same result. If you do, it's
>> very likely the bug is in the e100 driver's NAPI implementation.
>
> looks like you are right, enabling NAPI in 2.6.7 does trigger this.
> what exactly is this?

A bug in the driver. I have a hunch: please try this patch with 2.6.9 or higher:

http://marc.theaimsgroup.com/?l=linux-netdev&m=110726809431611&w=2

> i didn't enable NAPI in any of the newer kernel versions i was trying,
> so i'm somewhat confused. :)

NAPI is the only option for new kernels. 2.6.7 had both NAPI and non-NAPI.

> also, does this affect the e1000 driver in any way?

No. e1000 is a totally different driver/device with very similar name.

-scott
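Scott's diagnosis points at the e100 receive path. As a hedged illustration, the bytes that survive in the freed objects of Richard's original report can be read as an e100 receive frame descriptor (RFD). The sketch below assumes the layout of `struct rfd` in drivers/net/e100.c (little-endian `status`, `command`, `link`, `rbd`, `actual_size`, `size`; 16 bytes) and assumes the descriptor starts at object offset 0x12; `struct rfd_view`, `decode_rfd()` and the other helpers are hypothetical. The 0x3FFF mask is the one used in the patch context, and `CB_COMPLETE` is the value of the driver's `cb_complete` bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CB_COMPLETE 0x8000u /* value of e100's cb_complete status bit */

/* Host-order copy of the 16-bit RFD fields; the 16-byte layout assumed
 * here is status(2) command(2) link(4) rbd(4) actual_size(2) size(2). */
struct rfd_view {
	uint16_t status;
	uint16_t command;
	uint16_t actual_size; /* raw value, including the two flag bits */
	uint16_t size;
};

/* Read a little-endian 16-bit value, as le16_to_cpu() would on x86. */
static uint16_t le16_at(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a 16-byte descriptor starting at p (link and rbd are skipped). */
static struct rfd_view decode_rfd(const unsigned char *p)
{
	struct rfd_view v;

	assert(p != NULL);
	v.status = le16_at(p + 0);
	v.command = le16_at(p + 2);
	v.actual_size = le16_at(p + 12);
	v.size = le16_at(p + 14);
	return v;
}

static int rfd_complete(const struct rfd_view *v)
{
	return (v->status & CB_COMPLETE) != 0;
}

/* Received byte count, masked the same way as in e100_rx_indicate(). */
static unsigned int rfd_length(const struct rfd_view *v)
{
	return v->actual_size & 0x3FFFu;
}
```

Fed the 16 bytes at offsets 0x12 to 0x21 of the first entry (`20 a0`, ten `6b` bytes, `3b c0`, `6b 6b`), this gives status 0xa020 with the complete bit set, an actual size of 59 bytes after masking, and `command`/`size` still equal to the 0x6b6b poison. Only the fields the receiving hardware fills in have changed, which is consistent with, though not proof of, a frame being written into a buffer the driver had already freed.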
Re: slab corruption in skb allocs
Scott Feldman wrote:
> Was NAPI turned on for e100 in 2.6.7? If not, turn NAPI on in the
> 2.6.7 driver and see if you get the same result. If you do, it's very
> likely the bug is in the e100 driver's NAPI implementation.

looks like you are right, enabling NAPI in 2.6.7 does trigger this. what exactly is this? i didn't enable NAPI in any of the newer kernel versions i was trying, so i'm somewhat confused. :)

also, does this affect the e1000 driver in any way?

cheers
richard
Re: slab corruption in skb allocs
On Mar 4, 2005, at 4:23 AM, Richard Fuchs wrote:

> kernel 2.6.7 doesn't show this behavior, while all kernels from 2.6.9
> and up do. (i didn't test 2.6.8.x).

Was NAPI turned on for e100 in 2.6.7? If not, turn NAPI on in the 2.6.7 driver and see if you get the same result. If you do, it's very likely the bug is in the e100 driver's NAPI implementation.

-scott
Re: slab corruption in skb allocs
On Fri, Mar 04, 2005 at 10:19:21PM +0100, Richard Fuchs wrote:
> _correction_ to my previous mail, this does _not_ happen with the
> eepro100 driver. (sorry for the confusion, i got the kernel images mixed
> up with all the testing i've been doing.)
>
> could this affect the e1000 driver as well?

Yes.

> >Send the output of ethtool, please.

Doh. 'ethtool -k' is what's needed, sorry.

If it's reproducible, try turning off rx/tx hardware checksumming:

ethtool -k eth0 rx off tx off

--
Mathematics is the supreme nostalgia of our time.
Re: slab corruption in skb allocs
Matt Mackall wrote:
> Doh. 'ethtool -k' is what's needed, sorry.

doh myself. :) this won't be very helpful though, as i get the same on all machines (with both drivers):

Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
no offload info available

> ethtool -k eth0 rx off tx off

ditto. i'll try to reproduce this on a machine with e1000 though...

cheers
richard
Re: slab corruption in skb allocs
_correction_ to my previous mail, this does _not_ happen with the eepro100 driver. (sorry for the confusion, i got the kernel images mixed up with all the testing i've been doing.)

could this affect the e1000 driver as well?

Matt Mackall wrote:
> Send the output of ethtool, please.

box 1, affected:

Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Current message level: 0x20c1 (8385)
        Link detected: yes

box 2, affected:

        Supported ports: [ TP MII ]
        Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x0007 (7)
        Link detected: yes

box 3, not affected:

        Supported ports: [ TP MII ]
        Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x0007 (7)
        Link detected: yes

> This tends to be checksum offloading not working as it should or the
> like. Can you repeat this with bulk ssh traffic?

yes, with various strange effects:

Received disconnect from 195.58.172.154: 2: Bad packet length 919251405.

or

Received disconnect from 195.58.172.154: 2: Corrupted MAC on input.

cheers
richard
Re: slab corruption in skb allocs
Matt Mackall wrote:
> Which card/driver is this? Is this the same card that's showing ssh
> troubles? My theory about your ssh trouble only applies to cards with
> checksum offload.

i got the same on all three machines i was testing with, with both the e100 and the eepro100 driver. one of those three machines was the one with the ssh troubles, its card is identified as "Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)", pci id 8086:1229. plus, i couldn't reproduce those problems on a machine with e1000, which does support all kinds of checksum offloading. (there might still be something fishy with the e1000 as well, as i'm not entirely trusting the errors from the slab checkers alone. especially since i don't see those messages when i enable page alloc debugging.)

another machine behaves even more strangely... its nic is identified as "Intel Corp. 82801BD PRO/100 VE (LOM) Ethernet Controller (rev 81)", pci id 8086:1039, also apparently not supporting hardware checksums. it does immediately produce the slab debug errors when i bombard it with udp packets while having disk access w/o dma, but remains silent when doing the same with a tcp transfer instead of udp packets. neither ssh traffic nor /dev/zero piped through netcat (no matter in which direction) makes it catch any errors. i only got a _single_ message from the slab debugger when sending /dev/zero through netcat in _both_ directions at the same time (in and out). however, i do get pages and pages of those messages when sending a simple stream of udp packets to the box...

again, this is all with the e100 driver, i couldn't produce any similar results with the eepro100 or the e1000 driver yet, but apparently this doesn't necessarily mean that there isn't something wrong anyway...

cheers
richard
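Richard finds that a plain stream of UDP packets reproduces the problem far more readily than TCP. Reading the bytes left in the freed objects of his original report as a network frame, assumed here to begin at object offset 0x22 where the bytes `00 0b cd 1e 1f d2` start, shows what was written there. The sketch below is a minimal, hypothetical parser (`struct frame_info` and `parse_frame()` are invented names) that handles only Ethernet with at most one 802.1Q tag, followed by IPv4 and UDP; it is an interpretation of the dump, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frame_info {
	uint16_t vlan_id; /* 0 when the frame is untagged */
	uint8_t ip_proto; /* 17 for UDP */
	uint8_t src_ip[4];
	uint8_t dst_ip[4];
	uint16_t src_port;
	uint16_t dst_port;
	uint16_t udp_len;
};

/* Read a big-endian (network order) 16-bit value. */
static uint16_t be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse Ethernet [+ one 802.1Q tag] + IPv4 + UDP headers.
 * Returns 0 on success, -1 if the frame is too short or not IPv4/UDP. */
static int parse_frame(const unsigned char *f, size_t len,
		       struct frame_info *out)
{
	size_t off = 12; /* skip destination and source MAC addresses */
	uint16_t type;

	assert(f != NULL && out != NULL);
	if (len < off + 2)
		return -1;
	type = be16(f + off);
	out->vlan_id = 0;
	if (type == 0x8100) { /* 802.1Q: TCI, then the real EtherType */
		if (len < off + 6)
			return -1;
		out->vlan_id = be16(f + off + 2) & 0x0fff;
		type = be16(f + off + 4);
		off += 4;
	}
	off += 2; /* now at the network header */
	if (type != 0x0800 || len < off + 20)
		return -1;

	const unsigned char *ip = f + off;
	size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
	if ((ip[0] >> 4) != 4 || ihl < 20 || len < off + ihl + 8)
		return -1;
	out->ip_proto = ip[9];
	if (out->ip_proto != 17)
		return -1;
	for (int i = 0; i < 4; i++) {
		out->src_ip[i] = ip[12 + i];
		out->dst_ip[i] = ip[16 + i];
	}

	const unsigned char *udp = ip + ihl;
	out->src_port = be16(udp);
	out->dst_port = be16(udp + 2);
	out->udp_len = be16(udp + 4);
	return 0;
}
```

For the 46 bytes at offsets 0x22 to 0x4f of the first entry, this yields VLAN 223 and a UDP datagram from 192.168.34.29 port 32850 to 192.168.34.27 port 12312 with a UDP length of 8, i.e. an empty payload. Tiny datagrams like this fit the "simple stream of udp packets" that triggers the errors.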
Re: slab corruption in skb allocs
On Fri, Mar 04, 2005 at 10:52:38PM +0100, Richard Fuchs wrote:
> Matt Mackall wrote:
>
> >Doh. 'ethtool -k' is what's needed, sorry.
>
> doh myself. :) this won't be very helpful though, as i get the same on
> all machines (with both drivers):
>
> Offload parameters for eth0:
> Cannot get device rx csum settings: Operation not supported
> Cannot get device tx csum settings: Operation not supported
> Cannot get device scatter-gather settings: Operation not supported
> Cannot get device tcp segmentation offload settings: Operation not supported
> no offload info available

Which card/driver is this? Is this the same card that's showing ssh troubles? My theory about your ssh trouble only applies to cards with checksum offload.
Re: slab corruption in skb allocs
On Fri, Mar 04, 2005 at 01:23:48PM +0100, Richard Fuchs wrote:
> Andrew Morton wrote:
>
> >I guess it could be hardware. But given that disabling DMA _causes_ the
> >problem, rather than fixes it, it seems unlikely.
> >
> >Could you enable CONFIG_DEBUG_PAGEALLOC in .config and see if that triggers
> >an oops?
>
> by now, i could reproduce this on two different machines with quite
> different hardware, while a third doesn't seem to show those symptoms.
> on the second machine, i got the corruption errors from the slab
> debugger mostly from the disk access alone, the network traffic was only
> minimal (but still present). i was doing write operations on the hdd in
> this test.
>
> kernel 2.6.7 doesn't show this behavior, while all kernels from 2.6.9
> and up do. (i didn't test 2.6.8.x).
>
> as for DEBUG_PAGEALLOC... when i enable this option, the errors from
> DEBUG_SLAB magically disappear. however, my ssh session got disconnected
> once while doing the disk access with the message:
>
> Received disconnect from 195.58.172.154: 2: Bad packet length 4239103034.

Send the output of ethtool, please.

This tends to be checksum offloading not working as it should or the like. Can you repeat this with bulk ssh traffic?
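Matt's theory points at checksum offload. One way to test that idea against data already in the thread is to verify the checksums of the packet left behind in the freed objects of Richard's original report. The sketch below applies the RFC 1071 Internet checksum to the bytes at object offsets 0x34 to 0x47 (read as an IPv4 header, starting at the `45` in row 030) and 0x48 to 0x4f (read as a UDP header with no payload) of the first entry. Treating those bytes as IPv4/UDP is an interpretation, and `sum16()`, `fold()`, `ipv4_header_check()` and `udp_check()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words of buf to a running 32-bit sum. */
static uint32_t sum16(const unsigned char *buf, size_t len, uint32_t sum)
{
	assert(len % 2 == 0); /* the headers checked here have even length */
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	return sum;
}

/* Fold carries into the low 16 bits and take the one's complement. */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Returns 0 when a 20-byte IPv4 header (checksum field included) verifies. */
static uint16_t ipv4_header_check(const unsigned char ip[20])
{
	return fold(sum16(ip, 20, 0));
}

/* Returns 0 when an 8-byte, payload-free UDP datagram verifies, using the
 * IPv4 pseudo-header: source, destination, zero, protocol, UDP length. */
static uint16_t udp_check(const unsigned char ip[20], const unsigned char udp[8])
{
	unsigned char pseudo[12] = {
		ip[12], ip[13], ip[14], ip[15], /* source address */
		ip[16], ip[17], ip[18], ip[19], /* destination address */
		0, ip[9],                       /* zero byte, protocol */
		udp[4], udp[5],                 /* UDP length from the header */
	};
	return fold(sum16(udp, 8, sum16(pseudo, 12, 0)));
}
```

Both checks return 0 for the first entry, and the IPv4 header check also passes for the other entries, whose identification and checksum fields change together. That suggests the frame itself arrived intact, which fits Matt's later remark that his theory only applies to cards with checksum offload.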
Re: slab corruption in skb allocs
Richard Fuchs wrote:
> [e100] i will try again the eepro100 driver and see if it does the same...

yes, the same thing happens with the eepro100 driver.

cheers
richard
Re: slab corruption in skb allocs
Dave Jones wrote:
> Which network drivers are in use on the box that gets the corruption?

all three that i tested it on are using the e100 driver. the boxes with pci id 8086:1039 and 8086:1229 are seeing corruptions, the one with pci id 8086:2449 is not.

i will try again the eepro100 driver and see if it does the same...

cheers
richard
Re: slab corruption in skb allocs
On Fri, Mar 04, 2005 at 10:55:31AM +0100, Richard Fuchs wrote:
> hello all!
>
> the memory allocation debugger gives me the following messages under a
> vanilla 2.6.10 and 2.6.11 kernel when doing
>
> 1) hdparm -d0 on my hard disk
> 2) tar c / > /dev/null
> 3) sending lots of network traffic to the machine (e.g. close to 100
> mbit/s udp packets)

Which network drivers are in use on the box that gets the corruption?

Dave
Re: slab corruption in skb allocs
Andrew Morton wrote:
> I guess it could be hardware. But given that disabling DMA _causes_ the
> problem, rather than fixes it, it seems unlikely.
>
> Could you enable CONFIG_DEBUG_PAGEALLOC in .config and see if that
> triggers an oops?

by now, i could reproduce this on two different machines with quite different hardware, while a third doesn't seem to show those symptoms. on the second machine, i got the corruption errors from the slab debugger mostly from the disk access alone, the network traffic was only minimal (but still present). i was doing write operations on the hdd in this test.

kernel 2.6.7 doesn't show this behavior, while all kernels from 2.6.9 and up do. (i didn't test 2.6.8.x).

as for DEBUG_PAGEALLOC... when i enable this option, the errors from DEBUG_SLAB magically disappear. however, my ssh session got disconnected once while doing the disk access with the message:

Received disconnect from 195.58.172.154: 2: Bad packet length 4239103034.

never seen this before and not sure if this has anything to do with it...

cheers
richard
Re: slab corruption in skb allocs
Richard Fuchs <[EMAIL PROTECTED]> wrote:
>
> hello all!
>
> the memory allocation debugger gives me the following messages under a
> vanilla 2.6.10 and 2.6.11 kernel when doing
>
> 1) hdparm -d0 on my hard disk
> 2) tar c / > /dev/null
> 3) sending lots of network traffic to the machine (e.g. close to 100
> mbit/s udp packets)
>
> -
> Slab corruption: start=de9141a4, len=2048
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
> 010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
> 020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
> 030: 00 df 08 00 45 00 00 1c 41 d0 40 00 40 11 33 78
> 040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
> ...
>
> and so on. the disk activity alone or the network traffic alone doesn't
> trigger this. also doing the same with dma enabled doesn't trigger this
> either, but when everything comes together i get this within a second.
> kernel is not smp and preempt is not enabled.
>
> kernel config (from 2.6.11) is attached; if you need any more info, let
> me know. is this a kernel issue, or could the hardware be at fault?

I guess it could be hardware. But given that disabling DMA _causes_ the problem, rather than fixes it, it seems unlikely.

Could you enable CONFIG_DEBUG_PAGEALLOC in .config and see if that triggers an oops?
slab corruption in skb allocs
hello all!

the memory allocation debugger gives me the following messages under a vanilla 2.6.10 and 2.6.11 kernel when doing

1) hdparm -d0 on my hard disk
2) tar c / > /dev/null
3) sending lots of network traffic to the machine (e.g. close to 100 mbit/s udp packets)

-
Slab corruption: start=de9141a4, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 41 d0 40 00 40 11 33 78
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Next obj: start=de9149b0, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=de92e8b0, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3c c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 41 cf 40 00 40 11 33 79
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Prev obj: start=de92e0a4, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=de92f0bc, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=def5e3a4, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c e3 14 40 00 40 11 92 33
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Next obj: start=def5ebb0, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=de938b30, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3c c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c e3 13 40 00 40 11 92 34
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Prev obj: start=de938324, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=de93933c, len=2048
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c03b7e97>](alloc_skb+0x47/0xf0)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 20 a0 00 00 42 cd a9 1e ff ff ff ff 3c c0
Slab corruption: start=de96aa30, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 0d e4 40 00 40 11 67 64
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Prev obj: start=de96a224, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=de96b23c, len=2048
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c03b7e97>](alloc_skb+0x47/0xf0)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 00 00 00 00 b6 81 15 1f ff ff ff ff 00 00
Slab corruption: start=de8fa5a4, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3c c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 0d e3 40 00 40 11 67 65
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Next obj: start=de8fadb0, len=2048
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c03b7e97>](alloc_skb+0x47/0xf0)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 00 00 00 00 ce 96 92 1e ff ff ff ff 00 00
-

and so on. the disk activity alone or the network traffic alone doesn't trigger this. also doing the same with dma enabled doesn't trigger this either, but when everything comes together i get this within a second. kernel is not smp and preempt is not enabled.

kernel config (from 2.6.11) is attached; if you need any more info, let me know. is this a kernel issue, or could the hardware be at fault?

cheers
richard
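These reports rely on slab poisoning. With slab debugging enabled, a freed object is filled with 0x6b and a freshly allocated one with 0x5a, so any byte in a freed object that is no longer 0x6b was written after the free. A minimal sketch of that check for a single dump row follows. `find_unpoisoned()` is a hypothetical helper, not a kernel function, and it takes bytes that have already been parsed out of the log rather than the log text itself.

```c
#include <assert.h>
#include <stddef.h>

#define SLAB_POISON_FREE 0x6b /* fill byte of a freed, poisoned object */

/* Scan one parsed dump row of a freed object and record the object
 * offsets whose byte is no longer the free poison. `base` is the
 * offset printed at the start of the row (e.g. 0x10 for "010:").
 * Returns how many offsets were stored in `out` (at most `max`). */
static size_t find_unpoisoned(const unsigned char *row, size_t len,
			      size_t base, size_t *out, size_t max)
{
	size_t n = 0;

	assert(row != NULL || len == 0);
	for (size_t i = 0; i < len && n < max; i++) {
		if (row[i] != SLAB_POISON_FREE)
			out[n++] = base + i;
	}
	return n;
}
```

For row 010 of the first entry this yields offsets 0x12, 0x13, 0x1e and 0x1f, and rows 020 to 050 are mostly overwritten. The debugger apparently prints only rows that contain such bytes, which would explain why row 000 is missing from each "Slab corruption" entry. In the 2.6 slab allocator's red-zone convention, `0x5a2cf071` marks an object as inactive (free) and `0x170fc2a5` as active, matching the `kfree_skbmem` and `alloc_skb` last users respectively.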
slab corruption in skb allocs
hello all! the memory allocation debugger gives me the following messages under a vanilla 2.6.10 and 2.6.11 kernel when doing 1) hdparm -d0 on my hard disk 2) tar c / /dev/null 3) sending lots of network traffic to the machine (e.g. close to 100 mbit/s udp packets) - Slab corruption: start=de9141a4, len=2048 Redzone: 0x5a2cf071/0x5a2cf071. Last user: [c03b8163](kfree_skbmem+0x13/0x30) 010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0 020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00 030: 00 df 08 00 45 00 00 1c 41 d0 40 00 40 11 33 78 040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b Next obj: start=de9149b0, len=2048 Redzone: 0x5a2cf071/0x5a2cf071. Last user: [c03b8163](kfree_skbmem+0x13/0x30) 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Slab corruption: start=de92e8b0, len=2048 Redzone: 0x5a2cf071/0x5a2cf071. Last user: [c03b8163](kfree_skbmem+0x13/0x30) 010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3c c0 020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00 030: 00 df 08 00 45 00 00 1c 41 cf 40 00 40 11 33 79 040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b Prev obj: start=de92e0a4, len=2048 Redzone: 0x5a2cf071/0x5a2cf071. Last user: [c03b8163](kfree_skbmem+0x13/0x30) 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Next obj: start=de92f0bc, len=2048 Redzone: 0x5a2cf071/0x5a2cf071. Last user: [c03b8163](kfree_skbmem+0x13/0x30) 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b Slab corruption: start=def5e3a4, len=2048 Redzone: 0x5a2cf071/0x5a2cf071. 
Last user: [c03b8163](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c e3 14 40 00 40 11 92 33
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Next obj: start=def5ebb0, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [c03b8163](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=de938b30, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [c03b8163](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3c c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c e3 13 40 00 40 11 92 34
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Prev obj: start=de938324, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [c03b8163](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=de93933c, len=2048
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [c03b7e97](alloc_skb+0x47/0xf0)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 20 a0 00 00 42 cd a9 1e ff ff ff ff 3c c0
Slab corruption: start=de96aa30, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [c03b8163](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 0d e4 40 00 40 11 67 64
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Prev obj: start=de96a224, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [c03b8163](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=de96b23c, len=2048
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [c03b7e97](alloc_skb+0x47/0xf0)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 00 00 00 00 b6 81 15 1f ff ff ff ff 00 00
Slab corruption: start=de8fa5a4, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [c03b8163](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3c c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 0d e3 40 00 40 11 67 65
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Next obj: start=de8fadb0, len=2048
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [c03b7e97](alloc_skb+0x47/0xf0)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 00 00 00 00 ce 96 92 1e ff ff ff ff 00 00
... and so on.

the disk activity alone or the network traffic alone doesn't trigger this. also doing the same with dma enabled doesn't trigger this either, but when everything comes together i get this within a second.

kernel is not smp and preempt is not enabled. kernel config (from 2.6.11) is attached; if you need any more info, let me know.

is this a kernel issue, or could the hardware be at fault?
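The bytes that replaced the poison in these freed objects do not look random. Reading them as packet data is my own interpretation, not something stated in the thread: from offset 0x22 there appear to be two MAC addresses, what looks like an 802.1Q tag (81 00), ethertype 08 00, and at offset 0x34 a 20-byte IPv4 header (45 00 00 1c ..., protocol 0x11 = UDP) from 192.168.34.29 to 192.168.34.27. One way to test that reading is to verify the IPv4 header checksum (RFC 1071). A minimal C sketch; inet_checksum and looks_like_ipv4_udp are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's-complement sum of big-endian 16-bit
 * words. Over a header that already holds a correct checksum, the result
 * is 0. */
uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)                        /* pad a trailing odd byte */
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical check: do these bytes form an option-less IPv4 header that
 * carries UDP and has a valid header checksum? */
int looks_like_ipv4_udp(const uint8_t *p, size_t len)
{
    if (len < 20 || p[0] != 0x45)       /* version 4, 20-byte header */
        return 0;
    if (p[9] != 17)                     /* protocol 17 = UDP */
        return 0;
    return inet_checksum(p, 20) == 0;
}
```

Applied to the 20 bytes at offset 0x34 of each corrupted object quoted above, the header checksum verifies in every case; the reports differ only in the IP ID and checksum bytes. A valid 28-byte IPv4/UDP packet sitting in freed skb memory would fit received data being written into a buffer after it was freed, though this sketch cannot say which component did the writing.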
Re: slab corruption in skb allocs
Richard Fuchs [EMAIL PROTECTED] wrote:
> hello all!
>
> the memory allocation debugger gives me the following messages under a vanilla 2.6.10 and 2.6.11 kernel when doing
>
> 1) hdparm -d0 on my hard disk
> 2) tar c / > /dev/null
> 3) sending lots of network traffic to the machine (e.g. close to 100 mbit/s udp packets)
>
> Slab corruption: start=de9141a4, len=2048
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [c03b8163](kfree_skbmem+0x13/0x30)
> 010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
> 020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
> 030: 00 df 08 00 45 00 00 1c 41 d0 40 00 40 11 33 78
> 040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
> ... and so on.
>
> the disk activity alone or the network traffic alone doesn't trigger this. also doing the same with dma enabled doesn't trigger this either, but when everything comes together i get this within a second.
>
> kernel is not smp and preempt is not enabled. kernel config (from 2.6.11) is attached; if you need any more info, let me know.
>
> is this a kernel issue, or could the hardware be at fault?

I guess it could be hardware. But given that disabling DMA _causes_ the problem, rather than fixes it, it seems unlikely.

Could you enable CONFIG_DEBUG_PAGEALLOC in .config and see if that triggers an oops?
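CONFIG_DEBUG_PAGEALLOC works by unmapping pages from the kernel's linear mapping when they are freed, so a later CPU access to them faults (an oops) instead of silently corrupting memory. The same idea can be tried in userspace with mprotect(). This is only an analogue for illustration, not kernel code, and stray_write_traps is a hypothetical name.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

/* SIGSEGV handler: jump back out of the faulting write. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_jmp, 1);
}

/* Hypothetical demo: "free" a page by removing all access to it, the way
 * DEBUG_PAGEALLOC unmaps freed pages, then try a late write to it.
 * Returns 1 if the stray write faulted, 0 if it went through unnoticed,
 * -1 if the page or handler could not be set up. */
int stray_write_traps(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf[0] = 0x6b;                      /* use the page normally */

    /* "free" it: any further CPU access should now fault */
    if (mprotect(buf, page, PROT_NONE) != 0) {
        munmap(buf, page);
        return -1;
    }

    struct sigaction sa, old;
    sa.sa_handler = on_segv;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(buf, page);
        return -1;
    }

    volatile int trapped = 0;
    if (sigsetjmp(fault_jmp, 1) == 0)
        ((volatile unsigned char *)buf)[0] = 0x5a;   /* use after free */
    else
        trapped = 1;                    /* reached via siglongjmp */

    sigaction(SIGSEGV, &old, NULL);     /* restore the previous handler */
    munmap(buf, page);
    return trapped;
}
```

One limitation to keep in mind: this only traps accesses made through the CPU's page tables. As far as I understand, a bus-master device writing into a freed buffer by DMA uses physical addresses and would not fault either way, so a quiet DEBUG_PAGEALLOC run does not by itself rule out a device writing into freed memory.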
Re: slab corruption in skb allocs
Andrew Morton wrote:
> I guess it could be hardware. But given that disabling DMA _causes_ the problem, rather than fixes it, it seems unlikely.
>
> Could you enable CONFIG_DEBUG_PAGEALLOC in .config and see if that triggers an oops?

by now, i could reproduce this on two different machines with quite different hardware, while a third doesn't seem to show those symptoms. on the second machine, i got the corruption errors from the slab debugger mostly from the disk access alone; the network traffic was only minimal (but still present). i was doing write operations on the hdd in this test.

kernel 2.6.7 doesn't show this behavior, while all kernels from 2.6.9 and up do. (i didn't test 2.6.8.x.)

as for DEBUG_PAGEALLOC... when i enable this option, the errors from DEBUG_SLAB magically disappear. however, my ssh session got disconnected once while doing the disk access, with the message:

Received disconnect from 195.58.172.154: 2: Bad packet length 4239103034.

never seen this before and not sure if this has anything to do with it...

cheers
richard
Re: slab corruption in skb allocs
On Fri, Mar 04, 2005 at 10:55:31AM +0100, Richard Fuchs wrote:
> hello all!
>
> the memory allocation debugger gives me the following messages under a vanilla 2.6.10 and 2.6.11 kernel when doing
>
> 1) hdparm -d0 on my hard disk
> 2) tar c / > /dev/null
> 3) sending lots of network traffic to the machine (e.g. close to 100 mbit/s udp packets)

Which network drivers are in use on the box that gets the corruption?

Dave
Re: slab corruption in skb allocs
Dave Jones wrote:
> Which network drivers are in use on the box that gets the corruption?

all three that i tested it on are using the e100 driver. the boxes with pci id 8086:1039 and 8086:1229 are seeing corruptions, the one with pci id 8086:2449 is not.

i will try the eepro100 driver again and see if it does the same...

cheers
richard
Re: slab corruption in skb allocs
Richard Fuchs wrote:
> [e100]
> i will try the eepro100 driver again and see if it does the same...

yes, the same thing happens with the eepro100 driver.

cheers
richard
Re: slab corruption in skb allocs
On Fri, Mar 04, 2005 at 01:23:48PM +0100, Richard Fuchs wrote:
> Andrew Morton wrote:
> > I guess it could be hardware. But given that disabling DMA _causes_ the problem, rather than fixes it, it seems unlikely.
> >
> > Could you enable CONFIG_DEBUG_PAGEALLOC in .config and see if that triggers an oops?
>
> by now, i could reproduce this on two different machines with quite different hardware, while a third doesn't seem to show those symptoms. on the second machine, i got the corruption errors from the slab debugger mostly from the disk access alone; the network traffic was only minimal (but still present). i was doing write operations on the hdd in this test.
>
> kernel 2.6.7 doesn't show this behavior, while all kernels from 2.6.9 and up do. (i didn't test 2.6.8.x.)
>
> as for DEBUG_PAGEALLOC... when i enable this option, the errors from DEBUG_SLAB magically disappear. however, my ssh session got disconnected once while doing the disk access, with the message:
>
> Received disconnect from 195.58.172.154: 2: Bad packet length 4239103034.

Send the output of ethtool, please.

This tends to be checksum offloading not working as it should or the like. Can you repeat this with bulk ssh traffic?

-- 
Mathematics is the supreme nostalgia of our time.
Re: slab corruption in skb allocs
On Fri, Mar 04, 2005 at 10:52:38PM +0100, Richard Fuchs wrote:
> Matt Mackall wrote:
> > Doh. 'ethtool -k' is what's needed, sorry.
>
> doh myself. :)
>
> this won't be very helpful though, as i get the same on all machines (with both drivers):
>
> Offload parameters for eth0:
> Cannot get device rx csum settings: Operation not supported
> Cannot get device tx csum settings: Operation not supported
> Cannot get device scatter-gather settings: Operation not supported
> Cannot get device tcp segmentation offload settings: Operation not supported
> no offload info available

Which card/driver is this? Is this the same card that's showing ssh troubles? My theory about your ssh trouble only applies to cards with checksum offload.
Re: slab corruption in skb allocs
Matt Mackall wrote:
> Which card/driver is this? Is this the same card that's showing ssh troubles? My theory about your ssh trouble only applies to cards with checksum offload.

i got the same on all three machines i was testing with, with both the e100 and the eepro100 driver. one of those three machines was the one with the ssh troubles; its card is identified as Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08), pci id 8086:1229.

plus, i couldn't reproduce those problems on a machine with e1000, which does support all kinds of checksum offloading. (there might still be something fishy with the e1000 as well, as i'm not entirely trusting the errors from the slab checkers alone, especially since i don't see those messages when i enable page alloc debugging.)

another machine behaves even more strangely... its nic is identified as Intel Corp. 82801BD PRO/100 VE (LOM) Ethernet Controller (rev 81), pci id 8086:1039, also apparently not supporting hardware checksums. it does immediately produce the slab debug errors when i bombard it with udp packets while having disk access w/o dma, but remains silent when doing the same with a tcp transfer instead of udp packets. neither ssh traffic nor /dev/zero piped through netcat (no matter in which direction) makes it catch any errors. i only got a _single_ message from the slab debugger when sending /dev/zero through netcat in _both_ directions at the same time (in and out). however, i do get pages and pages of those messages when sending a simple stream of udp packets to the box...

again, this is all with the e100 driver; i couldn't produce any similar results with the eepro100 or the e1000 driver yet, but apparently this doesn't necessarily mean that there isn't something wrong anyway...
cheers
richard
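The reproduction depends on a steady stream of UDP packets, and the frames visible in the corrupted objects appear to be UDP datagrams with no payload (IPv4 total length 0x1c, UDP length 8). The thread does not say which tool generated them, so this is only a minimal C sketch of such a sender; send_udp_burst is a hypothetical name, and the destination address and port are placeholders the caller supplies.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send `count` UDP datagrams with an empty payload to
 * ip:port. Each is a 28-byte IPv4 packet (20-byte IP header plus 8-byte UDP
 * header), like the ones that seem to show up in the slab dumps.
 * Returns the number of datagrams the kernel accepted, or -1 if the address
 * or socket could not be set up. */
long send_udp_burst(const char *ip, unsigned short port, long count)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    long sent = 0;
    for (long i = 0; i < count; i++) {
        /* zero-length payload: sendto() returns 0 on success */
        if (sendto(fd, "", 0, 0, (const struct sockaddr *)&dst, sizeof dst) == 0)
            sent++;
    }
    close(fd);
    return sent;
}
```

It sends as fast as the socket allows and does no pacing, so on its own it will not hold a steady rate close to 100 Mbit/s.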
Re: slab corruption in skb allocs
_correction_ to my previous mail, this does _not_ happen with the eepro100 driver. (sorry for the confusion, i got the kernel images mixed up with all the testing i've been doing.)

could this affect the e1000 driver as well?

Matt Mackall wrote:
> Send the output of ethtool, please.

box 1, affected:

Settings for eth0:
    Supported ports: [ TP MII ]
    Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Current message level: 0x20c1 (8385)
    Link detected: yes

box 2, affected:

    Supported ports: [ TP MII ]
    Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: g
    Current message level: 0x0007 (7)
    Link detected: yes

box 3, not affected:

    Supported ports: [ TP MII ]
    Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: g
    Current message level: 0x0007 (7)
    Link detected: yes

> This tends to be checksum offloading not working as it should or the like. Can you repeat this with bulk ssh traffic?

yes, with various strange effects:

Received disconnect from 195.58.172.154: 2: Bad packet length 919251405.

or

Received disconnect from 195.58.172.154: 2: Corrupted MAC on input.
cheers
richard
Re: slab corruption in skb allocs
Matt Mackall wrote:
> Doh. 'ethtool -k' is what's needed, sorry.

doh myself. :)

this won't be very helpful though, as i get the same on all machines (with both drivers):

Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
no offload info available

> ethtool -K eth0 rx off tx off

ditto.

i'll try to reproduce this on a machine with e1000 though...

cheers
richard
Re: slab corruption in skb allocs
On Fri, Mar 04, 2005 at 10:19:21PM +0100, Richard Fuchs wrote:
> _correction_ to my previous mail, this does _not_ happen with the eepro100 driver. (sorry for the confusion, i got the kernel images mixed up with all the testing i've been doing.)
>
> could this affect the e1000 driver as well?

Yes.

> > Send the output of ethtool, please.

Doh. 'ethtool -k' is what's needed, sorry.

If it's reproducible, try turning off rx/tx hardware checksumming:

ethtool -K eth0 rx off tx off