/proc/$pid/mem

2005-03-19 Thread Richard Fuchs
Aloha!
I know it has been discussed before, but I must express my feelings 
about this issue nonetheless. I find it a major pain in the back that 
/proc/$pid/mem isn't readable by an unrelated process without doing a 
PTRACE_ATTACH first.

I mainly want to ask: is there a good reason to not drop this restriction?
I can read all the machine's physical memory and all of the kernel's 
address space (/dev/mem, /proc/kcore) non-intrusively, but I can't do 
the same on a single process. It seems to me that /proc/$pid/mem should
work analogously to /dev/mem or /proc/kcore, but in practice it currently
doesn't, and I don't see a good reason why it should be that way.

Cheers
Richard
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
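
The interface discussed above can be illustrated with a minimal sketch. It reads bytes from /proc/<pid>/mem by seeking to a virtual address, and demonstrates this only on the calling process itself, where no attach is involved. The helper names read_process_memory and demo are hypothetical. Whether the same call succeeds for an unrelated pid depends on the kernel version and its ptrace access checks; the message above reports that it did not without a prior PTRACE_ATTACH.

```python
import ctypes
import os


def read_process_memory(pid, address, size):
    """Read `size` bytes at virtual `address` from /proc/<pid>/mem.

    Raises OSError if the kernel refuses the access, for example for an
    unrelated process that the caller is not allowed to inspect.
    """
    fd = os.open("/proc/%s/mem" % pid, os.O_RDONLY)
    try:
        # pread() takes the file offset directly; for /proc/<pid>/mem the
        # offset is the virtual address inside the target process.
        return os.pread(fd, size, address)
    finally:
        os.close(fd)


def demo():
    """Read a known buffer back out of our own address space."""
    marker = ctypes.create_string_buffer(b"hello from my own address space")
    return read_process_memory(
        "self", ctypes.addressof(marker), len(marker.value)
    )


if __name__ == "__main__":
    print(demo())
```

Passing a numeric pid instead of "self" is where the restriction described in the message would show up, as an OSError from open() or pread().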


Re: slab corruption in skb allocs

2005-03-07 Thread Richard Fuchs
Scott Feldman wrote:
> Would you mind giving this patch a try against 2.6.11?  I think it's
> equivalent to Jesse's patch, but less intrusive to the driver.

looks good, no more memory corruption errors. thanks for this.

cheers
richard


Re: slab corruption in skb allocs

2005-03-06 Thread Richard Fuchs
Scott Feldman wrote:
> On Mar 5, 2005, at 11:10 AM, Richard Fuchs wrote:
>
>> looks like you are right, enabling NAPI in 2.6.7 does trigger this.
>> what exactly is this?
>
> A bug in the driver.  I have a hunch: please try this patch with 2.6.9
> or higher:
>
> http://marc.theaimsgroup.com/?l=linux-netdev&m=110726809431611&w=2

bingo, that fixes it. too bad neither this patch nor the removal of the
NAPI config option made it into 2.6.11...

>> also, does this affect the e1000 driver in any way?
>
> No.  e1000 is a totally different driver/device with very similar name.

too bad, i was hoping for an explanation for some unexplainable crashes
i've been experiencing... ;)

cheers
richard


Re: slab corruption in skb allocs

2005-03-05 Thread Richard Fuchs
Scott Feldman wrote:
> Was NAPI turned on for e100 in 2.6.7?  If not, turn NAPI on in the 2.6.7
> driver and see if you get the same result.  If you do, it's very likely
> the bug is in the e100 driver's NAPI implementation.

looks like you are right, enabling NAPI in 2.6.7 does trigger this.
what exactly is this? i didn't enable NAPI in any of the newer kernel 
versions i was trying, so i'm somewhat confused. :)  also, does this 
affect the e1000 driver in any way?

cheers
richard


Re: slab corruption in skb allocs

2005-03-04 Thread Richard Fuchs
Matt Mackall wrote:
> Doh. 'ethtool -k' is what's needed, sorry.

doh myself. :) this won't be very helpful though, as i get the same on
all machines (with both drivers):

Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
no offload info available
> ethtool -k eth0 rx off tx off

ditto. i'll try to reproduce this on a machine with e1000 though...

cheers
richard


Re: slab corruption in skb allocs

2005-03-04 Thread Richard Fuchs
_correction_ to my previous mail, this does _not_ happen with the 
eepro100 driver. (sorry for the confusion, i got the kernel images mixed 
up with all the testing i've been doing.)

could this affect the e1000 driver as well?

Matt Mackall wrote:
> Send the output of ethtool, please.

box 1, affected:
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes:   10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Current message level: 0x20c1 (8385)
Link detected: yes

box 2, affected:
Supported ports: [ TP MII ]
Supported link modes:   10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x0007 (7)
Link detected: yes

box 3, not affected:
Supported ports: [ TP MII ]
Supported link modes:   10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x0007 (7)
Link detected: yes

> This tends to be checksum
> offloading not working as it should or the like. Can you repeat this
> with bulk ssh traffic?

yes, with various strange effects:

Received disconnect from 195.58.172.154: 2: Bad packet length 919251405.
or
Received disconnect from 195.58.172.154: 2: Corrupted MAC on input.

cheers
richard


Re: slab corruption in skb allocs

2005-03-04 Thread Richard Fuchs
Matt Mackall wrote:
> Which card/driver is this? Is this the same card that's showing ssh
> troubles? My theory about your ssh trouble only applies to cards with
> checksum offload.

i got the same on all three machines i was testing with, with both the
e100 and the eepro100 driver. one of those three machines was the one 
with the ssh troubles, its card is identified as "Intel Corp. 82557/8/9 
[Ethernet Pro 100] (rev 08)", pci id 8086:1229. plus, i couldn't 
reproduce those problems on a machine with e1000, which does support all 
kinds of checksum offloading. (there might still be something fishy with 
the e1000 as well, as i'm not entirely trusting the errors from the slab 
checkers alone. especially since i don't see those messages when i 
enable page alloc debugging.)

another machine behaves even more strangely... its nic is identified as 
"Intel Corp. 82801BD PRO/100 VE (LOM) Ethernet Controller (rev 81)", pci 
id 8086:1039, also apparently not supporting hardware checksums. it does 
immediately produce the slab debug errors when i bombard it with udp 
packets while having disk access w/o dma, but remains silent when doing 
the same with a tcp transfer instead of udp packets. neither ssh traffic 
nor /dev/zero piped through netcat (no matter in which direction) makes 
it catch any errors. i only got a _single_ message from the slab 
debugger when sending /dev/zero through netcat in _both_ directions at 
the same time (in and out). however, i do get pages and pages of those 
messages when sending a simple stream of udp packets to the box... 
again, this is all with the e100 driver, i couldn't produce any similar 
results with the eepro100 or the e1000 driver yet, but apparently this 
doesn't necessarily mean that there isn't something wrong anyway...

cheers
richard


Re: slab corruption in skb allocs

2005-03-04 Thread Richard Fuchs
Richard Fuchs wrote:
> [e100]
> i will try again the eepro100 driver and see if it does the same...

yes, the same thing happens with the eepro100 driver.

cheers
richard


Re: slab corruption in skb allocs

2005-03-04 Thread Richard Fuchs
Dave Jones wrote:
> Which network drivers are in use on the box that gets the corruption ?

all three that i tested it on are using the e100 driver. the boxes with
pci id 8086:1039 and 8086:1229 are seeing corruptions, the one with pci 
id 8086:2449 is not.

i will try again the eepro100 driver and see if it does the same...
cheers
richard


Re: slab corruption in skb allocs

2005-03-04 Thread Richard Fuchs
Andrew Morton wrote:
> I guess it could be hardware.  But given that disabling DMA _causes_ the
> problem, rather than fixes it, it seems unlikely.
>
> Could you enable CONFIG_DEBUG_PAGEALLOC in .config and see if that triggers
> an oops?

by now, i could reproduce this on two different machines with quite
different hardware, while a third doesn't seem to show those symptoms. 
on the second machine, i got the corruption errors from the slab 
debugger mostly from the disk access alone, the network traffic was only 
minimal (but still present). i was doing write operations on the hdd in 
this test.

kernel 2.6.7 doesn't show this behavior, while all kernels from 2.6.9 
and up do. (i didn't test 2.6.8.x).

as for DEBUG_PAGEALLOC... when i enable this option, the errors from 
DEBUG_SLAB magically disappear. however, my ssh session got disconnected 
once while doing the disk access with the message:

Received disconnect from 195.58.172.154: 2: Bad packet length 4239103034.

never seen this before and not sure if this has anything to do with it...

cheers
richard


slab corruption in skb allocs

2005-03-04 Thread Richard Fuchs
hello all!
the memory allocation debugger gives me the following messages under a
vanilla 2.6.10 and 2.6.11 kernel when doing
1) hdparm -d0 on my hard disk
2) tar c / > /dev/null
3) sending lots of network traffic to the machine (e.g. close to 100
mbit/s udp packets)
-
Slab corruption: start=de9141a4, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 41 d0 40 00 40 11 33 78
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Next obj: start=de9149b0, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=de92e8b0, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3c c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 41 cf 40 00 40 11 33 79
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Prev obj: start=de92e0a4, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=de92f0bc, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=def5e3a4, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c e3 14 40 00 40 11 92 33
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Next obj: start=def5ebb0, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=de938b30, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3c c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c e3 13 40 00 40 11 92 34
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Prev obj: start=de938324, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=de93933c, len=2048
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c03b7e97>](alloc_skb+0x47/0xf0)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 20 a0 00 00 42 cd a9 1e ff ff ff ff 3c c0
Slab corruption: start=de96aa30, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 0d e4 40 00 40 11 67 64
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Prev obj: start=de96a224, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=de96b23c, len=2048
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c03b7e97>](alloc_skb+0x47/0xf0)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 00 00 00 00 b6 81 15 1f ff ff ff ff 00 00
Slab corruption: start=de8fa5a4, len=2048
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c03b8163>](kfree_skbmem+0x13/0x30)
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3c c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 0d e3 40 00 40 11 67 65
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
Next obj: start=de8fadb0, len=2048
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c03b7e97>](alloc_skb+0x47/0xf0)
000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
010: 5a 5a 00 00 00 00 ce 96 92 1e ff ff ff ff 00 00
-
and so on. the disk activity alone or the network traffic alone doesn't
trigger this. also doing the same with dma enabled doesn't trigger this
either, but when everything comes together i get this within a second.
kernel is not smp and preempt is not enabled.
kernel config (from 2.6.11) is attached; if you need any more info, let
me know. is this a kernel issue, or could the hardware be at fault?
cheers
richard

#
# Automatically generated make config: don't edit
# 
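
The non-poison bytes in each "Slab corruption" report above can be inspected with a short script. This is a minimal sketch, assuming that the bytes from object offset 0x22 onward in the first report form a VLAN-tagged Ethernet frame carrying IPv4/UDP; that layout is an interpretation of the dump, not something stated in the report. 0x6b is the slab debugger's fill pattern for freed objects, so any other value in a freed object was written after the free. The function names are hypothetical.

```python
import socket
import struct

# Bytes copied from the first "Slab corruption" report above
# (object offsets 0x010 to 0x05f).
DUMP = """\
010: 6b 6b 20 a0 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 3b c0
020: 6b 6b 00 0b cd 1e 1f d2 00 04 23 01 c7 6f 81 00
030: 00 df 08 00 45 00 00 1c 41 d0 40 00 40 11 33 78
040: c0 a8 22 1d c0 a8 22 1b 80 52 30 18 00 08 89 ea
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b
"""


def parse_dump(text):
    """Return {object offset: byte value} for every byte in the dump."""
    mem = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        offset, _, hexbytes = line.partition(":")
        base = int(offset, 16)
        for i, tok in enumerate(hexbytes.split()):
            mem[base + i] = int(tok, 16)
    return mem


def ip_checksum_ok(header):
    """True if the one's-complement sum of the 16-bit words is 0xffff."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def mac(raw):
    return ":".join("%02x" % b for b in raw)


def decode_frame(mem, start=0x22):
    """Decode the assumed Ethernet/802.1Q/IPv4/UDP headers at `start`.

    The default start offset is an assumption read off the dump: it is
    where the run of non-poison bytes in row 020 begins.
    """
    raw = bytes(mem[o] for o in range(start, max(mem) + 1))
    dst, src, tpid, tci, ethertype = struct.unpack("!6s6sHHH", raw[:18])
    ip = raw[18:38]                      # 20-byte IPv4 header, no options
    sport, dport, udp_len = struct.unpack("!HHH", raw[38:44])
    return {
        "dst_mac": mac(dst),
        "src_mac": mac(src),
        "tpid": tpid,
        "vlan": tci & 0x0FFF,
        "ethertype": ethertype,
        "ip_total_len": struct.unpack("!H", ip[2:4])[0],
        "protocol": ip[9],
        "src_ip": socket.inet_ntoa(ip[12:16]),
        "dst_ip": socket.inet_ntoa(ip[16:20]),
        "sport": sport,
        "dport": dport,
        "udp_len": udp_len,
        "ip_checksum_ok": ip_checksum_ok(ip),
    }


if __name__ == "__main__":
    for key, value in decode_frame(parse_dump(DUMP)).items():
        print(key, value)
```

Under that assumption the bytes decode as an empty UDP datagram (length 8) from 192.168.34.29 to 192.168.34.27 with a valid IPv4 header checksum, which is consistent with the UDP test traffic described in the report landing in a buffer that had already been freed.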
