Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y
On Wed, 18 Jul 2007 09:55:37 -0700 Andrew Morton [EMAIL PROTECTED] wrote:

> hm. It should be the case that providing SLAB_HWCACHE_ALIGN at
> kmem_cache_create() time will override slab-debugging's offsetting
> of the returned addresses.

That is true for SLUB but not in SLAB. SLAB has always ignored
SLAB_HWCACHE_ALIGN when debugging is on because of the issues involved
in placing the redzone values etc.

Could be fun to fix.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
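The effect described above can be modelled in user space. The following is only a sketch, not SLAB code: the 32-byte line size, the one-word redzone, and both helper names are assumptions made for illustration. It shows that an object placed directly behind a leading redzone word no longer starts on a cache-line boundary, even when the slot itself does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 cache line size, for illustration only. */
#define CACHE_LINE 32u

/* Hypothetical size of a debug redzone word placed before each object. */
#define REDZONE_BYTES sizeof(unsigned long)

/* Returns nonzero if addr is a multiple of align (a power of two). */
static int is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/*
 * Model only: the object handed to the caller starts right after a
 * leading redzone word inside an otherwise aligned slot.
 */
static uintptr_t object_addr_with_redzone(uintptr_t slot_addr)
{
    return slot_addr + REDZONE_BYTES;
}
```

With an aligned slot at 0x1000, `is_aligned(0x1000, CACHE_LINE)` holds, while `is_aligned(object_addr_with_redzone(0x1000), CACHE_LINE)` does not.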
Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y
On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:

http://bugzilla.kernel.org/show_bug.cgi?id=8778

           Summary: Ocotea board: kernel reports access of bad area
                    during boot with DEBUG_SLAB=y
           Product: Platform Specific/Hardware
           Version: 2.5
     KernelVersion: 2.6.22
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PPC-32
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

Most recent kernel where this bug did not occur: not known - was
probably already an issue in 2.6.10
Distribution: not relevant for this issue.
Hardware Environment: AMCC Ocotea board
Software Environment: not relevant for this issue.
Problem Description: see title.

Steps to reproduce:
1. Compile the 2.6.22 kernel with the attached .config
2. Boot an Ocotea board with this kernel.
3. Observe the output that appears on the serial console.

U-Boot 1.1.1 (Nov 10 2005 - 16:29:34)

IBM PowerPC 440 GUNKNOWN (PVR=51b21892)
Board: IBM 440GX Evaluation Board
        VCO: 1066 MHz
        CPU: 533 MHz
        PLB: 152 MHz
        OPB: 76 MHz
        EPB: 76 MHz
I2C:   ready
DRAM:  I2c read: failed 4
I2c read: failed 4
256 MB
FLASH: 5 MB
PCI:   Bus Dev VenId DevId Class Int
In:    serial
Out:   serial
Err:   serial
KGDB:  kgdb ready
ready
Net:   ppc_440x_eth0
BEDBUG:ready
=> boot
Waiting for PHY auto negotiation to complete.. done
ENET Speed is 100 Mbps - FULL duplex connection
Using ppc_440x_eth0 device
TFTP from server 172.30.36.154; our IP address is 172.30.39.77
Filename 'ocotea-vanassb'.
Load address: 0x100
Loading: T # # # # #
done
Bytes transferred = 1415440 (159910 hex)
Automatic boot of image at addr 0x0100 ...
## Booting image at 0100 ...
   Image Name:   Linux-2.6.22
   Created:      2007-07-18 6:53:56 UTC
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    1415376 Bytes = 1.3 MB
   Load Address:
   Entry Point:
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
Linux version 2.6.22 ([EMAIL PROTECTED]) (gcc version 3.4.3 (MontaVista 3.4.7 IBM Ocotea port (MontaVista Software, Inc. [EMAIL PROTECTED])
Zone PFN ranges:
  DMA             0 ->    65536
  Normal      65536 ->    65536
early_node_map[1] active PFN ranges
    0:        0 ->    65536
Built 1 zonelists.  Total pages: 65024
Kernel command line: root=/dev/nfs nfsroot=172.30.36.154:/nfs-export/RFS_MVL4-00
PID hash table entries: 1024 (order: 10, 4096 bytes)
| Locking API testsuite:
                                | spin |wlock |rlock |mutex | wsem | rsem |
--
                    A-A deadlock:failed|failed|  ok  |failed|failed|failed|
                A-B-B-A deadlock:failed|failed|  ok  |failed|failed|failed|
            A-B-B-C-C-A deadlock:failed|failed|  ok  |failed|failed|failed|
            A-B-C-A-B-C deadlock:failed|failed|  ok  |failed|failed|failed|
        A-B-B-C-C-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
        A-B-C-D-B-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
        A-B-C-D-B-C-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
                   double unlock:  ok  |  ok  |failed|  ok  |failed|failed|
                 initialize held:failed|failed|failed|failed|failed|failed|
                bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
--
             recursive read-lock:             |  ok  |             |failed|
          recursive read-lock #2:             |  ok  |             |failed|
           mixed read-write-lock:             |failed|             |failed|
           mixed write-read-lock:             |failed|             |failed|
--
    hard-irqs-on + irq-safe-A/12:failed|failed|  ok  |
    soft-irqs-on + irq-safe-A/12:failed|failed|  ok  |
    hard-irqs-on + irq-safe-A/21:failed|failed|  ok  |
    soft-irqs-on + irq-safe-A/21:failed|failed|  ok  |
      sirq-safe-A => hirqs-on/12:failed|failed|  ok  |
      sirq-safe-A => hirqs-on/21:failed|failed|  ok  |
        hard-safe-A + irqs-on/12:failed|failed|  ok  |
        soft-safe-A + irqs-on/12:failed|failed|  ok  |
Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y
On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
> On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=8778
> > Summary: Ocotea board: kernel reports access of bad area during
> > boot with DEBUG_SLAB=y

Slab debugging is probably the culprit here. I had a similar problem a
couple of years ago; not sure whether anything has changed since then,
haven't checked.

When slab debugging was enabled it made memory allocations not L1 cache
line aligned. This is very bad for DMA on non-coherent cache arches
(PPC440 is one of those archs).

I have a hack for EMAC which tries to work around this problem:

  http://kernel.ebshome.net/emac_slab_debug.diff

which might help.

--
Eugene
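To illustrate why a misaligned buffer is a problem on a non-coherent cache: cache maintenance works on whole lines, so a buffer that starts or ends mid-line shares that line with unrelated neighbouring data, and invalidating or flushing the line for DMA also touches that data. The sketch below only does the address arithmetic; the struct, the function name, and the 32-byte line size used in the example are assumptions for illustration and are not taken from the EMAC hack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which cache lines a buffer touches, and whether the edge lines are shared. */
struct line_span {
    uintptr_t first_line;  /* base address of the first line touched */
    uintptr_t last_line;   /* base address of the last line touched */
    int shares_head;       /* nonzero: buffer starts in the middle of a line */
    int shares_tail;       /* nonzero: buffer ends in the middle of a line */
};

/* line must be a power of two; len must be nonzero. */
static struct line_span dma_line_span(uintptr_t addr, size_t len, size_t line)
{
    struct line_span s;
    uintptr_t mask = (uintptr_t)line - 1;

    assert(len > 0);
    assert(line != 0 && (line & (line - 1)) == 0);

    s.first_line = addr & ~mask;
    s.last_line = (addr + len - 1) & ~mask;
    s.shares_head = (addr & mask) != 0;
    s.shares_tail = ((addr + len) & mask) != 0;
    return s;
}
```

For example, with 32-byte lines a 64-byte buffer at 0x1004 touches the lines at 0x1000 through 0x1040 and shares both edge lines with its neighbours, whereas the same buffer at 0x1000 touches only 0x1000 and 0x1020 and shares nothing.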
Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y
On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
> On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
> > On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=8778
> > > Summary: Ocotea board: kernel reports access of bad area during
> > > boot with DEBUG_SLAB=y
>
> Slab debugging is probably the culprit here. I had a similar problem
> a couple of years ago; not sure whether anything has changed since
> then, haven't checked.
>
> When slab debugging was enabled it made memory allocations not L1
> cache line aligned. This is very bad for DMA on non-coherent cache
> arches (PPC440 is one of those archs).
>
> I have a hack for EMAC which tries to work around this problem:
>
>   http://kernel.ebshome.net/emac_slab_debug.diff
>
> which might help.

Would you be opposed to including that patch in mainline? I'd like to
have the bug reporter try it and then get it in if it fixes the issue.

josh
Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y
On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
> On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
> > On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
> > > On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=8778
> > > > Summary: Ocotea board: kernel reports access of bad area
> > > > during boot with DEBUG_SLAB=y
> >
> > Slab debugging is probably the culprit here. I had a similar
> > problem a couple of years ago; not sure whether anything has
> > changed since then, haven't checked.
> >
> > When slab debugging was enabled it made memory allocations not L1
> > cache line aligned. This is very bad for DMA on non-coherent cache
> > arches (PPC440 is one of those archs).
> >
> > I have a hack for EMAC which tries to work around this problem:
> >
> >   http://kernel.ebshome.net/emac_slab_debug.diff
> >
> > which might help.
>
> Would you be opposed to including that patch in mainline?

Yes. I don't think it's the right way to fix this issue. IMO, the right
one is to fix the slab allocator. You cannot change all drivers to do
this kind of cache flushing, and yes, I saw the same problem with a
PCI-based NIC I tried on Ocotea at the time.

--
Eugene
Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y
On Wed, 2007-07-18 at 08:59 -0700, Eugene Surovegin wrote:
> On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
> > On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
> > > On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
> > > > On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED]
> > > > wrote:
> > > > > http://bugzilla.kernel.org/show_bug.cgi?id=8778
> > > > > Summary: Ocotea board: kernel reports access of bad area
> > > > > during boot with DEBUG_SLAB=y
> > >
> > > Slab debugging is probably the culprit here. I had a similar
> > > problem a couple of years ago; not sure whether anything has
> > > changed since then, haven't checked.
> > >
> > > When slab debugging was enabled it made memory allocations not
> > > L1 cache line aligned. This is very bad for DMA on non-coherent
> > > cache arches (PPC440 is one of those archs).
> > >
> > > I have a hack for EMAC which tries to work around this problem:
> > >
> > >   http://kernel.ebshome.net/emac_slab_debug.diff
> > >
> > > which might help.
> >
> > Would you be opposed to including that patch in mainline?
>
> Yes. I don't think it's the right way to fix this issue. IMO, the
> right one is to fix the slab allocator. You cannot change all
> drivers to do this kind of cache flushing, and yes, I saw the same
> problem with a PCI-based NIC I tried on Ocotea at the time.

Hm... good point. I'd still like to see if your patch works around the
reporter's problem.

josh
Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y
On Wed, 18 Jul 2007 08:59:40 -0700 Eugene Surovegin [EMAIL PROTECTED] wrote:
> On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
> > On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
> > > On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
> > > > On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED]
> > > > wrote:
> > > > > http://bugzilla.kernel.org/show_bug.cgi?id=8778
> > > > > Summary: Ocotea board: kernel reports access of bad area
> > > > > during boot with DEBUG_SLAB=y
> > >
> > > Slab debugging is probably the culprit here. I had a similar
> > > problem a couple of years ago; not sure whether anything has
> > > changed since then, haven't checked.
> > >
> > > When slab debugging was enabled it made memory allocations not
> > > L1 cache line aligned. This is very bad for DMA on non-coherent
> > > cache arches (PPC440 is one of those archs).
> > >
> > > I have a hack for EMAC which tries to work around this problem:
> > >
> > >   http://kernel.ebshome.net/emac_slab_debug.diff
> > >
> > > which might help.
> >
> > Would you be opposed to including that patch in mainline?
>
> Yes. I don't think it's the right way to fix this issue. IMO, the
> right one is to fix the slab allocator. You cannot change all
> drivers to do this kind of cache flushing, and yes, I saw the same
> problem with a PCI-based NIC I tried on Ocotea at the time.

hm. It should be the case that providing SLAB_HWCACHE_ALIGN at
kmem_cache_create() time will override slab-debugging's offsetting of
the returned addresses.

Or is the problem occurring with memory which is returned from
kmalloc(), rather than from kmem_cache_alloc()?

A complete description of the problem would help here, please.
Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y
On Wed, Jul 18, 2007 at 09:55:37AM -0700, Andrew Morton wrote:
> On Wed, 18 Jul 2007 08:59:40 -0700 Eugene Surovegin [EMAIL PROTECTED]
> wrote:
> > On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
> > > On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
> > > > On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
> > > > > On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED]
> > > > > wrote:
> > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=8778
> > > > > > Summary: Ocotea board: kernel reports access of bad area
> > > > > > during boot with DEBUG_SLAB=y
> > > >
> > > > Slab debugging is probably the culprit here. I had a similar
> > > > problem a couple of years ago; not sure whether anything has
> > > > changed since then, haven't checked.
> > > >
> > > > When slab debugging was enabled it made memory allocations
> > > > not L1 cache line aligned. This is very bad for DMA on
> > > > non-coherent cache arches (PPC440 is one of those archs).
> > > >
> > > > I have a hack for EMAC which tries to work around this
> > > > problem:
> > > >
> > > >   http://kernel.ebshome.net/emac_slab_debug.diff
> > > >
> > > > which might help.
> > >
> > > Would you be opposed to including that patch in mainline?
> >
> > Yes. I don't think it's the right way to fix this issue. IMO, the
> > right one is to fix the slab allocator. You cannot change all
> > drivers to do this kind of cache flushing, and yes, I saw the same
> > problem with a PCI-based NIC I tried on Ocotea at the time.
>
> hm. It should be the case that providing SLAB_HWCACHE_ALIGN at
> kmem_cache_create() time will override slab-debugging's offsetting
> of the returned addresses.
>
> Or is the problem occurring with memory which is returned from
> kmalloc(), rather than from kmem_cache_alloc()?

It's kmalloc, at least this is how I think skbs are allocated.

Andrew, I don't have access to PPC hw right now (doing MIPS development
these days), so I cannot quickly check that my theory is still correct
for the latest kernel.

I'd wait for the reporter to try my hack and then we can decide what to
do.

IIRC there was some provision in the slab allocator to enforce
alignment; when I was debugging this problem more than a year ago, that
option didn't work.

BTW, I think the slob allocator had the same issue with alignment as
slab with debugging enabled (at least at the time I looked at it).

--
Eugene
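For reference, when an allocator gives no alignment guarantee, a caller can still obtain a line-aligned buffer by over-allocating and rounding the start address up. The sketch below is a user-space illustration with malloc(); it is neither the EMAC hack nor the slab provision mentioned above, and the struct and function names are hypothetical. It also shows why this does not scale as a fix: every caller would have to carry both pointers around.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct aligned_buf {
    void *raw;      /* pointer returned by malloc(); pass this to free() */
    void *aligned;  /* line-aligned start of the usable payload */
};

/*
 * Allocate at least len bytes starting on a multiple of line (a power of
 * two). The payload is rounded up to whole lines so that no other
 * allocation shares its first or last cache line. Integer overflow of
 * the padded size is not checked in this sketch.
 */
static struct aligned_buf alloc_line_aligned(size_t len, size_t line)
{
    struct aligned_buf b = { NULL, NULL };
    uintptr_t mask = (uintptr_t)line - 1;
    size_t padded;

    assert(line != 0 && (line & (line - 1)) == 0);

    /* whole lines for the payload, plus slack to slide the start forward */
    padded = ((len + line - 1) & ~(line - 1)) + line - 1;

    b.raw = malloc(padded);
    if (b.raw != NULL)
        b.aligned = (void *)(((uintptr_t)b.raw + mask) & ~mask);
    return b;
}
```

A caller would use `b.aligned` for the data and later call `free(b.raw)`.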