Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-23 Thread Christoph Lameter
On Wed, 18 Jul 2007 09:55:37 -0700
Andrew Morton [EMAIL PROTECTED] wrote:

 hm.  It should be the case that providing SLAB_HWCACHE_ALIGN at
 kmem_cache_create() time will override slab-debugging's offsetting
 of the returned addresses.


That is true for SLUB but not for SLAB. SLAB has always ignored
SLAB_HWCACHE_ALIGN when debugging is on because of the issues involved
in placing the redzone values etc.  Could be fun to fix.
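
To illustrate why a red zone placed in front of an object conflicts with cache-line alignment, here is a minimal userspace sketch. It is not kernel code: the 32-byte line size, the 8-byte red zone, and both helper names are assumptions made for the example and are not taken from the SLAB source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the thread. */
#define CACHE_LINE_BYTES 32u   /* assumed L1 cache line size */
#define REDZONE_BYTES     8u   /* hypothetical size of the leading red zone */

/* Return nonzero if addr is a multiple of the cache line size. */
static int is_line_aligned(uintptr_t addr, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0); /* must be a power of two */
    return (addr & (line - 1)) == 0;
}

/* Address handed to the caller when a red zone precedes the object. */
static uintptr_t object_after_redzone(uintptr_t slot_start, size_t redzone)
{
    return slot_start + redzone;
}
```

With these assumed numbers, a slot that starts on a line boundary (say 0x1000) yields an object at 0x1008, which is no longer line aligned. Padding the red zone out to a full cache line would keep the object aligned at the cost of extra memory per object.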

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Andrew Morton
On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=8778
 
            Summary: Ocotea board: kernel reports access of bad area during
                     boot with DEBUG_SLAB=y
            Product: Platform Specific/Hardware
            Version: 2.5
      KernelVersion: 2.6.22
           Platform: All
         OS/Version: Linux
               Tree: Mainline
             Status: NEW
           Severity: normal
           Priority: P1
          Component: PPC-32
         AssignedTo: [EMAIL PROTECTED]
         ReportedBy: [EMAIL PROTECTED]
 
 
 Most recent kernel where this bug did not occur: not known - was probably
 already an issue in 2.6.10
 Distribution: not relevant for this issue.
 Hardware Environment: AMCC Ocotea board
 Software Environment: not relevant for this issue.
 Problem Description: see title.
 
 Steps to reproduce:
 1. Compile the 2.6.22 kernel with the attached .config
 2. Boot an Ocotea  board with this kernel.
 3. Observe the output that appears on the serial console.
 
 U-Boot 1.1.1 (Nov 10 2005 - 16:29:34)
 
 IBM PowerPC 440 GUNKNOWN (PVR=51b21892)
 Board: IBM 440GX Evaluation Board
 VCO: 1066 MHz
 CPU: 533 MHz
 PLB: 152 MHz
 OPB: 76 MHz
 EPB: 76 MHz
 I2C:   ready
 DRAM:  I2c read: failed 4
 I2c read: failed 4
 256 MB
 FLASH:  5 MB
 PCI:   Bus Dev VenId DevId Class Int
 In:serial
 Out:   serial
 Err:   serial
 KGDB:  kgdb ready
 ready
 Net:   ppc_440x_eth0
 BEDBUG:ready
 = boot
 Waiting for PHY auto negotiation to complete.. done
 ENET Speed is 100 Mbps - FULL duplex connection
 Using ppc_440x_eth0 device
 TFTP from server 172.30.36.154; our IP address is 172.30.39.77
 Filename 'ocotea-vanassb'.
 Load address: 0x100
 Loading: T #
  #
  #
  #
  #
 done
 Bytes transferred = 1415440 (159910 hex)
 Automatic boot of image at addr 0x0100 ...
 ## Booting image at 0100 ...
Image Name:   Linux-2.6.22
Created:  2007-07-18   6:53:56 UTC
Image Type:   PowerPC Linux Kernel Image (gzip compressed)
Data Size:1415376 Bytes =  1.3 MB
Load Address: 
Entry Point:  
Verifying Checksum ... OK
Uncompressing Kernel Image ... OK
 Linux version 2.6.22 ([EMAIL PROTECTED]) (gcc version 3.4.3 (MontaVista
 3.4.7
 IBM Ocotea port (MontaVista Software, Inc. [EMAIL PROTECTED])
 Zone PFN ranges:
   DMA 0 -65536
   Normal  65536 -65536
 early_node_map[1] active PFN ranges
 0:0 -65536
 Built 1 zonelists.  Total pages: 65024
 Kernel command line: root=/dev/nfs
 nfsroot=172.30.36.154:/nfs-export/RFS_MVL4-00
 PID hash table entries: 1024 (order: 10, 4096 bytes)
 
 | Locking API testsuite:
 
  | spin |wlock |rlock |mutex | wsem | rsem |
   --
  A-A deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-B-A deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-B-C-C-A deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-C-A-B-C deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-B-C-C-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-C-D-B-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-C-D-B-C-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
 double unlock:  ok  |  ok  |failed|  ok  |failed|failed|
   initialize held:failed|failed|failed|failed|failed|failed|
  bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
   --
   recursive read-lock: |  ok  | |failed|
recursive read-lock #2: |  ok  | |failed|
 mixed read-write-lock: |failed| |failed|
 mixed write-read-lock: |failed| |failed|
   --
  hard-irqs-on + irq-safe-A/12:failed|failed|  ok  |
  soft-irqs-on + irq-safe-A/12:failed|failed|  ok  |
  hard-irqs-on + irq-safe-A/21:failed|failed|  ok  |
  soft-irqs-on + irq-safe-A/21:failed|failed|  ok  |
sirq-safe-A = hirqs-on/12:failed|failed|  ok  |
sirq-safe-A = hirqs-on/21:failed|failed|  ok  |
  hard-safe-A + irqs-on/12:failed|failed|  ok  |
  soft-safe-A + irqs-on/12:failed|failed|  ok  |
 

Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Eugene Surovegin
On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
 On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=8778
  
 Summary: Ocotea board: kernel reports access of bad area during
  boot with DEBUG_SLAB=y

Slab debugging is probably the culprit here. I had a similar problem a
couple of years ago; I'm not sure whether anything has changed since
then, as I haven't checked.

When slab debugging was enabled, memory allocations were no longer
aligned to L1 cache lines. This is very bad for DMA on architectures
with non-coherent caches (PPC440 is one of them).

I have a hack for EMAC which tries to work around this problem:
http://kernel.ebshome.net/emac_slab_debug.diff
which might help.
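
The alignment hazard described above can be made concrete with a small userspace sketch. It counts how many bytes that do not belong to a DMA buffer still sit in the cache lines the buffer touches. Cache maintenance works on whole lines, so on a non-coherent architecture those bytes can be affected when the buffer's lines are invalidated or written back. The function names are hypothetical and the code does not reproduce the EMAC hack linked above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to the start of its cache line (line is a power of two). */
static uintptr_t line_floor(uintptr_t addr, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return addr & ~(uintptr_t)(line - 1);
}

/*
 * Number of bytes outside [addr, addr + len) that share a cache line
 * with the buffer. Zero means the buffer owns every line it touches.
 */
static size_t foreign_bytes_in_buffer_lines(uintptr_t addr, size_t len,
                                            size_t line)
{
    uintptr_t first, last_end;

    if (len == 0)
        return 0;
    first = line_floor(addr, line);                      /* first line touched */
    last_end = line_floor(addr + len - 1, line) + line;  /* end of last line */
    return (size_t)(last_end - first) - len;
}
```

For example, assuming a 32-byte line, a 64-byte buffer at 0x1000 shares no lines with other data, while the same buffer shifted to 0x1008 shares 32 bytes of cache lines with its neighbours.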

-- 
Eugene




Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Josh Boyer
On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
 On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
  On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:
  
   http://bugzilla.kernel.org/show_bug.cgi?id=8778
   
  Summary: Ocotea board: kernel reports access of bad area during
   boot with DEBUG_SLAB=y
 
 Slab debugging is probably the culprit here. I had similar problem 
 couple of years ago, not sure something has changed since then, 
 haven't checked.
 
 When slab debugging was enabled it made memory allocations non L1 
 cache line aligned. This is very bad for DMA on non-coherent cache 
 arches (PPC440 is one of those archs).
 
 I have a hack for EMAC which tries to workaround this problem:
   http://kernel.ebshome.net/emac_slab_debug.diff
 which might help.

Would you be opposed to including that patch in mainline?  I'd like to
have the bug reporter try it and then get it in if it fixes the issue.

josh



Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Eugene Surovegin
On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
 On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
  On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
   On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:
   
    http://bugzilla.kernel.org/show_bug.cgi?id=8778

    Summary: Ocotea board: kernel reports access of bad area during
    boot with DEBUG_SLAB=y
  
  Slab debugging is probably the culprit here. I had similar problem 
  couple of years ago, not sure something has changed since then, 
  haven't checked.
  
  When slab debugging was enabled it made memory allocations non L1 
  cache line aligned. This is very bad for DMA on non-coherent cache 
  arches (PPC440 is one of those archs).
  
  I have a hack for EMAC which tries to workaround this problem:
  http://kernel.ebshome.net/emac_slab_debug.diff
  which might help.
 
 Would you be opposed to including that patch in mainline?

Yes. I don't think it's the right way to fix this issue. IMO, the
right fix is in the slab allocator. You cannot change all drivers to
do this kind of cache flushing, and yes, I saw the same problem with
a PCI-based NIC I tried on Ocotea at the time.

-- 
Eugene


Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Josh Boyer
On Wed, 2007-07-18 at 08:59 -0700, Eugene Surovegin wrote:
 On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
  On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
   On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
    On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:

     http://bugzilla.kernel.org/show_bug.cgi?id=8778

     Summary: Ocotea board: kernel reports access of bad area during
     boot with DEBUG_SLAB=y
   
   Slab debugging is probably the culprit here. I had similar problem 
   couple of years ago, not sure something has changed since then, 
   haven't checked.
   
   When slab debugging was enabled it made memory allocations non L1 
   cache line aligned. This is very bad for DMA on non-coherent cache 
   arches (PPC440 is one of those archs).
   
   I have a hack for EMAC which tries to workaround this problem:
 http://kernel.ebshome.net/emac_slab_debug.diff
   which might help.
  
  Would you be opposed to including that patch in mainline?
 
 Yes. I don't think it's the right way to fix this issue. IMO, the 
 right one is to fix slab allocator. You cannot change all drivers to 
 do this kind of cache flushing, and yes, I saw the same problem with 
 PCI based NIC I tried on Ocotea at the time.

Hm... good point.  I'd still like to see if your patch works around the
reporter's problem.

josh



Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Andrew Morton
On Wed, 18 Jul 2007 08:59:40 -0700 Eugene Surovegin [EMAIL PROTECTED] wrote:

 On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
  On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
   On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
    On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:

     http://bugzilla.kernel.org/show_bug.cgi?id=8778

     Summary: Ocotea board: kernel reports access of bad area during
     boot with DEBUG_SLAB=y
   
   Slab debugging is probably the culprit here. I had similar problem 
   couple of years ago, not sure something has changed since then, 
   haven't checked.
   
   When slab debugging was enabled it made memory allocations non L1 
   cache line aligned. This is very bad for DMA on non-coherent cache 
   arches (PPC440 is one of those archs).
   
   I have a hack for EMAC which tries to workaround this problem:
 http://kernel.ebshome.net/emac_slab_debug.diff
   which might help.
  
  Would you be opposed to including that patch in mainline?
 
 Yes. I don't think it's the right way to fix this issue. IMO, the 
 right one is to fix slab allocator. You cannot change all drivers to 
 do this kind of cache flushing, and yes, I saw the same problem with 
 PCI based NIC I tried on Ocotea at the time.
 

hm.  It should be the case that providing SLAB_HWCACHE_ALIGN at
kmem_cache_create() time will override slab-debugging's offsetting
of the returned addresses.

Or is the problem occurring with memory which is returned from kmalloc(),
rather than from kmem_cache_alloc()?

A complete description of the problem would help here, please.


Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Eugene Surovegin
On Wed, Jul 18, 2007 at 09:55:37AM -0700, Andrew Morton wrote:
 On Wed, 18 Jul 2007 08:59:40 -0700 Eugene Surovegin [EMAIL PROTECTED] wrote:
 
  On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
   On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
    On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
     On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:

      http://bugzilla.kernel.org/show_bug.cgi?id=8778

      Summary: Ocotea board: kernel reports access of bad area during
      boot with DEBUG_SLAB=y

    Slab debugging is probably the culprit here. I had similar problem
    couple of years ago, not sure something has changed since then,
    haven't checked.

    When slab debugging was enabled it made memory allocations non L1
    cache line aligned. This is very bad for DMA on non-coherent cache
    arches (PPC440 is one of those archs).

    I have a hack for EMAC which tries to workaround this problem:
    http://kernel.ebshome.net/emac_slab_debug.diff
    which might help.
   
   Would you be opposed to including that patch in mainline?
  
  Yes. I don't think it's the right way to fix this issue. IMO, the 
  right one is to fix slab allocator. You cannot change all drivers to 
  do this kind of cache flushing, and yes, I saw the same problem with 
  PCI based NIC I tried on Ocotea at the time.
  
 
 hm.  It should be the case that providing SLAB_HWCACHE_ALIGN at
 kmem_cache_create() time will override slab-debugging's offsetting
 of the returned addresses.
 
 Or is the problem occurring with memory which is returned from kmalloc(),
 rather than from kmem_cache_alloc()?

It's kmalloc, at least this is how I think skbs are allocated.

Andrew, I don't have access to PPC hw right now (doing MIPS
development these days), so I cannot quickly check that my theory is
still correct for the latest kernel. I'd wait for the reporter to try
my hack and then we can decide what to do. IIRC there was some
provision in the slab allocator to enforce alignment, but when I was
debugging this problem more than a year ago, that option didn't work.
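
For readers unfamiliar with what enforcing alignment involves, below is a generic userspace sketch of one common technique: over-allocate, then round the start address up to a line boundary, so that the usable region owns every cache line it touches. This is neither the slab provision mentioned above nor the EMAC hack. The struct and function names are hypothetical, and `malloc` stands in for the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* raw is what must be passed to free(); aligned is the usable start. */
struct aligned_buf {
    void *raw;
    void *aligned;
};

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/*
 * Allocate len bytes starting on a cache-line boundary. The length is
 * padded to whole lines so the tail line is not shared either.
 */
static struct aligned_buf alloc_line_aligned(size_t len, size_t line)
{
    struct aligned_buf b = { NULL, NULL };
    size_t padded = (size_t)align_up((uintptr_t)len, line) + (line - 1);

    b.raw = malloc(padded);
    if (b.raw != NULL)
        b.aligned = (void *)align_up((uintptr_t)b.raw, line);
    return b;
}
```

A caller would use `b.aligned` as the buffer and release it with `free(b.raw)`. The cost is up to one extra line of padding per allocation plus the rounding of the length.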

BTW, I think the slob allocator had the same alignment issue as slab
with debugging enabled (at least at the time I looked at it).

-- 
Eugene
