Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Andrew Morton
On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=8778
 
Summary: Ocotea board: kernel reports access of bad area during
 boot with DEBUG_SLAB=y
Product: Platform Specific/Hardware
Version: 2.5
  KernelVersion: 2.6.22
   Platform: All
 OS/Version: Linux
   Tree: Mainline
 Status: NEW
   Severity: normal
   Priority: P1
  Component: PPC-32
 AssignedTo: [EMAIL PROTECTED]
 ReportedBy: [EMAIL PROTECTED]
 
 
 Most recent kernel where this bug did not occur: not known - was probably
 already an issue in 2.6.10
 Distribution: not relevant for this issue.
 Hardware Environment: AMCC Ocotea board
 Software Environment: not relevant for this issue.
 Problem Description: see title.
 
 Steps to reproduce:
 1. Compile the 2.6.22 kernel with the attached .config
 2. Boot an Ocotea  board with this kernel.
 3. Observe the output that appears on the serial console.
 
 U-Boot 1.1.1 (Nov 10 2005 - 16:29:34)
 
 IBM PowerPC 440 GUNKNOWN (PVR=51b21892)
 Board: IBM 440GX Evaluation Board
 VCO: 1066 MHz
 CPU: 533 MHz
 PLB: 152 MHz
 OPB: 76 MHz
 EPB: 76 MHz
 I2C:   ready
 DRAM:  I2c read: failed 4
 I2c read: failed 4
 256 MB
 FLASH:  5 MB
 PCI:   Bus Dev VenId DevId Class Int
 In:serial
 Out:   serial
 Err:   serial
 KGDB:  kgdb ready
 ready
 Net:   ppc_440x_eth0
 BEDBUG:ready
 = boot
 Waiting for PHY auto negotiation to complete.. done
 ENET Speed is 100 Mbps - FULL duplex connection
 Using ppc_440x_eth0 device
 TFTP from server 172.30.36.154; our IP address is 172.30.39.77
 Filename 'ocotea-vanassb'.
 Load address: 0x100
 Loading: T #
  #
  #
  #
  #
 done
 Bytes transferred = 1415440 (159910 hex)
 Automatic boot of image at addr 0x0100 ...
 ## Booting image at 0100 ...
Image Name:   Linux-2.6.22
Created:  2007-07-18   6:53:56 UTC
Image Type:   PowerPC Linux Kernel Image (gzip compressed)
Data Size:1415376 Bytes =  1.3 MB
Load Address: 
Entry Point:  
Verifying Checksum ... OK
Uncompressing Kernel Image ... OK
 Linux version 2.6.22 ([EMAIL PROTECTED]) (gcc version 3.4.3 (MontaVista
 3.4.7
 IBM Ocotea port (MontaVista Software, Inc. [EMAIL PROTECTED])
 Zone PFN ranges:
   DMA 0 -65536
   Normal  65536 -65536
 early_node_map[1] active PFN ranges
 0:0 -65536
 Built 1 zonelists.  Total pages: 65024
 Kernel command line: root=/dev/nfs
 nfsroot=172.30.36.154:/nfs-export/RFS_MVL4-00
 PID hash table entries: 1024 (order: 10, 4096 bytes)
 
 | Locking API testsuite:
 
  | spin |wlock |rlock |mutex | wsem | rsem |
   --
  A-A deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-B-A deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-B-C-C-A deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-C-A-B-C deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-B-C-C-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-C-D-B-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
  A-B-C-D-B-C-D-A deadlock:failed|failed|  ok  |failed|failed|failed|
 double unlock:  ok  |  ok  |failed|  ok  |failed|failed|
   initialize held:failed|failed|failed|failed|failed|failed|
  bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
   --
   recursive read-lock: |  ok  | |failed|
recursive read-lock #2: |  ok  | |failed|
 mixed read-write-lock: |failed| |failed|
 mixed write-read-lock: |failed| |failed|
   --
  hard-irqs-on + irq-safe-A/12:failed|failed|  ok  |
  soft-irqs-on + irq-safe-A/12:failed|failed|  ok  |
  hard-irqs-on + irq-safe-A/21:failed|failed|  ok  |
  soft-irqs-on + irq-safe-A/21:failed|failed|  ok  |
sirq-safe-A = hirqs-on/12:failed|failed|  ok  |
sirq-safe-A = hirqs-on/21:failed|failed|  ok  |
  hard-safe-A + irqs-on/12:failed|failed|  ok  |
  soft-safe-A + irqs-on/12:failed|failed|  ok  |
 

Re: WDT with 82xx

2007-07-18 Thread Laurent Pinchart
On Monday 16 July 2007 13:15, Matvejchikov Ilya wrote:
 Hi all!

 Does anybody use watchdog timer with mpc82xx?

I do.

-- 
Laurent Pinchart
CSE Semaphore Belgium

Chaussée de Bruxelles, 732A
B-1410 Waterloo
Belgium

T +32 (2) 387 42 59
F +32 (2) 387 42 75
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Eugene Surovegin
On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
 On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=8778
  
 Summary: Ocotea board: kernel reports access of bad area during
  boot with DEBUG_SLAB=y

Slab debugging is probably the culprit here. I had a similar problem a
couple of years ago; I'm not sure whether anything has changed since then,
as I haven't checked.

When slab debugging was enabled, it made memory allocations no longer
L1 cache line aligned. This is very bad for DMA on non-coherent cache
architectures (PPC440 is one of them).

I have a hack for EMAC which tries to work around this problem:
http://kernel.ebshome.net/emac_slab_debug.diff
which might help.
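
To illustrate the alignment point above: on a non-coherent architecture, a DMA buffer that does not start and end on L1 cache line boundaries shares a line with unrelated data, so invalidating or flushing the buffer can clobber its neighbours. The user-space C sketch below shows the arithmetic only. The 32-byte line size and the helper names are assumptions made for illustration; nothing here is taken from the EMAC patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 cache line size; the real value depends on the CPU core. */
#define L1_LINE_BYTES 32u

/* Round an address up to the next cache line boundary. */
static inline uintptr_t line_align_up(uintptr_t addr)
{
    return (addr + L1_LINE_BYTES - 1u) & ~(uintptr_t)(L1_LINE_BYTES - 1u);
}

/*
 * A buffer owns every cache line it touches only if both its start
 * address and its length are multiples of the line size.
 */
static inline bool dma_buffer_owns_its_lines(uintptr_t start, size_t len)
{
    return (start % L1_LINE_BYTES) == 0 && (len % L1_LINE_BYTES) == 0;
}
```

If debugging metadata shifts the start of an object by, say, 8 bytes (0x1000 becomes 0x1008), the second check fails even though the allocation size is unchanged.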

-- 
Eugene




RE: Machine check exception. 2.6.20 powerpc tree.

2007-07-18 Thread Ramirez-Ortiz, Jorge
Hi Kumar

The address we are trying to access corresponds to a mapped device in
the PCI space.
Attached is some additional debugging information (we have instrumented the
kernel).

Thanks
jorge


INFO [_probe]: Found Device [irq=58]
INFO [_open]: device opened with irq 58
INFO [_read]: waiting for interrupt
INFO [_intr]: ISR 58

PCI1: Error! ERR_DETECT=0040, ATTR=00516001, addr=80020034,
data=0005

machine_check_exception: task my_process, MCSR=0x10008, NIP=0x10153530
Machine check in user mode.
Caused by (from MCSR=10008): Guarded Load or Cache-Inhibited stwcx.
Bus - Read Data Bus Error

Call Trace:
[C7355EF0] [C0006E64] show_stack+0x48/0x19c (unreliable)
[C7355F20] [C000C04C] machine_check_exception+0x294/0x484
[C7355F40] [C000E48C] ret_from_mcheck_exc+0x0/0xe0

cat /proc/cpuinfo
processor   : 0
cpu : e500v2
clock   : 799.50MHz
revision: 2.0 (pvr 8021 0020)
bogomips: 99.84
timebase: 49968750
platform: MPC85xx CDS
Vendor  : Freescale Semiconductor
Machine : MPC85xx CDS (0xff)
PVR : 0x80210020
SVR : 0x80390220
PLL setting : 0x4
Memory  : 256 MB
LAW 1   : , 2000 - DDR SDRAM
LAW 2   : 8000, 1000 - PCI1
LAW 3   : 9000, 1000 - PCI2
LAW 4   : a000, 1000 - PCI Express
LAW 5   : e100, 0100 - PCI1
LAW 6   : e200, 0100 - PCI2
LAW 7   : e300, 0100 - PCI Express
LAW 8   : f000, 1000 - Local bus
DDR 0   : , 2000 - 2/14/10 addr bits
PCI1 Out_1  : 8000, 1000 - Mem: 8000
PCI1 Out_2  : e100, 0100 - I/O: 
PCI2 Out_1  : 9000, 1000 - Mem: 9000
PCI2 Out_2  : e200, 0100 - I/O: 
PCI3 Out_1  : a000, 1000 - Mem: a000
PCI3 Out_2  : e300, 0100 - I/O: 
PCI1 In_1   : , 2000 (Internal,R:snoop,W:snoop) -
 PF
PCI2 In_1   : , 2000 (Internal,R:snoop,W:snoop) -
 PF
PCI3 In_1   : , 2000 (Internal,R:snoop,W:snoop) -
 PF


__

 

-Original Message-
From: Kumar Gala [mailto:[EMAIL PROTECTED] 
Sent: 17 July 2007 17:29
To: Ramirez-Ortiz, Jorge
Cc: linuxppc-embedded@ozlabs.org
Subject: Re: Machine check exception. 2.6.20 powerpc tree. 


On Jul 17, 2007, at 9:21 AM, Ramirez-Ortiz, Jorge wrote:

 Running our multithreaded application on ppc8548 (E500 core)  
 generates a machine check exception when trying to access some  
 ASIC's registers mapped on the PCI space (This application maps a  
 PCI device to access its registers)



 machine_check_exception: task my_process, MCSR=0x10008, NIP=0x10153530

 Machine check in user mode.

 Caused by (from MCSR=10008): Guarded Load or Cache-Inhibited stwcx.

 Bus - Read Data Bus Error



 Here is the assembly dump of the region of code containing the  
 offending instruction in user-space, with SRR0 pointing us at  
 0x10153530 when the exception is raised:



 0x10153528 _ZN2vk7in_le32EPVKj+16:lwz r0,8(r31)

 0x1015352c _ZN2vk7in_le32EPVKj+20:lwz r9,8(r31)

 0x10153530 _ZN2vk7in_le32EPVKj+24:lwbrx   r0,0,r0

 0x10153534 _ZN2vk7in_le32EPVKj+28:twi 0,r0,0

 0x10153538 _ZN2vk7in_le32EPVKj+32:isync

Can you get the code to dump the value of r0.  I'm wondering if  
you're really getting a read data bus error due to the fact that r0  
is pointing to a PCI address that doesn't have a device that will  
respond.
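
For readers unfamiliar with the faulting instruction: `lwbrx` loads a 32-bit word with its bytes reversed, which is how big-endian PowerPC code reads a little-endian PCI register. A portable C sketch of the same result follows. It is not the application's `in_le32`; the function name `load_le32` is invented for this example.

```c
#include <stdint.h>

/*
 * Assemble a little-endian 32-bit value from four consecutive bytes.
 * On big-endian PowerPC, lwbrx produces this result in one instruction.
 */
static inline uint32_t load_le32(const volatile uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The load itself is what reaches the PCI bus, so if the target does not respond, the error surfaces at this instruction regardless of how the byte swap is done.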

- k




RE: OF devices and non OF devices

2007-07-18 Thread Kári Davíðsson
Yes there was indeed.

Combination of my misunderstanding, device trees and board specific 
initialization.

Things are working now.

Thanks,
kd




From: John Rigby [mailto:[EMAIL PROTECTED] 
Sent: 5. júlí 2007 17:21
To: Kári Davíðsson
Cc: linuxppc-embedded@ozlabs.org
Subject: Re: OF devices and non OF devices


There must be something else wrong with your configuration.
On my Lite5200B fsl_i2c_probe gets called with no changes to the driver.

The kernel version is 2.6.22-rc7

The relevant part of my device tree is: 

[EMAIL PROTECTED] {
        device_type = "i2c";
        compatible = "mpc5200b-i2c\0mpc5200-i2c\0fsl-i2c";
        cell-index = <0>;
        reg = <3d00 40>;
        interrupts = <2 f 0>;
        interrupt-parent = <&mpc5200_pic>;
        fsl5200-clocking;
};

[EMAIL PROTECTED] {
        device_type = "i2c";
        compatible = "mpc5200b-i2c\0mpc5200-i2c\0fsl-i2c";
        cell-index = <1>;
        reg = <3d40 40>;
        interrupts = <2 10 0>;
        interrupt-parent = <&mpc5200_pic>;
        fsl5200-clocking;
};

I turned on DEBUG in drivers/base/dd.c and added a call to pr_debug in the
probe routine; here are the relevant log messages:

[   27.258245] platform: Matched Device fsl-i2c.0 with Driver fsl-i2c
[   27.258269] platform: Probing driver fsl-i2c with device fsl-i2c.0
[   27.258299] I2C: here in fsl_i2c_probe
[   27.258732] bound device 'fsl-i2c.0' to driver 'fsl-i2c' 
[   27.258756] platform: Bound Device fsl-i2c.0 to Driver fsl-i2c
[   27.258776] platform: Matched Device fsl-i2c.1 with Driver fsl-i2c
[   27.258789] platform: Probing driver fsl-i2c with device fsl-i2c.1
[   27.258821] I2C: here in fsl_i2c_probe
[   27.259269] bound device 'fsl-i2c.1' to driver 'fsl-i2c'
[   27.259293] platform: Bound Device fsl-i2c.1 to Driver fsl-i2c

John


On 7/5/07, John Rigby [EMAIL PROTECTED] wrote: 

kd,

Ok, obviously it doesn't work the way I thought.  Hopefully someone who
does understand this will comment.

John 



On 7/4/07, Kári Davíðsson [EMAIL PROTECTED] wrote: 

John, thank you for your answer.

Enabling CONFIG_FSL_SOC only enabled the execution of the init 
function (fsl_i2c_init())
of the fsl-i2c driver (i2c-mpc.c). The .probe function of the 
driver was never called
until I converted the driver to the OF model and added the 
.match_table to the driver structure.

Then I get the .probe function (fsl_i2c_probe()) called and the 
i2c bus set up.

A similar thing happens for the i2c device PCF8563, i.e., the init
function of the driver (pcf8563_init()) is called and the driver is
registered with the kernel, but the .probe (pcf8563_probe()) is never
called.

The driver pcf8563 has _NO_ exported structures or functions so 
I basically have no handle on it 
that I can utilize in board specific setup.

The way I suspect this is supposed to work is that from the
board setup files I would do
rtc_dev = platform_device_register_simple("pcf8563", -1, NULL, 0);
which should later trigger the calling of the pcf8563_probe()
function.

This is doing things in the same way as the fsl i2c code, i.e.
i2c_dev = platform_device_register_simple("i2c", i, r, 2);
which by the way does not work until I have converted the
fsl_i2c (i2c-mpc.c) driver to the OF structure.

So still the method of gluing together the OF drivers and non 
OF drivers eludes me.

rg
kd

P.S. I did check 2.6.21-RC7-git3 and found that
i2c-mpc.c and rtc-pcf8563.c are basically the same
as what I am working with in 2.6.20+


From: John Rigby [mailto: [EMAIL PROTECTED]
Sent: 3. júlí 2007 16:31
To: Kári Davíðsson
Cc: linuxppc-embedded@ozlabs.org 
Subject: Re: OF devices and non OF devices


One place to find binding between OF devices and non OF devices 
is in arch/powerpc/sysdev/fsl_soc.c 
The typical pattern is:
if of_find_compatible_node of-device-name
platform_device_register_simple platform-device-name
platform_device_add_data ...



On 7/3/07, Kári Davíðsson [EMAIL PROTECTED] wrote:

Hi,

 

Re: mpc52xx: Correct calculation of FEC RX errors???

2007-07-18 Thread Sylvain Munaut

 Hi

 We are showing figures of more than 4 billion error frames in our
 ethernet interfaces. We have tested that the problem is in a
 subtraction (the number of errors decrements with the number of frames).

 So... looking in the fec driver (fec.c) for the calculations we have
 seen that the number of multicast packets is added to the number of
 correct frames in order to get the frame errors...

 But the interesting thing is that we have checked that this
 calculation is something that we have added with a patch by Grzegorz
 Bernacki in this list.

 So... The funny thing is... Why a patch that solves the problem for
 Grzegorz produces the same problem for us?

 And... by the way... I have seen IEEE802.3, and when they talk about
 aFramesReceivedOK (which I suppose is the ieee_r_frame_ok in the
 driver), and they do not say a word about not including multicast
 packets in it...

 Any comment will be appreciated.

The only comment I have is that yes, the computations are flawed,
and fixing them is not very high on my priority list.

I don't think the patch posted on the list fully fixes the issue, but I
really don't want to spend hours trying to figure out exactly what
values are reported in all those counters. You're welcome to do so if
you have some free time ...

There is other stuff wrong in this driver (try ifconfig eth0 down, then
send some broadcast traffic on the network; you'll see some FIFO
errors popping up) ...


Sylvain


mpc52xx: Correct calculation of FEC RX errors???

2007-07-18 Thread Miguel Angel Alvarez
Hi

We are showing figures of more than 4 billion error frames in our 
ethernet interfaces. We have tested that the problem is in a 
subtraction (the number of errors decrements with the number of frames).

So... looking in the fec driver (fec.c) for the calculations we have 
seen that the number of multicast packets is added to the number of 
correct frames in order to get the frame errors...
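
A figure of "more than 4 billion" is what a 32-bit unsigned counter shows when a subtraction goes slightly negative. The sketch below demonstrates that effect under stated assumptions: the function and parameter names are hypothetical, and the formula is a simplified stand-in rather than the actual calculation in fec.c.

```c
#include <stdint.h>

/*
 * Hypothetical error calculation: frames seen minus frames counted as good.
 * If the "good" total is over-counted (for example by adding multicast
 * frames that were already included), the unsigned result wraps around.
 */
static inline uint32_t rx_errors_naive(uint32_t frames_seen,
                                       uint32_t frames_ok,
                                       uint32_t multicast)
{
    return frames_seen - (frames_ok + multicast);
}

/* Same calculation, clamped at zero instead of wrapping. */
static inline uint32_t rx_errors_clamped(uint32_t frames_seen,
                                         uint32_t frames_ok,
                                         uint32_t multicast)
{
    uint32_t good = frames_ok + multicast;
    return frames_seen > good ? frames_seen - good : 0u;
}
```

With 1000 frames seen, 1000 counted as good and 5 multicast, the naive version returns 4294967291, and the value drops by one for each further over-counted frame. Clamping only hides the symptom; it does not say which counter is being double-counted.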

But the interesting thing is that we have checked that this calculation 
is something that we have added with a patch by Grzegorz Bernacki in 
this list.

So... The funny thing is... Why a patch that solves the problem for 
Grzegorz produces the same problem for us?

And... by the way... I have seen IEEE802.3, and when they talk about 
aFramesReceivedOK (which I suppose is the ieee_r_frame_ok in the 
driver), and they do not say a word about not including multicast 
packets in it...

Any comment will be appreciated.

Miguel Ángel Álvarez


Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Josh Boyer
On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
 On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
  On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:
  
   http://bugzilla.kernel.org/show_bug.cgi?id=8778
   
  Summary: Ocotea board: kernel reports access of bad area during
   boot with DEBUG_SLAB=y
 
 Slab debugging is probably the culprit here. I had similar problem 
 couple of years ago, not sure something has changed since then, 
 haven't checked.
 
 When slab debugging was enabled it made memory allocations non L1 
 cache line aligned. This is very bad for DMA on non-coherent cache 
 arches (PPC440 is one of those archs).
 
 I have a hack for EMAC which tries to workaround this problem:
   http://kernel.ebshome.net/emac_slab_debug.diff
 which might help.

Would you be opposed to including that patch in mainline?  I'd like to
have the bug reporter try it and then get it in if it fixes the issue.

josh



Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Eugene Surovegin
On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
 On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
  On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
   On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:
   
http://bugzilla.kernel.org/show_bug.cgi?id=8778

   Summary: Ocotea board: kernel reports access of bad area 
during
boot with DEBUG_SLAB=y
  
  Slab debugging is probably the culprit here. I had similar problem 
  couple of years ago, not sure something has changed since then, 
  haven't checked.
  
  When slab debugging was enabled it made memory allocations non L1 
  cache line aligned. This is very bad for DMA on non-coherent cache 
  arches (PPC440 is one of those archs).
  
  I have a hack for EMAC which tries to workaround this problem:
  http://kernel.ebshome.net/emac_slab_debug.diff
  which might help.
 
 Would you be opposed to including that patch in mainline?

Yes. I don't think it's the right way to fix this issue. IMO, the
right one is to fix the slab allocator. You cannot change all drivers to
do this kind of cache flushing, and yes, I saw the same problem with a
PCI-based NIC I tried on Ocotea at the time.

-- 
Eugene


Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Josh Boyer
On Wed, 2007-07-18 at 08:59 -0700, Eugene Surovegin wrote:
 On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
  On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
   On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=8778
 
Summary: Ocotea board: kernel reports access of bad area 
 during
 boot with DEBUG_SLAB=y
   
   Slab debugging is probably the culprit here. I had similar problem 
   couple of years ago, not sure something has changed since then, 
   haven't checked.
   
   When slab debugging was enabled it made memory allocations non L1 
   cache line aligned. This is very bad for DMA on non-coherent cache 
   arches (PPC440 is one of those archs).
   
   I have a hack for EMAC which tries to workaround this problem:
 http://kernel.ebshome.net/emac_slab_debug.diff
   which might help.
  
  Would you be opposed to including that patch in mainline?
 
 Yes. I don't think it's the right way to fix this issue. IMO, the 
 right one is to fix slab allocator. You cannot change all drivers to 
 do this kind of cache flushing, and yes, I saw the same problem with 
 PCI based NIC I tried on Ocotea at the time.

Hm... good point.  I'd still like to see if your patch works around the
reporter's problem.

josh



Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Andrew Morton
On Wed, 18 Jul 2007 08:59:40 -0700 Eugene Surovegin [EMAIL PROTECTED] wrote:

 On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
  On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
   On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=8778
 
Summary: Ocotea board: kernel reports access of bad area 
 during
 boot with DEBUG_SLAB=y
   
   Slab debugging is probably the culprit here. I had similar problem 
   couple of years ago, not sure something has changed since then, 
   haven't checked.
   
   When slab debugging was enabled it made memory allocations non L1 
   cache line aligned. This is very bad for DMA on non-coherent cache 
   arches (PPC440 is one of those archs).
   
   I have a hack for EMAC which tries to workaround this problem:
 http://kernel.ebshome.net/emac_slab_debug.diff
   which might help.
  
  Would you be opposed to including that patch in mainline?
 
 Yes. I don't think it's the right way to fix this issue. IMO, the 
 right one is to fix slab allocator. You cannot change all drivers to 
 do this kind of cache flushing, and yes, I saw the same problem with 
 PCI based NIC I tried on Ocotea at the time.
 

hm.  It should be the case that providing SLAB_HWCACHE_ALIGN at
kmem_cache_create() time will override slab-debugging's offsetting
of the returned addresses.

Or is the problem occurring with memory which is returned from kmalloc(),
rather than from kmem_cache_alloc()?

A complete description of the problem would help here, please.


Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Eugene Surovegin
On Wed, Jul 18, 2007 at 09:55:37AM -0700, Andrew Morton wrote:
 On Wed, 18 Jul 2007 08:59:40 -0700 Eugene Surovegin [EMAIL PROTECTED] wrote:
 
  On Wed, Jul 18, 2007 at 08:41:10AM -0500, Josh Boyer wrote:
   On Wed, 2007-07-18 at 01:34 -0700, Eugene Surovegin wrote:
On Wed, Jul 18, 2007 at 12:52:53AM -0700, Andrew Morton wrote:
 On Wed, 18 Jul 2007 00:07:50 -0700 (PDT) [EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=8778
  
 Summary: Ocotea board: kernel reports access of bad area 
  during
  boot with DEBUG_SLAB=y

Slab debugging is probably the culprit here. I had similar problem 
couple of years ago, not sure something has changed since then, 
haven't checked.

When slab debugging was enabled it made memory allocations non L1 
cache line aligned. This is very bad for DMA on non-coherent cache 
arches (PPC440 is one of those archs).

I have a hack for EMAC which tries to workaround this problem:
http://kernel.ebshome.net/emac_slab_debug.diff
which might help.
   
   Would you be opposed to including that patch in mainline?
  
  Yes. I don't think it's the right way to fix this issue. IMO, the 
  right one is to fix slab allocator. You cannot change all drivers to 
  do this kind of cache flushing, and yes, I saw the same problem with 
  PCI based NIC I tried on Ocotea at the time.
  
 
 hm.  It should be the case that providing SLAB_HWCACHE_ALIGN at
 kmem_cache_create() time will override slab-debugging's offsetting
 of the returned addresses.
 
 Or is the problem occurring with memory which is returned from kmalloc(),
 rather than from kmem_cache_alloc()?

It's kmalloc, at least this is how I think skbs are allocated.

Andrew, I don't have access to PPC hw right now (doing MIPS 
development these days), so I cannot quickly check that my theory is 
still correct for the latest kernel. I'd wait for the reporter to try 
my hack and then we can decide what to do. IIRC there was some
provision in the slab allocator to enforce alignment, but when I was
debugging this problem more than a year ago, that option didn't work.

BTW, I think slob allocator had the same issue with alignment as slab 
with enabled debugging (at least at the time I looked at it).

-- 
Eugene



Re: Gdbserver syscall clobber

2007-07-18 Thread Bill Gatliff
Daniel Jacobowitz wrote:
 On Mon, Jul 16, 2007 at 10:43:41AM -0500, Bill Gatliff wrote:
   
 recv(4, 0x7d60, 1, 0)   = ? ERESTARTSYS (To be restarted)
 --- SIGIO (I/O possible) @ 0 (0) ---
 syscall_4294966784(0xa, 0x7d34, 0x1, 0, 0x1008a3c7, 0x1008b5a3, 
 0x1008b5a4, 
 

 That's -512, a.k.a. the errno value used by syscall restarting.  I'd
 say your glibc does not obey the restartable syscall convention used
 by your kernel, and when it tries to restart the syscall the errno
 value is not being replaced by the syscall number.  Check the assembly
 for recv.
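
The arithmetic behind this observation can be checked in a few lines of C. This is only an illustration: `syscall_nr_as_unsigned` is a made-up helper name, and 512 is the kernel-internal ERESTARTSYS value.

```c
#include <stdint.h>

/* Kernel-internal restart code; it is never meant to reach user space. */
#define ERESTARTSYS 512

/*
 * The strace output above shows the syscall number as an unsigned 32-bit
 * value, so a leftover -ERESTARTSYS in that register looks like a huge
 * positive number.
 */
static inline uint32_t syscall_nr_as_unsigned(int32_t nr)
{
    return (uint32_t)nr;
}
```

Passing -ERESTARTSYS yields 4294966784, which matches the `syscall_4294966784` line in the trace.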

   

Very good catch!  Thanks so much.  Here's the code, from my libc.a:

 __libc_recv:
0:   94 21 ff d0 stwur1,-48(r1)
4:   90 61 00 14 stw r3,20(r1)
8:   90 81 00 18 stw r4,24(r1)
c:   90 a1 00 1c stw r5,28(r1)
   10:   90 c1 00 20 stw r6,32(r1)
   14:   81 42 00 0c lwz r10,12(r2)
   18:   2c 0a 00 00 cmpwi   r10,0
   1c:   40 82 00 20 bne-3c __libc_recv+0x3c
   20:   38 60 00 0a li  r3,10
   24:   38 81 00 14 addir4,r1,20
   28:   38 00 00 66 li  r0,102
   2c:   44 00 00 02 sc
   30:   38 21 00 30 addir1,r1,48
   34:   4c a3 00 20 bnslr+
   38:   48 00 00 00 b   38 __libc_recv+0x38

Again, this is 603e on linux-2.4.16 glibc-2.2.5 gcc-2.95.3.  (Odd, I
can't seem to find this function in a statically-linked gdbserver, nor
any reference to it in the gdbserver-6.5 source code).

On the kernel side:

_GLOBAL(DoSyscall)
...
 blrl/* Call handler */
 .globl  ret_from_syscall_1
ret_from_syscall_1:
20: stw r3,RESULT(r1)   /* Save result */
 li  r10,-_LAST_ERRNO
 cmpl0,r3,r10
 blt 30f
 neg r3,r3
 cmpi0,r3,ERESTARTNOHAND
 bne 22f
 li  r3,EINTR
22: lwz r10,_CCR(r1)/* Set SO bit in CR */
 orisr10,r10,0x1000
 stw r10,_CCR(r1)
30: stw r3,GPR3(r1) /* Update return value */
 b   ret_from_except
...
ret_from_except:
...
 lwz r3,_CCR(r1)
...
 mtcrf   0xFF,r3
...
 RFI


Now, I'm a little rusty on PPC asm (I've been doing a lot of ARM
lately), but it looks to me like the kernel is setting bit 0 in CR0
(oris r10, r10, 0x1000) a.k.a LT, but the user side is looking at CR0
(bnslr+) bit 3 a.k.a. SO.  Or maybe the other way around, I'm not sure
after reading Sections 1.2 and 2.1 of the Programming Environments manual.

Or am I misinterpreting something?  I must be, this is well-trodden code
I'm thinking...
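
One way to settle the bit-numbering question: PowerPC numbers bits from the most significant end, so bit 0 of the 32-bit CR is mask 0x80000000, and CR0 occupies bits 0 to 3 as LT, GT, EQ, SO. The sketch below converts the `oris` immediate into a CR bit index; the helper names are illustrative and not taken from any kernel header.

```c
#include <stdint.h>

/* CR0 field bits in PowerPC (MSB-first) bit numbering. */
enum { CR0_LT = 0, CR0_GT = 1, CR0_EQ = 2, CR0_SO = 3 };

/* oris ORs a 16-bit immediate into the upper half of the register. */
static inline uint32_t oris_mask(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/*
 * Index of the most significant set bit, counted from the MSB
 * (bit 0 = 0x80000000). Returns -1 if no bit is set.
 */
static inline int ppc_bit_index(uint32_t mask)
{
    for (int i = 0; i < 32; i++) {
        if (mask & (UINT32_C(0x80000000) >> i))
            return i;
    }
    return -1;
}
```

By this reckoning `oris r10,r10,0x1000` sets mask 0x10000000, which is bit 3, CR0[SO]. That is the same bit `bnslr+` tests, so the kernel and glibc sides appear to agree on the error flag.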

The readchar() in gdbserver's remote-utils.c just calls read() on the
file descriptor for the socket.  Still trying to track that code down...



b.g.

-- 
Bill Gatliff
[EMAIL PROTECTED]




Re: [Bugme-new] [Bug 8778] New: Ocotea board: kernel reports access of bad area during boot with DEBUG_SLAB=y

2007-07-18 Thread Bart Van Assche

On 7/18/07, Eugene Surovegin [EMAIL PROTECTED] wrote:



It's kmalloc, at least this is how I think skbs are allocated.

Andrew, I don't have access to PPC hw right now (doing MIPS
development these days), so I cannot quickly check that my theory is
still correct for the latest kernel. I'd wait for the reporter to try
my hack and then we can decide what to do. IIRC there was some
provision in the slab allocator to enforce alignment, but when I was
debugging this problem more than a year ago, that option didn't work.

BTW, I think slob allocator had the same issue with alignment as slab
with enabled debugging (at least at the time I looked at it).




Hello Eugene,

In case you didn't notice yet, I have added the following comment to the
kernel bugzilla item:


--- Comment #5 from Bart Van Assche [EMAIL PROTECTED], 2007-07-18 07:12:49
    http://bugzilla.kernel.org/show_bug.cgi?id=8778#c5 ---

I have downloaded the patch from
http://kernel.ebshome.net/emac_slab_debug.diff, and I have tried it. Hereby I
confirm that this patch solves the reported kernel oops.



--
Regards,

Bart Van Assche.

Re: Gdbserver syscall clobber

2007-07-18 Thread Daniel Jacobowitz
On Wed, Jul 18, 2007 at 12:59:42PM -0500, Bill Gatliff wrote:
 Now, I'm a little rusty on PPC asm (I've been doing a lot of ARM
 lately), but it looks to me like the kernel is setting bit 0 in CR0
 (oris r10, r10, 0x1000) a.k.a LT, but the user side is looking at CR0
 (bnslr+) bit 3 a.k.a. SO.  Or maybe the other way around, I'm not sure
 after reading Sections 1.2 and 2.1 of the Programming Environments manual.

It's not checking for restart here - userspace isn't supposed to have to.
It's probably checking for error.  Check for the bit of kernel code
that's supposed to back you up two instructions.

-- 
Daniel Jacobowitz
CodeSourcery


Re: Machine check exception. 2.6.20 powerpc tree.

2007-07-18 Thread Kumar Gala

On Jul 18, 2007, at 4:27 AM, Ramirez-Ortiz, Jorge wrote:

 Hi Kumar

 The address we are trying to access corresponds to a mapped device in
 the PCI space
 Attached some additional debugging information (we have  
 instrumented the
 kernel)

 Thanks
 jorge


 INFO [_probe]: Found Device [irq=58]
 INFO [_open]: device opened with irq 58
 INFO [_read]: waiting for interrupt
 INFO [_intr]: ISR 58

 PCI1: Error! ERR_DETECT=0040, ATTR=00516001, addr=80020034,
 data=0005

So you are getting a master abort from the target.  I think you need  
to look at your PCI device and see what's going on there.

 machine_check_exception: task my_process, MCSR=0x10008, NIP=0x10153530
 Machine check in user mode.
 Caused by (from MCSR=10008): Guarded Load or Cache-Inhibited stwcx.
 Bus - Read Data Bus Error

 Call Trace:
 [C7355EF0] [C0006E64] show_stack+0x48/0x19c (unreliable)
 [C7355F20] [C000C04C] machine_check_exception+0x294/0x484
 [C7355F40] [C000E48C] ret_from_mcheck_exc+0x0/0xe0

 cat /proc/cpuinfo
 processor   : 0
 cpu : e500v2
 clock   : 799.50MHz
 revision: 2.0 (pvr 8021 0020)
 bogomips: 99.84
 timebase: 49968750
 platform: MPC85xx CDS
 Vendor  : Freescale Semiconductor
 Machine : MPC85xx CDS (0xff)
 PVR : 0x80210020
 SVR : 0x80390220
 PLL setting : 0x4
 Memory  : 256 MB
 LAW 1   : , 2000 - DDR SDRAM
 LAW 2   : 8000, 1000 - PCI1
 LAW 3   : 9000, 1000 - PCI2
 LAW 4   : a000, 1000 - PCI Express
 LAW 5   : e100, 0100 - PCI1
 LAW 6   : e200, 0100 - PCI2
 LAW 7   : e300, 0100 - PCI Express
 LAW 8   : f000, 1000 - Local bus
 DDR 0   : , 2000 - 2/14/10 addr bits
 PCI1 Out_1  : 8000, 1000 - Mem: 8000
 PCI1 Out_2  : e100, 0100 - I/O: 
 PCI2 Out_1  : 9000, 1000 - Mem: 9000
 PCI2 Out_2  : e200, 0100 - I/O: 
 PCI3 Out_1  : a000, 1000 - Mem: a000
 PCI3 Out_2  : e300, 0100 - I/O: 
 PCI1 In_1   : , 2000 (Internal,R:snoop,W:snoop) -
  PF
 PCI2 In_1   : , 2000 (Internal,R:snoop,W:snoop) -
  PF
 PCI3 In_1   : , 2000 (Internal,R:snoop,W:snoop) -
  PF


 __



 -Original Message-
 From: Kumar Gala [mailto:[EMAIL PROTECTED]
 Sent: 17 July 2007 17:29
 To: Ramirez-Ortiz, Jorge
 Cc: linuxppc-embedded@ozlabs.org
 Subject: Re: Machine check exception. 2.6.20 powerpc tree.


 On Jul 17, 2007, at 9:21 AM, Ramirez-Ortiz, Jorge wrote:

 Running our multithreaded application on ppc8548 (E500 core)
 generates a machine check exception when trying to access some
 ASIC's registers mapped on the PCI space (This application maps a
 PCI device to access its registers)



 machine_check_exception: task my_process, MCSR=0x10008, NIP=0x10153530
 Machine check in user mode.
 Caused by (from MCSR=10008): Guarded Load or Cache-Inhibited stwcx.
 Bus - Read Data Bus Error



 Here is the assembly dump of the region of code containing the
 offending instruction in user-space, with SRR0 pointing us at
 0x10153530 when the exception is raised:



 0x10153528 <_ZN2vk7in_le32EPVKj+16>:    lwz     r0,8(r31)
 0x1015352c <_ZN2vk7in_le32EPVKj+20>:    lwz     r9,8(r31)
 0x10153530 <_ZN2vk7in_le32EPVKj+24>:    lwbrx   r0,0,r0
 0x10153534 <_ZN2vk7in_le32EPVKj+28>:    twi     0,r0,0
 0x10153538 <_ZN2vk7in_le32EPVKj+32>:    isync

 Can you get the code to dump the value of r0?  I'm wondering if
 you're really getting a read data bus error because r0 is pointing
 to a PCI address that doesn't have a device that will respond.

 - k



___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Re: Gdbserver syscall clobber

2007-07-18 Thread Bill Gatliff

Daniel Jacobowitz wrote:

On Mon, Jul 16, 2007 at 10:43:41AM -0500, Bill Gatliff wrote:
  

recv(4, 0x7d60, 1, 0)   = ? ERESTARTSYS (To be restarted)
--- SIGIO (I/O possible) @ 0 (0) ---
syscall_4294966784(0xa, 0x7d34, 0x1, 0, 0x1008a3c7, 0x1008b5a3, 0x1008b5a4, 



That's -512, a.k.a. the errno value used by syscall restarting.  I'd
say your glibc does not obey the restartable syscall convention used
by your kernel, and when it tries to restart the syscall the errno
value is not being replaced by the syscall number.  Check the assembly
for recv.
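
The arithmetic behind that observation can be checked quickly. A minimal sketch: `as_signed32` is a hypothetical helper that reinterprets the unsigned number strace printed as a signed 32-bit value.

```python
ERESTARTSYS = 512  # kernel-internal restart errno


def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed one."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# The "syscall number" strace reported in the trace above.
print(as_signed32(4294966784))
```

The result is the negative of `ERESTARTSYS`, which is why the bogus syscall number points at the restart path.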

  


Very good catch!  Thanks so much.  Here's the code, from my libc.a:

 __libc_recv:
   0:   94 21 ff d0     stwu    r1,-48(r1)
   4:   90 61 00 14     stw     r3,20(r1)
   8:   90 81 00 18     stw     r4,24(r1)
   c:   90 a1 00 1c     stw     r5,28(r1)
  10:   90 c1 00 20     stw     r6,32(r1)
  14:   81 42 00 0c     lwz     r10,12(r2)
  18:   2c 0a 00 00     cmpwi   r10,0
  1c:   40 82 00 20     bne-    3c <__libc_recv+0x3c>
  20:   38 60 00 0a     li      r3,10
  24:   38 81 00 14     addi    r4,r1,20
  28:   38 00 00 66     li      r0,102
  2c:   44 00 00 02     sc
  30:   38 21 00 30     addi    r1,r1,48
  34:   4c a3 00 20     bnslr+
  38:   48 00 00 00     b       38 <__libc_recv+0x38>

Again, this is 603e on linux-2.4.16 glibc-2.2.5 gcc-2.95.3.  (Odd, I 
can't seem to find this function in a statically-linked gdbserver, nor 
any reference to it in the gdbserver-6.5 source code).


On the kernel side:

_GLOBAL(DoSyscall)
...
        blrl                    /* Call handler */
        .globl  ret_from_syscall_1
ret_from_syscall_1:
20:     stw     r3,RESULT(r1)   /* Save result */
        li      r10,-_LAST_ERRNO
        cmpl    0,r3,r10
        blt     30f
        neg     r3,r3
        cmpi    0,r3,ERESTARTNOHAND
        bne     22f
        li      r3,EINTR
22:     lwz     r10,_CCR(r1)    /* Set SO bit in CR */
        oris    r10,r10,0x1000
        stw     r10,_CCR(r1)
30:     stw     r3,GPR3(r1)     /* Update return value */
        b       ret_from_except
...
ret_from_except:
...
        lwz     r3,_CCR(r1)
...
        mtcrf   0xFF,r3
...
        RFI
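
To make the control flow of `ret_from_syscall_1` easier to follow, here is a rough Python model of the listing above. It is a sketch only: `ret_from_syscall` is a hypothetical function, and the value used for `_LAST_ERRNO` is an assumption that should be checked against asm-ppc/errno.h in the tree being used.

```python
EINTR = 4
ERESTARTNOHAND = 514
LAST_ERRNO = 515        # assumed value of _LAST_ERRNO; verify in your tree
CR0_SO = 0x1000 << 16   # mask ORed in by "oris r10,r10,0x1000"


def ret_from_syscall(result, ccr):
    """Return (r3, ccr) as user space would see them after the syscall."""
    r3 = result & 0xFFFFFFFF
    threshold = (-LAST_ERRNO) & 0xFFFFFFFF
    if r3 < threshold:          # cmpl / blt 30f: unsigned compare, success
        return r3, ccr
    r3 = (-r3) & 0xFFFFFFFF     # neg r3: turn -errno into a positive errno
    if r3 == ERESTARTNOHAND:    # this one restart code is mapped to EINTR
        r3 = EINTR
    return r3, ccr | CR0_SO     # label 22: flag the error in the saved CR


print(ret_from_syscall(5, 0))
print(ret_from_syscall(-512, 0))
```

In this model a successful result passes through untouched, while an error comes back as a positive errno in r3 with the flag bit set in the saved condition register.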


Now, I'm a little rusty on PPC asm (I've been doing a lot of ARM 
lately), but it looks to me like the kernel is setting bit 0 in CR0 
(oris r10, r10, 0x1000) a.k.a LT, but the user side is looking at CR0 
(bnslr+) bit 3 a.k.a. SO.  Or maybe the other way around, I'm not sure 
after reading Sections 1.2 and 2.1 of the Programming Environments manual.


Or am I misinterpreting something?  I must be, this is well-trodden code 
I'm thinking...
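
One way to settle the bit-numbering question is to work it out numerically. The sketch below assumes the usual PowerPC convention that bit 0 is the most significant bit of the 32-bit CR, and decodes the `4c a3 00 20` word from the libc listing using the standard conditional-branch field positions; the helper names are made up for illustration.

```python
CR0_FIELDS = ["LT", "GT", "EQ", "SO"]  # CR bits 0..3


def cr_bit_index(mask):
    """Map a single-bit mask to its PowerPC bit number (bit 0 = MSB)."""
    assert mask and mask & (mask - 1) == 0, "expects a single-bit mask"
    return 32 - mask.bit_length()


# oris ORs its 16-bit immediate into the upper half of the register.
kernel_mask = 0x1000 << 16
print(CR0_FIELDS[cr_bit_index(kernel_mask)])

# Field extraction for the branch word at offset 0x34 (bnslr+).
insn = 0x4CA30020
opcode = insn >> 26           # primary opcode
bo = (insn >> 21) & 0x1F      # branch options
bi = (insn >> 16) & 0x1F      # CR bit tested
branch_if_clear = not (bo & 0x10) and not (bo & 0x08)
print(opcode, bo, CR0_FIELDS[bi], branch_if_clear)
```

Under those assumptions both sides agree: the kernel sets CR0[SO], and `bnslr+` returns when CR0[SO] is clear.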


The readchar() in gdbserver's remote-utils.c just calls read() on the 
file descriptor for the socket.  Still trying to track that code down...




b.g.

--
Bill Gatliff
[EMAIL PROTECTED]


Re: can anyone help me to test my ac97 driver

2007-07-18 Thread Joachim Förster
Hi silicom,

On Fri, 2007-07-13 at 13:22 +0800, silicom wrote:
 I have a simple OSS AC97 playback driver for the Xilinx ML403 and the
 Linux 2.6.17 kernel, but when I test it with a *.wav file with a sample
 rate of 44.1 kHz, there is a lot of noise. I want to know whether the
 problem is with my ML403 board or with the AC97 driver. Could anyone be
 kind enough to test it on your board or point out my problem?

Today I took some time and tried to compile your driver, but one of the
first things I saw was that a file is missing: xac97_l.h. I think it
contains your low-level functions. If you post the file, and I have
some time, I'll test your driver on the board I have available and see
what it does.
Furthermore, your driver depends on xbasic_types.c/h and xio.c/h, which
some people might not have either ... especially xio.c/h (in my
experience ;-) ) ...

In my last mail (last week) I announced my driver for the AC97
Controller of Xilinx and said that I'm going to release/post it. Since
then, I worked on it once more and now it seems to be pretty stable and
usable (playback support). I added capture support, too.
[Capturing basically works, but there is a problem with higher rates and
ALSA not reading from the intermediate buffer anymore ... not yet
investigated.]

My driver will be published on a website soon. I'll post the link as
soon as possible. Meanwhile, if you or anyone else wants to try it, just
mail me ([EMAIL PROTECTED]).

 Joachim





Re: Memory Corruption in Linux kernel MPC8347 revision 3

2007-07-18 Thread Bhupender Saharan

Hi Boris,

When you are running the memory test, make sure the data cache and
instruction cache are enabled.

Also check your BAT settings; the cache-enable bit should be set there
as well, because burst transactions happen only when the cache is
enabled.

How about ECC?

Bhupi



On 7/17/07, Boris Shteinbock [EMAIL PROTECTED] wrote:


 Hi Everyone.
I am working on the Linux port for an MPC8347 revision 3 custom-built
board with DDR2 memory.

I've successfully ported U-boot (latest git) and the kernel itself,
however during kernel boot I am encountering serious memory corruption
errors. The log for one of the examples is at the bottom of this
message.

Basically, the corruption always happens somewhere in memory-management-
intensive tasks such as networking, JFFS2 mounting, etc.

As far as I can see, it is not related to any specific driver, because
it happens even with a kernel configured at the absolute minimum (serial
console driver only, and even without it).
The place of the corruption depends on the kernel configuration.

The DDR2 memory controller is configured correctly as far as I can tell,
since :
1. DDR2 controller register values are taken from VxWorks bootrom
that works on this board without any problems.
2. u-boot mtest passes successfully
3. u-boot alternative mtest passes successfully
4. My own custom mem tests in u-boot pass successfully
5. If I manage to boot the board to a shell prompt (with the absolute
minimum configuration), the memtester application is also successful.

The minimum configuration that I am able to boot to a shell is a kernel
configured with a serial console and a small busybox JFFS2 file system
in the flash. In this configuration, the boot fails the first time the
JFFS2 root FS is mounted. However, it does boot after a reset.

I've tried different kernels with the same results, starting from, I
think, 2.6.16 up to 2.6.22.
I tried the kernel that is provided by Freescale for the 834x reference
boards (with my board support, of course).

I tried booting both OF flat trees (powerpc) and bd_t based builds (ppc)

I've also tried all memory management options :
SLAB, SLOB and SLUB (in the latest kernel). They all failed at some
point of time, so the assumption is that the problem is not in the
memory management facilities.

The board manufacturer swears that DDR2 memory controller values are
correct and should work perfectly.

So now I am almost out of options and I am seeking your help.
Any type of input on this issue would be greatly appreciated.

Thanks,
Boris

PS. Note that the example below represents a failure during DHCP
autoconfiguration. However, a similar error happens even when
networking is disabled completely, just in a different place.

= bootm
## Booting image at 0040 ...
   Image Name:   Linux-.6.21.5
   Created:  2007-07-10  14:20:19 UTC
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:898361 Bytes = 877.3 kB
   Load Address: 
   Entry Point:  
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
## Current stack ends at 0x07FA3CF8 = set upper limit to 0x0080
## cmdline at 0x007FFF00 ... 0x007FFF41
bd address  = 0x07FA3FBC
memstart= 0x
memsize = 0x0800
flashstart  = 0xFE00
flashsize   = 0x0200
flashoffset = 0x00033000
sramstart   = 0x
sramsize= 0x
bootflags   = 0x0001
intfreq =528 MHz
busfreq =264 MHz
ethaddr = 00:04:9F:EF:23:35
eth1addr= 00:E0:0C:00:7E:25
IP addr = 10.2.222.20
baudrate= 115200 bps
No initrd
## Transferring control to Linux (at address ) ...
 of_flat_tree = 
Booting without OF Flat tree
Linux version .6.21.5 ([EMAIL PROTECTED]) (gcc version
4.0.0 (DENX ELDK 4.1 4.0.0)) #24 Tue Jul 10 17:20:09 IDT 2007
Zone PFN ranges:
  DMA             0 ->    32768
  Normal      32768 ->    32768
early_node_map[1] active PFN ranges
    0:        0 ->    32768
Built 1 zonelists.  Total pages: 32512
Kernel command line: console=ttyS0,115200 root=/dev/mtdblock1
rootfstype=jffs2 ip=dhcp
IPIC (128 IRQ sources, 8 External IRQs) at fe000700
PID hash table entries: 512 (order: 9, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 127744k available (1584k kernel code, 444k data, 84k init, 0k
highmem)
Mount-cache hash table entries: 512
NET: Registered protocol family 16
Setup MTD partitions
Generic PHY: Registered new driver
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 4096 (order: 3, 32768 bytes)
TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP reno registered
JFFS2 version 2.2. (NAND) (C) 2001-2006 Red Hat, Inc.
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Serial: 8250/16550 driver