Re: powerpc/4xx: Regression failed on sil24 (and other) drivers

2011-06-29 Thread Ayman El-Khashab
On Wed, Jun 29, 2011 at 11:42:03AM +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2011-06-27 at 06:31 -0500, Ayman El-Khashab wrote:
  On Mon, Jun 27, 2011 at 08:19:56PM +1000, Benjamin Herrenschmidt wrote:
   On Sat, 2011-06-25 at 18:52 -0500, Ayman El-Khashab wrote:
I noticed during a recent development with the 460SX that a
simple device that once worked stopped.  I did a bisect to
find the offending commit and it turns out to be this one:

0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad
commit
commit 0e52247a2ed1f211f0c4f682dc999610a368903f
Author: Cam Macdonell c...@cs.ualberta.ca
Date:   Tue Sep 7 17:25:20 2010 -0700

PCI: fix pci_resource_alignment prototype

 

 
 I suspect you don't have CONFIG_PCI_QUIRKS enabled... I think that's the
 cause of your problem.
 
 It looks like this config option controls both compiling in the generic
 quirks from drivers/pci/quirks.c, and the actual mechanism for
 having quirks in the first place (pci_fixup_device() goes away without
 that config option).
 
 I think we probably want to unconditionally select that if CONFIG_PCI is
 enabled in arch/powerpc...
 
 Can you try changing it and tell us if that helps ?
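
For anyone following along, here is a minimal sketch of why a platform fixup silently stops running when the quirk mechanism is compiled out. It is a toy model, not kernel code: DEMO_PCI_QUIRKS, struct demo_dev, demo_hide_host_bars() and demo_fixup_device() are hypothetical names standing in for CONFIG_PCI_QUIRKS, struct pci_dev, a header fixup and pci_fixup_device().

```c
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev: a single BAR only. */
struct demo_dev {
	uint64_t bar_start;
	uint64_t bar_end;
};

/* Models a header fixup that hides a host bridge BAR by zeroing it. */
void demo_hide_host_bars(struct demo_dev *dev)
{
	dev->bar_start = 0;
	dev->bar_end = 0;
}

#ifdef DEMO_PCI_QUIRKS
/* Option on: the dispatcher runs the fixup. */
static inline void demo_fixup_device(struct demo_dev *dev)
{
	demo_hide_host_bars(dev);
}
#else
/* Option off: the dispatcher is an empty stub, so the fixup exists
 * in the source but is never called. */
static inline void demo_fixup_device(struct demo_dev *dev)
{
	(void)dev;
}
#endif
```

Built without -DDEMO_PCI_QUIRKS, calling demo_fixup_device() leaves the BAR untouched, which would match the symptom of the host bridge BARs no longer being hidden.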

Yes, that fixed our problem, thanks for your time.  I am
going to try to get the MSI to work.

Ayman
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: powerpc/4xx: Regression failed on sil24 (and other) drivers

2011-06-28 Thread Benjamin Herrenschmidt
On Mon, 2011-06-27 at 06:31 -0500, Ayman El-Khashab wrote:
 On Mon, Jun 27, 2011 at 08:19:56PM +1000, Benjamin Herrenschmidt wrote:
  On Sat, 2011-06-25 at 18:52 -0500, Ayman El-Khashab wrote:
   I noticed during a recent development with the 460SX that a
   simple device that once worked stopped.  I did a bisect to
   find the offending commit and it turns out to be this one:
   
   0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad
   commit
   commit 0e52247a2ed1f211f0c4f682dc999610a368903f
   Author: Cam Macdonell c...@cs.ualberta.ca
   Date:   Tue Sep 7 17:25:20 2010 -0700
   
   PCI: fix pci_resource_alignment prototype
   

Ok, let's see what I can dig out of those logs (sorry for the delay)

Let's start with iomem & ioport, stripped of the legacy & common stuff:

/proc/iomem, bad:

e-e7fff : /plb/pciex@d
  e-e7fff : :40:00.0
e8000-e : /plb/pciex@d2000
  e8000-e : 0001:80:00.0

good:

e-e7fff : /plb/pciex@d
e8000-e : /plb/pciex@d2000
  e8000-e800f : PCI Bus 0001:81
e8000-e80001fff : 0001:81:00.0
  e8000-e80001fff : sata_sil24
e80002000-e8000207f : 0001:81:00.0
  e80002000-e8000207f : sata_sil24

So now that's interesting, you have a device at :40:00.0 that
appears on your first PHB in the bad case and doesn't show up in the
good case.

In addition, on the other PHB, the bus itself doesn't show up in the
bad case. (Let's ignore IOs and focus on mem. for now).

Let's see what led us to that from the logs. First setup before probing
is all identical. The device at :40:00.0 is detected in both cases,
it's the root complex bridge. So the scanning is identical as expected.

Now the fixup/resource allocation, we start seeing some differences:

Bad:

pci :40:00.0: BAR 0: assigned [mem 0xe-0xe7fff pref]
pci :40:00.0: BAR 0: set to [mem 0xe-0xe7fff pref] (PCI address 
[0x8000-0x]

vs Good:

pci :40:00.0: BAR 0: can't assign mem pref (size 0x8000)

So the bad case succeeds in giving out resources to the root complex,
while the good case fails... fun.

And similarly for the other PHB, bad:

pci 0001:80:00.0: BAR 0: assigned [mem 0xe8000-0xe pref]
pci 0001:80:00.0: BAR 0: set to [mem 0xe8000-0xe pref] (PCI address 
[0x8000-0x]

vs good:

pci 0001:80:00.0: BAR 0: can't assign mem pref (size 0x8000)

This then goes down to the bad case:

pci 0001:80:00.0: BAR 8: can't assign mem (size 0x10)
pci 0001:80:00.0: BAR 7: assigned [io  0xfffe1000-0xfffe1fff]
pci 0001:81:00.0: BAR 2: can't assign mem (size 0x2000)
pci 0001:81:00.0: BAR 0: can't assign mem (size 0x80)

while the good one succeeds assigning BAR 8,2 and 0 :

pci 0001:80:00.0: BAR 8: assigned [mem 0xe8000-0xe800f]
pci 0001:81:00.0: BAR 2: assigned [mem 0xe8000-0xe80001fff 64bit]
pci 0001:81:00.0: BAR 2: set to [mem 0xe8000-0xe80001fff 64bit] (PCI 
address [0x8000-0x80001fff]
pci 0001:81:00.0: BAR 0: assigned [mem 0xe80002000-0xe8000207f 64bit]
pci 0001:81:00.0: BAR 0: set to [mem 0xe80002000-0xe8000207f 64bit] (PCI 
address [0x80002000-0x8000207f]

It looks to me like BAR 0 of each host bridge is basically taking the
resource away from the rest of the devices. Now these BAR 0 are not bridge
resources (bridge windows), which would have been OK, but MMIO resources of
the bridge itself.

On 44x, the problem is that those bridges (stupidly) expose BARs that represent
main memory (inbound DMA). It would make sense if these weren't host bridges
but in this case that's totally nonsensical (and thus IMHO a HW bug).
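
To make the starvation concrete, here is a minimal sketch of a first-fit allocator over one memory window. It is only a toy model and not the kernel's resource code: struct demo_window and demo_assign() are hypothetical, and the window bounds and the bridge BAR size are illustrative values rather than numbers read from the logs. Only the 0x2000 and 0x80 device sizes match the sil24 BARs above.

```c
#include <stdint.h>

/* Hypothetical model of one host bridge memory window.
 * 'end' is inclusive, 'next' is the first unassigned address. */
struct demo_window {
	uint64_t start;
	uint64_t end;
	uint64_t next;
};

/* First-fit assignment of a naturally aligned, power-of-two sized BAR.
 * Returns 0 and stores the address in *out, or -1 if it does not fit. */
int demo_assign(struct demo_window *w, uint64_t size, uint64_t *out)
{
	uint64_t base;

	if (size == 0)
		return -1;

	base = (w->next + size - 1) & ~(size - 1);
	if (base < w->next || base > w->end || w->end - base < size - 1)
		return -1;

	w->next = base + size;
	*out = base;
	return 0;
}
```

With a window of 0x80000000-0xffffffff, assigning a 0x80000000 host bridge BAR first consumes the whole window, and the later 0x2000 and 0x80 requests fail. Skip the bridge BAR and the same two requests land at 0x80000000 and 0x80002000.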

I thought we had code to hide them to avoid that problem, so I wonder what's
going on... If you look at arch/powerpc/sysdev/ppc4xx_pci.c, there's this
quirk:

static void fixup_ppc4xx_pci_bridge(struct pci_dev *dev)
{
	struct pci_controller *hose;
	int i;

	if (dev->devfn != 0 || dev->bus->self != NULL)
		return;

	hose = pci_bus_to_host(dev->bus);
	if (hose == NULL)
		return;

	if (!of_device_is_compatible(hose->dn, "ibm,plb-pciex") &&
	    !of_device_is_compatible(hose->dn, "ibm,plb-pcix") &&
	    !of_device_is_compatible(hose->dn, "ibm,plb-pci"))
		return;

	if (of_device_is_compatible(hose->dn, "ibm,plb440epx-pci") ||
	    of_device_is_compatible(hose->dn, "ibm,plb440grx-pci")) {
		hose->indirect_type |= PPC_INDIRECT_TYPE_BROKEN_MRM;
	}

	/* Hide the PCI host BARs from the kernel as their content doesn't
	 * fit well in the resource management
	 */
	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
		dev->resource[i].start = dev->resource[i].end = 0;
		dev->resource[i].flags = 0;
	}

	printk(KERN_INFO "PCI: Hiding 4xx host bridge resources %s\n",
	       pci_name(dev));
}
DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, 

Re: powerpc/4xx: Regression failed on sil24 (and other) drivers

2011-06-27 Thread Benjamin Herrenschmidt
On Sat, 2011-06-25 at 18:52 -0500, Ayman El-Khashab wrote:
 I noticed during a recent development with the 460SX that a
 simple device that once worked stopped.  I did a bisect to
 find the offending commit and it turns out to be this one:
 
 0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad
 commit
 commit 0e52247a2ed1f211f0c4f682dc999610a368903f
 Author: Cam Macdonell c...@cs.ualberta.ca
 Date:   Tue Sep 7 17:25:20 2010 -0700
 
 PCI: fix pci_resource_alignment prototype
 
 I found it working with 2.6.36 but it seems that it is in
 the current trunk as well.
 
 I patched my code to take out this commit and (quickly)
 verified it was ok.  I am guessing the patch is ok since it
 converts int types to resource_size_t.  My guess is that the
 problem is in the sil24 driver but I did not see anything 
 obvious in that code.  Any tips on what could be wrong?  Is
 the problem potentially somewhere being called by that code?
 
 The device driver fails with error -22 on a 460SX (which 
 has the 36 bit pci space).
 
 sil24 /drivers/ata/sata_sil24.c

Can you send a dmesg & output of /proc/iomem & ioport with and without
the patch (same kernel otherwise) ?

Also can you try to figure out (printk's) where in the driver does it
fail ? (Which function fails)

It's possible that this changes something in the core resource
assignment code causing something else to fail elsewhere or exposing
another bug elsewhere with the consequence of leaving the SiL with badly
assigned resources.

Cheers,
Ben.




Re: powerpc/4xx: Regression failed on sil24 (and other) drivers

2011-06-27 Thread Ayman El-Khashab
On Mon, Jun 27, 2011 at 08:19:56PM +1000, Benjamin Herrenschmidt wrote:
 On Sat, 2011-06-25 at 18:52 -0500, Ayman El-Khashab wrote:
  I noticed during a recent development with the 460SX that a
  simple device that once worked stopped.  I did a bisect to
  find the offending commit and it turns out to be this one:
  
  0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad
  commit
  commit 0e52247a2ed1f211f0c4f682dc999610a368903f
  Author: Cam Macdonell c...@cs.ualberta.ca
  Date:   Tue Sep 7 17:25:20 2010 -0700
  
  PCI: fix pci_resource_alignment prototype
  

<snip>

  
  The device driver fails with error -22 on a 460SX (which 
  has the 36 bit pci space).
  
  sil24 /drivers/ata/sata_sil24.c
 
 Can you send a dmesg & output of /proc/iomem & ioport with and without
 the patch (same kernel otherwise) ?
 
 Also can you try to figure out (printk's) where in the driver does it
 fail ? (Which function fails)

Yes, here is the output from a canyonlands (460ex) that exhibits
the same problem and in the same place.  Of the two devices
I have that fail (sil24 and one other), both fail in exactly 
the same place in lib/devres.c within the function
pcim_iomap_regions.  In that function, there is the
following call -- it fails b/c len returns 0 and the failure
bubbles up to error -22.

 len = pci_resource_len(pdev, i);
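
As a rough illustration of that failure path, here is a minimal sketch of how an unassigned BAR could turn into -22. It is a simplified model, not the code in lib/devres.c: struct demo_resource, demo_resource_len() and demo_iomap_region() are hypothetical names, and the model assumes an unassigned BAR is represented with start and end both 0.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for one BAR of a PCI device ('end' inclusive). */
struct demo_resource {
	uint64_t start;
	uint64_t end;
};

/* Length of the BAR; an unassigned BAR (start == end == 0) reports 0. */
uint64_t demo_resource_len(const struct demo_resource *r)
{
	if (r->start == 0 && r->end == 0)
		return 0;
	return r->end - r->start + 1;
}

/* Models the length check made before mapping a region:
 * a zero-length BAR cannot be mapped, so the caller gets -EINVAL. */
int demo_iomap_region(const struct demo_resource *r)
{
	if (demo_resource_len(r) == 0)
		return -EINVAL;	/* -22 on Linux */
	return 0;
}
```

In this model a BAR left at 0-0 by a failed assignment returns -EINVAL, while an assigned 0x2000-byte BAR passes the check.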

 It's possible that this changes something in the core resource
 assignment code causing something else to fail elsewhere or exposing
 another bug elsewhere with the consequence of leaving the SiL with badly
 assigned resources.

That was my initial thought as well, but I wasn't versed
enough in the pci magic in order to completely figure it
out.

Here is the output, it is dmesg, iomem, then ioports for the
passing and then the failing cases.

thanks
ayman

== Passing ==

Using PowerPC 44x Platform machine description
Linux version 2.6.36-rc3-00186-g0e52247-dirty (aymane@lablinux) (gcc version 
4.2.2) #18 Sat Jun 25 13:51:44 CDT 2011
Found initrd at 0xdfa5c000:0xdfe4cbfa
Found legacy serial port 0 for /plb/opb/serial@ef600300
  mem=4ef600300, taddr=4ef600300, irq=0, clk=6451612, speed=0
Found legacy serial port 1 for /plb/opb/serial@ef600400
  mem=4ef600400, taddr=4ef600400, irq=0, clk=6451612, speed=0
Top of RAM: 0x2000, Total RAM: 0x2000
Memory hole size: 0MB
Zone PFN ranges:
  DMA  0x - 0x0002
  Normal   empty
  HighMem  empty
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0x - 0x0002
On node 0 totalpages: 131072
free_area_init_node: node 0, pgdat c03b9f48, node_mem_map c03ed000
  DMA zone: 1024 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 130048 pages, LIFO batch:31
MMU: Allocated 1088 bytes of context maps for 255 contexts
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 130048
Kernel command line: root=/dev/ram rw mem=512M 
ip=169.254.0.180:169.254.0.100:169.254.0.100:255.255.255.0:tanosx:eth0:off 
panic=1 console=ttyS0,57600
PID hash table entries: 2048 (order: 1, 8192 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
High memory: 0k
Memory: 511668k/524288k available (3692k kernel code, 12620k reserved, 176k 
data, 141k bss, 184k init)
Kernel virtual memory layout:
  * 0xfffcf000..0xf000  : fixmap
  * 0xffc0..0xffe0  : highmem PTEs
  * 0xffa0..0xffc0  : consistent mem
  * 0xffa0..0xffa0  : early ioremap
  * 0xe100..0xffa0  : vmalloc & ioremap
SLUB: Genslabs=11, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS:512
UIC0 (32 IRQ sources) at DCR 0xc0
UIC1 (32 IRQ sources) at DCR 0xd0
irq: irq 30 on host /interrupt-controller0 mapped to virtual irq 30
UIC2 (32 IRQ sources) at DCR 0xe0
irq: irq 10 on host /interrupt-controller0 mapped to virtual irq 16
UIC3 (32 IRQ sources) at DCR 0xf0
irq: irq 16 on host /interrupt-controller0 mapped to virtual irq 17
time_init: decrementer frequency = 800.10 MHz
time_init: processor frequency   = 800.10 MHz
clocksource: timebase mult[50] shift[22] registered
clockevent: decrementer mult[ccf7] shift[32] cpu[0]
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
NET: Registered protocol family 16
i2c-core: driver [dummy] registered
irq: irq 11 on host /interrupt-controller1 mapped to virtual irq 18
256k L2-cache enabled
PCIE0: Checking link...
PCIE0: No device detected.
PCI host bridge /plb/pciex@d (primary) ranges:
 MEM 0x000e..0x000e7fff - 0x8000 
 MEM 0x000f..0x000f000f - 0x 
  IO 0x000f8000..0x000f8000 - 0x
 Removing ISA hole at 0x000f
4xx PCI DMA offset set to 0x
/plb/pciex@d: Legacy ISA memory support enabled
PCIE0: successfully set as root-complex
PCIE1: Checking link...
PCIE1: Device detected, waiting for link...

Re: powerpc/4xx: Regression failed on sil24 (and other) drivers

2011-06-27 Thread Benjamin Herrenschmidt
On Mon, 2011-06-27 at 06:31 -0500, Ayman El-Khashab wrote:

 That was my initial thought as well, but I wasn't versed
 enough in the pci magic in order to completely figure it
 out.
 
 Here is the output, it is dmesg, iomem, then ioports for the
 passing and then the failing cases.

Ok, I can see some resource allocation errors in the log, I don't have
enough active brain cells left today to figure out what's going on but
I'll have a look tomorrow.

Cheers,
Ben.

 thanks
 ayman
 
 == Passing ==
 
 Using PowerPC 44x Platform machine description
 Linux version 2.6.36-rc3-00186-g0e52247-dirty (aymane@lablinux) (gcc version 
 4.2.2) #18 Sat Jun 25 13:51:44 CDT 2011
 Found initrd at 0xdfa5c000:0xdfe4cbfa
 Found legacy serial port 0 for /plb/opb/serial@ef600300
   mem=4ef600300, taddr=4ef600300, irq=0, clk=6451612, speed=0
 Found legacy serial port 1 for /plb/opb/serial@ef600400
   mem=4ef600400, taddr=4ef600400, irq=0, clk=6451612, speed=0
 Top of RAM: 0x2000, Total RAM: 0x2000
 Memory hole size: 0MB
 Zone PFN ranges:
   DMA  0x - 0x0002
   Normal   empty
   HighMem  empty
 Movable zone start PFN for each node
 early_node_map[1] active PFN ranges
 0: 0x - 0x0002
 On node 0 totalpages: 131072
 free_area_init_node: node 0, pgdat c03b9f48, node_mem_map c03ed000
   DMA zone: 1024 pages used for memmap
   DMA zone: 0 pages reserved
   DMA zone: 130048 pages, LIFO batch:31
 MMU: Allocated 1088 bytes of context maps for 255 contexts
 Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 130048
 Kernel command line: root=/dev/ram rw mem=512M 
 ip=169.254.0.180:169.254.0.100:169.254.0.100:255.255.255.0:tanosx:eth0:off 
 panic=1 console=ttyS0,57600
 PID hash table entries: 2048 (order: 1, 8192 bytes)
 Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
 Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
 High memory: 0k
 Memory: 511668k/524288k available (3692k kernel code, 12620k reserved, 176k 
 data, 141k bss, 184k init)
 Kernel virtual memory layout:
   * 0xfffcf000..0xf000  : fixmap
   * 0xffc0..0xffe0  : highmem PTEs
   * 0xffa0..0xffc0  : consistent mem
   * 0xffa0..0xffa0  : early ioremap
   * 0xe100..0xffa0  : vmalloc & ioremap
 SLUB: Genslabs=11, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
 NR_IRQS:512
 UIC0 (32 IRQ sources) at DCR 0xc0
 UIC1 (32 IRQ sources) at DCR 0xd0
 irq: irq 30 on host /interrupt-controller0 mapped to virtual irq 30
 UIC2 (32 IRQ sources) at DCR 0xe0
 irq: irq 10 on host /interrupt-controller0 mapped to virtual irq 16
 UIC3 (32 IRQ sources) at DCR 0xf0
 irq: irq 16 on host /interrupt-controller0 mapped to virtual irq 17
 time_init: decrementer frequency = 800.10 MHz
 time_init: processor frequency   = 800.10 MHz
 clocksource: timebase mult[50] shift[22] registered
 clockevent: decrementer mult[ccf7] shift[32] cpu[0]
 pid_max: default: 32768 minimum: 301
 Mount-cache hash table entries: 512
 NET: Registered protocol family 16
 i2c-core: driver [dummy] registered
 irq: irq 11 on host /interrupt-controller1 mapped to virtual irq 18
 256k L2-cache enabled
 PCIE0: Checking link...
 PCIE0: No device detected.
 PCI host bridge /plb/pciex@d (primary) ranges:
  MEM 0x000e..0x000e7fff - 0x8000 
  MEM 0x000f..0x000f000f - 0x 
   IO 0x000f8000..0x000f8000 - 0x
  Removing ISA hole at 0x000f
 4xx PCI DMA offset set to 0x
 /plb/pciex@d: Legacy ISA memory support enabled
 PCIE0: successfully set as root-complex
 PCIE1: Checking link...
 PCIE1: Device detected, waiting for link...
 PCIE1: link is up !
 PCI host bridge /plb/pciex@d2000 (primary) ranges:
  MEM 0x000e8000..0x000e - 0x8000 
  MEM 0x000f0010..0x000f001f - 0x 
   IO 0x000f8001..0x000f8001 - 0x
  Removing ISA hole at 0x000f0010
 4xx PCI DMA offset set to 0x
 /plb/pciex@d2000: Legacy ISA memory support enabled
 PCIE1: successfully set as root-complex
 PCI host bridge /plb/pci@c0ec0 (primary) ranges:
  MEM 0x000d8000..0x000d - 0x8000 
  MEM 0x000c0ee0..0x000c0eef - 0x 
   IO 0x000c0800..0x000c0800 - 0x
  Removing ISA hole at 0x000c0ee0
 4xx PCI DMA offset set to 0x
 /plb/pci@c0ec0: Legacy ISA memory support enabled
 PCI: Probing PCI hardware
 pci_bus :40: scanning bus
 pci :40:00.0: found [aaa0:bed0] class 000604 header type 01
 pci :40:00.0: reg 10: [mem 0x-0x7fff pref]
 pci_bus :40: fixups for bus
 pci :40:00.0: scanning [bus 41-7f] behind bridge, pass 0
 pci_bus :41: scanning bus
 pci_bus :41: fixups for bus
 pci :40:00.0: PCI bridge to [bus 41-7f]
 pci 

powerpc/4xx: Regression failed on sil24 (and other) drivers

2011-06-25 Thread Ayman El-Khashab
I noticed during a recent development with the 460SX that a
simple device that once worked stopped.  I did a bisect to
find the offending commit and it turns out to be this one:

0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad
commit
commit 0e52247a2ed1f211f0c4f682dc999610a368903f
Author: Cam Macdonell c...@cs.ualberta.ca
Date:   Tue Sep 7 17:25:20 2010 -0700

PCI: fix pci_resource_alignment prototype

I found it working with 2.6.36 but it seems that it is in
the current trunk as well.

I patched my code to take out this commit and (quickly)
verified it was ok.  I am guessing the patch is ok since it
converts int types to resource_size_t.  My guess is that the
problem is in the sil24 driver but I did not see anything 
obvious in that code.  Any tips on what could be wrong?  Is
the problem potentially somewhere being called by that code?

The device driver fails with error -22 on a 460SX (which 
has the 36 bit pci space).

sil24 /drivers/ata/sata_sil24.c

Thanks
Ayman