Re: powerpc/4xx: Regression failed on sil24 (and other) drivers
On Wed, Jun 29, 2011 at 11:42:03AM +1000, Benjamin Herrenschmidt wrote:
On Mon, 2011-06-27 at 06:31 -0500, Ayman El-Khashab wrote:
On Mon, Jun 27, 2011 at 08:19:56PM +1000, Benjamin Herrenschmidt wrote:
On Sat, 2011-06-25 at 18:52 -0500, Ayman El-Khashab wrote:

I noticed during a recent development with the 460SX that a simple device that once worked stopped. I did a bisect to find the offending commit and it turns out to be this one:

0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad commit
commit 0e52247a2ed1f211f0c4f682dc999610a368903f
Author: Cam Macdonell c...@cs.ualberta.ca
Date: Tue Sep 7 17:25:20 2010 -0700

PCI: fix pci_resource_alignment prototype

I suspect you don't have CONFIG_PCI_QUIRKS enabled... I think that's the cause of your problem.

It looks like this config option controls both compiling in the generic quirks from drivers/pci/quirks.c and the actual mechanism for having quirks in the first place (pci_fixup_device() goes away without that config option).

I think we probably want to unconditionally select that if CONFIG_PCI is enabled in arch/powerpc...

Can you try changing it and tell us if that helps ?

Yes, that fixed our problem, thanks for your time. I am going to try to get the MSI to work.

Ayman

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
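To illustrate the point about pci_fixup_device() going away, here is a minimal userspace sketch of the pattern: when a build option is off, the fixup dispatcher collapses to an empty stub, so no quirk (generic or platform-specific) ever runs. Everything named here is hypothetical and only stands in for the kernel mechanism: SKETCH_PCI_QUIRKS plays the role of CONFIG_PCI_QUIRKS, and struct sketch_dev, hide_bridge_bars() and sketch_fixup_device() are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_PCI_QUIRKS; define it to 0 to model
 * a kernel built without quirk support. */
#ifndef SKETCH_PCI_QUIRKS
#define SKETCH_PCI_QUIRKS 1
#endif

/* Hypothetical, heavily simplified device: one BAR length only. */
struct sketch_dev {
	unsigned long long bar0_len;
};

/* Model of a platform quirk that hides a host bridge BAR. */
static void hide_bridge_bars(struct sketch_dev *dev)
{
	dev->bar0_len = 0;
}

#if SKETCH_PCI_QUIRKS
/* With the option on, the dispatcher walks the fixup table. */
static void sketch_fixup_device(struct sketch_dev *dev)
{
	static void (*const fixups[])(struct sketch_dev *) = {
		hide_bridge_bars,
	};
	size_t i;

	for (i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++)
		fixups[i](dev);
}
#else
/* With the option off, the dispatcher is an empty stub, so the
 * platform quirk above is compiled but never called. */
static void sketch_fixup_device(struct sketch_dev *dev)
{
	(void)dev;
}
#endif

/* Returns the BAR 0 length that later code sees after the fixup pass. */
static unsigned long long bar0_len_after_fixups(unsigned long long len)
{
	struct sketch_dev dev = { .bar0_len = len };

	sketch_fixup_device(&dev);
	return dev.bar0_len;
}
```

Built as is, bar0_len_after_fixups() returns 0 for any input because the quirk runs. Built with -DSKETCH_PCI_QUIRKS=0, it returns its input unchanged, which models the host bridge BAR staying visible to the resource allocator.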
Re: powerpc/4xx: Regression failed on sil24 (and other) drivers
On Mon, 2011-06-27 at 06:31 -0500, Ayman El-Khashab wrote:
On Mon, Jun 27, 2011 at 08:19:56PM +1000, Benjamin Herrenschmidt wrote:
On Sat, 2011-06-25 at 18:52 -0500, Ayman El-Khashab wrote:

I noticed during a recent development with the 460SX that a simple device that once worked stopped. I did a bisect to find the offending commit and it turns out to be this one:

0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad commit
commit 0e52247a2ed1f211f0c4f682dc999610a368903f
Author: Cam Macdonell c...@cs.ualberta.ca
Date: Tue Sep 7 17:25:20 2010 -0700

PCI: fix pci_resource_alignment prototype

Ok, let's see what I can dig out of those logs (sorry for the delay)

Let's start with iomem and ioport, stripped of the legacy common stuff:

/proc/iomem, bad:

e-e7fff : /plb/pciex@d
  e-e7fff : :40:00.0
e8000-e : /plb/pciex@d2000
  e8000-e : 0001:80:00.0

good:

e-e7fff : /plb/pciex@d
e8000-e : /plb/pciex@d2000
  e8000-e800f : PCI Bus 0001:81
    e8000-e80001fff : 0001:81:00.0
      e8000-e80001fff : sata_sil24
    e80002000-e8000207f : 0001:81:00.0
      e80002000-e8000207f : sata_sil24

So now that's interesting, you have a device at :40:00.0 that appears on your first PHB in the bad case and doesn't show up in the good case. In addition, on the other PHB, the bus itself doesn't show up in the bad case.

(Let's ignore IOs and focus on mem. for now).

Let's see what led us to that from the logs.

First, the setup before probing is all identical. The device at :40:00.0 is detected in both cases, it's the root complex bridge. So the scanning is identical as expected.

Now the fixup/resource allocation, we start seeing some differences:

Bad:

pci :40:00.0: BAR 0: assigned [mem 0xe-0xe7fff pref]
pci :40:00.0: BAR 0: set to [mem 0xe-0xe7fff pref] (PCI address [0x8000-0x]

vs Good:

pci :40:00.0: BAR 0: can't assign mem pref (size 0x8000)

So the bad case succeeds in giving out resources to the root complex, while the good case fails... fun.

And similarly for the other PHB, bad:

pci 0001:80:00.0: BAR 0: assigned [mem 0xe8000-0xe pref]
pci 0001:80:00.0: BAR 0: set to [mem 0xe8000-0xe pref] (PCI address [0x8000-0x]

vs good:

pci 0001:80:00.0: BAR 0: can't assign mem pref (size 0x8000)

This then goes down to the bad case:

pci 0001:80:00.0: BAR 8: can't assign mem (size 0x10)
pci 0001:80:00.0: BAR 7: assigned [io 0xfffe1000-0xfffe1fff]
pci 0001:81:00.0: BAR 2: can't assign mem (size 0x2000)
pci 0001:81:00.0: BAR 0: can't assign mem (size 0x80)

while the good one succeeds assigning BAR 8, 2 and 0:

pci 0001:80:00.0: BAR 8: assigned [mem 0xe8000-0xe800f]
pci 0001:81:00.0: BAR 2: assigned [mem 0xe8000-0xe80001fff 64bit]
pci 0001:81:00.0: BAR 2: set to [mem 0xe8000-0xe80001fff 64bit] (PCI address [0x8000-0x80001fff]
pci 0001:81:00.0: BAR 0: assigned [mem 0xe80002000-0xe8000207f 64bit]
pci 0001:81:00.0: BAR 0: set to [mem 0xe80002000-0xe8000207f 64bit] (PCI address [0x80002000-0x8000207f]

It looks to me like the BAR 0 of the host bridges are basically taking the resource away from the rest of the devices.

Now BAR 0 are not bridge resources, which would have been OK, but they are MMIO resources of the bridge itself.

On 44x, the problem is that those bridges (stupidly) expose BARs that represent main memory (inbound DMA). It would make sense if these weren't host bridges but in this case that's totally nonsensical (and thus IMHO a HW bug).

I thought we had code to hide them to avoid that problem, so I wonder what's going on...

If you look at arch/powerpc/sysdev/ppc4xx_pci.c, there's this quirk:

	static void fixup_ppc4xx_pci_bridge(struct pci_dev *dev)
	{
		struct pci_controller *hose;
		int i;

		if (dev->devfn != 0 || dev->bus->self != NULL)
			return;

		hose = pci_bus_to_host(dev->bus);
		if (hose == NULL)
			return;

		if (!of_device_is_compatible(hose->dn, "ibm,plb-pciex") &&
		    !of_device_is_compatible(hose->dn, "ibm,plb-pcix") &&
		    !of_device_is_compatible(hose->dn, "ibm,plb-pci"))
			return;

		if (of_device_is_compatible(hose->dn, "ibm,plb440epx-pci") ||
		    of_device_is_compatible(hose->dn, "ibm,plb440grx-pci")) {
			hose->indirect_type |= PPC_INDIRECT_TYPE_BROKEN_MRM;
		}

		/* Hide the PCI host BARs from the kernel as their content doesn't
		 * fit well in the resource management
		 */
		for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
			dev->resource[i].start = dev->resource[i].end = 0;
			dev->resource[i].flags = 0;
		}

		printk(KERN_INFO "PCI: Hiding 4xx host bridge resources %s\n",
		       pci_name(dev));
	}
	DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID,
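The resource conflict described in this message can be sketched in userspace as follows. This is only a toy model under stated assumptions: it uses a simple bump allocator over a single window rather than the kernel's resource tree, and struct window, window_assign() and assign_bridge_then_device() are hypothetical names with hypothetical sizes. It shows why a host bridge BAR that spans the whole window starves the devices behind it, and why zeroing that BAR (as the quirk does) lets the device BAR fit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of one PHB memory window. */
struct window {
	uint64_t size;	/* total bytes available in the window */
	uint64_t used;	/* bytes handed out so far (bump allocator) */
};

/* Assign len bytes from the window.
 * Returns 0 on success, -1 if the request does not fit.
 * A zero-length request models a hidden (zeroed) resource and is
 * skipped without consuming any space. */
static int window_assign(struct window *w, uint64_t len)
{
	if (len == 0)
		return 0;
	if (len > w->size - w->used)
		return -1;
	w->used += len;
	return 0;
}

/* Try the host bridge BAR first, then one device BAR behind it.
 * Returns the result of the device BAR assignment. */
static int assign_bridge_then_device(uint64_t window_size,
				     uint64_t bridge_bar_len,
				     uint64_t device_bar_len)
{
	struct window w = { .size = window_size, .used = 0 };

	(void)window_assign(&w, bridge_bar_len);
	return window_assign(&w, device_bar_len);
}
```

With a bridge BAR as large as the window, the device assignment fails (the "bad" log); with the bridge BAR hidden, passed here as length 0, the same device request succeeds (the "good" log).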
Re: powerpc/4xx: Regression failed on sil24 (and other) drivers
On Sat, 2011-06-25 at 18:52 -0500, Ayman El-Khashab wrote:

I noticed during a recent development with the 460SX that a simple device that once worked stopped. I did a bisect to find the offending commit and it turns out to be this one:

0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad commit
commit 0e52247a2ed1f211f0c4f682dc999610a368903f
Author: Cam Macdonell c...@cs.ualberta.ca
Date: Tue Sep 7 17:25:20 2010 -0700

PCI: fix pci_resource_alignment prototype

I found it working with 2.6.36 but it seems that it is in the current trunk as well. I patched my code to take out this commit and (quickly) verified it was ok. I am guessing the patch is ok since it converts int types to resource_size_t. My guess is that the problem is in the sil24 driver but I did not see anything obvious in that code. Any tips on what could be wrong? Is the problem potentially somewhere being called by that code?

The device driver fails with error -22 on a 460SX (which has the 36 bit pci space).

sil24 /drivers/ata/sata_sil24.c

Can you send the dmesg output plus /proc/iomem and /proc/ioports with and without the patch (same kernel otherwise) ?

Also can you try to figure out (printk's) where in the driver does it fail ? (Which function fails)

It's possible that this changes something in the core resource assignment code causing something else to fail elsewhere or exposing another bug elsewhere with the consequence of leaving the SiL with badly assigned resources.

Cheers,
Ben.
Re: powerpc/4xx: Regression failed on sil24 (and other) drivers
On Mon, Jun 27, 2011 at 08:19:56PM +1000, Benjamin Herrenschmidt wrote:
On Sat, 2011-06-25 at 18:52 -0500, Ayman El-Khashab wrote:

I noticed during a recent development with the 460SX that a simple device that once worked stopped. I did a bisect to find the offending commit and it turns out to be this one:

0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad commit
commit 0e52247a2ed1f211f0c4f682dc999610a368903f
Author: Cam Macdonell c...@cs.ualberta.ca
Date: Tue Sep 7 17:25:20 2010 -0700

PCI: fix pci_resource_alignment prototype

<snip>

The device driver fails with error -22 on a 460SX (which has the 36 bit pci space).

sil24 /drivers/ata/sata_sil24.c

Can you send the dmesg output plus /proc/iomem and /proc/ioports with and without the patch (same kernel otherwise) ?

Also can you try to figure out (printk's) where in the driver does it fail ? (Which function fails)

Yes, here is the output from a canyonlands (460ex) that exhibits the same problem and in the same place. Of the two devices I have that fail (sil24 and one other), both fail in exactly the same place in lib/devres.c within the function pcim_iomap_regions. In that function, there is the following call -- it fails because len comes back as 0 and that failure bubbles up as error -22.

len = pci_resource_len(pdev, i);

It's possible that this changes something in the core resource assignment code causing something else to fail elsewhere or exposing another bug elsewhere with the consequence of leaving the SiL with badly assigned resources.

That was my initial thought as well, but I wasn't versed enough in the pci magic in order to completely figure it out. Here is the output, it is dmesg, iomem, then ioports for the passing and then the failing cases.

thanks
ayman

== Passing ==

Using PowerPC 44x Platform machine description
Linux version 2.6.36-rc3-00186-g0e52247-dirty (aymane@lablinux) (gcc version 4.2.2) #18 Sat Jun 25 13:51:44 CDT 2011
Found initrd at 0xdfa5c000:0xdfe4cbfa
Found legacy serial port 0 for /plb/opb/serial@ef600300
  mem=4ef600300, taddr=4ef600300, irq=0, clk=6451612, speed=0
Found legacy serial port 1 for /plb/opb/serial@ef600400
  mem=4ef600400, taddr=4ef600400, irq=0, clk=6451612, speed=0
Top of RAM: 0x2000, Total RAM: 0x2000
Memory hole size: 0MB
Zone PFN ranges:
  DMA 0x - 0x0002
  Normal empty
  HighMem empty
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
  0: 0x - 0x0002
On node 0 totalpages: 131072
free_area_init_node: node 0, pgdat c03b9f48, node_mem_map c03ed000
  DMA zone: 1024 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 130048 pages, LIFO batch:31
MMU: Allocated 1088 bytes of context maps for 255 contexts
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 130048
Kernel command line: root=/dev/ram rw mem=512M ip=169.254.0.180:169.254.0.100:169.254.0.100:255.255.255.0:tanosx:eth0:off panic=1 console=ttyS0,57600
PID hash table entries: 2048 (order: 1, 8192 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
High memory: 0k
Memory: 511668k/524288k available (3692k kernel code, 12620k reserved, 176k data, 141k bss, 184k init)
Kernel virtual memory layout:
  * 0xfffcf000..0xf000 : fixmap
  * 0xffc0..0xffe0 : highmem PTEs
  * 0xffa0..0xffc0 : consistent mem
  * 0xffa0..0xffa0 : early ioremap
  * 0xe100..0xffa0 : vmalloc ioremap
SLUB: Genslabs=11, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS:512
UIC0 (32 IRQ sources) at DCR 0xc0
UIC1 (32 IRQ sources) at DCR 0xd0
irq: irq 30 on host /interrupt-controller0 mapped to virtual irq 30
UIC2 (32 IRQ sources) at DCR 0xe0
irq: irq 10 on host /interrupt-controller0 mapped to virtual irq 16
UIC3 (32 IRQ sources) at DCR 0xf0
irq: irq 16 on host /interrupt-controller0 mapped to virtual irq 17
time_init: decrementer frequency = 800.10 MHz
time_init: processor frequency = 800.10 MHz
clocksource: timebase mult[50] shift[22] registered
clockevent: decrementer mult[ccf7] shift[32] cpu[0]
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
NET: Registered protocol family 16
i2c-core: driver [dummy] registered
irq: irq 11 on host /interrupt-controller1 mapped to virtual irq 18
256k L2-cache enabled
PCIE0: Checking link...
PCIE0: No device detected.
PCI host bridge /plb/pciex@d (primary) ranges:
  MEM 0x000e..0x000e7fff - 0x8000
  MEM 0x000f..0x000f000f - 0x
  IO 0x000f8000..0x000f8000 - 0x
Removing ISA hole at 0x000f
4xx PCI DMA offset set to 0x
/plb/pciex@d: Legacy ISA memory support enabled
PCIE0: successfully set as root-complex
PCIE1: Checking link...
PCIE1: Device detected, waiting for link...
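The failure Ayman reports in pcim_iomap_regions (a zero len turning into error -22, which is -EINVAL on Linux) can be sketched in userspace like this. It is a simplified model, not the lib/devres.c code: struct sketch_res, sketch_resource_len() and sketch_iomap_regions() are hypothetical names, and the sketch only reproduces the "empty BAR means -EINVAL" check, none of the actual mapping work.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for one PCI BAR resource (inclusive range). */
struct sketch_res {
	uint64_t start;
	uint64_t end;
};

/* Length of a resource; an all-zero (unassigned) resource counts as
 * empty, matching the len == 0 case described in the report. */
static uint64_t sketch_resource_len(const struct sketch_res *r)
{
	if (r->start == 0 && r->end == 0)
		return 0;
	return r->end - r->start + 1;
}

/* Check every BAR selected in mask (bit i selects BAR i, nbars <= 32).
 * Fails with -EINVAL as soon as a selected BAR is empty. */
static int sketch_iomap_regions(const struct sketch_res *bars, int nbars,
				unsigned int mask)
{
	int i;

	for (i = 0; i < nbars; i++) {
		if (!(mask & (1u << i)))
			continue;
		if (sketch_resource_len(&bars[i]) == 0)
			return -EINVAL;
	}
	return 0;
}
```

The point of the model is that the driver itself is not at fault: it only observes that a BAR it needs was never assigned, which is consistent with the "can't assign mem" lines in the failing log.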
Re: powerpc/4xx: Regression failed on sil24 (and other) drivers
On Mon, 2011-06-27 at 06:31 -0500, Ayman El-Khashab wrote:

That was my initial thought as well, but I wasn't versed enough in the pci magic in order to completely figure it out. Here is the output, it is dmesg, iomem, then ioports for the passing and then the failing cases.

Ok, I can see some resource allocation errors in the log, I don't have enough active brain cells left today to figure out what's going on but I'll have a look tomorrow.

Cheers,
Ben.

thanks
ayman

== Passing ==

Using PowerPC 44x Platform machine description
Linux version 2.6.36-rc3-00186-g0e52247-dirty (aymane@lablinux) (gcc version 4.2.2) #18 Sat Jun 25 13:51:44 CDT 2011
Found initrd at 0xdfa5c000:0xdfe4cbfa
Found legacy serial port 0 for /plb/opb/serial@ef600300
  mem=4ef600300, taddr=4ef600300, irq=0, clk=6451612, speed=0
Found legacy serial port 1 for /plb/opb/serial@ef600400
  mem=4ef600400, taddr=4ef600400, irq=0, clk=6451612, speed=0
Top of RAM: 0x2000, Total RAM: 0x2000
Memory hole size: 0MB
Zone PFN ranges:
  DMA 0x - 0x0002
  Normal empty
  HighMem empty
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
  0: 0x - 0x0002
On node 0 totalpages: 131072
free_area_init_node: node 0, pgdat c03b9f48, node_mem_map c03ed000
  DMA zone: 1024 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 130048 pages, LIFO batch:31
MMU: Allocated 1088 bytes of context maps for 255 contexts
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 130048
Kernel command line: root=/dev/ram rw mem=512M ip=169.254.0.180:169.254.0.100:169.254.0.100:255.255.255.0:tanosx:eth0:off panic=1 console=ttyS0,57600
PID hash table entries: 2048 (order: 1, 8192 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
High memory: 0k
Memory: 511668k/524288k available (3692k kernel code, 12620k reserved, 176k data, 141k bss, 184k init)
Kernel virtual memory layout:
  * 0xfffcf000..0xf000 : fixmap
  * 0xffc0..0xffe0 : highmem PTEs
  * 0xffa0..0xffc0 : consistent mem
  * 0xffa0..0xffa0 : early ioremap
  * 0xe100..0xffa0 : vmalloc ioremap
SLUB: Genslabs=11, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS:512
UIC0 (32 IRQ sources) at DCR 0xc0
UIC1 (32 IRQ sources) at DCR 0xd0
irq: irq 30 on host /interrupt-controller0 mapped to virtual irq 30
UIC2 (32 IRQ sources) at DCR 0xe0
irq: irq 10 on host /interrupt-controller0 mapped to virtual irq 16
UIC3 (32 IRQ sources) at DCR 0xf0
irq: irq 16 on host /interrupt-controller0 mapped to virtual irq 17
time_init: decrementer frequency = 800.10 MHz
time_init: processor frequency = 800.10 MHz
clocksource: timebase mult[50] shift[22] registered
clockevent: decrementer mult[ccf7] shift[32] cpu[0]
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
NET: Registered protocol family 16
i2c-core: driver [dummy] registered
irq: irq 11 on host /interrupt-controller1 mapped to virtual irq 18
256k L2-cache enabled
PCIE0: Checking link...
PCIE0: No device detected.
PCI host bridge /plb/pciex@d (primary) ranges:
  MEM 0x000e..0x000e7fff - 0x8000
  MEM 0x000f..0x000f000f - 0x
  IO 0x000f8000..0x000f8000 - 0x
Removing ISA hole at 0x000f
4xx PCI DMA offset set to 0x
/plb/pciex@d: Legacy ISA memory support enabled
PCIE0: successfully set as root-complex
PCIE1: Checking link...
PCIE1: Device detected, waiting for link...
PCIE1: link is up !
PCI host bridge /plb/pciex@d2000 (primary) ranges:
  MEM 0x000e8000..0x000e - 0x8000
  MEM 0x000f0010..0x000f001f - 0x
  IO 0x000f8001..0x000f8001 - 0x
Removing ISA hole at 0x000f0010
4xx PCI DMA offset set to 0x
/plb/pciex@d2000: Legacy ISA memory support enabled
PCIE1: successfully set as root-complex
PCI host bridge /plb/pci@c0ec0 (primary) ranges:
  MEM 0x000d8000..0x000d - 0x8000
  MEM 0x000c0ee0..0x000c0eef - 0x
  IO 0x000c0800..0x000c0800 - 0x
Removing ISA hole at 0x000c0ee0
4xx PCI DMA offset set to 0x
/plb/pci@c0ec0: Legacy ISA memory support enabled
PCI: Probing PCI hardware
pci_bus :40: scanning bus
pci :40:00.0: found [aaa0:bed0] class 000604 header type 01
pci :40:00.0: reg 10: [mem 0x-0x7fff pref]
pci_bus :40: fixups for bus
pci :40:00.0: scanning [bus 41-7f] behind bridge, pass 0
pci_bus :41: scanning bus
pci_bus :41: fixups for bus
pci :40:00.0: PCI bridge to [bus 41-7f]
pci
powerpc/4xx: Regression failed on sil24 (and other) drivers
I noticed during a recent development with the 460SX that a simple device that once worked stopped. I did a bisect to find the offending commit and it turns out to be this one:

0e52247a2ed1f211f0c4f682dc999610a368903f is the first bad commit
commit 0e52247a2ed1f211f0c4f682dc999610a368903f
Author: Cam Macdonell c...@cs.ualberta.ca
Date: Tue Sep 7 17:25:20 2010 -0700

PCI: fix pci_resource_alignment prototype

I found it working with 2.6.36 but it seems that it is in the current trunk as well. I patched my code to take out this commit and (quickly) verified it was ok. I am guessing the patch is ok since it converts int types to resource_size_t. My guess is that the problem is in the sil24 driver but I did not see anything obvious in that code. Any tips on what could be wrong? Is the problem potentially somewhere being called by that code?

The device driver fails with error -22 on a 460SX (which has the 36 bit pci space).

sil24 /drivers/ata/sata_sil24.c

Thanks
Ayman
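As a generic illustration of why the width of the type matters on a platform with a 36 bit PCI space (and not a description of what the bisected commit actually changes internally), the sketch below passes a value through a 32-bit type and checks whether it survives. The typedef sketch_resource_size_t, the helpers narrow_to_32() and fits_in_32_bits(), and the example values are all hypothetical; an unsigned 32-bit type is used instead of int so the narrowing stays well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit resource size type for this sketch. */
typedef uint64_t sketch_resource_size_t;

/* Pass a value through a 32-bit type, as a too-narrow prototype would.
 * Unsigned narrowing is well defined in C: the upper bits are dropped. */
static sketch_resource_size_t narrow_to_32(sketch_resource_size_t v)
{
	uint32_t narrowed = (uint32_t)v;

	return narrowed;
}

/* Returns 1 if the value survives the 32-bit round trip, else 0. */
static int fits_in_32_bits(sketch_resource_size_t v)
{
	return narrow_to_32(v) == v;
}
```

For example, fits_in_32_bits(0x900000000ULL) is 0 because the value needs 36 bits, while a value such as 0x80000000ULL still fits in an unsigned 32-bit type.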