Re: drm/radeon: ring test failed on PA-RISC Linux
Let's go further.

25.09.2013, 22:58, Alex Ivanov gnido...@p0n4ik.tk:
> 25.09.2013, 21:28, Konrad Rzeszutek Wilk konrad.w...@oracle.com:
>> I took a look at arch/parisc/kernel/pci-dma.c and I see that it is mostly a flat platform. That is, bus addresses == physical addresses. Unless it is a pclx or pclx2 CPU type (huh?) - if it is, then any calls to dma_alloc_coherent will map memory out of a pool. In essence it will look like a SWIOTLB bounce buffer.
>
> arch/parisc/kernel/pci-dma.c:
>
>     ** PARISC 1.1 Dynamic DMA mapping support.
>     ** This implementation is for PA-RISC platforms that do not support
>     ** I/O TLBs (aka DMA address translation hardware).
>
> That's very old. PA-RISC 2.0 came into the game circa 1996. PA-RISC 1.1 is 32-bit only and I'm not even sure whether these machines had a PCI bus. Only old boxes (PA7200 CPU and lower) cannot use dma_alloc_coherent() (and are forced to do syncs, iirc). That's not our case. And PA-RISC configs have 'Discontiguous Memory' chosen.
>
>> But interestingly enough there are a lot of 'flush_kernel_dcache_range' calls for every DMA operation. And I think you need to do a dma_sync_for_cpu call in radeon_test_writeback for it to use flush_kernel_dcache_range.

I was correct regarding syncs. In our case (SBA IOMMU) the dma_sync* calls are no-ops:

sba_iommu.c:

    static struct hppa_dma_ops sba_ops = {
        ...
        .dma_sync_single_for_cpu = NULL,
        .dma_sync_single_for_device = NULL,
        .dma_sync_sg_for_cpu = NULL,
        .dma_sync_sg_for_device = NULL,
    }

dma-mapping.h:

    dma_cache_sync(struct device *dev, void *vaddr, size_t size,
                   enum dma_data_direction direction)
    {
        if (hppa_dma_ops->dma_sync_single_for_cpu)
            flush_kernel_dcache_range((unsigned long)vaddr, size);
    }

So I'll skip doing the flush_kernel_dcache_range().

>> I don't know what flush_kernel_dcache_range does though, so I could be wrong.
>
> D-cache is a CPU cache (if they meant it). Seems to be L1-level. There is an I-cache at the same level.

>> You are missing a translation here (you were comparing the virtual address to the bus address). I was thinking something along this:
>
> Yes, this confused me. I've translated your suggestion literally :\
>
>>     unsigned int pfn = page_to_pfn(ttm->pages[i]);
>>     dma_addr_t bus = gtt->ttm.dma_address[i];
>>     void *va_bus, *va, *va_pfn;
>>
>>     if ((pfn << PAGE_SHIFT) != bus)
>>         printk("Bus 0x%lx != PFN 0x%lx", bus, pfn << PAGE_SHIFT);
>>     /* OK, that means bus addresses are different */
>>
>>     va_bus = bus_to_virt(gtt->ttm.dma_address[i]);
>>     va_pfn = __va(pfn << PAGE_SHIFT);
>>
>>     if (!virt_addr_valid(va_bus))
>>         printk("va_bus (0x%lx) not good!\n", va_bus);
>>     if (!virt_addr_valid(va_pfn))
>>         printk("va_pfn (0x%lx) not good!\n", va_pfn);
>>
>>     /* We got VA for both bus -> va, and pfn -> va. Should be the
>>        same if bus and physical addresses are on the same namespace. */
>>     if (va_bus != va_pfn)
>>         printk("va bus:%lx != va pfn: %lx\n", va_bus, va_pfn);
>>
>>     /* Now that we have bus -> pa -> va (va_bus) try to go va_bus -> bus
>>        address. The bus address should be the same */
>>     if (gtt->ttm.dma_address[i] != virt_to_bus(va_bus))
>>         printk("bus->pa->va:%lx != bus->pa->va->ba: %lx\n",
>>                gtt->ttm.dma_address[i], virt_to_bus(va_bus));

Ok, slightly modified:

    struct page *page = ttm->pages[i];
    unsigned long pfn = page_to_pfn(page);
    dma_addr_t bus = gtt->ttm.dma_address[i];
    void *va_bus, *va, *va_pfn;

    BUG_ON(!pfn_valid(pfn));
    //BUG_ON(!page_mapping(page)); // Leads to a kernel BUG

    /* Avoid floodage */
    if (i % 100 == 0) {
        if ((pfn << PAGE_SHIFT) != bus)
            printk("Bus 0x%lx != PFN 0x%lx\n", bus, pfn << PAGE_SHIFT);
        /* OK, that means bus addresses are different */

        va_bus = bus_to_virt(bus);
        va_pfn = __va(pfn << PAGE_SHIFT);

        if (!virt_addr_valid(va_bus))
            printk("va_bus (0x%lx) not good!\n", va_bus);
        if (!virt_addr_valid(va_pfn))
            printk("va_pfn (0x%lx) not good!\n", va_pfn);

        /* We got VA for both bus -> va, and pfn -> va. Should be the
           same if bus and physical addresses are on the same namespace. */
        if (va_bus != va_pfn)
            printk("va bus: %lx != va pfn: %lx\n", va_bus, va_pfn);

        /* Now that we have bus -> pa -> va (va_bus) try to go va_bus -> bus
           address. The bus address should be the same */
        if (bus != virt_to_bus(va_bus))
            printk("bus->pa->va: %lx != bus->pa->va->ba: %lx\n",
                   bus, virt_to_bus(va_bus));
    }

Output:

    Bus 0x4028 != PFN
Re: drm/radeon: ring test failed on PA-RISC Linux
24.09.2013, 00:11, Konrad Rzeszutek Wilk konrad.w...@oracle.com:
> On Sat, Sep 21, 2013 at 07:39:10AM +0400, Alex Ivanov wrote:
>> 21.09.2013, at 1:27, Alex Deucher alexdeuc...@gmail.com wrote:
>>> The register writes seem to be going through the register backbone correctly:
>>>
>>>     [0x00B] 0x15E0=0x
>>>     [0x00C] 0x15E4=0xCAFEDEAD
>>>     [0x00D] 0x4274=0x000F
>>>     [0x00E] 0x42C8=0x0007
>>>     [0x00F] 0x4018=0x001D
>>>     [0x010] 0x170C=0x8000
>>>     [0x011] 0x3428=0x00020100
>>>     [0x012] 0x15E4=0xCAFEDEAD
>>>
>>> You can see the 0xCAFEDEAD written to the scratch register via MMIO from the ring_test(). The CP fifo however seems to be full of garbage. The CP is busy though, so it seems to be functional. I guess it's just fetching garbage rather than commands.
>
> If it is fetching garbage, that would imply the DMA (or bus addresses) that are programmed in the GART are bogus. If you dump them and try to figure out if bus address -> physical address -> virtual address == virtual address -> bus address, that could help. And perhaps seeing what the virtual address has, and/or poisoning it with known data?
>
> Or perhaps the card has picked up an incorrect page table? Meaning the (bus) address given to it is not the correct one?

Konrad,

Let's see. Please note that I'm not a PA-RISC or general Linux kernel developer, just a user, so I may do things completely wrong. I was hoping that the PA-RISC smarties would join me here, but they seem to be busy with other duties. Even the port's mailing list activity has been low during the last weeks.

> If you dump them and try to figure out if bus address -> physical address -> virtual address == virtual address -> bus address, that could help

With the following in radeon/radeon_ttm.c:

    radeon_ttm_tt_populate():
    ...
    for (i = 0; i < ttm->num_pages; i++) {
        gtt->ttm.dma_address[i] = pci_map_page(rdev->pdev, ttm->pages[i],
                                               0, PAGE_SIZE,
                                               PCI_DMA_BIDIRECTIONAL);
        void *va = bus_to_virt(gtt->ttm.dma_address[i]);
        if ((phys_addr_t) va != virt_to_bus(va)) {
            DRM_INFO("MISMATCH: %p != %p\n", va, (void *) virt_to_bus(va));
            /*DRM_INFO("CONTENTS: %x\n", *((uint32_t *)va));*/ // Leads to a Kernel Fault
        ...
    }

I'm getting the output:

    [drm] MISMATCH: 8028 != 4028
    [drm] MISMATCH: 80281000 != 40281000
    ...

How can I check the same for AGP mode?

> Or perhaps the card has picked up an incorrect page table? Meaning the (bus) address given to it is not the correct one?

I'll see.
Re: drm/radeon: ring test failed on PA-RISC Linux
On Wed, Sep 25, 2013 at 1:28 PM, Konrad Rzeszutek Wilk konrad.w...@oracle.com wrote: On Wed, Sep 25, 2013 at 08:29:07PM +0400, Alex Ivanov wrote: 24.09.2013, 00:11, Konrad Rzeszutek Wilk konrad.w...@oracle.com: On Sat, Sep 21, 2013 at 07:39:10AM +0400, Alex Ivanov wrote: 21.09.2013, в 1:27, Alex Deucher alexdeuc...@gmail.com написал(а): The register writes seems to be going through the register backbone correctly: [0x00B] 0x15E0=0x [0x00C] 0x15E4=0xCAFEDEAD [0x00D] 0x4274=0x000F [0x00E] 0x42C8=0x0007 [0x00F] 0x4018=0x001D [0x010] 0x170C=0x8000 [0x011] 0x3428=0x00020100 [0x012] 0x15E4=0xCAFEDEAD You can see the 0xCAFEDEAD written to the scratch register via MMIO from the ring_test(). The CP fifo however seems to be full of garbage. The CP is busy though, so it seems to be functional. I guess it's just fetching garbage rather than commands. If it is fetching garbage, that would imply the DMA (or bus addresses) that are programmed in the GART are bogus. If you dump them and try to figure out if bus adress - physical address - virtual address == virtual address - bus address that could help. And perhaps seeing what the virtual address has - and or poisoning it with known data? Or perhaps the the card has picked up an incorrect page table? Meaning the (bus) address given to it is not the correct one? Konrad, Let's see. Please notice that i'm not PA-RISC or general linux kernel developer, just the user, so i may do things completely wrong. I was hoping that PA-RISC smarties will join me here, but they seem to be busy with other duties. Even port's mail list activity is low during last weeks. I took a look at the arch/parisc/kernel/pci-dma.c and I see that is mostly a flat platform. That is bus addresses == physical addresses. Unless it is an pclx or pclx2 CPU type (huh?) - if its it that then any calls to dma_alloc_coherent will map memory out of a pool. In essence it will look like a SWIOTLB bounce buffer. 
But interestingly enough there is a lot of 'flush_kernel_dcache_range' call for every DMA operation. And I think the you need to do dma_sync_for_cpu call in the radeon_test_writeback for it to use the flush_kernel_dcache_range. I don't know what the flush_kernel_dcache_range does thought so I could be wrong. That means you can ignore the little code below I wrote and see about doing something like this: diff --git a/drivers/gpu/drm/radeon/radeon_cp.c b/drivers/gpu/drm/radeon/radeon_cp.c index 3cae2bb..9e5923d 100644 --- a/drivers/gpu/drm/radeon/radeon_cp.c +++ b/drivers/gpu/drm/radeon/radeon_cp.c @@ -876,6 +876,7 @@ static void radeon_test_writeback(drm_radeon_private_t * dev_priv) RADEON_WRITE(RADEON_SCRATCH_REG1, 0xdeadbeef); + flush_kernel_dcache_range(dev_priv-ring_rptr, PAGE_SIZE); for (tmp = 0; tmp dev_priv-usec_timeout; tmp++) { u32 val; You'd want to add the add the flush to r100_ring_test() in r100.c. radeon_cp.c is for the old UMS support. Alex But that is probably a shot in the dark. I have no clue what the flush_.. is doing. [edit: And then I noticed sba_iommu.c, which is a complete IOMMU driver where bus and physical addresses are different. sigh. What type of machine is this? Does it have the IOMMU in it?] If you dump them and try to figure out if bus adress - physical address - virtual address == virtual address - bus address that could help With following radeon/radeon_ttm.c: radeon_ttm_tt_populate(): ... for (i = 0; i ttm-num_pages; i++) { gtt-ttm.dma_address[i] = pci_map_page(rdev-pdev, ttm-pages[i], 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL); void *va = bus_to_virt(gtt-ttm.dma_address[i]); if ((phys_addr_t) va != virt_to_bus(va)) { You are missing a translation here (you were comparing the virtual address to the bus address). 
I was thinking something along this: unsigned int pfn = page_to_pfn(ttm-pages[i]); dma_addr_t bus = gtt-ttm.dma_address[i]; void *va_bus, *va, *va_pfn; if ((pfn PAGE_SHIFT) != bus) printk(Bus 0x%lx != PFN 0x%lx, bus, pfn PAGE_SHIFT); /* OK, that means bus addresses are different */ va_bus = bus_to_virt(gtt-ttm.dma_address[i]); va_pfn = __va(pfn PAGE_SHIFT); if (!virt_addr_valid(va_bus)) printk(va_bus (0x%lx) not good!\n, va_bus); if (!virt_addr_valid(va_pfn)) printk(va_pfn (0x%lx) not good!\n, va_pfn); /* We got VA for both bus - va, and pfn - va. Should be the same if bus and physical addresses are on the same namespace.
Re: drm/radeon: ring test failed on PA-RISC Linux
Alex,

> You'd want to add the flush to r100_ring_test() in r100.c. radeon_cp.c is for the old UMS support.

Right!

Konrad,

Thanks for the code! I'll try asap.

25.09.2013, 21:28, Konrad Rzeszutek Wilk konrad.w...@oracle.com:
> I took a look at arch/parisc/kernel/pci-dma.c and I see that it is mostly a flat platform. That is, bus addresses == physical addresses. Unless it is a pclx or pclx2 CPU type (huh?) - if it is, then any calls to dma_alloc_coherent will map memory out of a pool. In essence it will look like a SWIOTLB bounce buffer.

arch/parisc/kernel/pci-dma.c:

    ** PARISC 1.1 Dynamic DMA mapping support.
    ** This implementation is for PA-RISC platforms that do not support
    ** I/O TLBs (aka DMA address translation hardware).

That's very old. PA-RISC 2.0 came into the game circa 1996. PA-RISC 1.1 is 32-bit only and I'm not even sure whether these machines had a PCI bus. Only old boxes (PA7200 CPU and lower) cannot use dma_alloc_coherent() (and are forced to do syncs, iirc). That's not our case. And PA-RISC configs have 'Discontiguous Memory' chosen.

> But interestingly enough there are a lot of 'flush_kernel_dcache_range' calls for every DMA operation. And I think you need to do a dma_sync_for_cpu call in radeon_test_writeback for it to use flush_kernel_dcache_range. I don't know what flush_kernel_dcache_range does though, so I could be wrong.

D-cache is a CPU cache (if they meant it). Seems to be L1-level. There is an I-cache at the same level.

> That means you can ignore the little code below I wrote and see about doing something like this:
>
> diff --git a/drivers/gpu/drm/radeon/radeon_cp.c b/drivers/gpu/drm/radeon/radeon_cp.c
> index 3cae2bb..9e5923d 100644
> --- a/drivers/gpu/drm/radeon/radeon_cp.c
> +++ b/drivers/gpu/drm/radeon/radeon_cp.c
> @@ -876,6 +876,7 @@ static void radeon_test_writeback(drm_radeon_private_t * dev_priv)
>      RADEON_WRITE(RADEON_SCRATCH_REG1, 0xdeadbeef);
> +    flush_kernel_dcache_range(dev_priv->ring_rptr, PAGE_SIZE);
>      for (tmp = 0; tmp < dev_priv->usec_timeout; tmp++) {
>          u32 val;
>
> But that is probably a shot in the dark. I have no clue what the flush_.. is doing.
>
> [edit: And then I noticed sba_iommu.c, which is a complete IOMMU driver where bus and physical addresses are different. sigh. What type of machine is this? Does it have the IOMMU in it?]

That's our case. Yes, recent IA64 and PA-RISC machines have an SBA IOMMU device. PCI I/O seems to go through it. There is a note for my chipset in sba_iommu.c:

    /* We are just encouraging 32-bit DMA masks here since we can
     * never allow IOMMU bypass unless we add special support for ZX1.
     */

And it is indeed right. When I tried to bypass the hw IOMMU like the ia64 code does, it led to faults from drivers which do DMA (like the Fusion MPT SCSI driver).

>>     void *va = bus_to_virt(gtt->ttm.dma_address[i]);
>>     if ((phys_addr_t) va != virt_to_bus(va)) {
>
> You are missing a translation here (you were comparing the virtual address to the bus address). I was thinking something along this:

Yes, this confused me. I've translated your suggestion literally :\

>     unsigned int pfn = page_to_pfn(ttm->pages[i]);
>     dma_addr_t bus = gtt->ttm.dma_address[i];
>     void *va_bus, *va, *va_pfn;
>
>     if ((pfn << PAGE_SHIFT) != bus)
>         printk("Bus 0x%lx != PFN 0x%lx", bus, pfn << PAGE_SHIFT);
>     /* OK, that means bus addresses are different */
>
>     va_bus = bus_to_virt(gtt->ttm.dma_address[i]);
>     va_pfn = __va(pfn << PAGE_SHIFT);
>
>     if (!virt_addr_valid(va_bus))
>         printk("va_bus (0x%lx) not good!\n", va_bus);
>     if (!virt_addr_valid(va_pfn))
>         printk("va_pfn (0x%lx) not good!\n", va_pfn);
>
>     /* We got VA for both bus -> va, and pfn -> va. Should be the
>        same if bus and physical addresses are on the same namespace. */
>     if (va_bus != va_pfn)
>         printk("va bus:%lx != va pfn: %lx\n", va_bus, va_pfn);
>
>     /* Now that we have bus -> pa -> va (va_bus) try to go va_bus -> bus
>        address. The bus address should be the same */
>     if (gtt->ttm.dma_address[i] != virt_to_bus(va_bus))
>         printk("bus->pa->va:%lx != bus->pa->va->ba: %lx\n",
>                gtt->ttm.dma_address[i], virt_to_bus(va_bus));
>
>>         DRM_INFO("MISMATCH: %p != %p\n", va, (void *) virt_to_bus(va));
>>         /*DRM_INFO("CONTENTS: %x\n", *((uint32_t *)va));*/ // Leads to a Kernel Fault
>
> That is odd. I would have thought it would be usable.
>
>>     ...
>>     }
>>
>> I'm getting the output:
>>
>>     [drm] MISMATCH: 8028 != 4028
>
> In theory that means the bus address that is programmed in (gtt->dma_address[i]) is 4028 (which is what
Re: drm/radeon: ring test failed on PA-RISC Linux
On Sat, Sep 21, 2013 at 07:39:10AM +0400, Alex Ivanov wrote: 21.09.2013, в 1:27, Alex Deucher alexdeuc...@gmail.com написал(а): On Tue, Sep 17, 2013 at 3:33 PM, Alex Ivanov gnido...@p0n4ik.tk wrote: 17.09.2013, в 18:24, Alex Deucher alexdeuc...@gmail.com написал(а): On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Alex, 10.09.2013, в 16:37, Alex Deucher alexdeuc...@gmail.com написал(а): The dummy page isn't really going to help much. That page is just used as a safety placeholder for gart entries that aren't mapped on the GPU. TTM (drivers/gpu/drm/ttm) actually does the allocation of the backing pages for the gart. You may want to look there. Ah, sorry. Indeed. Though, my idea with: On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :( doesn't make a sense at TTM part as well. After the driver is loaded, you can dump some info from debugfs: r100_rbbm_info r100_cp_ring_info r100_cp_csq_fifo Which will dump a bunch of registers and internal fifos so we can see that the chip actually processed. Alex Reading of r100_cp_ring_info leads to a KP: r100_debugfs_cp_ring_info(): count = (rdp + ring-ring_size - wdp) ring-ptr_mask; i = (rdp + j) ring-ptr_mask; for (j = 0; j = count; j++) { i = (rdp + j) ring-ptr_mask; -- Here at first iteration -- -- count = 262080, i = 0 -- seq_printf(m, r[%04d]=0x%08x\n, i, ring-ring[i]); } Reading of radeon_ring_gfx (which i've additionally tried to read) throws an MCE: radeon_debugfs_ring_info(): count = (ring-ring_size / 4) - ring-ring_free_dw; i = (ring-rptr + ring-ptr_mask + 1 - 32) ring-ptr_mask; for (j = 0; j = (count + 32); j++) { -- Here at first iteration -- -- i = 262112, j = 0 -- seq_printf(m, r[%5d]=0x%08x\n, i, ring-ring[i]); i = (i + 1) ring-ptr_mask; } I'm attaching debug outputs on kernel built with these loops commented. 
The register writes seems to be going through the register backbone correctly: [0x00B] 0x15E0=0x [0x00C] 0x15E4=0xCAFEDEAD [0x00D] 0x4274=0x000F [0x00E] 0x42C8=0x0007 [0x00F] 0x4018=0x001D [0x010] 0x170C=0x8000 [0x011] 0x3428=0x00020100 [0x012] 0x15E4=0xCAFEDEAD You can see the 0xCAFEDEAD written to the scratch register via MMIO from the ring_test(). The CP fifo however seems to be full of garbage. The CP is busy though, so it seems to be functional. I guess it's just fetching garbage rather than commands. If it is fetching garbage, that would imply the DMA (or bus addresses) that are programmed in the GART are bogus. If you dump them and try to figure out if bus adress - physical address - virtual address == virtual address - bus address that could help. And perhaps seeing what the virtual address has - and or poisoning it with known data? Or perhaps the the card has picked up an incorrect page table? Meaning the (bus) address given to it is not the correct one? Does doing a posted write when writing to the ring buffer help? Unfortunately, no. diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index a890756..b4f04d2 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -324,12 +324,14 @@ static int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ri */ void radeon_ring_write(struct radeon_ring *ring, uint32_t v) { + u32 tmp; #if DRM_DEBUG_CODE if (ring-count_dw = 0) { DRM_ERROR(radeon: writing more dwords to the ring than expected!\n); } #endif ring-ring[ring-wptr++] = v; + tmp = ring-ring[ring-wptr - 1]; ring-wptr = ring-ptr_mask; ring-count_dw--; ring-ring_free_dw--; ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm/radeon: ring test failed on PA-RISC Linux
17.09.2013, в 23:33, Alex Ivanov gnido...@p0n4ik.tk написал(а): 17.09.2013, в 18:24, Alex Deucher alexdeuc...@gmail.com написал(а): On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Alex, 10.09.2013, в 16:37, Alex Deucher alexdeuc...@gmail.com написал(а): The dummy page isn't really going to help much. That page is just used as a safety placeholder for gart entries that aren't mapped on the GPU. TTM (drivers/gpu/drm/ttm) actually does the allocation of the backing pages for the gart. You may want to look there. Ah, sorry. Indeed. Though, my idea with: On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :( doesn't make a sense at TTM part as well. After the driver is loaded, you can dump some info from debugfs: r100_rbbm_info r100_cp_ring_info r100_cp_csq_fifo Which will dump a bunch of registers and internal fifos so we can see that the chip actually processed. Alex Reading of r100_cp_ring_info leads to a KP: r100_debugfs_cp_ring_info(): count = (rdp + ring-ring_size - wdp) ring-ptr_mask; i = (rdp + j) ring-ptr_mask; for (j = 0; j = count; j++) { i = (rdp + j) ring-ptr_mask; -- Here at first iteration -- -- count = 262080, i = 0 -- seq_printf(m, r[%04d]=0x%08x\n, i, ring-ring[i]); } Reading of radeon_ring_gfx (which i've additionally tried to read) throws an MCE: radeon_debugfs_ring_info(): count = (ring-ring_size / 4) - ring-ring_free_dw; i = (ring-rptr + ring-ptr_mask + 1 - 32) ring-ptr_mask; for (j = 0; j = (count + 32); j++) { -- Here at first iteration -- -- count = 64, i = 262112 -- seq_printf(m, r[%5d]=0x%08x\n, i, ring-ring[i]); i = (i + 1) ring-ptr_mask; } I'm attaching debug outputs on kernel built with these loops commented. drm_parisc_debug.tgz___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel The ring-ring is NULL... 
Re: drm/radeon: ring test failed on PA-RISC Linux
On Tue, Sep 17, 2013 at 3:33 PM, Alex Ivanov gnido...@p0n4ik.tk wrote: 17.09.2013, в 18:24, Alex Deucher alexdeuc...@gmail.com написал(а): On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Alex, 10.09.2013, в 16:37, Alex Deucher alexdeuc...@gmail.com написал(а): The dummy page isn't really going to help much. That page is just used as a safety placeholder for gart entries that aren't mapped on the GPU. TTM (drivers/gpu/drm/ttm) actually does the allocation of the backing pages for the gart. You may want to look there. Ah, sorry. Indeed. Though, my idea with: On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :( doesn't make a sense at TTM part as well. After the driver is loaded, you can dump some info from debugfs: r100_rbbm_info r100_cp_ring_info r100_cp_csq_fifo Which will dump a bunch of registers and internal fifos so we can see that the chip actually processed. Alex Reading of r100_cp_ring_info leads to a KP: r100_debugfs_cp_ring_info(): count = (rdp + ring-ring_size - wdp) ring-ptr_mask; i = (rdp + j) ring-ptr_mask; for (j = 0; j = count; j++) { i = (rdp + j) ring-ptr_mask; -- Here at first iteration -- -- count = 262080, i = 0 -- seq_printf(m, r[%04d]=0x%08x\n, i, ring-ring[i]); } Reading of radeon_ring_gfx (which i've additionally tried to read) throws an MCE: radeon_debugfs_ring_info(): count = (ring-ring_size / 4) - ring-ring_free_dw; i = (ring-rptr + ring-ptr_mask + 1 - 32) ring-ptr_mask; for (j = 0; j = (count + 32); j++) { -- Here at first iteration -- -- i = 262112, j = 0 -- seq_printf(m, r[%5d]=0x%08x\n, i, ring-ring[i]); i = (i + 1) ring-ptr_mask; } I'm attaching debug outputs on kernel built with these loops commented. 
The register writes seem to be going through the register backbone correctly:

    [0x00B] 0x15E0=0x
    [0x00C] 0x15E4=0xCAFEDEAD
    [0x00D] 0x4274=0x000F
    [0x00E] 0x42C8=0x0007
    [0x00F] 0x4018=0x001D
    [0x010] 0x170C=0x8000
    [0x011] 0x3428=0x00020100
    [0x012] 0x15E4=0xCAFEDEAD

You can see the 0xCAFEDEAD written to the scratch register via MMIO from the ring_test(). The CP fifo however seems to be full of garbage. The CP is busy though, so it seems to be functional. I guess it's just fetching garbage rather than commands.

Does doing a posted write when writing to the ring buffer help?

diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index a890756..b4f04d2 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -324,12 +324,14 @@ static int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ri
  */
 void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
 {
+    u32 tmp;
 #if DRM_DEBUG_CODE
     if (ring->count_dw <= 0) {
         DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
     }
 #endif
     ring->ring[ring->wptr++] = v;
+    tmp = ring->ring[ring->wptr - 1];
     ring->wptr &= ring->ptr_mask;
     ring->count_dw--;
     ring->ring_free_dw--;
Re: drm/radeon: ring test failed on PA-RISC Linux
21.09.2013, в 1:27, Alex Deucher alexdeuc...@gmail.com написал(а): On Tue, Sep 17, 2013 at 3:33 PM, Alex Ivanov gnido...@p0n4ik.tk wrote: 17.09.2013, в 18:24, Alex Deucher alexdeuc...@gmail.com написал(а): On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Alex, 10.09.2013, в 16:37, Alex Deucher alexdeuc...@gmail.com написал(а): The dummy page isn't really going to help much. That page is just used as a safety placeholder for gart entries that aren't mapped on the GPU. TTM (drivers/gpu/drm/ttm) actually does the allocation of the backing pages for the gart. You may want to look there. Ah, sorry. Indeed. Though, my idea with: On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :( doesn't make a sense at TTM part as well. After the driver is loaded, you can dump some info from debugfs: r100_rbbm_info r100_cp_ring_info r100_cp_csq_fifo Which will dump a bunch of registers and internal fifos so we can see that the chip actually processed. Alex Reading of r100_cp_ring_info leads to a KP: r100_debugfs_cp_ring_info(): count = (rdp + ring-ring_size - wdp) ring-ptr_mask; i = (rdp + j) ring-ptr_mask; for (j = 0; j = count; j++) { i = (rdp + j) ring-ptr_mask; -- Here at first iteration -- -- count = 262080, i = 0 -- seq_printf(m, r[%04d]=0x%08x\n, i, ring-ring[i]); } Reading of radeon_ring_gfx (which i've additionally tried to read) throws an MCE: radeon_debugfs_ring_info(): count = (ring-ring_size / 4) - ring-ring_free_dw; i = (ring-rptr + ring-ptr_mask + 1 - 32) ring-ptr_mask; for (j = 0; j = (count + 32); j++) { -- Here at first iteration -- -- i = 262112, j = 0 -- seq_printf(m, r[%5d]=0x%08x\n, i, ring-ring[i]); i = (i + 1) ring-ptr_mask; } I'm attaching debug outputs on kernel built with these loops commented. 
The register writes seems to be going through the register backbone correctly: [0x00B] 0x15E0=0x [0x00C] 0x15E4=0xCAFEDEAD [0x00D] 0x4274=0x000F [0x00E] 0x42C8=0x0007 [0x00F] 0x4018=0x001D [0x010] 0x170C=0x8000 [0x011] 0x3428=0x00020100 [0x012] 0x15E4=0xCAFEDEAD You can see the 0xCAFEDEAD written to the scratch register via MMIO from the ring_test(). The CP fifo however seems to be full of garbage. The CP is busy though, so it seems to be functional. I guess it's just fetching garbage rather than commands. Does doing a posted write when writing to the ring buffer help? Unfortunately, no. diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index a890756..b4f04d2 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -324,12 +324,14 @@ static int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ri */ void radeon_ring_write(struct radeon_ring *ring, uint32_t v) { + u32 tmp; #if DRM_DEBUG_CODE if (ring-count_dw = 0) { DRM_ERROR(radeon: writing more dwords to the ring than expected!\n); } #endif ring-ring[ring-wptr++] = v; + tmp = ring-ring[ring-wptr - 1]; ring-wptr = ring-ptr_mask; ring-count_dw--; ring-ring_free_dw--; ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm/radeon: ring test failed on PA-RISC Linux
Alex,

10.09.2013, at 16:37, Alex Deucher alexdeuc...@gmail.com wrote:
> The dummy page isn't really going to help much. That page is just used as a safety placeholder for gart entries that aren't mapped on the GPU. TTM (drivers/gpu/drm/ttm) actually does the allocation of the backing pages for the gart. You may want to look there.

Ah, sorry. Indeed. Though, my idea with:

> On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
>> Thanks! I'll try. Meanwhile I've tried a switch from page_alloc() to dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

doesn't make sense for the TTM part either.

Konrad,

10.09.2013, 17:25, Konrad Rzeszutek Wilk konrad.w...@oracle.com:
> Is this platform enabling the SWIOTLB layer?

Doesn't look like it.

> The reason I am asking is b/c if you do indeed enable it you end up using the TTM DMA pool, which allocates pages using dma_alloc_coherent - which means that all of the pages that come out of TTM are already 'DMA' mapped. And that means radeon_gart_bind and all its friends use the DMA addresses that have been constructed by the SWIOTLB IOMMU.
>
> Perhaps the PA-RISC IOMMU creates the DMA addresses differently?
>
> When the card gets programmed, you do end up using ttm_agp_bind, right? I am wondering if something like this: https://lkml.org/lkml/2010/12/6/512 is needed to pass in the right DMA address?

No idea how to modify ttm_agp_bind() this way, though it doesn't matter if swiotlb isn't used anyway?
Re: drm/radeon: ring test failed on PA-RISC Linux
On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Alex, 10.09.2013, в 16:37, Alex Deucher alexdeuc...@gmail.com написал(а): The dummy page isn't really going to help much. That page is just used as a safety placeholder for gart entries that aren't mapped on the GPU. TTM (drivers/gpu/drm/ttm) actually does the allocation of the backing pages for the gart. You may want to look there. Ah, sorry. Indeed. Though, my idea with: On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :( doesn't make a sense at TTM part as well. After the driver is loaded, you can dump some info from debugfs: r100_rbbm_info r100_cp_ring_info r100_cp_csq_fifo Which will dump a bunch of registers and internal fifos so we can see that the chip actually processed. Alex Konrad, 10.09.2013, 17:25, Konrad Rzeszutek Wilk konrad.w...@oracle.com: Is this platform enabling the SWIOTLB layer? Doesn't look like. The reason I am asking is b/c if you do indeed enable it you end up using the TTM DMA pool which allocates pages using the dma_alloc_coherent - which means that all of the pages that come out of TTM are already 'DMA' mapped. And that means the radeon_gart_bind and all its friends use the DMA addresses that have been constructed by SWIOTLB IOMMU. Perhaps the PA-RISC IOMMU creates the DMA addresses differently? When the card gets programmed, you do end up using ttm_agp_bind right? I am wondering if something like this: https://lkml.org/lkml/2010/12/6/512 is needed to pass in the right DMA address? No idea how to modify ttm_agp_bind() this way, though doesn't matter if swiotlb isn't used anyway? ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm/radeon: ring test failed on PA-RISC Linux
17.09.2013, в 18:24, Alex Deucher alexdeuc...@gmail.com написал(а): On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Alex, 10.09.2013, в 16:37, Alex Deucher alexdeuc...@gmail.com написал(а): The dummy page isn't really going to help much. That page is just used as a safety placeholder for gart entries that aren't mapped on the GPU. TTM (drivers/gpu/drm/ttm) actually does the allocation of the backing pages for the gart. You may want to look there. Ah, sorry. Indeed. Though, my idea with: On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :( doesn't make a sense at TTM part as well. After the driver is loaded, you can dump some info from debugfs: r100_rbbm_info r100_cp_ring_info r100_cp_csq_fifo Which will dump a bunch of registers and internal fifos so we can see that the chip actually processed. Alex Reading of r100_cp_ring_info leads to a KP: r100_debugfs_cp_ring_info(): count = (rdp + ring-ring_size - wdp) ring-ptr_mask; i = (rdp + j) ring-ptr_mask; for (j = 0; j = count; j++) { i = (rdp + j) ring-ptr_mask; -- Here at first iteration -- -- count = 262080, i = 0 -- seq_printf(m, r[%04d]=0x%08x\n, i, ring-ring[i]); } Reading of radeon_ring_gfx (which i've additionally tried to read) throws an MCE: radeon_debugfs_ring_info(): count = (ring-ring_size / 4) - ring-ring_free_dw; i = (ring-rptr + ring-ptr_mask + 1 - 32) ring-ptr_mask; for (j = 0; j = (count + 32); j++) { -- Here at first iteration -- -- i = 262112, j = 0 -- seq_printf(m, r[%5d]=0x%08x\n, i, ring-ring[i]); i = (i + 1) ring-ptr_mask; } I'm attaching debug outputs on kernel built with these loops commented. drm_parisc_debug.tgz Description: Binary data ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm/radeon: ring test failed on PA-RISC Linux
Alex, 09.09.2013, в 21:43, Alex Deucher alexdeuc...@gmail.com написал(а): On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov gnido...@p0n4ik.tk wrote: Folks, We (people at linux-parisc @ vger.kernel.org mail list) are trying to make native video options of the latest PA-RISC servers and workstations (these are ATIs, most of which are based on R100/R300/R420 chips) work correctly on this platform (big endian pa-risc). However, we hadn't much success. DRM fails every time with ring test failed for both AGP PCI. Maybe you would give us some suggestions that we could check? Topic started here: http://www.spinics.net/lists/linux-parisc/msg04908.html And continued there: http://www.spinics.net/lists/linux-parisc/msg04995.html http://www.spinics.net/lists/linux-parisc/msg05006.html Problems we've already resolved without any signs of progress: - Checked the successful microcode load parisc AGP GART code writes IOMMU entries in the wrong byte order and doesn't add the coherency information SBA code adds our PCI BAR setup doesn't really work very well together with the Radeon DRM address setup. DRM will generate addresses, which are even outside of the connected LBA Things planned for a check: The drivers/video/aty uses an endian config bit DRM doesn't use, but I haven't tested whether this makes a difference and how it is connected to the overall picture. I don't think that will any difference. radeon kms works fine on other big endian platforms such as powerpc. Good! I'll opt it out then. The Rage128 product revealed a weakness in some motherboard chipsets in that there is no mechanism to guarantee that data written by the CPU to memory is actually in a readable state before the Graphics Controller receives an update to its copy of the Write Pointer. 
In an effort to alleviate this problem, weve introduced a mechanism into the Graphics Controller that will delay the actual write to the Write Pointer for some programmable amount of time, in order to give the chipset time to flush its internal write buffers to memory. There are two register fields that control this mechanism: PRE_WRITE_TIMER and PRE_WRITE_LIMIT. In the radeon DRM codebase I didn't found anyone using/setting those registers. Maybe PA-RISC has some problem here?... I doubt it. If you are using AGP, I'd suggest disabling it and first try to get things working using the on chip gart rather than AGP. Load radeon with agpmode=-1. Already tried this without any luck. Anyway, a radeon driver fallbacks to the PCI mode in our case, so does it really matter? In addition, people with PCI cards experiencing the same issue... The on chip gart always uses cache snooped pci transactions and the driver assumes pci is cache coherent. On AGP/PCI chips, the on-chip gart mechanism stores the gart table in system ram. On PCIE asics, the gart table is stored in vram. The gart page table maps system pages to a contiguous aperture in the GPU's address space. The ring lives in gart memory. The GPU sees a contiguous buffer and the gart mechanism handles the access to the backing pages via the page table. I'd suggest verifying that the entries written to the gart page table are valid and then the information written to the ring buffer is valid before updating the ring's wptr in radeon_ring_unlock_commit(). Changing the wptr is what causes the CP to start fetching data from the ring. Thanks! I'll try. 
Meanwhile I've tried a switch from alloc_page() to dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

--- radeon_device.c.orig	2013-09-10 08:55:05.0 +
+++ radeon_device.c	2013-09-10 09:12:17.0 +
@@ -673,15 +673,13 @@ int radeon_dummy_page_init(struct radeon
 {
 	if (rdev->dummy_page.page)
 		return 0;
-	rdev->dummy_page.page = alloc_page(GFP_DMA32 | GFP_KERNEL | __GFP_ZERO);
-	if (rdev->dummy_page.page == NULL)
+	rdev->dummy_page.page = dma_alloc_coherent(&rdev->pdev->dev, PAGE_SIZE,
+					&rdev->dummy_page.addr, GFP_DMA32|GFP_KERNEL);
+	if (!rdev->dummy_page.page)
 		return -ENOMEM;
-	rdev->dummy_page.addr = pci_map_page(rdev->pdev, rdev->dummy_page.page,
-					0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	if (pci_dma_mapping_error(rdev->pdev, rdev->dummy_page.addr)) {
 		dev_err(&rdev->pdev->dev, "Failed to DMA MAP the dummy page\n");
-		__free_page(rdev->dummy_page.page);
-		rdev->dummy_page.page = NULL;
+		radeon_dummy_page_fini(rdev);
 		return -ENOMEM;
 	}
 	return 0;
@@ -698,9 +696,8 @@ void radeon_dummy_page_fini(struct radeo
 {
 	if (rdev->dummy_page.page == NULL)
 		return;
-	pci_unmap_page(rdev->pdev, rdev->dummy_page.addr,
-			PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	__free_page(rdev->dummy_page.page);
+
Re: drm/radeon: ring test failed on PA-RISC Linux
On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Alex, 09.09.2013, в 21:43, Alex Deucher alexdeuc...@gmail.com написал(а): On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov gnido...@p0n4ik.tk wrote: Folks, We (people at linux-parisc @ vger.kernel.org mail list) are trying to make native video options of the latest PA-RISC servers and workstations (these are ATIs, most of which are based on R100/R300/R420 chips) work correctly on this platform (big endian pa-risc). However, we hadn't much success. DRM fails every time with ring test failed for both AGP PCI. Maybe you would give us some suggestions that we could check? Topic started here: http://www.spinics.net/lists/linux-parisc/msg04908.html And continued there: http://www.spinics.net/lists/linux-parisc/msg04995.html http://www.spinics.net/lists/linux-parisc/msg05006.html Problems we've already resolved without any signs of progress: - Checked the successful microcode load parisc AGP GART code writes IOMMU entries in the wrong byte order and doesn't add the coherency information SBA code adds our PCI BAR setup doesn't really work very well together with the Radeon DRM address setup. DRM will generate addresses, which are even outside of the connected LBA Things planned for a check: The drivers/video/aty uses an endian config bit DRM doesn't use, but I haven't tested whether this makes a difference and how it is connected to the overall picture. I don't think that will any difference. radeon kms works fine on other big endian platforms such as powerpc. Good! I'll opt it out then. The Rage128 product revealed a weakness in some motherboard chipsets in that there is no mechanism to guarantee that data written by the CPU to memory is actually in a readable state before the Graphics Controller receives an update to its copy of the Write Pointer. 
In an effort to alleviate this problem, weve introduced a mechanism into the Graphics Controller that will delay the actual write to the Write Pointer for some programmable amount of time, in order to give the chipset time to flush its internal write buffers to memory. There are two register fields that control this mechanism: PRE_WRITE_TIMER and PRE_WRITE_LIMIT. In the radeon DRM codebase I didn't found anyone using/setting those registers. Maybe PA-RISC has some problem here?... I doubt it. If you are using AGP, I'd suggest disabling it and first try to get things working using the on chip gart rather than AGP. Load radeon with agpmode=-1. Already tried this without any luck. Anyway, a radeon driver fallbacks to the PCI mode in our case, so does it really matter? In addition, people with PCI cards experiencing the same issue... The on chip gart always uses cache snooped pci transactions and the driver assumes pci is cache coherent. On AGP/PCI chips, the on-chip gart mechanism stores the gart table in system ram. On PCIE asics, the gart table is stored in vram. The gart page table maps system pages to a contiguous aperture in the GPU's address space. The ring lives in gart memory. The GPU sees a contiguous buffer and the gart mechanism handles the access to the backing pages via the page table. I'd suggest verifying that the entries written to the gart page table are valid and then the information written to the ring buffer is valid before updating the ring's wptr in radeon_ring_unlock_commit(). Changing the wptr is what causes the CP to start fetching data from the ring. Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :( The dummy page isn't really going to help much. That page is just used as a safety placeholder for gart entries that aren't mapped on the GPU. TTM (drivers/gpu/drm/ttm) actually does the allocation of the backing pages for the gart. You may want to look there. 
Alex

> --- radeon_device.c.orig	2013-09-10 08:55:05.0 +
> +++ radeon_device.c	2013-09-10 09:12:17.0 +
> @@ -673,15 +673,13 @@ int radeon_dummy_page_init(struct radeon
>  {
>  	if (rdev->dummy_page.page)
>  		return 0;
> -	rdev->dummy_page.page = alloc_page(GFP_DMA32 | GFP_KERNEL | __GFP_ZERO);
> -	if (rdev->dummy_page.page == NULL)
> +	rdev->dummy_page.page = dma_alloc_coherent(&rdev->pdev->dev, PAGE_SIZE,
> +					&rdev->dummy_page.addr, GFP_DMA32|GFP_KERNEL);
> +	if (!rdev->dummy_page.page)
>  		return -ENOMEM;
> -	rdev->dummy_page.addr = pci_map_page(rdev->pdev, rdev->dummy_page.page,
> -					0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  	if (pci_dma_mapping_error(rdev->pdev, rdev->dummy_page.addr)) {
>  		dev_err(&rdev->pdev->dev, "Failed to DMA MAP the dummy page\n");
> -		__free_page(rdev->dummy_page.page);
> -		rdev->dummy_page.page = NULL;
> +		radeon_dummy_page_fini(rdev);
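The GART and dummy-page description in this message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name below is hypothetical, the page size is fixed at 4 KiB, and the entries are stored as plain 64-bit bus addresses, which ignores the real hardware entry encoding and byte order. It shows the table mapping a contiguous GPU aperture onto scattered backing pages, with unbound slots falling back to the dummy page.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u
#define MODEL_PAGE_SIZE  (1u << MODEL_PAGE_SHIFT)
#define MODEL_GART_PAGES 8u

/* Hypothetical GART: one bus address per aperture page. */
struct model_gart {
	uint64_t aperture_base;			/* GPU address where the aperture starts */
	uint64_t dummy_addr;			/* bus address of the dummy page */
	uint64_t entry[MODEL_GART_PAGES];	/* bus address of each backing page */
};

/* Point every slot at the dummy page, as a placeholder for unmapped entries. */
static void model_gart_init(struct model_gart *g, uint64_t aperture_base,
			    uint64_t dummy_addr)
{
	size_t i;

	g->aperture_base = aperture_base;
	g->dummy_addr = dummy_addr;
	for (i = 0; i < MODEL_GART_PAGES; i++)
		g->entry[i] = dummy_addr;
}

/* Bind n page-aligned bus addresses starting at slot 'first'; -1 on bad input. */
static int model_gart_bind(struct model_gart *g, size_t first,
			   const uint64_t *dma, size_t n)
{
	size_t i;

	if (first > MODEL_GART_PAGES || n > MODEL_GART_PAGES - first)
		return -1;
	for (i = 0; i < n; i++)
		if (dma[i] & (MODEL_PAGE_SIZE - 1u))
			return -1;
	for (i = 0; i < n; i++)
		g->entry[first + i] = dma[i];
	return 0;
}

/* Resolve a GPU aperture address to the bus address the device would hit. */
static int model_gart_translate(const struct model_gart *g, uint64_t gpu_addr,
				uint64_t *bus)
{
	uint64_t off, idx;

	if (gpu_addr < g->aperture_base)
		return -1;
	off = gpu_addr - g->aperture_base;
	idx = off >> MODEL_PAGE_SHIFT;
	if (idx >= MODEL_GART_PAGES)
		return -1;
	*bus = g->entry[idx] + (off & (MODEL_PAGE_SIZE - 1u));
	return 0;
}
```

A check in the spirit of the advice quoted earlier would walk the slots that back the ring and compare what model_gart_translate() returns against the bus addresses the allocator handed out, before the write pointer is ever updated.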
Re: drm/radeon: ring test failed on PA-RISC Linux
On 09/10/2013 02:37 PM, Alex Deucher wrote: On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote: Alex, 09.09.2013, в 21:43, Alex Deucher alexdeuc...@gmail.com написал(а): On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov gnido...@p0n4ik.tk wrote: Folks, We (people at linux-parisc @ vger.kernel.org mail list) are trying to make native video options of the latest PA-RISC servers and workstations (these are ATIs, most of which are based on R100/R300/R420 chips) work correctly on this platform (big endian pa-risc). However, we hadn't much success. DRM fails every time with ring test failed for both AGP PCI. Maybe you would give us some suggestions that we could check? Topic started here: http://www.spinics.net/lists/linux-parisc/msg04908.html And continued there: http://www.spinics.net/lists/linux-parisc/msg04995.html http://www.spinics.net/lists/linux-parisc/msg05006.html Problems we've already resolved without any signs of progress: - Checked the successful microcode load parisc AGP GART code writes IOMMU entries in the wrong byte order and doesn't add the coherency information SBA code adds our PCI BAR setup doesn't really work very well together with the Radeon DRM address setup. DRM will generate addresses, which are even outside of the connected LBA Things planned for a check: The drivers/video/aty uses an endian config bit DRM doesn't use, but I haven't tested whether this makes a difference and how it is connected to the overall picture. I don't think that will any difference. radeon kms works fine on other big endian platforms such as powerpc. Good! I'll opt it out then. Actually, I am experiencing exactly the same problem on a Sam460ex ppc system, at least as of 3.9 (the last time I tried it). Very rarely the ringtest would pass, but then it would fail somewhere else. I never could figure it out since as far as I could tell all the addresses and logic was correct. 
It wasn't important enough for me to work more on it, but I'd be happy to test code. I'm travelling for the next week and a half, so I can't do anything right now.

One bug I found when working on drm/kms support for the ppc was that in struct ttm_bus_placement the base address type was wrong: it should be phys_addr_t, not unsigned long. The PPC460 runs in 32-bit mode, but its physical addresses are wider than 32 bits, so an unsigned long cannot hold them. The patch below fixes that. I always wanted to post this fix, but never got around to it...

Regards,

	Hans

Signed-off-by: Hans Verkuil hans.verk...@cisco.com
---
 arch/powerpc/sysdev/ppc4xx_msi.c       |    6 +++---
 drivers/gpu/drm/radeon/radeon_device.c |    2 +-
 include/drm/ttm/ttm_bo_api.h           |    2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 49b0659..fa33568 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1066,7 +1066,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	if (rdev->rmmio == NULL) {
 		return -ENOMEM;
 	}
-	DRM_INFO("register mmio base: 0x%08X\n", (uint32_t)rdev->rmmio_base);
+	DRM_INFO("register mmio base: 0x%llx\n", (uint64_t)rdev->rmmio_base);
 	DRM_INFO("register mmio size: %u\n", (unsigned)rdev->rmmio_size);
 
 	/* io port mapping */
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 3cb5d84..fcdb208 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -81,7 +81,7 @@ struct ttm_placement {
  */
 struct ttm_bus_placement {
 	void		*addr;
-	unsigned long	base;
+	phys_addr_t	base;
 	unsigned long	size;
 	unsigned long	offset;
 	bool		is_iomem;
-- 
1.7.10.4
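The effect of the type change in this patch can be shown in isolation. The sketch below is a userspace illustration with hypothetical names: uint32_t stands in for unsigned long on a 32-bit kernel, uint64_t stands in for a phys_addr_t that is wider than 32 bits, and the example address is made up. It is not taken from the patch or from any real board.

```c
#include <stdint.h>

/* Assumed stand-ins for the two kernel types on a 32-bit build. */
typedef uint32_t model_ulong32;		/* unsigned long, 32 bits wide */
typedef uint64_t model_phys_addr;	/* phys_addr_t, wider than 32 bits */

/* The field before and after the change, reduced to the one member. */
struct model_placement_old { model_ulong32 base; };
struct model_placement_new { model_phys_addr base; };

/* Store a physical address in the old layout; the upper bits are dropped. */
static uint64_t model_store_old(uint64_t phys)
{
	struct model_placement_old p;

	p.base = (model_ulong32)phys;
	return p.base;
}

/* Store the same address in the new layout; the value survives intact. */
static uint64_t model_store_new(uint64_t phys)
{
	struct model_placement_new p;

	p.base = phys;
	return p.base;
}
```

With a made-up 36-bit address such as 0xC08000000, the old layout hands back 0x08000000, which is a different location, while the new layout returns the address unchanged.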
Re: drm/radeon: ring test failed on PA-RISC Linux
On Tue, Sep 10, 2013 at 01:20:57PM +0400, Alex Ivanov wrote: Alex, 09.09.2013, в 21:43, Alex Deucher alexdeuc...@gmail.com написал(а): On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov gnido...@p0n4ik.tk wrote: Folks, We (people at linux-parisc @ vger.kernel.org mail list) are trying to make native video options of the latest PA-RISC servers and workstations (these are ATIs, most of which are based on R100/R300/R420 chips) work correctly on this platform (big endian pa-risc). However, we hadn't much success. DRM fails every time with ring test failed for both AGP PCI. Maybe you would give us some suggestions that we could check? Topic started here: http://www.spinics.net/lists/linux-parisc/msg04908.html And continued there: http://www.spinics.net/lists/linux-parisc/msg04995.html http://www.spinics.net/lists/linux-parisc/msg05006.html Problems we've already resolved without any signs of progress: - Checked the successful microcode load parisc AGP GART code writes IOMMU entries in the wrong byte order and doesn't add the coherency information SBA code adds our PCI BAR setup doesn't really work very well together with the Radeon DRM address setup. DRM will generate addresses, which are even outside of the connected LBA Things planned for a check: The drivers/video/aty uses an endian config bit DRM doesn't use, but I haven't tested whether this makes a difference and how it is connected to the overall picture. I don't think that will any difference. radeon kms works fine on other big endian platforms such as powerpc. Good! I'll opt it out then. The Rage128 product revealed a weakness in some motherboard chipsets in that there is no mechanism to guarantee that data written by the CPU to memory is actually in a readable state before the Graphics Controller receives an update to its copy of the Write Pointer. 
In an effort to alleviate this problem, weve introduced a mechanism into the Graphics Controller that will delay the actual write to the Write Pointer for some programmable amount of time, in order to give the chipset time to flush its internal write buffers to memory. There are two register fields that control this mechanism: PRE_WRITE_TIMER and PRE_WRITE_LIMIT. In the radeon DRM codebase I didn't found anyone using/setting those registers. Maybe PA-RISC has some problem here?... I doubt it. If you are using AGP, I'd suggest disabling it and first try to get things working using the on chip gart rather than AGP. Load radeon with agpmode=-1. Already tried this without any luck. Anyway, a radeon driver fallbacks to the PCI mode in our case, so does it really matter? In addition, people with PCI cards experiencing the same issue... The on chip gart always uses cache snooped pci transactions and the driver assumes pci is cache coherent. On AGP/PCI chips, the on-chip gart mechanism stores the gart table in system ram. On PCIE asics, the gart table is stored in vram. The gart page table maps system pages to a contiguous aperture in the GPU's address space. The ring lives in gart memory. The GPU sees a contiguous buffer and the gart mechanism handles the access to the backing pages via the page table. I'd suggest verifying that the entries written to the gart page table are valid and then the information written to the ring buffer is valid before updating the ring's wptr in radeon_ring_unlock_commit(). Changing the wptr is what causes the CP to start fetching data from the ring. Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :( Is this platform enabling the SWIOTLB layer? 
The reason I am asking is b/c if you do indeed enable it you end up using the TTM DMA pool which allocates pages using the dma_alloc_coherent - which means that all of the pages that come out of TTM are already 'DMA' mapped. And that means the radeon_gart_bind and all its friends use the DMA addresses that have been constructed by SWIOTLB IOMMU.

Perhaps the PA-RISC IOMMU creates the DMA addresses differently?

When the card gets programmed, you do end up using ttm_agp_bind right? I am wondering if something like this:

https://lkml.org/lkml/2010/12/6/512

is needed to pass in the right DMA address?

> --- radeon_device.c.orig	2013-09-10 08:55:05.0 +
> +++ radeon_device.c	2013-09-10 09:12:17.0 +
> @@ -673,15 +673,13 @@ int radeon_dummy_page_init(struct radeon
>  {
>  	if (rdev->dummy_page.page)
>  		return 0;
> -	rdev->dummy_page.page = alloc_page(GFP_DMA32 | GFP_KERNEL | __GFP_ZERO);
> -	if (rdev->dummy_page.page == NULL)
> +	rdev->dummy_page.page = dma_alloc_coherent(&rdev->pdev->dev, PAGE_SIZE,
> +					&rdev->dummy_page.addr, GFP_DMA32|GFP_KERNEL);
> +	if (!rdev->dummy_page.page)
>  		return -ENOMEM;
> -
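The question of which address ends up in the card can be made concrete with a toy model. The sketch below is purely illustrative: the names, the I/O virtual base and the slot allocator are all hypothetical and do not describe SWIOTLB, the SBA IOMMU or any kernel API. It contrasts a flat platform, where the bus address equals the physical address, with an IOMMU platform, where the device must be given the address the IOMMU handed out.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_IO_PAGE     4096ull
#define MODEL_IOVA_BASE   0x40000000ull	/* hypothetical start of I/O virtual space */
#define MODEL_IOMMU_SLOTS 4u

/* Hypothetical IOMMU: slot n maps I/O page n onto one physical page. */
struct model_iommu {
	uint64_t phys[MODEL_IOMMU_SLOTS];
	size_t used;
};

/* Flat platform: the device uses the physical address directly. */
static uint64_t model_map_flat(uint64_t phys)
{
	return phys;
}

/* IOMMU platform: hand out the next free I/O page; returns 0 when full. */
static uint64_t model_map_iommu(struct model_iommu *mmu, uint64_t phys)
{
	size_t slot;

	if (mmu->used == MODEL_IOMMU_SLOTS)
		return 0;
	slot = mmu->used++;
	mmu->phys[slot] = phys & ~(MODEL_IO_PAGE - 1ull);
	return MODEL_IOVA_BASE + slot * MODEL_IO_PAGE + (phys & (MODEL_IO_PAGE - 1ull));
}

/* What a device access to 'bus' resolves to; -1 if nothing is mapped there. */
static int model_iommu_resolve(const struct model_iommu *mmu, uint64_t bus,
			       uint64_t *phys)
{
	uint64_t off;
	size_t slot;

	if (bus < MODEL_IOVA_BASE)
		return -1;
	off = bus - MODEL_IOVA_BASE;
	slot = (size_t)(off / MODEL_IO_PAGE);
	if (slot >= mmu->used)
		return -1;
	*phys = mmu->phys[slot] + off % MODEL_IO_PAGE;
	return 0;
}
```

In this model, programming a table entry with the raw physical address on the IOMMU platform makes the lookup fail, which is the kind of mismatch the question above is probing for.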
Re: drm/radeon: ring test failed on PA-RISC Linux
On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
> Folks,
>
> We (people at linux-parisc @ vger.kernel.org mail list) are trying to make native video options of the latest PA-RISC servers and workstations (these are ATIs, most of which are based on R100/R300/R420 chips) work correctly on this platform (big endian pa-risc).
>
> However, we haven't had much success. DRM fails every time with "ring test failed" for both AGP & PCI.
>
> Maybe you would give us some suggestions that we could check?
>
> Topic started here:
> http://www.spinics.net/lists/linux-parisc/msg04908.html
> And continued there:
> http://www.spinics.net/lists/linux-parisc/msg04995.html
> http://www.spinics.net/lists/linux-parisc/msg05006.html
>
> Problems we've already resolved without any signs of progress:
> - Checked the successful microcode load
> - parisc AGP GART code writes IOMMU entries in the wrong byte order and doesn't add the coherency information SBA code adds
> - our PCI BAR setup doesn't really work very well together with the Radeon DRM address setup. DRM will generate addresses, which are even outside of the connected LBA
>
> Things planned for a check:
>
> The drivers/video/aty uses an endian config bit DRM doesn't use, but I haven't tested whether this makes a difference and how it is connected to the overall picture.

I don't think that will make any difference. radeon kms works fine on other big endian platforms such as powerpc.

> The Rage128 product revealed a weakness in some motherboard chipsets in that there is no mechanism to guarantee that data written by the CPU to memory is actually in a readable state before the Graphics Controller receives an update to its copy of the Write Pointer. In an effort to alleviate this problem, we've introduced a mechanism into the Graphics Controller that will delay the actual write to the Write Pointer for some programmable amount of time, in order to give the chipset time to flush its internal write buffers to memory. There are two register fields that control this mechanism: PRE_WRITE_TIMER and PRE_WRITE_LIMIT.
>
> In the radeon DRM codebase I didn't find anyone using/setting those registers. Maybe PA-RISC has some problem here?...

I doubt it. If you are using AGP, I'd suggest disabling it and first try to get things working using the on chip gart rather than AGP. Load radeon with agpmode=-1. The on chip gart always uses cache snooped pci transactions and the driver assumes pci is cache coherent. On AGP/PCI chips, the on-chip gart mechanism stores the gart table in system ram. On PCIE asics, the gart table is stored in vram. The gart page table maps system pages to a contiguous aperture in the GPU's address space. The ring lives in gart memory. The GPU sees a contiguous buffer and the gart mechanism handles the access to the backing pages via the page table. I'd suggest verifying that the entries written to the gart page table are valid and then the information written to the ring buffer is valid before updating the ring's wptr in radeon_ring_unlock_commit(). Changing the wptr is what causes the CP to start fetching data from the ring.

Alex

> Thanks.
>
> -------- Forwarded message --------
> 04.08.2013, 15:06, Alex Ivanov gnido...@p0n4ik.tk:
>
> 11.07.2013, 23:48, Helge Deller del...@gmx.de:
>
> adding linux parisc mailing list...:
>
> On 07/11/2013 09:46 PM, Helge Deller wrote:
>
> On 07/10/2013 11:29 PM, Alex Ivanov wrote:
>
> 11.07.2013, 01:14, Matt Turner matts...@gmail.com:
>
> On Wed, Jul 10, 2013 at 1:19 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
>
> Thank you so much! Your guess looks to be right. After applying your patch there was no more kernel panic and X just worked.
>
> Nice! Does DRI work?
>
> Not on my side. Plus I can't visually get beyond 8-bit depth, although Xorg states 24-bit in its log.
>
> As for DRI, I'm experiencing ring test failed (scratch(0x15E4)=0xCAFEDEAD) with a FireGL X3.
>
> FWIW, I'm seeing the same failure on my FireGL X1:
>
> 80:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Radeon R300 NG [FireGL X1] (rev 80)
>
> [drm] radeon: irq initialized.
> [drm] Loading R300 Microcode
> [drm] radeon: ring at 0x60001000
> [drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
> radeon :80:00.0: failed initializing CP (-22).
> radeon :80:00.0: Disabling GPU acceleration
> [drm:r100_cp_fini] *ERROR* Wait for CP idle timeout, shutting down CP.
> [drm] radeon: cp finalized
> [drm] radeon: cp finalized
>
> I still have no clue why this happens. Broken SBA IOMMU / DRM code? Missing syncing primitives?
>
> Should we forward this to dri-devel mail list?
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> -------- End of forwarded message --------
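The error line in the log reports the scratch register still holding 0xCAFEDEAD. As far as the r100 ring test goes, the driver seeds the scratch register with that value, queues a packet on the ring that writes 0xDEADBEEF to it, and then polls. An unchanged seed therefore means the CP never executed the packet. The sketch below is a userspace model of that handshake, not driver code: the names, the 16-dword ring and the one-word command encoding are hypothetical, and the cp_sees_ring flag is an assumption standing in for whatever keeps the CP from fetching the data the CPU wrote.

```c
#include <stdint.h>

#define MODEL_RING_DW         16u		/* hypothetical ring size, power of two */
#define MODEL_RING_MASK       (MODEL_RING_DW - 1u)
#define MODEL_CMD_SET_SCRATCH 0x1u		/* hypothetical opcode, not a PACKET0 */
#define MODEL_SEED            0xCAFEDEADu
#define MODEL_EXPECT          0xDEADBEEFu

struct model_gpu {
	uint32_t scratch;
	uint32_t ring[MODEL_RING_DW];	/* what the CPU wrote */
	uint32_t rptr, wptr;
	int cp_sees_ring;		/* 0: CP fetches never return the CPU's data */
};

/* One dword as the CP would fetch it. */
static uint32_t model_cp_fetch(const struct model_gpu *g, uint32_t idx)
{
	return g->cp_sees_ring ? g->ring[idx] : 0u;
}

/* Consume the ring from rptr to wptr, executing any set-scratch command. */
static void model_cp_run(struct model_gpu *g)
{
	while (g->rptr != g->wptr) {
		uint32_t dw = model_cp_fetch(g, g->rptr);

		g->rptr = (g->rptr + 1u) & MODEL_RING_MASK;
		if (dw == MODEL_CMD_SET_SCRATCH && g->rptr != g->wptr) {
			g->scratch = model_cp_fetch(g, g->rptr);
			g->rptr = (g->rptr + 1u) & MODEL_RING_MASK;
		}
	}
}

/* Seed the scratch register, queue the write, let the CP run, check the result. */
static int model_ring_test(struct model_gpu *g)
{
	g->scratch = MODEL_SEED;
	g->ring[g->wptr] = MODEL_CMD_SET_SCRATCH;
	g->wptr = (g->wptr + 1u) & MODEL_RING_MASK;
	g->ring[g->wptr] = MODEL_EXPECT;
	g->wptr = (g->wptr + 1u) & MODEL_RING_MASK;
	model_cp_run(g);	/* stands in for the wptr update that starts the CP */
	return g->scratch == MODEL_EXPECT ? 0 : -22;
}
```

With cp_sees_ring set to 0 the model returns -22 and leaves 0xCAFEDEAD in the scratch register, the same pair of values as in the log above. The model cannot say why the CP fails to see the ring on PA-RISC; it only shows what the reported symptom implies.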