Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-26 Thread Alex Ivanov
Let's go further.

25.09.2013, 22:58, Alex Ivanov gnido...@p0n4ik.tk:

  25.09.2013, 21:28, Konrad Rzeszutek Wilk konrad.w...@oracle.com:
   I took a look at arch/parisc/kernel/pci-dma.c and I see that it
   is mostly a flat platform. That is, bus addresses == physical addresses.
   Unless it is a pcxl or pcxl2 CPU type (huh?) - if it is,
   then any calls to dma_alloc_coherent will map memory out of a pool.
   In essence it will look like a SWIOTLB bounce buffer.
  arch/parisc/kernel/pci-dma.c:
  ** PARISC 1.1 Dynamic DMA mapping support.
  ** This implementation is for PA-RISC platforms that do not support
  ** I/O TLBs (aka DMA address translation hardware).

  That's very old. PA-RISC 2.0 came into the game circa 1996.
  PA-RISC 1.1 is 32-bit only and I'm not even sure whether these machines
  had a PCI bus.

  Only old boxes (PA7200 CPU and lower) cannot use dma_alloc_coherent()
  (and are forced to do syncs, iirc). That's not our case.
  And PA-RISC configs have 'Discontiguous Memory' chosen.
   But interestingly enough there are a lot of 'flush_kernel_dcache_range'
   calls for every DMA operation.
  And I think you need to do a
   dma_sync_for_cpu call in the radeon_test_writeback for it to
   use the flush_kernel_dcache_range.

I was correct regarding syncs.
In our case (SBA IOMMU) dma_sync* calls are no-ops:

sba_iommu.c:
static struct hppa_dma_ops sba_ops = {
...
        .dma_sync_single_for_cpu =      NULL,
        .dma_sync_single_for_device =   NULL,
        .dma_sync_sg_for_cpu =          NULL,
        .dma_sync_sg_for_device =       NULL,
};

dma-mapping.h:
dma_cache_sync(struct device *dev, void *vaddr, size_t size,
               enum dma_data_direction direction)
{
        if (hppa_dma_ops->dma_sync_single_for_cpu)
                flush_kernel_dcache_range((unsigned long)vaddr, size);
}

So I'll skip doing the flush_kernel_dcache_range().
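To make the no-op behaviour concrete, here is a minimal user-space sketch (not kernel code). The names model_dma_ops, model_flush_dcache_range, model_sync_hook and model_dma_cache_sync are hypothetical stand-ins for hppa_dma_ops, flush_kernel_dcache_range() and dma_cache_sync(); the sketch only mirrors the NULL-hook check quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's hppa_dma_ops table. */
struct model_dma_ops {
    void (*dma_sync_single_for_cpu)(void *vaddr, size_t size);
};

static int flush_calls; /* number of simulated D-cache flushes */

/* Stand-in for flush_kernel_dcache_range(); it only counts calls. */
static void model_flush_dcache_range(void *vaddr, size_t size)
{
    (void)vaddr;
    (void)size;
    flush_calls++;
}

/* Dummy hook, so that a table with a non-NULL sync entry can be built. */
static void model_sync_hook(void *vaddr, size_t size)
{
    (void)vaddr;
    (void)size;
}

/* Mirrors the quoted dma_cache_sync(): flush only if a sync hook is set. */
static void model_dma_cache_sync(const struct model_dma_ops *ops,
                                 void *vaddr, size_t size)
{
    if (ops->dma_sync_single_for_cpu)
        model_flush_dcache_range(vaddr, size);
}

#ifdef SKETCH_DEMO
int main(void)
{
    struct model_dma_ops sba_like = { NULL };
    struct model_dma_ops with_hook = { model_sync_hook };
    char buf[64];

    model_dma_cache_sync(&sba_like, buf, sizeof(buf));
    printf("flushes with NULL hook: %d\n", flush_calls);
    model_dma_cache_sync(&with_hook, buf, sizeof(buf));
    printf("flushes with a hook set: %d\n", flush_calls);
    return 0;
}
#endif
```

Building with -DSKETCH_DEMO gives a small program; with a table shaped like sba_ops (all sync hooks NULL) the flush is never reached.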

  I don't know what the
   flush_kernel_dcache_range does though so I could be wrong.
  D-cache is a CPU data cache (if that is what they meant).
  It seems to be L1-level. There is an I-cache at the same level.
   You are missing a translation here (you were comparing the virtual address
   to the bus address). I was thinking something along this:
  Yes, this confused me. I've translated your suggestion literally :\
   unsigned int pfn = page_to_pfn(ttm->pages[i]);
   dma_addr_t bus = gtt->ttm.dma_address[i];
   void *va_bus, *va, *va_pfn;

   if ((pfn << PAGE_SHIFT) != bus)
           printk("Bus 0x%lx != PFN 0x%lx", bus, pfn << PAGE_SHIFT); /* OK, that means bus addresses are different */

   va_bus = bus_to_virt(gtt->ttm.dma_address[i]);
   va_pfn = __va(pfn << PAGE_SHIFT);

   if (!virt_addr_valid(va_bus))
           printk("va_bus (0x%lx) not good!\n", va_bus);
   if (!virt_addr_valid(va_pfn))
           printk("va_pfn (0x%lx) not good!\n", va_pfn);

   /* We got VA for both bus -> va, and pfn -> va. Should be the
      same if bus and physical addresses are on the same namespace. */
   if (va_bus != va_pfn)
           printk("va bus:%lx != va pfn: %lx\n", va_bus, va_pfn);

   /* Now that we have bus -> pa -> va (va_bus) try to go va_bus -> bus address.
      The bus address should be the same */
   if (gtt->ttm.dma_address[i] != virt_to_bus(va_bus))
           printk("bus->pa->va:%lx != bus->pa->va->ba: %lx\n",
                  gtt->ttm.dma_address[i], virt_to_bus(va_bus));

Ok, slightly modified:

struct page *page = ttm->pages[i];
unsigned long pfn = page_to_pfn(page);
dma_addr_t bus = gtt->ttm.dma_address[i];
void *va_bus, *va, *va_pfn;

BUG_ON(!pfn_valid(pfn));
//BUG_ON(!page_mapping(page)); // Leads to a kernel BUG

/* Avoid floodage */
if (i % 100 == 0) {
        if ((pfn << PAGE_SHIFT) != bus)
                printk("Bus 0x%lx != PFN 0x%lx\n", bus, pfn << PAGE_SHIFT); /* OK, that means bus addresses are different */

        va_bus = bus_to_virt(bus);
        va_pfn = __va(pfn << PAGE_SHIFT);

        if (!virt_addr_valid(va_bus))
                printk("va_bus (0x%lx) not good!\n", va_bus);

        if (!virt_addr_valid(va_pfn))
                printk("va_pfn (0x%lx) not good!\n", va_pfn);

        /* We got VA for both bus -> va, and pfn -> va. Should be the
           same if bus and physical addresses are on the same namespace. */
        if (va_bus != va_pfn)
                printk("va bus: %lx != va pfn: %lx\n", va_bus, va_pfn);

        /* Now that we have bus -> pa -> va (va_bus) try to go va_bus -> bus address.
           The bus address should be the same */
        if (bus != virt_to_bus(va_bus))
                printk("bus->pa->va: %lx != bus->pa->va->ba: %lx\n", bus, virt_to_bus(va_bus));
}

Output:
Bus 0x4028 != PFN 
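
A line of the form "Bus ... != PFN ..." is what one would expect when an IOMMU sits between the device and memory: the bus address handed to the device is an I/O virtual address, not the page's physical address. The toy model below is a user-space sketch, not SBA code; the window base, the slot count and all model_* names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_IOVA_BASE  0x40000000ULL /* hypothetical start of the IOVA window */
#define MODEL_SLOTS      16u

static uint64_t iommu_pfn[MODEL_SLOTS]; /* IOVA slot -> physical frame number */
static unsigned next_slot;

/* Toy map_page(): hand out the next IOVA slot and remember the frame. */
static uint64_t model_map_page(uint64_t pfn)
{
    unsigned slot = next_slot++ % MODEL_SLOTS;

    iommu_pfn[slot] = pfn;
    return MODEL_IOVA_BASE + ((uint64_t)slot << MODEL_PAGE_SHIFT);
}

/* What the device would actually reach through the toy IOMMU. */
static uint64_t model_iova_to_phys(uint64_t bus)
{
    uint64_t slot = (bus - MODEL_IOVA_BASE) >> MODEL_PAGE_SHIFT;

    return iommu_pfn[slot] << MODEL_PAGE_SHIFT;
}

#ifdef SKETCH_DEMO
int main(void)
{
    uint64_t pfn = 0x12345;
    uint64_t bus = model_map_page(pfn);

    printf("bus  0x%llx\n", (unsigned long long)bus);
    printf("phys 0x%llx\n", (unsigned long long)(pfn << MODEL_PAGE_SHIFT));
    printf("via IOMMU 0x%llx\n", (unsigned long long)model_iova_to_phys(bus));
    return 0;
}
#endif
```

In this model only the IOMMU table can turn a bus address back into a physical one; a plain linear conversion of the bus address would point at unrelated memory.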

Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-25 Thread Alex Ivanov
24.09.2013, 00:11, Konrad Rzeszutek Wilk konrad.w...@oracle.com:
 On Sat, Sep 21, 2013 at 07:39:10AM +0400, Alex Ivanov wrote:

  21.09.2013, at 1:27, Alex Deucher alexdeuc...@gmail.com wrote:
  The register writes seem to be going through the register backbone
  correctly:

  [0x00B] 0x15E0=0x
  [0x00C] 0x15E4=0xCAFEDEAD
  [0x00D] 0x4274=0x000F
  [0x00E] 0x42C8=0x0007
  [0x00F] 0x4018=0x001D
  [0x010] 0x170C=0x8000
  [0x011] 0x3428=0x00020100
  [0x012] 0x15E4=0xCAFEDEAD

  You can see the 0xCAFEDEAD written to the scratch register via MMIO
  from the ring_test(). The CP fifo however seems to be full of garbage.
  The CP is busy though, so it seems to be functional.  I guess it's
  just fetching garbage rather than commands.

 If it is fetching garbage, that would imply the DMA (or bus addresses)
 that are programmed in the GART are bogus. If you dump them and try
 to figure out if bus address -> physical address -> virtual address ==
 virtual address -> bus address that could help. And perhaps seeing what
 the virtual address has - and/or poisoning it with known data?

 Or perhaps the card has picked up an incorrect page table? Meaning
 the (bus) address given to it is not the correct one?


Konrad,

Let's see. Please note that I'm not a PA-RISC or general Linux kernel
developer, just a user, so I may do things completely wrong.
I was hoping that PA-RISC smarties would join me here, but they seem
to be busy with other duties. Even the port's mailing list activity has
been low during the last weeks.

 If you dump them and try
 to figure out if bus address -> physical address -> virtual address ==
 virtual address -> bus address that could help

With the following

radeon/radeon_ttm.c:

radeon_ttm_tt_populate():
...
for (i = 0; i < ttm->num_pages; i++) {
        gtt->ttm.dma_address[i] = pci_map_page(rdev->pdev, ttm->pages[i],
                                               0, PAGE_SIZE,
                                               PCI_DMA_BIDIRECTIONAL);

        void *va = bus_to_virt(gtt->ttm.dma_address[i]);
        if ((phys_addr_t) va != virt_to_bus(va)) {
                DRM_INFO("MISMATCH: %p != %p\n", va, (void *) virt_to_bus(va));
                /*DRM_INFO("CONTENTS: %x\n", *((uint32_t *)va));*/ // Leads to a Kernel Fault
        ...
}

I'm getting the output:

[drm] MISMATCH: 8028 != 4028
[drm] MISMATCH: 80281000 != 40281000
...
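
The second log line (80281000 != 40281000) differs by exactly 0x40000000, which is consistent with bus_to_virt() and virt_to_bus() being plain linear conversions with a fixed offset. Under that assumption the check above mismatches for every page, so it says nothing about the IOMMU. A minimal user-space sketch follows; the offset value is inferred from that one log line, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: offset inferred from the log line 80281000 != 40281000. */
#define MODEL_PAGE_OFFSET 0x40000000ULL

/* Linear-map stand-ins for bus_to_virt() and virt_to_bus(). */
static uint64_t model_bus_to_virt(uint64_t bus)
{
    return bus + MODEL_PAGE_OFFSET;
}

static uint64_t model_virt_to_bus(uint64_t va)
{
    return va - MODEL_PAGE_OFFSET;
}

/* The check from the snippet: compares a virtual address to a bus address. */
static int model_check_mismatch(uint64_t bus)
{
    uint64_t va = model_bus_to_virt(bus);

    return va != model_virt_to_bus(va);
}

#ifdef SKETCH_DEMO
int main(void)
{
    uint64_t bus = 0x40281000ULL;

    printf("va 0x%llx, mismatch %d\n",
           (unsigned long long)model_bus_to_virt(bus),
           model_check_mismatch(bus));
    return 0;
}
#endif
```

The round trip bus -> va -> bus still returns the original value in this model; only the comparison of va against the bus address differs, and it differs by the constant offset.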

How can I check the same for AGP mode?

 Or perhaps the card has picked up an incorrect page table? Meaning
 the (bus) address given to it is not the correct one?

I'll see.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-25 Thread Alex Deucher
On Wed, Sep 25, 2013 at 1:28 PM, Konrad Rzeszutek Wilk
konrad.w...@oracle.com wrote:
 On Wed, Sep 25, 2013 at 08:29:07PM +0400, Alex Ivanov wrote:
 24.09.2013, 00:11, Konrad Rzeszutek Wilk konrad.w...@oracle.com:
  On Sat, Sep 21, 2013 at 07:39:10AM +0400, Alex Ivanov wrote:
 
   21.09.2013, at 1:27, Alex Deucher alexdeuc...@gmail.com wrote:
   The register writes seem to be going through the register backbone
   correctly:
 
   [0x00B] 0x15E0=0x
   [0x00C] 0x15E4=0xCAFEDEAD
   [0x00D] 0x4274=0x000F
   [0x00E] 0x42C8=0x0007
   [0x00F] 0x4018=0x001D
   [0x010] 0x170C=0x8000
   [0x011] 0x3428=0x00020100
   [0x012] 0x15E4=0xCAFEDEAD
 
   You can see the 0xCAFEDEAD written to the scratch register via MMIO
   from the ring_test(). The CP fifo however seems to be full of garbage.
   The CP is busy though, so it seems to be functional.  I guess it's
   just fetching garbage rather than commands.
 
  If it is fetching garbage, that would imply the DMA (or bus addresses)
  that are programmed in the GART are bogus. If you dump them and try
  to figure out if bus address -> physical address -> virtual address ==
  virtual address -> bus address that could help. And perhaps seeing what
  the virtual address has - and/or poisoning it with known data?

  Or perhaps the card has picked up an incorrect page table? Meaning
  the (bus) address given to it is not the correct one?
 

 Konrad,

 Let's see. Please note that I'm not a PA-RISC or general Linux kernel
 developer, just a user, so I may do things completely wrong.
 I was hoping that PA-RISC smarties would join me here, but they seem
 to be busy with other duties. Even the port's mailing list activity has
 been low during the last weeks.

 I took a look at arch/parisc/kernel/pci-dma.c and I see that it
 is mostly a flat platform. That is, bus addresses == physical addresses.
 Unless it is a pcxl or pcxl2 CPU type (huh?) - if it is,
 then any calls to dma_alloc_coherent will map memory out of a pool.
 In essence it will look like a SWIOTLB bounce buffer.

 But interestingly enough there are a lot of 'flush_kernel_dcache_range'
 calls for every DMA operation. And I think you need to do a
 dma_sync_for_cpu call in the radeon_test_writeback for it to
 use the flush_kernel_dcache_range. I don't know what the
 flush_kernel_dcache_range does though so I could be wrong.

 That means you can ignore the little code below I wrote and
 see about doing something like this:


 diff --git a/drivers/gpu/drm/radeon/radeon_cp.c b/drivers/gpu/drm/radeon/radeon_cp.c
 index 3cae2bb..9e5923d 100644
 --- a/drivers/gpu/drm/radeon/radeon_cp.c
 +++ b/drivers/gpu/drm/radeon/radeon_cp.c
 @@ -876,6 +876,7 @@ static void radeon_test_writeback(drm_radeon_private_t * dev_priv)

  RADEON_WRITE(RADEON_SCRATCH_REG1, 0xdeadbeef);

 +   flush_kernel_dcache_range(dev_priv->ring_rptr, PAGE_SIZE);
  for (tmp = 0; tmp < dev_priv->usec_timeout; tmp++) {
  u32 val;


You'd want to add the flush to r100_ring_test() in r100.c.
radeon_cp.c is for the old UMS support.

Alex


 But that is probably a shot in the dark. I have no clue what the flush_..
 is doing.

 [edit: And then I noticed sba_iommu.c, which is a complete IOMMU driver
 where bus and physical addresses are different. sigh. What type of machine
 is this? Does it have the IOMMU in it?]

  If you dump them and try
  to figure out if bus address -> physical address -> virtual address ==
  virtual address -> bus address that could help

 With following

 radeon/radeon_ttm.c:

 radeon_ttm_tt_populate():
 ...
 for (i = 0; i < ttm->num_pages; i++) {
         gtt->ttm.dma_address[i] = pci_map_page(rdev->pdev, ttm->pages[i],
                                                0, PAGE_SIZE,
                                                PCI_DMA_BIDIRECTIONAL);

         void *va = bus_to_virt(gtt->ttm.dma_address[i]);
         if ((phys_addr_t) va != virt_to_bus(va)) {

 You are missing a translation here (you were comparing the virtual address
 to the bus address). I was thinking something along this:

 unsigned int pfn = page_to_pfn(ttm->pages[i]);
 dma_addr_t bus = gtt->ttm.dma_address[i];
 void *va_bus, *va, *va_pfn;

 if ((pfn << PAGE_SHIFT) != bus)
         printk("Bus 0x%lx != PFN 0x%lx", bus, pfn << PAGE_SHIFT); /* OK, that means bus addresses are different */

 va_bus = bus_to_virt(gtt->ttm.dma_address[i]);
 va_pfn = __va(pfn << PAGE_SHIFT);

 if (!virt_addr_valid(va_bus))
         printk("va_bus (0x%lx) not good!\n", va_bus);
 if (!virt_addr_valid(va_pfn))
         printk("va_pfn (0x%lx) not good!\n", va_pfn);

 /* We got VA for both bus -> va, and pfn -> va. Should be the
    same if bus and physical addresses are on the same namespace.

Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-25 Thread Alex Ivanov
Alex,

 You'd want to add the flush to r100_ring_test() in r100.c.
 radeon_cp.c is for the old UMS support.

Right!

Konrad,
Thanks for the code! I'll try asap.

25.09.2013, 21:28, Konrad Rzeszutek Wilk konrad.w...@oracle.com:
 I took a look at arch/parisc/kernel/pci-dma.c and I see that it
 is mostly a flat platform. That is, bus addresses == physical addresses.
 Unless it is a pcxl or pcxl2 CPU type (huh?) - if it is,
 then any calls to dma_alloc_coherent will map memory out of a pool.
 In essence it will look like a SWIOTLB bounce buffer.

arch/parisc/kernel/pci-dma.c:
** PARISC 1.1 Dynamic DMA mapping support.
** This implementation is for PA-RISC platforms that do not support
** I/O TLBs (aka DMA address translation hardware).

That's very old. PA-RISC 2.0 came into the game circa 1996.
PA-RISC 1.1 is 32-bit only and I'm not even sure whether these machines
had a PCI bus.

Only old boxes (PA7200 CPU and lower) cannot use dma_alloc_coherent()
(and are forced to do syncs, iirc). That's not our case.
And PA-RISC configs have 'Discontiguous Memory' chosen.


 But interestingly enough there are a lot of 'flush_kernel_dcache_range'
 calls for every DMA operation. And I think you need to do a
 dma_sync_for_cpu call in the radeon_test_writeback for it to
 use the flush_kernel_dcache_range. I don't know what the
 flush_kernel_dcache_range does though so I could be wrong.

D-cache is a CPU data cache (if that is what they meant).
It seems to be L1-level. There is an I-cache at the same level.


 That means you can ignore the little code below I wrote and
 see about doing something like this:

 diff --git a/drivers/gpu/drm/radeon/radeon_cp.c b/drivers/gpu/drm/radeon/radeon_cp.c
 index 3cae2bb..9e5923d 100644
 --- a/drivers/gpu/drm/radeon/radeon_cp.c
 +++ b/drivers/gpu/drm/radeon/radeon_cp.c
 @@ -876,6 +876,7 @@ static void radeon_test_writeback(drm_radeon_private_t * dev_priv)

  RADEON_WRITE(RADEON_SCRATCH_REG1, 0xdeadbeef);

 + flush_kernel_dcache_range(dev_priv->ring_rptr, PAGE_SIZE);
  for (tmp = 0; tmp < dev_priv->usec_timeout; tmp++) {
  u32 val;

 But that is probably a shot in the dark. I have no clue what the flush_..
 is doing.

 [edit: And then I noticed sba_iommu.c, which is a complete IOMMU driver
 where bus and physical addresses are different. sigh. What type of machine
 is this? Does it have the IOMMU in it?]

That's our case.
Yes, recent IA64 and PA-RISC machines have an SBA IOMMU device. PCI I/O
seems to go through it. There is a note for my chipset in sba_iommu.c:

/* We are just encouraging 32-bit DMA masks here since we can
 * never allow IOMMU bypass unless we add special support for ZX1.
 */

And it is indeed right. When I tried to bypass the hardware IOMMU as the
ia64 code does, it led to faults in drivers which do DMA (like the Fusion
MPT SCSI driver).

  void *va = bus_to_virt(gtt->ttm.dma_address[i]);
  if ((phys_addr_t) va != virt_to_bus(va)) {

 You are missing a translation here (you were comparing the virtual address
 to the bus address). I was thinking something along this:

Yes, this confused me. I've translated your suggestion literally :\


 unsigned int pfn = page_to_pfn(ttm->pages[i]);
 dma_addr_t bus = gtt->ttm.dma_address[i];
 void *va_bus, *va, *va_pfn;

 if ((pfn << PAGE_SHIFT) != bus)
         printk("Bus 0x%lx != PFN 0x%lx", bus, pfn << PAGE_SHIFT); /* OK, that means bus addresses are different */

 va_bus = bus_to_virt(gtt->ttm.dma_address[i]);
 va_pfn = __va(pfn << PAGE_SHIFT);

 if (!virt_addr_valid(va_bus))
         printk("va_bus (0x%lx) not good!\n", va_bus);
 if (!virt_addr_valid(va_pfn))
         printk("va_pfn (0x%lx) not good!\n", va_pfn);

 /* We got VA for both bus -> va, and pfn -> va. Should be the
    same if bus and physical addresses are on the same namespace. */
 if (va_bus != va_pfn)
         printk("va bus:%lx != va pfn: %lx\n", va_bus, va_pfn);

 /* Now that we have bus -> pa -> va (va_bus) try to go va_bus -> bus address.
    The bus address should be the same */
 if (gtt->ttm.dma_address[i] != virt_to_bus(va_bus))
         printk("bus->pa->va:%lx != bus->pa->va->ba: %lx\n",
                gtt->ttm.dma_address[i], virt_to_bus(va_bus));

   DRM_INFO("MISMATCH: %p != %p\n", va, (void *) virt_to_bus(va));
   /*DRM_INFO("CONTENTS: %x\n", *((uint32_t *)va));*/ // Leads to a Kernel Fault

 That is odd. I would have thought it would be usable.

   ...
  }

  I'm getting the output:

  [drm] MISMATCH: 8028 != 4028

 In theory that means the bus address that is programmed in
 (gtt->dma_address[i])
 is 4028 (which is what 

Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-23 Thread Konrad Rzeszutek Wilk
On Sat, Sep 21, 2013 at 07:39:10AM +0400, Alex Ivanov wrote:
 21.09.2013, at 1:27, Alex Deucher alexdeuc...@gmail.com wrote:

  On Tue, Sep 17, 2013 at 3:33 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
  17.09.2013, at 18:24, Alex Deucher alexdeuc...@gmail.com wrote:

  On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
  Alex,

  10.09.2013, at 16:37, Alex Deucher alexdeuc...@gmail.com wrote:
  
  The dummy page isn't really going to help much.  That page is just
  used as a safety placeholder for gart entries that aren't mapped on
  the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
  the backing pages for the gart.
  
  You may want to look there.
  
  Ah, sorry. Indeed. Though, my idea with:
  
  On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
  
  Thanks! I'll try. Meanwhile I've tried switching from page_alloc() to
  dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

  doesn't make sense for the TTM part either.

  After the driver is loaded, you can dump some info from debugfs:
  r100_rbbm_info
  r100_cp_ring_info
  r100_cp_csq_fifo
  Which will dump a bunch of registers and internal fifos so we can see
  what the chip actually processed.
  
  Alex
  
  Reading r100_cp_ring_info leads to a KP:

  r100_debugfs_cp_ring_info():
  count = (rdp + ring->ring_size - wdp) & ring->ptr_mask;
  i = (rdp + j) & ring->ptr_mask;

  for (j = 0; j <= count; j++) {
          i = (rdp + j) & ring->ptr_mask;
          -- Here at first iteration --
          -- count = 262080, i = 0 --
          seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]);
  }

  Reading radeon_ring_gfx (which I additionally tried to read)
  throws an MCE:

  radeon_debugfs_ring_info():
  count = (ring->ring_size / 4) - ring->ring_free_dw;
  i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask;

  for (j = 0; j <= (count + 32); j++) {
          -- Here at first iteration --
          -- i = 262112, j = 0 --
          seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]);
          i = (i + 1) & ring->ptr_mask;
  }

  I'm attaching debug outputs from a kernel built with these loops commented out.

  The register writes seem to be going through the register backbone
  correctly:
  
  [0x00B] 0x15E0=0x
  [0x00C] 0x15E4=0xCAFEDEAD
  [0x00D] 0x4274=0x000F
  [0x00E] 0x42C8=0x0007
  [0x00F] 0x4018=0x001D
  [0x010] 0x170C=0x8000
  [0x011] 0x3428=0x00020100
  [0x012] 0x15E4=0xCAFEDEAD
  
  You can see the 0xCAFEDEAD written to the scratch register via MMIO
  from the ring_test(). The CP fifo however seems to be full of garbage.
  The CP is busy though, so it seems to be functional.  I guess it's
  just fetching garbage rather than commands.

If it is fetching garbage, that would imply the DMA (or bus addresses)
that are programmed in the GART are bogus. If you dump them and try
to figure out if bus address -> physical address -> virtual address ==
virtual address -> bus address that could help. And perhaps seeing what
the virtual address has - and/or poisoning it with known data?

Or perhaps the card has picked up an incorrect page table? Meaning
the (bus) address given to it is not the correct one?

  
  Does doing a posted write when writing to the ring buffer help?
 
 Unfortunately, no.
 
  
  diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
  index a890756..b4f04d2 100644
  --- a/drivers/gpu/drm/radeon/radeon_ring.c
  +++ b/drivers/gpu/drm/radeon/radeon_ring.c
  @@ -324,12 +324,14 @@ static int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ri
   */
  void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
  {
  +   u32 tmp;
  #if DRM_DEBUG_CODE
         if (ring->count_dw <= 0) {
                 DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
         }
  #endif
         ring->ring[ring->wptr++] = v;
  +   tmp = ring->ring[ring->wptr - 1];
         ring->wptr &= ring->ptr_mask;
         ring->count_dw--;
         ring->ring_free_dw--;
 


Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-20 Thread Alex Ivanov
17.09.2013, at 23:33, Alex Ivanov gnido...@p0n4ik.tk wrote:

 17.09.2013, at 18:24, Alex Deucher alexdeuc...@gmail.com wrote:

 On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Alex,

 10.09.2013, at 16:37, Alex Deucher alexdeuc...@gmail.com wrote:
 
 The dummy page isn't really going to help much.  That page is just
 used as a safety placeholder for gart entries that aren't mapped on
 the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
 the backing pages for the gart.
 
 You may want to look there.
 
 Ah, sorry. Indeed. Though, my idea with:
 
 On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 
 Thanks! I'll try. Meanwhile I've tried switching from page_alloc() to
 dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

 doesn't make sense for the TTM part either.

 After the driver is loaded, you can dump some info from debugfs:
 r100_rbbm_info
 r100_cp_ring_info
 r100_cp_csq_fifo
 Which will dump a bunch of registers and internal fifos so we can see
 what the chip actually processed.
 
 Alex
 
 Reading r100_cp_ring_info leads to a KP:

 r100_debugfs_cp_ring_info():
 count = (rdp + ring->ring_size - wdp) & ring->ptr_mask;
 i = (rdp + j) & ring->ptr_mask;

 for (j = 0; j <= count; j++) {
         i = (rdp + j) & ring->ptr_mask;
         -- Here at first iteration --
         -- count = 262080, i = 0 --
         seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]);
 }

 Reading radeon_ring_gfx (which I additionally tried to read)
 throws an MCE:

 radeon_debugfs_ring_info():
 count = (ring->ring_size / 4) - ring->ring_free_dw;
 i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask;

 for (j = 0; j <= (count + 32); j++) {
         -- Here at first iteration --
         -- count = 64, i = 262112 --
         seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]);
         i = (i + 1) & ring->ptr_mask;
 }

 I'm attaching debug outputs from a kernel built with these loops commented out.
 drm_parisc_debug.tgz

The ring->ring is NULL...
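
A NULL CPU pointer would explain why both debugfs loops fault on their first ring->ring[i] access. As a user-space illustration only (not the driver's code), a dump loop can refuse to run when there is no mapping; struct model_dump_ring and model_dump() are hypothetical names, and the loop bounds copy the r100_debugfs_cp_ring_info() excerpt quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for struct radeon_ring. */
struct model_dump_ring {
    const uint32_t *ring; /* CPU mapping of the ring; NULL if never mapped */
    uint32_t ptr_mask;    /* ring size in dwords minus one */
};

/*
 * Print count + 1 entries starting at rptr, wrapping with ptr_mask.
 * Returns the number of entries printed, or -1 if there is no CPU mapping.
 */
static int model_dump(const struct model_dump_ring *r, uint32_t rptr,
                      uint32_t count, FILE *out)
{
    uint32_t i, j;

    if (r->ring == NULL)
        return -1;

    for (j = 0; j <= count; j++) {
        i = (rptr + j) & r->ptr_mask;
        fprintf(out, "r[%04u]=0x%08x\n", (unsigned)i, (unsigned)r->ring[i]);
    }
    return (int)(count + 1);
}

#ifdef SKETCH_DEMO
int main(void)
{
    uint32_t words[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    struct model_dump_ring mapped = { words, 7 };
    struct model_dump_ring unmapped = { NULL, 7 };

    model_dump(&mapped, 6, 3, stdout); /* wraps from index 7 back to 0 */
    if (model_dump(&unmapped, 0, 3, stdout) < 0)
        puts("ring not mapped, nothing dumped");
    return 0;
}
#endif
```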


Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-20 Thread Alex Deucher
On Tue, Sep 17, 2013 at 3:33 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 17.09.2013, at 18:24, Alex Deucher alexdeuc...@gmail.com wrote:

 On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Alex,

 10.09.2013, at 16:37, Alex Deucher alexdeuc...@gmail.com wrote:

 The dummy page isn't really going to help much.  That page is just
 used as a safety placeholder for gart entries that aren't mapped on
 the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
 the backing pages for the gart.

 You may want to look there.

 Ah, sorry. Indeed. Though, my idea with:

 On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:

 Thanks! I'll try. Meanwhile I've tried switching from page_alloc() to
 dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

 doesn't make sense for the TTM part either.

 After the driver is loaded, you can dump some info from debugfs:
 r100_rbbm_info
 r100_cp_ring_info
 r100_cp_csq_fifo
 Which will dump a bunch of registers and internal fifos so we can see
 what the chip actually processed.

 Alex

 Reading r100_cp_ring_info leads to a KP:

 r100_debugfs_cp_ring_info():
 count = (rdp + ring->ring_size - wdp) & ring->ptr_mask;
 i = (rdp + j) & ring->ptr_mask;

 for (j = 0; j <= count; j++) {
         i = (rdp + j) & ring->ptr_mask;
         -- Here at first iteration --
         -- count = 262080, i = 0 --
         seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]);
 }

 Reading radeon_ring_gfx (which I additionally tried to read)
 throws an MCE:

 radeon_debugfs_ring_info():
 count = (ring->ring_size / 4) - ring->ring_free_dw;
 i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask;

 for (j = 0; j <= (count + 32); j++) {
         -- Here at first iteration --
         -- i = 262112, j = 0 --
         seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]);
         i = (i + 1) & ring->ptr_mask;
 }

 I'm attaching debug outputs from a kernel built with these loops commented out.

The register writes seem to be going through the register backbone correctly:

[0x00B] 0x15E0=0x
[0x00C] 0x15E4=0xCAFEDEAD
[0x00D] 0x4274=0x000F
[0x00E] 0x42C8=0x0007
[0x00F] 0x4018=0x001D
[0x010] 0x170C=0x8000
[0x011] 0x3428=0x00020100
[0x012] 0x15E4=0xCAFEDEAD

You can see the 0xCAFEDEAD written to the scratch register via MMIO
from the ring_test(). The CP fifo however seems to be full of garbage.
 The CP is busy though, so it seems to be functional.  I guess it's
just fetching garbage rather than commands.
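
The failure mode described here can be modelled in a few lines of user-space C. This is a simplified sketch, not r100_ring_test() itself: the CPU seeds a scratch value, and only a command processor that fetched valid commands replaces it with the magic value. struct model_gpu, model_cp_step() and the cp_fetches_ok flag are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_SEED  0xCAFEDEADu /* written by the CPU via MMIO */
#define MODEL_MAGIC 0xDEADBEEFu /* value the ring commands ask the CP to write */

struct model_gpu {
    uint32_t scratch;  /* simulated scratch register */
    int cp_fetches_ok; /* 1 if the CP sees the commands the CPU wrote */
};

/* One simulated CP step: garbage commands never update the scratch value. */
static void model_cp_step(struct model_gpu *g)
{
    if (g->cp_fetches_ok)
        g->scratch = MODEL_MAGIC;
}

/* Returns 0 when the magic value shows up, -1 on timeout. */
static int model_ring_test(struct model_gpu *g, unsigned timeout)
{
    unsigned i;

    g->scratch = MODEL_SEED;
    for (i = 0; i < timeout; i++) {
        model_cp_step(g);
        if (g->scratch == MODEL_MAGIC)
            return 0;
    }
    return -1; /* scratch still holds the seed: ring test failed */
}

#ifdef SKETCH_DEMO
int main(void)
{
    struct model_gpu bad = { 0, 0 };

    if (model_ring_test(&bad, 100) < 0)
        printf("ring test failed (scratch=0x%08X)\n", (unsigned)bad.scratch);
    return 0;
}
#endif
```

In the failing case the scratch value stays at 0xCAFEDEAD, which matches the register dump above.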

Does doing a posted write when writing to the ring buffer help?

diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index a890756..b4f04d2 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -324,12 +324,14 @@ static int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ri
  */
 void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
 {
+   u32 tmp;
 #if DRM_DEBUG_CODE
        if (ring->count_dw <= 0) {
                DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
        }
 #endif
        ring->ring[ring->wptr++] = v;
+   tmp = ring->ring[ring->wptr - 1];
        ring->wptr &= ring->ptr_mask;
        ring->count_dw--;
        ring->ring_free_dw--;
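
The idea of the patch is to read each dword back right after writing it, so that a posted write is pushed out before the next one. A user-space sketch of the same pattern follows; struct model_wb_ring and model_ring_write() are hypothetical names, and the pointer is declared volatile here so the compiler keeps the read-back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for struct radeon_ring. */
struct model_wb_ring {
    volatile uint32_t *ring; /* volatile: the read-back must not be dropped */
    uint32_t wptr;           /* write pointer, in dwords */
    uint32_t ptr_mask;       /* ring size in dwords minus one */
};

/* Write one dword, read it back, then wrap the write pointer. */
static uint32_t model_ring_write(struct model_wb_ring *r, uint32_t v)
{
    uint32_t readback;

    r->ring[r->wptr++] = v;
    readback = r->ring[r->wptr - 1]; /* forces the write to complete first */
    r->wptr &= r->ptr_mask;
    return readback;
}

#ifdef SKETCH_DEMO
int main(void)
{
    uint32_t buf[4] = { 0 };
    struct model_wb_ring r = { buf, 3, 3 };

    printf("read back 0x%08X, wptr now %u\n",
           (unsigned)model_ring_write(&r, 0xDEADBEEFu), (unsigned)r.wptr);
    return 0;
}
#endif
```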


Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-20 Thread Alex Ivanov
21.09.2013, at 1:27, Alex Deucher alexdeuc...@gmail.com wrote:

 On Tue, Sep 17, 2013 at 3:33 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 17.09.2013, at 18:24, Alex Deucher alexdeuc...@gmail.com wrote:

 On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Alex,

 10.09.2013, at 16:37, Alex Deucher alexdeuc...@gmail.com wrote:
 
 The dummy page isn't really going to help much.  That page is just
 used as a safety placeholder for gart entries that aren't mapped on
 the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
 the backing pages for the gart.
 
 You may want to look there.
 
 Ah, sorry. Indeed. Though, my idea with:
 
 On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 
 Thanks! I'll try. Meanwhile I've tried switching from page_alloc() to
 dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

 doesn't make sense for the TTM part either.

 After the driver is loaded, you can dump some info from debugfs:
 r100_rbbm_info
 r100_cp_ring_info
 r100_cp_csq_fifo
 Which will dump a bunch of registers and internal fifos so we can see
 what the chip actually processed.
 
 Alex
 
 Reading r100_cp_ring_info leads to a KP:

 r100_debugfs_cp_ring_info():
 count = (rdp + ring->ring_size - wdp) & ring->ptr_mask;
 i = (rdp + j) & ring->ptr_mask;

 for (j = 0; j <= count; j++) {
         i = (rdp + j) & ring->ptr_mask;
         -- Here at first iteration --
         -- count = 262080, i = 0 --
         seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]);
 }

 Reading radeon_ring_gfx (which I additionally tried to read)
 throws an MCE:

 radeon_debugfs_ring_info():
 count = (ring->ring_size / 4) - ring->ring_free_dw;
 i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask;

 for (j = 0; j <= (count + 32); j++) {
         -- Here at first iteration --
         -- i = 262112, j = 0 --
         seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]);
         i = (i + 1) & ring->ptr_mask;
 }

 I'm attaching debug outputs from a kernel built with these loops commented out.

 The register writes seem to be going through the register backbone correctly:
 
 [0x00B] 0x15E0=0x
 [0x00C] 0x15E4=0xCAFEDEAD
 [0x00D] 0x4274=0x000F
 [0x00E] 0x42C8=0x0007
 [0x00F] 0x4018=0x001D
 [0x010] 0x170C=0x8000
 [0x011] 0x3428=0x00020100
 [0x012] 0x15E4=0xCAFEDEAD
 
 You can see the 0xCAFEDEAD written to the scratch register via MMIO
 from the ring_test(). The CP fifo however seems to be full of garbage.
 The CP is busy though, so it seems to be functional.  I guess it's
 just fetching garbage rather than commands.
 
 Does doing a posted write when writing to the ring buffer help?

Unfortunately, no.

 
 diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
 index a890756..b4f04d2 100644
 --- a/drivers/gpu/drm/radeon/radeon_ring.c
 +++ b/drivers/gpu/drm/radeon/radeon_ring.c
 @@ -324,12 +324,14 @@ static int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ri
  */
 void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
 {
 +   u32 tmp;
 #if DRM_DEBUG_CODE
        if (ring->count_dw <= 0) {
                DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
        }
 #endif
        ring->ring[ring->wptr++] = v;
 +   tmp = ring->ring[ring->wptr - 1];
        ring->wptr &= ring->ptr_mask;
        ring->count_dw--;
        ring->ring_free_dw--;



Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-17 Thread Alex Ivanov
Alex,

10.09.2013, at 16:37, Alex Deucher alexdeuc...@gmail.com wrote:

 The dummy page isn't really going to help much.  That page is just
 used as a safety placeholder for gart entries that aren't mapped on
 the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
 the backing pages for the gart.  

 You may want to look there.

Ah, sorry. Indeed. Though, my idea with:

On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:

 Thanks! I'll try. Meanwhile I've tried switching from page_alloc() to
 dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

doesn't make sense for the TTM part either.

Konrad,

10.09.2013, 17:25, Konrad Rzeszutek Wilk konrad.w...@oracle.com:

 Is this platform enabling the SWIOTLB layer? 

It doesn't look like it.

 The reason I am asking is
 b/c if you do indeed enable it you end up using the TTM DMA pool
 which allocates pages using the dma_alloc_coherent - which means that
 all of the pages that come out of TTM are already 'DMA' mapped.

 And that means the radeon_gart_bind and all its friends
 use the DMA addresses that have been constructed by SWIOTLB IOMMU.

 Perhaps the PA-RISC IOMMU creates the DMA addresses differently?

 When the card gets programmed, you do end up using ttm_agp_bind right?
 I am wondering if something like this:

 https://lkml.org/lkml/2010/12/6/512

 is needed to pass in the right DMA address?

No idea how to modify ttm_agp_bind() this way, though does it matter if
swiotlb isn't used anyway?


Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-17 Thread Alex Deucher
On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Alex,

 10.09.2013, at 16:37, Alex Deucher alexdeuc...@gmail.com wrote:

 The dummy page isn't really going to help much.  That page is just
 used as a safety placeholder for gart entries that aren't mapped on
 the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
 the backing pages for the gart.

 You may want to look there.

 Ah, sorry. Indeed. Though, my idea with:

 On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:

 Thanks! I'll try. Meanwhile I've tried switching from page_alloc() to
 dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

 doesn't make sense for the TTM part either.

After the driver is loaded, you can dump some info from debugfs:
r100_rbbm_info
r100_cp_ring_info
r100_cp_csq_fifo
These dump a bunch of registers and internal FIFOs so we can see
what the chip actually processed.
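
A minimal sketch of collecting those three dumps from user space, assuming debugfs is mounted at /sys/kernel/debug and the radeon device is DRM minor 0 (both are assumptions; adjust the directory for your system). The helper names dump_debugfs_file() and dump_r100_state() are hypothetical and not part of any kernel or libdrm API. Reading these files normally requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed location: debugfs at /sys/kernel/debug, radeon as DRM minor 0. */
#define DEBUGFS_DRI_DIR "/sys/kernel/debug/dri/0"

/* Copy one debugfs file to `out`.
 * Returns the number of bytes copied, or -1 if the file cannot be opened. */
long dump_debugfs_file(const char *dir, const char *name, FILE *out)
{
    char path[512];
    char buf[4096];
    size_t n;
    long total = 0;
    int len;
    FILE *in;

    assert(dir && name && out);
    len = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (len < 0 || (size_t)len >= sizeof(path))
        return -1;              /* path would not fit */

    in = fopen(path, "r");
    if (!in)
        return -1;

    fprintf(out, "==== %s ====\n", name);
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        fwrite(buf, 1, n, out);
        total += (long)n;
    }
    fclose(in);
    return total;
}

/* Dump the three files named in the mail; returns how many were readable. */
int dump_r100_state(const char *dir, FILE *out)
{
    static const char *const names[] = {
        "r100_rbbm_info", "r100_cp_ring_info", "r100_cp_csq_fifo",
    };
    int ok = 0;
    size_t i;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
        if (dump_debugfs_file(dir, names[i], out) >= 0)
            ok++;
    return ok;
}
```

Calling dump_r100_state(DEBUGFS_DRI_DIR, stdout) as root would print whichever of the three files exist; a return value below 3 would suggest that a file is missing or unreadable.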

Alex


 Konrad,

 10.09.2013, 17:25, Konrad Rzeszutek Wilk konrad.w...@oracle.com:

 Is this platform enabling the SWIOTLB layer?

 Doesn't look like.

 The reason I am asking is
 b/c if you do indeed enable it you end up using the TTM DMA pool
 which allocates pages using the dma_alloc_coherent - which means that
 all of the pages that come out of TTM are already 'DMA' mapped.

 And that means the radeon_gart_bind and all its friends
 use the DMA addresses that have been constructed by SWIOTLB IOMMU.

 Perhaps the PA-RISC IOMMU creates the DMA addresses differently?

 When the card gets programmed, you do end up using ttm_agp_bind right?
 I am wondering if something like this:

 https://lkml.org/lkml/2010/12/6/512

 is needed to pass in the right DMA address?

 No idea how to modify ttm_agp_bind() this way, though doesn't matter if
 swiotlb isn't used anyway?


Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-17 Thread Alex Ivanov
17.09.2013, at 18:24, Alex Deucher alexdeuc...@gmail.com wrote:

 On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Alex,
 
 10.09.2013, at 16:37, Alex Deucher alexdeuc...@gmail.com wrote:
 
 The dummy page isn't really going to help much.  That page is just
 used as a safety placeholder for gart entries that aren't mapped on
 the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
 the backing pages for the gart.
 
 You may want to look there.
 
 Ah, sorry. Indeed. Though, my idea with:
 
 On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 
 Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to
 dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(
 
 doesn't make a sense at TTM part as well.
 
 After the driver is loaded, you can dump some info from debugfs:
 r100_rbbm_info
 r100_cp_ring_info
 r100_cp_csq_fifo
 Which will dump a bunch of registers and internal fifos so we can see
 that the chip actually processed.
 
 Alex

Reading r100_cp_ring_info leads to a kernel panic (KP):

r100_debugfs_cp_ring_info():
count = (rdp + ring->ring_size - wdp) & ring->ptr_mask;
i = (rdp + j) & ring->ptr_mask;

for (j = 0; j <= count; j++) {
    i = (rdp + j) & ring->ptr_mask;
    -- Here at first iteration --
    -- count = 262080, i = 0 --
    seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]);
}

Reading radeon_ring_gfx (which I additionally tried to read)
triggers an MCE:

radeon_debugfs_ring_info():
count = (ring->ring_size / 4) - ring->ring_free_dw;
i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask;

for (j = 0; j <= (count + 32); j++) {
    -- Here at first iteration --
    -- i = 262112, j = 0 --
    seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]);
    i = (i + 1) & ring->ptr_mask;
}

I'm attaching debug output from a kernel built with these loops commented out.
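
A small sketch of the index arithmetic in the two loops above, to check whether the reported values are at least self-consistent. It assumes a 1 MiB ring, i.e. ring_size = 1048576 bytes and ptr_mask = 262143 (262144 dwords), with rptr = 0; those numbers are not stated in the mail, they are chosen because they reproduce i = 262112. The write pointer value of 64 is likewise only a guess that happens to reproduce count = 262080. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Start index used by the radeon_ring_gfx dump:
 * 32 dwords before rptr, wrapped into the ring. */
uint32_t ring_dump_start(uint32_t rptr, uint32_t ptr_mask)
{
    /* ptr_mask must be (power of two) - 1 for the wrap to be valid. */
    assert((ptr_mask & (ptr_mask + 1)) == 0);
    return (rptr + ptr_mask + 1 - 32) & ptr_mask;
}

/* Dword count computed by the r100_cp_ring_info dump. */
uint32_t r100_dump_count(uint32_t rdp, uint32_t wdp,
                         uint32_t ring_size, uint32_t ptr_mask)
{
    assert((ptr_mask & (ptr_mask + 1)) == 0);
    return (rdp + ring_size - wdp) & ptr_mask;
}
```

Under these assumptions both start indices (0 and 262112) lie inside the ring, so the index math itself looks sane. That would suggest the fault comes from the first read of ring->ring[i] rather than from an out-of-range index, though the attached dumps would be needed to confirm it.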


drm_parisc_debug.tgz
Description: Binary data


Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-10 Thread Alex Ivanov
Alex,

09.09.2013, at 21:43, Alex Deucher alexdeuc...@gmail.com wrote:

 On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Folks,
 
 We (people at linux-parisc @ vger.kernel.org mail list) are trying to make
 native video options of the latest PA-RISC servers and workstations
 (these are ATIs, most of which are based on R100/R300/R420 chips) work
 correctly on this platform (big endian pa-risc).
 
 However, we hadn't much success. DRM fails every time with
 ring test failed for both AGP & PCI.
 
 Maybe you would give us some suggestions that we could check?
 
 Topic started here:
 http://www.spinics.net/lists/linux-parisc/msg04908.html
 And continued there:
 http://www.spinics.net/lists/linux-parisc/msg04995.html
 http://www.spinics.net/lists/linux-parisc/msg05006.html
 
 Problems we've already resolved without any signs of progress:
 - Checked the successful microcode load
 - parisc AGP GART code writes IOMMU entries in the wrong byte order and
   doesn't add the coherency information SBA code adds
 - our PCI BAR setup doesn't really work very well together with the Radeon
   DRM address setup. DRM will generate addresses, which are even outside
   of the connected LBA
 
 Things planned for a check:
 The drivers/video/aty uses
 an endian config bit DRM doesn't use, but I haven't tested whether
 this makes a difference and how it is connected to the overall picture.
 
 I don't think that will any difference.  radeon kms works fine on
 other big endian platforms such as powerpc.

Good! I'll rule it out then.

 
 
 The Rage128 product revealed a weakness in some motherboard
 chipsets in that there is no mechanism to guarantee
 that data written by the CPU to memory is actually in a readable
 state before the Graphics Controller receives an
 update to its copy of the Write Pointer. In an effort to alleviate this
 problem, we've introduced a mechanism into the
 Graphics Controller that will delay the actual write to the Write Pointer
 for some programmable amount of time, in
 order to give the chipset time to flush its internal write buffers to
 memory.
 There are two register fields that control this mechanism:
 PRE_WRITE_TIMER and PRE_WRITE_LIMIT.
 
 In the radeon DRM codebase I didn't found anyone using/setting
 those registers. Maybe PA-RISC has some problem here?...
 
 I doubt it.  If you are using AGP, I'd suggest disabling it and first
 try to get things working using the on chip gart rather than AGP.
 Load radeon with agpmode=-1.  

Already tried this without any luck. Anyway, the radeon driver falls back
to PCI mode in our case, so does it really matter?

In addition, people with PCI cards are experiencing the same issue...

 The on chip gart always uses cache
 snooped pci transactions and the driver assumes pci is cache coherent.
 On AGP/PCI chips, the on-chip gart mechanism stores the gart table in
 system ram.  On PCIE asics, the gart table is stored in vram.  The
 gart page table maps system pages to a contiguous aperture in the
 GPU's address space.  The ring lives in gart memory.  The GPU sees a
 contiguous buffer and the gart mechanism handles the access to the
 backing pages via the page table.  I'd suggest verifying that the
 entries written to the gart page table are valid and then the
 information written to the ring buffer is valid before updating the
 ring's wptr in radeon_ring_unlock_commit().  Changing the wptr is what
 causes the CP to start fetching data from the ring.

Thanks! I'll try. Meanwhile I've tried switching from alloc_page() to
dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

--- radeon_device.c.orig 2013-09-10 08:55:05.0 +
+++ radeon_device.c 2013-09-10 09:12:17.0 +
@@ -673,15 +673,13 @@ int radeon_dummy_page_init(struct radeon
 {
        if (rdev->dummy_page.page)
                return 0;
-       rdev->dummy_page.page = alloc_page(GFP_DMA32 | GFP_KERNEL | __GFP_ZERO);
-       if (rdev->dummy_page.page == NULL)
+       rdev->dummy_page.page = dma_alloc_coherent(&rdev->pdev->dev, PAGE_SIZE,
+               &rdev->dummy_page.addr, GFP_DMA32|GFP_KERNEL);
+       if (!rdev->dummy_page.page)
                return -ENOMEM;
-       rdev->dummy_page.addr = pci_map_page(rdev->pdev, rdev->dummy_page.page,
-               0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
        if (pci_dma_mapping_error(rdev->pdev, rdev->dummy_page.addr)) {
                dev_err(&rdev->pdev->dev, "Failed to DMA MAP the dummy page\n");
-               __free_page(rdev->dummy_page.page);
-               rdev->dummy_page.page = NULL;
+               radeon_dummy_page_fini(rdev);
                return -ENOMEM;
        }
        return 0;
@@ -698,9 +696,8 @@ void radeon_dummy_page_fini(struct radeo
 {
        if (rdev->dummy_page.page == NULL)
                return;
-       pci_unmap_page(rdev->pdev, rdev->dummy_page.addr,
-               PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-       __free_page(rdev->dummy_page.page);
+   

Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-10 Thread Alex Deucher
On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Alex,

 09.09.2013, at 21:43, Alex Deucher alexdeuc...@gmail.com wrote:

 On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Folks,

 We (people at linux-parisc @ vger.kernel.org mail list) are trying to make
 native video options of the latest PA-RISC servers and workstations
 (these are ATIs, most of which are based on R100/R300/R420 chips) work
 correctly on this platform (big endian pa-risc).

 However, we hadn't much success. DRM fails every time with
 ring test failed for both AGP & PCI.

 Maybe you would give us some suggestions that we could check?

 Topic started here:
 http://www.spinics.net/lists/linux-parisc/msg04908.html
 And continued there:
 http://www.spinics.net/lists/linux-parisc/msg04995.html
 http://www.spinics.net/lists/linux-parisc/msg05006.html

 Problems we've already resolved without any signs of progress:
 - Checked the successful microcode load
 - parisc AGP GART code writes IOMMU entries in the wrong byte order and
   doesn't add the coherency information SBA code adds
 - our PCI BAR setup doesn't really work very well together with the Radeon
   DRM address setup. DRM will generate addresses, which are even outside
   of the connected LBA

 Things planned for a check:
 The drivers/video/aty uses
 an endian config bit DRM doesn't use, but I haven't tested whether
 this makes a difference and how it is connected to the overall picture.

 I don't think that will any difference.  radeon kms works fine on
 other big endian platforms such as powerpc.

 Good! I'll opt it out then.



 The Rage128 product revealed a weakness in some motherboard
 chipsets in that there is no mechanism to guarantee
 that data written by the CPU to memory is actually in a readable
 state before the Graphics Controller receives an
 update to its copy of the Write Pointer. In an effort to alleviate this
 problem, we've introduced a mechanism into the
 Graphics Controller that will delay the actual write to the Write Pointer
 for some programmable amount of time, in
 order to give the chipset time to flush its internal write buffers to
 memory.
 There are two register fields that control this mechanism:
 PRE_WRITE_TIMER and PRE_WRITE_LIMIT.

 In the radeon DRM codebase I didn't found anyone using/setting
 those registers. Maybe PA-RISC has some problem here?...

 I doubt it.  If you are using AGP, I'd suggest disabling it and first
 try to get things working using the on chip gart rather than AGP.
 Load radeon with agpmode=-1.

 Already tried this without any luck. Anyway, a radeon driver fallbacks
 to the PCI mode in our case, so does it really matter?

 In addition, people with PCI cards experiencing the same issue...

 The on chip gart always uses cache
 snooped pci transactions and the driver assumes pci is cache coherent.
 On AGP/PCI chips, the on-chip gart mechanism stores the gart table in
 system ram.  On PCIE asics, the gart table is stored in vram.  The
 gart page table maps system pages to a contiguous aperture in the
 GPU's address space.  The ring lives in gart memory.  The GPU sees a
 contiguous buffer and the gart mechanism handles the access to the
 backing pages via the page table.  I'd suggest verifying that the
 entries written to the gart page table are valid and then the
 information written to the ring buffer is valid before updating the
 ring's wptr in radeon_ring_unlock_commit().  Changing the wptr is what
 causes the CP to start fetching data from the ring.

 Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to
 dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(


The dummy page isn't really going to help much.  That page is just
used as a safety placeholder for gart entries that aren't mapped on
the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
the backing pages for the gart.  You may want to look there.

Alex

 --- radeon_device.c.orig 2013-09-10 08:55:05.0 +
 +++ radeon_device.c 2013-09-10 09:12:17.0 +
 @@ -673,15 +673,13 @@ int radeon_dummy_page_init(struct radeon
  {
         if (rdev->dummy_page.page)
                 return 0;
 -       rdev->dummy_page.page = alloc_page(GFP_DMA32 | GFP_KERNEL | __GFP_ZERO);
 -       if (rdev->dummy_page.page == NULL)
 +       rdev->dummy_page.page = dma_alloc_coherent(&rdev->pdev->dev, PAGE_SIZE,
 +               &rdev->dummy_page.addr, GFP_DMA32|GFP_KERNEL);
 +       if (!rdev->dummy_page.page)
                 return -ENOMEM;
 -       rdev->dummy_page.addr = pci_map_page(rdev->pdev, rdev->dummy_page.page,
 -               0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
         if (pci_dma_mapping_error(rdev->pdev, rdev->dummy_page.addr)) {
                 dev_err(&rdev->pdev->dev, "Failed to DMA MAP the dummy page\n");
 -               __free_page(rdev->dummy_page.page);
 -               rdev->dummy_page.page = NULL;
 +               radeon_dummy_page_fini(rdev);

Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-10 Thread Hans Verkuil
On 09/10/2013 02:37 PM, Alex Deucher wrote:
 On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Alex,

 09.09.2013, at 21:43, Alex Deucher alexdeuc...@gmail.com wrote:

 On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Folks,

 We (people at linux-parisc @ vger.kernel.org mail list) are trying to make
 native video options of the latest PA-RISC servers and workstations
 (these are ATIs, most of which are based on R100/R300/R420 chips) work
 correctly on this platform (big endian pa-risc).

 However, we hadn't much success. DRM fails every time with
 ring test failed for both AGP & PCI.

 Maybe you would give us some suggestions that we could check?

 Topic started here:
 http://www.spinics.net/lists/linux-parisc/msg04908.html
 And continued there:
 http://www.spinics.net/lists/linux-parisc/msg04995.html
 http://www.spinics.net/lists/linux-parisc/msg05006.html

 Problems we've already resolved without any signs of progress:
 - Checked the successful microcode load
 - parisc AGP GART code writes IOMMU entries in the wrong byte order and
   doesn't add the coherency information SBA code adds
 - our PCI BAR setup doesn't really work very well together with the Radeon
   DRM address setup. DRM will generate addresses, which are even outside
   of the connected LBA

 Things planned for a check:
 The drivers/video/aty uses
 an endian config bit DRM doesn't use, but I haven't tested whether
 this makes a difference and how it is connected to the overall picture.

 I don't think that will any difference.  radeon kms works fine on
 other big endian platforms such as powerpc.

 Good! I'll opt it out then.

Actually, I am experiencing exactly the same problem on a Sam460ex ppc
system, at least as of 3.9 (the last time I tried it).

Very rarely the ring test would pass, but then it would fail somewhere else.
I never could figure it out since, as far as I could tell, all the addresses
and logic were correct. It wasn't important enough for me to work more on it,
but I'd be happy to test code. I'm travelling for the next week and a half,
so I can't do anything right now.

One bug I found when working on drm/kms support for the ppc was that in
struct ttm_bus_placement the base address type was wrong: it should be
phys_addr_t, not unsigned long. The PPC460 is in 32-bit mode but physical
addresses are 36 bits.
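
To illustrate why the field type matters, here is a minimal sketch of what happens when a bus base address that does not fit in 32 bits is stored in a 32-bit field. It assumes a 32-bit kernel where unsigned long is 32 bits wide and a phys_addr_t that is 64 bits wide; the fixed-width typedefs below stand in for those kernel types, and the example address 0xC80000000 is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for `unsigned long` on a 32-bit kernel (assumption). */
typedef uint32_t ulong32_t;
/* Stand-in for a 64-bit phys_addr_t (assumption). */
typedef uint64_t phys64_t;

/* What the old field type keeps: only the low 32 bits. */
ulong32_t store_in_ulong(uint64_t bus_base)
{
    return (ulong32_t)bus_base;
}

/* What the wider field type keeps: the whole address. */
phys64_t store_in_phys(uint64_t bus_base)
{
    return (phys64_t)bus_base;
}

/* Nonzero if storing `addr` in a 32-bit field would lose bits. */
int loses_bits_in_32(uint64_t addr)
{
    return (uint64_t)(uint32_t)addr != addr;
}
```

With the made-up address, the 32-bit field silently drops the upper bits, so any later ioremap or GART programming based on it would target the wrong location.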

The patch below fixes that. I always wanted to post this fix, but never got
around to it...

Regards,

Hans

Signed-off-by: Hans Verkuil hans.verk...@cisco.com
---
 arch/powerpc/sysdev/ppc4xx_msi.c   |6 +++---
 drivers/gpu/drm/radeon/radeon_device.c |2 +-
 include/drm/ttm/ttm_bo_api.h   |2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 49b0659..fa33568 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1066,7 +1066,7 @@ int radeon_device_init(struct radeon_device *rdev,
        if (rdev->rmmio == NULL) {
                return -ENOMEM;
        }
-       DRM_INFO("register mmio base: 0x%08X\n", (uint32_t)rdev->rmmio_base);
+       DRM_INFO("register mmio base: 0x%llx\n", (uint64_t)rdev->rmmio_base);
        DRM_INFO("register mmio size: %u\n", (unsigned)rdev->rmmio_size);
 
        /* io port mapping */
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 3cb5d84..fcdb208 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -81,7 +81,7 @@ struct ttm_placement {
  */
 struct ttm_bus_placement {
        void            *addr;
-       unsigned long   base;
+       phys_addr_t     base;
        unsigned long   size;
        unsigned long   offset;
        bool            is_iomem;
-- 
1.7.10.4




Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-10 Thread Konrad Rzeszutek Wilk
On Tue, Sep 10, 2013 at 01:20:57PM +0400, Alex Ivanov wrote:
 Alex,
 
 09.09.2013, at 21:43, Alex Deucher alexdeuc...@gmail.com wrote:
 
  On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
  Folks,
  
  We (people at linux-parisc @ vger.kernel.org mail list) are trying to make
  native video options of the latest PA-RISC servers and workstations
  (these are ATIs, most of which are based on R100/R300/R420 chips) work
  correctly on this platform (big endian pa-risc).
  
  However, we hadn't much success. DRM fails every time with
  ring test failed for both AGP & PCI.
  
  Maybe you would give us some suggestions that we could check?
  
  Topic started here:
  http://www.spinics.net/lists/linux-parisc/msg04908.html
  And continued there:
  http://www.spinics.net/lists/linux-parisc/msg04995.html
  http://www.spinics.net/lists/linux-parisc/msg05006.html
  
  Problems we've already resolved without any signs of progress:
  - Checked the successful microcode load
  - parisc AGP GART code writes IOMMU entries in the wrong byte order and
    doesn't add the coherency information SBA code adds
  - our PCI BAR setup doesn't really work very well together with the Radeon
    DRM address setup. DRM will generate addresses, which are even outside
    of the connected LBA
  
  Things planned for a check:
  The drivers/video/aty uses
  an endian config bit DRM doesn't use, but I haven't tested whether
  this makes a difference and how it is connected to the overall picture.
  
  I don't think that will any difference.  radeon kms works fine on
  other big endian platforms such as powerpc.
 
 Good! I'll opt it out then.
 
  
  
  The Rage128 product revealed a weakness in some motherboard
  chipsets in that there is no mechanism to guarantee
  that data written by the CPU to memory is actually in a readable
  state before the Graphics Controller receives an
  update to its copy of the Write Pointer. In an effort to alleviate this
  problem, we've introduced a mechanism into the
  Graphics Controller that will delay the actual write to the Write Pointer
  for some programmable amount of time, in
  order to give the chipset time to flush its internal write buffers to
  memory.
  There are two register fields that control this mechanism:
  PRE_WRITE_TIMER and PRE_WRITE_LIMIT.
  
  In the radeon DRM codebase I didn't found anyone using/setting
  those registers. Maybe PA-RISC has some problem here?...
  
  I doubt it.  If you are using AGP, I'd suggest disabling it and first
  try to get things working using the on chip gart rather than AGP.
  Load radeon with agpmode=-1.  
 
 Already tried this without any luck. Anyway, a radeon driver fallbacks
 to the PCI mode in our case, so does it really matter?
 
 In addition, people with PCI cards experiencing the same issue...
 
  The on chip gart always uses cache
  snooped pci transactions and the driver assumes pci is cache coherent.
  On AGP/PCI chips, the on-chip gart mechanism stores the gart table in
  system ram.  On PCIE asics, the gart table is stored in vram.  The
  gart page table maps system pages to a contiguous aperture in the
  GPU's address space.  The ring lives in gart memory.  The GPU sees a
  contiguous buffer and the gart mechanism handles the access to the
  backing pages via the page table.  I'd suggest verifying that the
  entries written to the gart page table are valid and then the
  information written to the ring buffer is valid before updating the
  ring's wptr in radeon_ring_unlock_commit().  Changing the wptr is what
  causes the CP to start fetching data from the ring.
 
 Thanks! I'll try. Meanwhile i've tried a switch from page_alloc() to
 dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

Is this platform enabling the SWIOTLB layer? The reason I am asking is
that if you do enable it, you end up using the TTM DMA pool, which
allocates pages using dma_alloc_coherent(). That means all of the pages
that come out of TTM are already DMA-mapped.

And that means radeon_gart_bind() and all its friends
use the DMA addresses that have been constructed by the SWIOTLB IOMMU.

Perhaps the PA-RISC IOMMU creates the DMA addresses differently?

When the card gets programmed, you do end up using ttm_agp_bind right?
I am wondering if something like this:

https://lkml.org/lkml/2010/12/6/512

is needed to pass in the right DMA address?

 
 --- radeon_device.c.orig 2013-09-10 08:55:05.0 +
 +++ radeon_device.c 2013-09-10 09:12:17.0 +
 @@ -673,15 +673,13 @@ int radeon_dummy_page_init(struct radeon
  {
         if (rdev->dummy_page.page)
                 return 0;
 -       rdev->dummy_page.page = alloc_page(GFP_DMA32 | GFP_KERNEL | __GFP_ZERO);
 -       if (rdev->dummy_page.page == NULL)
 +       rdev->dummy_page.page = dma_alloc_coherent(&rdev->pdev->dev, PAGE_SIZE,
 +               &rdev->dummy_page.addr, GFP_DMA32|GFP_KERNEL);
 +       if (!rdev->dummy_page.page)
                 return -ENOMEM;
 - 

Re: drm/radeon: ring test failed on PA-RISC Linux

2013-09-09 Thread Alex Deucher
On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
 Folks,

 We (people at linux-parisc @ vger.kernel.org mail list) are trying to make
 native video options of the latest PA-RISC servers and workstations
 (these are ATIs, most of which are based on R100/R300/R420 chips) work
 correctly on this platform (big endian pa-risc).

 However, we haven't had much success. DRM fails every time with
 ring test failed for both AGP & PCI.

 Maybe you would give us some suggestions that we could check?

 Topic started here:
 http://www.spinics.net/lists/linux-parisc/msg04908.html
 And continued there:
 http://www.spinics.net/lists/linux-parisc/msg04995.html
 http://www.spinics.net/lists/linux-parisc/msg05006.html

 Problems we've already resolved without any signs of progress:
 - Checked the successful microcode load
 - parisc AGP GART code writes IOMMU entries in the wrong byte order and
   doesn't add the coherency information SBA code adds
 - our PCI BAR setup doesn't really work very well together with the Radeon
   DRM address setup. DRM will generate addresses, which are even outside
   of the connected LBA

 Things planned for a check:
 The drivers/video/aty uses
 an endian config bit DRM doesn't use, but I haven't tested whether
 this makes a difference and how it is connected to the overall picture.

I don't think that will make any difference.  radeon kms works fine on
other big endian platforms such as powerpc.


 The Rage128 product revealed a weakness in some motherboard
 chipsets in that there is no mechanism to guarantee
 that data written by the CPU to memory is actually in a readable
 state before the Graphics Controller receives an
 update to its copy of the Write Pointer. In an effort to alleviate this
 problem, we've introduced a mechanism into the
 Graphics Controller that will delay the actual write to the Write Pointer
 for some programmable amount of time, in
 order to give the chipset time to flush its internal write buffers to
 memory.
 There are two register fields that control this mechanism:
 PRE_WRITE_TIMER and PRE_WRITE_LIMIT.

 In the radeon DRM codebase I didn't find anyone using/setting
 those registers. Maybe PA-RISC has some problem here?...

I doubt it.  If you are using AGP, I'd suggest disabling it and first
try to get things working using the on chip gart rather than AGP.
Load radeon with agpmode=-1.  The on chip gart always uses cache
snooped pci transactions and the driver assumes pci is cache coherent.
On AGP/PCI chips, the on-chip gart mechanism stores the gart table in
system ram.  On PCIE asics, the gart table is stored in vram.  The
gart page table maps system pages to a contiguous aperture in the
GPU's address space.  The ring lives in gart memory.  The GPU sees a
contiguous buffer and the gart mechanism handles the access to the
backing pages via the page table.  I'd suggest verifying that the
entries written to the gart page table are valid and then the
information written to the ring buffer is valid before updating the
ring's wptr in radeon_ring_unlock_commit().  Changing the wptr is what
causes the CP to start fetching data from the ring.
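
The lookup described above can be sketched as a small user-space model, which may help when sanity-checking table entries by hand. This is only an illustration under stated assumptions: 4 KiB GPU pages, one 64-bit bus address per entry, and a flat array for the table. The struct and function names are hypothetical and do not correspond to the radeon driver's actual data structures or entry encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GART_PAGE_SHIFT 12u                       /* assumed 4 KiB GPU pages */
#define GART_PAGE_SIZE  (1u << GART_PAGE_SHIFT)
#define GART_INVALID    UINT64_MAX

/* Hypothetical model of a GART: a contiguous aperture in GPU address
 * space backed by scattered system pages. */
struct gart_model {
    uint64_t aperture_base;   /* start of the aperture as seen by the GPU */
    size_t num_entries;       /* number of page table entries */
    const uint64_t *entries;  /* bus address of each backing page */
};

/* Translate a GPU aperture address to the bus address of its backing
 * page, or GART_INVALID if the address is outside the aperture. */
uint64_t gart_translate(const struct gart_model *g, uint64_t gpu_addr)
{
    uint64_t off, idx;

    assert(g && g->entries);
    if (gpu_addr < g->aperture_base)
        return GART_INVALID;
    off = gpu_addr - g->aperture_base;
    idx = off >> GART_PAGE_SHIFT;
    if (idx >= g->num_entries)
        return GART_INVALID;
    return g->entries[idx] + (off & (GART_PAGE_SIZE - 1));
}

/* Count entries that are not page aligned or lie above `dma_limit`,
 * i.e. entries the device could not possibly use as written. */
size_t gart_count_bad_entries(const struct gart_model *g, uint64_t dma_limit)
{
    size_t bad = 0;
    size_t i;

    assert(g && g->entries);
    for (i = 0; i < g->num_entries; i++) {
        uint64_t e = g->entries[i];
        if ((e & (GART_PAGE_SIZE - 1)) != 0 || e > dma_limit)
            bad++;
    }
    return bad;
}
```

A check along the lines of gart_count_bad_entries() on the real table, run just before the wptr update, is one way to do the verification suggested above; what counts as a valid entry on real hardware depends on the chip's entry format and on what the platform IOMMU hands back.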

Alex


 Thanks.

  Forwarded message  
 04.08.2013, 15:06, Alex Ivanov gnido...@p0n4ik.tk:

 11.07.2013, 23:48, Helge Deller del...@gmx.de:

  adding linux parisc mailing list...:

  On 07/11/2013 09:46 PM, Helge Deller wrote:
   On 07/10/2013 11:29 PM, Alex Ivanov wrote:
   11.07.2013, 01:14, Matt Turner matts...@gmail.com:
   On Wed, Jul 10, 2013 at 1:19 PM, Alex Ivanov gnido...@p0n4ik.tk wrote:
Thank you so much! Your guess looks to be right. After applying
your patch there was no more KP and X just worked.
   Nice! Does DRI work?
   Not on my side. Plus i can't visually jump over 8bit depth, although Xorg
   states 24bit in its log.
   As for DRI, i'm experiencing
   ring test failed (scratch(0x15E4)=0xCAFEDEAD) with a firegl x3.
   FWIW, I'm seeing the same failure on my FireGL X1:
   80:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Radeon R300 NG [FireGL X1] (rev 80)

   [drm] radeon: irq initialized.
   [drm] Loading R300 Microcode
   [drm] radeon: ring at 0x60001000
   [drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
   [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
   radeon :80:00.0: failed initializing CP (-22).
   radeon :80:00.0: Disabling GPU acceleration
   [drm:r100_cp_fini] *ERROR* Wait for CP idle timeout, shutting down CP.
   [drm] radeon: cp finalized
   [drm] radeon: cp finalized
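
A toy model of the check behind this message, based on how r100_ring_test() appears to work: the driver seeds a scratch register with 0xCAFEDEAD, queues a packet asking the CP to write 0xDEADBEEF to that register, and polls until the value changes or a timeout expires. Everything below (struct fake_gpu, fake_cp_run(), model_ring_test()) is a hypothetical simulation and not driver code; the packet header is left as a placeholder.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPU state involved in the ring test. */
struct fake_gpu {
    uint32_t scratch;        /* models the scratch register (0x15E4 in the log) */
    bool cp_fetches_ring;    /* whether the CP can actually read the ring */
    uint32_t ring[2];        /* two-dword ring: packet header + payload */
};

/* One step of the simulated CP: if it can see the ring, it executes the
 * queued register write. */
void fake_cp_run(struct fake_gpu *g)
{
    if (g->cp_fetches_ring)
        g->scratch = g->ring[1];
}

/* Returns 0 on success, -EINVAL if the scratch value never changes. */
int model_ring_test(struct fake_gpu *g, unsigned int timeout)
{
    unsigned int i;

    assert(g);
    g->scratch = 0xCAFEDEAD;      /* seed value written directly via MMIO */
    g->ring[0] = 0;               /* placeholder for the packet header */
    g->ring[1] = 0xDEADBEEF;      /* value the CP should write to scratch */

    for (i = 0; i < timeout; i++) {
        fake_cp_run(g);
        if (g->scratch == 0xDEADBEEF)
            return 0;
    }
    return -EINVAL;               /* matches the (-22) in the log above */
}
```

Read this way, a scratch value still equal to 0xCAFEDEAD suggests the direct register write worked but the CP never executed the queued packet, which points at the ring fetch path (GART mapping, coherency, or the wptr update) rather than at register access.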

 I still have no clue why this happens. Broken SBA IOMMU / DRM code? Missing 
 syncing primitives?
 Should we forward this to dri-devel mail list?
 --
 To unsubscribe from this list: send the line unsubscribe linux-parisc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
  End of forwarded message