Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.

2010-03-28 Thread Michel Dänzer
On Fri, 2010-03-26 at 19:20 +0100, Michel Dänzer wrote: 
 On Thu, 2010-03-25 at 19:56 +1000, Dave Airlie wrote: 
  2010/3/25 Michel Dänzer mic...@daenzer.net:
   On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote:
   From: Dave Airlie airl...@redhat.com
  
   On constrained r100 systems compiz would fail to start due to a lack
   of memory, we can just fallback place the objects rather than completely
   failing it works a lot better.
  
   Signed-off-by: Dave Airlie airl...@redhat.com
  
   This change seems to trigger or at least greatly expedite GPU lockups on
   my PowerBook. With the change applied, my normal X session locked up the
   GPU after just a few minutes several times. Now with it reverted it's
   back to the previous stability.
  
  Care to try in pci mode? see if helps, it might be just straining AGP
  a bit more, maybe try 1x as well if you aren't already in it.
 
 After various more tests, the gist of it seems to be that rendering to
 GTT with the 3D engine tends to lock up whereas the 2D and DMA engines
 work fine (otherwise I probably would have run into this a long time ago
 at BO eviction time). Maybe some kind of synchronization issue resulting
 in the 3D engine trying to access GTT memory which isn't bound yet /
 anymore?

I think I've confirmed this theory by changing the AGP driver to bind
the AGP scratch page instead of really unbinding a GTT entry. That
prevents the lockups from occurring. Even so though, this change hurts
performance, presumably because BOs evicted from VRAM (or not going in
in the first place) aren't getting (back) in.


-- 
Earthling Michel Dänzer   |http://www.vmware.com
Libre software enthusiast |  Debian, X and DRI developer

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.

2010-03-26 Thread Michel Dänzer
On Thu, 2010-03-25 at 19:56 +1000, Dave Airlie wrote: 
 2010/3/25 Michel Dänzer mic...@daenzer.net:
  On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote:
  From: Dave Airlie airl...@redhat.com
 
  On constrained r100 systems compiz would fail to start due to a lack
  of memory, we can just fallback place the objects rather than completely
  failing it works a lot better.
 
  Signed-off-by: Dave Airlie airl...@redhat.com
 
  This change seems to trigger or at least greatly expedite GPU lockups on
  my PowerBook. With the change applied, my normal X session locked up the
  GPU after just a few minutes several times. Now with it reverted it's
  back to the previous stability.
 
 Care to try in pci mode? see if helps, it might be just straining AGP
 a bit more, maybe try 1x as well if you aren't already in it.

After various more tests, the gist of it seems to be that rendering to
GTT with the 3D engine tends to lock up whereas the 2D and DMA engines
work fine (otherwise I probably would have run into this a long time ago
at BO eviction time). Maybe some kind of synchronization issue resulting
in the 3D engine trying to access GTT memory which isn't bound yet /
anymore?


  I don't know why that is - maybe something doesn't properly deal with
  BOs getting placed differently in some cases now - but anyway I suspect
  the implications of this change haven't been fully thought through: The
  log message sounds as though the change was mainly written with
  radeon_bo_create() / radeon_bo_list_validate() in mind, but
  radeon_ttm_placement_from_domain() is also called from other places:
 
   * radeon_bo_pin(): The change could lead to a BO being pinned to
 GTT instead of VRAM, which would probably be bad.
   * radeon_evict_flags(): The change might have undesirable
 consequences here as well, not sure.
 
 The first might be bad, [...]

I actually saw this happen once for the scanout BO of a second X server,
and it resulted in pretty spectacular cascaded failure. This confirmed
my suspicion that it's a very bad idea for radeon_bo_pin().


In summary I'm afraid this isn't really a good solution for what you
were trying to achieve, and it's actually breaking things that were
working before, so I think it should be reverted.


-- 
Earthling Michel Dänzer   |http://www.vmware.com
Libre software enthusiast |  Debian, X and DRI developer

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.

2010-03-25 Thread Michel Dänzer
On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote: 
 From: Dave Airlie airl...@redhat.com
 
 On constrained r100 systems compiz would fail to start due to a lack
 of memory, we can just fallback place the objects rather than completely
 failing it works a lot better.
 
 Signed-off-by: Dave Airlie airl...@redhat.com

This change seems to trigger or at least greatly expedite GPU lockups on
my PowerBook. With the change applied, my normal X session locked up the
GPU after just a few minutes several times. Now with it reverted it's
back to the previous stability.

I don't know why that is - maybe something doesn't properly deal with
BOs getting placed differently in some cases now - but anyway I suspect
the implications of this change haven't been fully thought through: The
log message sounds as though the change was mainly written with
radeon_bo_create() / radeon_bo_list_validate() in mind, but
radeon_ttm_placement_from_domain() is also called from other places:

  * radeon_bo_pin(): The change could lead to a BO being pinned to
GTT instead of VRAM, which would probably be bad. 
  * radeon_evict_flags(): The change might have undesirable
consequences here as well, not sure.

I don't think the way we currently use the same arrays for normal and
busy placement makes a lot of sense, but we probably need a better
mechanism to specify which placements are desirable / acceptable in each
case.


-- 
Earthling Michel Dänzer   |http://www.vmware.com
Libre software enthusiast |  Debian, X and DRI developer






--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.

2010-03-25 Thread Dave Airlie
2010/3/25 Michel Dänzer mic...@daenzer.net:
 On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote:
 From: Dave Airlie airl...@redhat.com

 On constrained r100 systems compiz would fail to start due to a lack
 of memory, we can just fallback place the objects rather than completely
 failing it works a lot better.

 Signed-off-by: Dave Airlie airl...@redhat.com

 This change seems to trigger or at least greatly expedite GPU lockups on
 my PowerBook. With the change applied, my normal X session locked up the
 GPU after just a few minutes several times. Now with it reverted it's
 back to the previous stability.

Care to try in pci mode? see if helps, it might be just straining AGP
a bit more,
maybe try 1x as well if you aren't already in it.


 I don't know why that is - maybe something doesn't properly deal with
 BOs getting placed differently in some cases now - but anyway I suspect
 the implications of this change haven't been fully thought through: The
 log message sounds as though the change was mainly written with
 radeon_bo_create() / radeon_bo_list_validate() in mind, but
 radeon_ttm_placement_from_domain() is also called from other places:

      * radeon_bo_pin(): The change could lead to a BO being pinned to
        GTT instead of VRAM, which would probably be bad.
      * radeon_evict_flags(): The change might have undesirable
        consequences here as well, not sure.

The first might be bad, but the second should be okay, I'll take a closer look
in the morning.

 I don't think the way we currently use the same arrays for normal and
 busy placement makes a lot of sense, but we probably need a better
 mechanism to specify which placements are desirable / acceptable in each
 case.

Well its not really a huge matrix of choices, I'm guessing we need more
info from userspace, or use the info better. Though in theory everything
should work in GTT or VRAM equally well just slower.

Dave.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.

2010-03-25 Thread Michel Dänzer
On Don, 2010-03-25 at 19:56 +1000, Dave Airlie wrote: 
 2010/3/25 Michel Dänzer mic...@daenzer.net:
  On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote:
  From: Dave Airlie airl...@redhat.com
 
  On constrained r100 systems compiz would fail to start due to a lack
  of memory, we can just fallback place the objects rather than completely
  failing it works a lot better.
 
  Signed-off-by: Dave Airlie airl...@redhat.com
 
  This change seems to trigger or at least greatly expedite GPU lockups on
  my PowerBook. With the change applied, my normal X session locked up the
  GPU after just a few minutes several times. Now with it reverted it's
  back to the previous stability.
 
 Care to try in pci mode? see if helps, it might be just straining AGP
 a bit more,

Ugh, k I'll try... but that incurs such a huge performance hit that the
result might not be very meaningful I'm afraid.

 maybe try 1x as well if you aren't already in it.

I am, for some reason neither 2x nor 4x has ever worked reliably with
KMS although 4x was rock solid with UMS.


  I don't know why that is - maybe something doesn't properly deal with
  BOs getting placed differently in some cases now - but anyway I suspect
  the implications of this change haven't been fully thought through: The
  log message sounds as though the change was mainly written with
  radeon_bo_create() / radeon_bo_list_validate() in mind, but
  radeon_ttm_placement_from_domain() is also called from other places:
 
   * radeon_bo_pin(): The change could lead to a BO being pinned to
 GTT instead of VRAM, which would probably be bad.
   * radeon_evict_flags(): The change might have undesirable
 consequences here as well, not sure.
 
 The first might be bad, but the second should be okay, I'll take a closer look
 in the morning.

What about that there are now usually no busy placements specified,
couldn't that be a problem?


  I don't think the way we currently use the same arrays for normal and
  busy placement makes a lot of sense, but we probably need a better
  mechanism to specify which placements are desirable / acceptable in each
  case.
 
 Well its not really a huge matrix of choices, I'm guessing we need more
 info from userspace, or use the info better. Though in theory everything
 should work in GTT or VRAM equally well just slower.

E.g. scanout from GTT could easily exceed the available bandwidth.


-- 
Earthling Michel Dänzer   |http://www.vmware.com
Libre software enthusiast |  Debian, X and DRI developer

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.

2010-03-25 Thread Michel Dänzer
On Thu, 2010-03-25 at 11:11 +0100, Michel Dänzer wrote: 
 On Don, 2010-03-25 at 19:56 +1000, Dave Airlie wrote: 
  2010/3/25 Michel Dänzer mic...@daenzer.net:
   On Fri, 2010-03-19 at 10:35 +1000, Dave Airlie wrote:
   From: Dave Airlie airl...@redhat.com
  
   On constrained r100 systems compiz would fail to start due to a lack
   of memory, we can just fallback place the objects rather than completely
   failing it works a lot better.
  
   Signed-off-by: Dave Airlie airl...@redhat.com
  
   This change seems to trigger or at least greatly expedite GPU lockups on
   my PowerBook. With the change applied, my normal X session locked up the
   GPU after just a few minutes several times. Now with it reverted it's
   back to the previous stability.
  
  Care to try in pci mode? see if helps, it might be just straining AGP
  a bit more,
 
 Ugh, k I'll try... but that incurs such a huge performance hit that the
 result might not be very meaningful I'm afraid.

It didn't lock up in a couple of hours of suffering through PCI, so
maybe it is an AGP problem, or maybe PCI is just too slow to trigger
it... More likely the former though I guess.


   I don't know why that is - maybe something doesn't properly deal with
   BOs getting placed differently in some cases now - but anyway I suspect
   the implications of this change haven't been fully thought through: The
   log message sounds as though the change was mainly written with
   radeon_bo_create() / radeon_bo_list_validate() in mind, but
   radeon_ttm_placement_from_domain() is also called from other places:
  
* radeon_bo_pin(): The change could lead to a BO being pinned to
  GTT instead of VRAM, which would probably be bad.
* radeon_evict_flags(): The change might have undesirable
  consequences here as well, not sure.
  
  The first might be bad, but the second should be okay, I'll take a closer 
  look
  in the morning.
 
 What about that there are now usually no busy placements specified,
 couldn't that be a problem?

FWIW I tried re-using the normal placements for missing busy placements,
didn't help.


-- 
Earthling Michel Dänzer   |http://www.vmware.com
Libre software enthusiast |  Debian, X and DRI developer

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


[PATCH 1/2] drm/radeon/bo: add some fallback placements for VRAM only objects.

2010-03-18 Thread Dave Airlie
From: Dave Airlie airl...@redhat.com

On constrained r100 systems compiz would fail to start due to a lack
of memory, we can just fallback place the objects rather than completely
failing it works a lot better.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 drivers/gpu/drm/radeon/radeon.h|2 ++
 drivers/gpu/drm/radeon/radeon_object.c |   10 +++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 72ed251..69585bb 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -238,7 +238,9 @@ struct radeon_bo {
struct list_headlist;
/* Protected by tbo.reserved */
u32 placements[3];
+   u32 busy_placements[3];
struct ttm_placementplacement;
+   struct ttm_placementbusy_placement;
struct ttm_buffer_objecttbo;
struct ttm_bo_kmap_obj  kmap;
unsignedpin_count;
diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
b/drivers/gpu/drm/radeon/radeon_object.c
index fc9d00a..3580c75 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -65,15 +65,19 @@ bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object 
*bo)
 
 void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 {
-   u32 c = 0;
+   u32 c = 0, b = 0;
 
rbo-placement.fpfn = 0;
rbo-placement.lpfn = 0;
rbo-placement.placement = rbo-placements;
-   rbo-placement.busy_placement = rbo-placements;
+   rbo-placement.busy_placement = rbo-busy_placements;
if (domain  RADEON_GEM_DOMAIN_VRAM)
rbo-placements[c++] = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED |
TTM_PL_FLAG_VRAM;
+   /* add busy placement to TTM if VRAM is only option */
+   if (domain == RADEON_GEM_DOMAIN_VRAM) {
+   rbo-busy_placements[b++] = TTM_PL_MASK_CACHING | 
TTM_PL_FLAG_TT;
+   }
if (domain  RADEON_GEM_DOMAIN_GTT)
rbo-placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_TT;
if (domain  RADEON_GEM_DOMAIN_CPU)
@@ -81,7 +85,7 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, 
u32 domain)
if (!c)
rbo-placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM;
rbo-placement.num_placement = c;
-   rbo-placement.num_busy_placement = c;
+   rbo-placement.num_busy_placement = b;
 }
 
 int radeon_bo_create(struct radeon_device *rdev, struct drm_gem_object *gobj,
-- 
1.6.6.1


--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel