Re: [PATCH] drm/amdgpu: Place new CPU-accessible BOs in GTT if visible VRAM is full

2017-05-18 Thread zhoucm1



On 2017-05-19 11:04, John Brooks wrote:

Set GTT as the busy placement for newly created BOs that have the
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
established BOs to be evicted from visible VRAM.
This patch is consistent with the opinion I mentioned in Marek's thread 
(Plan: BO move throttling for visible VRAM evictions):
"I agree with this opinion; from our performance tuning experiments, this 
case indeed happens often, especially under VRAM memory pressure. 
Redirecting to GTT is better than heavy eviction between VRAM and GTT. 
But we should have a condition for redirecting (an eviction counter?), 
otherwise the BO has no chance to get back to its preferred domain."
You're using the busy placement to solve it, so the BO still tries its 
preferred domain first and has a chance to return to it when memory is 
free. Although this cannot prevent an evil program from occupying memory, 
it still looks reasonable, since in most cases it gains more than heavy 
eviction under VRAM pressure does.
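
For context, a simplified sketch of what "busy placement" means in TTM 
here; this is condensed and illustrative, not the actual patch code:

/* TTM first tries the "placement" list without evicting anything; if
 * that fails, it retries the "busy_placement" list and is then allowed
 * to evict to make room.  Pointing busy_placement at the GTT entry
 * therefore means: prefer VRAM while it is free, and overflow to GTT
 * under pressure instead of evicting other BOs.
 */
struct ttm_placement *p = &bo->placement;

p->placement          = bo->placements;     /* [0] = VRAM, [1] = GTT */
p->num_placement      = 2;
p->busy_placement     = &bo->placements[1]; /* GTT only when busy */
p->num_busy_placement = 1;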


Above all, I vote 'Yes' to your idea, but I would also like to hear what 
others think.


Thanks,
David Zhou


Signed-off-by: John Brooks 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 365883d..655718a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -392,6 +392,17 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
  #endif
  
  	amdgpu_fill_placement_to_bo(bo, placement);
+
+	/* This is a new BO that wants to be CPU-visible; set GTT as the busy
+	 * placement so that it does not evict established BOs from visible
+	 * VRAM.
+	 */
+	if (domain & (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
+	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
+		bo->placement.num_placement = 1;
+		bo->placement.num_busy_placement = 1;
+		bo->placement.busy_placement = &bo->placement.placement[1];
+	}
+
 	/* Kernel allocation are uninterruptible */
  
  	initial_bytes_moved = atomic64_read(&adev->num_bytes_moved);

@@ -484,6 +495,13 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 	memset(&placements, 0,
 	       (AMDGPU_GEM_DOMAIN_MAX + 1) * sizeof(struct ttm_place));
  
+	/* New CPU-visible BOs will have GTT set as their busy placement */
+	if (domain & AMDGPU_GEM_DOMAIN_VRAM &&
+	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
+		domain |= AMDGPU_GEM_DOMAIN_GTT;
+	}
+
 	amdgpu_ttm_placement_init(adev, &placement,
 				  placements, domain, flags);
  


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix ocl test performance drop

2017-05-18 Thread zhoucm1
I also noticed this problem when I did the fix for the sched fence; I 
thought that if it caused no issue, leaving the code as it was would be 
safer. But as Flora pointed out, it results in a performance drop, so we 
need to correct it.
The pipeline_sync in the VM flush should be inserted only when 
amdgpu_vm_ring_has_compute_vm_bug() is true, not every time for every VM 
flush on every engine. The bug has been fixed in the MEC firmware from 
version 673 on gfx8.
Regarding gfx9 and amdgpu_vm_ring_has_compute_vm_bug(): the current 
implementation assumes by default that gfx9 does not have the compute VM 
bug. I am not sure whether the bug is present there. If no VM fault 
related to this bug shows up, we can assume gfx9 does not have it; I 
believe the firmware team builds on the previous generation. Otherwise we 
would need to identify the bug for every generation, which would be crazy.
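
In other words, a minimal sketch of the intended condition, reusing the 
helpers named above (this is not the actual patch):

/* Emit the pipeline sync only when the job needs a flush or the ring
 * is affected by the compute VM bug, instead of unconditionally on
 * every VM flush for every engine.
 */
bool needs_sync = job->vm_needs_flush ||
		  amdgpu_vm_ring_has_compute_vm_bug(ring);

if (ring->funcs->emit_pipeline_sync && needs_sync)
	amdgpu_ring_emit_pipeline_sync(ring);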


Regards,
David Zhou
On 2017-05-19 11:36, Flora Cui wrote:

btw, what's about gfx9 for amdgpu_vm_ring_has_compute_vm_bug()? Is the
workaround still needed?

On Fri, May 19, 2017 at 10:25:19AM +0800, Flora Cui wrote:

On Thu, May 18, 2017 at 01:38:15PM +0200, Christian König wrote:

Am 18.05.2017 um 09:45 schrieb Flora Cui:

partial revert commit <6971d3d> - drm/amdgpu: cleanup logic in
amdgpu_vm_flush

Change-Id: Iadce9d613dfe9a739643a74050cea55854832adb
Signed-off-by: Flora Cui 

I don't see how the revert should be faster than the original.

Especially that amdgpu_vm_had_gpu_reset() is now called twice sounds like
more overhead than necessary.

Please explain further.

Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 +-
  1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 88420dc..a96bad6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -743,23 +743,19 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
amdgpu_job *job)
id->gws_size != job->gws_size ||
id->oa_base != job->oa_base ||
id->oa_size != job->oa_size);
-   bool vm_flush_needed = job->vm_needs_flush ||
-   amdgpu_vm_ring_has_compute_vm_bug(ring);
unsigned patch_offset = 0;
int r;
-   if (amdgpu_vm_had_gpu_reset(adev, id)) {
-   gds_switch_needed = true;
-   vm_flush_needed = true;
-   }
-
-   if (!vm_flush_needed && !gds_switch_needed)
+   if (!job->vm_needs_flush && !gds_switch_needed &&
+   !amdgpu_vm_had_gpu_reset(adev, id) &&
+   !amdgpu_vm_ring_has_compute_vm_bug(ring))
return 0;
if (ring->funcs->init_cond_exec)
patch_offset = amdgpu_ring_init_cond_exec(ring);
-   if (ring->funcs->emit_vm_flush && vm_flush_needed) {

[flora]: for compute ring & amdgpu_vm_ring_has_compute_vm_bug(), a vm_flush is
inserted. This might cause performance drop.

+   if (ring->funcs->emit_vm_flush &&
+   (job->vm_needs_flush || amdgpu_vm_had_gpu_reset(adev, id))) {
struct fence *fence;
trace_amdgpu_vm_flush(ring, job->vm_id, job->vm_pd_addr);



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix ocl test performance drop

2017-05-18 Thread Flora Cui
btw, what's about gfx9 for amdgpu_vm_ring_has_compute_vm_bug()? Is the
workaround still needed?

On Fri, May 19, 2017 at 10:25:19AM +0800, Flora Cui wrote:
> On Thu, May 18, 2017 at 01:38:15PM +0200, Christian König wrote:
> > Am 18.05.2017 um 09:45 schrieb Flora Cui:
> > >partial revert commit <6971d3d> - drm/amdgpu: cleanup logic in
> > >amdgpu_vm_flush
> > >
> > >Change-Id: Iadce9d613dfe9a739643a74050cea55854832adb
> > >Signed-off-by: Flora Cui 
> > 
> > I don't see how the revert should be faster than the original.
> > 
> > Especially that amdgpu_vm_had_gpu_reset() is now called twice sounds like
> > more overhead than necessary.
> > 
> > Please explain further.
> > 
> > Christian.
> > 
> > >---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 +-
> > >  1 file changed, 5 insertions(+), 9 deletions(-)
> > >
> > >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> > >b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > >index 88420dc..a96bad6 100644
> > >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > >@@ -743,23 +743,19 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
> > >amdgpu_job *job)
> > >   id->gws_size != job->gws_size ||
> > >   id->oa_base != job->oa_base ||
> > >   id->oa_size != job->oa_size);
> > >-  bool vm_flush_needed = job->vm_needs_flush ||
> > >-  amdgpu_vm_ring_has_compute_vm_bug(ring);
> > >   unsigned patch_offset = 0;
> > >   int r;
> > >-  if (amdgpu_vm_had_gpu_reset(adev, id)) {
> > >-  gds_switch_needed = true;
> > >-  vm_flush_needed = true;
> > >-  }
> > >-
> > >-  if (!vm_flush_needed && !gds_switch_needed)
> > >+  if (!job->vm_needs_flush && !gds_switch_needed &&
> > >+  !amdgpu_vm_had_gpu_reset(adev, id) &&
> > >+  !amdgpu_vm_ring_has_compute_vm_bug(ring))
> > >   return 0;
> > >   if (ring->funcs->init_cond_exec)
> > >   patch_offset = amdgpu_ring_init_cond_exec(ring);
> > >-  if (ring->funcs->emit_vm_flush && vm_flush_needed) {
> [flora]: for compute ring & amdgpu_vm_ring_has_compute_vm_bug(), a vm_flush is
> inserted. This might cause performance drop.
> > >+  if (ring->funcs->emit_vm_flush &&
> > >+  (job->vm_needs_flush || amdgpu_vm_had_gpu_reset(adev, id))) {
> > >   struct fence *fence;
> > >   trace_amdgpu_vm_flush(ring, job->vm_id, job->vm_pd_addr);
> > 
> > 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Topics for beginners

2017-05-18 Thread Subrat Meher
Hi,

I am interested in working on radeon's Linux graphics stack. Are there any
areas that are more approachable to someone unfamiliar with the stack?

As for existing knowledge, I have worked with OpenGL, OpenCL and embedded
applications.

Apologies if this is not the right place for this discussion.

Thanks,
Subrat
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: Place new CPU-accessible BOs in GTT if visible VRAM is full

2017-05-18 Thread John Brooks
Set GTT as the busy placement for newly created BOs that have the
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
established BOs to be evicted from visible VRAM.

Signed-off-by: John Brooks 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 365883d..655718a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -392,6 +392,17 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
 #endif
 
amdgpu_fill_placement_to_bo(bo, placement);
+
+   /* This is a new BO that wants to be CPU-visible; set GTT as the busy
+* placement so that it does not evict established BOs from visible
+* VRAM.
+*/
+   if (domain & (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
+   flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
+   bo->placement.num_placement = 1;
+   bo->placement.num_busy_placement = 1;
+   bo->placement.busy_placement = &bo->placement.placement[1];
+   }
+
/* Kernel allocation are uninterruptible */
 
	initial_bytes_moved = atomic64_read(&adev->num_bytes_moved);
@@ -484,6 +495,13 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
	memset(&placements, 0,
   (AMDGPU_GEM_DOMAIN_MAX + 1) * sizeof(struct ttm_place));
 
+
+   /* New CPU-visible BOs will have GTT set as their busy placement */
+   if (domain & AMDGPU_GEM_DOMAIN_VRAM &&
+   flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
+   domain |= AMDGPU_GEM_DOMAIN_GTT;
+   }
+
	amdgpu_ttm_placement_init(adev, &placement,
  placements, domain, flags);
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM

2017-05-18 Thread John Brooks
I'm glad this is being worked on. However, somewhat to my surprise, this patch
series didn't help Dying Light's BO eviction problem. For those who don't know,
that game performs very badly in certain areas, and it is correlated with
increased TTM eviction rates. Relevant screenshots of gallium HUD and sysprof:

http://www.fastquake.com/images/screen-dlgalliumhud1-20170513-171241.png
http://www.fastquake.com/images/screen-dlsysprof-20170515-225919.png

I noticed last week that adding RADEON_DOMAIN_GTT to the domains in radeonsi
(patch: http://www.fastquake.com/files/text/radeon-gtt.txt ) greatly improved
performance in these areas, to the tune of about a 30fps increase. Obviously,
putting GTT in every buffer's domain is not a proper solution. But it led me
to believe that perhaps the problem wasn't just the swapping of resident BOs,
but the creation of new ones that only have VRAM in their domain, and they
cause existing BOs to be evicted from visible VRAM unconditionally.

The attached patch assigns GTT as the busy placement for newly created BOs that
have the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they will go to
GTT if visible VRAM is full, instead of evicting established BOs. Since there
is no way to know what the usage patterns of a new BO will be, we shouldn't
evict established BOs (for which we have hypothetically had the opportunity to
gather usage data) from visible VRAM for new, unknown BOs.

With this patch I get hugely improved performance in Dying Light just like with
the Mesa patch: I observed 30-40fps where I got 14 before, and 60fps where I
got 40 before. TTM evictions and bytes moved have dropped to zero where they
were exceedingly high before. Buffer evictions no longer dominate the sysprof
trace. Screenshots:

http://www.fastquake.com/images/screen-dl-gtt_busy_only-20170518-192602.png
http://www.fastquake.com/images/screen-dlsysprof-gttpatch-20170518-223200.png

--
John Brooks (Frogging101)

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix ocl test performance drop

2017-05-18 Thread Flora Cui
On Thu, May 18, 2017 at 01:38:15PM +0200, Christian König wrote:
> Am 18.05.2017 um 09:45 schrieb Flora Cui:
> >partial revert commit <6971d3d> - drm/amdgpu: cleanup logic in
> >amdgpu_vm_flush
> >
> >Change-Id: Iadce9d613dfe9a739643a74050cea55854832adb
> >Signed-off-by: Flora Cui 
> 
> I don't see how the revert should be faster than the original.
> 
> Especially that amdgpu_vm_had_gpu_reset() is now called twice sounds like
> more overhead than necessary.
> 
> Please explain further.
> 
> Christian.
> 
> >---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 +-
> >  1 file changed, 5 insertions(+), 9 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> >b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >index 88420dc..a96bad6 100644
> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> >@@ -743,23 +743,19 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
> >amdgpu_job *job)
> > id->gws_size != job->gws_size ||
> > id->oa_base != job->oa_base ||
> > id->oa_size != job->oa_size);
> >-bool vm_flush_needed = job->vm_needs_flush ||
> >-amdgpu_vm_ring_has_compute_vm_bug(ring);
> > unsigned patch_offset = 0;
> > int r;
> >-if (amdgpu_vm_had_gpu_reset(adev, id)) {
> >-gds_switch_needed = true;
> >-vm_flush_needed = true;
> >-}
> >-
> >-if (!vm_flush_needed && !gds_switch_needed)
> >+if (!job->vm_needs_flush && !gds_switch_needed &&
> >+!amdgpu_vm_had_gpu_reset(adev, id) &&
> >+!amdgpu_vm_ring_has_compute_vm_bug(ring))
> > return 0;
> > if (ring->funcs->init_cond_exec)
> > patch_offset = amdgpu_ring_init_cond_exec(ring);
> >-if (ring->funcs->emit_vm_flush && vm_flush_needed) {
[flora]: for compute ring & amdgpu_vm_ring_has_compute_vm_bug(), a vm_flush is
inserted. This might cause performance drop.
> >+if (ring->funcs->emit_vm_flush &&
> >+(job->vm_needs_flush || amdgpu_vm_had_gpu_reset(adev, id))) {
> > struct fence *fence;
> > trace_amdgpu_vm_flush(ring, job->vm_id, job->vm_pd_addr);
> 
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first

2017-05-18 Thread Michel Dänzer
On 19/05/17 10:43 AM, John Brooks wrote:
> On Fri, May 19, 2017 at 10:20:37AM +0900, Michel Dänzer wrote:
>> On 19/05/17 12:43 AM, John Brooks wrote:
>>> On Thu, May 18, 2017 at 06:08:09PM +0900, Michel Dänzer wrote:
>>>> From: Michel Dänzer 
>>>>
>>>> In exchange, move BOs with the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
>>>> flag set to CPU visible VRAM with more force.
>>>>
>>>> For other BOs, this gives another chance to stay in VRAM if they
>>>> happened to lie in the CPU visible part and another BO needs to go
>>>> there.
>>>>
>>>> This should allow BOs to stay in VRAM longer in some cases.
>>>>
>>>> Signed-off-by: Michel Dänzer 
>>
>> [...]
>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> index 57789b860768..d5ed85026542 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> @@ -206,7 +206,34 @@ static void amdgpu_evict_flags(struct 
>>>> ttm_buffer_object *bo,
>>>> 	adev->mman.buffer_funcs_ring &&
>>>> 	adev->mman.buffer_funcs_ring->ready == false) {
>>>> 		amdgpu_ttm_placement_from_domain(abo, 
>>>> AMDGPU_GEM_DOMAIN_CPU);
>>>> +	} else if (adev->mc.visible_vram_size < 
>>>> adev->mc.real_vram_size) {
>>>> +		unsigned fpfn = adev->mc.visible_vram_size >> 
>>>> PAGE_SHIFT;
>>>> +		struct drm_mm_node *node = bo->mem.mm_node;
>>>> +		unsigned long pages_left;
>>>> +
>>>> +		for (pages_left = bo->mem.num_pages;
>>>> +		     pages_left;
>>>> +		     pages_left -= node->size, node++) {
>>>> +			if (node->start < fpfn)
>>>> +				break;
>>>> +		}
>>>> +
>>>> +		if (!pages_left)
>>>> +			goto gtt;
>>>> +
>>>> +		/* Try evicting to the CPU inaccessible part of VRAM
>>>> +		 * first, but only set GTT as busy placement, so this
>>>> +		 * BO will be evicted to GTT rather than causing other
>>>> +		 * BOs to be evicted from VRAM
>>>> +		 */
>>>> +		amdgpu_ttm_placement_from_domain(abo, 
>>>> AMDGPU_GEM_DOMAIN_VRAM |
>>>> +						 AMDGPU_GEM_DOMAIN_GTT);
>>>> +		abo->placements[0].fpfn = fpfn;
>>>> +		abo->placements[0].lpfn = 0;
>>>> +		abo->placement.busy_placement = &abo->placements[1];
>>>
>>> Are you sure you want to hardcode the placements index? It'll be dependent 
>>> on
>>> the order set up in amdgpu_ttm_placement_init.
>>
>> Yes, see patch 1. Looping over the placements and testing their contents
>> is silly when we know exactly how they were set up. 
> 
> Agreed
> 
>> Or do you mean this code shouldn't call amdgpu_ttm_placement_from_domain at
>> all and just set up the placements itself?
> 
> Calling amdgpu_ttm_placement_from_domain makes sense. I was just imagining a
> scenario where code like this that makes assumptions about the ordering of
> placements in the array would break silently if that order were changed, and
> you'd have to go about finding the places where integer literals were used to
> address specific placements.

Right, if we make changes to amdgpu_ttm_placement_init, we'll need to
audit and possibly update its callers. I think that's reasonable, though
others might disagree. :)

Thanks for your feedback.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first

2017-05-18 Thread John Brooks
On Fri, May 19, 2017 at 10:20:37AM +0900, Michel Dänzer wrote:
> On 19/05/17 12:43 AM, John Brooks wrote:
> > On Thu, May 18, 2017 at 06:08:09PM +0900, Michel Dänzer wrote:
> >> From: Michel Dänzer 
> >>
> >> In exchange, move BOs with the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
> >> flag set to CPU visible VRAM with more force.
> >>
> >> For other BOs, this gives another chance to stay in VRAM if they
> >> happened to lie in the CPU visible part and another BO needs to go
> >> there.
> >>
> >> This should allow BOs to stay in VRAM longer in some cases.
> >>
> >> Signed-off-by: Michel Dänzer 
> 
> [...]
> 
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> index 57789b860768..d5ed85026542 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> @@ -206,7 +206,34 @@ static void amdgpu_evict_flags(struct 
> >> ttm_buffer_object *bo,
> >>adev->mman.buffer_funcs_ring &&
> >>adev->mman.buffer_funcs_ring->ready == false) {
> >>amdgpu_ttm_placement_from_domain(abo, 
> >> AMDGPU_GEM_DOMAIN_CPU);
> >> +  } else if (adev->mc.visible_vram_size < 
> >> adev->mc.real_vram_size) {
> >> +  unsigned fpfn = adev->mc.visible_vram_size >> 
> >> PAGE_SHIFT;
> >> +  struct drm_mm_node *node = bo->mem.mm_node;
> >> +  unsigned long pages_left;
> >> +
> >> +  for (pages_left = bo->mem.num_pages;
> >> +   pages_left;
> >> +   pages_left -= node->size, node++) {
> >> +  if (node->start < fpfn)
> >> +  break;
> >> +  }
> >> +
> >> +  if (!pages_left)
> >> +  goto gtt;
> >> +
> >> +  /* Try evicting to the CPU inaccessible part of VRAM
> >> +   * first, but only set GTT as busy placement, so this
> >> +   * BO will be evicted to GTT rather than causing other
> >> +   * BOs to be evicted from VRAM
> >> +   */
> >> +  amdgpu_ttm_placement_from_domain(abo, 
> >> AMDGPU_GEM_DOMAIN_VRAM |
> >> +   AMDGPU_GEM_DOMAIN_GTT);
> >> +  abo->placements[0].fpfn = fpfn;
> >> +  abo->placements[0].lpfn = 0;
> >> +  abo->placement.busy_placement = &abo->placements[1];
> > 
> > Are you sure you want to hardcode the placements index? It'll be dependent 
> > on
> > the order set up in amdgpu_ttm_placement_init.
> 
> Yes, see patch 1. Looping over the placements and testing their contents
> is silly when we know exactly how they were set up. 

Agreed

> Or do you mean this code shouldn't call amdgpu_ttm_placement_from_domain at
> all and just set up the placements itself?

Calling amdgpu_ttm_placement_from_domain makes sense. I was just imagining a
scenario where code like this that makes assumptions about the ordering of
placements in the array would break silently if that order were changed, and
you'd have to go about finding the places where integer literals were used to
address specific placements.
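
One defensive option would be named indices; purely illustrative, the enum
below is hypothetical and neither patch does this:

/* Hypothetical names mirroring the order amdgpu_ttm_placement_init()
 * uses when both domains are requested (VRAM first, then GTT), so a
 * hardcoded subscript reads as intent rather than a magic number.
 */
enum {
	AMDGPU_PLACEMENT_VRAM = 0,
	AMDGPU_PLACEMENT_GTT  = 1,
};

abo->placements[AMDGPU_PLACEMENT_VRAM].fpfn = fpfn;
abo->placements[AMDGPU_PLACEMENT_VRAM].lpfn = 0;
abo->placement.busy_placement = &abo->placements[AMDGPU_PLACEMENT_GTT];
abo->placement.num_busy_placement = 1;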

> 
> -- 
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer

--
John Brooks
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] iommu/amd: flush IOTLB for specific domains only

2017-05-18 Thread Michel Dänzer
On 07/04/17 07:20 PM, Joerg Roedel wrote:
> On Mon, Mar 27, 2017 at 11:47:07AM +0530, arindam.n...@amd.com wrote:
>> From: Arindam Nath 
>>
>> The idea behind flush queues is to defer the IOTLB flushing
>> for domains for which the mappings are no longer valid. We
>> add such domains in queue_add(), and when the queue size
>> reaches FLUSH_QUEUE_SIZE, we perform __queue_flush().
>>
>> Since we have already taken lock before __queue_flush()
>> is called, we need to make sure the IOTLB flushing is
>> performed as quickly as possible.
>>
>> In the current implementation, we perform IOTLB flushing
>> for all domains irrespective of which ones were actually
>> added in the flush queue initially. This can be quite
>> expensive especially for domains for which unmapping is
>> not required at this point of time.
>>
>> This patch makes use of domain information in
>> 'struct flush_queue_entry' to make sure we only flush
>> IOTLBs for domains who need it, skipping others.
>>
>> Signed-off-by: Arindam Nath 
>> ---
>>  drivers/iommu/amd_iommu.c | 15 ---
>>  1 file changed, 8 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 98940d1..6a9a048 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -2227,15 +2227,16 @@ static struct iommu_group 
>> *amd_iommu_device_group(struct device *dev)
>>  
>>  static void __queue_flush(struct flush_queue *queue)
>>  {
>> -struct protection_domain *domain;
>> -unsigned long flags;
>>  int idx;
>>  
>> -/* First flush TLB of all known domains */
>> -spin_lock_irqsave(&amd_iommu_pd_lock, flags);
>> -list_for_each_entry(domain, &amd_iommu_pd_list, list)
>> -domain_flush_tlb(domain);
>> -spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
>> +/* First flush TLB of all domains which were added to flush queue */
>> +for (idx = 0; idx < queue->next; ++idx) {
>> +struct flush_queue_entry *entry;
>> +
>> +entry = queue->entries + idx;
>> +
>> +domain_flush_tlb(&entry->dma_dom->domain);
>> +}
> 
> With this we will flush a domain every time we find one of its
> iova-addresses in the flush queue, so potentially we flush a domain
> multiple times per __queue_flush() call.
> 
> Its better to either add a flush-flag to the domains and evaluate that
> in __queue_flush or keep a list of domains to flush to make the flushing
> really more efficient.
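
A rough sketch of the flush-flag variant Joerg describes (the flush_pending
field is hypothetical; the queue layout follows the quoted code):

static void __queue_flush(struct flush_queue *queue)
{
	int idx;

	/* Mark each domain that appears in the queue... */
	for (idx = 0; idx < queue->next; ++idx)
		queue->entries[idx].dma_dom->domain.flush_pending = true;

	/* ...then flush every marked domain exactly once, clearing the
	 * mark so duplicates in the queue are skipped.
	 */
	for (idx = 0; idx < queue->next; ++idx) {
		struct protection_domain *domain =
			&queue->entries[idx].dma_dom->domain;

		if (domain->flush_pending) {
			domain_flush_tlb(domain);
			domain->flush_pending = false;
		}
	}
}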

Arindam, can you incorporate Joerg's feedback?

FWIW, looks like Carrizo systems are affected by this as well (see e.g.
https://bugs.freedesktop.org/show_bug.cgi?id=101029#c21), so it would be
good to land this fix in some form ASAP.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd: include <linux/delay.h> instead of "linux/delay.h"

2017-05-18 Thread Masahiro Yamada
Use <...> notation to include headers located in include/linux.
While we are here, tweak the include order a bit to sort the headers
alphabetically.

Signed-off-by: Masahiro Yamada 
---
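
For illustration: after this change each file should end up with a single
alphabetically sorted run of bracketed includes, along these lines
(hypothetical selection of headers, not the exact per-file lists):

#include <linux/delay.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/types.h>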

 drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c| 4 ++--
 drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c  | 2 +-
 drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c   | 4 ++--
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 5 +++--
 drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c   | 8 +---
 drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c  | 5 +++--
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
index ff4ae3d..963a9e0 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
@@ -22,10 +22,10 @@
  */
 
 #include "pp_debug.h"
-#include "linux/delay.h"
-#include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include "cgs_common.h"
 #include "power_state.h"
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c
index f5e8fda..f6b4dd9 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c
@@ -21,8 +21,8 @@
  *
  */
 
+#include 
 #include 
-#include "linux/delay.h"
 #include "hwmgr.h"
 #include "amd_acpi.h"
 #include "pp_acpi.h"
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
index 8f663ab..581374d 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
@@ -21,11 +21,11 @@
  *
  */
 #include "pp_debug.h"
+#include 
+#include 
 #include 
 #include 
-#include 
 #include 
-#include "linux/delay.h"
 #include "pp_acpi.h"
 #include "ppatomctrl.h"
 #include "atombios.h"
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
index 8394955..f4ab81b 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
@@ -20,10 +20,11 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
+
+#include 
+#include 
 #include 
 #include 
-#include 
-#include "linux/delay.h"
 
 #include "hwmgr.h"
 #include "amd_powerplay.h"
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c 
b/drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c
index 1f6744a..39c7091 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c
@@ -20,11 +20,13 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
-#include 
+
+#include 
+#include 
 #include 
 #include 
-#include 
-#include "linux/delay.h"
+#include 
+
 #include "cgs_common.h"
 #include "smu/smu_8_0_d.h"
 #include "smu/smu_8_0_sh_mask.h"
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c 
b/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c
index c0d7576..2e954a4 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c
@@ -20,15 +20,16 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
-#include 
+
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "pp_instance.h"
 #include "smumgr.h"
 #include "cgs_common.h"
-#include "linux/delay.h"
 
 MODULE_FIRMWARE("amdgpu/topaz_smc.bin");
 MODULE_FIRMWARE("amdgpu/topaz_k_smc.bin");
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/powerplay: ensure loop does not wraparound on decrement

2017-05-18 Thread walter harms


Am 17.05.2017 20:13, schrieb Colin King:
> From: Colin Ian King 
> 
> The current for loop decrements i when it is zero and this causes
> a wrap-around back to ~0 because i is unsigned. In the unlikely event
> that mask is 0, the loop will run forever. Fix this so we can't loop
> forever.
> 
> Detected by CoverityScan, CID#1435469 ("Unsigned compared against 0")
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> index ad30f5d3a10d..d92c9b9b15be 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> @@ -4199,7 +4199,7 @@ static int vega10_force_clock_level(struct pp_hwmgr 
> *hwmgr,
>   }
>   data->smc_state_table.gfx_boot_level = i;
>  
> - for (i = 31; i >= 0; i--) {
> + for (i = 32; --i; ) {
>   if (mask & (1 << i))
>   break;
>   }

nitpicking:
we have noticed at several points that programmers are bad at counting
backwards. Is there a reason not to start with i = 0?

re,
 wh
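
To illustrate both points, a standalone sketch (not from the patch) of why
the original loop cannot terminate for an unsigned counter, and what the
fixed form computes:

#include <stdio.h>

int main(void)
{
	unsigned int mask = 0x90; /* example: bits 7 and 4 set */
	unsigned int i;

	/* Original form: "i >= 0" is always true for unsigned i,
	 * because 0u - 1 wraps to UINT_MAX, so with mask == 0 the
	 * loop would never exit.
	 *
	 * Fixed form: the body runs for i = 31 down to 1 and the loop
	 * exits with i == 0.  Bit 0 is never tested in the body, but
	 * the result is the same either way: i ends up 0 unless a
	 * higher bit is set.
	 */
	for (i = 32; --i; ) {
		if (mask & (1u << i))
			break;
	}

	printf("highest set bit: %u\n", i); /* prints 7 for 0x90 */
	return 0;
}
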
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first

2017-05-18 Thread Michel Dänzer
On 19/05/17 12:43 AM, John Brooks wrote:
> On Thu, May 18, 2017 at 06:08:09PM +0900, Michel Dänzer wrote:
>> From: Michel Dänzer 
>>
>> In exchange, move BOs with the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
>> flag set to CPU visible VRAM with more force.
>>
>> For other BOs, this gives another chance to stay in VRAM if they
>> happened to lie in the CPU visible part and another BO needs to go
>> there.
>>
>> This should allow BOs to stay in VRAM longer in some cases.
>>
>> Signed-off-by: Michel Dänzer 

[...]

>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 57789b860768..d5ed85026542 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -206,7 +206,34 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
>> *bo,
>>  adev->mman.buffer_funcs_ring &&
>>  adev->mman.buffer_funcs_ring->ready == false) {
>>  amdgpu_ttm_placement_from_domain(abo, 
>> AMDGPU_GEM_DOMAIN_CPU);
>> +} else if (adev->mc.visible_vram_size < 
>> adev->mc.real_vram_size) {
>> +unsigned fpfn = adev->mc.visible_vram_size >> 
>> PAGE_SHIFT;
>> +struct drm_mm_node *node = bo->mem.mm_node;
>> +unsigned long pages_left;
>> +
>> +for (pages_left = bo->mem.num_pages;
>> + pages_left;
>> + pages_left -= node->size, node++) {
>> +if (node->start < fpfn)
>> +break;
>> +}
>> +
>> +if (!pages_left)
>> +goto gtt;
>> +
>> +/* Try evicting to the CPU inaccessible part of VRAM
>> + * first, but only set GTT as busy placement, so this
>> + * BO will be evicted to GTT rather than causing other
>> + * BOs to be evicted from VRAM
>> + */
>> +amdgpu_ttm_placement_from_domain(abo, 
>> AMDGPU_GEM_DOMAIN_VRAM |
>> + AMDGPU_GEM_DOMAIN_GTT);
>> +abo->placements[0].fpfn = fpfn;
>> +abo->placements[0].lpfn = 0;
>> +abo->placement.busy_placement = &abo->placements[1];
> 
> Are you sure you want to hardcode the placements index? It'll be dependent on
> the order set up in amdgpu_ttm_placement_init.

Yes, see patch 1. Looping over the placements and testing their contents
is silly when we know exactly how they were set up. Or do you mean this
code shouldn't call amdgpu_ttm_placement_from_domain at all and just set
up the placements itself?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm: Harmonize CIK ASIC support in radeon and amdgpu (v2)

2017-05-18 Thread Michel Dänzer
On 19/05/17 04:35 AM, Felix Kuehling wrote:
> On 17-04-27 05:22 AM, Michel Dänzer wrote:
>> On 27/04/17 05:15 AM, Felix Kuehling wrote:
>>> Hi Michel,
>>>
>>> You said in an earlier email that it doesn't have to be convenient. With
>>> that in mind, I went back to a minimalistic solution that doesn't need
>>> any additional Kconfig options and only uses two module parameters (one
>>> in radeon, one in amdgpu).
>>>
>>> I'm attaching my WIP patches for reference (currently based on
>>> amd-kfd-staging). For an official review I'd rebase them on
>>> amd-staging-4.9, remove the #if-0-ed hack that didn't work out, and add
>>> the same for SI.
>>>
>>> Can everyone agree that this is good enough?
>> Yes, something like this could be a good first step in the right
>> direction. Some comments:
>>
>> The radeon options should be available regardless of
>> CONFIG_DRM_AMDGPU_CIK/SI, or they won't be useful for amdgpu-pro / other
>> out-of-tree amdgpu builds.
> 
> Hmm, when an amdgpu-pro build is installed on a system with an older
> kernel, radeon won't have the module option at all. Unless the pro-build
> blacklists radeon, or replaces radeon with its own version, it will
> always have to be compiled without CIK support.
> 
> Therefore, I think my patch, meant for upstream, should not consider
> this case.

Obviously, out-of-tree module builds will have to continue doing what
they've been doing so far with kernels without your patch. I'm thinking
about making it easier for them with kernels which do have your patch.
At some point in the future, maybe the support for kernels without your
patch can be dropped entirely.


>> The default at this point should possibly still be for CIK GPUs to be
>> driven by radeon, even if CONFIG_DRM_AMDGPU_CIK is enabled;
> 
> Alex and Christian seem to think otherwise.

FWIW, on my AMD notebook (HP EliteBook 725 G2, Kaveri), suspend/resume
with amdgpu results in a black screen (can reboot blindly); works fine
with radeon.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 5/5] drm/amdgpu: Update Kconfig help for SI and CIK support

2017-05-18 Thread Felix Kuehling
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/Kconfig | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index f3b6df8..468a19b 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -5,15 +5,23 @@ config DRM_AMDGPU_SI
  Choose this option if you want to enable experimental support
  for SI asics.
 
+ SI is already supported in radeon. Experimental support for SI
+ in amdgpu will be disabled by default and is still provided by
+ radeon. Use module options to override this:
+
+ radeon.si_support=0 amdgpu.si_support=1
+
 config DRM_AMDGPU_CIK
bool "Enable amdgpu support for CIK parts"
depends on DRM_AMDGPU
help
- Choose this option if you want to enable experimental support
- for CIK asics.
+ Choose this option if you want to enable support for CIK asics.
+
+ CIK is already supported in radeon. If you enable this option,
+ support for CIK will be provided by amdgpu and disabled in
+ radeon by default. Use module options to override this:
 
- CIK is already supported in radeon.  CIK support in amdgpu
- is for experimentation and testing.
+ radeon.cik_support=1 amdgpu.cik_support=0
 
 config DRM_AMDGPU_USERPTR
bool "Always enable userptr write support"
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/5] drm/amdgpu: Add module param to control CIK support

2017-05-18 Thread Felix Kuehling
If AMDGPU supports CIK, add a module parameter to control CIK
support. It's on by default in AMDGPU, while it is off by default
in radeon.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 15 +++
 3 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ae7e775..be1cb3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -114,6 +114,10 @@
 extern int amdgpu_param_buf_per_se;
 extern int amdgpu_job_hang_limit;
 
+#ifdef CONFIG_DRM_AMDGPU_CIK
+extern int amdgpu_cik_support;
+#endif
+
 #define AMDGPU_DEFAULT_GTT_SIZE_MB 3072ULL /* 3GB by default */
 #define AMDGPU_WAIT_IDLE_TIMEOUT_IN_MS 3000
 #define AMDGPU_MAX_USEC_TIMEOUT	100000	/* 100 ms */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 130c45d..f131ac2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -241,6 +241,12 @@
 MODULE_PARM_DESC(job_hang_limit, "how much time allow a job hang and not drop 
it (default 0)");
 module_param_named(job_hang_limit, amdgpu_job_hang_limit, int ,0444);
 
+#ifdef CONFIG_DRM_AMDGPU_CIK
+int amdgpu_cik_support = 1;
+MODULE_PARM_DESC(cik_support, "CIK support (1 = enabled (default), 0 = 
disabled)");
+module_param_named(cik_support, amdgpu_cik_support, int, 0444);
+#endif
+
 
 static const struct pci_device_id pciidlist[] = {
 #ifdef  CONFIG_DRM_AMDGPU_SI
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 75d0eba..a2171c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -88,6 +88,21 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
struct amdgpu_device *adev;
int r, acpi_status;
 
+#ifdef CONFIG_DRM_AMDGPU_CIK
+   if (!amdgpu_cik_support) {
+   switch (flags & AMD_ASIC_MASK) {
+   case CHIP_KAVERI:
+   case CHIP_BONAIRE:
+   case CHIP_HAWAII:
+   case CHIP_KABINI:
+   case CHIP_MULLINS:
+   dev_info(dev->dev,
+"CIK support disabled by module param\n");
+   return -ENODEV;
+   }
+   }
+#endif
+
adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
if (adev == NULL) {
return -ENOMEM;
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/5] drm/radeon: Add module param to control SI support

2017-05-18 Thread Felix Kuehling
If AMDGPU supports SI, add a module parameter to control SI
support in radeon. It's on by default in radeon, while it will be
off by default in AMDGPU as long as SI support is experimental.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/radeon/radeon.h |  3 +++
 drivers/gpu/drm/radeon/radeon_drv.c |  6 ++
 drivers/gpu/drm/radeon/radeon_kms.c | 14 ++
 3 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8d2a021..8698812 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -116,6 +116,9 @@
 extern int radeon_uvd;
 extern int radeon_vce;
 
+#ifdef CONFIG_DRM_AMDGPU_SI
+extern int radeon_si_support;
+#endif
 #ifdef CONFIG_DRM_AMDGPU_CIK
 extern int radeon_cik_support;
 #endif
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c 
b/drivers/gpu/drm/radeon/radeon_drv.c
index 2cdb01b..247a923 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -301,6 +301,12 @@ static inline void radeon_unregister_atpx_handler(void) {}
 MODULE_PARM_DESC(vce, "vce enable/disable vce support (1 = enable, 0 = 
disable)");
 module_param_named(vce, radeon_vce, int, 0444);
 
+#ifdef CONFIG_DRM_AMDGPU_SI
+int radeon_si_support = 1;
+MODULE_PARM_DESC(si_support, "SI support (1 = enabled (default), 0 = 
disabled)");
+module_param_named(si_support, radeon_si_support, int, 0444);
+#endif
+
 #ifdef CONFIG_DRM_AMDGPU_CIK
 int radeon_cik_support = 0;
 MODULE_PARM_DESC(cik_support, "CIK support (1 = enabled, 0 = disabled 
(default))");
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 9453e28..63d9f2d 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -99,6 +99,20 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
struct radeon_device *rdev;
int r, acpi_status;
 
+#ifdef CONFIG_DRM_AMDGPU_SI
+   if (!radeon_si_support) {
+   switch (flags & RADEON_FAMILY_MASK) {
+   case CHIP_TAHITI:
+   case CHIP_PITCAIRN:
+   case CHIP_VERDE:
+   case CHIP_OLAND:
+   case CHIP_HAINAN:
+   dev_info(dev->dev,
+"SI support disabled by module param\n");
+   return -ENODEV;
+   }
+   }
+#endif
 #ifdef CONFIG_DRM_AMDGPU_CIK
if (!radeon_cik_support) {
switch (flags & RADEON_FAMILY_MASK) {
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 4/5] drm/amdgpu: Add module param to control SI support

2017-05-18 Thread Felix Kuehling
If AMDGPU supports SI, add a module parameter to control SI
support. It's off by default in AMDGPU as long as SI support is
experimental, while it is on by default in radeon.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 17 +
 3 files changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index be1cb3b..de7e409 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -114,6 +114,9 @@
 extern int amdgpu_param_buf_per_se;
 extern int amdgpu_job_hang_limit;
 
+#ifdef CONFIG_DRM_AMDGPU_SI
+extern int amdgpu_si_support;
+#endif
 #ifdef CONFIG_DRM_AMDGPU_CIK
 extern int amdgpu_cik_support;
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f131ac2..24297a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -241,6 +241,12 @@
 MODULE_PARM_DESC(job_hang_limit, "how much time allow a job hang and not drop 
it (default 0)");
 module_param_named(job_hang_limit, amdgpu_job_hang_limit, int ,0444);
 
+#ifdef CONFIG_DRM_AMDGPU_SI
+int amdgpu_si_support = 1;
+MODULE_PARM_DESC(si_support, "SI support (1 = enabled, 0 = disabled 
(default))");
+module_param_named(si_support, amdgpu_si_support, int, 0444);
+#endif
+
 #ifdef CONFIG_DRM_AMDGPU_CIK
 int amdgpu_cik_support = 1;
 MODULE_PARM_DESC(cik_support, "CIK support (1 = enabled (default), 0 = 
disabled)");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index a2171c3..f418fa4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -88,6 +88,23 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
struct amdgpu_device *adev;
int r, acpi_status;
 
+#ifdef CONFIG_DRM_AMDGPU_SI
+   if (!amdgpu_si_support) {
+   switch (flags & AMD_ASIC_MASK) {
+   case CHIP_TAHITI:
+   case CHIP_PITCAIRN:
+   case CHIP_VERDE:
+   case CHIP_OLAND:
+   case CHIP_HAINAN:
+   dev_info(dev->dev,
+"SI support provided by radeon.\n");
+   dev_info(dev->dev,
+   "Use radeon.si_support=0 amdgpu.si_support=1 to override.\n"
+   );
+   return -ENODEV;
+   }
+   }
+#endif
 #ifdef CONFIG_DRM_AMDGPU_CIK
if (!amdgpu_cik_support) {
switch (flags & AMD_ASIC_MASK) {
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm: Harmonize CIK ASIC support in radeon and amdgpu (v2)

2017-05-18 Thread Felix Kuehling
On 17-04-27 05:22 AM, Michel Dänzer wrote:
> On 27/04/17 05:15 AM, Felix Kuehling wrote:
>> Hi Michel,
>>
>> You said in an earlier email that it doesn't have to be convenient. With
>> that in mind, I went back to a minimalistic solution that doesn't need
>> any additional Kconfig options and only uses two module parameters (one
>> in radeon, one in amdgpu).
>>
>> I'm attaching my WIP patches for reference (currently based on
>> amd-kfd-staging). For an official review I'd rebase them on
>> amd-staging-4.9, remove the #if-0-ed hack that didn't work out, and add
>> the same for SI.
>>
>> Can everyone agree that this is good enough?
> Yes, something like this could be a good first step in the right
> direction. Some comments:
>
> The radeon options should be available regardless of
> CONFIG_DRM_AMDGPU_CIK/SI, or they won't be useful for amdgpu-pro / other
> out-of-tree amdgpu builds.

Hmm, when an amdgpu-pro build is installed on a system with an older
kernel, radeon won't have the module option at all. Unless the pro-build
blacklists radeon, or replaces radeon with its own version, it will
always have to be compiled without CIK support.

Therefore, I think my patch, meant for upstream, should not consider
this case.

>
> The default at this point should possibly still be for CIK GPUs to be
> driven by radeon, even if CONFIG_DRM_AMDGPU_CIK is enabled;

Alex and Christian seem to think otherwise.

Regards,
  Felix

>  more
> definitely so for SI. Then we can flip the default and remove
> CONFIG_DRM_AMDGPU_CIK/SI at some point in the future, and (much) later
> maybe remove the CIK/SI code from radeon.
>
>

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/radeon: Fix oops upon driver load on PowerXpress laptops

2017-05-18 Thread Lukas Wunner
Nicolai Stange reports the following oops which is caused by
dereferencing rdev->pdev before it's subsequently set by
radeon_device_init().  Fix it.
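
A condensed paraphrase of the ordering in radeon_driver_load_kms()
(illustrative, not a verbatim excerpt):

int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
{
	struct radeon_device *rdev;
	int r;

	rdev = kzalloc(sizeof(struct radeon_device), GFP_KERNEL);
	if (rdev == NULL)
		return -ENOMEM;
	dev->dev_private = (void *)rdev;

	/* rdev->pdev is still NULL here; it is only filled in by
	 * radeon_device_init() below.  The Thunderbolt check must
	 * therefore use dev->pdev, not rdev->pdev.
	 */
	if ((radeon_runtime_pm != 0) &&
	    radeon_has_atpx() &&
	    ((flags & RADEON_IS_IGP) == 0) &&
	    !pci_is_thunderbolt_attached(dev->pdev))
		flags |= RADEON_IS_PX;

	r = radeon_device_init(rdev, dev, dev->pdev, flags);
	/* ... */
	return r;
}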

  BUG: unable to handle kernel NULL pointer dereference at 07cb
  IP: radeon_driver_load_kms+0xeb/0x230 [radeon]
  PGD 0
  P4D 0

  Oops:  [#1] SMP
  Modules linked in: amdkfd amd_iommu_v2 i915(+) radeon(+) i2c_algo_bit 
drm_kms_helper ttm e1000e drm sdhci_pci sdhci_acpi ptp sdhci crc32c_intel 
serio_raw mmc_core pps_core video i2c_hid hid_plantronics
  CPU: 4 PID: 389 Comm: systemd-udevd Not tainted 4.12.0-rc1-next-20170515+ #1
  Hardware name: Dell Inc. Latitude E6540/0725FP, BIOS A10 06/26/2014
  task: 97d62c8f task.stack: b96f01478000
  RIP: 0010:radeon_driver_load_kms+0xeb/0x230 [radeon]
  RSP: 0018:b96f0147b9d0 EFLAGS: 00010246
  RAX:  RBX: 97d620085000 RCX: 00610037
  RDX:  RSI: 002b RDI: 
  RBP: b96f0147b9e8 R08: 0002 R09: b96f0147b924
  R10:  R11: 97d62edd2ec0 R12: 97d628d5c000
  R13: 00610037 R14: c0698280 R15: 
  FS:  7f496363d8c0() GS:97d62eb0() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 07cb CR3: 00022c14c000 CR4: 001406e0
  Call Trace:
   drm_dev_register+0x146/0x1d0 [drm]
   drm_get_pci_dev+0x9a/0x180 [drm]
   radeon_pci_probe+0xb8/0xe0 [radeon]
   local_pci_probe+0x45/0xa0
   pci_device_probe+0x14f/0x1a0
   driver_probe_device+0x29c/0x450
   __driver_attach+0xdf/0xf0
   ? driver_probe_device+0x450/0x450
   bus_for_each_dev+0x6c/0xc0
   driver_attach+0x1e/0x20
   bus_add_driver+0x170/0x270
   driver_register+0x60/0xe0
   ? 0xc0508000
   __pci_register_driver+0x4c/0x50
   drm_pci_init+0xeb/0x100 [drm]
   ? vga_switcheroo_register_handler+0x6a/0x90
   ? 0xc0508000
   radeon_init+0x98/0xb6 [radeon]
   do_one_initcall+0x52/0x1a0
   ? __vunmap+0x81/0xb0
   ? kmem_cache_alloc_trace+0x159/0x1b0
   ? do_init_module+0x27/0x1f8
   do_init_module+0x5f/0x1f8
   load_module+0x27ce/0x2be0
   SYSC_finit_module+0xdf/0x110
   ? SYSC_finit_module+0xdf/0x110
   SyS_finit_module+0xe/0x10
   do_syscall_64+0x67/0x150
   entry_SYSCALL64_slow_path+0x25/0x25
  RIP: 0033:0x7f4962295679
  RSP: 002b:7ffdd8c4f878 EFLAGS: 0246 ORIG_RAX: 0139
  RAX: ffda RBX: 55c014ed8200 RCX: 7f4962295679
  RDX:  RSI: 7f4962dd19c5 RDI: 0010
  RBP: 7f4962dd19c5 R08:  R09: 7ffdd8c4f990
  R10: 0010 R11: 0246 R12: 
  R13: 55c014ed81a0 R14: 0002 R15: 55c0149d1fca
  Code: 5d 5d c3 8b 05 a7 05 14 00 49 81 cd 00 00 08 00 85 c0 74 a3 e8 e7 c0 0e 
00 84 c0 74 9a 41 f7 c5 00 00 02 00 75 91 49 8b 44 24 10 <0f> b6 90 cb 07 00 00 
f6 c2 20 74 1e e9 7b ff ff ff 48 8b 40 38
  RIP: radeon_driver_load_kms+0xeb/0x230 [radeon] RSP: b96f0147b9d0
  CR2: 07cb
  ---[ end trace 89cc4ba7e569c65c ]---

Reported-by: Nicolai Stange 
Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with 
vga_switcheroo")
Signed-off-by: Lukas Wunner 
---

Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
needs to be fixed, so sending out with a proper commit message now.
The bug was only introduced to radeon, not amdgpu.

@Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
land before -rc3 because Sean Paul has already sent out the -rc2 pull.
I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!

 drivers/gpu/drm/radeon/radeon_kms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 6a68d440bc44..d0ad03674250 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -116,7 +116,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
if ((radeon_runtime_pm != 0) &&
radeon_has_atpx() &&
((flags & RADEON_IS_IGP) == 0) &&
-   !pci_is_thunderbolt_attached(rdev->pdev))
+   !pci_is_thunderbolt_attached(dev->pdev))
flags |= RADEON_IS_PX;
 
/* radeon_device_init should report only fatal error
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: (radeon?) WARNING: drivers/gpu/drm/drm_irq.c:1195 drm_vblank_put (v4.11-12441-g56868a4)

2017-05-18 Thread Tommi Rantala
2017-05-11 5:51 GMT+03:00 Michel Dänzer :
> On 11/05/17 04:33 AM, Tommi Rantala wrote:
>> Complete kernel log:
>> http://termbin.com/dzy5
>>
>> [  249.952546] [ cut here ]
>> [  249.952593] WARNING: CPU: 5 PID: 0 at
>> /home/ttrantal/git/linux/drivers/gpu/drm/drm_irq.c:1195
>> drm_vblank_put+0xc4/0x120 [drm]
>> [  249.952596] Modules linked in: fuse tun bridge stp llc af_packet
>> pl2303 usbserial shpchp acpi_cpufreq binfmt_misc amdgpu hid_generic
>> uhci_hcd radeon 3c59x mii tg3 ehci_pci ehci_hcd i2c_algo_bit
>> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm
>> agpgart unix autofs4
>> [  249.952675] CPU: 5 PID: 0 Comm: swapper/5 Tainted: GW
>> 4.11.0+ #4
>> [  249.952678] Hardware name: Hewlett-Packard HP xw6600
>> Workstation/0A9Ch, BIOS 786F4 v01.46 09/20/2012
>> [  249.952681] task: 88080aea task.stack: c900031b
>> [  249.952695] RIP: 0010:drm_vblank_put+0xc4/0x120 [drm]
>> [  249.952698] RSP: 0018:88080f003d70 EFLAGS: 00010046
>> [  249.952703] RAX:  RBX: 880801d53000 RCX: 
>> 
>> [  249.952706] RDX:  RSI:  RDI: 
>> 88080a4ac000
>> [  249.952709] RBP: 88080f003d88 R08: 0001 R09: 
>> 0003
>> [  249.952711] R10: 88080f003d08 R11: 001da540 R12: 
>> 88080a4ac000
>> [  249.952714] R13:  R14: 0086 R15: 
>> 8808019a
>> [  249.952717] FS:  () GS:88080f00()
>> knlGS:
>> [  249.952720] CS:  0010 DS:  ES:  CR0: 80050033
>> [  249.952723] CR2: 7f8bcc3a5810 CR3: 000808789000 CR4: 
>> 06e0
>> [  249.952726] Call Trace:
>> [  249.952731]  
>> [  249.952746]  drm_crtc_vblank_put+0x1b/0x30 [drm]
>> [  249.952813]  radeon_crtc_handle_flip+0xdc/0x140 [radeon]
>> [  249.952843]  si_irq_process+0x610/0x1e90 [radeon]
>> [  249.952872]  radeon_driver_irq_handler_kms+0x39/0xc0 [radeon]
>> [  249.952881]  __handle_irq_event_percpu+0x60/0x580
>> [  249.952887]  handle_irq_event_percpu+0x20/0x90
>> [  249.952892]  handle_irq_event+0x46/0xb0
>> [  249.952897]  handle_edge_irq+0x13d/0x370
>> [  249.952903]  handle_irq+0x66/0x210
>> [  249.952908]  ? __local_bh_enable+0x34/0x50
>> [  249.952914]  do_IRQ+0x7e/0x1b0
>> [  249.952920]  common_interrupt+0x95/0x95
>
> Weird, not sure how this could happen. Can you bisect?

Hi,

Bisection points to this commit (I also manually applied commit 9739e74646
while testing; I got a kernel oops otherwise):

commit 29dc0d1de18239cf3ef8bab578b8321ed340d81c
Author: Daniel Vetter 
Date:   Wed Mar 22 22:50:49 2017 +0100

drm: Roll out acquire context for the page_flip ioctl

Again just prep work.

Reviewed-by: Harry Wentland 
Signed-off-by: Daniel Vetter 
Link: 
http://patchwork.freedesktop.org/patch/msgid/20170322215058.8671-11-daniel.vet...@ffwll.ch


I'm also seeing some more warnings in this version:


May 18 19:21:55 xw6600 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp14s0:
link becomes ready
May 18 19:21:57 xw6600 kernel: [ cut here ]
May 18 19:21:57 xw6600 kernel: WARNING: CPU: 5 PID: 4607 at
/home/ttrantal/git/linux/drivers/gpu/drm/drm_modeset_lock.c:193
drm_modeset_lock_crtc+0xe5/0x100 [drm]
May 18 19:21:57 xw6600 kernel: Modules linked in: tun bridge stp llc
af_packet pl2303 usbserial shpchp acpi_cpufreq binfmt_misc amdgpu
hid_generic uhci_hcd radeon 3c59x mii i2c_algo_bit drm_kms_helper tg3
syscopyarea sysfillrect sysimgblt
May 18 19:21:57 xw6600 kernel: CPU: 5 PID: 4607 Comm: gnome-shell Not
tainted 4.11.0-rc3-00944-g29dc0d1-dirty #30
May 18 19:21:57 xw6600 kernel: Hardware name: Hewlett-Packard HP
xw6600 Workstation/0A9Ch, BIOS 786F4 v01.46 09/20/2012
May 18 19:21:57 xw6600 kernel: Call Trace:
May 18 19:21:57 xw6600 kernel:  dump_stack+0x69/0x9b
May 18 19:21:57 xw6600 kernel:  __warn+0xff/0x140
May 18 19:21:57 xw6600 kernel:  warn_slowpath_null+0x18/0x20
May 18 19:21:57 xw6600 kernel:  drm_modeset_lock_crtc+0xe5/0x100 [drm]
May 18 19:21:57 xw6600 kernel:  drm_mode_cursor_common+0xbd/0x200 [drm]
May 18 19:21:57 xw6600 kernel:  drm_mode_cursor_ioctl+0x3c/0x40 [drm]
May 18 19:21:57 xw6600 kernel:  drm_ioctl+0x3ea/0x870 [drm]
May 18 19:21:57 xw6600 kernel:  ? drm_mode_setplane+0x1a0/0x1a0 [drm]
May 18 19:21:57 xw6600 kernel:  ? trace_hardirqs_on_caller+0x1ad/0x2c0
May 18 19:21:57 xw6600 kernel:  ? trace_hardirqs_on+0xd/0x10
May 18 19:21:57 xw6600 kernel:  radeon_drm_ioctl+0x6e/0x110 [radeon]
May 18 19:21:57 xw6600 kernel:  do_vfs_ioctl+0xac/0x9d0
May 18 19:21:57 xw6600 kernel:  ? security_file_ioctl+0x4c/0x80
May 18 19:21:57 xw6600 kernel:  SyS_ioctl+0x74/0x80
May 18 19:21:57 xw6600 kernel:  entry_SYSCALL_64_fastpath+0x18/0xad
May 18 19:21:57 xw6600 kernel: RIP: 0033:0x7f98d9e52787
May 18 19:21:57 xw6600 kernel: RSP: 002b:7ffe12ca5938 EFLAGS:
0246 ORIG_RAX: 

Re: [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first

2017-05-18 Thread John Brooks
On Thu, May 18, 2017 at 06:08:09PM +0900, Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> In exchange, move BOs with the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
> flag set to CPU visible VRAM with more force.
> 
> For other BOs, this gives another chance to stay in VRAM if they
> happened to lie in the CPU visible part and another BO needs to go
> there.
> 
> This should allow BOs to stay in VRAM longer in some cases.
> 
> Signed-off-by: Michel Dänzer 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 27 +++
>  2 files changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 41ee353b22c8..d45c2325c61a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -963,11 +963,20 @@ int amdgpu_bo_fault_reserve_notify(struct 
> ttm_buffer_object *bo)
>AMDGPU_GEM_DOMAIN_GTT);
>   abo->placements[0].lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
>  
> - /* Only set GTT as busy placement; if there is no space in CPU visible
> -  * VRAM, move this BO to GTT instead of evicting other BOs
> -  */
> - abo->placement.busy_placement = &abo->placements[1];
> - abo->placement.num_busy_placement = 1;
> + if (abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
> + /* Only set VRAM as normal placement; if there is no space in
> +  * CPU visible VRAM, evict other BOs, only fall back to GTT as
> +  * last resort
> +  */
> + abo->placement.num_placement = 1;
> + } else {
> + /* Only set GTT as busy placement; if there is no space in CPU
> +  * visible VRAM, move this BO to GTT instead of evicting other
> +  * BOs
> +  */
> + abo->placement.busy_placement = &abo->placements[1];
> + abo->placement.num_busy_placement = 1;
> + }
>  
>   return ttm_bo_validate(bo, >placement, false, false);
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 57789b860768..d5ed85026542 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -206,7 +206,34 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
> *bo,
>   adev->mman.buffer_funcs_ring &&
>   adev->mman.buffer_funcs_ring->ready == false) {
>   amdgpu_ttm_placement_from_domain(abo, 
> AMDGPU_GEM_DOMAIN_CPU);
> + } else if (adev->mc.visible_vram_size < 
> adev->mc.real_vram_size) {
> + unsigned fpfn = adev->mc.visible_vram_size >> 
> PAGE_SHIFT;
> + struct drm_mm_node *node = bo->mem.mm_node;
> + unsigned long pages_left;
> +
> + for (pages_left = bo->mem.num_pages;
> +  pages_left;
> +  pages_left -= node->size, node++) {
> + if (node->start < fpfn)
> + break;
> + }
> +
> + if (!pages_left)
> + goto gtt;
> +
> + /* Try evicting to the CPU inaccessible part of VRAM
> +  * first, but only set GTT as busy placement, so this
> +  * BO will be evicted to GTT rather than causing other
> +  * BOs to be evicted from VRAM
> +  */
> + amdgpu_ttm_placement_from_domain(abo, 
> AMDGPU_GEM_DOMAIN_VRAM |
> +  AMDGPU_GEM_DOMAIN_GTT);
> + abo->placements[0].fpfn = fpfn;
> + abo->placements[0].lpfn = 0;
> + abo->placement.busy_placement = &abo->placements[1];

Are you sure you want to hardcode the placements index? It'll be dependent on
the order set up in amdgpu_ttm_placement_init.

> + abo->placement.num_busy_placement = 1;
>   } else {
> +gtt:
>   amdgpu_ttm_placement_from_domain(abo, 
> AMDGPU_GEM_DOMAIN_GTT);
>   /* Set an upper limit to force directly allocating
>* address space for the BO.
> -- 
> 2.11.0
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

--
John Brooks
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amd/amdgpu: Enable raven gpu_info firmware

2017-05-18 Thread Zhang, Hawking
IIRC it's already in the amd-staging-4.9 branch. Anyway, the patch is

Reviewed-by: Hawking Zhang 

Regards,
Hawking

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Tom 
St Denis
Sent: Thursday, May 18, 2017 20:00
To: amd-gfx@lists.freedesktop.org
Cc: StDenis, Tom 
Subject: [PATCH] drm/amd/amdgpu: Enable raven gpu_info firmware

Add CHIP_RAVEN to the list of ASICs that have gpu_info firmware.

Signed-off-by: Tom St Denis 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d95d4c92da2a..ca9a765385de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1428,6 +1428,9 @@ static int amdgpu_device_parse_gpu_info_fw(struct 
amdgpu_device *adev)  #endif
default:
return 0;
+   case CHIP_RAVEN:
+   chip_name = "raven";
+   break;
case CHIP_VEGA10:
chip_name = "vega10";
break;
--
2.12.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/amdgpu: Enable raven gpu_info firmware

2017-05-18 Thread Tom St Denis

On 18/05/17 10:07 AM, Zhang, Hawking wrote:

IIRC It's already in amd-staging-4.9 branch. Anyway, the patch is

Reviewed-by: Hawking Zhang 


It must have gone missing when Alex (Deucher) made the 4.11 branch.

Thanks for the R-b but perhaps we should cherry-pick the patch from 4.9 
if it's small enough.


Looks like it is simply this commit

commit 3b51519cc8ebb6a25a86522eccc9708afd4dc33b
Author: Alex Deucher 
Date:   Tue May 9 12:27:35 2017 -0400

drm/amdgpu: add raven gpu_info support

Add support for parsing the gpu info table on raven.
This is required to get the gpu config data for raven.

Signed-off-by: Alex Deucher 

I'll cherry-pick that over and push it out momentarily (since Alex is on 
vacation).


Tom
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/radeon: fix "force the UVD DPB into VRAM as well"

2017-05-18 Thread Christian König
From: Christian König 

The DPB must be in VRAM, but not in the first segment.

Signed-off-by: Christian König 
Tested-by: Arthur Marsh 
---
 drivers/gpu/drm/radeon/radeon_uvd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c 
b/drivers/gpu/drm/radeon/radeon_uvd.c
index fad4a11..0cd0e7b 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -621,7 +621,7 @@ static int radeon_uvd_cs_reloc(struct radeon_cs_parser *p,
}
 
/* TODO: is this still necessary on NI+ ? */
-   if ((cmd == 0 || cmd == 1 || cmd == 0x3) &&
+   if ((cmd == 0 || cmd == 0x3) &&
(start >> 28) != (p->rdev->uvd.gpu_addr >> 28)) {
DRM_ERROR("msg/fb buffer %LX-%LX out of 256MB segment!\n",
  start, end);
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
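The check being relaxed here compares addresses at 256MB granularity:
shifting right by 28 bits discards the low 28 bits (1 << 28 = 256MB), so
two addresses with equal shifted values lie in the same 256MB segment.
Dropping cmd == 1 exempts the DPB from that same-segment requirement while
VRAM placement is still enforced elsewhere. A small standalone
illustration with made-up addresses:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint64_t uvd_base = 0x10000000; /* hypothetical segment start */
            uint64_t msg_addr = 0x10af0000; /* same 256MB segment */
            uint64_t dpb_addr = 0x20000000; /* a different segment */

            /* start >> 28 compares addresses at 256MB granularity */
            printf("msg in segment: %d\n",
                   (int)((msg_addr >> 28) == (uvd_base >> 28)));
            printf("dpb in segment: %d\n",
                   (int)((dpb_addr >> 28) == (uvd_base >> 28)));
            return 0;
    }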


[PATCH] drm/amd/amdgpu: Enable raven gpu_info firmware

2017-05-18 Thread Tom St Denis
Add CHIP_RAVEN to the list of ASICs that have
gpu_info firmware.

Signed-off-by: Tom St Denis 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d95d4c92da2a..ca9a765385de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1428,6 +1428,9 @@ static int amdgpu_device_parse_gpu_info_fw(struct 
amdgpu_device *adev)
 #endif
default:
return 0;
+   case CHIP_RAVEN:
+   chip_name = "raven";
+   break;
case CHIP_VEGA10:
chip_name = "vega10";
break;
-- 
2.12.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix ocl test performance drop

2017-05-18 Thread Christian König

Am 18.05.2017 um 09:45 schrieb Flora Cui:

partial revert commit <6971d3d> - drm/amdgpu: cleanup logic in
amdgpu_vm_flush

Change-Id: Iadce9d613dfe9a739643a74050cea55854832adb
Signed-off-by: Flora Cui 


I don't see how the revert should be faster than the original.

Especially since amdgpu_vm_had_gpu_reset() is now called twice, it sounds 
like more overhead than necessary.


Please explain further.

Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 +-
  1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 88420dc..a96bad6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -743,23 +743,19 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
amdgpu_job *job)
id->gws_size != job->gws_size ||
id->oa_base != job->oa_base ||
id->oa_size != job->oa_size);
-   bool vm_flush_needed = job->vm_needs_flush ||
-   amdgpu_vm_ring_has_compute_vm_bug(ring);
unsigned patch_offset = 0;
int r;
  
-	if (amdgpu_vm_had_gpu_reset(adev, id)) {

-   gds_switch_needed = true;
-   vm_flush_needed = true;
-   }
-
-   if (!vm_flush_needed && !gds_switch_needed)
+   if (!job->vm_needs_flush && !gds_switch_needed &&
+   !amdgpu_vm_had_gpu_reset(adev, id) &&
+   !amdgpu_vm_ring_has_compute_vm_bug(ring))
return 0;
  
  	if (ring->funcs->init_cond_exec)

patch_offset = amdgpu_ring_init_cond_exec(ring);
  
-	if (ring->funcs->emit_vm_flush && vm_flush_needed) {

+   if (ring->funcs->emit_vm_flush &&
+   (job->vm_needs_flush || amdgpu_vm_had_gpu_reset(adev, id))) {
struct fence *fence;
  
  		trace_amdgpu_vm_flush(ring, job->vm_id, job->vm_pd_addr);



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
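One way to keep Flora's behaviour while addressing the double call
Christian points out is to evaluate the reset check once into a local.
A rough standalone sketch with stand-in predicates, not the driver code:

    #include <stdbool.h>
    #include <stdio.h>

    static bool had_gpu_reset(void)      { return false; } /* stand-in */
    static bool has_compute_vm_bug(void) { return true;  } /* stand-in */

    static int vm_flush(bool vm_needs_flush, bool gds_switch_needed)
    {
            bool reset = had_gpu_reset(); /* evaluated exactly once */

            if (!vm_needs_flush && !gds_switch_needed && !reset &&
                !has_compute_vm_bug())
                    return 0;

            if (vm_needs_flush || reset)
                    printf("emit vm flush\n");
            /* else: we did not return early, so the rest of the function
             * can still emit its pipeline sync / GDS switch for the
             * compute-vm-bug case */
            return 0;
    }

    int main(void) { return vm_flush(false, false); }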


[PATCH] drm/amdgpu: fix ocl test performance drop

2017-05-18 Thread Flora Cui
partial revert commit <6971d3d> - drm/amdgpu: cleanup logic in
amdgpu_vm_flush

Change-Id: Iadce9d613dfe9a739643a74050cea55854832adb
Signed-off-by: Flora Cui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 88420dc..a96bad6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -743,23 +743,19 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
amdgpu_job *job)
id->gws_size != job->gws_size ||
id->oa_base != job->oa_base ||
id->oa_size != job->oa_size);
-   bool vm_flush_needed = job->vm_needs_flush ||
-   amdgpu_vm_ring_has_compute_vm_bug(ring);
unsigned patch_offset = 0;
int r;
 
-   if (amdgpu_vm_had_gpu_reset(adev, id)) {
-   gds_switch_needed = true;
-   vm_flush_needed = true;
-   }
-
-   if (!vm_flush_needed && !gds_switch_needed)
+   if (!job->vm_needs_flush && !gds_switch_needed &&
+   !amdgpu_vm_had_gpu_reset(adev, id) &&
+   !amdgpu_vm_ring_has_compute_vm_bug(ring))
return 0;
 
if (ring->funcs->init_cond_exec)
patch_offset = amdgpu_ring_init_cond_exec(ring);
 
-   if (ring->funcs->emit_vm_flush && vm_flush_needed) {
+   if (ring->funcs->emit_vm_flush &&
+   (job->vm_needs_flush || amdgpu_vm_had_gpu_reset(adev, id))) {
struct fence *fence;
 
trace_amdgpu_vm_flush(ring, job->vm_id, job->vm_pd_addr);
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add dep_sync for amdgpu job

2017-05-18 Thread Christian König

Am 18.05.2017 um 10:18 schrieb Chunming Zhou:

The fence in dep_sync cannot be optimized.

Change-Id: Ica1924ad4fe991c0f13438ab521036f60544afcc
Signed-off-by: Chunming Zhou 


A bit more commit message wouldn't hurt, but either way the patch is 
Reviewed-by: Christian König 


Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 17 +++--
  3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8351dd2..49bcafd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1162,6 +1162,7 @@ struct amdgpu_job {
struct amdgpu_vm*vm;
struct amdgpu_ring  *ring;
struct amdgpu_sync  sync;
+   struct amdgpu_sync  dep_sync;
struct amdgpu_sync  sched_sync;
struct amdgpu_ib*ibs;
struct fence *fence; /* the hw fence */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 2f5ef94..dce3ed6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1048,7 +1048,7 @@ static int amdgpu_cs_dependencies(struct amdgpu_device 
*adev,
}
}
  
-	return amdgpu_sem_add_cs(p->ctx, p->job->ring, &p->job->sync);

+   return amdgpu_sem_add_cs(p->ctx, p->job->ring, &p->job->dep_sync);
  }
  
  static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 0c177bb..12f3207 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -63,6 +63,7 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned 
num_ibs,
(*job)->num_ibs = num_ibs;
  
  	amdgpu_sync_create(&(*job)->sync);

+   amdgpu_sync_create(&(*job)->dep_sync);
amdgpu_sync_create(&(*job)->sched_sync);
  
  	return 0;

@@ -102,6 +103,7 @@ static void amdgpu_job_free_cb(struct amd_sched_job *s_job)
  
  	fence_put(job->fence);

amdgpu_sync_free(&job->sync);
+   amdgpu_sync_free(&job->dep_sync);
amdgpu_sync_free(&job->sched_sync);
kfree(job);
  }
@@ -112,6 +114,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
  
  	fence_put(job->fence);

amdgpu_sync_free(&job->sync);
+   amdgpu_sync_free(&job->dep_sync);
amdgpu_sync_free(&job->sched_sync);
kfree(job);
  }
@@ -144,9 +147,16 @@ static struct fence *amdgpu_job_dependency(struct 
amd_sched_job *sched_job)
struct amdgpu_job *job = to_amdgpu_job(sched_job);
struct amdgpu_vm *vm = job->vm;
  
-	struct fence *fence = amdgpu_sync_get_fence(&job->sync);

+   struct fence *fence = amdgpu_sync_get_fence(&job->dep_sync);
int r;
  
+	if (amd_sched_dependency_optimized(fence, sched_job->s_entity)) {

+   r = amdgpu_sync_fence(job->adev, &job->sched_sync, fence);
+   if (r)
+   DRM_ERROR("Error adding fence to sync (%d)\n", r);
+   }
+   if (!fence)
+   fence = amdgpu_sync_get_fence(&job->sync);
while (fence == NULL && vm && !job->vm_id) {
struct amdgpu_ring *ring = job->ring;
  
@@ -159,11 +169,6 @@ static struct fence *amdgpu_job_dependency(struct amd_sched_job *sched_job)

fence = amdgpu_sync_get_fence(&job->sync);
}
  
-	if (amd_sched_dependency_optimized(fence, sched_job->s_entity)) {

-   r = amdgpu_sync_fence(job->adev, &job->sched_sync, fence);
-   if (r)
-   DRM_ERROR("Error adding fence to sync (%d)\n", r);
-   }
return fence;
  }
  



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
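For readers unfamiliar with the optimization being bypassed here: the GPU
scheduler may skip waiting on a dependency fence when ring ordering
already guarantees it (roughly the question amd_sched_dependency_optimized()
answers), which is fine for implicit sync but wrong for explicit
semaphores -- hence the separate dep_sync container. A loose standalone
mock of the idea, not the scheduler code:

    #include <stdbool.h>
    #include <stdio.h>

    struct fence { int context; };

    /* A fence from the same scheduler context is ordered by the ring
     * anyway, so an explicit wait on it can be skipped. */
    static bool dependency_optimized(const struct fence *f, int my_ctx)
    {
            return f && f->context == my_ctx;
    }

    int main(void)
    {
            struct fence implicit = { .context = 1 }; /* same ring */
            struct fence user_sem = { .context = 1 }; /* user semaphore */

            /* job->sync: skipping the wait is safe */
            printf("implicit dep skippable: %d\n",
                   dependency_optimized(&implicit, 1));

            /* job->dep_sync: looks skippable by the same test, but must
             * still be waited on -- the point of the patch */
            printf("semaphore dep looks skippable: %d\n",
                   dependency_optimized(&user_sem, 1));
            return 0;
    }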


Re: Plan: BO move throttling for visible VRAM evictions

2017-05-18 Thread Marek Olšák
On May 18, 2017 10:17 AM, "Michel Dänzer" wrote:

On 17/05/17 09:35 PM, Marek Olšák wrote:
> On May 16, 2017 3:57 AM, "Michel Dänzer" wrote:
> On 15/05/17 07:11 PM, Marek Olšák wrote:
> > On May 15, 2017 4:29 AM, "Michel Dänzer" wrote:
> >
> > I think the next step should be to make radeonsi keep track of
> how much
> > VRAM it's trying to use that's expected to be accessed by the
> CPU, and
> > to use GTT instead when that exceeds a threshold (probably
> derived from
> > vram_vis_size).
> >
> > That's difficult to estimate. There are apps with 600MB of mapped
VRAM
> > and don't experience any performance issues. And some apps with
> 300MB of
> > mapped VRAM do. It only depends on the CPU access pattern, not what
> > radeonsi sees.
>
> What I mean is keeping track of the total size of resources which have
> RADEON_DOMAIN_VRAM and RADEON_FLAG_CPU_ACCESS set, and if it exceeds a
> threshold, create new ones having those flags in GTT instead. Even
> though this might not be strictly necessary with amdgpu in the long
run,
> it probably is for radeon anyway, and in the short term it might help
> even with amdgpu.
>
>
> That might hurt us more than it can help.

You may be right, but I think I'll play with that idea a little anyway
to see how it goes. :)

> All mappable buffers have the CPU access flag set, but many of them are
> immutable.

You mean they're only written to once by the CPU? We shouldn't set the
RADEON_FLAG_CPU_ACCESS flag for BOs where we expect that, because it
will currently prevent them from being in the CPU invisible part of VRAM.


The only thing I can do is set the CPU access flag for persistently mapped
buffers only. We certainly want buffers to go to the invisible part of VRAM
if there is no CPU access for a certain timeframe. So maybe we shouldn't
set the flag at all. What do you think?

The truth is we have no way to know what apps intend to do with any buffers.



> The only place where this can be handled is the kernel.

Ideally, the placement of a BO should be determined based on how it's
actually being used by the GPU vs CPU. But I'm not sure how to determine
that in a useful way.


CPU page faults are the only way to determine that CPU access is happening.


> Even if it's as simple as: if (bo->numcpufaults > 10) domain = GTT_WC;

I'm skeptical about the number of CPU page faults per se being a useful
metric. It doesn't tell us much about how the BO is used even by the
CPU, let alone the GPU. But let's see where this leads you.


It tells us more than what Mesa can ever know, which is nothing.

Marek



One thing that might help would be if we could swap individual memory
nodes between visible and invisible VRAM for CPU page faults, instead of
moving/evicting whole BOs. Christian, do you think something like that
would be possible?


Another idea (to avoid issues such as the recent one with Rocket League)
was to make VRAM CPU mappings write-only, and move the BO to GTT if
there's a read fault. But not sure if this is possible at all, or how
much effort it would be.


--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
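Marek's one-liner heuristic, spelled out as a standalone sketch -- the
field names are hypothetical, nothing like numcpufaults exists in the
driver today:

    #include <stdio.h>

    enum domain { DOMAIN_VRAM, DOMAIN_GTT_WC };

    struct bo {
            enum domain domain;
            unsigned numcpufaults; /* hypothetical per-BO fault counter */
    };

    static void on_cpu_fault(struct bo *bo)
    {
            if (++bo->numcpufaults > 10 && bo->domain == DOMAIN_VRAM) {
                    bo->domain = DOMAIN_GTT_WC; /* demote fault-heavy BO */
                    printf("demoted to GTT_WC after %u faults\n",
                           bo->numcpufaults);
            }
    }

    int main(void)
    {
            struct bo bo = { DOMAIN_VRAM, 0 };
            for (int i = 0; i < 12; i++)
                    on_cpu_fault(&bo);
            return 0;
    }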


Re: [PATCH 1/3] drm/amdgpu: Drop useless loops for placement restrictions

2017-05-18 Thread Michel Dänzer
On 18/05/17 06:17 PM, Christian König wrote:
> Am 18.05.2017 um 11:08 schrieb Michel Dänzer:
>> From: Michel Dänzer 
>>
>> We know how the placements were initialized in these cases, so we can
>> set the restrictions directly without a loop.
>>
>> Signed-off-by: Michel Dänzer 

[...]

>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> index 2ca09f111f08..60688fa5ef98 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> @@ -375,11 +375,7 @@ void amdgpu_uvd_free_handles(struct amdgpu_device
>> *adev, struct drm_file *filp)
>> static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)
>>   {
>> -int i;
>> -for (i = 0; i < abo->placement.num_placement; ++i) {
>> -abo->placements[i].fpfn = 0 >> PAGE_SHIFT;
>> -abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
>> -}
>> +abo->placements[0].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
> 
> This is not correct. The restriction applies to all placements, not only
> the first one.

I see, there can be multiple placements in amdgpu_uvd_cs_pass1, if cmd
!= 0x0 and != 0x3? I'll drop this hunk then, thanks.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] drm/amdgpu: Drop useless loops for placement restrictions

2017-05-18 Thread Christian König

Am 18.05.2017 um 11:08 schrieb Michel Dänzer:

From: Michel Dänzer 

We know how the placements were initialized in these cases, so we can
set the restrictions directly without a loop.

Signed-off-by: Michel Dänzer 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 19 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|  6 +-
  3 files changed, 8 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 365883d7948d..b3252bc8fe51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -939,8 +939,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object 
*bo)
  {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
struct amdgpu_bo *abo;
-   unsigned long offset, size, lpfn;
-   int i, r;
+   unsigned long offset, size;
+   int r;
  
  	if (!amdgpu_ttm_bo_is_amdgpu_bo(bo))

return 0;
@@ -961,14 +961,7 @@ int amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo)
  
  	/* hurrah the memory is not visible ! */

amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM);
-   lpfn =  adev->mc.visible_vram_size >> PAGE_SHIFT;
-   for (i = 0; i < abo->placement.num_placement; i++) {
-   /* Force into visible VRAM */
-   if ((abo->placements[i].flags & TTM_PL_FLAG_VRAM) &&
-   (!abo->placements[i].lpfn ||
-abo->placements[i].lpfn > lpfn))
-   abo->placements[i].lpfn = lpfn;
-   }
+   abo->placements[0].lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
r = ttm_bo_validate(bo, &abo->placement, false, false);
if (unlikely(r == -ENOMEM)) {
amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 5db0230e45c6..57789b860768 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -191,7 +191,6 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
.lpfn = 0,
.flags = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM
};
-   unsigned i;
  
  	if (!amdgpu_ttm_bo_is_amdgpu_bo(bo)) {

placement->placement = &placements;
@@ -209,20 +208,10 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
amdgpu_ttm_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
} else {
amdgpu_ttm_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_GTT);
-   for (i = 0; i < abo->placement.num_placement; ++i) {
-   if (!(abo->placements[i].flags &
- TTM_PL_FLAG_TT))
-   continue;
-
-   if (abo->placements[i].lpfn)
-   continue;
-
-   /* set an upper limit to force directly
-* allocating address space for the BO.
-*/
-   abo->placements[i].lpfn =
-   adev->mc.gtt_size >> PAGE_SHIFT;
-   }
+   /* Set an upper limit to force directly allocating
+* address space for the BO.
+*/
+   abo->placements[0].lpfn = adev->mc.gtt_size >> 
PAGE_SHIFT;
}
break;
case TTM_PL_TT:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 2ca09f111f08..60688fa5ef98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -375,11 +375,7 @@ void amdgpu_uvd_free_handles(struct amdgpu_device *adev, 
struct drm_file *filp)
  
  static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)

  {
-   int i;
-   for (i = 0; i < abo->placement.num_placement; ++i) {
-   abo->placements[i].fpfn = 0 >> PAGE_SHIFT;
-   abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
-   }
+   abo->placements[0].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;


This is not correct. The restriction applies to all placements, not only 
the first one.


Christian.


  }
  
  static u64 amdgpu_uvd_get_addr_from_ctx(struct amdgpu_uvd_cs_ctx *ctx)



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first

2017-05-18 Thread Michel Dänzer
From: Michel Dänzer 

In exchange, move BOs with the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
flag set to CPU visible VRAM with more force.

For other BOs, this gives another chance to stay in VRAM if they
happened to lie in the CPU visible part and another BO needs to go
there.

This should allow BOs to stay in VRAM longer in some cases.

Signed-off-by: Michel Dänzer 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 27 +++
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 41ee353b22c8..d45c2325c61a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -963,11 +963,20 @@ int amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo)
 AMDGPU_GEM_DOMAIN_GTT);
abo->placements[0].lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
 
-   /* Only set GTT as busy placement; if there is no space in CPU visible
-* VRAM, move this BO to GTT instead of evicting other BOs
-*/
-   abo->placement.busy_placement = &abo->placements[1];
-   abo->placement.num_busy_placement = 1;
+   if (abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
+   /* Only set VRAM as normal placement; if there is no space in
+* CPU visible VRAM, evict other BOs, only fall back to GTT as
+* last resort
+*/
+   abo->placement.num_placement = 1;
+   } else {
+   /* Only set GTT as busy placement; if there is no space in CPU
+* visible VRAM, move this BO to GTT instead of evicting other
+* BOs
+*/
+   abo->placement.busy_placement = &abo->placements[1];
+   abo->placement.num_busy_placement = 1;
+   }
 
return ttm_bo_validate(bo, &abo->placement, false, false);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 57789b860768..d5ed85026542 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -206,7 +206,34 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
adev->mman.buffer_funcs_ring &&
adev->mman.buffer_funcs_ring->ready == false) {
amdgpu_ttm_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
+   } else if (adev->mc.visible_vram_size < 
adev->mc.real_vram_size) {
+   unsigned fpfn = adev->mc.visible_vram_size >> 
PAGE_SHIFT;
+   struct drm_mm_node *node = bo->mem.mm_node;
+   unsigned long pages_left;
+
+   for (pages_left = bo->mem.num_pages;
+pages_left;
+pages_left -= node->size, node++) {
+   if (node->start < fpfn)
+   break;
+   }
+
+   if (!pages_left)
+   goto gtt;
+
+   /* Try evicting to the CPU inaccessible part of VRAM
+* first, but only set GTT as busy placement, so this
+* BO will be evicted to GTT rather than causing other
+* BOs to be evicted from VRAM
+*/
+   amdgpu_ttm_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_VRAM |
+AMDGPU_GEM_DOMAIN_GTT);
+   abo->placements[0].fpfn = fpfn;
+   abo->placements[0].lpfn = 0;
+   abo->placement.busy_placement = &abo->placements[1];
+   abo->placement.num_busy_placement = 1;
} else {
+gtt:
amdgpu_ttm_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_GTT);
/* Set an upper limit to force directly allocating
 * address space for the BO.
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
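The fpfn/lpfn pair in a TTM place bound the page-frame range an allocation
may land in; fpfn = visible_vram_size >> PAGE_SHIFT with lpfn = 0 (no
upper bound) is how the patch expresses "CPU-invisible VRAM only". A tiny
standalone sketch (mock struct, PAGE_SHIFT assumed to be 12):

    #include <stdio.h>

    struct place { unsigned fpfn, lpfn; }; /* first/last page frame */

    int main(void)
    {
            unsigned long long visible_vram = 256ULL << 20; /* e.g. 256MB */
            struct place invisible = {
                    .fpfn = (unsigned)(visible_vram >> 12),
                    .lpfn = 0, /* 0 means no upper limit */
            };
            printf("invisible VRAM starts at pfn %u (offset 0x%llx)\n",
                   invisible.fpfn,
                   (unsigned long long)invisible.fpfn << 12);
            return 0;
    }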


[PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM

2017-05-18 Thread Michel Dänzer
From: Michel Dänzer 

This series was developed and tested under the following scenario:

Running the PTS dirt-rally benchmark (1920x1080, Ultra) on a Tonga with
2G of VRAM, with CPU visible VRAM artificially restricted to 64 MB.

Without this series, there's a lot of stutter during about the first
minute of a benchmark run. During this time there are significant amounts
of buffer moves (starting from about 500 MB on the HUD) and evictions,
gradually declining until the buffer moves settle around 8 MB on the HUD.

With this series, there's only slight stutter during the first seconds
after the car launches, even though the buffer move volume is about the
same as without the series. Buffer evictions are eliminated almost
completely, except for a few at the beginning. Buffer moves still settle
around 8 MB on the HUD, but with less variance than before.

Note that there's only little if any improvement of the average framerate
reported, but the minimum framerate as seen on the HUD goes from ~10 fps
to ~17.


Patch 1 is a cleanup that I noticed along the way.

Patch 2 makes the main difference for the above scenario.

Patch 3 doesn't make as much difference, I'm fine with it not landing at
least for now.

Michel Dänzer (3):
  drm/amdgpu: Drop useless loops for placement restrictions
  drm/amdgpu: Don't evict other BOs from VRAM for page faults
  drm/amdgpu: Try evicting from CPU visible to invisible VRAM first

 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 42 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 46 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|  6 +---
 3 files changed, 51 insertions(+), 43 deletions(-)

-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/3] drm/amdgpu: Drop useless loops for placement restrictions

2017-05-18 Thread Michel Dänzer
From: Michel Dänzer 

We know how the placements were initialized in these cases, so we can
set the restrictions directly without a loop.

Signed-off-by: Michel Dänzer 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 19 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|  6 +-
 3 files changed, 8 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 365883d7948d..b3252bc8fe51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -939,8 +939,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object 
*bo)
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
struct amdgpu_bo *abo;
-   unsigned long offset, size, lpfn;
-   int i, r;
+   unsigned long offset, size;
+   int r;
 
if (!amdgpu_ttm_bo_is_amdgpu_bo(bo))
return 0;
@@ -961,14 +961,7 @@ int amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo)
 
/* hurrah the memory is not visible ! */
amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM);
-   lpfn =  adev->mc.visible_vram_size >> PAGE_SHIFT;
-   for (i = 0; i < abo->placement.num_placement; i++) {
-   /* Force into visible VRAM */
-   if ((abo->placements[i].flags & TTM_PL_FLAG_VRAM) &&
-   (!abo->placements[i].lpfn ||
-abo->placements[i].lpfn > lpfn))
-   abo->placements[i].lpfn = lpfn;
-   }
+   abo->placements[0].lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
r = ttm_bo_validate(bo, &abo->placement, false, false);
if (unlikely(r == -ENOMEM)) {
amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 5db0230e45c6..57789b860768 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -191,7 +191,6 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
.lpfn = 0,
.flags = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM
};
-   unsigned i;
 
if (!amdgpu_ttm_bo_is_amdgpu_bo(bo)) {
placement->placement = &placements;
@@ -209,20 +208,10 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
amdgpu_ttm_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_CPU);
} else {
amdgpu_ttm_placement_from_domain(abo, 
AMDGPU_GEM_DOMAIN_GTT);
-   for (i = 0; i < abo->placement.num_placement; ++i) {
-   if (!(abo->placements[i].flags &
- TTM_PL_FLAG_TT))
-   continue;
-
-   if (abo->placements[i].lpfn)
-   continue;
-
-   /* set an upper limit to force directly
-* allocating address space for the BO.
-*/
-   abo->placements[i].lpfn =
-   adev->mc.gtt_size >> PAGE_SHIFT;
-   }
+   /* Set an upper limit to force directly allocating
+* address space for the BO.
+*/
+   abo->placements[0].lpfn = adev->mc.gtt_size >> 
PAGE_SHIFT;
}
break;
case TTM_PL_TT:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 2ca09f111f08..60688fa5ef98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -375,11 +375,7 @@ void amdgpu_uvd_free_handles(struct amdgpu_device *adev, 
struct drm_file *filp)
 
 static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)
 {
-   int i;
-   for (i = 0; i < abo->placement.num_placement; ++i) {
-   abo->placements[i].fpfn = 0 >> PAGE_SHIFT;
-   abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
-   }
+   abo->placements[0].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
 }
 
 static u64 amdgpu_uvd_get_addr_from_ctx(struct amdgpu_uvd_cs_ctx *ctx)
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/3] drm/amdgpu: Don't evict other BOs from VRAM for page faults

2017-05-18 Thread Michel Dänzer
From: Michel Dänzer 

If there is no free space in CPU visible VRAM for the faulting BO, move
it to GTT instead of evicting other BOs from CPU visible VRAM.

Signed-off-by: Michel Dänzer 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b3252bc8fe51..41ee353b22c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -940,7 +940,6 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object 
*bo)
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
struct amdgpu_bo *abo;
unsigned long offset, size;
-   int r;
 
if (!amdgpu_ttm_bo_is_amdgpu_bo(bo))
return 0;
@@ -960,22 +959,17 @@ int amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo)
return -EINVAL;
 
/* hurrah the memory is not visible ! */
-   amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM);
+   amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
+AMDGPU_GEM_DOMAIN_GTT);
abo->placements[0].lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
-   r = ttm_bo_validate(bo, &abo->placement, false, false);
-   if (unlikely(r == -ENOMEM)) {
-   amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
-   return ttm_bo_validate(bo, &abo->placement, false, false);
-   } else if (unlikely(r != 0)) {
-   return r;
-   }
 
-   offset = bo->mem.start << PAGE_SHIFT;
-   /* this should never happen */
-   if ((offset + size) > adev->mc.visible_vram_size)
-   return -EINVAL;
+   /* Only set GTT as busy placement; if there is no space in CPU visible
+* VRAM, move this BO to GTT instead of evicting other BOs
+*/
+   abo->placement.busy_placement = &abo->placements[1];
+   abo->placement.num_busy_placement = 1;
 
-   return 0;
+   return ttm_bo_validate(bo, &abo->placement, false, false);
 }
 
 /**
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Plan: BO move throttling for visible VRAM evictions

2017-05-18 Thread Christian König

Am 18.05.2017 um 10:17 schrieb Michel Dänzer:

[SNIP]
One thing that might help would be if we could swap individual memory
nodes between visible and invisible VRAM for CPU page faults, instead of
moving/evicting whole BOs. Christian, do you think something like that
would be possible?
I've looked into this while working on splitting VRAM allocations into 
smaller nodes and the answer is nope, not at all.


We would pretty much need to rewrite the whole of TTM from being BO based 
to page based. Otherwise you just have way too much overhead with all 
those ttm objects around.


And when we really want this step I would rather wait for HMM to land 
and use that instead.


Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: add dep_sync for amdgpu job

2017-05-18 Thread Chunming Zhou
The fence in dep_sync cannot be optimized.

Change-Id: Ica1924ad4fe991c0f13438ab521036f60544afcc
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 17 +++--
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8351dd2..49bcafd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1162,6 +1162,7 @@ struct amdgpu_job {
struct amdgpu_vm*vm;
struct amdgpu_ring  *ring;
struct amdgpu_sync  sync;
+   struct amdgpu_sync  dep_sync;
struct amdgpu_sync  sched_sync;
struct amdgpu_ib*ibs;
struct fence *fence; /* the hw fence */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 2f5ef94..dce3ed6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1048,7 +1048,7 @@ static int amdgpu_cs_dependencies(struct amdgpu_device 
*adev,
}
}
 
-   return amdgpu_sem_add_cs(p->ctx, p->job->ring, &p->job->sync);
+   return amdgpu_sem_add_cs(p->ctx, p->job->ring, &p->job->dep_sync);
 }
 
 static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 0c177bb..12f3207 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -63,6 +63,7 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned 
num_ibs,
(*job)->num_ibs = num_ibs;
 
amdgpu_sync_create(&(*job)->sync);
+   amdgpu_sync_create(&(*job)->dep_sync);
amdgpu_sync_create(&(*job)->sched_sync);
 
return 0;
@@ -102,6 +103,7 @@ static void amdgpu_job_free_cb(struct amd_sched_job *s_job)
 
fence_put(job->fence);
amdgpu_sync_free(&job->sync);
+   amdgpu_sync_free(&job->dep_sync);
amdgpu_sync_free(&job->sched_sync);
kfree(job);
 }
@@ -112,6 +114,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
 
fence_put(job->fence);
amdgpu_sync_free(&job->sync);
+   amdgpu_sync_free(&job->dep_sync);
amdgpu_sync_free(&job->sched_sync);
kfree(job);
 }
@@ -144,9 +147,16 @@ static struct fence *amdgpu_job_dependency(struct 
amd_sched_job *sched_job)
struct amdgpu_job *job = to_amdgpu_job(sched_job);
struct amdgpu_vm *vm = job->vm;
 
-   struct fence *fence = amdgpu_sync_get_fence(&job->sync);
+   struct fence *fence = amdgpu_sync_get_fence(&job->dep_sync);
int r;
 
+   if (amd_sched_dependency_optimized(fence, sched_job->s_entity)) {
+   r = amdgpu_sync_fence(job->adev, &job->sched_sync, fence);
+   if (r)
+   DRM_ERROR("Error adding fence to sync (%d)\n", r);
+   }
+   if (!fence)
+   fence = amdgpu_sync_get_fence(&job->sync);
while (fence == NULL && vm && !job->vm_id) {
struct amdgpu_ring *ring = job->ring;
 
@@ -159,11 +169,6 @@ static struct fence *amdgpu_job_dependency(struct 
amd_sched_job *sched_job)
fence = amdgpu_sync_get_fence(&job->sync);
}
 
-   if (amd_sched_dependency_optimized(fence, sched_job->s_entity)) {
-   r = amdgpu_sync_fence(job->adev, &job->sched_sync, fence);
-   if (r)
-   DRM_ERROR("Error adding fence to sync (%d)\n", r);
-   }
return fence;
 }
 
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Plan: BO move throttling for visible VRAM evictions

2017-05-18 Thread Michel Dänzer
On 17/05/17 09:35 PM, Marek Olšák wrote:
> On May 16, 2017 3:57 AM, "Michel Dänzer" wrote:
> On 15/05/17 07:11 PM, Marek Olšák wrote:
> > On May 15, 2017 4:29 AM, "Michel Dänzer" wrote:
> >
> > I think the next step should be to make radeonsi keep track of
> how much
> > VRAM it's trying to use that's expected to be accessed by the
> CPU, and
> > to use GTT instead when that exceeds a threshold (probably
> derived from
> > vram_vis_size).
> >
> > That's difficult to estimate. There are apps with 600MB of mapped VRAM
> > and don't experience any performance issues. And some apps with
> 300MB of
> > mapped VRAM do. It only depends on the CPU access pattern, not what
> > radeonsi sees.
> 
> What I mean is keeping track of the total size of resources which have
> RADEON_DOMAIN_VRAM and RADEON_FLAG_CPU_ACCESS set, and if it exceeds a
> threshold, create new ones having those flags in GTT instead. Even
> though this might not be strictly necessary with amdgpu in the long run,
> it probably is for radeon anyway, and in the short term it might help
> even with amdgpu.
> 
> 
> That might hurt us more than it can help.

You may be right, but I think I'll play with that idea a little anyway
to see how it goes. :)

> All mappable buffers have the CPU access flag set, but many of them are
> immutable.

You mean they're only written to once by the CPU? We shouldn't set the
RADEON_FLAG_CPU_ACCESS flag for BOs where we expect that, because it
will currently prevent them from being in the CPU invisible part of VRAM.


> The only place where this can be handled is the kernel.

Ideally, the placement of a BO should be determined based on how it's
actually being used by the GPU vs CPU. But I'm not sure how to determine
that in a useful way.

> Even if it's as simple as: if (bo->numcpufaults > 10) domain = GTT_WC;

I'm skeptical about the number of CPU page faults per se being a useful
metric. It doesn't tell us much about how the BO is used even by the
CPU, let alone the GPU. But let's see where this leads you.


One thing that might help would be if we could swap individual memory
nodes between visible and invisible VRAM for CPU page faults, instead of
moving/evicting whole BOs. Christian, do you think something like that
would be possible?


Another idea (to avoid issues such as the recent one with Rocket League)
was to make VRAM CPU mappings write-only, and move the BO to GTT if
there's a read fault. But not sure if this is possible at all, or how
much effort it would be.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
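A rough userspace sketch of the accounting Michel describes above --
hypothetical names, not radeonsi code, and the threshold choice (half of
vram_vis_size) is made up for illustration:

    #include <stdbool.h>
    #include <stdint.h>

    enum domain { DOMAIN_VRAM, DOMAIN_GTT };

    static uint64_t cpu_access_vram; /* bytes of CPU-access VRAM created */

    static enum domain pick_domain(uint64_t size, bool cpu_access,
                                   uint64_t vram_vis_size)
    {
            uint64_t threshold = vram_vis_size / 2; /* made-up threshold */

            if (cpu_access && cpu_access_vram + size > threshold)
                    return DOMAIN_GTT; /* stop piling onto visible VRAM */
            if (cpu_access)
                    cpu_access_vram += size;
            return DOMAIN_VRAM;
    }

    int main(void)
    {
            uint64_t vis = 256ULL << 20;
            enum domain a = pick_domain(100ULL << 20, true, vis); /* VRAM */
            enum domain b = pick_domain(100ULL << 20, true, vis); /* GTT */
            return (a == DOMAIN_VRAM && b == DOMAIN_GTT) ? 0 : 1;
    }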


Re: [REGRESSION] drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo

2017-05-18 Thread Lukas Wunner
On Wed, May 17, 2017 at 11:08:23PM +0200, Nicolai Stange wrote:
> I'm experiencing a boot failure on next-20170515:
> 
>   BUG: unable to handle kernel NULL pointer dereference at 07cb
>   IP: radeon_driver_load_kms+0xeb/0x230 [radeon]
[snip]
> Bisection led to commit 7ffb0ce31cf9 ("drm/radeon: Don't register
> Thunderbolt eGPU with vga_switcheroo"). Reverting this commit on top of
> next-20170515 fixes the issue for me.
> 
> My box is a Dell laptop which most certainly hasn't got any Thunderbolt
> circuitry.

Thanks a lot Nicolai for reporting this, my apologies for the breakage
which turns out to be a dereference of rdev->pdev before it's set. :-(

14:   e8 e7 c0 0e 00          callq  0xec100 ; radeon_has_atpx()
19:   84 c0                   test   %al,%al
1b:   74 9a                   je     0xffb7
1d:   41 f7 c5 00 00 02 00    test   $0x2,%r13d ; flags & RADEON_IS_IGP
24:   75 91                   jne    0xffb7
26:   49 8b 44 24 10          mov    0x10(%r12),%rax ; rax = rdev
2b:*  0f b6 90 cb 07 00 00    movzbl 0x7cb(%rax),%edx <-- trapping instruction

Could you verify if the patch below fixes the issue for you?

Thanks!

Lukas

-- >8 --

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index e3e7cb1..4761f27 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -116,7 +116,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
if ((radeon_runtime_pm != 0) &&
radeon_has_atpx() &&
((flags & RADEON_IS_IGP) == 0) &&
-   !pci_is_thunderbolt_attached(rdev->pdev))
+   !pci_is_thunderbolt_attached(dev->pdev))
flags |= RADEON_IS_PX;
 
/* radeon_device_init should report only fatal error
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
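The bug class here is simple ordering: rdev->pdev is only filled in later
during device init, while dev->pdev is already valid when the PX flags
are computed. A simplified standalone mock (made-up types) of why the
original line faulted:

    #include <stdio.h>

    struct pci_dev { int thunderbolt; };
    struct drm_device { struct pci_dev *pdev; };
    struct radeon_device { struct pci_dev *pdev; }; /* NULL until init */

    int main(void)
    {
            struct pci_dev pci = { 0 };
            struct drm_device dev = { &pci };
            struct radeon_device rdev = { NULL }; /* not yet initialized */

            /* The buggy check dereferenced rdev.pdev here -> NULL deref.
             * The fix uses dev.pdev, which is already set up: */
            if (!dev.pdev->thunderbolt)
                    printf("not a Thunderbolt eGPU, PX allowed\n");

            (void)rdev;
            return 0;
    }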


Re: [PATCH] drm/amd: include <linux/delay.h> instead of "linux/delay.h"

2017-05-18 Thread Christian König

Am 18.05.2017 um 06:43 schrieb Masahiro Yamada:

Use <...> notation to include headers located in include/linux.
While we are here, tweak the includes order a bit to sort them
alphabetically.

Signed-off-by: Masahiro Yamada 


Reviewed-by: Christian König 


---

  drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c| 4 ++--
  drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c  | 2 +-
  drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c   | 4 ++--
  drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 5 +++--
  drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c   | 8 +---
  drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c  | 5 +++--
  6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
index ff4ae3d..963a9e0 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
@@ -22,10 +22,10 @@
   */
  
  #include "pp_debug.h"

-#include "linux/delay.h"
-#include 
+#include 
  #include 
  #include 
+#include 
  #include 
  #include "cgs_common.h"
  #include "power_state.h"
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c
index f5e8fda..f6b4dd9 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c
@@ -21,8 +21,8 @@
   *
   */
  
+#include 

  #include 
-#include "linux/delay.h"
  #include "hwmgr.h"
  #include "amd_acpi.h"
  #include "pp_acpi.h"
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
index 8f663ab..581374d 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
@@ -21,11 +21,11 @@
   *
   */
  #include "pp_debug.h"
+#include 
+#include 
  #include 
  #include 
-#include 
  #include 
-#include "linux/delay.h"
  #include "pp_acpi.h"
  #include "ppatomctrl.h"
  #include "atombios.h"
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
index 8394955..f4ab81b 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
@@ -20,10 +20,11 @@
   * OTHER DEALINGS IN THE SOFTWARE.
   *
   */
+
+#include 
+#include 
  #include 
  #include 
-#include 
-#include "linux/delay.h"
  
  #include "hwmgr.h"

  #include "amd_powerplay.h"
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c 
b/drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c
index 1f6744a..39c7091 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c
@@ -20,11 +20,13 @@
   * OTHER DEALINGS IN THE SOFTWARE.
   *
   */
-#include 
+
+#include 
+#include 
  #include 
  #include 
-#include 
-#include "linux/delay.h"
+#include 
+
  #include "cgs_common.h"
  #include "smu/smu_8_0_d.h"
  #include "smu/smu_8_0_sh_mask.h"
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c 
b/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c
index c0d7576..2e954a4 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c
@@ -20,15 +20,16 @@
   * OTHER DEALINGS IN THE SOFTWARE.
   *
   */
-#include 
+
+#include 
  #include 
  #include 
  #include 
+#include 
  #include 
  #include "pp_instance.h"
  #include "smumgr.h"
  #include "cgs_common.h"
-#include "linux/delay.h"
  
  MODULE_FIRMWARE("amdgpu/topaz_smc.bin");

  MODULE_FIRMWARE("amdgpu/topaz_k_smc.bin");



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx