Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-12 Thread Christoph Hellwig
On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
> On 10/04/18 21:59, Sinan Kaya wrote:
>> Code is expecting to observe the same number of buffers returned from
>> the dma_map_sg() function as from sg_alloc_table_from_pages(). This
>> doesn't hold true universally, especially for systems with an IOMMU.
>
> So why not fix said code? It's clearly not a real hardware limitation, and 
> the map_sg() APIs have potentially returned fewer than nents since forever, 
> so there's really no excuse.

Yes, relying on dma_map_sg returning the same number of entries as it was
passed is completely bogus.
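
To illustrate the point (a minimal, hedged sketch with illustrative names,
not the actual drm_prime fix): a consumer has to walk only the segment count
returned by dma_map_sg(), with its own cursor, and expand coalesced segments
back into per-page addresses if that is what it needs:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/*
 * Hypothetical helper: fill dma_addrs[] with one address per PAGE_SIZE
 * chunk, walking only the 'mapped' segments actually returned by
 * dma_map_sg() instead of the original nents from
 * sg_alloc_table_from_pages().
 */
static void fill_page_dma_addrs(struct scatterlist *sgl, int mapped,
				dma_addr_t *dma_addrs, unsigned long max_pages)
{
	struct scatterlist *sg;
	unsigned long idx = 0;
	int i;

	for_each_sg(sgl, sg, mapped, i) {
		dma_addr_t addr = sg_dma_address(sg);
		unsigned int len = sg_dma_len(sg);

		/* A coalesced segment may cover many pages. */
		while (len >= PAGE_SIZE && idx < max_pages) {
			dma_addrs[idx++] = addr;
			addr += PAGE_SIZE;
			len -= PAGE_SIZE;
		}
	}
}

The key detail is that 'mapped' is the return value of dma_map_sg(), which
may be smaller than the nents produced by sg_alloc_table_from_pages().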

>> The IOMMU driver tries to combine buffers into a single DMA address as
>> much as it can. The right thing is to tell the DMA layer how much
>> combining the IOMMU can do.
>
> Disagree; this is a dodgy hack, since you'll now end up passing 
> scatterlists into dma_map_sg() which already violate max_seg_size to begin 
> with, and I think a conscientious DMA API implementation would be within its rights 
> to fail the mapping for that reason (I know arm64 happens not to, but that 
> was a deliberate design decision to make my life easier at the time).

Agreed.


Re: [PATCH v4 1/2] drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over

2018-04-12 Thread Michel Dänzer
On 2018-04-12 06:25 AM, Andrey Grodzovsky wrote:
> On 04/12/2018 12:16 AM, Alex Deucher wrote:
>> On Thu, Apr 12, 2018 at 12:08 AM, Andrey Grodzovsky
>>  wrote:
>>> From: Alex Deucher 
>>>
>>> Steal 9 MB for vga emulation and fb if vga is enabled, otherwise,
>>> steal enough to cover the current display size as set by the vbios.
>>>
>>> If no memory is used (e.g., secondary or headless card), skip
>>> stolen memory reserve.
>>>
>>> Signed-off-by: Alex Deucher 
>> Looks like you used v1 rather than v2 of this patch:
>> https://patchwork.freedesktop.org/patch/215566/
>>
>> Alex
> 
> Too many patches circling around, NP, will respin tomorrow.

FWIW, it might help if everybody kept the status of their patches up to
date on https://patchwork.freedesktop.org/project/amd-xorg-ddx/patches/ :)


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: RFC for a render API to support adaptive sync and VRR

2018-04-12 Thread Michel Dänzer
On 2018-04-12 01:30 AM, Cyr, Aric wrote:
>> From: Michel Dänzer [mailto:mic...@daenzer.net]
>> Sent: Wednesday, April 11, 2018 05:50
>> On 2018-04-11 08:57 AM, Nicolai Hähnle wrote:
>>> On 10.04.2018 23:45, Cyr, Aric wrote:
>>>> How does it work fine today given that all the kernel seems to know is
>>>> 'current' or 'current+1' vsyncs.
>>>> Presumably the applications somehow schedule all this just fine.
>>>> If this works without variable refresh for 60Hz, will it not work for
>>>> a fixed-rate "48Hz" monitor (assuming a 24Hz video)?
>>>
>>> You're right. I guess a better way to state the point is that it
>>> *doesn't* really work today with fixed refresh, but if we're going to
>>> introduce a new API, then why not do so in a way that can fix these
>>> additional problems as well?
>>
>> Exactly. With a fixed frame duration, we'll still have fundamentally the
>> same issues as we currently do without variable refresh, not making use
>> of the full potential of variable refresh.
> 
> I see.  Well then, that makes this sort of orthogonal to the discussion.
> If you say that there are no media players on Linux today that can maintain 
> audio/video sync with a 60Hz display, then that problem is much larger than 
> the one we're trying to solve here.  
> By the way, I don't believe that is a true statement :)

Indeed, that's not what we're saying:

With fixed refresh rate, audio/video sync cannot be maintained without
occasional visual artifacts, due to skipped / repeated frames.
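
(Concretely: 24 fps content on a fixed 60 Hz display has to hold frames for
alternately two and three vsyncs (3:2 pulldown) to keep A/V sync, which shows
up as judder; with variable refresh the display can simply hold every frame
for 1/24 s, i.e. about 41.7 ms.)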


>>> How about what I wrote in an earlier mail of having attributes:
>>>
>>> - target_present_time_ns
>>> - hint_frame_time_ns (optional)
>>>
>>> ... and if a video player set both, the driver could still do the
>>> optimizations you've explained?
>>
>> FWIW, I don't think a property would be a good mechanism for the target
>> presentation time.
>>
>> At least with VDPAU, video players are already explicitly specifying the
>> target presentation time, so no changes should be required at that
>> level. Don't know about other video APIs.
>>
>> The X11 Present extension protocol is also prepared for specifying the
>> target presentation time already, the support for it just needs to be
>> implemented.
> 
> I'm perfectly OK with presentation time-based *API*.  I get it from a user 
> mode/app perspective, and that's fine.  We need that feedback and would like 
> help defining those portions of the stack.
> However, I think it doesn't make as much sense as a *DDI* because it doesn't
> correspond to any hardware, real or logical (i.e. no one would implement it in
> HW this way), and the industry specs aren't defined that way.

Which specs are you referring to? There are at least two specs (VDPAU
and VK_GOOGLE_display_timing) which are defined that way.
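
For reference, this is roughly how VK_GOOGLE_display_timing expresses it; a
hedged sketch, with the queue/swapchain/image handles and the target time
assumed to come from elsewhere:

#include <vulkan/vulkan.h>

/* Queue a frame with an absolute target presentation time (nanoseconds). */
static void present_at(VkQueue queue, VkSwapchainKHR swapchain,
		       uint32_t image_index, uint32_t present_id,
		       uint64_t target_ns)
{
	VkPresentTimeGOOGLE time = {
		.presentID = present_id,
		.desiredPresentTime = target_ns, /* 0 means "no preference" */
	};
	VkPresentTimesInfoGOOGLE times = {
		.sType = VK_STRUCTURE_TYPE_PRESENT_TIMES_INFO_GOOGLE,
		.swapchainCount = 1,
		.pTimes = &time,
	};
	VkPresentInfoKHR info = {
		.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
		.pNext = &times,
		.swapchainCount = 1,
		.pSwapchains = &swapchain,
		.pImageIndices = &image_index,
	};

	vkQueuePresentKHR(queue, &info);
}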

> You can have libdrm or some other usermode component translate your 
> presentation time into a frame duration and schedule it.

This cuts both ways.

> What's the advantage of having this in kernel besides the fact we lose the 
> intent of the application and could prevent features and optimizations.

To me, presentation time is much clearer as intent of the application.
It can express all the same things frame duration can, but not the other
way around.


> When it gets to kernel, I think it is much more elegant for the flip structure
> to contain a simple duration that says "hey, show this frame on the screen for
> this long".

A game cannot know this in advance, can it? Per the Croteam
presentation, it depends on when this frame is actually presented (among
other things).


>  1) We can simplify media players' lives by helping them get really, really 
> close to their content rate, so they wouldn't need any frame rate conversion. 
>  

At least with VDPAU, media players shouldn't need any changes at all, as
they're already explicitly specifying the presentation times.
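
(For reference, the VDPAU call in question looks roughly like this; a hedged
sketch, with the function pointer obtained via VdpGetProcAddress and the
queue/surface handles set up elsewhere:)

#include <vdpau/vdpau.h>

/* Hand the driver an absolute earliest-presentation time per frame. */
static VdpStatus queue_frame(VdpPresentationQueueDisplay *queue_display,
			     VdpPresentationQueue queue,
			     VdpOutputSurface surface,
			     VdpTime earliest_presentation_time)
{
	/* clip_width/clip_height of 0 request the full surface. */
	return queue_display(queue, surface, 0, 0, earliest_presentation_time);
}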


>  They'll still need A/V syncing though, and variable refresh cannot solve 
> this

I've been trying to explain that it can, perfectly. Can you explain why
you think it can't, or ask if something isn't clear about what I've been
explaining?


> P.S. Thanks for the Croteam link.  Interesting, but basically nullified by 
> variable refresh rate displays.

According to whom / what? I don't see why it wouldn't apply to variable
refresh as well. Without time-based presentation, the game cannot
prevent a frame from being presented too early.

There is no doubt that the artifacts of not doing this properly will be
less noticeable with variable refresh, but that doesn't mean they don't
exist.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH v4 1/2] drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over

2018-04-12 Thread Huang Rui
On Thu, Apr 12, 2018 at 12:25:08PM +0800, Grodzovsky, Andrey wrote:
> 
> 
> On 04/12/2018 12:16 AM, Alex Deucher wrote:
> > On Thu, Apr 12, 2018 at 12:08 AM, Andrey Grodzovsky
> >  wrote:
> >> From: Alex Deucher 
> >>
> >> Steal 9 MB for vga emulation and fb if vga is enabled, otherwise,
> >> steal enough to cover the current display size as set by the vbios.
> >>
> >> If no memory is used (e.g., secondary or headless card), skip
> >> stolen memory reserve.
> >>
> >> Signed-off-by: Alex Deucher 
> > Looks like you used v1 rather than v2 of this patch:
> > https://patchwork.freedesktop.org/patch/215566/
> >
> > Alex
> 
> Too many patches circling around, NP, will respin tomorrow.
> 

Nevermind, guy. I will verify them once you send the patches tomorrow. :-)

Thanks,
Ray


Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately

2018-04-12 Thread Lucas Stach
On Wednesday, 11.04.2018 at 20:26 +0200, Christian König wrote:
> On 11.04.2018 at 19:11, Robin Murphy wrote:
> > For dma_map_sg(), DMA API implementations are free to merge consecutive
> > segments into a single DMA mapping if conditions are suitable, thus the
> > resulting DMA addresses may be packed into fewer entries than
> > ttm->sg->nents implies.
> > 
> > drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
> > its callers either have to reject the 0 < count < nents case or risk
> > getting bogus addresses back later. Fortunately this is relatively easy
> > to deal with without having to rejig structures to also store the mapped count, 
> > since the total DMA length should still be equal to the total buffer
> > length. All we need is a separate scatterlist cursor to iterate the DMA
> > addresses separately from the CPU addresses.
> 
> Mhm, I think I like Sinan's approach better.
> 
> See, the hardware actually needs the dma_address on a page-by-page basis.
> 
> Joining multiple consecutive pages into one entry is just additional 
> overhead which we don't need.

But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
be in front of your GPU from mapping large pages as such, causing additional
overhead by means of additional TLB misses and page walks on the IOMMU
side.

And wouldn't you like to use huge pages at the GPU side, if the IOMMU
already provides you the service of producing one large consecutive
address range, rather than mapping them via a number of small pages?
Detecting such a condition is much easier if the DMA map implementation
gives you the coalesced scatter entries.
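
For illustration only (a hedged sketch, not driver code; the 2 MiB huge page
size is an assumption), such detection from coalesced entries could be as
simple as:

#include <linux/kernel.h>
#include <linux/scatterlist.h>
#include <linux/sizes.h>

#define GPU_HUGE_PAGE_SIZE SZ_2M	/* illustrative value */

/*
 * A coalesced segment returned by dma_map_sg() can be mapped as one huge
 * GPU page if it is large enough and suitably aligned.
 */
static bool sg_fits_gpu_huge_page(struct scatterlist *sg)
{
	return sg_dma_len(sg) >= GPU_HUGE_PAGE_SIZE &&
	       IS_ALIGNED(sg_dma_address(sg), GPU_HUGE_PAGE_SIZE);
}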

Regards,
Lucas


Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately

2018-04-12 Thread Christian König

On 12.04.2018 at 11:11, Lucas Stach wrote:

On Wednesday, 11.04.2018 at 20:26 +0200, Christian König wrote:

On 11.04.2018 at 19:11, Robin Murphy wrote:

For dma_map_sg(), DMA API implementations are free to merge consecutive
segments into a single DMA mapping if conditions are suitable, thus the
resulting DMA addresses may be packed into fewer entries than
ttm->sg->nents implies.

drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
its callers either have to reject the 0 < count < nents case or risk
getting bogus addresses back later. Fortunately this is relatively easy
to deal with without having to rejig structures to also store the mapped count,
since the total DMA length should still be equal to the total buffer
length. All we need is a separate scatterlist cursor to iterate the DMA
addresses separately from the CPU addresses.

Mhm, I think I like Sinan's approach better.

See, the hardware actually needs the dma_address on a page-by-page basis.

Joining multiple consecutive pages into one entry is just additional
overhead which we don't need.

But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
be in front of your GPU from mapping large pages as such, causing additional
overhead by means of additional TLB misses and page walks on the IOMMU
side.

And wouldn't you like to use huge pages at the GPU side, if the IOMMU
already provides you the service of producing one large consecutive
address range, rather than mapping them via a number of small pages?


No, I wouldn't like to use that. We're already using that :)

But we use huge pages by allocating consecutive chunks of memory so that 
both the CPU as well as the GPU can increase their TLB hit rate.


What currently happens is that the DMA subsystem tries to coalesce 
multiple pages into one SG entry and we de-coalesce that inside the 
driver again to create our random access array.


That is a huge waste of memory and CPU cycles and I actually wanted to 
get rid of it for quite some time now. Sinan's approach seems to be a good 
step into the right direction to me to actually clean that up.



Detecting such a condition is much easier if the DMA map implementation
gives you the coalesced scatter entries.


A way which preserves both paths would indeed be nice to have, but that 
only allows for the TLB optimization for the GPU and not the CPU any 
more. So I actually see that as really minor use case.


Regards,
Christian.



Regards,
Lucas




Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-12 Thread Christian König

On 12.04.2018 at 08:26, Christoph Hellwig wrote:

On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:

On 10/04/18 21:59, Sinan Kaya wrote:

Code is expecting to observe the same number of buffers returned from
the dma_map_sg() function as from sg_alloc_table_from_pages(). This
doesn't hold true universally, especially for systems with an IOMMU.

So why not fix said code? It's clearly not a real hardware limitation, and
the map_sg() APIs have potentially returned fewer than nents since forever,
so there's really no excuse.

Yes, relying on dma_map_sg returning the same number of entries as it was
passed is completely bogus.


I agree that the common DRM functions should be able to deal with both, 
but we should let the driver side decide if it wants multiple addresses 
combined or not.





The IOMMU driver tries to combine buffers into a single DMA address as much
as it can. The right thing is to tell the DMA layer how much combining the
IOMMU can do.

Disagree; this is a dodgy hack, since you'll now end up passing
scatterlists into dma_map_sg() which already violate max_seg_size to begin
with, and I think a conscientious DMA API implementation would be within its rights
to fail the mapping for that reason (I know arm64 happens not to, but that
was a deliberate design decision to make my life easier at the time).

Agreed.


Sounds like my understanding of max_seg_size is incorrect; what exactly 
should that describe?


Thanks,
Christian.


Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately

2018-04-12 Thread Lucas Stach
On Thursday, 12.04.2018 at 11:35 +0200, Christian König wrote:
> On 12.04.2018 at 11:11, Lucas Stach wrote:
> > On Wednesday, 11.04.2018 at 20:26 +0200, Christian König wrote:
> > > On 11.04.2018 at 19:11, Robin Murphy wrote:
> > > > For dma_map_sg(), DMA API implementations are free to merge consecutive
> > > > segments into a single DMA mapping if conditions are suitable, thus the
> > > > resulting DMA addresses may be packed into fewer entries than
> > > > ttm->sg->nents implies.
> > > > 
> > > > drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
> > > > its callers either have to reject the 0 < count < nents case or risk
> > > > getting bogus addresses back later. Fortunately this is relatively easy
> > > > to deal with without having to rejig structures to also store the mapped count,
> > > > since the total DMA length should still be equal to the total buffer
> > > > length. All we need is a separate scatterlist cursor to iterate the DMA
> > > > addresses separately from the CPU addresses.
> > > 
> > > Mhm, I think I like Sinan's approach better.
> > > 
> > > See, the hardware actually needs the dma_address on a page-by-page basis.
> > > 
> > > Joining multiple consecutive pages into one entry is just additional
> > > overhead which we don't need.
> > 
> > But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
> > be in front of your GPU from mapping large pages as such, causing additional
> > overhead by means of additional TLB misses and page walks on the IOMMU
> > side.
> > 
> > And wouldn't you like to use huge pages at the GPU side, if the IOMMU
> > already provides you the service of producing one large consecutive
> > address range, rather than mapping them via a number of small pages?
> 
> No, I wouldn't like to use that. We're already using that :)
> 
> But we use huge pages by allocating consecutive chunks of memory so that 
> both the CPU as well as the GPU can increase their TLB hit rate.

I'm not convinced that this is a universal win. Many GPU buffers aren't
accessed by the CPU and allocating huge pages puts much more strain on
the kernel MM subsystem.

> What currently happens is that the DMA subsystem tries to coalesce 
> multiple pages into one SG entry and we de-coalesce that inside the 
> driver again to create our random access array.

I guess the right thing would be to have a flag that tells the DMA
implementation to not coalesce the entries. But (ab-)using max segment
size tells the DMA API to split the segments if they are larger than
the given size, which is probably not what you want either as you now
need to coalesce the segments again when you are mapping real huge
pages.
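
(For context, the knob being (ab-)used here is the per-device DMA segment
size limit; a minimal sketch of how a driver would set it, PAGE_SIZE being
what the patch under discussion effectively asks for:)

#include <linux/dma-mapping.h>

/*
 * Cap DMA segments for this device at one page, so dma_map_sg() will not
 * coalesce beyond PAGE_SIZE -- at the cost of giving up larger IOMMU
 * mappings, as discussed above.
 */
static int limit_dma_segments_to_page(struct device *dev)
{
	return dma_set_max_seg_size(dev, PAGE_SIZE);
}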

> That is a huge waste of memory and CPU cycles and I actually wanted to 
> get rid of it for quite some time now. Sinan's approach seems to be a good 
> step into the right direction to me to actually clean that up.
> 
> > Detecting such a condition is much easier if the DMA map implementation
> > gives you the coalesced scatter entries.
> 
> A way which preserves both paths would indeed be nice to have, but that 
> only allows for the TLB optimization for the GPU and not the CPU any 
> more. So I actually see that as really minor use case.

Some of the Tegras have much larger TLBs and better page-walk
performance on the system IOMMU compared to the GPU MMU, so they would
probably benefit a good deal even if the hugepage optimization only
targets the GPU side.

Regards,
Lucas


[PATCH 01/11] drm/ttm: add ttm process struct

2018-04-12 Thread Chunming Zhou
Change-Id: I34924a40392653e72f143c30ab312cbbf9fa
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/ttm/ttm_bo.c| 23 +++
 include/drm/ttm/ttm_bo_api.h|  1 +
 include/drm/ttm/ttm_bo_driver.h | 10 ++
 3 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 98e06f8bf23b..b740d8f390ca 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1556,6 +1556,7 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
0x1000);
INIT_DELAYED_WORK(&bdev->wq, ttm_bo_delayed_workqueue);
INIT_LIST_HEAD(&bdev->ddestroy);
+   INIT_LIST_HEAD(&bdev->process_list);
bdev->dev_mapping = mapping;
bdev->glob = glob;
bdev->need_dma32 = need_dma32;
@@ -1569,6 +1570,28 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
 }
 EXPORT_SYMBOL(ttm_bo_device_init);
 
+int ttm_process_init(struct ttm_process *process, struct ttm_bo_device *bdev,
+struct reservation_object *resv)
+{
+   int i, j;
+
+   INIT_LIST_HEAD(&process->process_list);
+   for (i = 0; i < TTM_NUM_MEM_TYPES; i++) {
+   for (j = 0; j < TTM_MAX_BO_PRIORITY; j++) {
+   INIT_LIST_HEAD(&process->fixed_lru[i][j]);
+   INIT_LIST_HEAD(&process->dynamic_lru[i][j]);
+   }
+   }
+   spin_lock(&bdev->glob->lru_lock);
+   list_add_tail(&process->process_list, &bdev->process_list);
+   spin_unlock(&bdev->glob->lru_lock);
+
+   process->resv = resv;
+
+   return 0;
+}
+EXPORT_SYMBOL(ttm_process_init);
+
 /*
  * buffer object vm functions.
  */
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index c67977aa1a0e..8cb4b48f387a 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -172,6 +172,7 @@ struct ttm_buffer_object {
 */
 
struct ttm_bo_device *bdev;
+   struct ttm_process *process;
enum ttm_bo_type type;
void (*destroy) (struct ttm_buffer_object *);
unsigned long num_pages;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 3234cc322e70..91120923de81 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -438,6 +438,13 @@ struct ttm_bo_global {
 
 #define TTM_NUM_MEM_TYPES 8
 
+struct ttm_process {
+   struct list_head process_list;
+   struct list_head fixed_lru[TTM_NUM_MEM_TYPES][TTM_MAX_BO_PRIORITY];
+   struct list_head dynamic_lru[TTM_NUM_MEM_TYPES][TTM_MAX_BO_PRIORITY];
+   struct reservation_object *resv;
+};
+
 /**
  * struct ttm_bo_device - Buffer object driver device-specific data.
  *
@@ -459,6 +466,7 @@ struct ttm_bo_device {
 * Constant after bo device init / atomic.
 */
struct list_head device_list;
+   struct list_head process_list;
struct ttm_bo_global *glob;
struct ttm_bo_driver *driver;
struct ttm_mem_type_manager man[TTM_NUM_MEM_TYPES];
@@ -575,6 +583,8 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev, struct 
ttm_bo_global *glob,
   struct address_space *mapping,
   uint64_t file_page_offset, bool need_dma32);
 
+int ttm_process_init(struct ttm_process *process, struct ttm_bo_device *bdev,
+struct reservation_object *resv);
 /**
  * ttm_bo_unmap_virtual
  *
-- 
2.14.1



[PATCH 00/11] ***[WIP] TTM per process lru ***

2018-04-12 Thread Chunming Zhou
Since the per-process-BO feature was introduced, the old LRU isn't working for it.
The old LRU order depends on the BO list order, which is updated by the BO
list after every command submission.
But per-process BOs aren't in the BO list, so they have no chance to
refresh their order in the LRU, which also results in unstable performance for
the application.
Per-process BOs are used by every submission of this process and are validated
automatically, so their order should be fixed.

For the above reasons, we introduce a per-process LRU instead of the old LRU.
The struct definition looks like:
+struct ttm_process {
+   struct list_head process_list;
+   struct rb_root fixed_lru[TTM_NUM_MEM_TYPES][TTM_MAX_BO_PRIORITY];
+   struct list_head dynamic_lru[TTM_NUM_MEM_TYPES][TTM_MAX_BO_PRIORITY];
+   struct reservation_object *resv;
+   atomic64_t bo_index;
+};
process_list is a list node used to add the process to bdev->process_list.
fixed_lru stores per-process BOs.
dynamic_lru stores non-per-process BOs.
resv is this process's PD root BO reservation object.
bo_index is a counter for BO indices; every BO belonging to this process gets an
index.

Lastly, the patch set is a work in progress, so I haven't reorganized and cleaned it
up yet.
I'm sending it to the community to see if there are any other concerns about this new mechanism.

Tested result:
the solution solves my issue that F1 game performance isn't stable, and it doesn't
introduce extra CPU overhead, as the Talos game shows.


TODO:
1. patch organization.
2. some corner cases need to be handled (fini, bo destroy, force_clean).
3. clean up.
4. more places can be improved if this mechanism is accepted, e.g. a per-vm-lru
entry doesn't need to move at all if it isn't destroyed; maybe add a flag for evictable.
5. the vm->evicted list in amdgpu can also be replaced by an RB tree, which can
guarantee validation order.

Thanks,
David Zhou

Chunming Zhou (11):
  drm/ttm: add ttm process struct
  drm/amdgpu: use ttm process in amdgpu vm
  drm/amdgpu: add kernel process
  drm/amdgpu: pass process to tbo
  drm/ttm: add per process lru
  drm/amdgpu: pass ttm process to buffer object
  drm/ttm: use RB tree instead of link list
  drm/ttm: add bo index
  drm/amdgpu: counter for every bo creation
  drm/ttm: some good fixes for per-vm-lru
  drm/ttm: bulk move per vm bo

 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  24 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 169 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   2 +
 drivers/gpu/drm/ttm/ttm_bo.c   | 134 ---
 include/drm/ttm/ttm_bo_api.h   |   4 +-
 include/drm/ttm/ttm_bo_driver.h|  14 ++-
 11 files changed, 336 insertions(+), 29 deletions(-)

-- 
2.14.1



[PATCH 02/11] drm/amdgpu: use ttm process in amdgpu vm

2018-04-12 Thread Chunming Zhou
Change-Id: I2cf802e641d8b2cdb2bf8bdf1957f3f4f27afaba
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index da55a78d7380..6ef449ea8d07 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2429,6 +2429,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
list_add_tail(&vm->root.base.vm_status, &vm->evicted);
amdgpu_bo_unreserve(vm->root.base.bo);
 
+   ttm_process_init(&vm->ttm_vm, &adev->mman.bdev,
+vm->root.base.bo->tbo.resv);
if (pasid) {
unsigned long flags;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 30f080364c97..61b89642fa3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "amdgpu_sync.h"
 #include "amdgpu_ring.h"
@@ -166,6 +167,7 @@ struct amdgpu_vm_pt {
 struct amdgpu_vm {
/* tree of virtual addresses mapped */
struct rb_root_cached   va;
+   struct ttm_process  ttm_vm;
 
/* protecting invalidated */
spinlock_t  status_lock;
-- 
2.14.1



[PATCH 03/11] drm/amdgpu: add kernel process

2018-04-12 Thread Chunming Zhou
Change-Id: Ie43f3c73cc65526a449208f3ce927b1dfad5cf6b
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2babfad1fd7f..4b66585a8638 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1629,6 +1629,8 @@ struct amdgpu_device {
unsigned long last_mm_index;
boolin_gpu_reset;
struct mutex  lock_reset;
+
+   struct ttm_process  kernel_process;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 205da3ff9cd0..4c9e10505e2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1421,6 +1421,7 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
return r;
}
adev->mman.initialized = true;
+   ttm_process_init(&adev->kernel_process, &adev->mman.bdev, NULL);
 
/* We opt to avoid OOM on system pages allocations */
adev->mman.bdev.no_retry = true;
-- 
2.14.1



[PATCH 04/11] drm/amdgpu: pass process to tbo

2018-04-12 Thread Chunming Zhou
Change-Id: I4de8146567b858ae07a8a27cadf71d13d490e8ac
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 7 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 28c2706e48d7..37b19ce97699 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -195,6 +195,7 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
uint64_t size = args->in.bo_size;
struct reservation_object *resv = NULL;
struct drm_gem_object *gobj;
+   struct amdgpu_bo *abo;
uint32_t handle;
int r;
 
@@ -243,10 +244,12 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
r = amdgpu_gem_object_create(adev, size, args->in.alignment,
 (u32)(0x & args->in.domains),
 flags, false, resv, &gobj);
+   if (!r) {
+   abo = gem_to_amdgpu_bo(gobj);
+   abo->tbo.process = &vm->ttm_vm;
+   }
if (flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID) {
if (!r) {
-   struct amdgpu_bo *abo = gem_to_amdgpu_bo(gobj);
-
abo->parent = amdgpu_bo_ref(vm->root.base.bo);
}
amdgpu_bo_unreserve(vm->root.base.bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 9e23d6f6f3f3..e911db2d1945 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -447,8 +447,10 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, 
unsigned long size,
else
amdgpu_cs_report_moved_bytes(adev, ctx.bytes_moved, 0);
 
-   if (type == ttm_bo_type_kernel)
+   if (type == ttm_bo_type_kernel) {
bo->tbo.priority = 1;
+   bo->tbo.process = &adev->kernel_process;
+   }
 
if (flags & AMDGPU_GEM_CREATE_VRAM_CLEARED &&
bo->tbo.mem.placement & TTM_PL_FLAG_VRAM) {
-- 
2.14.1



[PATCH 05/11] drm/ttm: add per process lru

2018-04-12 Thread Chunming Zhou
Change-Id: Id2333f69119222a7e9bdb0357bbed97cf08636da
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/ttm/ttm_bo.c| 59 ++---
 include/drm/ttm/ttm_bo_driver.h |  3 ++-
 2 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index b740d8f390ca..c1d0ec1238c6 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -161,15 +161,20 @@ void ttm_bo_add_to_lru(struct ttm_buffer_object *bo)
 {
struct ttm_bo_device *bdev = bo->bdev;
struct ttm_mem_type_manager *man;
+   struct ttm_process *ttm_process = bo->process;
 
reservation_object_assert_held(bo->resv);
 
if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
BUG_ON(!list_empty(&bo->lru));
 
-   man = &bdev->man[bo->mem.mem_type];
-   list_add_tail(&bo->lru, &man->lru[bo->priority]);
kref_get(&bo->list_kref);
+   if (bo->resv == ttm_process->resv)
+   list_add_tail(&bo->lru,
+ &ttm_process->fixed_lru[bo->mem.mem_type][bo->priority]);
+   else
+   list_add_tail(&bo->lru,
+ &ttm_process->dynamic_lru[bo->mem.mem_type][bo->priority]);
 
if (bo->ttm && !(bo->ttm->page_flags &
 (TTM_PAGE_FLAG_SG | TTM_PAGE_FLAG_SWAPPED))) {
@@ -712,13 +717,35 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
struct ttm_bo_global *glob = bdev->glob;
struct ttm_mem_type_manager *man = &bdev->man[mem_type];
struct ttm_buffer_object *bo = NULL;
+   struct ttm_process *process;
bool locked = false;
unsigned i;
int ret;
 
spin_lock(&glob->lru_lock);
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
-   list_for_each_entry(bo, &man->lru[i], lru) {
+   list_for_each_entry(process, &bdev->process_list, process_list) {
+   list_for_each_entry(bo, &process->dynamic_lru[mem_type][i], lru) {
+   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
+   continue;
+
+   if (place && !bdev->driver->eviction_valuable(bo,
+ place)) {
+   if (locked)
+   reservation_object_unlock(bo->resv);
+   continue;
+   }
+   break;
+   }
+   /* If the inner loop terminated early, we have our candidate */
+   if (&bo->lru != &process->dynamic_lru[mem_type][i])
+   break;
+   bo = NULL;
+   list_for_each_entry(bo, &process->fixed_lru[mem_type][i], lru) {
+   if (!bo)
+   continue;
+   if (&bo->lru == &process->fixed_lru[mem_type][i])
+   break;
if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
continue;
 
@@ -732,11 +759,14 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
}
 
/* If the inner loop terminated early, we have our candidate */
-   if (&bo->lru != &man->lru[i])
+   if (&bo->lru != &process->fixed_lru[mem_type][i])
break;
 
bo = NULL;
}
+   if (bo)
+   break;
+   }
 
if (!bo) {
spin_unlock(&glob->lru_lock);
@@ -1318,13 +1348,13 @@ static int ttm_bo_force_list_clean(struct ttm_bo_device 
*bdev,
 
spin_lock(&glob->lru_lock);
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
-   while (!list_empty(&man->lru[i])) {
+   //while (!list_empty(&man->lru[i])) {
spin_unlock(&glob->lru_lock);
ret = ttm_mem_evict_first(bdev, mem_type, NULL, &ctx);
if (ret)
return ret;
spin_lock(&glob->lru_lock);
-   }
+   //}
}
spin_unlock(&glob->lru_lock);
 
@@ -1427,9 +1457,10 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned 
type,
man->has_type = true;
man->use_type = true;
man->size = p_size;
-
+/*
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
INIT_LIST_HEAD(&man->lru[i]);
+*/
man->move = NULL;
 
return 0;
@@ -1518,13 +1549,13 @@ int ttm_bo_device_release(struct ttm_bo_device *bdev)
 
if (ttm_bo_delayed_delete(bdev, true))
pr_debug("Delayed destroy list was clean\n");
-
+/*
spin_lock(&glob->lru_lock);
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
if (list

[PATCH 07/11] drm/ttm: use RB tree instead of link list

2018-04-12 Thread Chunming Zhou
Change-Id: I0f533c6512f3b72fcf2fbf11d738f38d9e087f26
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/ttm/ttm_bo.c| 39 +++
 include/drm/ttm/ttm_bo_api.h|  3 ++-
 include/drm/ttm/ttm_bo_driver.h |  2 +-
 3 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c1d0ec1238c6..73343d1108d2 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -157,6 +157,26 @@ static void ttm_bo_release_list(struct kref *list_kref)
ttm_mem_global_free(bdev->glob->mem_glob, acc_size);
 }
 
+static void ttm_bo_add_to_rb(struct ttm_buffer_object *bo,
+struct rb_root *root) {
+   struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+while (*new) {
+   struct ttm_buffer_object *this =
+   container_of(*new, struct ttm_buffer_object, node);
+   int result = bo->index - this->index;
+
+   parent = *new;
+   if (result < 0)
+   new = &((*new)->rb_left);
+   else if (result > 0)
+   new = &((*new)->rb_right);
+   else
+   return;
+   }
+   rb_link_node(&bo->node, parent, new);
+   rb_insert_color(&bo->node, root);
+}
 void ttm_bo_add_to_lru(struct ttm_buffer_object *bo)
 {
struct ttm_bo_device *bdev = bo->bdev;
@@ -170,7 +190,7 @@ void ttm_bo_add_to_lru(struct ttm_buffer_object *bo)
 
kref_get(&bo->list_kref);
if (bo->resv == ttm_process->resv)
-   list_add_tail(&bo->lru,
+   ttm_bo_add_to_rb(bo,
  &ttm_process->fixed_lru[bo->mem.mem_type][bo->priority]);
else
list_add_tail(&bo->lru,
@@ -200,8 +220,12 @@ void ttm_bo_del_from_lru(struct ttm_buffer_object *bo)
if (!list_empty(&bo->lru)) {
list_del_init(&bo->lru);
kref_put(&bo->list_kref, ttm_bo_ref_bug);
+   } else if (RB_EMPTY_NODE(&bo->node)) {
+   rb_erase(&bo->node,
+ &bo->process->fixed_lru[bo->mem.mem_type][bo->priority]);
+   RB_CLEAR_NODE(&bo->node);
+   kref_put(&bo->list_kref, ttm_bo_ref_bug);
}
-
/*
 * TODO: Add a driver hook to delete from
 * driver-specific LRU's here.
@@ -725,6 +749,7 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
spin_lock(&glob->lru_lock);
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
list_for_each_entry(process, &bdev->process_list, process_list) {
+   struct rb_node *node;
list_for_each_entry(bo, &process->dynamic_lru[mem_type][i], lru) {
if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
continue;
@@ -741,11 +766,9 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
if (&bo->lru != &process->dynamic_lru[mem_type][i])
break;
bo = NULL;
-   list_for_each_entry(bo, &process->fixed_lru[mem_type][i], lru) {
-   if (!bo)
-   continue;
-   if (&bo->lru == &process->fixed_lru[mem_type][i])
-   break;
+   for (node = rb_first(&process->fixed_lru[mem_type][i]); node;
+node = rb_next(node)) {
+   bo = rb_entry(node, struct ttm_buffer_object, node);
if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
continue;
 
@@ -1609,7 +1632,7 @@ int ttm_process_init(struct ttm_process *process, struct 
ttm_bo_device *bdev,
INIT_LIST_HEAD(&process->process_list);
for (i = 0; i < TTM_NUM_MEM_TYPES; i++) {
for (j = 0; j < TTM_MAX_BO_PRIORITY; j++) {
-   INIT_LIST_HEAD(&process->fixed_lru[i][j]);
+   process->fixed_lru[i][j] = RB_ROOT;
INIT_LIST_HEAD(&process->dynamic_lru[i][j]);
}
}
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 8cb4b48f387a..9c8179bdd685 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -170,7 +170,7 @@ struct ttm_buffer_object {
/**
 * Members constant at init.
 */
-
+   struct rb_node node;
struct ttm_bo_device *bdev;
struct ttm_process *process;
enum ttm_bo_type type;
@@ -232,6 +232,7 @@ struct ttm_buffer_object {
struct reservation_object *resv;
struct reservation_object ttm_resv;
struct mutex wu_mutex;
+   u64 index;
 };
 
 /**
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index b6aa7fc5bf14..818aee15d1ec 100644
---

[PATCH 06/11] drm/amdgpu: pass ttm process to buffer object

2018-04-12 Thread Chunming Zhou
Change-Id: Ifb0dc95db6a358cf7f76e2a99f94c58637ad6ee6
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  17 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 170 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |   1 +
 7 files changed, 187 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4b66585a8638..f398a566f57b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -443,7 +443,8 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
 int alignment, u32 initial_domain,
 u64 flags, enum ttm_bo_type type,
 struct reservation_object *resv,
-struct drm_gem_object **obj);
+struct drm_gem_object **obj,
+struct ttm_process *process);
 
 int amdgpu_mode_dumb_create(struct drm_file *file_priv,
struct drm_device *dev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
index ff89e84b34ce..1cee2125f570 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
@@ -146,7 +146,7 @@ static int amdgpufb_create_pinned_object(struct 
amdgpu_fbdev *rfbdev,
   AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
   AMDGPU_GEM_CREATE_VRAM_CLEARED,
-  true, NULL, &gobj);
+  true, NULL, &gobj, &adev->kernel_process);
if (ret) {
pr_err("failed to allocate framebuffer (%d)\n", aligned_size);
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 37b19ce97699..d5cbdc810aba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -45,7 +45,8 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
 int alignment, u32 initial_domain,
 u64 flags, enum ttm_bo_type type,
 struct reservation_object *resv,
-struct drm_gem_object **obj)
+struct drm_gem_object **obj,
+struct ttm_process *process)
 {
struct amdgpu_bo *bo;
int r;
@@ -56,8 +57,8 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
alignment = PAGE_SIZE;
}
 
-   r = amdgpu_bo_create(adev, size, alignment, initial_domain,
-flags, type, resv, &bo);
+   r = amdgpu_bo_create1(adev, size, alignment, initial_domain,
+ flags, type, resv, &bo, process);
if (r) {
DRM_DEBUG("Failed to allocate GEM object (%ld, %d, %u, %d)\n",
  size, initial_domain, alignment, r);
@@ -243,7 +244,7 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
 
r = amdgpu_gem_object_create(adev, size, args->in.alignment,
 (u32)(0x & args->in.domains),
-flags, false, resv, &gobj);
+flags, false, resv, &gobj, &vm->ttm_vm);
if (!r) {
abo = gem_to_amdgpu_bo(gobj);
abo->tbo.process = &vm->ttm_vm;
@@ -273,6 +274,8 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void 
*data,
 {
struct ttm_operation_ctx ctx = { true, false };
struct amdgpu_device *adev = dev->dev_private;
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   struct amdgpu_vm *vm = &fpriv->vm;
struct drm_amdgpu_gem_userptr *args = data;
struct drm_gem_object *gobj;
struct amdgpu_bo *bo;
@@ -297,7 +300,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void 
*data,
 
/* create a gem object to contain this object in */
r = amdgpu_gem_object_create(adev, args->size, 0, AMDGPU_GEM_DOMAIN_CPU,
-0, 0, NULL, &gobj);
+0, 0, NULL, &gobj, &vm->ttm_vm);
if (r)
return r;
 
@@ -735,6 +738,8 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv,
struct drm_mode_create_dumb *args)
 {
struct amdgpu_device *adev = dev->dev_private;
+   struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
+  

[PATCH 08/11] drm/ttm: add bo index

2018-04-12 Thread Chunming Zhou
Change-Id: I4abf5cf0aaf946162dabd08fc1fd0406c2abf418
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/ttm/ttm_bo.c| 1 +
 include/drm/ttm/ttm_bo_driver.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 73343d1108d2..d56312702b49 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1641,6 +1641,7 @@ int ttm_process_init(struct ttm_process *process, struct 
ttm_bo_device *bdev,
spin_unlock(&bdev->glob->lru_lock);
 
process->resv = resv;
+   atomic64_set(&process->bo_index, 0);
 
return 0;
 }
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 818aee15d1ec..4c42df6afcfe 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -443,6 +443,7 @@ struct ttm_process {
struct rb_root fixed_lru[TTM_NUM_MEM_TYPES][TTM_MAX_BO_PRIORITY];
struct list_head dynamic_lru[TTM_NUM_MEM_TYPES][TTM_MAX_BO_PRIORITY];
struct reservation_object *resv;
+   atomic64_t bo_index;
 };
 
 /**
-- 
2.14.1



[PATCH 09/11] drm/amdgpu: counter for every bo creation

2018-04-12 Thread Chunming Zhou
Change-Id: I877b2d5c54b3e47b66f2d58454dcf1e5f5a68972
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 7bb6ee777067..d54abba4e017 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -622,8 +622,10 @@ static int amdgpu_bo_do_create1(struct amdgpu_device 
*adev, unsigned long size,
if (type == ttm_bo_type_kernel)
bo->tbo.priority = 1;
bo->tbo.process = process;
+   bo->tbo.index = (u64)atomic64_inc_return(&process->bo_index);
 
bo->tbo.bdev = &adev->mman.bdev;
+   RB_CLEAR_NODE(&bo->tbo.node);
amdgpu_ttm_placement_from_domain(bo, domains);
r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
 &bo->placement, page_align, &ctx, acc_size,
-- 
2.14.1



[PATCH 10/11] drm/ttm: some good fixes for per-vm-lru

2018-04-12 Thread Chunming Zhou
Change-Id: Ib68bff91fd127162cf8c72516101546e1fe014df
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  1 -
 drivers/gpu/drm/ttm/ttm_bo.c   | 39 --
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index d54abba4e017..ea95c2e9a858 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -625,7 +625,6 @@ static int amdgpu_bo_do_create1(struct amdgpu_device *adev, 
unsigned long size,
bo->tbo.index = (u64)atomic64_inc_return(&process->bo_index);
 
bo->tbo.bdev = &adev->mman.bdev;
-   RB_CLEAR_NODE(&bo->tbo.node);
amdgpu_ttm_placement_from_domain(bo, domains);
r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
 &bo->placement, page_align, &ctx, acc_size,
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index d56312702b49..dc9545eeb5d6 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -164,15 +164,12 @@ static void ttm_bo_add_to_rb(struct ttm_buffer_object *bo,
 while (*new) {
struct ttm_buffer_object *this =
container_of(*new, struct ttm_buffer_object, node);
-   int result = bo->index - this->index;
 
parent = *new;
-   if (result < 0)
+   if (bo->index < this->index)
new = &((*new)->rb_left);
-   else if (result > 0)
-   new = &((*new)->rb_right);
else
-   return;
+   new = &((*new)->rb_right);
}
rb_link_node(&bo->node, parent, new);
rb_insert_color(&bo->node, root);
@@ -211,6 +208,24 @@ static void ttm_bo_ref_bug(struct kref *list_kref)
BUG();
 }
 
+static struct ttm_buffer_object *ttm_bo_rb_find(struct rb_root *root, u64 index)
+{
+   struct rb_node *node = root->rb_node;
+
+   while (node) {
+   struct ttm_buffer_object *bo =
+   container_of(node, struct ttm_buffer_object, node);
+
+   if (index < bo->index)
+   node = node->rb_left;
+   else if (index > bo->index)
+   node = node->rb_right;
+   else
+   return bo;
+   }
+
+   return NULL;
+}
 void ttm_bo_del_from_lru(struct ttm_buffer_object *bo)
 {
if (!list_empty(&bo->swap)) {
@@ -220,10 +235,10 @@ void ttm_bo_del_from_lru(struct ttm_buffer_object *bo)
if (!list_empty(&bo->lru)) {
list_del_init(&bo->lru);
kref_put(&bo->list_kref, ttm_bo_ref_bug);
-   } else if (RB_EMPTY_NODE(&bo->node)) {
+   } else if
+   
(ttm_bo_rb_find(&bo->process->fixed_lru[bo->mem.mem_type][bo->priority], 
bo->index)) {
rb_erase(&bo->node,
 
&bo->process->fixed_lru[bo->mem.mem_type][bo->priority]);
-   RB_CLEAR_NODE(&bo->node);
kref_put(&bo->list_kref, ttm_bo_ref_bug);
}
/*
@@ -769,23 +784,21 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
for (node = rb_first(&process->fixed_lru[mem_type][i]); node;
 node = rb_next(node)) {
bo = rb_entry(node, struct ttm_buffer_object, node);
-   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
+   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked)) {
+   bo = NULL;
continue;
+   }
 
if (place && !bdev->driver->eviction_valuable(bo,
  place)) {
if (locked)
reservation_object_unlock(bo->resv);
+   bo = NULL;
continue;
}
break;
}
 
-   /* If the inner loop terminated early, we have our candidate */
-   if (&bo->lru != &process->fixed_lru[mem_type][i])
-   break;
-
-   bo = NULL;
}
if (bo)
break;
-- 
2.14.1



[PATCH 11/11] drm/ttm: bulk move per vm bo

2018-04-12 Thread Chunming Zhou
Change-Id: I5b6afbdd715e28e5266b5099ca9a34399d1fc3a1
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index dc9545eeb5d6..d6e7c835f4c1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -261,8 +261,13 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo)
 {
reservation_object_assert_held(bo->resv);
 
-   ttm_bo_del_from_lru(bo);
-   ttm_bo_add_to_lru(bo);
+   if (bo->resv == bo->process->resv) {
+   list_move_tail(&bo->process->process_list,
+  &bo->bdev->process_list);
+   } else {
+   ttm_bo_del_from_lru(bo);
+   ttm_bo_add_to_lru(bo);
+   }
 }
 EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
 
-- 
2.14.1



[PATCH] drm/gpu-sched: fix force APP kill hang(v3)

2018-04-12 Thread Emily Deng
issue:
VMC page faults occur if an app is force-killed during a
3DMark test. The cause is that in entity_fini we manually signal
all the jobs in the entity's queue, which confuses the sync/dep
mechanism:

1) A page fault occurs in SDMA's clear job, which operates on a
shadow buffer, because the shadow buffer's GART table was cleaned by
ttm_bo_release since the fence in its reservation was fake signaled
by entity_fini() in the case where SIGKILL was received.

2) A page fault occurs in a GFX job because during the lifetime
of the GFX job we manually fake signal all jobs from its entity
in entity_fini(); thus the unmapping/clear-PTE job that depends on those
result fences is satisfied, and SDMA starts clearing the PTEs, leading
to a GFX page fault.

fix:
1) At least wait for all already scheduled jobs to complete in entity_fini()
in the SIGKILL case.

2) If a fence is signaled and tries to clear some entity's dependency,
set this entity guilty to prevent its jobs from really running, since the
dependency was fake signaled.

v2:
split drm_sched_entity_fini() into two functions:
1) The first one does the waiting, removes the entity from the
run queue and returns an error when the process was killed.
2) The second one then goes over the entity, installs it as a
completion signal for the remaining jobs and signals all jobs
with an error code.

v3:
1) Replace fini1 and fini2 with better names.
2) Call the first part before the VM teardown in
amdgpu_driver_postclose_kms() and the second part
after the VM teardown.
3) Keep the original function drm_sched_entity_fini to
refine the code.

Signed-off-by: Monk Liu 
Signed-off-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 64 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  5 ++-
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 74 ++-
 include/drm/gpu_scheduler.h   |  7 +++
 5 files changed, 131 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2babfad..200db73 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -681,6 +681,8 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
 int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx, unsigned ring_id);
 
 void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr);
+void amdgpu_ctx_mgr_entity_cleanup(struct amdgpu_ctx_mgr *mgr);
+void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr);
 void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr);
 
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 09d35051..659add4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -111,8 +111,9 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev,
return r;
 }
 
-static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
+static void amdgpu_ctx_fini(struct kref *ref)
 {
+   struct amdgpu_ctx *ctx = container_of(ref, struct amdgpu_ctx, refcount);
struct amdgpu_device *adev = ctx->adev;
unsigned i, j;
 
@@ -125,13 +126,11 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
kfree(ctx->fences);
ctx->fences = NULL;
 
-   for (i = 0; i < adev->num_rings; i++)
-   drm_sched_entity_fini(&adev->rings[i]->sched,
- &ctx->rings[i].entity);
-
amdgpu_queue_mgr_fini(adev, &ctx->queue_mgr);
 
mutex_destroy(&ctx->lock);
+
+   kfree(ctx);
 }
 
 static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
@@ -170,12 +169,15 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
 static void amdgpu_ctx_do_release(struct kref *ref)
 {
struct amdgpu_ctx *ctx;
+   u32 i;
 
ctx = container_of(ref, struct amdgpu_ctx, refcount);
 
-   amdgpu_ctx_fini(ctx);
+   for (i = 0; i < ctx->adev->num_rings; i++)
+   drm_sched_entity_fini(&ctx->adev->rings[i]->sched,
+   &ctx->rings[i].entity);
 
-   kfree(ctx);
+   amdgpu_ctx_fini(ref);
 }
 
 static int amdgpu_ctx_free(struct amdgpu_fpriv *fpriv, uint32_t id)
@@ -435,16 +437,62 @@ void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr)
idr_init(&mgr->ctx_handles);
 }
 
+void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
+{
+   struct amdgpu_ctx *ctx;
+   struct idr *idp;
+   uint32_t id, i;
+
+   idp = &mgr->ctx_handles;
+
+   idr_for_each_entry(idp, ctx, id) {
+
+   if (!ctx->adev)
+   return;
+
+   for (i = 0; i < ctx->adev->num_rings; i++)
+   if (kref_read(&ctx->refcount) == 1)
+   drm_sched_entity_do_release(&ctx->adev->rings[i]->sched,
+ &ctx->rings[i].entity);
+   else
+   DRM_ERROR("ctx %p is st

Re: [PATCH xf86-video-amdgpu 3/5] Keep CRTC properties consistent

2018-04-12 Thread Michel Dänzer
On 2018-04-11 11:26 PM, Leo Li wrote:
> On 2018-04-11 04:39 AM, Michel Dänzer wrote:
>> 
>> Hmm. So either legacy or non-legacy clients won't work at all, or 
>> they'll step on each other's toes, clobbering the HW gamma LUT from
>> each other.
>> 
>> I'm afraid neither of those alternatives sound like a good user 
>> experience to me.
>> 
>> Consider on the one hand something like Night Light / redshift,
>> using legacy APIs to adjust colour temperature to the time of day.
>> On the other hand, another client using the non-legacy API for say
>> fine-tuning of a display's advanced gamma capabilities.
>> 
>> Ideally, in this case the gamma LUTs from the legacy and non-legacy
>> APIs should be combined, such that the hardware LUT reflects both
>> the colour temperature set by Night Light / refshift and the
>> fine-tuning set by the non-legacy client. Is that feasible? If not,
>> can you explain a little why?
> 
> Hmm, combining LUTs could be feasible, but I don't think that's the 
> right approach.
> 
> LUTs can be combined through composition h(x) = f(g(x)), with some 
> interpolation involved. Once combined, it can be set using the 
> non-legacy API, since that'll likely have a larger LUT size.
> 
> But I think what you've mentioned raises a bigger issue of color 
> management conflicts in general. It doesn't have to be redshift vs 
> monitor correction, what if there are more than 2 applications contending
> to set gamma? xrandr's brightness already conflicts with redshift,
> and so does some apps running on WINE.

What you're describing is an X11 design flaw, which we cannot do
anything about in the DDX driver.

What I want to avoid is repeating a situation similar to the one we had before
xserver 1.19, where one could have either working per-window colormaps
and global gamma, or per-CRTC gamma via RandR, but not all at the same
time. (Before xserver 1.7, they would clobber the HW LUT from each
other.) I fixed this in
https://cgit.freedesktop.org/xorg/xserver/commit/?id=b4e46c0444bb09f4af59d9d13acc939a0fbbc6d6
by composing the LUTs.
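
(For illustration, a minimal sketch of that composition, assuming 16-bit LUT
entries and linear interpolation; the names are hypothetical, this is not the
xserver or DDX code:)

#include <stdint.h>

/*
 * Compose two gamma LUTs: out[i] = f(g(i)), where g is e.g. the legacy
 * per-CRTC LUT and f the non-legacy one, both with 16-bit entries.
 * Linear interpolation is used when g's output falls between two
 * entries of f.
 */
static void compose_luts(const uint16_t *f, int len_f,
			 const uint16_t *g, int len_g,
			 uint16_t *out)
{
	for (int i = 0; i < len_g; i++) {
		/* Map g's 16-bit output onto f's index range. */
		uint32_t pos = (uint32_t)g[i] * (uint32_t)(len_f - 1);
		uint32_t idx = pos >> 16;
		uint32_t frac = pos & 0xffff;

		if (idx >= (uint32_t)(len_f - 1))
			out[i] = f[len_f - 1];
		else
			out[i] = (uint16_t)(((uint32_t)f[idx] * (0x10000 - frac) +
					     (uint32_t)f[idx + 1] * frac) >> 16);
	}
}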


> For the small example at hand, the ideal solution is to have
> redshift use the color transformation matrix (CTM), which will be
> exposed as part of the non-legacy color API. It'll need modification
> of redshift, but at least it won't conflict with anything gamma
> related.

From past experience, it can take a long time (up to forever) for some
clients to be updated like this. E.g. it's unlikely at this point that
libraries such as SDL1 will ever be updated for the new API, so I'm
afraid users would run into things like
https://bugs.freedesktop.org/show_bug.cgi?id=27222 again.

(Besides, it would conflict with anything else using the same API, as
you described above, so it's not even obviously the ideal solution)


> Jumping back on some patch 1 topics:
> 
> Are there ways to get arbitrarily (no more than 4096 elements) sized 
> arrays from a client, to the DDX driver?

Seems like the RRChangeOutputProperty request could work.


> I agree, it would definitely be nicer for clients to not worry about
> DRM blobs at all.

Indeed. E.g. as a bonus, it would allow this to work even with a remote
display connection.


I'm holding off on the more detailed discussion about the other patches
until there is a plan for this fundamental issue.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately

2018-04-12 Thread Christian König

On 12.04.2018 at 11:49, Lucas Stach wrote:

On Thursday, 12.04.2018 at 11:35 +0200, Christian König wrote:

On 12.04.2018 at 11:11, Lucas Stach wrote:

On Wednesday, 11.04.2018 at 20:26 +0200, Christian König wrote:

On 11.04.2018 at 19:11, Robin Murphy wrote:

For dma_map_sg(), DMA API implementations are free to merge consecutive
segments into a single DMA mapping if conditions are suitable, thus the
resulting DMA addresses may be packed into fewer entries than
ttm->sg->nents implies.

drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
its callers either have to reject the 0 < count < nents case or risk
getting bogus addresses back later. Fortunately this is relatively easy
to deal with without having to rejig structures to also store the mapped count,
since the total DMA length should still be equal to the total buffer
length. All we need is a separate scatterlist cursor to iterate the DMA
addresses separately from the CPU addresses.
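
For reference, the separate-cursor idea amounts to roughly the following
(simplified sketch; sgt, addrs and npages are placeholder names, not the
actual patch code):

/* Walk the possibly merged DMA segments with their own cursor while
 * filling a per-page DMA address array. */
struct scatterlist *dma_sg = sgt->sgl;
dma_addr_t dma_addr = sg_dma_address(dma_sg);
unsigned int dma_len = sg_dma_len(dma_sg);
unsigned long i;

for (i = 0; i < npages; i++) {
        addrs[i] = dma_addr;
        dma_addr += PAGE_SIZE;
        dma_len -= PAGE_SIZE;
        if (!dma_len) {                 /* advance to the next DMA segment */
                dma_sg = sg_next(dma_sg);
                if (dma_sg) {
                        dma_addr = sg_dma_address(dma_sg);
                        dma_len = sg_dma_len(dma_sg);
                }
        }
}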

Mhm, I think I like Sinan's approach better.

See the hardware actually needs the dma_address on a page by page basis.

Joining multiple consecutive pages into one entry is just additional
overhead which we don't need.

But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
be in front of your GPU to map large pages as such, causing additional
overhead by means of additional TLB misses and page walks on the IOMMU
side.

And wouldn't you like to use huge pages at the GPU side, if the IOMMU
already provides you the service of producing one large consecutive
address range, rather than mapping them via a number of small pages?

No, I wouldn't like to use that. We're already using that :)

But we use huge pages by allocating consecutive chunks of memory so that
both the CPU as well as the GPU can increase their TLB hit rate.

I'm not convinced that this is a universal win. Many GPU buffers aren't
accessed by the CPU and allocating huge pages puts much more strain on
the kernel MM subsystem.


Yeah, we indeed see the extra overhead during allocation.


What currently happens is that the DMA subsystem tries to coalesce
multiple pages into one SG entry and we de-coalesce that inside the
driver again to create our random access array.

I guess the right thing would be to have a flag that tells the DMA
implementation to not coalesce the entries. But (ab-)using max segment
size tells the DMA API to split the segments if they are larger than
the given size, which is probably not what you want either as you now
need to coalesce the segments again when you are mapping real huge
pages.
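
For context, the workaround being discussed amounts to roughly this at device
init time (sketch; assuming adev->dev is the struct device used for mapping):

/* Cap the DMA segment size at one page so dma_map_sg() never merges
 * entries; the trade-off is losing large IOMMU mappings. */
dma_set_max_seg_size(adev->dev, PAGE_SIZE);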


No, even with huge pages I need an array with every single dma address 
for each 4K page (or whatever the normal page size is).


The problem is that I need a random access array for the DMA addresses, 
even when they are contiguous.


I agree that max segment size is a bit ugly here, but at least for 
radeon, amdgpu and pretty much TTM in general it is exactly what I need.


I could fix TTM to not need that, but for radeon and amdgpu it is the 
hardware which needs this.


Christian.




That is a huge waste of memory and CPU cycles and I actually wanted to
get rid of it for quite some time now. Sinan's approach seems to be a good
step in the right direction to me to actually clean that up.


Detecting such a condition is much easier if the DMA map implementation
gives you the coalesced scatter entries.

A way which preserves both paths would indeed be nice to have, but that
only allows for the TLB optimization for the GPU and not the CPU any
more. So I actually see that as a really minor use case.

Some of the Tegras have much larger TLBs and better page-walk
performance on the system IOMMU compared to the GPU MMU, so they would
probably benefit a good deal even if the hugepage optimization only
targets the GPU side.

Regards,
Lucas


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4 2/2] drm/amdgpu: Free VGA stolen memory as soon as possible.

2018-04-12 Thread Andrey Grodzovsky



On 04/12/2018 12:32 AM, Alex Deucher wrote:

On Thu, Apr 12, 2018 at 12:08 AM, Andrey Grodzovsky
 wrote:

Reserved VRAM is used to avoid overriding pre OS FB.
Once our display stack takes over we don't need the reserved
VRAM anymore.

v2:
Remove comment, we know actually why we need to reserve the stolen VRAM.
Fix return type for amdgpu_ttm_late_init.
v3:
Return 0 in amdgpu_bo_late_init, rebase on changes to previous patch
v4:
Don't release stolen memory for GMC9 ASICs until GART corruption
on S3 resume is resolved.

Signed-off-by: Andrey Grodzovsky 
Signed-off-by: Alex Deucher 

Looks like you used the original version of this patch as well.
Updated version here:
https://patchwork.freedesktop.org/patch/215567/
more comments below.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  7 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  8 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h|  1 +
  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c  |  2 ++
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  2 ++
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  2 ++
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 14 ++
  8 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 9e23d6f..a160ef0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -852,6 +852,13 @@ int amdgpu_bo_init(struct amdgpu_device *adev)
 return amdgpu_ttm_init(adev);
  }

+int amdgpu_bo_late_init(struct amdgpu_device *adev)
+{
+   amdgpu_ttm_late_init(adev);
+
+   return 0;
+}
+
  void amdgpu_bo_fini(struct amdgpu_device *adev)
  {
 amdgpu_ttm_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 3bee133..1e9fe85 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -251,6 +251,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
  int amdgpu_bo_unpin(struct amdgpu_bo *bo);
  int amdgpu_bo_evict_vram(struct amdgpu_device *adev);
  int amdgpu_bo_init(struct amdgpu_device *adev);
+int amdgpu_bo_late_init(struct amdgpu_device *adev);
  void amdgpu_bo_fini(struct amdgpu_device *adev);
  int amdgpu_bo_fbdev_mmap(struct amdgpu_bo *bo,
 struct vm_area_struct *vma);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 0555821..7a608cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1517,14 +1517,18 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 return 0;
  }

+void amdgpu_ttm_late_init(struct amdgpu_device *adev)
+{
+   if (adev->gmc.stolen_size)

no need to check for NULL here.  amdgpu_bo_free_kernel() will do the
right thing even if stolen_vga_memory is NULL.


+   amdgpu_bo_free_kernel(&adev->stolen_vga_memory, NULL, NULL);
+}
+
  void amdgpu_ttm_fini(struct amdgpu_device *adev)
  {
 if (!adev->mman.initialized)
 return;

 amdgpu_ttm_debugfs_fini(adev);
-   if (adev->gmc.stolen_size)
-   amdgpu_bo_free_kernel(&adev->stolen_vga_memory, NULL, NULL);
 amdgpu_ttm_fw_reserve_vram_fini(adev);
 if (adev->mman.aper_base_kaddr)
 iounmap(adev->mman.aper_base_kaddr);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 6ea7de8..e969c87 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -77,6 +77,7 @@ uint64_t amdgpu_vram_mgr_usage(struct ttm_mem_type_manager 
*man);
  uint64_t amdgpu_vram_mgr_vis_usage(struct ttm_mem_type_manager *man);

  int amdgpu_ttm_init(struct amdgpu_device *adev);
+void amdgpu_ttm_late_init(struct amdgpu_device *adev);
  void amdgpu_ttm_fini(struct amdgpu_device *adev);
  void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
 bool enable);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 63f0b65..4a8f9bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -819,6 +819,8 @@ static int gmc_v6_0_late_init(void *handle)
  {
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+   amdgpu_bo_late_init(adev);
+
 if (amdgpu_vm_fault_stop != AMDGPU_VM_FAULT_STOP_ALWAYS)
 return amdgpu_irq_get(adev, &adev->gmc.vm_fault, 0);
 else
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 2deb5c9..189fdf9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -958,6 +958,8 @@ static int gmc_v7_0_late_init(void *handle)
  {
 struct amdgpu_devi

Re: RFC for a render API to support adaptive sync and VRR

2018-04-12 Thread Nicolai Hähnle

On 12.04.2018 01:30, Cyr, Aric wrote:

At least with VDPAU, video players are already explicitly specifying the
target presentation time, so no changes should be required at that
level. Don't know about other video APIs.

The X11 Present extension protocol is also prepared for specifying the
target presentation time already, the support for it just needs to be
implemented.


I'm perfectly OK with presentation time-based *API*.  I get it from a user 
mode/app perspective, and that's fine.  We need that feedback and would like 
help defining those portions of the stack.
However, I think it doesn't make as much sense as a *DDI* because it doesn't 
correspond to any hardware real or logical (i.e. no one would implement it in 
HW this way) and the industry specs aren't defined that way.
You can have libdrm or some other usermode component translate your presentation time 
into a frame duration and schedule it.  What's the advantage of having this in the kernel, 
besides the fact that we lose the intent of the application and could prevent features and 
optimizations?  When it gets to the kernel, I think it is much more elegant for the flip 
structure to contain a simple duration that says "hey, show this frame on the screen 
for this long".  Then we don't need any clocks or timers just some simple math and 
program the hardware.


There isn't necessarily an inherent advantage to having this translation 
in the kernel. However, we *must* do this translation in a place that is 
owned by display experts (i.e., you guys), because only you guys know 
how to actually do that translation reliably and correctly.


Since your work is currently limited to the kernel, it makes sense to do 
it in the kernel.


If the translation doesn't happen in a place that you feel comfortable 
working on, we're setting ourselves up for a future where this 
hypothetical future UMD component will get this wrong, and there'll be a 
lot of finger-pointing between you guys and whoever writes that UMD, 
with likely little willingness to actually go into the respective other 
codebase to fix what's wrong. And that's a pretty sucky future.


Cheers,
Nicolai

P.S.: I'm also a little surprised that you seem to be saying that 
requesting a target present time is basically impossible (at least, 
that's kind of implied by your statement about mGPUs), and yet there's 
precedent for such APIs in both Vulkan and VDPAU.





In short,
  1) We can simplify media players' lives by helping them get really, really 
close to their content rate, so they wouldn't need any frame rate conversion.
  They'll still need A/V syncing though, and variable refresh cannot solve 
this and thus is way out of scope of what we're proposing.

  2) For gaming, don't even try to guess a frame duration, the driver/hardware 
will do a better job every time, just specify duration=0 and flip as fast as 
you can.

Regards,
   Aric

P.S. Thanks for the Croteam link.  Interesting, but basically nullified by 
variable refresh rate displays.  You won't have 
stuttering/microstuttering/juddering/tearing if your display's refresh rate 
matches the render/present rate of the game.  Maybe I should grab The Talos 
Principle to see how well it works with FreeSync display :)

--
ARIC CYR
PMTS Software Engineer | SW – Display Technologies






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately

2018-04-12 Thread okaya

On 2018-04-12 06:33, Christian König wrote:

Am 12.04.2018 um 11:49 schrieb Lucas Stach:

Am Donnerstag, den 12.04.2018, 11:35 +0200 schrieb Christian König:

Am 12.04.2018 um 11:11 schrieb Lucas Stach:

Am Mittwoch, den 11.04.2018, 20:26 +0200 schrieb Christian König:

Am 11.04.2018 um 19:11 schrieb Robin Murphy:
For dma_map_sg(), DMA API implementations are free to merge 
consecutive
segments into a single DMA mapping if conditions are suitable, 
thus the

resulting DMA addresses may be packed into fewer entries than
ttm->sg->nents implies.

drm_prime_sg_to_page_addr_arrays() does not account for this, 
meaning
its callers either have to reject the 0 < count < nents case or 
risk
getting bogus addresses back later. Fortunately this is relatively 
easy
to deal with having to rejig structures to also store the mapped 
count,
since the total DMA length should still be equal to the total 
buffer
length. All we need is a separate scatterlist cursor to iterate 
the DMA

addresses separately from the CPU addresses.

Mhm, I think I like Sinan's approach better.

See the hardware actually needs the dma_address on a page by page 
basis.


Joining multiple consecutive pages into one entry is just 
additional

overhead which we don't need.
But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that 
might
be in front of your GPU to map large pages as such, causing 
additional
overhead by means of additional TLB misses and page walks on the 
IOMMU

side.

And wouldn't you like to use huge pages at the GPU side, if the 
IOMMU

already provides you the service of producing one large consecutive
address range, rather than mapping them via a number of small pages?

No, I wouldn't like to use that. We're already using that :)

But we use huge pages by allocating consecutive chunks of memory so 
that

both the CPU as well as the GPU can increase their TLB hit rate.
I'm not convinced that this is a universal win. Many GPU buffers 
aren't

accessed by the CPU and allocating huge pages puts much more strain on
the kernel MM subsystem.


Yeah, we indeed see the extra overhead during allocation.


What currently happens is that the DMA subsystem tries to coalesce
multiple pages into one SG entry and we de-coalesce that inside the
driver again to create our random access array.

I guess the right thing would be to have a flag that tells the DMA
implementation to not coalesce the entries. But (ab-)using max segment
size tells the DMA API to split the segments if they are larger than
the given size, which is probably not what you want either as you now
need to coalesce the segments again when you are mapping real huge
pages.


No, even with huge pages I need an array with every single dma address
for each 4K page (or whatever the normal page size is).

The problem is that I need a random access array for the DMA
addresses, even when they are contiguous.

I agree that max segment size is a bit ugly here, but at least for
radeon, amdgpu and pretty much TTM in general it is exactly what I
need.

I could fix TTM to not need that, but for radeon and amdgpu it is the
hardware which needs this.


I can implement the i915 approach, as Robin suggested, as an alternative. Is 
that better?





Christian.



That is a huge waste of memory and CPU cycles and I actually wanted 
to
get rid of it for quite some time now. Sinan's approach seems to be a 
good

step in the right direction to me to actually clean that up.

Detecting such a condition is much easier if the DMA map 
implementation

gives you the coalesced scatter entries.
A way which preserves both paths would indeed be nice to have, but 
that

only allows for the TLB optimization for the GPU and not the CPU any
more. So I actually see that as really minor use case.

Some of the Tegras have much larger TLBs and better page-walk
performance on the system IOMMU compared to the GPU MMU, so they would
probably benefit a good deal even if the hugepage optimization only
targets the GPU side.

Regards,
Lucas

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4 2/2] drm/amdgpu: Free VGA stolen memory as soon as possible.

2018-04-12 Thread Alex Deucher
On Thu, Apr 12, 2018 at 7:17 AM, Andrey Grodzovsky
 wrote:
>
>
> On 04/12/2018 12:32 AM, Alex Deucher wrote:
>>
>> On Thu, Apr 12, 2018 at 12:08 AM, Andrey Grodzovsky
>>  wrote:
>>>
>>> Reserved VRAM is used to avoid overriding pre OS FB.
>>> Once our display stack takes over we don't need the reserved
>>> VRAM anymore.
>>>
>>> v2:
>>> Remove comment, we know actually why we need to reserve the stolen VRAM.
>>> Fix return type for amdgpu_ttm_late_init.
>>> v3:
>>> Return 0 in amdgpu_bo_late_init, rebase on changes to previous patch
>>> v4:
>>> Don't release stolen memory for GMC9 ASICs until GART corruption
>>> on S3 resume is resolved.
>>>
>>> Signed-off-by: Andrey Grodzovsky 
>>> Signed-off-by: Alex Deucher 
>>
>> Looks like you used the original version of this patch as well.
>> Updated version here:
>> https://patchwork.freedesktop.org/patch/215567/
>> more comments below.
>>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  7 +++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  8 ++--
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h|  1 +
>>>   drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c  |  2 ++
>>>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  2 ++
>>>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  2 ++
>>>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 14 ++
>>>   8 files changed, 31 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 9e23d6f..a160ef0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -852,6 +852,13 @@ int amdgpu_bo_init(struct amdgpu_device *adev)
>>>  return amdgpu_ttm_init(adev);
>>>   }
>>>
>>> +int amdgpu_bo_late_init(struct amdgpu_device *adev)
>>> +{
>>> +   amdgpu_ttm_late_init(adev);
>>> +
>>> +   return 0;
>>> +}
>>> +
>>>   void amdgpu_bo_fini(struct amdgpu_device *adev)
>>>   {
>>>  amdgpu_ttm_fini(adev);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> index 3bee133..1e9fe85 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> @@ -251,6 +251,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo,
>>> u32 domain,
>>>   int amdgpu_bo_unpin(struct amdgpu_bo *bo);
>>>   int amdgpu_bo_evict_vram(struct amdgpu_device *adev);
>>>   int amdgpu_bo_init(struct amdgpu_device *adev);
>>> +int amdgpu_bo_late_init(struct amdgpu_device *adev);
>>>   void amdgpu_bo_fini(struct amdgpu_device *adev);
>>>   int amdgpu_bo_fbdev_mmap(struct amdgpu_bo *bo,
>>>  struct vm_area_struct *vma);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index 0555821..7a608cf 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -1517,14 +1517,18 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>>>  return 0;
>>>   }
>>>
>>> +void amdgpu_ttm_late_init(struct amdgpu_device *adev)
>>> +{
>>> +   if (adev->gmc.stolen_size)
>>
>> no need to check for NULL here.  amdgpu_bo_free_kernel() will do the
>> right thing even if stolen_vga_memory is NULL.
>>
>>> +   amdgpu_bo_free_kernel(&adev->stolen_vga_memory, NULL,
>>> NULL);
>>> +}
>>> +
>>>   void amdgpu_ttm_fini(struct amdgpu_device *adev)
>>>   {
>>>  if (!adev->mman.initialized)
>>>  return;
>>>
>>>  amdgpu_ttm_debugfs_fini(adev);
>>> -   if (adev->gmc.stolen_size)
>>> -   amdgpu_bo_free_kernel(&adev->stolen_vga_memory, NULL,
>>> NULL);
>>>  amdgpu_ttm_fw_reserve_vram_fini(adev);
>>>  if (adev->mman.aper_base_kaddr)
>>>  iounmap(adev->mman.aper_base_kaddr);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> index 6ea7de8..e969c87 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> @@ -77,6 +77,7 @@ uint64_t amdgpu_vram_mgr_usage(struct
>>> ttm_mem_type_manager *man);
>>>   uint64_t amdgpu_vram_mgr_vis_usage(struct ttm_mem_type_manager *man);
>>>
>>>   int amdgpu_ttm_init(struct amdgpu_device *adev);
>>> +void amdgpu_ttm_late_init(struct amdgpu_device *adev);
>>>   void amdgpu_ttm_fini(struct amdgpu_device *adev);
>>>   void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
>>>  bool enable);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
>>> index 63f0b65..4a8f9bd 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
>>> @@ -819,6 +819,8 @@ static int gmc_v6_0_late_init(void *handle)
>>>   {
>>>  struct amdgpu_device *adev = (struc

Re: [PATCH V2] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-12 Thread Robin Murphy

On 12/04/18 10:42, Christian König wrote:

Am 12.04.2018 um 08:26 schrieb Christoph Hellwig:

On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:

On 10/04/18 21:59, Sinan Kaya wrote:

Code is expecting to observe the same number of buffers returned from
dma_map_sg() function compared to sg_alloc_table_from_pages(). This
doesn't hold true universally especially for systems with IOMMU.
So why not fix said code? It's clearly not a real hardware 
limitation, and
the map_sg() APIs have potentially returned fewer than nents since 
forever,

so there's really no excuse.

Yes, relying on dma_map_sg returning the same number of entries as passed
it is completely bogus.


I agree that the common DRM functions should be able to deal with both, 
but we should let the driver side decide if it wants multiple addresses 
combined or not.





IOMMU driver tries to combine buffers into a single DMA address as much
as it can. The right thing is to tell the DMA layer how much combining
IOMMU can do.

Disagree; this is a dodgy hack, since you'll now end up passing
scatterlists into dma_map_sg() which already violate max_seg_size to 
begin
with, and I think a conscientious DMA API implementation would be at 
rights
to fail the mapping for that reason (I know arm64 happens not to, but 
that

was a deliberate design decision to make my life easier at the time).

Agreed.


Sounds like my understanding of max_seg_size is incorrect; what exactly 
should that describe?


The segment size and boundary mask are there to represent a device's 
hardware scatter-gather capabilities - a lot of things have limits on 
the size and alignment of the data pointed to by a single descriptor 
(e.g. an xHCI TRB, where the data buffer must not cross a 64KB boundary) 
- although they can also be relevant to non-scatter-gather devices if 
they just have limits on how big/aligned a single DMA transfer can be.
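
In driver terms those limits end up in the device's DMA parameters, for
example (illustrative values only, not amdgpu's):

/* A device whose descriptors can address at most 64 KiB and whose
 * transfers must not cross a 64 KiB boundary. */
dma_set_max_seg_size(dev, SZ_64K);
dma_set_seg_boundary(dev, SZ_64K - 1);

/* dma_map_sg() honours these when deciding whether to merge segments. */
nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);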


Robin.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately

2018-04-12 Thread Robin Murphy

On 12/04/18 11:33, Christian König wrote:

Am 12.04.2018 um 11:49 schrieb Lucas Stach:

Am Donnerstag, den 12.04.2018, 11:35 +0200 schrieb Christian König:

Am 12.04.2018 um 11:11 schrieb Lucas Stach:

Am Mittwoch, den 11.04.2018, 20:26 +0200 schrieb Christian König:

Am 11.04.2018 um 19:11 schrieb Robin Murphy:
For dma_map_sg(), DMA API implementations are free to merge 
consecutive
segments into a single DMA mapping if conditions are suitable, 
thus the

resulting DMA addresses may be packed into fewer entries than
ttm->sg->nents implies.

drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
its callers either have to reject the 0 < count < nents case or risk
getting bogus addresses back later. Fortunately this is relatively 
easy
to deal with having to rejig structures to also store the mapped 
count,

since the total DMA length should still be equal to the total buffer
length. All we need is a separate scatterlist cursor to iterate 
the DMA

addresses separately from the CPU addresses.

Mhm, I think I like Sinan's approach better.

See the hardware actually needs the dma_address on a page by page 
basis.


Joining multiple consecutive pages into one entry is just additional
overhead which we don't need.


Note that the merging done inside dma_map_sg() is pretty trivial in 
itself (it's effectively just the inverse of the logic in this patch). 
The "overhead" here is inherent in calling sg_alloc_table_from_pages() 
and dma_map_sg() at all.



But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
be in front of your GPU to map large pages as such, causing additional
overhead by means of additional TLB misses and page walks on the IOMMU
side.

And wouldn't you like to use huge pages at the GPU side, if the IOMMU
already provides you the service of producing one large consecutive
address range, rather than mapping them via a number of small pages?

No, I wouldn't like to use that. We're already using that :)

But we use huge pages by allocating consecutive chunks of memory so that
both the CPU as well as the GPU can increase their TLB hit rate.

I'm not convinced that this is a universal win. Many GPU buffers aren't
accessed by the CPU and allocating huge pages puts much more strain on
the kernel MM subsystem.


Yeah, we indeed see the extra overhead during allocation.


What currently happens is that the DMA subsystem tries to coalesce
multiple pages into one SG entry and we de-coalesce that inside the
driver again to create our random access array.

I guess the right thing would be to have a flag that tells the DMA
implementation to not coalesce the entries. But (ab-)using max segment
size tells the DMA API to split the segments if they are larger than
the given size, which is probably not what you want either as you now
need to coalesce the segments again when you are mapping real huge
pages.


No, even with huge pages I need an array with every single dma address 
for each 4K page (or whatever the normal page size is).


The problem is that I need a random access array for the DMA addresses, 
even when they are contiguous.


OK, that makes me wonder if you even need dma_map_sg() in this case. 
From the sound of that it would be a lot simpler to just call 
dma_map_page() in a loop over the pair of arrays. AFAICS that's what TTM 
already does in most places.
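
That is, something along these lines (sketch, assuming the pages[] and
dma_address[] arrays the driver already keeps around):

/* Map each page individually instead of going through an sg table, so
 * there is nothing to merge or de-coalesce afterwards. */
for (i = 0; i < num_pages; i++) {
        dma_address[i] = dma_map_page(dev, pages[i], 0, PAGE_SIZE,
                                      DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, dma_address[i]))
                goto unwind;    /* dma_unmap_page() whatever was mapped */
}

The flip side is that the IOMMU then only ever sees page-sized mappings,
which ties back to the hugepage discussion above.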


I agree that max segment size is a bit ugly here, but at least for 
radeon, amdgpu and pretty much TTM in general it is exactly what I need.


I could fix TTM to not need that, but for radeon and amdgpu it is the 
hardware which needs this.


Sorry, I don't follow - how does the hardware care about the format of 
an intermediate data structure used to *generate* the dma_address array? 
That's all that I'm proposing to fix here.


Robin.



Christian.




That is a huge waste of memory and CPU cycles and I actually wanted to
get rid of it for quite some time now. Sinan's approach seems to be a good
step in the right direction to me to actually clean that up.


Detecting such a condition is much easier if the DMA map implementation
gives you the coalesced scatter entries.

A way which preserves both paths would indeed be nice to have, but that
only allows for the TLB optimization for the GPU and not the CPU any
more. So I actually see that as a really minor use case.

Some of the Tegras have much larger TLBs and better page-walk
performance on the system IOMMU compared to the GPU MMU, so they would
probably benefit a good deal even if the hugepage optimization only
targets the GPU side.

Regards,
Lucas



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4 2/2] drm/amdgpu: Free VGA stolen memory as soon as possible.

2018-04-12 Thread Michel Dänzer
On 2018-04-12 03:33 PM, Alex Deucher wrote:
> On Thu, Apr 12, 2018 at 7:17 AM, Andrey Grodzovsky
>  wrote:
>> On 04/12/2018 12:32 AM, Alex Deucher wrote:
>>>
>>> On Thu, Apr 12, 2018 at 12:08 AM, Andrey Grodzovsky
>>>  wrote:

 Reserved VRAM is used to avoid overriding pre OS FB.
 Once our display stack takes over we don't need the reserved
 VRAM anymore.

 v2:
 Remove comment, we know actually why we need to reserve the stolen VRAM.
 Fix return type for amdgpu_ttm_late_init.
 v3:
 Return 0 in amdgpu_bo_late_init, rebase on changes to previous patch
 v4:
 Don't release stolen memory for GMC9 ASICs until GART corruption
 on S3 resume is resolved.

 Signed-off-by: Andrey Grodzovsky 
 Signed-off-by: Alex Deucher 

[...]

 diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
 b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
 index 252a6c69..099e3ce5 100644
 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
 +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
 @@ -659,6 +659,16 @@ static int gmc_v9_0_late_init(void *handle)
  unsigned i;
  int r;

 +   /*
 +* TODO:
 +* Currently there is a bug where some memory client outside
 +* of the driver writes to first 8M of VRAM on S3 resume,
 +* this overrides GART which by default gets placed in first 8M
 and
 +* causes VM_FAULTS once GTT is accessed.
 +* Keep the stolen memory reservation until this solved.
 +*/
 +   /* amdgpu_bo_late_init(adev); */
 +
>>>
>>> We still need to free this somewhere.  I'd suggest calling it in
>>> gmc_v9_0_sw_fini() and add a comment there about moving it when we fix
>>> the issue.
>>>
  for(i = 0; i < adev->num_rings; ++i) {
  struct amdgpu_ring *ring = adev->rings[i];
  unsigned vmhub = ring->funcs->vmhub;
 @@ -884,10 +894,6 @@ static int gmc_v9_0_sw_init(void *handle)
   */
  adev->gmc.mc_mask = 0xULL; /* 48 bit MC */

 -   /*
 -* It needs to reserve 8M stolen memory for vega10
 -* TODO: Figure out how to avoid that...
 -*/
  adev->gmc.stolen_size = gmc_v9_0_get_vbios_fb_size(adev);
>>>
>>> We may also just want to return 8MB or 9MB temporarily in
>>> gmc_v9_0_get_vbios_fb_size until we sort out the root cause of the S3
>>> issue otherwise we're potentially wasting a lot more memory.
>>
>> But what if we have 4k display ? In this case returning 9M probably will not
>> hide the corruption  we were originally dealing with. I remember in that
>> case pre OS FB size would be 32M.
> 
> I guess it's a trade off, possible momentary garbage during the bios to
> driver transition vs. wasting an additional 24 MB of CPU accessible
> vram for the life of the driver.

Can we free the reserved memory after initialization, then reserve it
again on resume?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/2] Revert "drm/amd/display: fix dereferencing possible ERR_PTR()"

2018-04-12 Thread Harry Wentland
This reverts commit cd2d6c92a8e39d7e50a5af9fcc67d07e6a89e91d.

Cc: Shirish S 
Cc: Alex Deucher 
Cc: sta...@vger.kernel.org
Signed-off-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 3684350f5e9d..3848d6823196 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4998,9 +4998,6 @@ static int dm_atomic_check_plane_state_fb(struct 
drm_atomic_state *state,
return -EDEADLK;
 
crtc_state = drm_atomic_get_crtc_state(plane_state->state, 
crtc);
-   if (IS_ERR(crtc_state))
-   return PTR_ERR(crtc_state);
-
if (crtc->primary == plane && crtc_state->active) {
if (!plane_state->fb)
return -EINVAL;
-- 
2.17.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/2] Revert "drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)"

2018-04-12 Thread Harry Wentland
This seems to cause flickering and lock-ups for a wide range of users.
Revert until we've found a proper fix for the flickering and lock-ups.

This reverts commit 36cc549d59864b7161f0e23d710c1c4d1b9cf022.

Cc: Shirish S 
Cc: Alex Deucher 
Cc: sta...@vger.kernel.org
Signed-off-by: Harry Wentland 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 28 ---
 1 file changed, 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 3848d6823196..6f92a19bebd6 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4982,30 +4982,6 @@ static int dm_update_planes_state(struct dc *dc,
return ret;
 }
 
-static int dm_atomic_check_plane_state_fb(struct drm_atomic_state *state,
- struct drm_crtc *crtc)
-{
-   struct drm_plane *plane;
-   struct drm_crtc_state *crtc_state;
-
-   WARN_ON(!drm_atomic_get_new_crtc_state(state, crtc));
-
-   drm_for_each_plane_mask(plane, state->dev, crtc->state->plane_mask) {
-   struct drm_plane_state *plane_state =
-   drm_atomic_get_plane_state(state, plane);
-
-   if (IS_ERR(plane_state))
-   return -EDEADLK;
-
-   crtc_state = drm_atomic_get_crtc_state(plane_state->state, 
crtc);
-   if (crtc->primary == plane && crtc_state->active) {
-   if (!plane_state->fb)
-   return -EINVAL;
-   }
-   }
-   return 0;
-}
-
 static int amdgpu_dm_atomic_check(struct drm_device *dev,
  struct drm_atomic_state *state)
 {
@@ -5032,10 +5008,6 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
struct dm_crtc_state *dm_new_crtc_state = 
to_dm_crtc_state(new_crtc_state);
struct dm_crtc_state *dm_old_crtc_state  = 
to_dm_crtc_state(old_crtc_state);
 
-   ret = dm_atomic_check_plane_state_fb(state, crtc);
-   if (ret)
-   goto fail;
-
if (!drm_atomic_crtc_needs_modeset(new_crtc_state) &&
!new_crtc_state->color_mgmt_changed &&
(dm_old_crtc_state->freesync_enabled == 
dm_new_crtc_state->freesync_enabled))
-- 
2.17.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: RFC for a render API to support adaptive sync and VRR

2018-04-12 Thread Michel Dänzer
On 2018-04-12 01:39 PM, Nicolai Hähnle wrote:
> On 12.04.2018 01:30, Cyr, Aric wrote:
>>> At least with VDPAU, video players are already explicitly specifying the
>>> target presentation time, so no changes should be required at that
>>> level. Don't know about other video APIs.
>>>
>>> The X11 Present extension protocol is also prepared for specifying the
>>> target presentation time already, the support for it just needs to be
>>> implemented.
>>
>> I'm perfectly OK with presentation time-based *API*.  I get it from a
>> user mode/app perspective, and that's fine.  We need that feedback and
>> would like help defining that portions of the stack.
>> However, I think it doesn't make as much sense as a *DDI* because it
>> doesn't correspond to any hardware real or logical (i.e. no one would
>> implement it in HW this way) and the industry specs aren't defined
>> that way.
>> You can have libdrm or some other usermode component translate your
>> presentation time into a frame duration and schedule it.  What's the
>> advantage of having this in the kernel, besides the fact that we lose the intent
>> of the application and could prevent features and optimizations?  When
>> it gets to the kernel, I think it is much more elegant for the flip
>> structure to contain a simple duration that says "hey, show this frame
>> on the screen for this long".  Then we don't need any clocks or timers
>> just some simple math and program the hardware.
> 
> There isn't necessarily an inherent advantage to having this translation
> in the kernel.

One such advantage is that it doesn't require userspace to predict the
future, where at least in the Vulkan case there is no information to
base the prediction on. I fail to see how that can work at all.


> P.S.: I'm also a little surprised that you seem to be saying that
> requesting a target present time is basically impossible (at least,
> that's kind of implied by your statement about mGPUs), and yet there's
> precedent for such APIs in both Vulkan and VDPAU.

Keep in mind that the constraint is "present no earlier than", which can
be satisfied e.g. by waiting for the target time to pass before
programming the flip to the hardware.
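
A minimal sketch of that, assuming a context where sleeping is allowed
(e.g. a worker) and a made-up program_flip() helper:

/* Honour "present no earlier than target" by waiting until the target
 * time has passed before programming the flip. */
ktime_t now = ktime_get();

if (ktime_before(now, target)) {
        s64 delay_us = ktime_us_delta(target, now);

        usleep_range(delay_us, delay_us + 100);
}
program_flip(crtc, fb);         /* hypothetical helper */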


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4 2/2] drm/amdgpu: Free VGA stolen memory as soon as possible.

2018-04-12 Thread Andrey Grodzovsky



On 04/12/2018 10:33 AM, Michel Dänzer wrote:

On 2018-04-12 03:33 PM, Alex Deucher wrote:

On Thu, Apr 12, 2018 at 7:17 AM, Andrey Grodzovsky
 wrote:

On 04/12/2018 12:32 AM, Alex Deucher wrote:

On Thu, Apr 12, 2018 at 12:08 AM, Andrey Grodzovsky
 wrote:

Reserved VRAM is used to avoid overriding pre OS FB.
Once our display stack takes over we don't need the reserved
VRAM anymore.

v2:
Remove comment, we know actually why we need to reserve the stolen VRAM.
Fix return type for amdgpu_ttm_late_init.
v3:
Return 0 in amdgpu_bo_late_init, rebase on changes to previous patch
v4:
Don't release stolen memory for GMC9 ASICs until GART corruption
on S3 resume is resolved.

Signed-off-by: Andrey Grodzovsky 
Signed-off-by: Alex Deucher 

[...]


Not sure what it means?



diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 252a6c69..099e3ce5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -659,6 +659,16 @@ static int gmc_v9_0_late_init(void *handle)
  unsigned i;
  int r;

+   /*
+* TODO:
+* Currently there is a bug where some memory client outside
+* of the driver writes to first 8M of VRAM on S3 resume,
+* this overrides GART which by default gets placed in first 8M
and
+* causes VM_FAULTS once GTT is accessed.
+* Keep the stolen memory reservation until this solved.
+*/
+   /* amdgpu_bo_late_init(adev); */
+

We still need to free this somewhere.  I'd suggest calling it in
gmc_v9_0_sw_fini() and add a comment there about moving it when we fix
the issue.


  for(i = 0; i < adev->num_rings; ++i) {
  struct amdgpu_ring *ring = adev->rings[i];
  unsigned vmhub = ring->funcs->vmhub;
@@ -884,10 +894,6 @@ static int gmc_v9_0_sw_init(void *handle)
   */
  adev->gmc.mc_mask = 0xULL; /* 48 bit MC */

-   /*
-* It needs to reserve 8M stolen memory for vega10
-* TODO: Figure out how to avoid that...
-*/
  adev->gmc.stolen_size = gmc_v9_0_get_vbios_fb_size(adev);

We may also just want to return 8MB or 9MB temporarily in
gmc_v9_0_get_vbios_fb_size until we sort out the root cause of the S3
issue otherwise we're potentially wasting a lot more memory.

But what if we have 4k display ? In this case returning 9M probably will not
hide the corruption  we were originally dealing with. I remember in that
case pre OS FB size would be 32M.

I guess it's a trade off, possible momentary garbage during the bios to
driver transition vs. wasting an additional 24 MB of CPU accessible
vram for the life of the driver.

Can we free the reserved memory after initialization, then reserve it
again on resume?


The issue here was that someone overwrites the first 8M of VRAM and 
corrupts the GART table, which causes VM_FAULTS. Until we find who is 
writing into this area of VRAM, and when exactly, I think we cannot allow 
any data to be placed there since it might get corrupted (even if we 
avoid placing the GART table there).


Andrey





___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] Revert "drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)"

2018-04-12 Thread Michel Dänzer
On 2018-04-12 04:51 PM, Harry Wentland wrote:
> This seems to cause flickering and lock-ups for a wide range of users.
> Revert until we've found a proper fix for the flickering and lock-ups.
> 
> This reverts commit 36cc549d59864b7161f0e23d710c1c4d1b9cf022.
> 
> Cc: Shirish S 
> Cc: Alex Deucher 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Harry Wentland 

Thanks Harry, both patches are

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4 2/2] drm/amdgpu: Free VGA stolen memory as soon as possible.

2018-04-12 Thread Michel Dänzer
On 2018-04-12 04:58 PM, Andrey Grodzovsky wrote:
> On 04/12/2018 10:33 AM, Michel Dänzer wrote:
>> On 2018-04-12 03:33 PM, Alex Deucher wrote:
>>> On Thu, Apr 12, 2018 at 7:17 AM, Andrey Grodzovsky
>>>  wrote:
 On 04/12/2018 12:32 AM, Alex Deucher wrote:
> On Thu, Apr 12, 2018 at 12:08 AM, Andrey Grodzovsky
>  wrote:
>> Reserved VRAM is used to avoid overriding pre OS FB.
>> Once our display stack takes over we don't need the reserved
>> VRAM anymore.
>>
>> v2:
>> Remove comment, we know actually why we need to reserve the stolen
>> VRAM.
>> Fix return type for amdgpu_ttm_late_init.
>> v3:
>> Return 0 in amdgpu_bo_late_init, rebase on changes to previous patch
>> v4:
>> Don't release stolen memory for GMC9 ASICs until GART corruption
>> on S3 resume is resolved.
>>
>> Signed-off-by: Andrey Grodzovsky 
>> Signed-off-by: Alex Deucher 
>> [...]
> 
> Not sure what it means ?

It means I trimmed some quoted text. Would it be clearer if I put
quotation markers before it?


>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index 252a6c69..099e3ce5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -659,6 +659,16 @@ static int gmc_v9_0_late_init(void *handle)
>>   unsigned i;
>>   int r;
>>
>> +   /*
>> +    * TODO:
>> +    * Currently there is a bug where some memory client outside
>> +    * of the driver writes to first 8M of VRAM on S3 resume,
>> +    * this overrides GART which by default gets placed in
>> first 8M
>> and
>> +    * causes VM_FAULTS once GTT is accessed.
>> +    * Keep the stolen memory reservation until this solved.
>> +    */
>> +   /* amdgpu_bo_late_init(adev); */
>> +
> We still need to free this somewhere.  I'd suggest calling it in
> gmc_v9_0_sw_fini() and add a comment there about moving it when we fix
> the issue.
>
>>   for(i = 0; i < adev->num_rings; ++i) {
>>   struct amdgpu_ring *ring = adev->rings[i];
>>   unsigned vmhub = ring->funcs->vmhub;
>> @@ -884,10 +894,6 @@ static int gmc_v9_0_sw_init(void *handle)
>>    */
>>   adev->gmc.mc_mask = 0xULL; /* 48 bit MC */
>>
>> -   /*
>> -    * It needs to reserve 8M stolen memory for vega10
>> -    * TODO: Figure out how to avoid that...
>> -    */
>>   adev->gmc.stolen_size = gmc_v9_0_get_vbios_fb_size(adev);
> We may also just want to return 8MB or 9MB temporarily in
> gmc_v9_0_get_vbios_fb_size until we sort out the root cause of the S3
> issue otherwise we're potentially wasting a lot more memory.
 But what if we have 4k display ? In this case returning 9M probably
 will not
 hide the corruption  we were originally dealing with. I remember in
 that
 case pre OS FB size would be 32M.
>>> I guess it's a trade off, possible garbage monentary during bios to
>>> driver transition vs. wasting an additional 24 MB of CPU accessible
>>> vram for the life of the driver.
>> Can we free the reserved memory after initialization, then reserve it
>> again on resume?
> 
> The issue here was that someone overwrites the first 8M of VRAM and
> corrupts the GART table, which causes VM_FAULTS. Until we find who is
> writing into this area of VRAM and when exactly I think we cannot allow
> any data to be placed there since it might get corrupted (even if we
> avoid placing the GART table there).

I think it shouldn't be too hard in general to make sure the GART table,
and any other BOs which stay in VRAM across suspend/resume, don't fall
in the affected area.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 1/1] drm/amdgpu: Enable scatter gather display support

2018-04-12 Thread Marek Olšák
Can you be more specific, Christian? Mesa has this, I don't think it needs
anything else:
https://cgit.freedesktop.org/mesa/mesa/commit/?id=7d2079908d9ef05ec3f35b7078833e57846cab5b

Marek

On Wed, Mar 28, 2018 at 3:46 AM, Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 28.03.2018 um 00:22 schrieb Samuel Li:
>
>> It's auto by default. For CZ/ST, auto setting enables sg display
>> when vram size is small; otherwise still uses vram.
>> This patch fixed some potential hang issue introduced by change
>> "allow framebuffer in GART memory as well" due to CZ/ST hardware
>> limitation.
>>
>
> Well that is still a NAK.
>
> As discussed now multiple times please implement the necessary changes in
> Mesa.
>
> Regards,
> Christian.
>
>
>
>> v2: Change default setting to auto, also some misc changes.
>> Signed-off-by: Samuel Li 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   | 10 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_display.h   |  2 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  4 
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c|  2 ++
>>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  3 ++-
>>   6 files changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index a7e2229..c942362 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -129,6 +129,7 @@ extern int amdgpu_lbpw;
>>   extern int amdgpu_compute_multipipe;
>>   extern int amdgpu_gpu_recovery;
>>   extern int amdgpu_emu_mode;
>> +extern int amdgpu_sg_display;
>> #ifdef CONFIG_DRM_AMDGPU_SI
>>   extern int amdgpu_si_support;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>> index 5495b29..1e7b950 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>> @@ -513,8 +513,14 @@ uint32_t amdgpu_display_framebuffer_domains(struct
>> amdgpu_device *adev)
>>   #if defined(CONFIG_DRM_AMD_DC)
>> if (adev->asic_type >= CHIP_CARRIZO && adev->asic_type <
>> CHIP_RAVEN &&
>> adev->flags & AMD_IS_APU &&
>> -   amdgpu_device_asic_has_dc_support(adev->asic_type))
>> -   domain |= AMDGPU_GEM_DOMAIN_GTT;
>> +   amdgpu_device_asic_has_dc_support(adev->asic_type)) {
>> +   if (amdgpu_sg_display == 1)
>> +   domain = AMDGPU_GEM_DOMAIN_GTT;
>> +   else if (amdgpu_sg_display == -1) {
>> +   if (adev->gmc.real_vram_size <
>> AMDGPU_SG_THRESHOLD)
>> +   domain = AMDGPU_GEM_DOMAIN_GTT;
>> +   }
>> +   }
>>   #endif
>> return domain;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
>> index 2b11d80..2b25393 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
>> @@ -23,6 +23,8 @@
>>   #ifndef __AMDGPU_DISPLAY_H__
>>   #define __AMDGPU_DISPLAY_H__
>>   +#define AMDGPU_SG_THRESHOLD  (256*1024*1024)
>> +
>>   uint32_t amdgpu_display_framebuffer_domains(struct amdgpu_device
>> *adev);
>>   struct drm_framebuffer *
>>   amdgpu_display_user_framebuffer_create(struct drm_device *dev,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 1bfce79..19f11a5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -132,6 +132,7 @@ int amdgpu_lbpw = -1;
>>   int amdgpu_compute_multipipe = -1;
>>   int amdgpu_gpu_recovery = -1; /* auto */
>>   int amdgpu_emu_mode = 0;
>> +int amdgpu_sg_display = -1;
>> MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in
>> megabytes");
>>   module_param_named(vramlimit, amdgpu_vram_limit, int, 0600);
>> @@ -290,6 +291,9 @@ module_param_named(gpu_recovery,
>> amdgpu_gpu_recovery, int, 0444);
>>   MODULE_PARM_DESC(emu_mode, "Emulation mode, (1 = enable, 0 = disable)");
>>   module_param_named(emu_mode, amdgpu_emu_mode, int, 0444);
>>   +MODULE_PARM_DESC(sg_display, "Enable scatter gather display, (1 =
>> enable, 0 = disable, -1 = auto");
>> +module_param_named(sg_display, amdgpu_sg_display, int, 0444);
>> +
>>   #ifdef CONFIG_DRM_AMDGPU_SI
>> #if defined(CONFIG_DRM_RADEON) || defined(CONFIG_DRM_RADEON_MODULE)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>> index 1206301..f57c355 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>> @@ -138,6 +138,8 @@ static int amdgpufb_create_pinned_object(struct
>> amdgpu_fbdev *rfbdev,
>> mode_cmd->pitches[0] = amdgpu_align_pitch(adev, mode_cmd->width,
>> cpp,
>>   fb_tiled);
>>  

Re: [PATCH xf86-video-amdgpu 3/5] Keep CRTC properties consistent

2018-04-12 Thread Leo Li



On 2018-04-12 06:30 AM, Michel Dänzer wrote:

On 2018-04-11 11:26 PM, Leo Li wrote:

On 2018-04-11 04:39 AM, Michel Dänzer wrote:


Hmm. So either legacy or non-legacy clients won't work at all, or
they'll step on each other's toes, clobbering the HW gamma LUT from
each other.

I'm afraid neither of those alternatives sound like a good user
experience to me.

Consider on the one hand something like Night Light / redshift,
using legacy APIs to adjust colour temperature to the time of day.
On the other hand, another client using the non-legacy API for say
fine-tuning of a display's advanced gamma capabilities.

Ideally, in this case the gamma LUTs from the legacy and non-legacy
APIs should be combined, such that the hardware LUT reflects both
the colour temperature set by Night Light / redshift and the
fine-tuning set by the non-legacy client. Is that feasible? If not,
can you explain a little why?


Hmm, combining LUTs could be feasible, but I don't think that's the
right approach.

LUTs can be combined through composition h(x) = f(g(x)), with some
interpolation involved. Once combined, it can be set using the
non-legacy API, since that'll likely have a larger LUT size.

But I think what you've mentioned raises a bigger issue of color
management conflicts in general. It doesn't have to be redshift vs
monitor correction, what if there are more than 2 applications contending
to set gamma? xrandr's brightness already conflicts with redshift,
and so do some apps running on WINE.


What you're describing is an X11 design flaw, which we cannot do
anything about in the DDX driver.

What I want to avoid is repeating a similar situation as we had before
xserver 1.19, where one could have either working per-window colormaps
and global gamma, or per-CRTC gamma via RandR, not all at the same
time. (Before xserver 1.7, they would clobber the HW LUT from each
other) I fixed this in
https://cgit.freedesktop.org/xorg/xserver/commit/?id=b4e46c0444bb09f4af59d9d13acc939a0fbbc6d6
by composing the LUTs.



For the small example at hand, the ideal solution is to have
redshift use the color transformation matrix (CTM), which will be
exposed as part of the non-legacy color API. It'll need modification
of redshift, but at least it won't conflict with anything gamma
related.


 From past experience, it can take a long time (up to forever) for some
clients to be updated like this. E.g. it's unlikely at this point that
libraries such as SDL1 will ever be updated for the new API, so I'm
afraid users would run into things like
https://bugs.freedesktop.org/show_bug.cgi?id=27222 again.

(Besides, it would conflict with anything else using the same API, as
you described above, so it's not even obviously the ideal solution)



Fair enough, it's better to have the frequent redshift+monitor adjust
use-case working now instead of pushing an entirely new API.




Jumping back on some patch 1 topics:

Are there ways to get arbitrarily (no more than 4096 elements) sized
arrays from a client, to the DDX driver?


Seems like the RRChangeOutputProperty request could work.



I agree, it would definitely be nicer for clients to not worry about
DRM blobs at all.


Indeed. E.g. as a bonus, it would allow this to work even with a remote
display connection.


I'm holding off on the more detailed discussion about the other patches
until there is a plan for this fundamental issue.




I'll take another look at RRChangeOutputProperty. Seems I missed the
'length' argument when I first went through it.

Once the blobs are gone, the other issues should be much simpler to
solve. I'll see if I can come up with some v2's.

A side question: How does xrandr display lengthy properties, like LUTs?
Can users set it via --set still?

Leo


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 1/1] drm/amdgpu: Enable scatter gather display support

2018-04-12 Thread Christian König
Can you be more specific, Christian? Mesa has this, I don't think it 
needs anything else:

Completely agree, that's what I suggested to implement.

The point is this kernel change now needs to be reworked and adapted to 
what Mesa is doing.


Regards,
Christian.

Am 12.04.2018 um 18:40 schrieb Marek Olšák:
Can you be more specific, Christian? Mesa has this, I don't think it 
needs anything else:

https://cgit.freedesktop.org/mesa/mesa/commit/?id=7d2079908d9ef05ec3f35b7078833e57846cab5b

Marek

On Wed, Mar 28, 2018 at 3:46 AM, Christian König 
> wrote:


Am 28.03.2018 um 00:22 schrieb Samuel Li:

It's auto by default. For CZ/ST, auto setting enables sg display
when vram size is small; otherwise still uses vram.
This patch fixed some potential hang issue introduced by change
"allow framebuffer in GART memory as well" due to CZ/ST hardware
limitation.


Well that is still a NAK.

As discussed now multiple times please implement the necessary
changes in Mesa.

Regards,
Christian.



v2: Change default setting to auto, also some misc changes.
Signed-off-by: Samuel Li mailto:samuel...@amd.com>>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c    | 10 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_display.h    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 
  drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   |  2 ++
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  3 ++-
  6 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a7e2229..c942362 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -129,6 +129,7 @@ extern int amdgpu_lbpw;
  extern int amdgpu_compute_multipipe;
  extern int amdgpu_gpu_recovery;
  extern int amdgpu_emu_mode;
+extern int amdgpu_sg_display;
    #ifdef CONFIG_DRM_AMDGPU_SI
  extern int amdgpu_si_support;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 5495b29..1e7b950 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -513,8 +513,14 @@ uint32_t
amdgpu_display_framebuffer_domains(struct amdgpu_device *adev)
  #if defined(CONFIG_DRM_AMD_DC)
        if (adev->asic_type >= CHIP_CARRIZO && adev->asic_type
< CHIP_RAVEN &&
            adev->flags & AMD_IS_APU &&
-           amdgpu_device_asic_has_dc_support(adev->asic_type))
-               domain |= AMDGPU_GEM_DOMAIN_GTT;
+           amdgpu_device_asic_has_dc_support(adev->asic_type)) {
+               if (amdgpu_sg_display == 1)
+                       domain = AMDGPU_GEM_DOMAIN_GTT;
+               else if (amdgpu_sg_display == -1) {
+                       if (adev->gmc.real_vram_size <
AMDGPU_SG_THRESHOLD)
+                               domain = AMDGPU_GEM_DOMAIN_GTT;
+               }
+       }
  #endif
        return domain;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
index 2b11d80..2b25393 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
@@ -23,6 +23,8 @@
  #ifndef __AMDGPU_DISPLAY_H__
  #define __AMDGPU_DISPLAY_H__
  +#define AMDGPU_SG_THRESHOLD  (256*1024*1024)
+
  uint32_t amdgpu_display_framebuffer_domains(struct
amdgpu_device *adev);
  struct drm_framebuffer *
  amdgpu_display_user_framebuffer_create(struct drm_device *dev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 1bfce79..19f11a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -132,6 +132,7 @@ int amdgpu_lbpw = -1;
  int amdgpu_compute_multipipe = -1;
  int amdgpu_gpu_recovery = -1; /* auto */
  int amdgpu_emu_mode = 0;
+int amdgpu_sg_display = -1;
    MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in
megabytes");
  module_param_named(vramlimit, amdgpu_vram_limit, int, 0600);
@@ -290,6 +291,9 @@ module_param_named(gpu_recovery,
amdgpu_gpu_recovery, int, 0444);
  MODULE_PARM_DESC(emu_mode, "Emulation mode, (1 = enable, 0 =
disable)");
  module_param_named(emu_mode, amdgpu_emu_mode, int, 0444);
  +MODULE_PARM_DESC(sg_d

Re: [PATCH v2 1/1] drm/amdgpu: Enable scatter gather display support

2018-04-12 Thread Samuel Li
> The point is this kernel change now needs to be reworked and adapted to what 
> Mesa is doing.
Three options have been brought up for the kernel:
1) Turn off immediate mode when flipping between VRAM/GTT.
2) Check the domain of the current fb and then adjust the new one before 
pinning it.
3) Use only VRAM or GTT depending on a threshold.

As per discussion, the 3rd one, which is the current patch, seems the best so 
far.

Regards,
Samuel Li



On 2018-04-12 01:03 PM, Christian König wrote:
>> Can you be more specific, Christian? Mesa has this, I don't think it needs 
>> anything else:
> Completely agree, that's what I suggested to implement.
> 
> The point is this kernel change now needs to be reworked and adapted to what 
> Mesa is doing.
> 
> Regards,
> Christian.
> 
> Am 12.04.2018 um 18:40 schrieb Marek Olšák:
>> Can you be more specific, Christian? Mesa has this, I don't think it needs 
>> anything else:
>> https://cgit.freedesktop.org/mesa/mesa/commit/?id=7d2079908d9ef05ec3f35b7078833e57846cab5b
>>
>> Marek
>>
>> On Wed, Mar 28, 2018 at 3:46 AM, Christian König 
>> <ckoenig.leichtzumer...@gmail.com> wrote:
>>
>>     Am 28.03.2018 um 00:22 schrieb Samuel Li:
>>
>>     It's auto by default. For CZ/ST, auto setting enables sg display
>>     when vram size is small; otherwise still uses vram.
>>     This patch fixed some potential hang issue introduced by change
>>     "allow framebuffer in GART memory as well" due to CZ/ST hardware
>>     limitation.
>>
>>
>>     Well that is still a NAK.
>>
>>     As discussed now multiple times please implement the necessary
>>     changes in Mesa.
>>
>>     Regards,
>>     Christian.
>>
>>
>>
>>     v2: Change default setting to auto, also some misc changes.
>>     Signed-off-by: Samuel Li <samuel...@amd.com>
>>     ---
>>       drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  1 +
>>       drivers/gpu/drm/amd/amdgpu/amdgpu_display.c    | 10 --
>>       drivers/gpu/drm/amd/amdgpu/amdgpu_display.h    |  2 ++
>>       drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 
>>       drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   |  2 ++
>>       drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  3 ++-
>>       6 files changed, 19 insertions(+), 3 deletions(-)
>>
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>     b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>     index a7e2229..c942362 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>     @@ -129,6 +129,7 @@ extern int amdgpu_lbpw;
>>       extern int amdgpu_compute_multipipe;
>>       extern int amdgpu_gpu_recovery;
>>       extern int amdgpu_emu_mode;
>>     +extern int amdgpu_sg_display;
>>         #ifdef CONFIG_DRM_AMDGPU_SI
>>       extern int amdgpu_si_support;
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>>     index 5495b29..1e7b950 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>>     @@ -513,8 +513,14 @@ uint32_t
>>     amdgpu_display_framebuffer_domains(struct amdgpu_device *adev)
>>       #if defined(CONFIG_DRM_AMD_DC)
>>             if (adev->asic_type >= CHIP_CARRIZO && adev->asic_type
>>     < CHIP_RAVEN &&
>>                 adev->flags & AMD_IS_APU &&
>>     -           amdgpu_device_asic_has_dc_support(adev->asic_type))
>>     -               domain |= AMDGPU_GEM_DOMAIN_GTT;
>>     +           amdgpu_device_asic_has_dc_support(adev->asic_type)) {
>>     +               if (amdgpu_sg_display == 1)
>>     +                       domain = AMDGPU_GEM_DOMAIN_GTT;
>>     +               else if (amdgpu_sg_display == -1) {
>>     +                       if (adev->gmc.real_vram_size <
>>     AMDGPU_SG_THRESHOLD)
>>     +                               domain = AMDGPU_GEM_DOMAIN_GTT;
>>     +               }
>>     +       }
>>       #endif
>>             return domain;
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
>>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
>>     index 2b11d80..2b25393 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
>>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
>>     @@ -23,6 +23,8 @@
>>       #ifndef __AMDGPU_DISPLAY_H__
>>       #define __AMDGPU_DISPLAY_H__
>>       +#define AMDGPU_SG_THRESHOLD  (256*1024*1024)
>>     +
>>       uint32_t amdgpu_display_framebuffer_domains(struct
>>     amdgpu_device *adev);
>>       struct drm_framebuffer *
>>       amdgpu_display_user_framebuffer_create(struct drm_device *dev,
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>     index 1bfce79..19f1

Re: RFC for a render API to support adaptive sync and VRR

2018-04-12 Thread Harry Wentland
On 2018-04-12 07:39 AM, Nicolai Hähnle wrote:
> On 12.04.2018 01:30, Cyr, Aric wrote:
>>> At least with VDPAU, video players are already explicitly specifying the
>>> target presentation time, so no changes should be required at that
>>> level. Don't know about other video APIs.
>>>
>>> The X11 Present extension protocol is also prepared for specifying the
>>> target presentation time already, the support for it just needs to be
>>> implemented.
>>
>> I'm perfectly OK with presentation time-based *API*.  I get it from a user 
>> mode/app perspective, and that's fine.  We need that feedback and would like 
>> help defining that portions of the stack.
>> However, I think it doesn't make as much sense as a *DDI* because it doesn't 
>> correspond to any hardware real or logical (i.e. no one would implement it 
>> in HW this way) and the industry specs aren't defined that way.
>> You can have libdrm or some other usermode component translate your 
>> presentation time into a frame duration and schedule it.  What's the 
>> advantage of having this in kernel besides the fact we lose the intent of 
>> the application and could prevent features and optimizations.  When it gets 
>> to kernel, I think it is much more elegant for the flip structure to contain 
>> a simple duration that says "hey, show this frame on the screen for this 
>> long".  Then we don't need any clocks or timers just some simple math and 
>> program the hardware.
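(For illustration, that translation really is just some simple math plus a
clamp; a rough userspace-side sketch, with every name below hypothetical
rather than taken from any actual tree:)

#include <stdint.h>

/* Hypothetical: turn an absolute target presentation time into a per-flip
 * frame duration, clamped to the monitor's reported VRR range. */
static int64_t target_time_to_duration_ns(int64_t target_ns,
                                          int64_t last_flip_ns,
                                          int64_t min_duration_ns,
                                          int64_t max_duration_ns)
{
        int64_t duration_ns = target_ns - last_flip_ns;

        if (duration_ns <= 0)
                return 0;       /* present as soon as possible */
        if (duration_ns < min_duration_ns)
                return min_duration_ns;
        if (duration_ns > max_duration_ns)
                return max_duration_ns;
        return duration_ns;
}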
> 
> There isn't necessarily an inherent advantage to having this translation in 
> the kernel. However, we *must* do this translation in a place that is owned 
> by display experts (i.e., you guys), because only you guys know how to 
> actually do that translation reliably and correctly.
> 
> Since your work is currently limited to the kernel, it makes sense to do it 
> in the kernel.

We're actively trying to change this. I want us (the display team) to 
eventually own anything display related across the stack, or at least closely 
work with the owners of those components.

> 
> If the translation doesn't happen in a place that you feel comfortable 
> working on, we're setting ourselves up for a future where this hypothetical 
> future UMD component will get this wrong, and there'll be a lot of 
> finger-pointing between you guys and whoever writes that UMD, with likely 
> little willingness to actually go into the respective other codebase to fix 
> what's wrong. And that's a pretty sucky future.
> 

If finger-pointing happened I'd like to apologize. Again, this is something we 
actively try to change.

Ultimately I'm looking for a solution that works for everyone and is not owned 
on a SW component basis, but rather on an expertise basis.

Harry

> Cheers,
> Nicolai
> 
> P.S.: I'm also a little surprised that you seem to be saying that requesting 
> a target present time is basically impossible (at least, that's kind of 
> implied by your statement about mGPUs), and yet there's precedent for such 
> APIs in both Vulkan and VDPAU.
> 
> 
>>
>> In short,
>>   1) We can simplify media players' lives by helping them get really, really 
>> close to their content rate, so they wouldn't need any frame rate conversion.
>>   They'll still need A/V syncing though, and variable refresh cannot 
>> solve this and thus is way out of scope of what we're proposing.
>>
>>   2) For gaming, don't even try to guess a frame duration, the 
>> driver/hardware will do a better job every time, just specify duration=0 and 
>> flip as fast as you can.
>>
>> Regards,
>>    Aric
>>
>> P.S. Thanks for the Croteam link.  Interesting, but basically nullified by 
>> variable refresh rate displays.  You won't have 
>> stuttering/microstuttering/juddering/tearing if your display's refresh rate 
>> matches the render/present rate of the game.  Maybe I should grab The Talos 
>> Principle to see how well it works with FreeSync display :)
>>
>> -- 
>> ARIC CYR
>> PMTS Software Engineer | SW – Display Technologies
>>
>>
>>
>>
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing

2018-04-12 Thread Robin Murphy

On 11/04/18 19:28, Christian König wrote:

Am 11.04.2018 um 19:11 schrieb Robin Murphy:

Now that drm_prime_sg_to_page_addr_arrays() understands the case where
dma_map_sg() has coalesced segments and returns 0 < count < nents, we
can relax the check to only consider genuine failure.


That pattern is repeated in pretty much all drivers using TTM.


AFAICS from a look through drivers/gpu/ only 3 drivers consider the 
actual value of what dma_map_sg() returns - others only handle the 
failure case of 0, and some don't check it at all - and of those it's 
only amdgpu and radeon barfing on it being different from nents (vmwgfx 
appears to just stash it to use slightly incorrectly later).
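
(For reference, a minimal sketch of the pattern in question -- iterate over
the count dma_map_sg() actually returned rather than the original nents, and
treat only 0 as failure; the function name here is illustrative, this is not
code from any driver:)

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/printk.h>

/* Illustrative only: map a table and walk the (possibly coalesced) DMA
 * segments. */
static int example_map_and_walk(struct device *dev, struct sg_table *sgt)
{
        struct scatterlist *sg;
        int mapped, i;

        mapped = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
        if (mapped == 0)
                return -ENOMEM;

        /* Walk the mapped count, not the original nents. */
        for_each_sg(sgt->sgl, sg, mapped, i) {
                dma_addr_t addr = sg_dma_address(sg);

                pr_debug("seg %d: %pad + %u\n", i, &addr, sg_dma_len(sg));
        }

        return 0;
}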


So you would need to fix all of them, but as I said I don't think that 
this approach is a good idea.


We essentially wanted to get rid of the dma_address array in the mid 
term and that change goes into the exactly opposite direction.


But this patch bears no intention of making any fundamental change to 
the existing behaviour of this particular driver, it simply permits said 
behaviour to work at all on more systems than it currently does. I would 
consider that entirely orthogonal to future TTM-wide development :/


Robin.



Regards,
Christian.



Signed-off-by: Robin Murphy 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index 205da3ff9cd0..f81e96a4242f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -813,7 +813,7 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt 
*ttm)

  r = -ENOMEM;
  nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, 
direction);

-    if (nents != ttm->sg->nents)
+    if (nents == 0)
  goto release_sg;
  drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 1/1] drm/amdgpu: Enable scatter gather display support

2018-04-12 Thread Christian König

1) Turn off immediate mode when flipping between VRAM/GTT.

That must be implemented independently.

You see, working around the hardware bug should be a separate patch from 
implementing a placement policy.



As per discussion, the 3rd one, which is the current patch, seems the best so 
far.
And I can only repeat myself. Alex and I are the maintainers of the 
kernel module, so we are the ones who decide how to implement things here.


And we both noted that this approach of overriding user space decisions 
is not a good design.


The placement policy I suggest by preferring GTT over VRAM on APUs 
should be trivial to implement and as far as I can see avoids all 
negative side effects.


Regards,
Christian.

Am 12.04.2018 um 19:21 schrieb Samuel Li:

The point is this kernel change now needs to be reworked and adapted to what 
Mesa is doing.

Three options have been brought up for kernel,
1) Turn off immediate mode when flipping between VRAM/GTT.
2) Check the domain of the current fb and then adjust the new one before 
pinning it.
3) Use only VRAM or GTT depending on a threshhold.

As per discussion, the 3rd one, which is the current patch, seems the best so 
far.

Regards,
Samuel Li



On 2018-04-12 01:03 PM, Christian König wrote:

Can you be more specific, Christian? Mesa has this, I don't think it needs 
anything else:

Completely agree, that's what I suggested to implement.

The point is this kernel change now needs to be reworked and adapted to what 
Mesa is doing.

Regards,
Christian.

Am 12.04.2018 um 18:40 schrieb Marek Olšák:

Can you be more specific, Christian? Mesa has this, I don't think it needs 
anything else:
https://cgit.freedesktop.org/mesa/mesa/commit/?id=7d2079908d9ef05ec3f35b7078833e57846cab5b

Marek

On Wed, Mar 28, 2018 at 3:46 AM, Christian König <ckoenig.leichtzumer...@gmail.com> wrote:

     Am 28.03.2018 um 00:22 schrieb Samuel Li:

     It's auto by default. For CZ/ST, auto setting enables sg display
     when vram size is small; otherwise still uses vram.
     This patch fixed some potential hang issue introduced by change
     "allow framebuffer in GART memory as well" due to CZ/ST hardware
     limitation.


     Well that is still a NAK.

     As discussed now multiple times please implement the necessary
     changes in Mesa.

     Regards,
     Christian.



     v2: Change default setting to auto, also some misc changes.
     Signed-off-by: Samuel Li <samuel...@amd.com>
     ---
       drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  1 +
       drivers/gpu/drm/amd/amdgpu/amdgpu_display.c    | 10 --
       drivers/gpu/drm/amd/amdgpu/amdgpu_display.h    |  2 ++
       drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 
       drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   |  2 ++
       drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  3 ++-
       6 files changed, 19 insertions(+), 3 deletions(-)

     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
     b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
     index a7e2229..c942362 100644
     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
     @@ -129,6 +129,7 @@ extern int amdgpu_lbpw;
       extern int amdgpu_compute_multipipe;
       extern int amdgpu_gpu_recovery;
       extern int amdgpu_emu_mode;
     +extern int amdgpu_sg_display;
         #ifdef CONFIG_DRM_AMDGPU_SI
       extern int amdgpu_si_support;
     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
     b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
     index 5495b29..1e7b950 100644
     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
     @@ -513,8 +513,14 @@ uint32_t
     amdgpu_display_framebuffer_domains(struct amdgpu_device *adev)
       #if defined(CONFIG_DRM_AMD_DC)
             if (adev->asic_type >= CHIP_CARRIZO && adev->asic_type
     < CHIP_RAVEN &&
                 adev->flags & AMD_IS_APU &&
     -           amdgpu_device_asic_has_dc_support(adev->asic_type))
     -               domain |= AMDGPU_GEM_DOMAIN_GTT;
     +           amdgpu_device_asic_has_dc_support(adev->asic_type)) {
     +               if (amdgpu_sg_display == 1)
     +                       domain = AMDGPU_GEM_DOMAIN_GTT;
     +               else if (amdgpu_sg_display == -1) {
     +                       if (adev->gmc.real_vram_size <
     AMDGPU_SG_THRESHOLD)
     +                               domain = AMDGPU_GEM_DOMAIN_GTT;
     +               }
     +       }
       #endif
             return domain;
     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
     b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
     index 2b11d80..2b25393 100644
     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h
     +++ b/drivers/gpu/drm/amd/amdgpu/amdgp

Re: [PATCH v2 1/1] drm/amdgpu: Enable scatter gather display support

2018-04-12 Thread Samuel Li
Please clarify, Christian. How would you like it to be implemented?

Sam


On 2018-04-12 02:00 PM, Christian König wrote:
>> 1) Turn off immediate mode when flipping between VRAM/GTT.
> That must be implemented independently.
> 
> See working around the hardware bug should be a different patch than 
> implementing a placement policy.
> 
>> As per discussion, the 3rd one, which is the current patch, seems the best 
>> so far.
> And I can only repeat myself. Alex and I are the maintainers of the kernel 
> module, so we are the one who decide on how to implement things here.
> 
> And we both noted that this approach of overriding user space decisions is 
> not a good design.
> 
> The placement policy I suggest by preferring GTT over VRAM on APUs should be 
> trivial to implement and as far as I can see avoids all negative side effects.
> 
> Regards,
> Christian.
> 
> Am 12.04.2018 um 19:21 schrieb Samuel Li:
>>> The point is this kernel change now needs to be reworked and adapted to 
>>> what Mesa is doing.
>> Three options have been brought up for kernel,
>> 1) Turn off immediate mode when flipping between VRAM/GTT.
>> 2) Check the domain of the current fb and then adjust the new one before 
>> pinning it.
>> 3) Use only VRAM or GTT depending on a threshhold.
>>
>> As per discussion, the 3rd one, which is the current patch, seems the best 
>> so far.
>>
>> Regards,
>> Samuel Li
>>
>>
>>
>> On 2018-04-12 01:03 PM, Christian König wrote:
 Can you be more specific, Christian? Mesa has this, I don't think it needs 
 anything else:
>>> Completely agree, that's what I suggested to implement.
>>>
>>> The point is this kernel change now needs to be reworked and adapted to 
>>> what Mesa is doing.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 12.04.2018 um 18:40 schrieb Marek Olšák:
 Can you be more specific, Christian? Mesa has this, I don't think it needs 
 anything else:
 https://cgit.freedesktop.org/mesa/mesa/commit/?id=7d2079908d9ef05ec3f35b7078833e57846cab5b

 Marek

 On Wed, Mar 28, 2018 at 3:46 AM, Christian König 
 <ckoenig.leichtzumer...@gmail.com> wrote:

  Am 28.03.2018 um 00:22 schrieb Samuel Li:

  It's auto by default. For CZ/ST, auto setting enables sg display
  when vram size is small; otherwise still uses vram.
  This patch fixed some potential hang issue introduced by change
  "allow framebuffer in GART memory as well" due to CZ/ST hardware
  limitation.


  Well that is still a NAK.

  As discussed now multiple times please implement the necessary
  changes in Mesa.

  Regards,
  Christian.



  v2: Change default setting to auto, also some misc changes.
  Signed-off-by: Samuel Li <samuel...@amd.com>
  ---
    drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  1 +
    drivers/gpu/drm/amd/amdgpu/amdgpu_display.c    | 10 --
    drivers/gpu/drm/amd/amdgpu/amdgpu_display.h    |  2 ++
    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 
    drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   |  2 ++
    drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  3 ++-
    6 files changed, 19 insertions(+), 3 deletions(-)

  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
  b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
  index a7e2229..c942362 100644
  --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
  +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
  @@ -129,6 +129,7 @@ extern int amdgpu_lbpw;
    extern int amdgpu_compute_multipipe;
    extern int amdgpu_gpu_recovery;
    extern int amdgpu_emu_mode;
  +extern int amdgpu_sg_display;
      #ifdef CONFIG_DRM_AMDGPU_SI
    extern int amdgpu_si_support;
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
  b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
  index 5495b29..1e7b950 100644
  --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
  +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
  @@ -513,8 +513,14 @@ uint32_t
  amdgpu_display_framebuffer_domains(struct amdgpu_device *adev)
    #if defined(CONFIG_DRM_AMD_DC)
          if (adev->asic_type >= CHIP_CARRIZO && adev->asic_type
  < CHIP_RAVEN &&
              adev->flags & AMD_IS_APU &&
  -           amdgpu_device_asic_has_dc_support(adev->asic_type))
  -               domain |= AMDGPU_GEM_DOMAIN_GTT;
  +           amdgpu_device_asic_has_dc_support(adev->asic_type)) {
  +               if (amdgpu_sg_display == 1)
  +                       domain = AMDGPU_GEM_DOMAIN_GTT;
>>

Re: [PATCH v2 1/1] drm/amdgpu: Enable scatter gather display support

2018-04-12 Thread Christian König

Patch #1: Avoid the hardware bug!

E.g. even when we avoid different placements it would be good to have a 
patch which turns off immediate flipping when switching from VRAM to GTT.


That acts as a safety net and documents that we need to avoid this 
condition on the hardware.


Patch #2: Go into amdgpu_bo_pin_restricted() and change the pinning 
preference.


In other words add something like the following:

if (domain & AMDGPU_GEM_DOMAIN_GTT && bo->preferred_domains & 
AMDGPU_GEM_DOMAIN_GTT)

    domain = AMDGPU_GEM_DOMAIN_GTT;

That should be everything we need here.
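
(A rough sketch of how that could look, e.g. as a small helper called from
amdgpu_bo_pin_restricted() on the requested domain -- the helper name is
made up here and not part of any patch:)

/* Sketch only: prefer GTT at pin time when both the caller and the BO's
 * creator allow it. */
static u32 example_pin_domain(struct amdgpu_bo *bo, u32 domain)
{
        if ((domain & AMDGPU_GEM_DOMAIN_GTT) &&
            (bo->preferred_domains & AMDGPU_GEM_DOMAIN_GTT))
                return AMDGPU_GEM_DOMAIN_GTT;

        return domain;
}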

Regards,
Christian.

Am 12.04.2018 um 20:07 schrieb Samuel Li:

Please clarify, Christian. How would you like it to be implemented?

Sam


On 2018-04-12 02:00 PM, Christian König wrote:

1) Turn off immediate mode when flipping between VRAM/GTT.

That must be implemented independently.

See working around the hardware bug should be a different patch than 
implementing a placement policy.


As per discussion, the 3rd one, which is the current patch, seems the best so 
far.

And I can only repeat myself. Alex and I are the maintainers of the kernel 
module, so we are the one who decide on how to implement things here.

And we both noted that this approach of overriding user space decisions is not 
a good design.

The placement policy I suggest by preferring GTT over VRAM on APUs should be 
trivial to implement and as far as I can see avoids all negative side effects.

Regards,
Christian.

Am 12.04.2018 um 19:21 schrieb Samuel Li:

The point is this kernel change now needs to be reworked and adapted to what 
Mesa is doing.

Three options have been brought up for kernel,
1) Turn off immediate mode when flipping between VRAM/GTT.
2) Check the domain of the current fb and then adjust the new one before 
pinning it.
3) Use only VRAM or GTT depending on a threshhold.

As per discussion, the 3rd one, which is the current patch, seems the best so 
far.

Regards,
Samuel Li



On 2018-04-12 01:03 PM, Christian König wrote:

Can you be more specific, Christian? Mesa has this, I don't think it needs 
anything else:

Completely agree, that's what I suggested to implement.

The point is this kernel change now needs to be reworked and adapted to what 
Mesa is doing.

Regards,
Christian.

Am 12.04.2018 um 18:40 schrieb Marek Olšák:

Can you be more specific, Christian? Mesa has this, I don't think it needs 
anything else:
https://cgit.freedesktop.org/mesa/mesa/commit/?id=7d2079908d9ef05ec3f35b7078833e57846cab5b

Marek

On Wed, Mar 28, 2018 at 3:46 AM, Christian König <ckoenig.leichtzumer...@gmail.com> wrote:

  Am 28.03.2018 um 00:22 schrieb Samuel Li:

  It's auto by default. For CZ/ST, auto setting enables sg display
  when vram size is small; otherwise still uses vram.
  This patch fixed some potential hang issue introduced by change
  "allow framebuffer in GART memory as well" due to CZ/ST hardware
  limitation.


  Well that is still a NAK.

  As discussed now multiple times please implement the necessary
  changes in Mesa.

  Regards,
  Christian.



  v2: Change default setting to auto, also some misc changes.
  Signed-off-by: Samuel Li <samuel...@amd.com>
  ---
    drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  1 +
    drivers/gpu/drm/amd/amdgpu/amdgpu_display.c    | 10 --
    drivers/gpu/drm/amd/amdgpu/amdgpu_display.h    |  2 ++
    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 
    drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   |  2 ++
    drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  3 ++-
    6 files changed, 19 insertions(+), 3 deletions(-)

  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
  b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
  index a7e2229..c942362 100644
  --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
  +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
  @@ -129,6 +129,7 @@ extern int amdgpu_lbpw;
    extern int amdgpu_compute_multipipe;
    extern int amdgpu_gpu_recovery;
    extern int amdgpu_emu_mode;
  +extern int amdgpu_sg_display;
      #ifdef CONFIG_DRM_AMDGPU_SI
    extern int amdgpu_si_support;
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
  b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
  index 5495b29..1e7b950 100644
  --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
  +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
  @@ -513,8 +513,14 @@ uint32_t
  amdgpu_display_framebuffer_domains(struct amdgpu_device *adev)
    #if defined(CONFIG_DRM_AMD_DC)
          if (adev->asic_type >= CHIP_CARRIZO && adev->asic_type
  < CHIP_RAVEN &&
              adev->flags & AMD_IS_APU &&
  -           amdgpu_device_asic_has_dc_support(adev->asic_type))
  -               domain |= 

Re: [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing

2018-04-12 Thread Christian König

Am 12.04.2018 um 19:53 schrieb Robin Murphy:

On 11/04/18 19:28, Christian König wrote:

Am 11.04.2018 um 19:11 schrieb Robin Murphy:

Now that drm_prime_sg_to_page_addr_arrays() understands the case where
dma_map_sg() has coalesced segments and returns 0 < count < nents, we
can relax the check to only consider genuine failure.


That pattern is repeated in pretty much all drivers using TTM.


AFAICS from a look through drivers/gpu/ only 3 drivers consider the 
actual value of what dma_map_sg() returns - others only handle the 
failure case of 0, and some don't check it at all - and of those it's 
only amdgpu and radeon barfing on it being different from nents 
(vmwgfx appears to just stash it to use slightly incorrectly later).


So you would need to fix all of them, but as I said I don't think 
that this approach is a good idea.


We essentially wanted to get rid of the dma_address array in the mid 
term and that change goes into the exactly opposite direction.


But this patch bears no intention of making any fundamental change to 
the existing behaviour of this particular driver, it simply permits 
said behaviour to work at all on more systems than it currently does. 
I would consider that entirely orthogonal to future TTM-wide 
development :/


That's a really good point. Please try to fix radeon as well and maybe 
ping the vmwgfx maintainers.


With that in place I'm perfectly ok with going ahead with that approach.

Christian.



Robin.



Regards,
Christian.



Signed-off-by: Robin Murphy 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index 205da3ff9cd0..f81e96a4242f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -813,7 +813,7 @@ static int amdgpu_ttm_tt_pin_userptr(struct 
ttm_tt *ttm)

  r = -ENOMEM;
  nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, 
direction);

-    if (nents != ttm->sg->nents)
+    if (nents == 0)
  goto release_sg;
  drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/radeon: change function signature to pass full range

2018-04-12 Thread Mathieu Malaterre
In function ‘radeon_process_i2c_ch’ a comparison of a u8 value against
255 is done. Since it is always false, change the signature of this
function to use an `int` instead, which matches the type used in the
caller: `radeon_atom_hw_i2c_xfer`.

Fix the following warning triggered with W=1:

  CC [M]  drivers/gpu/drm/radeon/atombios_i2c.o
  drivers/gpu/drm/radeon/atombios_i2c.c: In function ‘radeon_process_i2c_ch’:
  drivers/gpu/drm/radeon/atombios_i2c.c:71:11: warning: comparison is always 
false due to limited range of data type [-Wtype-limits]
   if (num > ATOM_MAX_HW_I2C_READ) {
   ^
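
(The root cause is plain integer range; assuming ATOM_MAX_HW_I2C_READ is 255,
as the commit message states, a fragment like the following can never take
the branch -- the function name is illustrative only:)

#include <linux/types.h>
#include <linux/printk.h>

#define ATOM_MAX_HW_I2C_READ 255

static void example(u8 num)
{
        /* A u8 ranges over 0..255, so this condition is always false. */
        if (num > ATOM_MAX_HW_I2C_READ)
                pr_err("hw i2c transfer too large\n");
}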

Signed-off-by: Mathieu Malaterre 
---
 drivers/gpu/drm/radeon/atombios_i2c.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/atombios_i2c.c 
b/drivers/gpu/drm/radeon/atombios_i2c.c
index 4157780585a0..9022e9af11a0 100644
--- a/drivers/gpu/drm/radeon/atombios_i2c.c
+++ b/drivers/gpu/drm/radeon/atombios_i2c.c
@@ -35,7 +35,7 @@
 
 static int radeon_process_i2c_ch(struct radeon_i2c_chan *chan,
 u8 slave_addr, u8 flags,
-u8 *buf, u8 num)
+u8 *buf, int num)
 {
struct drm_device *dev = chan->dev;
struct radeon_device *rdev = dev->dev_private;
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4 2/2] drm/amdgpu: Free VGA stolen memory as soon as possible.

2018-04-12 Thread Andrey Grodzovsky



On 04/12/2018 11:10 AM, Michel Dänzer wrote:

On 2018-04-12 04:58 PM, Andrey Grodzovsky wrote:

On 04/12/2018 10:33 AM, Michel Dänzer wrote:

On 2018-04-12 03:33 PM, Alex Deucher wrote:

On Thu, Apr 12, 2018 at 7:17 AM, Andrey Grodzovsky
 wrote:

On 04/12/2018 12:32 AM, Alex Deucher wrote:

On Thu, Apr 12, 2018 at 12:08 AM, Andrey Grodzovsky
 wrote:

Reserved VRAM is used to avoid overriding pre OS FB.
Once our display stack takes over we don't need the reserved
VRAM anymore.

v2:
Remove comment, we know actually why we need to reserve the stolen
VRAM.
Fix return type for amdgpu_ttm_late_init.
v3:
Return 0 in amdgpu_bo_late_init, rebase on changes to previous patch
v4:
Don't release stolen memory for GMC9 ASICs untill GART corruption
on S3 resume is resolved.

Signed-off-by: Andrey Grodzovsky 
Signed-off-by: Alex Deucher 

[...]

Not sure what it means ?

It means I trimmed some quoted text. Would it be clearer if I put
quotation markers before it?


Got it now.





diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 252a6c69..099e3ce5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -659,6 +659,16 @@ static int gmc_v9_0_late_init(void *handle)
   unsigned i;
   int r;

+   /*
+    * TODO:
+    * Currently there is a bug where some memory client outside
+    * of the driver writes to first 8M of VRAM on S3 resume,
+    * this overrides GART which by default gets placed in
first 8M
and
+    * causes VM_FAULTS once GTT is accessed.
+    * Keep the stolen memory reservation until this solved.
+    */
+   /* amdgpu_bo_late_init(adev); /
+

We still need to free this somewhere.  I'd suggest calling it in
gmc_v9_0_sw_fini() and add a comment there about moving it when we fix
the issue.


   for(i = 0; i < adev->num_rings; ++i) {
   struct amdgpu_ring *ring = adev->rings[i];
   unsigned vmhub = ring->funcs->vmhub;
@@ -884,10 +894,6 @@ static int gmc_v9_0_sw_init(void *handle)
    */
   adev->gmc.mc_mask = 0xULL; /* 48 bit MC */

-   /*
-    * It needs to reserve 8M stolen memory for vega10
-    * TODO: Figure out how to avoid that...
-    */
   adev->gmc.stolen_size = gmc_v9_0_get_vbios_fb_size(adev);

We may also just want to return 8MB or 9MB temporarily in
gmc_v9_0_get_vbios_fb_size until we sort out the root cause of the S3
issue otherwise we're potentially wasting a lot more memory.

But what if we have a 4k display? In this case returning 9M probably
will not hide the corruption we were originally dealing with. I remember
in that case the pre-OS FB size would be 32M.

I guess it's a trade-off: possible momentary garbage during the BIOS to
driver transition vs. wasting an additional 24 MB of CPU accessible
VRAM for the life of the driver.

Can we free the reserved memory after initialization, then reserve it
again on resume?

The issue here was that someone overwrites the first 8M of VRAM and
corrupts the GART table, which causes VM_FAULTS. Until we find who is
writing into this area of VRAM, and when exactly, I think we cannot allow
any data to be placed there since it might get corrupted (even if we
avoid placing the GART table there).

I think it shouldn't be too hard in general to make sure the GART table,
and any other BOs which stay in VRAM across suspend/resume, don't fall
in the affected area.
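
(As a rough illustration of that idea -- hypothetical only, one way would be
to bump the first allowed page of such placements above the stolen region:)

/* Hypothetical sketch: keep a persistent VRAM placement (e.g. the GART
 * table BO) above the region that gets overwritten across S3. */
static void example_keep_above_stolen(struct amdgpu_device *adev,
                                      struct ttm_place *place)
{
        unsigned int stolen_pages = adev->gmc.stolen_size >> PAGE_SHIFT;

        if (place->fpfn < stolen_pages)
                place->fpfn = stolen_pages;
}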


Not sure I understand how easy it is to search for all these kinds of objects 
across all IPs. Also, how can we be sure the
affected region is only overwritten during resume? We know that the GART table 
is overwritten during resume, but we can't be
sure other areas in that region are not overwritten at other times, and 
if so it's dangerous to allow allocations there.


Andrey






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 1/1] drm/amdgpu: Enable scatter gather display support

2018-04-12 Thread Samuel Li
The 4th proposal :)

> In other words add something like the following:
> 
> if (domain & AMDGPU_GEM_DOMAIN_GTT && bo->preferred_domains & 
> AMDGPU_GEM_DOMAIN_GTT)
> domain = AMDGPU_GEM_DOMAIN_GTT;
> 
> That should be everything we need here.

This is basically against the idea of Marek's change: in Mesa, both GTT/VRAM 
are allowed; but now with your kernel change, all buffers use GTT only (not 
limited to the display buffer anymore).
In comparison, the current patch still seems better, since it only restricts 
the display buffer.

Sam


On 2018-04-12 02:47 PM, Christian König wrote:
> Patch #1: Avoid the hardware bug!
> 
> E.g. even when we avoid different placements it would be good to have a patch 
> which turns off immediate flipping when switching from VRAM to GTT.
> 
> That is as safety net and to document that we need to avoid this condition on 
> the hardware.
> 
> Patch #2: Go into amdgpu_bo_pin_restricted() and change the pinning 
> preference.
> 
> In other words add something like the following:
> 
> if (domain & AMDGPU_GEM_DOMAIN_GTT && bo->preferred_domains & 
> AMDGPU_GEM_DOMAIN_GTT)
>     domain = AMDGPU_GEM_DOMAIN_GTT;
> 
> That should be everything we need here.
> 
> Regards,
> Christian.
> 
> Am 12.04.2018 um 20:07 schrieb Samuel Li:
>> Please clarify, Christian. How would you like it to be implemented?
>>
>> Sam
>>
>>
>> On 2018-04-12 02:00 PM, Christian König wrote:
 1) Turn off immediate mode when flipping between VRAM/GTT.
>>> That must be implemented independently.
>>>
>>> See working around the hardware bug should be a different patch than 
>>> implementing a placement policy.
>>>
 As per discussion, the 3rd one, which is the current patch, seems the best 
 so far.
>>> And I can only repeat myself. Alex and I are the maintainers of the kernel 
>>> module, so we are the one who decide on how to implement things here.
>>>
>>> And we both noted that this approach of overriding user space decisions is 
>>> not a good design.
>>>
>>> The placement policy I suggest by preferring GTT over VRAM on APUs should 
>>> be trivial to implement and as far as I can see avoids all negative side 
>>> effects.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 12.04.2018 um 19:21 schrieb Samuel Li:
> The point is this kernel change now needs to be reworked and adapted to 
> what Mesa is doing.
 Three options have been brought up for kernel,
 1) Turn off immediate mode when flipping between VRAM/GTT.
 2) Check the domain of the current fb and then adjust the new one before 
 pinning it.
 3) Use only VRAM or GTT depending on a threshhold.

 As per discussion, the 3rd one, which is the current patch, seems the best 
 so far.

 Regards,
 Samuel Li



 On 2018-04-12 01:03 PM, Christian König wrote:
>> Can you be more specific, Christian? Mesa has this, I don't think it 
>> needs anything else:
> Completely agree, that's what I suggested to implement.
>
> The point is this kernel change now needs to be reworked and adapted to 
> what Mesa is doing.
>
> Regards,
> Christian.
>
> Am 12.04.2018 um 18:40 schrieb Marek Olšák:
>> Can you be more specific, Christian? Mesa has this, I don't think it 
>> needs anything else:
>> https://cgit.freedesktop.org/mesa/mesa/commit/?id=7d2079908d9ef05ec3f35b7078833e57846cab5b
>>
>> Marek
>>
>> On Wed, Mar 28, 2018 at 3:46 AM, Christian König 
>> <ckoenig.leichtzumer...@gmail.com> wrote:
>>
>>   Am 28.03.2018 um 00:22 schrieb Samuel Li:
>>
>>   It's auto by default. For CZ/ST, auto setting enables sg 
>> display
>>   when vram size is small; otherwise still uses vram.
>>   This patch fixed some potential hang issue introduced by change
>>   "allow framebuffer in GART memory as well" due to CZ/ST 
>> hardware
>>   limitation.
>>
>>
>>   Well that is still a NAK.
>>
>>   As discussed now multiple times please implement the necessary
>>   changes in Mesa.
>>
>>   Regards,
>>   Christian.
>>
>>
>>
>>   v2: Change default setting to auto, also some misc changes.
>>   Signed-off-by: Samuel Li <samuel...@amd.com>
>>   ---
>>     drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  1 +
>>     drivers/gpu/drm/amd/amdgpu/amdgpu_display.c    | 10 
>> --
>>     drivers/gpu/drm/amd/amdgpu/amdgpu_display.h    |  2 ++
>>     drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 
>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   |  2 ++
>>     drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  3 ++-
>>     6 files changed, 19 insertions(+), 3 deletions(-)
>>
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>   b/drivers/

Re: [PATCH v2 1/1] drm/amdgpu: Enable scatter gather display support

2018-04-12 Thread Christian König

Am 12.04.2018 um 22:01 schrieb Samuel Li:

The 4th proposal :)


In other words add something like the following:

if (domain & AMDGPU_GEM_DOMAIN_GTT && bo->preferred_domains & 
AMDGPU_GEM_DOMAIN_GTT)
 domain = AMDGPU_GEM_DOMAIN_GTT;

That should be everything we need here.

This is basically against the idea of Marek's change: in Mesa, both GTT/VRAM 
are allowed; but now in your kernel change, all buffers uses GTT only(not 
limited to display buffer now).
To compare, current patch still seems better, since it only circumscribes 
display buffer.


What I suggested here is a pinning preference; it only affects BOs 
used for scanout, and only while they are actually being scanned out.


Please implement as advised or otherwise we need to assign the work to 
somebody else.


Thanks,
Christian.



Sam


On 2018-04-12 02:47 PM, Christian König wrote:

Patch #1: Avoid the hardware bug!

E.g. even when we avoid different placements it would be good to have a patch 
which turns off immediate flipping when switching from VRAM to GTT.

That is as safety net and to document that we need to avoid this condition on 
the hardware.

Patch #2: Go into amdgpu_bo_pin_restricted() and change the pinning preference.

In other words add something like the following:

if (domain & AMDGPU_GEM_DOMAIN_GTT && bo->preferred_domains & 
AMDGPU_GEM_DOMAIN_GTT)
     domain = AMDGPU_GEM_DOMAIN_GTT;

That should be everything we need here.

Regards,
Christian.

Am 12.04.2018 um 20:07 schrieb Samuel Li:

Please clarify, Christian. How would you like it to be implemented?

Sam


On 2018-04-12 02:00 PM, Christian König wrote:

1) Turn off immediate mode when flipping between VRAM/GTT.

That must be implemented independently.

See working around the hardware bug should be a different patch than 
implementing a placement policy.


As per discussion, the 3rd one, which is the current patch, seems the best so 
far.

And I can only repeat myself. Alex and I are the maintainers of the kernel 
module, so we are the one who decide on how to implement things here.

And we both noted that this approach of overriding user space decisions is not 
a good design.

The placement policy I suggest by preferring GTT over VRAM on APUs should be 
trivial to implement and as far as I can see avoids all negative side effects.

Regards,
Christian.

Am 12.04.2018 um 19:21 schrieb Samuel Li:

The point is this kernel change now needs to be reworked and adapted to what 
Mesa is doing.

Three options have been brought up for kernel,
1) Turn off immediate mode when flipping between VRAM/GTT.
2) Check the domain of the current fb and then adjust the new one before 
pinning it.
3) Use only VRAM or GTT depending on a threshhold.

As per discussion, the 3rd one, which is the current patch, seems the best so 
far.

Regards,
Samuel Li



On 2018-04-12 01:03 PM, Christian König wrote:

Can you be more specific, Christian? Mesa has this, I don't think it needs 
anything else:

Completely agree, that's what I suggested to implement.

The point is this kernel change now needs to be reworked and adapted to what 
Mesa is doing.

Regards,
Christian.

Am 12.04.2018 um 18:40 schrieb Marek Olšák:

Can you be more specific, Christian? Mesa has this, I don't think it needs 
anything else:
https://cgit.freedesktop.org/mesa/mesa/commit/?id=7d2079908d9ef05ec3f35b7078833e57846cab5b

Marek

On Wed, Mar 28, 2018 at 3:46 AM, Christian König <ckoenig.leichtzumer...@gmail.com> wrote:

   Am 28.03.2018 um 00:22 schrieb Samuel Li:

   It's auto by default. For CZ/ST, auto setting enables sg display
   when vram size is small; otherwise still uses vram.
   This patch fixed some potential hang issue introduced by change
   "allow framebuffer in GART memory as well" due to CZ/ST hardware
   limitation.


   Well that is still a NAK.

   As discussed now multiple times please implement the necessary
   changes in Mesa.

   Regards,
   Christian.



   v2: Change default setting to auto, also some misc changes.
    Signed-off-by: Samuel Li <samuel...@amd.com>
   ---
     drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  1 +
     drivers/gpu/drm/amd/amdgpu/amdgpu_display.c    | 10 --
     drivers/gpu/drm/amd/amdgpu/amdgpu_display.h    |  2 ++
     drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 
     drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   |  2 ++
     drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  3 ++-
     6 files changed, 19 insertions(+), 3 deletions(-)

   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
   b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
   index a7e2229..c942362 100644
   --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
   +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
   @@ -129,6 +129,7 @@ extern int amdgpu_lbpw;
     extern int amdgpu_compute_multipipe;
   

[pull] radeon and amdgpu drm-next-4.17

2018-04-12 Thread Alex Deucher
Hi Dave,

More fixes for 4.17 and stable:
- Add a PX quirk for radeon
- Fix flickering and stability issues with DC on some platforms
- Fix HDMI audio regression
- Few other misc DC and base driver fixes

The following changes since commit 871e899db19da3dbd17a5d263b555dc5b7d8fed5:

  Merge branch 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux 
into drm-next (2018-04-11 08:35:41 +1000)

are available in the git repository at:

  git://people.freedesktop.org/~agd5f/linux drm-next-4.17

for you to fetch changes up to 1cb19e8267a57c5174da09e0d52d1477baceccca:

  Revert "drm/amd/display: disable CRTCs with NULL FB on their primary plane 
(V2)" (2018-04-12 14:19:43 -0500)


Charlene Liu (1):
  drm/amd/display: HDMI has no sound after Panel power off/on

Evan Quan (1):
  drm/amdgpu: add MP1 and THM hw ip base reg offset

Harry Wentland (3):
  drm/amd/display: Only register backlight device if embedded panel 
connected
  Revert "drm/amd/display: fix dereferencing possible ERR_PTR()"
  Revert "drm/amd/display: disable CRTCs with NULL FB on their primary 
plane (V2)"

Huang Rui (1):
  drm/amdgpu: fix null pointer panic with direct fw loading on gpu reset

Leo (Sunpeng) Li (1):
  drm/amd/display: Fix regamma not affecting full-intensity color values

Nico Sneck (1):
  drm/radeon: add PX quirk for Asus K73TK

Roman Li (2):
  drm/amd/display: fix brightness level after resume from suspend
  drm/amd/display: Fix FBC text console corruption

 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c|  3 +
 drivers/gpu/drm/amd/amdgpu/vega10_reg_init.c   |  3 +-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  | 89 --
 drivers/gpu/drm/amd/display/dc/core/dc_link.c  | 13 
 drivers/gpu/drm/amd/display/dc/dc_link.h   |  2 +
 .../drm/amd/display/dc/dce/dce_stream_encoder.c|  2 +
 .../drm/amd/display/dc/dce110/dce110_compressor.c  | 67 
 .../amd/display/dc/dce110/dce110_hw_sequencer.c| 13 +++-
 drivers/gpu/drm/radeon/radeon_device.c |  4 +
 10 files changed, 124 insertions(+), 74 deletions(-)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 1/1] drm/amdgpu: Enable scatter gather display support

2018-04-12 Thread Samuel Li
Hi Christian,

If you have any good proposal, please describe it here.
Otherwise kindly avoid saying anything based on emotion.

Regards,
Sam



On 2018-04-12 04:13 PM, Christian König wrote:
> Am 12.04.2018 um 22:01 schrieb Samuel Li:
>> The 4th proposal :)
>>
>>> In other words add something like the following:
>>>
>>> if (domain & AMDGPU_GEM_DOMAIN_GTT && bo->preferred_domains & 
>>> AMDGPU_GEM_DOMAIN_GTT)
>>>  domain = AMDGPU_GEM_DOMAIN_GTT;
>>>
>>> That should be everything we need here.
>> This is basically against the idea of Marek's change: in Mesa, both GTT/VRAM 
>> are allowed; but now in your kernel change, all buffers uses GTT only(not 
>> limited to display buffer now).
>> To compare, current patch still seems better, since it only circumscribes 
>> display buffer.
> 
> What I suggested here is for pinning preference, that only affects BOs used 
> for scanout and it also only affects them while they are scanned out.
> 
> Please implement as advised or otherwise we need to assign the work to 
> somebody else.
> 
> Thanks,
> Christian.
> 
>>
>> Sam
>>
>>
>> On 2018-04-12 02:47 PM, Christian König wrote:
>>> Patch #1: Avoid the hardware bug!
>>>
>>> E.g. even when we avoid different placements it would be good to have a 
>>> patch which turns off immediate flipping when switching from VRAM to GTT.
>>>
>>> That is as safety net and to document that we need to avoid this condition 
>>> on the hardware.
>>>
>>> Patch #2: Go into amdgpu_bo_pin_restricted() and change the pinning 
>>> preference.
>>>
>>> In other words add something like the following:
>>>
>>> if (domain & AMDGPU_GEM_DOMAIN_GTT && bo->preferred_domains & 
>>> AMDGPU_GEM_DOMAIN_GTT)
>>>  domain = AMDGPU_GEM_DOMAIN_GTT;
>>>
>>> That should be everything we need here.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 12.04.2018 um 20:07 schrieb Samuel Li:
 Please clarify, Christian. How would you like it to be implemented?

 Sam


 On 2018-04-12 02:00 PM, Christian König wrote:
>> 1) Turn off immediate mode when flipping between VRAM/GTT.
> That must be implemented independently.
>
> See working around the hardware bug should be a different patch than 
> implementing a placement policy.
>
>> As per discussion, the 3rd one, which is the current patch, seems the 
>> best so far.
> And I can only repeat myself. Alex and I are the maintainers of the 
> kernel module, so we are the one who decide on how to implement things 
> here.
>
> And we both noted that this approach of overriding user space decisions 
> is not a good design.
>
> The placement policy I suggest by preferring GTT over VRAM on APUs should 
> be trivial to implement and as far as I can see avoids all negative side 
> effects.
>
> Regards,
> Christian.
>
> Am 12.04.2018 um 19:21 schrieb Samuel Li:
>>> The point is this kernel change now needs to be reworked and adapted to 
>>> what Mesa is doing.
>> Three options have been brought up for kernel,
>> 1) Turn off immediate mode when flipping between VRAM/GTT.
>> 2) Check the domain of the current fb and then adjust the new one before 
>> pinning it.
>> 3) Use only VRAM or GTT depending on a threshhold.
>>
>> As per discussion, the 3rd one, which is the current patch, seems the 
>> best so far.
>>
>> Regards,
>> Samuel Li
>>
>>
>>
>> On 2018-04-12 01:03 PM, Christian König wrote:
 Can you be more specific, Christian? Mesa has this, I don't think it 
 needs anything else:
>>> Completely agree, that's what I suggested to implement.
>>>
>>> The point is this kernel change now needs to be reworked and adapted to 
>>> what Mesa is doing.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 12.04.2018 um 18:40 schrieb Marek Olšák:
 Can you be more specific, Christian? Mesa has this, I don't think it 
 needs anything else:
 https://cgit.freedesktop.org/mesa/mesa/commit/?id=7d2079908d9ef05ec3f35b7078833e57846cab5b

 Marek

 On Wed, Mar 28, 2018 at 3:46 AM, Christian König 
 <ckoenig.leichtzumer...@gmail.com> wrote:

    Am 28.03.2018 um 00:22 schrieb Samuel Li:

    It's auto by default. For CZ/ST, auto setting enables sg 
 display
    when vram size is small; otherwise still uses vram.
    This patch fixed some potential hang issue introduced by 
 change
    "allow framebuffer in GART memory as well" due to CZ/ST 
 hardware
    limitation.


    Well that is still a NAK.

    As discussed now multiple times please implement the necessary
    changes in Mesa.

    Regards,
    Christian.




Re: RFC for a render API to support adaptive sync and VRR

2018-04-12 Thread Stéphane Marchesin
On Tue, Apr 10, 2018 at 12:37 AM, Michel Dänzer  wrote:
> On 2018-04-10 08:45 AM, Christian König wrote:
>> Am 09.04.2018 um 23:45 schrieb Manasi Navare:
>>> Thanks for initiating the discussion. Find my comments below:
>>> On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
 On 2018-04-09 03:56 PM, Harry Wentland wrote:
>
> === A DRM render API to support variable refresh rates ===
>
> In order to benefit from adaptive sync and VRR userland needs a way
> to let us know whether to vary frame timings or to target a
> different frame time. These can be provided as atomic properties on
> a CRTC:
>   * boolvariable_refresh_compatible
>   * inttarget_frame_duration_ns (nanosecond frame duration)
>
> This gives us the following cases:
>
> variable_refresh_compatible = 0, target_frame_duration_ns = 0
>   * drive monitor at timing's normal refresh rate
>
> variable_refresh_compatible = 1, target_frame_duration_ns = 0
>   * send new frame to monitor as soon as it's available, if within
> min/max of monitor's reported capabilities
>
> variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0
>   * send new frame to monitor with the specified
> target_frame_duration_ns
>
> When a target_frame_duration_ns or variable_refresh_compatible
> cannot be supported the atomic check will reject the commit.
>
>>> What I would like is two sets of properties on a CRTC or preferably on
>>> a connector:
>>>
>>> KMD properties that UMD can query:
>>> * vrr_capable -  This will be an immutable property for exposing
>>> hardware's capability of supporting VRR. This will be set by the
>>> kernel after
>>> reading the EDID mode information and monitor range capabilities.
>>> * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max
>>> refresh rates supported.
>>> These properties are optional and will be created and attached to the
>>> DP/eDP connector when the connector
>>> is getting intialized.
>>
>> Mhm, aren't those properties actually per mode and not per CRTC/connector?
>>
>>> Properties that you mentioned above that the UMD can set before kernel
>>> can enable VRR functionality
>>> *bool vrr_enable or vrr_compatible
>>> target_frame_duration_ns
>>
>> Yeah, that certainly makes sense. But target_frame_duration_ns is a bad
>> name/semantics.
>>
>> We should use an absolute timestamp where the frame should be presented,
>> otherwise you could run into a bunch of trouble with IOCTL restarts or
>> missed blanks.
>
> Also, a fixed target frame duration isn't suitable even for video
> playback, due to drift between the video and audio clocks.
>
> Time-based presentation seems to be the right approach for preventing
> micro-stutter in games as well, Croteam developers have been researching
> this.

Another case that you can handle with time-based presentation but not
with refresh-based API is the use of per-scanline flips in conjunction
with damage rects. For example if you know that the damage rect covers
a certain Y range, you can flip when you're outside that range if the
time that you were given allows it. That's even independent from VRR
displays.
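
(A rough sketch of that decision; every name below is hypothetical, this is
not existing DRM code, and the check is deliberately simplified:)

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: allow an immediate flip only if scanout is currently
 * outside the damaged scanline range and doing it now still honours the
 * requested presentation time. */
static bool can_flip_now(int current_scanline, int damage_y1, int damage_y2,
                         int64_t now_ns, int64_t earliest_present_ns)
{
        bool outside_damage = current_scanline < damage_y1 ||
                              current_scanline >= damage_y2;

        return outside_damage && now_ns >= earliest_present_ns;
}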

Stéphane


>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 1/1] drm/amdgpu: Enable scatter gather display support

2018-04-12 Thread Christian König

Hi Sam,

Yeah, sorry for that. It's already rather late here and I got a bit carried 
away.


But I still insist that you at least try what I advised; I'm pretty sure 
that this approach will work and cover all use cases discussed so far.


Regards,
Christian.

Am 12.04.2018 um 22:42 schrieb Samuel Li:

Hi Christian,

If you have any good proposal, please describe here.
Otherwise kindly avoid saying anything based on your emotion.

Regards,
Sam



On 2018-04-12 04:13 PM, Christian König wrote:

Am 12.04.2018 um 22:01 schrieb Samuel Li:

The 4th proposal :)


In other words add something like the following:

if (domain & AMDGPU_GEM_DOMAIN_GTT && bo->preferred_domains & 
AMDGPU_GEM_DOMAIN_GTT)
  domain = AMDGPU_GEM_DOMAIN_GTT;

That should be everything we need here.

This is basically against the idea of Marek's change: in Mesa, both GTT/VRAM 
are allowed; but now in your kernel change, all buffers uses GTT only(not 
limited to display buffer now).
To compare, current patch still seems better, since it only circumscribes 
display buffer.

What I suggested here is for pinning preference, that only affects BOs used for 
scanout and it also only affects them while they are scanned out.

Please implement as advised or otherwise we need to assign the work to somebody 
else.

Thanks,
Christian.


Sam


On 2018-04-12 02:47 PM, Christian König wrote:

Patch #1: Avoid the hardware bug!

E.g. even when we avoid different placements it would be good to have a patch 
which turns off immediate flipping when switching from VRAM to GTT.

That is as safety net and to document that we need to avoid this condition on 
the hardware.

Patch #2: Go into amdgpu_bo_pin_restricted() and change the pinning preference.

In other words add something like the following:

if (domain & AMDGPU_GEM_DOMAIN_GTT && bo->preferred_domains & 
AMDGPU_GEM_DOMAIN_GTT)
  domain = AMDGPU_GEM_DOMAIN_GTT;

That should be everything we need here.

Regards,
Christian.

Am 12.04.2018 um 20:07 schrieb Samuel Li:

Please clarify, Christian. How would you like it to be implemented?

Sam


On 2018-04-12 02:00 PM, Christian König wrote:

1) Turn off immediate mode when flipping between VRAM/GTT.

That must be implemented independently.

See working around the hardware bug should be a different patch than 
implementing a placement policy.


As per discussion, the 3rd one, which is the current patch, seems the best so 
far.

And I can only repeat myself. Alex and I are the maintainers of the kernel 
module, so we are the one who decide on how to implement things here.

And we both noted that this approach of overriding user space decisions is not 
a good design.

The placement policy I suggest by preferring GTT over VRAM on APUs should be 
trivial to implement and as far as I can see avoids all negative side effects.

Regards,
Christian.

Am 12.04.2018 um 19:21 schrieb Samuel Li:

The point is this kernel change now needs to be reworked and adapted to what 
Mesa is doing.

Three options have been brought up for kernel,
1) Turn off immediate mode when flipping between VRAM/GTT.
2) Check the domain of the current fb and then adjust the new one before 
pinning it.
3) Use only VRAM or GTT depending on a threshhold.

As per discussion, the 3rd one, which is the current patch, seems the best so 
far.

Regards,
Samuel Li



On 2018-04-12 01:03 PM, Christian König wrote:

Can you be more specific, Christian? Mesa has this, I don't think it needs 
anything else:

Completely agree, that's what I suggested to implement.

The point is this kernel change now needs to be reworked and adapted to what 
Mesa is doing.

Regards,
Christian.

Am 12.04.2018 um 18:40 schrieb Marek Olšák:

Can you be more specific, Christian? Mesa has this, I don't think it needs 
anything else:
https://cgit.freedesktop.org/mesa/mesa/commit/?id=7d2079908d9ef05ec3f35b7078833e57846cab5b

Marek

On Wed, Mar 28, 2018 at 3:46 AM, Christian König <ckoenig.leichtzumer...@gmail.com> wrote:

    Am 28.03.2018 um 00:22 schrieb Samuel Li:

    It's auto by default. For CZ/ST, the auto setting enables SG display
    when the VRAM size is small; otherwise VRAM is still used.
    This patch fixes a potential hang introduced by the change
    "allow framebuffer in GART memory as well", caused by a CZ/ST hardware
    limitation.


    Well that is still a NAK.

    As discussed now multiple times please implement the necessary
    changes in Mesa.

    Regards,
    Christian.



    v2: Change default setting to auto, also some misc changes.
    Signed-off-by: Samuel Li mailto:samuel...@amd.com>>
    ---
      drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  1 +
      drivers/gpu/drm/amd/amdgpu/amdgpu_display.c    | 10 --
      drivers/gpu/drm/amd/amdgpu/amdgpu_display.h    |  2 ++
      drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 
      drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c   |  2

RE: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

2018-04-12 Thread Deng, Emily
Ping

Best Wishes,
Emily Deng




> -Original Message-
> From: Emily Deng [mailto:emily.d...@amd.com]
> Sent: Thursday, April 12, 2018 6:22 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deng, Emily ; Liu, Monk 
> Subject: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> 
> issue:
> VMC page faults occur if an app is force-killed during a 3dmark test; the
> cause is that in entity_fini we manually signal all the jobs in the entity's
> queue, which confuses the sync/dependency
> mechanism:
> 
> 1) A page fault occurs in SDMA's clear job, which operates on the shadow
> buffer: the shadow buffer's GART table is cleaned by ttm_bo_release, since
> the fence in its reservation was fake-signaled by entity_fini() in the case
> of SIGKILL
> being received.
> 
> 2) A page fault occurs in the GFX job because during the lifetime of the GFX
> job we manually fake-signal all jobs from its entity in entity_fini(); thus
> the unmapping/clear-PTE jobs that depend on those result fences are satisfied
> and SDMA starts clearing the PTEs, leading to a GFX page fault.
> 
> fix:
> 1) We should at least wait for all already-scheduled jobs to complete in
> entity_fini() in the SIGKILL case.
> 
> 2) If a fence is signaled and tries to clear some entity's dependency, set
> that entity guilty to prevent its jobs from really running, since the
> dependency was fake-signaled.
> 
> v2:
> Split drm_sched_entity_fini() into two functions:
> 1) The first one does the waiting, removes the entity from the runqueue
> and returns an error when the process was killed.
> 2) The second one then goes over the entity, installs it as the completion
> signal for the remaining jobs and signals all jobs with an error code.
> 
> v3:
> 1) Replace fini1 and fini2 with better names. 2) Call the first part before
> the VM teardown in amdgpu_driver_postclose_kms() and the second part after
> the VM teardown. 3) Keep the original function drm_sched_entity_fini to
> refine the code.
> 
> Signed-off-by: Monk Liu 
> Signed-off-by: Emily Deng 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 64
> ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  5 ++-
>  drivers/gpu/drm/scheduler/gpu_scheduler.c | 74
> ++-
>  include/drm/gpu_scheduler.h   |  7 +++
>  5 files changed, 131 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 2babfad..200db73 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -681,6 +681,8 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void
> *data,  int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx, unsigned
> ring_id);
> 
>  void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr);
> +void amdgpu_ctx_mgr_entity_cleanup(struct amdgpu_ctx_mgr *mgr); void
> +amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr);
>  void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr);
> 
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 09d35051..659add4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -111,8 +111,9 @@ static int amdgpu_ctx_init(struct amdgpu_device
> *adev,
>   return r;
>  }
> 
> -static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
> +static void amdgpu_ctx_fini(struct kref *ref)
>  {
> + struct amdgpu_ctx *ctx = container_of(ref, struct amdgpu_ctx,
> +refcount);
>   struct amdgpu_device *adev = ctx->adev;
>   unsigned i, j;
> 
> @@ -125,13 +126,11 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx
> *ctx)
>   kfree(ctx->fences);
>   ctx->fences = NULL;
> 
> - for (i = 0; i < adev->num_rings; i++)
> - drm_sched_entity_fini(&adev->rings[i]->sched,
> -   &ctx->rings[i].entity);
> -
>   amdgpu_queue_mgr_fini(adev, &ctx->queue_mgr);
> 
>   mutex_destroy(&ctx->lock);
> +
> + kfree(ctx);
>  }
> 
>  static int amdgpu_ctx_alloc(struct amdgpu_device *adev, @@ -170,12
> +169,15 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
> static void amdgpu_ctx_do_release(struct kref *ref)  {
>   struct amdgpu_ctx *ctx;
> + u32 i;
> 
>   ctx = container_of(ref, struct amdgpu_ctx, refcount);
> 
> - amdgpu_ctx_fini(ctx);
> + for (i = 0; i < ctx->adev->num_rings; i++)
> + drm_sched_entity_fini(&ctx->adev->rings[i]->sched,
> + &ctx->rings[i].entity);
> 
> - kfree(ctx);
> + amdgpu_ctx_fini(ref);
>  }
> 
>  static int amdgpu_ctx_free(struct amdgpu_fpriv *fpriv, uint32_t id) @@ -
> 435,16 +437,62 @@ void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr
> *mgr)
>   idr_init(&mgr->ctx_handles);
>  }
> 
> +void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr) {
> + struct amdgpu_ctx *ctx;
> + struct idr *idp;
> + uint32_t id, i;
> +
> + idp = &mgr->ctx_handle

RE: [PATCH 2/2] Revert "drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)"

2018-04-12 Thread S, Shirish
Hi Harry, Alex,

The solution given while reviewing my patch was that "DC should support 
enabling a CRTC without a framebuffer."

Since the revert is a temporary workaround to address the issue at hand, and
considering the bigger regression it will cause on ChromeOS (explained below),
I would strongly recommend that the revert not be mainlined (to Linus' tree)
until a proper fix for both issues, i.e. the flickering and the BUG hit on
atomic commit, is found.

For the sake of everyone's understanding on the list, below is a brief
background.

The mainline patch from the Intel folks, "846c7df drm/atomic: Try to preserve
the crtc enabled state in drm_atomic_remove_fb, v2.",
introduces a slight behavioral change to rmfb: instead of disabling a CRTC when
its primary plane is disabled, it now preserves it.

This change leads to a BUG being hit while performing an atomic commit in the
amd driver, causing reboots/system instability on ChromeOS, which uses the
DRM atomic rendering path; I also remember it causing issues on other OSes
as well.

Thanks & Regards,
Shirish S

-Original Message-
From: Michel Dänzer [mailto:mic...@daenzer.net] 
Sent: Thursday, April 12, 2018 8:39 PM
To: Wentland, Harry 
Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
; S, Shirish 
Subject: Re: [PATCH 2/2] Revert "drm/amd/display: disable CRTCs with NULL FB on 
their primary plane (V2)"

On 2018-04-12 04:51 PM, Harry Wentland wrote:
> This seems to cause flickering and lock-ups for a wide range of users.
> Revert until we've found a proper fix for the flickering and lock-ups.
> 
> This reverts commit 36cc549d59864b7161f0e23d710c1c4d1b9cf022.
> 
> Cc: Shirish S 
> Cc: Alex Deucher 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Harry Wentland 

Thanks Harry, both patches are

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[PATCH] drm/amdgpu: defer test IBs on the rings at boot

2018-04-12 Thread Shirish S
amdgpu_ib_ring_tests() runs test IBs on the rings at boot and
contributes ~500 ms to the amdgpu driver's boot time.

This patch defers it and adds a check in amdgpu_info_ioctl()
to report whether it has been scheduled yet.

Signed-off-by: Shirish S 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  3 +++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5734871..ae8f722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1611,6 +1611,8 @@ struct amdgpu_device {
 
/* delayed work_func for deferring clockgating during resume */
struct delayed_work late_init_work;
+   /* delayed work_func to defer testing IB's on rings during boot */
+   struct delayed_work late_init_test_ib_work;
 
struct amdgpu_virt  virt;
/* firmware VRAM reservation */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1762eb4..e65a5e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
 
 #define AMDGPU_RESUME_MS   2000
+#define AMDGPU_IB_TEST_SCHED_MS2000
 
 static const char *amdgpu_asic_name[] = {
"TAHITI",
@@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum 
amd_asic_type asic_type)
}
 }
 
+static void amdgpu_device_late_init_test_ib_func_handler(struct work_struct 
*work)
+{
+   struct amdgpu_device *adev =
+   container_of(work, struct amdgpu_device, 
late_init_test_ib_work.work);
+   int r = amdgpu_ib_ring_tests(adev);
+
+   if (r)
+   DRM_ERROR("ib ring test failed (%d).\n", r);
+}
+
 /**
  * amdgpu_device_has_dc_support - check if dc is supported
  *
@@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
INIT_LIST_HEAD(&adev->ring_lru_list);
spin_lock_init(&adev->ring_lru_list_lock);
 
+   INIT_DELAYED_WORK(&adev->late_init_test_ib_work,
+ amdgpu_device_late_init_test_ib_func_handler);
INIT_DELAYED_WORK(&adev->late_init_work,
  amdgpu_device_ip_late_init_func_handler);
 
@@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
goto failed;
}
 
-   r = amdgpu_ib_ring_tests(adev);
-   if (r)
-   DRM_ERROR("ib ring test failed (%d).\n", r);
+   /* Schedule amdgpu_ib_ring_tests() */
+   mod_delayed_work(system_wq, &adev->late_init_test_ib_work,
+   msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));
 
if (amdgpu_sriov_vf(adev))
amdgpu_virt_init_data_exchange(adev);
@@ -2469,6 +2482,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
}
adev->accel_working = false;
cancel_delayed_work_sync(&adev->late_init_work);
+   cancel_delayed_work_sync(&adev->late_init_test_ib_work);
/* free i2c buses */
if (!amdgpu_device_has_dc_support(adev))
amdgpu_i2c_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 487d39e..057bd9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -279,6 +279,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
if (!info->return_size || !info->return_pointer)
return -EINVAL;
 
+   if (delayed_work_pending(&adev->late_init_test_ib_work))
+   DRM_ERROR("IB test on ring not executed\n");
+
switch (info->query) {
case AMDGPU_INFO_ACCEL_WORKING:
ui32 = adev->accel_working;
-- 
2.7.4



Re: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

2018-04-12 Thread monk

Hi Christian & Emily

This v3 version looks pretty good to me, but some parts still need
improvement:


e.g.

1) entity->finished doesn't convey what it really means; better to
rename it to entity->last_scheduled.


2) drm_sched_entity_fini_job_cb() would be better renamed to
drm_sched_entity_kill_jobs_cb().


3) There is no need to pass "entity->finished" (line 275 of gpu_scheduler.c) to
drm_sched_entity_fini_job_cb() if -ENOENT is returned from
dma_fence_add_callback, since this parameter is not needed at all in that
callback routine.


4) Better to change the type of entity->fini_status to "int" instead of
"uint32_t"; it should be aligned with the return type of
wait_event_killable().


5)

+   if (entity->finished) {
+   dma_fence_put(entity->finished);
+   entity->finished = NULL;
}

no need to check entity->finished because dma_fence_put() will do it inside...



and the same here in job_recovery:

+
+   if (s_job->entity->finished)
+   dma_fence_put(s_job->entity->finished);

and the same here in sched_main:

+   if (entity->finished)
+   dma_fence_put(entity->finished);
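
For reference, dma_fence_put() already tolerates a NULL fence; roughly
(paraphrased from include/linux/dma-fence.h, shown here only as an
illustration):

static inline void dma_fence_put(struct dma_fence *fence)
{
	if (fence)
		kref_put(&fence->refcount, dma_fence_release);
}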


6) Why put amdgpu_ctx_mgr_fini() beneath amdgpu_vm_fini()? Any reason for
that?


thanks

/Monk






On 04/13/2018 10:06 AM, Deng, Emily wrote:

Ping

Best Wishes,
Emily Deng





-Original Message-
From: Emily Deng [mailto:emily.d...@amd.com]
Sent: Thursday, April 12, 2018 6:22 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Liu, Monk 
Subject: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

issue:
there are VMC page fault occurred if force APP kill during 3dmark test, the
cause is in entity_fini we manually signal all those jobs in entity's queue
which confuse the sync/dep
mechanism:

1)page fault occurred in sdma's clear job which operate on shadow buffer,
and shadow buffer's Gart table is cleaned by ttm_bo_release since the fence
in its reservation was fake signaled by entity_fini() under the case of SIGKILL
received.

2)page fault occurred in gfx' job because during the lifetime of gfx job we
manually fake signal all jobs from its entity in entity_fini(), thus the
unmapping/clear PTE job depend on those result fence is satisfied and sdma
start clearing the PTE and lead to GFX page fault.

fix:
1)should at least wait all jobs already scheduled complete in entity_fini() if
SIGKILL is the case.

2)if a fence signaled and try to clear some entity's dependency, should set
this entity guilty to prevent its job really run since the dependency is fake
signaled.

v2:
splitting drm_sched_entity_fini() into two functions:
1)The first one is does the waiting, removes the entity from the runqueue
and returns an error when the process was killed.
2)The second one then goes over the entity, install it as completion signal for
the remaining jobs and signals all jobs with an error code.

v3:
1)Replace the fini1 and fini2 with better name 2)Call the first part before the
VM teardown in
amdgpu_driver_postclose_kms() and the second part after the VM teardown
3)Keep the original function drm_sched_entity_fini to refine the code.

Signed-off-by: Monk Liu 
Signed-off-by: Emily Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 64
++
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  5 ++-
  drivers/gpu/drm/scheduler/gpu_scheduler.c | 74
++-
  include/drm/gpu_scheduler.h   |  7 +++
  5 files changed, 131 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2babfad..200db73 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -681,6 +681,8 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void
*data,  int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx, unsigned
ring_id);

  void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr);
+void amdgpu_ctx_mgr_entity_cleanup(struct amdgpu_ctx_mgr *mgr); void
+amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr);
  void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr);


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 09d35051..659add4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -111,8 +111,9 @@ static int amdgpu_ctx_init(struct amdgpu_device
*adev,
return r;
  }

-static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
+static void amdgpu_ctx_fini(struct kref *ref)
  {
+   struct amdgpu_ctx *ctx = container_of(ref, struct amdgpu_ctx,
+refcount);
struct amdgpu_device *adev = ctx->adev;
unsigned i, j;

@@ -125,13 +126,11 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx
*ctx)
kfree(ctx->fences);
ctx->fences = NULL;

-   for (i = 0; i < adev->num_rings; i++)
-   drm_sched_entity_fini(&adev->rings[

Re: [PATCH 10/21] drm/amd/display: Add back code to allow for rounding error

2018-04-12 Thread Nils Wallménius
On Tue 10 Apr 2018 at 23:11, Harry Wentland  wrote:

> From: Anthony Koo 
>
> Signed-off-by: Anthony Koo 
> Reviewed-by: Aric Cyr 
> Acked-by: Harry Wentland 
> ---
>  drivers/gpu/drm/amd/display/modules/freesync/freesync.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
> b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
> index 4887c888bbe7..abd5c9374eb3 100644
> --- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
> +++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
> @@ -896,6 +896,17 @@ bool mod_freesync_is_valid_range(struct mod_freesync
> *mod_freesync,
> unsigned long long nominal_field_rate_in_uhz =
> mod_freesync_calc_nominal_field_rate(stream);
>
> +   /* Allow for some rounding error of actual video timing by taking
> ceil.
> +* For example, 144 Hz mode timing may actually be 143.xxx Hz when
> +* calculated from pixel rate and vertical/horizontal totals, but
> +* this should be allowed instead of blocking FreeSync.
> +*/
>

Hi, with this change the variable names ending in *_in_uhz are a bit confusing;
also, the integer division will not round up (take the ceiling) as seems to be
the intention from the comment above. Perhaps the comment needs to be improved?
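
For example, something along these lines (illustrative only, not part of the
patch) would actually round up:

	nominal_field_rate_in_uhz = DIV_ROUND_UP_ULL(nominal_field_rate_in_uhz, 100);
	min_refresh_cap_in_uhz = DIV_ROUND_UP(min_refresh_cap_in_uhz, 100);
	max_refresh_cap_in_uhz = DIV_ROUND_UP(max_refresh_cap_in_uhz, 100);
	min_refresh_request_in_uhz = DIV_ROUND_UP(min_refresh_request_in_uhz, 100);
	max_refresh_request_in_uhz = DIV_ROUND_UP(max_refresh_request_in_uhz, 100);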

BR
Nils


+   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz,
> 100);
> +   min_refresh_cap_in_uhz /= 100;
> +   max_refresh_cap_in_uhz /= 100;
> +   min_refresh_request_in_uhz /= 100;
> +   max_refresh_request_in_uhz /= 100;
> +
> // Check nominal is within range
> if (nominal_field_rate_in_uhz > max_refresh_cap_in_uhz ||
> nominal_field_rate_in_uhz < min_refresh_cap_in_uhz)
> @@ -921,7 +932,7 @@ bool mod_freesync_is_valid_range(struct mod_freesync
> *mod_freesync,
>
> // For variable range, check for at least 10 Hz range
> if ((max_refresh_request_in_uhz != min_refresh_request_in_uhz) &&
> -   (max_refresh_request_in_uhz - min_refresh_request_in_uhz <
> 1000))
> +   (max_refresh_request_in_uhz - min_refresh_request_in_uhz <
> 10))
> return false;
>
> return true;
> --
> 2.15.1
>


RE: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

2018-04-12 Thread Deng, Emily
Hi Monk,
 Thanks for your review; I will refine the code per your suggestions
1), 2), 3), 4), 5).
 About 6): I think it makes no difference whether amdgpu_ctx_mgr_fini is
called before or after amdgpu_vm_fini.

Hi Christian, 
 Do you have any thoughts about 6)?

Best Wishes,
Emily Deng

> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of monk
> Sent: Friday, April 13, 2018 1:01 PM
> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> 
> Hi Christian & Emily
> 
> This v3 version looks pretty good to me, but still some parts need to
> improve:
> 
> e.g.
> 
> 1)entity->finished doesn't presenting what it really means, better rename it
> to entity->last_scheduled.
> 
> 2)drm_sched_entity_fini_job_cb() better renamed to
> drm_sched_entity_kill_jobs_cb()
> 
> 3) no need to pass "entity->finished" (line275 of gpu_schedulers.c) to
> drm_sched_entity_fini_job_cb() if -ENOENT returned after
> dma_fence_add_callback since this parm is not needed at all in this callback
> routine
> 
> 4)better change type of entity->fini_status to "int" instead of "uint32_t" 
> ... it
> should be aligned with the return type of
> wait_event_killable()
> 
> 5)
> 
> + if (entity->finished) {
> + dma_fence_put(entity->finished);
> + entity->finished = NULL;
>   }
> 
> no need to check entity->finished because dma_fence_put() will do it inside...
> 
> 
> 
> and the same here in job_recovery:
> 
> +
> + if (s_job->entity->finished)
> + dma_fence_put(s_job->entity->finished);
> 
> and the same here in sched_main:
> 
> + if (entity->finished)
> + dma_fence_put(entity->finished);
> 
> 
> 6) why put amdgpu_ctx_mgr_fini() beneath amdgpu_vm_fini() ? any reason
> for that ?
> 
> 
> thanks
> 
> /Monk
> 
> 
> 
> 
> 
> 
> On 04/13/2018 10:06 AM, Deng, Emily wrote:
> > Ping
> >
> > Best Wishes,
> > Emily Deng
> >
> >
> >
> >
> >> -Original Message-
> >> From: Emily Deng [mailto:emily.d...@amd.com]
> >> Sent: Thursday, April 12, 2018 6:22 PM
> >> To: amd-gfx@lists.freedesktop.org
> >> Cc: Deng, Emily ; Liu, Monk
> 
> >> Subject: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> >>
> >> issue:
> >> there are VMC page fault occurred if force APP kill during 3dmark
> >> test, the cause is in entity_fini we manually signal all those jobs
> >> in entity's queue which confuse the sync/dep
> >> mechanism:
> >>
> >> 1)page fault occurred in sdma's clear job which operate on shadow
> >> buffer, and shadow buffer's Gart table is cleaned by ttm_bo_release
> >> since the fence in its reservation was fake signaled by entity_fini()
> >> under the case of SIGKILL received.
> >>
> >> 2)page fault occurred in gfx' job because during the lifetime of gfx
> >> job we manually fake signal all jobs from its entity in
> >> entity_fini(), thus the unmapping/clear PTE job depend on those
> >> result fence is satisfied and sdma start clearing the PTE and lead to GFX
> page fault.
> >>
> >> fix:
> >> 1)should at least wait all jobs already scheduled complete in
> >> entity_fini() if SIGKILL is the case.
> >>
> >> 2)if a fence signaled and try to clear some entity's dependency,
> >> should set this entity guilty to prevent its job really run since the
> >> dependency is fake signaled.
> >>
> >> v2:
> >> splitting drm_sched_entity_fini() into two functions:
> >> 1)The first one is does the waiting, removes the entity from the
> >> runqueue and returns an error when the process was killed.
> >> 2)The second one then goes over the entity, install it as completion
> >> signal for the remaining jobs and signals all jobs with an error code.
> >>
> >> v3:
> >> 1)Replace the fini1 and fini2 with better name 2)Call the first part
> >> before the VM teardown in
> >> amdgpu_driver_postclose_kms() and the second part after the VM
> >> teardown 3)Keep the original function drm_sched_entity_fini to refine the
> code.
> >>
> >> Signed-off-by: Monk Liu 
> >> Signed-off-by: Emily Deng 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 64
> >> ++
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  5 ++-
> >>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 74
> >> ++-
> >>   include/drm/gpu_scheduler.h   |  7 +++
> >>   5 files changed, 131 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> index 2babfad..200db73 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> @@ -681,6 +681,8 @@ int amdgpu_ctx_ioctl(struct drm_device *dev,
> void
> >> *data,  int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx,
> >> unsigned ring_id);
> >>
> >>   void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr 

RE: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

2018-04-12 Thread Deng, Emily
Hi Monk,
Another consideration: it would be better to put amdgpu_ctx_mgr_fini()
beneath amdgpu_vm_fini; in that case, it will first set all ctx and VM entity
run queues to NULL, and then signal the fences of all the jobs that were never
scheduled.

Best Wishes,
Emily Deng

> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Deng, Emily
> Sent: Friday, April 13, 2018 2:08 PM
> To: Liu, Monk ; Christian König
> 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> 
> Hi Monk,
>  Thanks your review, I will refine the code as your suggestion 1),2),3), 
> 4),5).
>  About 6): I think it is no relation to put the amdgpu_ctx_mgr_fini 
> before or
> after amdgpu_vm_fini.
> 
> Hi Christian,
>  Do you have any thoughts about 6)?
> 
> Best Wishes,
> Emily Deng
> 
> > -Original Message-
> > From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> > Of monk
> > Sent: Friday, April 13, 2018 1:01 PM
> > To: Deng, Emily ; amd-gfx@lists.freedesktop.org
> > Subject: Re: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> >
> > Hi Christian & Emily
> >
> > This v3 version looks pretty good to me, but still some parts need to
> > improve:
> >
> > e.g.
> >
> > 1)entity->finished doesn't presenting what it really means, better
> > rename it to entity->last_scheduled.
> >
> > 2)drm_sched_entity_fini_job_cb() better renamed to
> > drm_sched_entity_kill_jobs_cb()
> >
> > 3) no need to pass "entity->finished" (line275 of gpu_schedulers.c) to
> > drm_sched_entity_fini_job_cb() if -ENOENT returned after
> > dma_fence_add_callback since this parm is not needed at all in this
> > callback routine
> >
> > 4)better change type of entity->fini_status to "int" instead of
> > "uint32_t" ... it should be aligned with the return type of
> > wait_event_killable()
> >
> > 5)
> >
> > +   if (entity->finished) {
> > +   dma_fence_put(entity->finished);
> > +   entity->finished = NULL;
> > }
> >
> > no need to check entity->finished because dma_fence_put() will do it
> inside...
> >
> >
> >
> > and the same here in job_recovery:
> >
> > +
> > +   if (s_job->entity->finished)
> > +   dma_fence_put(s_job->entity->finished);
> >
> > and the same here in sched_main:
> >
> > +   if (entity->finished)
> > +   dma_fence_put(entity->finished);
> >
> >
> > 6) why put amdgpu_ctx_mgr_fini() beneath amdgpu_vm_fini() ? any reason
> > for that ?
> >
> >
> > thanks
> >
> > /Monk
> >
> >
> >
> >
> >
> >
> > On 04/13/2018 10:06 AM, Deng, Emily wrote:
> > > Ping
> > >
> > > Best Wishes,
> > > Emily Deng
> > >
> > >
> > >
> > >
> > >> -Original Message-
> > >> From: Emily Deng [mailto:emily.d...@amd.com]
> > >> Sent: Thursday, April 12, 2018 6:22 PM
> > >> To: amd-gfx@lists.freedesktop.org
> > >> Cc: Deng, Emily ; Liu, Monk
> > 
> > >> Subject: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> > >>
> > >> issue:
> > >> there are VMC page fault occurred if force APP kill during 3dmark
> > >> test, the cause is in entity_fini we manually signal all those jobs
> > >> in entity's queue which confuse the sync/dep
> > >> mechanism:
> > >>
> > >> 1)page fault occurred in sdma's clear job which operate on shadow
> > >> buffer, and shadow buffer's Gart table is cleaned by ttm_bo_release
> > >> since the fence in its reservation was fake signaled by
> > >> entity_fini() under the case of SIGKILL received.
> > >>
> > >> 2)page fault occurred in gfx' job because during the lifetime of
> > >> gfx job we manually fake signal all jobs from its entity in
> > >> entity_fini(), thus the unmapping/clear PTE job depend on those
> > >> result fence is satisfied and sdma start clearing the PTE and lead
> > >> to GFX
> > page fault.
> > >>
> > >> fix:
> > >> 1)should at least wait all jobs already scheduled complete in
> > >> entity_fini() if SIGKILL is the case.
> > >>
> > >> 2)if a fence signaled and try to clear some entity's dependency,
> > >> should set this entity guilty to prevent its job really run since
> > >> the dependency is fake signaled.
> > >>
> > >> v2:
> > >> splitting drm_sched_entity_fini() into two functions:
> > >> 1)The first one is does the waiting, removes the entity from the
> > >> runqueue and returns an error when the process was killed.
> > >> 2)The second one then goes over the entity, install it as
> > >> completion signal for the remaining jobs and signals all jobs with an
> error code.
> > >>
> > >> v3:
> > >> 1)Replace the fini1 and fini2 with better name 2)Call the first
> > >> part before the VM teardown in
> > >> amdgpu_driver_postclose_kms() and the second part after the VM
> > >> teardown 3)Keep the original function drm_sched_entity_fini to
> > >> refine the
> > code.
> > >>
> > >> Signed-off-by: Monk Liu 
> > >> Signed-off-by: Emily Deng 
> > >> ---
> > >>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +
> > >>  

Re: [PATCH] drm/amdgpu: defer test IBs on the rings at boot

2018-04-12 Thread Christian König

Am 13.04.2018 um 06:07 schrieb Shirish S:

amdgpu_ib_ring_tests() runs test IB's on rings at boot
contributes to ~500 ms of amdgpu driver's boot time.

This patch defers it and adds a check to report
in amdgpu_info_ioctl() if it was scheduled or not.


That is rather suboptimal, but see below.



Signed-off-by: Shirish S 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  3 +++
  3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5734871..ae8f722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1611,6 +1611,8 @@ struct amdgpu_device {
  
  	/* delayed work_func for deferring clockgating during resume */

struct delayed_work late_init_work;
+   /* delayed work_func to defer testing IB's on rings during boot */
+   struct delayed_work late_init_test_ib_work;


You must put the IB test into the late_init_work as well, otherwise the 
two delayed workers can race with each other.
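
A minimal sketch of what that could look like — only the idea of folding the
IB test into the single existing worker; the real body of the late-init
handler is not reproduced here:

static void amdgpu_device_ip_late_init_func_handler(struct work_struct *work)
{
	struct amdgpu_device *adev =
		container_of(work, struct amdgpu_device, late_init_work.work);
	int r;

	/* ... existing deferred late-init work (e.g. clockgating) runs here ... */

	/* then run the IB tests from the same single delayed worker */
	r = amdgpu_ib_ring_tests(adev);
	if (r)
		DRM_ERROR("ib ring test failed (%d).\n", r);
}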


  
  	struct amdgpu_virt	virt;

/* firmware VRAM reservation */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1762eb4..e65a5e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
  
  #define AMDGPU_RESUME_MS		2000

+#define AMDGPU_IB_TEST_SCHED_MS2000
  
  static const char *amdgpu_asic_name[] = {

"TAHITI",
@@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum 
amd_asic_type asic_type)
}
  }
  
+static void amdgpu_device_late_init_test_ib_func_handler(struct work_struct *work)

+{
+   struct amdgpu_device *adev =
+   container_of(work, struct amdgpu_device, 
late_init_test_ib_work.work);
+   int r = amdgpu_ib_ring_tests(adev);
+
+   if (r)
+   DRM_ERROR("ib ring test failed (%d).\n", r);
+}
+
  /**
   * amdgpu_device_has_dc_support - check if dc is supported
   *
@@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
INIT_LIST_HEAD(&adev->ring_lru_list);
spin_lock_init(&adev->ring_lru_list_lock);
  
+	INIT_DELAYED_WORK(&adev->late_init_test_ib_work,

+ amdgpu_device_late_init_test_ib_func_handler);
INIT_DELAYED_WORK(&adev->late_init_work,
  amdgpu_device_ip_late_init_func_handler);
  
@@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,

goto failed;
}
  
-	r = amdgpu_ib_ring_tests(adev);

-   if (r)
-   DRM_ERROR("ib ring test failed (%d).\n", r);
+   /* Schedule amdgpu_ib_ring_tests() */
+   mod_delayed_work(system_wq, &adev->late_init_test_ib_work,
+   msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));


That doesn't work like you intended. mod_delayed_work() overrides the 
existing handler.


What you wanted to use is queue_delayed_work(), but as I said we should 
only have one delayed worker.


  
  	if (amdgpu_sriov_vf(adev))

amdgpu_virt_init_data_exchange(adev);
@@ -2469,6 +2482,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
}
adev->accel_working = false;
cancel_delayed_work_sync(&adev->late_init_work);
+   cancel_delayed_work_sync(&adev->late_init_test_ib_work);
/* free i2c buses */
if (!amdgpu_device_has_dc_support(adev))
amdgpu_i2c_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 487d39e..057bd9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -279,6 +279,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
if (!info->return_size || !info->return_pointer)
return -EINVAL;
  
+	if (delayed_work_pending(&adev->late_init_test_ib_work))

+   DRM_ERROR("IB test on ring not executed\n");
+


Please use flush_delayed_work() instead of issuing an error here.
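
I.e. something along these lines (sketch only) right before serving the query:

	/* make sure the deferred IB tests have run before answering the ioctl */
	flush_delayed_work(&adev->late_init_test_ib_work);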

Regards,
Christian.


switch (info->query) {
case AMDGPU_INFO_ACCEL_WORKING:
ui32 = adev->accel_working;




Re: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

2018-04-12 Thread Christian König

Hi Monk/Emily,

give me the weekend to take a closer look since I'm very busy this morning.

In general the order of ctx fini and vm fini is very important, because
otherwise we dereference invalid pointers here.


Regards,
Christian.

Am 13.04.2018 um 08:18 schrieb Deng, Emily:

Hi Monk,
 Another consideration, it will be better to put amdgpu_ctx_mgr_fini() 
beneath amdgpu_vm_fini, in
this case, it will first set all ctx and vm entity rq to null, then set all the 
not scheduled job's fence to signal.

Best Wishes,
Emily Deng


-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
Of Deng, Emily
Sent: Friday, April 13, 2018 2:08 PM
To: Liu, Monk ; Christian König

Cc: amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

Hi Monk,
  Thanks your review, I will refine the code as your suggestion 1),2),3), 
4),5).
  About 6): I think it is no relation to put the amdgpu_ctx_mgr_fini before 
or
after amdgpu_vm_fini.

Hi Christian,
  Do you have any thoughts about 6)?

Best Wishes,
Emily Deng


-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
Of monk
Sent: Friday, April 13, 2018 1:01 PM
To: Deng, Emily ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

Hi Christian & Emily

This v3 version looks pretty good to me, but still some parts need to
improve:

e.g.

1)entity->finished doesn't presenting what it really means, better
rename it to entity->last_scheduled.

2)drm_sched_entity_fini_job_cb() better renamed to
drm_sched_entity_kill_jobs_cb()

3) no need to pass "entity->finished" (line275 of gpu_schedulers.c) to
drm_sched_entity_fini_job_cb() if -ENOENT returned after
dma_fence_add_callback since this parm is not needed at all in this
callback routine

4)better change type of entity->fini_status to "int" instead of
"uint32_t" ... it should be aligned with the return type of
wait_event_killable()

5)

+   if (entity->finished) {
+   dma_fence_put(entity->finished);
+   entity->finished = NULL;
}

no need to check entity->finished because dma_fence_put() will do it

inside...



and the same here in job_recovery:

+
+   if (s_job->entity->finished)
+   dma_fence_put(s_job->entity->finished);

and the same here in sched_main:

+   if (entity->finished)
+   dma_fence_put(entity->finished);


6) why put amdgpu_ctx_mgr_fini() beneath amdgpu_vm_fini() ? any reason
for that ?


thanks

/Monk






On 04/13/2018 10:06 AM, Deng, Emily wrote:

Ping

Best Wishes,
Emily Deng





-Original Message-
From: Emily Deng [mailto:emily.d...@amd.com]
Sent: Thursday, April 12, 2018 6:22 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Liu, Monk



Subject: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

issue:
there are VMC page fault occurred if force APP kill during 3dmark
test, the cause is in entity_fini we manually signal all those jobs
in entity's queue which confuse the sync/dep
mechanism:

1)page fault occurred in sdma's clear job which operate on shadow
buffer, and shadow buffer's Gart table is cleaned by ttm_bo_release
since the fence in its reservation was fake signaled by
entity_fini() under the case of SIGKILL received.

2)page fault occurred in gfx' job because during the lifetime of
gfx job we manually fake signal all jobs from its entity in
entity_fini(), thus the unmapping/clear PTE job depend on those
result fence is satisfied and sdma start clearing the PTE and lead
to GFX

page fault.

fix:
1)should at least wait all jobs already scheduled complete in
entity_fini() if SIGKILL is the case.

2)if a fence signaled and try to clear some entity's dependency,
should set this entity guilty to prevent its job really run since
the dependency is fake signaled.

v2:
splitting drm_sched_entity_fini() into two functions:
1)The first one is does the waiting, removes the entity from the
runqueue and returns an error when the process was killed.
2)The second one then goes over the entity, install it as
completion signal for the remaining jobs and signals all jobs with an

error code.

v3:
1)Replace the fini1 and fini2 with better name 2)Call the first
part before the VM teardown in
amdgpu_driver_postclose_kms() and the second part after the VM
teardown 3)Keep the original function drm_sched_entity_fini to
refine the

code.

Signed-off-by: Monk Liu 
Signed-off-by: Emily Deng 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 64
++
   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  5 ++-
   drivers/gpu/drm/scheduler/gpu_scheduler.c | 74
++-
   include/drm/gpu_scheduler.h   |  7 +++
   5 files changed, 131 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/a

[PATCH] drm/amd/pp: Move common code to smu_helper.c

2018-04-12 Thread Rex Zhu
Signed-off-by: Rex Zhu 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c   | 56 ++
 drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h   | 21 
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 43 +
 3 files changed, 78 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
index e6178b0..7c23741 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
@@ -650,3 +650,59 @@ int smu_get_voltage_dependency_table_ppt_v1(
 
return 0;
 }
+
+int smu_set_watermarks_for_clocks_ranges(void *wt_table,
+   struct pp_wm_sets_with_clock_ranges_soc15 *wm_with_clock_ranges)
+{
+   uint32_t i;
+   struct watermarks *table = wt_table;
+
+   if (!table || !wm_with_clock_ranges)
+   return -EINVAL;
+
+   if (wm_with_clock_ranges->num_wm_sets_dmif > 4 || 
wm_with_clock_ranges->num_wm_sets_mcif > 4)
+   return -EINVAL;
+
+   for (i = 0; i < wm_with_clock_ranges->num_wm_sets_dmif; i++) {
+   table->WatermarkRow[1][i].MinClock =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_dmif[i].wm_min_dcefclk_in_khz) /
+   100);
+   table->WatermarkRow[1][i].MaxClock =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_dmif[i].wm_max_dcefclk_in_khz) /
+   100);
+   table->WatermarkRow[1][i].MinUclk =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_dmif[i].wm_min_memclk_in_khz) /
+   100);
+   table->WatermarkRow[1][i].MaxUclk =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_dmif[i].wm_max_memclk_in_khz) /
+   100);
+   table->WatermarkRow[1][i].WmSetting = (uint8_t)
+   wm_with_clock_ranges->wm_sets_dmif[i].wm_set_id;
+   }
+
+   for (i = 0; i < wm_with_clock_ranges->num_wm_sets_mcif; i++) {
+   table->WatermarkRow[0][i].MinClock =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_mcif[i].wm_min_socclk_in_khz) /
+   100);
+   table->WatermarkRow[0][i].MaxClock =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_mcif[i].wm_max_socclk_in_khz) /
+   100);
+   table->WatermarkRow[0][i].MinUclk =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_mcif[i].wm_min_memclk_in_khz) /
+   100);
+   table->WatermarkRow[0][i].MaxUclk =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_mcif[i].wm_max_memclk_in_khz) /
+   100);
+   table->WatermarkRow[0][i].WmSetting = (uint8_t)
+   wm_with_clock_ranges->wm_sets_mcif[i].wm_set_id;
+   }
+   return 0;
+}
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h
index 2243e29..916cc01 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h
@@ -26,10 +26,27 @@
 struct pp_atomctrl_voltage_table;
 struct pp_hwmgr;
 struct phm_ppt_v1_voltage_lookup_table;
+struct Watermarks_t;
+struct pp_wm_sets_with_clock_ranges_soc15;
 
 uint8_t convert_to_vid(uint16_t vddc);
 uint16_t convert_to_vddc(uint8_t vid);
 
+struct watermark_row_generic_t {
+   uint16_t MinClock;
+   uint16_t MaxClock;
+   uint16_t MinUclk;
+   uint16_t MaxUclk;
+
+   uint8_t  WmSetting;
+   uint8_t  Padding[3];
+};
+
+struct watermarks {
+   struct watermark_row_generic_t WatermarkRow[2][4];
+   uint32_t padding[7];
+};
+
 extern int phm_wait_for_register_unequal(struct pp_hwmgr *hwmgr,
uint32_t index,
uint32_t value, uint32_t mask);
@@ -88,6 +105,10 @@ void *smu_atom_get_data_table(void *dev, uint32_t table, 
uint16_t *size,
 int smu_get_voltage_dependency_table_ppt_v1(
const struct phm_ppt_v1_clock_voltage_dependency_table 
*allowed_dep_table,
struct phm_ppt_v1_clock_voltage_dependency_table *dep_table);
+
+int smu_set_watermarks_for_clocks_ranges(void *wt_table,
+   struct pp_wm_sets_with_clock_ranges_soc15 
*wm_with_clock_ranges);
+
 #define PHM_FIELD_SHIFT(reg, field) reg##__##field##__SHIFT
 #define PHM_FIELD_MASK(reg, field) reg##__##field##_MASK
 
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega

Re: [PATCH] drm/amdgpu: defer test IBs on the rings at boot

2018-04-12 Thread S, Shirish



On 4/13/2018 11:53 AM, Christian König wrote:

Am 13.04.2018 um 06:07 schrieb Shirish S:

amdgpu_ib_ring_tests() runs test IB's on rings at boot
contributes to ~500 ms of amdgpu driver's boot time.

This patch defers it and adds a check to report
in amdgpu_info_ioctl() if it was scheduled or not.


That is rather suboptimal, but see below.


Which part is sub-optimal, deferring or checking if the work is scheduled?


Signed-off-by: Shirish S 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  3 +++
  3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 5734871..ae8f722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1611,6 +1611,8 @@ struct amdgpu_device {
    /* delayed work_func for deferring clockgating during resume */
  struct delayed_work late_init_work;
+    /* delayed work_func to defer testing IB's on rings during boot */
+    struct delayed_work late_init_test_ib_work;


You must put the IB test into the late_init_work as well, otherwise 
the two delayed workers can race with each other.


I thought it was clear from the comment above the declaration why I am
creating 2 work structures:
late_init_work is to optimize resume time and late_init_test_ib_work is
to optimize boot time.
There can't be a race, as the contexts in which they are called are totally
different.

    struct amdgpu_virt    virt;
  /* firmware VRAM reservation */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1762eb4..e65a5e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
    #define AMDGPU_RESUME_MS    2000
+#define AMDGPU_IB_TEST_SCHED_MS    2000
    static const char *amdgpu_asic_name[] = {
  "TAHITI",
@@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum 
amd_asic_type asic_type)

  }
  }
  +static void amdgpu_device_late_init_test_ib_func_handler(struct 
work_struct *work)

+{
+    struct amdgpu_device *adev =
+    container_of(work, struct amdgpu_device, 
late_init_test_ib_work.work);

+    int r = amdgpu_ib_ring_tests(adev);
+
+    if (r)
+    DRM_ERROR("ib ring test failed (%d).\n", r);
+}
+
  /**
   * amdgpu_device_has_dc_support - check if dc is supported
   *
@@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  INIT_LIST_HEAD(&adev->ring_lru_list);
  spin_lock_init(&adev->ring_lru_list_lock);
  +    INIT_DELAYED_WORK(&adev->late_init_test_ib_work,
+  amdgpu_device_late_init_test_ib_func_handler);
  INIT_DELAYED_WORK(&adev->late_init_work,
    amdgpu_device_ip_late_init_func_handler);
  @@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device 
*adev,

  goto failed;
  }
  -    r = amdgpu_ib_ring_tests(adev);
-    if (r)
-    DRM_ERROR("ib ring test failed (%d).\n", r);
+    /* Schedule amdgpu_ib_ring_tests() */
+    mod_delayed_work(system_wq, &adev->late_init_test_ib_work,
+    msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));


That doesn't work like you intended. mod_delayed_work() overrides the 
existing handler.


What you wanted to use is queue_delayed_work(), but as I said we 
should only have one delayed worker.
mod_delayed_work() is a safer and more optimal method that replaces
cancel_delayed_work() followed by queue_delayed_work().

(https://lkml.org/lkml/2011/2/3/175)
But if you strongly insist, I don't mind changing it.



    if (amdgpu_sriov_vf(adev))
  amdgpu_virt_init_data_exchange(adev);
@@ -2469,6 +2482,7 @@ void amdgpu_device_fini(struct amdgpu_device 
*adev)

  }
  adev->accel_working = false;
  cancel_delayed_work_sync(&adev->late_init_work);
+ cancel_delayed_work_sync(&adev->late_init_test_ib_work);
  /* free i2c buses */
  if (!amdgpu_device_has_dc_support(adev))
  amdgpu_i2c_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

index 487d39e..057bd9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -279,6 +279,9 @@ static int amdgpu_info_ioctl(struct drm_device 
*dev, void *data, struct drm_file

  if (!info->return_size || !info->return_pointer)
  return -EINVAL;
  +    if (delayed_work_pending(&adev->late_init_test_ib_work))
+    DRM_ERROR("IB test on ring not executed\n");
+


Please use flush_delayed_work() instead of issuing an error here.


Agreed, I wasn't sure what to do here :).
So I will re-spin with the flush part added. Hope this reply clarifies
your comments.

Thanks.
Regards,
Shirish S

Reg