[pull] amdgpu, amdkfd drm-next-5.14

2021-05-20 Thread Alex Deucher
Hi Dave, Daniel,

More updates for 5.14. On top of the stuff from last week's PR.

The following changes since commit 2bb5b5f688cbbd5030629905d3ed8032ab46e79f:

  drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are 
connected (2021-05-19 22:29:40 -0400)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-next-5.14-2021-05-21

for you to fetch changes up to 81db370c88196400972acd6ebbaa73a1d1e4145f:

  drm/amdgpu: stop touching sched.ready in the backend (2021-05-19 22:45:00 
-0400)


amd-drm-next-5.14-2021-05-21:

amdgpu:
- RAS fixes
- SR-IOV fixes
- More BO management cleanups
- Aldebaran fixes
- Display fixes
- Support for new GPU, Beige Goby
- Backlight fixes

amdkfd:
- RAS fixes
- DMA mapping fixes
- HMM SVM fixes


Aaron Liu (1):
  drm/amdgpu: modify system reference clock source for navi+ (V2)

Alex Deucher (3):
  drm/amdgpu: add mmhub client support for beige goby
  drm/amdgpu/display: add helper functions to get/set backlight (v2)
  drm/amdgpu/display: restore the backlight on modeset (v2)

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.66

Aric Cyr (1):
  drm/amd/display: 3.2.136

Aurabindo Pillai (8):
  drm/amd/display: Add register definitions for Beige Goby
  drm/amd/display: Initial DC support for Beige Goby
  drm/amd/display: Edit license info for beige goby DC files
  drm/amd/display: Add DM support for Beige Goby
  drm/amd/amdgpu: Enable DCN IP init for Beige Goby
  drm/amd/display: Add callback for update_soc_for_wm_a for dcn303
  drm/amd/display: Enable HDCP for Beige Goby
  drm/amd/display: enable idle optimizations for beige goby

Bhawanpreet Lakha (1):
  drm/amd/display: Add Overflow check to skip MALL

Bokun Zhang (1):
  drm/amdgpu: Complete multimedia bandwidth interface

Changfeng (1):
  drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang

Chengming Gui (27):
  drm/amd/amdgpu: add beige_goby asic type
  drm/amd/amdgpu: set fw load type for beige_goby
  drm/amd/amdgpu: set asic family and ip blocks for beige_goby
  drm/amd/amdgpu: add support for beige_goby firmware
  drm/amd/amdgpu: add gmc support for beige_goby
  drm/amd/amdgpu: add common support for beige_goby
  drm/amd/amdgpu: initialize IP offset for beige_goby
  drm/amd/amdgpu: add mmhub support for beige_goby
  drm/amd/amdgpu: add common ip block for beige_goby
  drm/amd/amdgpu: add gmc ip block for beige_goby
  drm/amd/amdgpu: add ih ip block for beige_goby
  drm/amd/amdgpu: add gfx ip block for beige_goby
  drm/amd/amdgpu: add sdma ip block for beige_goby
  drm/amd/amdgpu: configure beige_goby gfx according to gfx 10.3's 
definition
  drm/amd/amdgpu: add virtual display support for beige_goby
  drm/amd/amdgpu: support cp_fw_write_wait for beige_goby
  drm/amd/amdgpu: Use IP discovery table for beige goby
  drm/amdkfd: support beige_goby KFD
  drm/amdkfd: add kfd2kgd funcs for beige_goby kfd support
  drm/amd/amdgpu: add smu support for beige_goby
  drm/amd/amdgpu: add psp support for beige_goby
  drm/amd/amdgpu: update golden_setting_10_3_5 for beige_goby
  drm/amd/pm: add mode1 support for beige_goby
  drm/amd/pm: update smu11 driver interface header for beige_goby
  drm/amd/pm: use macro to get pptable members
  drm/amd/pm: Use the PPTable from VBIOS for beige_goby
  drm/amd/amdgpu: Enable gfxoff for beige_goby

Chris Park (1):
  drm/amd/display: Disconnect non-DP with no EDID

Christian König (7):
  drm/amdgpu: re-apply "use the new cursor in the VM code" v2
  drm/amdgpu: use cursor functions in amdgpu_bo_in_cpu_visible_vram
  drm/amdgpu: set the contiguous flag if possible
  drm/amdgpu: check contiguous flags instead of mm_node
  drm/amdgpu: move struct amdgpu_vram_reservation into vram mgr
  drm/radeon: use the dummy page for GART if needed
  drm/amdgpu: stop touching sched.ready in the backend

Dennis Li (2):
  drm/amdkfd: refine the poison data consumption handling
  drm/amdkfd: fix a resource leakage issue

Felix Kuehling (10):
  drm/amdgpu: Arcturus: MTYPE_NC for coarse-grain remote memory
  drm/amdgpu: Albebaran: MTYPE_NC for coarse-grain remote memory
  drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment
  drm/amdgpu: Keep a bo-reference per-attachment
  drm/amdgpu: Simplify AQL queue mapping
  drm/amdgpu: Add multi-GPU DMA mapping helpers
  drm/amdgpu: DMA map/unmap when updating GPU mappings
  drm/amdgpu: Move kfd_mem_attach outside reservation
  drm/amdgpu: Add DMA mapping of GTT BOs
  drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind

George Shen (1):
  drm/amd/display: Minor refactor of DP PHY test 

[git pull] drm fixes for 5.13-rc3

2021-05-20 Thread Dave Airlie
Hi Linus,

Usual collection, mostly amdgpu and some i915 regression fixes. I
nearly managed to hose my build/sign machine this week, but I
recovered it just in time, and I even got clang12 built.

Dave.

drm-fixes-2021-05-21-1:
drm fixes for 5.13-rc3

dma-buf:
- WARN fix

amdgpu:
- Fix downscaling ratio on DCN3.x
- Fix for non-4K pages
- PCO/RV compute hang fix
- Dongle fix
- Aldebaran codec query support
- Refcount leak fix
- Use after free fix
- Navi12 golden settings updates
- GPU reset fixes

radeon:
- Fix for imported BO handling

i915:
- Pin the L-shape quirked object as unshrinkable to fix crashes
- Disable HiZ Raw Stall Optimization on broken gen7 to fix glitches,
gfx corruption
- GVT: Move mdev attribute groups into kvmgt module to fix kconfig deps issue

exynos:
- Correct kerneldoc of fimd_shadow_protect_win function.
- Drop redundant error messages.
The following changes since commit d07f6ca923ea0927a1024dfccafc5b53b61cfecc:

  Linux 5.13-rc2 (2021-05-16 15:27:44 -0700)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2021-05-21-1

for you to fetch changes up to dd6ad0516ee38112321e99ce368fddd49ee3b9db:

  Merge tag 'amd-drm-fixes-5.13-2021-05-19' of
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes (2021-05-21
14:08:04 +1000)


drm fixes for 5.13-rc3

dma-buf:
- WARN fix

amdgpu:
- Fix downscaling ratio on DCN3.x
- Fix for non-4K pages
- PCO/RV compute hang fix
- Dongle fix
- Aldebaran codec query support
- Refcount leak fix
- Use after free fix
- Navi12 golden settings updates
- GPU reset fixes

radeon:
- Fix for imported BO handling

i915:
- Pin the L-shape quirked object as unshrinkable to fix crashes
- Disable HiZ Raw Stall Optimization on broken gen7 to fix glitches,
gfx corruption
- GVT: Move mdev attribute groups into kvmgt module to fix kconfig deps issue

exynos:
- Correct kerneldoc of fimd_shadow_protect_win function.
- Drop redundant error messages.


Changfeng (1):
  drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang

Chris Park (1):
  drm/amd/display: Disconnect non-DP with no EDID

Chris Wilson (1):
  drm/i915/gem: Pin the L-shape quirked object as unshrinkable

Christian König (3):
  drm/radeon: use the dummy page for GART if needed
  drm/amdgpu: stop touching sched.ready in the backend
  dma-buf: fix unintended pin/unpin warnings

Dave Airlie (4):
  Merge tag 'exynos-drm-fixes-for-v5.13-rc3' of
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into
drm-fixes
  Merge tag 'drm-misc-fixes-2021-05-20' of
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
  Merge tag 'drm-intel-fixes-2021-05-20' of
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
  Merge tag 'amd-drm-fixes-5.13-2021-05-19' of
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

Guchun Chen (2):
  drm/amdgpu: update gc golden setting for Navi12
  drm/amdgpu: update sdma golden setting for Navi12

James Zhu (1):
  drm/amdgpu: add video_codecs query support for aldebaran

Jani Nikula (1):
  Merge tag 'gvt-fixes-2021-05-19' of
https://github.com/intel/gvt-linux into drm-intel-fixes

Jingwen Chen (1):
  drm/amd/amdgpu: fix refcount leak

Krzysztof Kozlowski (1):
  drm/exynos: correct exynos_drm_fimd kerneldoc

Lang Yu (1):
  drm/amd/amdgpu: fix a potential deadlock in gpu reset

Nikola Cornij (1):
  drm/amd/display: Use the correct max downscaling value for DCN3.x family

Simon Rettberg (1):
  drm/i915/gt: Disable HiZ Raw Stall Optimization on broken gen7

Yi Li (1):
  drm/amdgpu: Fix GPU TLB update error when PAGE_SIZE > AMDGPU_PAGE_SIZE

Zhen Lei (2):
  drm/exynos: Remove redundant error printing in exynos_dsi_probe()
  drm/exynos/decon5433: Remove redundant error printing in
exynos5433_decon_probe()

Zhenyu Wang (1):
  drm/i915/gvt: Move mdev attribute groups into kvmgt module

xinhui pan (1):
  drm/amdgpu: Fix a use-after-free

 drivers/dma-buf/dma-buf.c  |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|   3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |   6 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  10 +-
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c |   2 -
 drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c |   2 -
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c |   4 +
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c |   5 -
 drivers/gpu/drm/amd/amdgpu/soc15.c |   3 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c  |   8 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c  |  18 +++
 .../gpu/drm/amd/display/dc/dcn30/dcn30_resource.c  |   7 +-
 

[PATCH i-g-t] tests/drm_read: drm_read fails for subtest invalid-buffer on chrome

2021-05-20 Thread Vidya Srinivas
Chrome platforms are unable to handle reading from -1; the test
terminates after reporting a buffer overflow. Hence, change the
address of the invalid buffer to NULL instead of -1. With this
change, errno becomes EINTR when reading from the NULL location,
so also change the errno check from EFAULT to EINTR.

Change-Id: I5f844af087c9826fcbcfbe301f0df5f727cb013b
Signed-off-by: Vidya Srinivas 
---
 tests/drm_read.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/drm_read.c b/tests/drm_read.c
index ccf9d822fd8d..a8816bc1e587 100644
--- a/tests/drm_read.c
+++ b/tests/drm_read.c
@@ -106,8 +106,8 @@ static void test_invalid_buffer(int in)
 
alarm(1);
 
-   igt_assert_eq(read(fd, (void *)-1, 4096), -1);
-   igt_assert_eq(errno, EFAULT);
+   igt_assert_eq(read(fd, (void *)NULL, 4096), -1);
+   igt_assert_eq(errno, EINTR);
 
teardown(fd);
 }
-- 
2.7.4



[PATCH i-g-t] tests/kms_big_fb: Wait for vblank before collecting CRC

2021-05-20 Thread Vidya Srinivas
Without waiting for a vblank, a CRC mismatch is seen
between the big and small FB CRCs on a few systems.

Change-Id: I3bec931aa901130997e693ac1cacf389e2a8100f
Signed-off-by: Vidya Srinivas 
---
 tests/kms_big_fb.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/kms_big_fb.c b/tests/kms_big_fb.c
index b2027b6b9d1b..7d78ff829d41 100644
--- a/tests/kms_big_fb.c
+++ b/tests/kms_big_fb.c
@@ -254,6 +254,7 @@ static void unset_lut(data_t *data)
 static bool test_plane(data_t *data)
 {
igt_plane_t *plane = data->plane;
+   igt_display_t *display = &data->display;
	struct igt_fb *small_fb = &data->small_fb;
	struct igt_fb *big_fb = &data->big_fb;
int w = data->big_fb_width - small_fb->width;
@@ -337,16 +338,17 @@ static bool test_plane(data_t *data)
	igt_display_commit2(&data->display, data->display.is_atomic ?
COMMIT_ATOMIC : COMMIT_UNIVERSAL);
 
-
+   igt_wait_for_vblank(data->drm_fd, 
display->pipes[data->pipe].crtc_offset);
	igt_pipe_crc_collect_crc(data->pipe_crc, &small_crc);
 
igt_plane_set_fb(plane, big_fb);
igt_fb_set_position(big_fb, plane, x, y);
igt_fb_set_size(big_fb, plane, small_fb->width, 
small_fb->height);
+
igt_plane_set_size(plane, data->width, data->height);
	igt_display_commit2(&data->display, data->display.is_atomic ?
COMMIT_ATOMIC : COMMIT_UNIVERSAL);
-
+   igt_wait_for_vblank(data->drm_fd, 
display->pipes[data->pipe].crtc_offset);
	igt_pipe_crc_collect_crc(data->pipe_crc, &big_crc);
 
igt_plane_set_fb(plane, NULL);
-- 
2.7.4



Re: [PATCH v8 8/8] nouveau/svm: Implement atomic SVM access

2021-05-20 Thread Ben Skeggs
On Wed, 7 Apr 2021 at 18:43, Alistair Popple  wrote:
>
> Some NVIDIA GPUs do not support direct atomic access to system memory
> via PCIe. Instead this must be emulated by granting the GPU exclusive
> access to the memory. This is achieved by replacing CPU page table
> entries with special swap entries that fault on userspace access.
>
> The driver then grants the GPU permission to update the page undergoing
> atomic access via the GPU page tables. When CPU access to the page is
> required a CPU fault is raised which calls into the device driver via
> MMU notifiers to revoke the atomic access. The original page table
> entries are then restored allowing CPU access to proceed.
>
> Signed-off-by: Alistair Popple 
The Nouveau bits at least look good to me.

For patches 7/8:
Reviewed-by: Ben Skeggs 
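
For readers following along, here is a minimal sketch of the flow the
commit message above describes -- grant the device exclusive access to a
page, then let the MMU notifier revoke it when the CPU faults. The helper
make_device_exclusive_range() and the MMU_NOTIFY_EXCLUSIVE event come from
earlier patches in this series; the function names and error handling here
are illustrative only, not the actual nouveau code:

static int example_make_exclusive(struct mm_struct *mm, unsigned long addr,
				  void *owner)
{
	struct page *page;
	int ret;

	mmap_read_lock(mm);
	/* Replace the CPU PTE with a device-exclusive swap entry that
	 * faults on any userspace access. */
	ret = make_device_exclusive_range(mm, addr, addr + PAGE_SIZE,
					  &page, owner);
	mmap_read_unlock(mm);
	if (ret <= 0 || !page)
		return -EBUSY;

	/* ... map @page into the GPU page tables with atomic access ... */
	put_page(page);
	return 0;
}

static bool example_invalidate(struct mmu_interval_notifier *mni,
			       const struct mmu_notifier_range *range,
			       unsigned long cur_seq)
{
	/* CPU touched the page: tear down the GPU mapping so the original
	 * page table entries can be restored and CPU access can proceed. */
	mmu_interval_set_seq(mni, cur_seq);
	/* ... invalidate the GPU VMM range covered by @range ... */
	return true;
}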

>
> ---
>
> v7:
> * Removed magic values for fault access levels
> * Improved readability of fault comparison code
>
> v4:
> * Check that page table entries haven't changed before mapping on the
>   device
> ---
>  drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
>  drivers/gpu/drm/nouveau/nouveau_svm.c | 126 --
>  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
>  .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c|   6 +
>  4 files changed, 123 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvif/if000c.h 
> b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
> index d6dd40f21eed..9c7ff56831c5 100644
> --- a/drivers/gpu/drm/nouveau/include/nvif/if000c.h
> +++ b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
> @@ -77,6 +77,7 @@ struct nvif_vmm_pfnmap_v0 {
>  #define NVIF_VMM_PFNMAP_V0_APER   0x00000000000000f0ULL
>  #define NVIF_VMM_PFNMAP_V0_HOST   0x0000000000000000ULL
>  #define NVIF_VMM_PFNMAP_V0_VRAM   0x0000000000000010ULL
> +#define NVIF_VMM_PFNMAP_V0_A      0x0000000000000004ULL
>  #define NVIF_VMM_PFNMAP_V0_W      0x0000000000000002ULL
>  #define NVIF_VMM_PFNMAP_V0_V      0x0000000000000001ULL
>  #define NVIF_VMM_PFNMAP_V0_NONE   0x0000000000000000ULL
> diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c 
> b/drivers/gpu/drm/nouveau/nouveau_svm.c
> index a195e48c9aee..81526d65b4e2 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_svm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  struct nouveau_svm {
> struct nouveau_drm *drm;
> @@ -67,6 +68,11 @@ struct nouveau_svm {
> } buffer[1];
>  };
>
> +#define FAULT_ACCESS_READ 0
> +#define FAULT_ACCESS_WRITE 1
> +#define FAULT_ACCESS_ATOMIC 2
> +#define FAULT_ACCESS_PREFETCH 3
> +
>  #define SVM_DBG(s,f,a...) NV_DEBUG((s)->drm, "svm: "f"\n", ##a)
>  #define SVM_ERR(s,f,a...) NV_WARN((s)->drm, "svm: "f"\n", ##a)
>
> @@ -411,6 +417,24 @@ nouveau_svm_fault_cancel_fault(struct nouveau_svm *svm,
>   fault->client);
>  }
>
> +static int
> +nouveau_svm_fault_priority(u8 fault)
> +{
> +   switch (fault) {
> +   case FAULT_ACCESS_PREFETCH:
> +   return 0;
> +   case FAULT_ACCESS_READ:
> +   return 1;
> +   case FAULT_ACCESS_WRITE:
> +   return 2;
> +   case FAULT_ACCESS_ATOMIC:
> +   return 3;
> +   default:
> +   WARN_ON_ONCE(1);
> +   return -1;
> +   }
> +}
> +
>  static int
>  nouveau_svm_fault_cmp(const void *a, const void *b)
>  {
> @@ -421,9 +445,8 @@ nouveau_svm_fault_cmp(const void *a, const void *b)
> return ret;
> if ((ret = (s64)fa->addr - fb->addr))
> return ret;
> -   /*XXX: atomic? */
> -   return (fa->access == 0 || fa->access == 3) -
> -  (fb->access == 0 || fb->access == 3);
> +   return nouveau_svm_fault_priority(fa->access) -
> +   nouveau_svm_fault_priority(fb->access);
>  }
>
>  static void
> @@ -487,6 +510,10 @@ static bool nouveau_svm_range_invalidate(struct 
> mmu_interval_notifier *mni,
> struct svm_notifier *sn =
> container_of(mni, struct svm_notifier, notifier);
>
> +   if (range->event == MMU_NOTIFY_EXCLUSIVE &&
> +   range->owner == sn->svmm->vmm->cli->drm->dev)
> +   return true;
> +
> /*
>  * serializes the update to mni->invalidate_seq done by caller and
>  * prevents invalidation of the PTE from progressing while HW is being
> @@ -555,6 +582,71 @@ static void nouveau_hmm_convert_pfn(struct nouveau_drm 
> *drm,
> args->p.phys[0] |= NVIF_VMM_PFNMAP_V0_W;
>  }
>
> +static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
> +  struct nouveau_drm *drm,
> +  struct nouveau_pfnmap_args *args, u32 size,
> +  struct svm_notifier 

Re: NVIDIA GPU fallen off the bus after exiting s2idle

2021-05-20 Thread Chris Chiu
On Thu, May 6, 2021 at 5:46 PM Rafael J. Wysocki  wrote:
>
> On Tue, May 4, 2021 at 10:08 AM Chris Chiu  wrote:
> >
> > Hi,
> > We have some Intel laptops (11th generation CPU) with NVIDIA GPU
> > suffering the same GPU falling off the bus problem while exiting
> > s2idle with external display connected. These laptops connect the
> > external display via the HDMI/DisplayPort on a USB Type-C interfaced
> > dock. If we enter and exit s2idle with the dock connected, the NVIDIA
> > GPU (confirmed on 10de:24b6 and 10de:25b8) and the PCIe port can come
> > back to D0 w/o problem. If we enter the s2idle, disconnect the dock,
> > then exit the s2idle, both external display and the panel will remain
> > with no output. The dmesg as follows shows the "nvidia :01:00.0:
> > can't change power state from D3cold to D0 (config space
> > inaccessible)" due to the following ACPI error
> > [ 154.446781]
> > [ 154.446783]
> > [ 154.446783] Initialized Local Variables for Method [IPCS]:
> > [ 154.446784] Local0: 9863e365  Integer 09C5
> > [ 154.446790]
> > [ 154.446791] Initialized Arguments for Method [IPCS]: (7 arguments
> > defined for method invocation)
> > [ 154.446792] Arg0: 25568fbd  Integer 00AC
> > [ 154.446795] Arg1: 9ef30e76  Integer 
> > [ 154.446798] Arg2: fdf820f0  Integer 0010
> > [ 154.446801] Arg3: 9fc2a088  Integer 0001
> > [ 154.446804] Arg4: 3a3418f7  Integer 0001
> > [ 154.446807] Arg5: 20c4b87c  Integer 
> > [ 154.446810] Arg6: 8b965a8a  Integer 
> > [ 154.446813]
> > [ 154.446815] ACPI Error: Aborting method \IPCS due to previous error
> > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446824] ACPI Error: Aborting method \MCUI due to previous error
> > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446829] ACPI Error: Aborting method \SPCX due to previous error
> > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446835] ACPI Error: Aborting method \_SB.PC00.PGSC due to
> > previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446841] ACPI Error: Aborting method \_SB.PC00.PGON due to
> > previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446846] ACPI Error: Aborting method \_SB.PC00.PEG1.NPON due to
> > previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446852] ACPI Error: Aborting method \_SB.PC00.PEG1.PG01._ON due
> > to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
> > [ 154.446860] acpi device:02: Failed to change power state to D0
> > [ 154.690760] video LNXVIDEO:00: Cannot transition to power state D0
> > for parent in (unknown)
>
> If I were to guess, I would say that AML tries to access memory that
> is not accessible while suspended, probably PCI config space.
>
> > The IPCS is the last function called from \_SB.PC00.PEG1.PG01._ON
> > which we expect it to prepare everything before bringing back the
> > NVIDIA GPU but it's stuck in the infinite loop as described below.
> > Please refer to
> > https://gist.github.com/mschiu77/fa4f5a97297749d0d66fe60c1d421c44 for
> > the full DSDT.dsl.
>
> The DSDT alone may not be sufficient.
>
> Can you please create a bug entry at bugzilla.kernel.org for this
> issue and attach the full output of acpidump from one of the affected
> machines to it?  And please let me know the number of the bug.
>
> Also please attach the output of dmesg including a suspend-resume
> cycle including dock disconnection while suspended and the ACPI
> messages quoted below.
>
> >While (One)
> > {
> > If ((!IBSY || (IERR == One)))
> > {
> > Break
> > }
> >
> > If ((Local0 > TMOV))
> > {
> > RPKG [Zero] = 0x03
> > Return (RPKG) /* \IPCS.RPKG */
> > }
> >
> > Sleep (One)
> > Local0++
> > }
> >
> > And the upstream PCIe port of NVIDIA seems to become inaccessible due
> > to the messages as follows.
> > [ 292.746508] pcieport :00:01.0: waiting 100 ms for downstream
> > link, after activation
> > [ 292.882296] pci :01:00.0: waiting additional 100 ms to become 
> > accessible
> > [ 316.876997] pci :01:00.0: can't change power state from D3cold
> > to D0 (config space inaccessible)
> >
> > Since the IPCS is the Intel Reference Code and we don't really know
> > why the never-end loop happens just because we unplug the dock while
> > the system still stays in s2idle. Can anyone from Intel suggest what
> > happens here?
>
> This list is not the right channel for inquiries related to Intel
> support, we can only help you as Linux kernel developers in this
> venue.
>
> > And one thing also worth mentioning, if we unplug the display cable
> > from the dock before entering the s2idle, NVIDIA GPU can 

Re: [PATCH v8 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-05-20 Thread Alistair Popple
On Friday, 21 May 2021 6:24:27 AM AEST Liam Howlett wrote:

[...]
 
> > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > index 977e70803ed8..f09d522725b9 100644
> > > > --- a/mm/rmap.c
> > > > +++ b/mm/rmap.c
> > > > @@ -1405,10 +1405,6 @@ static bool try_to_unmap_one(struct page *page,
> > > > struct vm_area_struct *vma,>
> > > > 
> > > >   struct mmu_notifier_range range;
> > > >   enum ttu_flags flags = (enum ttu_flags)(long)arg;
> > > > 
> > > > - /* munlock has nothing to gain from examining un-locked vmas */
> > > > - if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
> > > > - return true;
> > > > -
> > > > 
> > > >   if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> > > >   
> > > >   is_zone_device_page(page) && !is_device_private_page(page))
> > > >   
> > > >   return true;
> > > > 
> > > > @@ -1469,8 +1465,6 @@ static bool try_to_unmap_one(struct page *page,
> > > > struct vm_area_struct *vma,>
> > > > 
> > > >   page_vma_mapped_walk_done();
> > > >   break;
> > > >   
> > > >   }
> > > > 
> > > > - if (flags & TTU_MUNLOCK)
> > > > - continue;
> > > > 
> > > >   }
> > > >   
> > > >   /* Unexpected PMD-mapped THP? */
> > > > 
> > > > @@ -1784,8 +1778,39 @@ bool try_to_unmap(struct page *page, enum
> > > > ttu_flags
> > > > flags)>
> > > > 
> > > >   return !page_mapcount(page) ? true : false;
> > > >  
> > > >  }
> > > 
> > > Please add a comment here, especially around locking.
> 
> Did you miss this comment?  I think the name confusion alone means this
> needs some documentation.  It's also worth mentioning arg is unused.

Ack. Was meant to come back to that after discussing some of the locking 
questions below. The other side effect of splitting this code out is it leaves 
space for more specific documentation which is only a good thing. I will try 
and summarise some of the discussion below into a comment here.

> > > > +static bool page_mlock_one(struct page *page, struct vm_area_struct
> > > > *vma,
> > > > +  unsigned long address, void *arg)
> > > > +{
> > > > + struct page_vma_mapped_walk pvmw = {
> > > > + .page = page,
> > > > + .vma = vma,
> > > > + .address = address,
> > > > + };
> > > > +
> > > > + /* munlock has nothing to gain from examining un-locked vmas */
> > > > + if (!(vma->vm_flags & VM_LOCKED))
> > > > + return true;
> > > 
> > > The logic here doesn't make sense.  You called page_mlock_one() on a VMA
> > > that isn't locked and it returns true?  Is this a check to see if the
> > > VMA has zero mlock'ed pages?
> > 
> > I'm pretty sure the logic is correct. This is used for an rmap_walk, so we
> > return true to continue the page table scan to see if other VMAs have the
> > page locked.
> 
> yes, sorry.  The logic is correct but doesn't read as though it does.
> I cannot see what is going on easily and there are no comments stating
> what is happening.

Thanks for confirming. The documentation in Documentation/vm/unevictable-
lru.rst is helpful for higher level context but I will put some comments here 
around the logic.
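
For example, the comment could spell out the rmap_walk() return
convention along these lines (only a sketch of the wording, not the
final comment):

	/*
	 * rmap_walk() convention: returning true means "keep walking",
	 * i.e. go on to check the other VMAs mapping this page; returning
	 * false stops the walk.  An un-mlocked (!VM_LOCKED) vma has
	 * nothing to contribute here, so skip it but continue the scan.
	 */
	if (!(vma->vm_flags & VM_LOCKED))
		return true;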

> > > > +
> > > > + while (page_vma_mapped_walk()) {
> > > > + /* PTE-mapped THP are never mlocked */
> > > > + if (!PageTransCompound(page)) {
> > > > + /*
> > > > +  * Holding pte lock, we do *not* need
> > > > +  * mmap_lock here
> > > > +  */
> > > 
> > > Are you sure?  I think you at least need to hold the mmap lock for
> > > reading to ensure there's no race here?  mlock_vma_page() eludes to such
> > > a scenario when lazy mlocking.
> > 
> > Not really. I don't claim to be an mlock expert but as this is a clean-up
> > for try_to_unmap() the intent was to not change existing behaviour.
> > 
> > However presenting the function in this simplified form did raise this and
> > some other questions during previous reviews - see
> > https://lore.kernel.org/
> > dri-devel/20210331115746.ga1463...@nvidia.com/ for the previous
> > discussion.
> 
> From what I can see, at least the following paths have mmap_lock held
> for writing:
> 
> munlock_vma_pages_range() from __do_munmap()
> munlock_vma_pages_range() from remap_file_pages()
> 
> > To answer the questions around locking though I did do some git sha1
> > mining. The best explanation seemed to be contained in
> > https://git.kernel.org/pub/scm/
> > linux/kernel/git/torvalds/linux.git/commit/?
> > id=b87537d9e2feb30f6a962f27eb32768682698d3b from Hugh (whom I've added
> > again here in case he can help answer some of these).
> 
> Thanks for the pointer.  That race doesn't make the lock unnecessary.
> It is the exception to the rule because the 

Re: [PATCH v2 1/3] dt-bindings: display: convert faraday,tve200

2021-05-20 Thread Rob Herring
On Wed, 19 May 2021 20:35:45 +, Corentin Labbe wrote:
> Converts display/faraday,tve200.txt to yaml.
> 
> Signed-off-by: Corentin Labbe 
> ---
> Changes since v1:
> - added two subsequent patchs fixing issue found when converting
> - fixed all issues reported by Rob Herring
>  .../bindings/display/faraday,tve200.txt   | 54 ---
>  .../bindings/display/faraday,tve200.yaml  | 68 +++
>  2 files changed, 68 insertions(+), 54 deletions(-)
>  delete mode 100644 
> Documentation/devicetree/bindings/display/faraday,tve200.txt
>  create mode 100644 
> Documentation/devicetree/bindings/display/faraday,tve200.yaml
> 

Reviewed-by: Rob Herring 


linux-next: build failure after merge of the drm-intel tree

2021-05-20 Thread Stephen Rothwell
Hi all,

After merging the drm-intel tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/gpu/drm/i915/gvt/handlers.c: In function 'init_skl_mmio_info':
drivers/gpu/drm/i915/gvt/handlers.c:3345:9: error: 'CSR_SSP_BASE' undeclared 
(first use in this function); did you mean 'DMC_SSP_BASE'?
 3345 |  MMIO_D(CSR_SSP_BASE, D_SKL_PLUS);
  | ^~~~
drivers/gpu/drm/i915/gvt/handlers.c:2120:48: note: in definition of macro 
'MMIO_F'
 2120 |  ret = new_mmio_info(gvt, i915_mmio_reg_offset(reg), \
  |^~~
drivers/gpu/drm/i915/gvt/handlers.c:3345:2: note: in expansion of macro 'MMIO_D'
 3345 |  MMIO_D(CSR_SSP_BASE, D_SKL_PLUS);
  |  ^~
drivers/gpu/drm/i915/gvt/handlers.c:3345:9: note: each undeclared identifier is 
reported only once for each function it appears in
 3345 |  MMIO_D(CSR_SSP_BASE, D_SKL_PLUS);
  | ^~~~
drivers/gpu/drm/i915/gvt/handlers.c:2120:48: note: in definition of macro 
'MMIO_F'
 2120 |  ret = new_mmio_info(gvt, i915_mmio_reg_offset(reg), \
  |^~~
drivers/gpu/drm/i915/gvt/handlers.c:3345:2: note: in expansion of macro 'MMIO_D'
 3345 |  MMIO_D(CSR_SSP_BASE, D_SKL_PLUS);
  |  ^~
drivers/gpu/drm/i915/gvt/handlers.c:3346:9: error: 'CSR_HTP_SKL' undeclared 
(first use in this function); did you mean 'DMC_HTP_SKL'?
 3346 |  MMIO_D(CSR_HTP_SKL, D_SKL_PLUS);
  | ^~~
drivers/gpu/drm/i915/gvt/handlers.c:2120:48: note: in definition of macro 
'MMIO_F'
 2120 |  ret = new_mmio_info(gvt, i915_mmio_reg_offset(reg), \
  |^~~
drivers/gpu/drm/i915/gvt/handlers.c:3346:2: note: in expansion of macro 'MMIO_D'
 3346 |  MMIO_D(CSR_HTP_SKL, D_SKL_PLUS);
  |  ^~
drivers/gpu/drm/i915/gvt/handlers.c:3347:9: error: 'CSR_LAST_WRITE' undeclared 
(first use in this function); did you mean 'DMC_LAST_WRITE'?
 3347 |  MMIO_D(CSR_LAST_WRITE, D_SKL_PLUS);
  | ^~
drivers/gpu/drm/i915/gvt/handlers.c:2120:48: note: in definition of macro 
'MMIO_F'
 2120 |  ret = new_mmio_info(gvt, i915_mmio_reg_offset(reg), \
  |^~~
drivers/gpu/drm/i915/gvt/handlers.c:3347:2: note: in expansion of macro 'MMIO_D'
 3347 |  MMIO_D(CSR_LAST_WRITE, D_SKL_PLUS);
  |  ^~
In file included from drivers/gpu/drm/i915/i915_drv.h:64,
 from drivers/gpu/drm/i915/gvt/handlers.c:39:
drivers/gpu/drm/i915/gvt/handlers.c: At top level:
drivers/gpu/drm/i915/gvt/handlers.c:3658:21: error: 'CSR_MMIO_START_RANGE' 
undeclared here (not in a function); did you mean 'DMC_MMIO_START_RANGE'?
 3658 |  {D_SKL_PLUS, _MMIO(CSR_MMIO_START_RANGE), 0x3000, NULL, NULL},
  | ^~~~
drivers/gpu/drm/i915/i915_reg.h:185:47: note: in definition of macro '_MMIO'
  185 | #define _MMIO(r) ((const i915_reg_t){ .reg = (r) })
  |   ^

Caused by commit

  0633cdcbaa77 ("drm/i915/dmc: Rename macro names containing csr")
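
A likely local fixup -- assuming the gvt handlers simply follow the
rename, which the compiler hints above already suggest -- would be
(sketch only; the actual resolution may differ):

	MMIO_D(DMC_SSP_BASE, D_SKL_PLUS);
	MMIO_D(DMC_HTP_SKL, D_SKL_PLUS);
	MMIO_D(DMC_LAST_WRITE, D_SKL_PLUS);
	...
	{D_SKL_PLUS, _MMIO(DMC_MMIO_START_RANGE), 0x3000, NULL, NULL},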

I have used the drm-intel tree from next-20210520 for today.

-- 
Cheers,
Stephen Rothwell




Re: linux-next: manual merge of the drm-intel tree with Linus' tree

2021-05-20 Thread Stephen Rothwell
Hi all,

On Thu, 20 May 2021 10:19:10 +1000 Stephen Rothwell  
wrote:
>
> Today's linux-next merge of the drm-intel tree got a conflict in:
> 
>   drivers/gpu/drm/i915/i915_mm.c
> 
> between commit:
> 
>   293837b9ac8d ("Revert "i915: fix remap_io_sg to verify the pgprot"")
> 
> from Linus' tree and commit:
> 
>   ec279384c6a0 ("drm/i915: Initialize err in remap_io_sg()")
> 
> from the drm-intel tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> 
> diff --cc drivers/gpu/drm/i915/i915_mm.c
> index 9a777b0ff59b,25576fa73ff0..
> --- a/drivers/gpu/drm/i915/i915_mm.c
> +++ b/drivers/gpu/drm/i915/i915_mm.c
> @@@ -82,13 -46,8 +82,13 @@@ int remap_io_sg(struct vm_area_struct *
>   unsigned long addr, unsigned long size,
>   struct scatterlist *sgl, resource_size_t iobase)
>   {
>  -unsigned long pfn, len, remapped = 0;
>  +struct remap_pfn r = {
>  +.mm = vma->vm_mm,
>  +.prot = vma->vm_page_prot,
>  +.sgt = __sgt_iter(sgl, use_dma(iobase)),
>  +.iobase = iobase,
>  +};
> - int err;
> + int err = 0;
>   
>   /* We rely on prevalidation of the io-mapping to skip track_pfn(). */
>   GEM_BUG_ON((vma->vm_flags & EXPECTED_FLAGS) != EXPECTED_FLAGS);

This is now a conflict between the drm tree and Linus' tree.

-- 
Cheers,
Stephen Rothwell




linux-next: manual merge of the amdgpu tree with the drm-misc tree

2021-05-20 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the amdgpu tree got a conflict in:

  drivers/gpu/drm/amd/amdkfd/kfd_device.c

between commit:

  e9669fb78262 ("drm/amdgpu: Add early fini callback")

from the drm-misc tree and commit:

  814ab9930cfd ("drm/amdkfd: register HMM device private zone")

from the amdgpu tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/amd/amdkfd/kfd_device.c
index b066aa009b6f,80015e866498..
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@@ -861,6 -891,8 +891,7 @@@ out
  void kgd2kfd_device_exit(struct kfd_dev *kfd)
  {
if (kfd->init_complete) {
 -  kgd2kfd_suspend(kfd, false);
+   svm_migrate_fini((struct amdgpu_device *)kfd->kgd);
device_queue_manager_uninit(kfd->dqm);
kfd_interrupt_exit(kfd);
kfd_topology_remove_device(kfd);




linux-next: manual merge of the amdgpu tree with the drm-misc tree

2021-05-20 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the amdgpu tree got a conflict in:

  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c

between commit:

  35bba8313b95 ("drm/amdgpu: Convert driver sysfs attributes to static 
attributes")

from the drm-misc tree and commit:

  589939d40116 ("drm/amdgpu: fix coding style and documentation in 
amdgpu_vram_mgr.c")

from the amdgpu tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index a99d196b05df,a52e17ed3df6..
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@@ -162,74 -181,6 +181,10 @@@ static struct attribute *amdgpu_vram_mg
NULL
  };
  
 +const struct attribute_group amdgpu_vram_mgr_attr_group = {
 +  .attrs = amdgpu_vram_mgr_attributes
 +};
 +
- static const struct ttm_resource_manager_func amdgpu_vram_mgr_func;
- 
- /**
-  * amdgpu_vram_mgr_init - init VRAM manager and DRM MM
-  *
-  * @adev: amdgpu_device pointer
-  *
-  * Allocate and initialize the VRAM manager.
-  */
- int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
- {
-   struct amdgpu_vram_mgr *mgr = >mman.vram_mgr;
-   struct ttm_resource_manager *man = >manager;
- 
-   ttm_resource_manager_init(man, adev->gmc.real_vram_size >> PAGE_SHIFT);
- 
-   man->func = _vram_mgr_func;
- 
-   drm_mm_init(>mm, 0, man->size);
-   spin_lock_init(>lock);
-   INIT_LIST_HEAD(>reservations_pending);
-   INIT_LIST_HEAD(>reserved_pages);
- 
-   ttm_set_driver_manager(>mman.bdev, TTM_PL_VRAM, >manager);
-   ttm_resource_manager_set_used(man, true);
-   return 0;
- }
- 
- /**
-  * amdgpu_vram_mgr_fini - free and destroy VRAM manager
-  *
-  * @adev: amdgpu_device pointer
-  *
-  * Destroy and free the VRAM manager, returns -EBUSY if ranges are still
-  * allocated inside it.
-  */
- void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
- {
-   struct amdgpu_vram_mgr *mgr = >mman.vram_mgr;
-   struct ttm_resource_manager *man = >manager;
-   int ret;
-   struct amdgpu_vram_reservation *rsv, *temp;
- 
-   ttm_resource_manager_set_used(man, false);
- 
-   ret = ttm_resource_manager_evict_all(>mman.bdev, man);
-   if (ret)
-   return;
- 
-   spin_lock(>lock);
-   list_for_each_entry_safe(rsv, temp, >reservations_pending, node)
-   kfree(rsv);
- 
-   list_for_each_entry_safe(rsv, temp, >reserved_pages, node) {
-   drm_mm_remove_node(>mm_node);
-   kfree(rsv);
-   }
-   drm_mm_takedown(>mm);
-   spin_unlock(>lock);
- 
-   ttm_resource_manager_cleanup(man);
-   ttm_set_driver_manager(>mman.bdev, TTM_PL_VRAM, NULL);
- }
- 
  /**
   * amdgpu_vram_mgr_vis_size - Calculate visible node size
   *
@@@ -444,10 -396,10 +400,10 @@@ static int amdgpu_vram_mgr_new(struct t
pages_per_node = HPAGE_PMD_NR;
  #else
/* default to 2MB */
-   pages_per_node = (2UL << (20UL - PAGE_SHIFT));
+   pages_per_node = 2UL << (20UL - PAGE_SHIFT);
  #endif
-   pages_per_node = max((uint32_t)pages_per_node,
-tbo->page_alignment);
+   pages_per_node = max_t(uint32_t, pages_per_node,
 - mem->page_alignment);
++ tbo->page_alignment);
num_nodes = DIV_ROUND_UP(mem->num_pages, pages_per_node);
}
  
@@@ -465,38 -417,29 +421,29 @@@
mem->start = 0;
pages_left = mem->num_pages;
  
-   spin_lock(>lock);
-   for (i = 0; pages_left >= pages_per_node; ++i) {
-   unsigned long pages = rounddown_pow_of_two(pages_left);
- 
-   /* Limit maximum size to 2GB due to SG table limitations */
-   pages = min(pages, (2UL << (30 - PAGE_SHIFT)));
- 
-   r = drm_mm_insert_node_in_range(mm, [i], pages,
-   pages_per_node, 0,
-   place->fpfn, lpfn,
-   mode);
-   if (unlikely(r))
-   break;
- 
-   vis_usage += amdgpu_vram_mgr_vis_size(adev, [i]);
-   amdgpu_vram_mgr_virt_start(mem, [i]);
-   pages_left -= pages;
-   }
+   /* Limit maximum size to 2GB due to SG table limitations */
+   pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
  
-   for (; pages_left; ++i) {
-   unsigned long pages = min(pages_left, pages_per_node);
+   i = 0;
+   

linux-next: manual merge of the amdgpu tree with the drm-misc tree

2021-05-20 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the amdgpu tree got a conflict in:

  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

between commit:

  f89f8c6bafd0 ("drm/amdgpu: Guard against write accesses after device removal")

from the drm-misc tree and commit:

  0ccc3ccf5b3a ("drm/amdgpu: re-apply "use the new cursor in the VM code" v2")

from the amdgpu tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 90c34491f85d,57a6ad04118c..
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@@ -1594,23 -1618,21 +1620,24 @@@ static int amdgpu_vm_update_ptes(struc
   * Returns:
   * 0 for success, -EINVAL for failure.
   */
- static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
-  struct amdgpu_device *bo_adev,
-  struct amdgpu_vm *vm, bool immediate,
-  bool unlocked, struct dma_resv *resv,
-  uint64_t start, uint64_t last,
-  uint64_t flags, uint64_t offset,
-  struct drm_mm_node *nodes,
-  dma_addr_t *pages_addr,
-  struct dma_fence **fence)
+ int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
+   struct amdgpu_device *bo_adev,
+   struct amdgpu_vm *vm, bool immediate,
+   bool unlocked, struct dma_resv *resv,
+   uint64_t start, uint64_t last,
+   uint64_t flags, uint64_t offset,
+   struct ttm_resource *res,
+   dma_addr_t *pages_addr,
+   struct dma_fence **fence,
+   bool *table_freed)
  {
struct amdgpu_vm_update_params params;
+   struct amdgpu_res_cursor cursor;
enum amdgpu_sync_mode sync_mode;
-   uint64_t pfn;
 -  int r;
 +  int r, idx;
 +
 +  if (!drm_dev_enter(>ddev, ))
 +  return -ENODEV;
  
memset(, 0, sizeof(params));
params.adev = adev;
@@@ -1717,9 -1722,11 +1727,12 @@@
  
r = vm->update_funcs->commit(, fence);
  
+   if (table_freed)
+   *table_freed = params.table_freed;
+ 
  error_unlock:
amdgpu_vm_eviction_unlock(vm);
 +  drm_dev_exit(idx);
return r;
  }
  




linux-next: manual merge of the amdgpu tree with the drm-misc tree

2021-05-20 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the amdgpu tree got a conflict in:

  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c

between commit:

  35bba8313b95 ("drm/amdgpu: Convert driver sysfs attributes to static 
attributes")

from the drm-misc tree and commit:

  a614b336f1c1 ("drm/amdgpu: fix coding style and documentation in 
amdgpu_gtt_mgr.c")

from the amdgpu tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index a4404da8ca6d,8860545344c7..
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@@ -75,75 -80,6 +80,16 @@@ static DEVICE_ATTR(mem_info_gtt_total, 
  static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO,
   amdgpu_mem_info_gtt_used_show, NULL);
  
 +static struct attribute *amdgpu_gtt_mgr_attributes[] = {
 +  _attr_mem_info_gtt_total.attr,
 +  _attr_mem_info_gtt_used.attr,
 +  NULL
 +};
 +
 +const struct attribute_group amdgpu_gtt_mgr_attr_group = {
 +  .attrs = amdgpu_gtt_mgr_attributes
 +};
 +
- static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func;
- /**
-  * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
-  *
-  * @adev: amdgpu_device pointer
-  * @gtt_size: maximum size of GTT
-  *
-  * Allocate and initialize the GTT manager.
-  */
- int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
- {
-   struct amdgpu_gtt_mgr *mgr = >mman.gtt_mgr;
-   struct ttm_resource_manager *man = >manager;
-   uint64_t start, size;
- 
-   man->use_tt = true;
-   man->func = _gtt_mgr_func;
- 
-   ttm_resource_manager_init(man, gtt_size >> PAGE_SHIFT);
- 
-   start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
-   size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
-   drm_mm_init(>mm, start, size);
-   spin_lock_init(>lock);
-   atomic64_set(>available, gtt_size >> PAGE_SHIFT);
- 
-   ttm_set_driver_manager(>mman.bdev, TTM_PL_TT, >manager);
-   ttm_resource_manager_set_used(man, true);
-   return 0;
- }
- 
- /**
-  * amdgpu_gtt_mgr_fini - free and destroy GTT manager
-  *
-  * @adev: amdgpu_device pointer
-  *
-  * Destroy and free the GTT manager, returns -EBUSY if ranges are still
-  * allocated inside it.
-  */
- void amdgpu_gtt_mgr_fini(struct amdgpu_device *adev)
- {
-   struct amdgpu_gtt_mgr *mgr = >mman.gtt_mgr;
-   struct ttm_resource_manager *man = >manager;
-   int ret;
- 
-   ttm_resource_manager_set_used(man, false);
- 
-   ret = ttm_resource_manager_evict_all(>mman.bdev, man);
-   if (ret)
-   return;
- 
-   spin_lock(>lock);
-   drm_mm_takedown(>mm);
-   spin_unlock(>lock);
- 
-   ttm_resource_manager_cleanup(man);
-   ttm_set_driver_manager(>mman.bdev, TTM_PL_TT, NULL);
- }
- 
  /**
   * amdgpu_gtt_mgr_has_gart_addr - Check if mem has address space
   *
@@@ -306,3 -249,76 +259,61 @@@ static const struct ttm_resource_manage
.free = amdgpu_gtt_mgr_del,
.debug = amdgpu_gtt_mgr_debug
  };
+ 
+ /**
+  * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
+  *
+  * @adev: amdgpu_device pointer
+  * @gtt_size: maximum size of GTT
+  *
+  * Allocate and initialize the GTT manager.
+  */
+ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
+ {
+   struct amdgpu_gtt_mgr *mgr = >mman.gtt_mgr;
+   struct ttm_resource_manager *man = >manager;
+   uint64_t start, size;
 -  int ret;
+ 
+   man->use_tt = true;
+   man->func = _gtt_mgr_func;
+ 
+   ttm_resource_manager_init(man, gtt_size >> PAGE_SHIFT);
+ 
+   start = AMDGPU_GTT_MAX_TRANSFER_SIZE * AMDGPU_GTT_NUM_TRANSFER_WINDOWS;
+   size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
+   drm_mm_init(>mm, start, size);
+   spin_lock_init(>lock);
+   atomic64_set(>available, gtt_size >> PAGE_SHIFT);
+ 
 -  ret = device_create_file(adev->dev, _attr_mem_info_gtt_total);
 -  if (ret) {
 -  DRM_ERROR("Failed to create device file mem_info_gtt_total\n");
 -  return ret;
 -  }
 -  ret = device_create_file(adev->dev, _attr_mem_info_gtt_used);
 -  if (ret) {
 -  DRM_ERROR("Failed to create device file mem_info_gtt_used\n");
 -  return ret;
 -  }
 -
+   ttm_set_driver_manager(>mman.bdev, TTM_PL_TT, >manager);
+   ttm_resource_manager_set_used(man, true);
+   return 0;
+ }
+ 
+ /**
+  * amdgpu_gtt_mgr_fini - free and destroy GTT manager
+  *
+  * @adev: amdgpu_device pointer
+  *
+  * Destroy and free the GTT manager, returns -EBUSY if ranges are still
+  * 

Re: [PATCH 3/7] component: Introduce struct aggregate_device

2021-05-20 Thread Saravana Kannan
On Wed, May 19, 2021 at 5:25 PM Stephen Boyd  wrote:
>
> Replace 'struct master' with 'struct aggregate_device' and then rename
> 'master' to 'adev' everywhere in the code. While we're here, put a
> struct device inside the aggregate device so that we can register it
> with a bus_type in the next patch.
>
> The diff is large but that's because this is mostly a rename, where
> sometimes 'master' is replaced with 'adev' and other times it is
> replaced with 'parent' to indicate that the struct device that was being
> used is actually the parent of the aggregate device and driver.
>
> Cc: Daniel Vetter 
> Cc: "Rafael J. Wysocki" 
> Cc: Rob Clark 
> Cc: Russell King 
> Cc: Saravana Kannan 
> Signed-off-by: Stephen Boyd 
> ---
>  drivers/base/component.c  | 249 --
>  include/linux/component.h |   2 +-
>  2 files changed, 134 insertions(+), 117 deletions(-)
>
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index 5e79299f6c3f..55e30e0b0952 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -9,6 +9,7 @@
>   */
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -58,18 +59,21 @@ struct component_match {
> struct component_match_array *compare;
>  };
>
> -struct master {
> +struct aggregate_device {
> struct list_head node;
> bool bound;
>
> const struct component_master_ops *ops;
> struct device *parent;
> +   struct device dev;
> struct component_match *match;
> +
> +   int id;
>  };
>
>  struct component {
> struct list_head node;
> -   struct master *master;
> +   struct aggregate_device *adev;
> bool bound;
>
> const struct component_ops *ops;
> @@ -79,7 +83,9 @@ struct component {
>
>  static DEFINE_MUTEX(component_mutex);
>  static LIST_HEAD(component_list);
> -static LIST_HEAD(masters);
> +static LIST_HEAD(aggregate_devices);
> +
> +static DEFINE_IDA(aggregate_ida);
>
>  #ifdef CONFIG_DEBUG_FS
>
> @@ -87,12 +93,12 @@ static struct dentry *component_debugfs_dir;
>
>  static int component_devices_show(struct seq_file *s, void *data)
>  {
> -   struct master *m = s->private;
> +   struct aggregate_device *m = s->private;
> struct component_match *match = m->match;
> size_t i;
>
> mutex_lock(_mutex);
> -   seq_printf(s, "%-40s %20s\n", "master name", "status");
> +   seq_printf(s, "%-40s %20s\n", "aggregate_device name", "status");
> seq_puts(s, 
> "-\n");
> seq_printf(s, "%-40s %20s\n\n",
>dev_name(m->parent), m->bound ? "bound" : "not bound");
> @@ -122,46 +128,46 @@ static int __init component_debug_init(void)
>
>  core_initcall(component_debug_init);
>
> -static void component_master_debugfs_add(struct master *m)
> +static void component_master_debugfs_add(struct aggregate_device *m)
>  {
> debugfs_create_file(dev_name(m->parent), 0444, component_debugfs_dir, 
> m,
> _devices_fops);
>  }
>
> -static void component_master_debugfs_del(struct master *m)
> +static void component_master_debugfs_del(struct aggregate_device *m)
>  {
> debugfs_remove(debugfs_lookup(dev_name(m->parent), 
> component_debugfs_dir));
>  }
>
>  #else
>
> -static void component_master_debugfs_add(struct master *m)
> +static void component_master_debugfs_add(struct aggregate_device *m)
>  { }
>
> -static void component_master_debugfs_del(struct master *m)
> +static void component_master_debugfs_del(struct aggregate_device *m)
>  { }
>
>  #endif
>
> -static struct master *__master_find(struct device *parent,
> +static struct aggregate_device *__aggregate_find(struct device *parent,
> const struct component_master_ops *ops)
>  {
> -   struct master *m;
> +   struct aggregate_device *m;
>
> -   list_for_each_entry(m, , node)
> +   list_for_each_entry(m, _devices, node)
> if (m->parent == parent && (!ops || m->ops == ops))
> return m;
>
> return NULL;
>  }
>
> -static struct component *find_component(struct master *master,
> +static struct component *find_component(struct aggregate_device *adev,
> struct component_match_array *mc)
>  {
> struct component *c;
>
> list_for_each_entry(c, _list, node) {
> -   if (c->master && c->master != master)
> +   if (c->adev && c->adev != adev)
> continue;
>
> if (mc->compare && mc->compare(c->dev, mc->data))
> @@ -175,101 +181,102 @@ static struct component *find_component(struct master 
> *master,
> return NULL;
>  }
>
> -static int find_components(struct master *master)
> +static int find_components(struct aggregate_device *adev)
>  {
> -   struct component_match *match = master->match;
> +   struct component_match *match = adev->match;
> size_t i;

Freenode fallout

2021-05-20 Thread Lyude Paul
Hi everyone! As I'm sure most of you heard by now, Freenode's staff has
had a falling out and it's been recommended by their staff that projects
consider the network a hostile entity. I won't go into the details here,
but those who are interested can read up here:

https://lwn.net/Articles/856543/

At the moment, the vast majority of IRC channels for various Freedesktop
and X.org projects currently reside on Freenode. While the X.org
foundation doesn't have any official policies on IRC hosting, because of
how frequently IRC is used by various projects in our community we on
the board decided to make a non-binding recommendation on an IRC network
we think would be good to move to. We're also looking at ways to provide
some resources to help channels move en masse. We hope this will enable
interested projects to migrate to the same new IRC network in order to
ensure they're all in the same place.

After considering Libera and OFTC as options, the board settled on
recommending OFTC. The primary reason for this is because OFTC is
associated with our parent foundation SPI, and has a long and well known
history of involvement with the open source community. As well, the
board believes OFTC's current governance model is a lot clearer than
Libera's.

-- 
Sincerely,
   Lyude Paul (she/her)
   Software Engineer at Red Hat
   
Note: I deal with a lot of emails and have a lot of bugs on my plate. If you've
asked me a question, are waiting for a review/merge on a patch, etc. and I
haven't responded in a while, please feel free to send me another email to check
on my status. I don't bite!



Re: [PATCH 0/7] component: Make into an aggregate bus

2021-05-20 Thread Saravana Kannan
On Wed, May 19, 2021 at 6:41 PM Stephen Boyd  wrote:
>
> Quoting Saravana Kannan (2021-05-19 18:27:50)
> > On Wed, May 19, 2021 at 5:25 PM Stephen Boyd  wrote:
> > >
> > > This series is from discussion we had on reordering the device lists for
> > > drm shutdown paths[1]. I've introduced an 'aggregate' bus that we put
> > > the aggregate device onto and then we probe the device once all the
> > > components are probed and call component_add(). The probe/remove hooks
> > > are where the bind/unbind calls go, and then a shutdown hook is added
> > > that can be used to shutdown the drm display pipeline at the right time.
> > >
> > > This works for me on my sc7180 board, but I'm currently struggling with
> > > the last patch where we migrate the msm driver. It runs into a runtime
> > > PM problem where the parent device isn't runtime PM enabled yet. I'm
> > > still trying to figure out a clean solution there. Moving runtime PM
> > > around breaks boot and I think that's because the power domain is off.
> > >
> > > Cc: Daniel Vetter 
> > > Cc: "Rafael J. Wysocki" 
> > > Cc: Rob Clark 
> > > Cc: Russell King 
> > > Cc: Saravana Kannan 
> > >
> > > [1] https://lore.kernel.org/r/20210508074118.1621729-1-swb...@chromium.org
> > >
> >
> > I skimmed through the series and in general the idea is good, but I'm
> > not sure why each component user needs to be converted/"modern" before
> > it can make use of the benefits of this series. Why not just have
> > wrapper functions around the component ops that the new aggregate bus
> > driver can just call? That'll give all the existing component users
> > the new ability to use the new ops without having to have two
> > versions.
>
> The existing users can only have one or the other. Either use the ops
> structure or use the struct aggregate_driver. What benefits of this
> series are they not gaining?

As I mentioned earlier, if we add device links between the aggregate
device (consumer) and all the component devices (suppliers), it'll
take care of a lot of the ordering issues (probe, suspend, runtime PM)
and dependency issues (unbind the master device if a component driver
unbinds). It'll allow us to delete a lot of the code in the component
framework too. I can send the patch for the device links once your
series settles. So having two implementations gets in the way of that
cleanup and code improvement, because we'll have to keep a lot of the
component code around just for the "legacy" ops.
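
As a rough illustration of the device-link idea (a sketch only, built on
the existing driver-core API; the adev/component_dev names and the exact
call site are assumptions, not code from this series):

	struct device_link *link;

	/* Tie the aggregate device (consumer) to each component device
	 * (supplier).  PM_RUNTIME/RPM_ACTIVE propagate runtime-PM state,
	 * and the link is dropped automatically when the consumer unbinds. */
	link = device_link_add(&adev->dev, component_dev,
			       DL_FLAG_AUTOREMOVE_CONSUMER |
			       DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);
	if (!link)
		return -EINVAL;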

> > That'll also allow us to do other improvements (I have some
> > in mind) that'll apply to all the component users instead of only the
> > converted ones.
>
> What do you have in mind? I didn't want to convert drivers over to the
> new way of doing things without making them consciously change their
> code.

What ordering/behavior would you be changing with the new ops? If the
new shutdown ops isn't used, it really shouldn't change anything. Put
another way, if we ignore your msm driver changes, we should be able
to switch to having a real device for the "master" without making any
functional change. If you are causing any functional change with the
new ops, maybe you can key it off a flag that needs to be set? That
way, we'll have one API/ops but still be backward compatible if you
are worried about breaking existing users?

> Otherwise I worry it will break things in random, subtle ways. The
> last patch, as I mentioned above in the cover, causes warnings because
> the display driver is enabling runtime PM in an odd spot as part of the
> bind callback of the aggregate/master. That should move out of there and
> into the msm_pdev driver that registers the aggregate from what I can
> tell.

Can you give more context? I think if you create device links with
RPM_ACTIVE and PM_RUNTIME flags, it should ensure runtime PM
correctness.

-Saravana


Re: [Intel-gfx] [Mesa-dev] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Jason Ekstrand
On Thu, May 20, 2021 at 10:46 AM Matthew Brost  wrote:
>
> On Thu, May 20, 2021 at 01:11:59PM +0200, Christian König wrote:
> > Am 19.05.21 um 18:51 schrieb Matthew Brost:
> > > On Wed, May 19, 2021 at 01:45:39PM +0200, Christian König wrote:
> > > > Oh, yeah we call that gang submit on the AMD side.
> > > >
> > > > Had already some internal discussions how to implement this, but so far
> > > > couldn't figure out how to cleanly introduce that into the DRM 
> > > > scheduler.
> > > >
> > > > Can you briefly describe in a few words how that is supposed to work on 
> > > > the
> > > > Intel side?

On Intel, we actually have two cases which don't fit the current
drm/scheduler model well: balanced and bonded.

In the balanced model, we want to submit a batch which can go to any
one of some set of engines and we don't care which.  It's up to the
kernel to pick an engine.  Imagine you had 64 identical HW compute
queues, for instance.  This could be done by making all the identical
engines share a single drm_gpu_scheduler and round-robin around the HW
queues or something.  I don't know that we strictly need drm/scheduler
to be aware of it but it might be nice if it grew support for this
mode so we could maintain a 1:1 relationship between HW queues and
drm_gpu_schedulers.  That said, I'm not sure how this would play with
GuC queues so maybe it doesn't help?
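
For reference, the drm scheduler does already let one entity load-balance
across a list of schedulers, which is roughly the 1:1 shape sketched
above; a minimal illustration (the names are made up, this is not i915
code):

	/* One drm_gpu_scheduler per identical HW queue ... */
	struct drm_gpu_scheduler *scheds[NUM_IDENTICAL_QUEUES];

	/* ... and one entity allowed to pick among them; the scheduler
	 * core chooses the least-loaded run queue for each job. */
	ret = drm_sched_entity_init(&ctx->entity, DRM_SCHED_PRIORITY_NORMAL,
				    scheds, ARRAY_SIZE(scheds), NULL);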

The bonded model is like your ganged, I think.  We want to submit N
batches to run in parallel.  And they actually have to be executing on
the GPU simultaneously and not just sort-of at similar times.  We need
this for video.  There are also potential use-cases in Vulkan or even
GL that might be able to use this.  One difference with the balanced
mode is that bonds don't, strictly speaking, need to be on the same
type of engine.  Imagine, for instance, a 3D batch with a parallel
compute batch doing vertex pre-processing.

I'm pretty sure the bonded case is something that the mobile drivers
(panfrost, etc.) would like as well for doing Vulkan on tilers where
you often have to have two command buffers running in parallel.
They're currently doing it by submitting a giant pile of batches where
they split the batch and add sync primitives every time some GL call
requires them to sync between fragment and vertex pipes.

So, to sum up, I think there's likely some good collaboration to be
had here for everyone. :-)

--Jason
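
(On the composite-fence point Matt makes below: a minimal sketch of
folding the N per-job fences into one fence for the exclusive dma-resv
slot -- num_jobs, job_fences and obj are placeholders, not i915 code:)

	struct dma_fence_array *array;

	/* One fence that signals only when every job in the parallel
	 * set has completed. */
	array = dma_fence_array_create(num_jobs, job_fences,
				       dma_fence_context_alloc(1), 1,
				       false);
	if (!array)
		return -ENOMEM;

	dma_resv_add_excl_fence(obj->resv, &array->base);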

> > > Sure, I've done a quick PoC internally and have been able to hook this
> > > into the DRM scheduler.
> > >
> > > Basically each BB still maps to a single job as each job is somewhat
> > > unique (e.g. each job has its own ring, lrc, seqno, etc...). However all
> > > the jobs configured to run in parallel map to a single sched_entity
> > > which maintains the order each job was generated from the execbuf IOCTL
> > > (1 - N). When the backend receives jobs 1 to N - 1 it basically just
> > > updates some internal state. When the backend sees job N (last job) it
> > > actually does the submit for jobs 1 - N which with GuC submission is a
> > > simple command moving the LRC tail of the N jobs.
> > >
> > > Daniel has suggested that we create a single job for the N BBs but that
> > > would be huge rework to the internals of the i915 and likely won't
> > > happen by the time this code first lands.
> > >
> > > Also worth noting one way a job isn't really treated individually is
> > > the excl slot with dma-resv. In that case we create a composite fence of
> > > all jobs (dma_fence_array).
> >
> > Yeah, that's something we have discussed as well.
> >
> > How do you prevent the scheduler from over committing to a single ring
> > buffer in this scenario?
> >
>
> Each job has its own ring, the execbuf IOCTL throttles itself for each
> job if there isn't space in the ring. This is exactly the same as
> non-parallel submits.
>
> I think this is what you were asking? If not, maybe try explaining the
> question a bit more.
>
> Matt
>
> > Christian.
> >
> > >
> > > Matt
> > >
> > > > Thanks,
> > > > Christian.
> > > >
> > > > Am 19.05.21 um 01:58 schrieb Matthew Brost:
> > > > > Add entry fpr i915 new parallel submission uAPI plan.
> > > > >
> > > > > v2:
> > > > >(Daniel Vetter):
> > > > > - Expand logical order explaination
> > > > > - Add dummy header
> > > > > - Only allow N BBs in execbuf IOCTL
> > > > > - Configure parallel submission per slot not per gem context
> > > > >
> > > > > Cc: Tvrtko Ursulin 
> > > > > Cc: Tony Ye 
> > > > > CC: Carl Zhang 
> > > > > Cc: Daniel Vetter 
> > > > > Cc: Jason Ekstrand 
> > > > > Signed-off-by: Matthew Brost 
> > > > > ---
> > > > >Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 
> > > > > ++
> > > > >Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > > > >2 files changed, 196 insertions(+), 1 deletion(-)
> > > > >create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > >
> > > > > diff --git 

Re: (subset) [PATCH 1/3] gpu: drm: replace occurrences of invalid character

2021-05-20 Thread Mark Brown
On Wed, 19 May 2021 10:15:35 +0200, Mauro Carvalho Chehab wrote:
> There are some places at drm that ended receiving a
> REPLACEMENT CHARACTER U+fffd ('�'), probably because of
> some bad charset conversion.
> 
> Fix them by using what it seems   to be the proper
> character.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[2/3] spi: fix some invalid char occurrences
  commit: 6328caf043208556e782a53a284c9acfcf6be3b0

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH v8 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-05-20 Thread Liam Howlett
* Alistair Popple  [210519 08:38]:
> On Wednesday, 19 May 2021 6:04:51 AM AEST Liam Howlett wrote:
> > External email: Use caution opening links or attachments
> > 
> > * Alistair Popple  [210407 04:43]:
> > > The behaviour of try_to_unmap_one() is difficult to follow because it
> > > performs different operations based on a fairly large set of flags used
> > > in different combinations.
> > > 
> > > TTU_MUNLOCK is one such flag. However it is exclusively used by
> > > try_to_munlock() which specifies no other flags. Therefore rather than
> > > overload try_to_unmap_one() with unrelated behaviour split this out into
> > > it's own function and remove the flag.
> > > 
> > > Signed-off-by: Alistair Popple 
> > > Reviewed-by: Ralph Campbell 
> > > Reviewed-by: Christoph Hellwig 
> > > 
> > > ---
> > > 
> > > v8:
> > > * Renamed try_to_munlock to page_mlock to better reflect what the
> > > 
> > >   function actually does.
> > > 
> > > * Removed the TODO from the documentation that this patch addresses.
> > > 
> > > v7:
> > > * Added Christoph's Reviewed-by
> > > 
> > > v4:
> > > * Removed redundant check for VM_LOCKED
> > > ---
> > > 
> > >  Documentation/vm/unevictable-lru.rst | 33 ---
> > >  include/linux/rmap.h |  3 +-
> > >  mm/mlock.c   | 10 +++---
> > >  mm/rmap.c| 48 +---
> > >  4 files changed, 55 insertions(+), 39 deletions(-)
> > > 
> > > diff --git a/Documentation/vm/unevictable-lru.rst
> > > b/Documentation/vm/unevictable-lru.rst index 0e1490524f53..eae3af17f2d9
> > > 100644
> > > --- a/Documentation/vm/unevictable-lru.rst
> > > +++ b/Documentation/vm/unevictable-lru.rst
> > > @@ -389,14 +389,14 @@ mlocked, munlock_vma_page() updates that zone
> > > statistics for the number of> 
> > >  mlocked pages.  Note, however, that at this point we haven't checked
> > >  whether the page is mapped by other VM_LOCKED VMAs.
> > > 
> > > -We can't call try_to_munlock(), the function that walks the reverse map
> > > to
> > > +We can't call page_mlock(), the function that walks the reverse map to
> > > 
> > >  check for other VM_LOCKED VMAs, without first isolating the page from the
> > >  LRU.> 
> > > -try_to_munlock() is a variant of try_to_unmap() and thus requires that
> > > the page +page_mlock() is a variant of try_to_unmap() and thus requires
> > > that the page> 
> > >  not be on an LRU list [more on these below].  However, the call to
> > > 
> > > -isolate_lru_page() could fail, in which case we couldn't
> > > try_to_munlock().  So, +isolate_lru_page() could fail, in which case we
> > > can't call page_mlock().  So,> 
> > >  we go ahead and clear PG_mlocked up front, as this might be the only
> > >  chance we> 
> > > -have.  If we can successfully isolate the page, we go ahead and
> > > -try_to_munlock(), which will restore the PG_mlocked flag and update the
> > > zone +have.  If we can successfully isolate the page, we go ahead and
> > > call +page_mlock(), which will restore the PG_mlocked flag and update the
> > > zone> 
> > >  page statistics if it finds another VMA holding the page mlocked.  If we
> > >  fail to isolate the page, we'll have left a potentially mlocked page on
> > >  the LRU. This is fine, because we'll catch it later if and if vmscan
> > >  tries to reclaim> 
> > > @@ -545,31 +545,24 @@ munlock or munmap system calls, mm teardown
> > > (munlock_vma_pages_all), reclaim,> 
> > >  holepunching, and truncation of file pages and their anonymous COWed
> > >  pages.
> > > 
> > > -try_to_munlock() Reverse Map Scan
> > > +page_mlock() Reverse Map Scan
> > > 
> > >  -
> > > 
> > > -.. warning::
> > > -   [!] TODO/FIXME: a better name might be page_mlocked() - analogous to
> > > the -   page_referenced() reverse map walker.
> > > -
> > > 
> > >  When munlock_vma_page() [see section :ref:`munlock()/munlockall() System
> > >  Call Handling ` above] tries to munlock a
> > >  page, it needs to determine whether or not the page is mapped by any
> > >  VM_LOCKED VMA without actually attempting to unmap all PTEs from the
> > >  page.  For this purpose, the unevictable/mlock infrastructure
> > > 
> > > -introduced a variant of try_to_unmap() called try_to_munlock().
> > > +introduced a variant of try_to_unmap() called page_mlock().
> > > 
> > > -try_to_munlock() calls the same functions as try_to_unmap() for anonymous
> > > and -mapped file and KSM pages with a flag argument specifying unlock
> > > versus unmap -processing.  Again, these functions walk the respective
> > > reverse maps looking -for VM_LOCKED VMAs.  When such a VMA is found, as
> > > in the try_to_unmap() case, -the functions mlock the page via
> > > mlock_vma_page() and return SWAP_MLOCK.  This -undoes the pre-clearing of
> > > the page's PG_mlocked done by munlock_vma_page. +page_mlock() walks the
> > > respective reverse maps looking for VM_LOCKED VMAs. When +such a VMA is
> > > found the page 

Re: [PATCH] drm/ttm: Explain why ttm_bo_add_move_fence uses a shared slot

2021-05-20 Thread Daniel Vetter
On Wed, May 19, 2021 at 12:43:49PM +0200, Christian König wrote:
> Am 19.05.21 um 10:24 schrieb Daniel Vetter:
> > Motivated because I got confused and Christian confirmed why this
> > works. I think this is non-obvious enough that it merits a slightly
> > longer comment.
> > 
> > Cc: Christian König 
> > Cc: Christian Koenig 
> > Cc: Huang Rui 
> > Cc: Thomas Hellström 
> > Signed-off-by: Daniel Vetter 
> 
> Reviewed-by: Christian König 

Applied to drm-misc-next, thanks for reviewing.
-Daniel

> 
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index ca1b098b6a56..51a94fd63bd7 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -682,7 +682,9 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
> >   }
> >   /*
> > - * Add the last move fence to the BO and reserve a new shared slot.
> > + * Add the last move fence to the BO and reserve a new shared slot. We 
> > only use
> > + * a shared slot to avoid unecessary sync and rely on the subsequent bo 
> > move to
> > + * either stall or use an exclusive fence respectively set bo->moving.
> >*/
> >   static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
> >  struct ttm_resource_manager *man,
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] drm/meson: fix shutdown crash when component not probed

2021-05-20 Thread Martin Blumenstingl
Hi Neil,

Since this has not received any Reviewed-by yet, I have tried my best to
review it myself.

On Fri, Apr 30, 2021 at 10:28 AM Neil Armstrong  wrote:
[...]
> --- a/drivers/gpu/drm/meson/meson_drv.c
> +++ b/drivers/gpu/drm/meson/meson_drv.c
> @@ -485,11 +485,12 @@ static int meson_probe_remote(struct platform_device 
> *pdev,
>  static void meson_drv_shutdown(struct platform_device *pdev)
>  {
>        struct meson_drm *priv = dev_get_drvdata(&pdev->dev);
This part made it hard for me because I was wondering where the
matching dev_set_drvdata() call is. It turns out platform_set_drvdata()
is used instead, so it would have been easier for me to understand if
platform_get_drvdata() were used here.
That is, however, nothing that has changed with this patch.

> -   struct drm_device *drm = priv->drm;
>
> -   DRM_DEBUG_DRIVER("\n");
> -   drm_kms_helper_poll_fini(drm);
> -   drm_atomic_helper_shutdown(drm);
> +   if (!priv)
> +   return;
> +
> +   drm_kms_helper_poll_fini(priv->drm);
> +   drm_atomic_helper_shutdown(priv->drm);
>  }
Then this part finally made sense to me (as a non-drm person):
platform_set_drvdata() is called near the end of meson_drv_bind_master(),
so any earlier error would leave the drvdata NULL.
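
Just to illustrate the accessor pairing I mean, the shutdown hook would
then read roughly like this (untested sketch only; it is equivalent to
what the patch does, since platform_get_drvdata(pdev) is just
dev_get_drvdata(&pdev->dev)):

static void meson_drv_shutdown(struct platform_device *pdev)
{
	/* Pairs with the platform_set_drvdata() call at the end of
	 * meson_drv_bind_master(); priv stays NULL if binding never
	 * completed, so bail out early in that case.
	 */
	struct meson_drm *priv = platform_get_drvdata(pdev);

	if (!priv)
		return;

	drm_kms_helper_poll_fini(priv->drm);
	drm_atomic_helper_shutdown(priv->drm);
}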

With this I can also give my:
Reviewed-by: Martin Blumenstingl 
in addition to my:
Tested-by: Martin Blumenstingl 

Can you please queue this up for -fixes or do we need to ask someone to do it?


Best regards,
Martin


Re: [PATCH 0/7] component: Make into an aggregate bus

2021-05-20 Thread Daniel Vetter
On Wed, May 19, 2021 at 09:41:27PM -0400, Stephen Boyd wrote:
> Quoting Saravana Kannan (2021-05-19 18:27:50)
> > On Wed, May 19, 2021 at 5:25 PM Stephen Boyd  wrote:
> > >
> > > This series is from discussion we had on reordering the device lists for
> > > drm shutdown paths[1]. I've introduced an 'aggregate' bus that we put
> > > the aggregate device onto and then we probe the device once all the
> > > components are probed and call component_add(). The probe/remove hooks
> > > are where the bind/unbind calls go, and then a shutdown hook is added
> > > that can be used to shutdown the drm display pipeline at the right time.
> > >
> > > This works for me on my sc7180 board, but I'm currently struggling with
> > > the last patch where we migrate the msm driver. It runs into a runtime
> > > PM problem where the parent device isn't runtime PM enabled yet. I'm
> > > still trying to figure out a clean solution there. Moving runtime PM
> > > around breaks boot and I think that's because the power domain is off.
> > >
> > > Cc: Daniel Vetter 
> > > Cc: "Rafael J. Wysocki" 
> > > Cc: Rob Clark 
> > > Cc: Russell King 
> > > Cc: Saravana Kannan 
> > >
> > > [1] https://lore.kernel.org/r/20210508074118.1621729-1-swb...@chromium.org
> > >
> >
> > I skimmed through the series and in general the idea is good, but I'm
> > not sure why each component user needs to be converted/"modern" before
> > it can make use of the benefits of this series. Why not just have
> > wrapper functions around the component ops that the new aggregate bus
> > driver can just call? That'll give all the existing component users
> > the new ability to use the new ops without having to have two
> > versions.
> 
> The existing users can only have one or the other. Either use the ops
> structure or use the struct aggregate_driver. What benefits of this
> series are they not gaining?
> 
> > That'll also allow us to do other improvements (I have some
> > in mind) that'll apply to all the component users instead of only the
> > converted ones.
> 
> What do you have in mind? I didn't want to convert drivers over to the
> new way of doing things without making them consciously change their
> code. Otherwise I worry it will break things in random, subtle ways. The
> last patch, as I mentioned above in the cover, causes warnings because
> the display driver is enabling runtime PM in an odd spot as part of the
> bind callback of the aggregate/master. That should move out of there and
> into the msm_pdev driver that registers the aggregate from what I can
> tell.

Hm, yeah, that's annoying. Another thing to check is that there are no
locking issues with lockdep enabled. But there are plenty of other places
that register/bind drivers within other drivers, so it should all work.

I think this is a good reason why more drivers should be converted (in
separate patches) so that we get a lot more testing and can find bugs in
the design.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


RE: [PATCH 09/38] drm/sti/sti_hqvdp: Fix incorrectly named function 'sti_hqvdp_vtg_cb()'

2021-05-20 Thread Fabien DESSENNE
Hi Lee

Thank you for the patch

BR
Fabien


ST Restricted

> -Original Message-
> From: Lee Jones 
> Sent: jeudi 20 mai 2021 14:02
> To: lee.jo...@linaro.org
> Cc: linux-ker...@vger.kernel.org; Benjamin Gaignard
> ; David Airlie ; Daniel Vetter
> ; Philipp Zabel ; Fabien DESSENNE
> ; dri-devel@lists.freedesktop.org
> Subject: [PATCH 09/38] drm/sti/sti_hqvdp: Fix incorrectly named function
> 'sti_hqvdp_vtg_cb()'
> 
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/gpu/drm/sti/sti_hqvdp.c:796: warning: expecting prototype for
> sti_vdp_vtg_cb(). Prototype was for sti_hqvdp_vtg_cb() instead
> 
> Cc: Benjamin Gaignard 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Philipp Zabel 
> Cc: Fabien Dessenne 
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Lee Jones 
Reviewed-by: Fabien Dessenne 

> ---
>  drivers/gpu/drm/sti/sti_hqvdp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/sti/sti_hqvdp.c b/drivers/gpu/drm/sti/sti_hqvdp.c
> index edbb99f53de19..d09b08995b12a 100644
> --- a/drivers/gpu/drm/sti/sti_hqvdp.c
> +++ b/drivers/gpu/drm/sti/sti_hqvdp.c
> @@ -782,7 +782,7 @@ static void sti_hqvdp_disable(struct sti_hqvdp *hqvdp)  }
> 
>  /**
> - * sti_vdp_vtg_cb
> + * sti_hqvdp_vtg_cb
>   * @nb: notifier block
>   * @evt: event message
>   * @data: private data
> --
> 2.31.1


Re: [PATCH 00/10] Documentation build warning fixes

2021-05-20 Thread Jonathan Corbet
Mauro Carvalho Chehab  writes:

> Hi Jon,
>
> This small series contain a series of fixes for the documentation. it is
> against your docs-next branch.
>
> Three of the patches fix duplicated symbols at the ABI documents.
> There are still some ABI warnings from IIO, but all but one were
> already fixed at linux-next. So, hopefully, after having everything
> merged, the ABI warnings will be solved.
>
> Mauro Carvalho Chehab (10):
>   docs: update sysfs-platform_profile.rst reference
>   docs: vcpu-requests.rst: fix reference for atomic ops
>   docs: translations/zh_CN: fix a typo at 8.Conclusion.rst
>   docs: sched-bwc.rst: fix a typo on a doc name
>   docs: update pin-control.rst references
>   docs: virt: api.rst: fix a pointer to SGX documentation
>   docs: ABI: iommu: remove duplicated definition for
> sysfs-kernel-iommu_groups
>   docs: ABI: sysfs-class-backlight: unify ambient light zone nodes
>   docs: ABI: sysfs-class-led-trigger-pattern: remove repeat duplication
>   iio: documentation: fix a typo

Seems like good stuff.  The last patch in the series, though, adds a
warning:

  Documentation/ABI/testing/sysfs-bus-iio:799: WARNING: Inline emphasis 
start-string without end-string.

So I left that one out and applied the rest.

Thanks,

jon


Re: [PATCH 7/7] drm/msm: Migrate to aggregate driver

2021-05-20 Thread Daniel Vetter
On Wed, May 19, 2021 at 05:25:19PM -0700, Stephen Boyd wrote:
> The device lists are poorly ordered when the component device code is
> used. This is because component_master_add_with_match() returns 0
> regardless of component devices calling component_add() first. It can
> really only fail if an allocation fails, in which case everything is
> going bad and we're out of memory. The driver that registers the
> aggregate driver, can succeed at probe and put the attached device on
> the DPM lists before any of the component devices are probed and put on
> the lists.
> 
> Within the component device framework this usually isn't that bad
> because the real driver work is done at bind time via
> component{,master}_ops::bind(). It becomes a problem when the driver
> core, or host driver, wants to operate on the component device outside
> of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> driver core doesn't understand the relationship between the host device
> and the component devices and could possibly try to operate on component
> devices when they're already removed from the system or shut down.
> 
> Normally, device links or probe defer would reorder the lists and put
> devices that depend on other devices in the lists at the correct
> location, but with component devices this doesn't happen because this
> information isn't expressed anywhere. Drivers simply succeed at
> registering their component or the aggregate driver with the component
> framework and wait for their bind() callback to be called once the other
> components are ready. In summary, the drivers that make up the aggregate
> driver can probe in any order.
> 
> This ordering problem becomes fairly obvious when shutting down the
> device with a DSI controller connected to a DSI bridge that is
> controlled via i2c. In this case, the msm display driver wants to tear
> down the display pipeline on shutdown via msm_pdev_shutdown() by calling
> drm_atomic_helper_shutdown(), and it can't do that unless the whole
> display chain is still probed and active in the system. When a display
> bridge is on i2c, the i2c device for the bridge will be created whenever
> the i2c controller probes, which could be before or after the msm
> display driver probes. If the i2c controller probes after the display
> driver, then the i2c controller will be shutdown before the display
> controller during system wide shutdown and thus i2c transactions will
> stop working before the display pipeline is shut down. This means we'll
> have the display bridge trying to access an i2c bus that's shut down
> because drm_atomic_helper_shutdown() is trying to disable the bridge
> after the bridge is off.
> 
> The solution is to make the aggregate driver into a real struct driver
> that is bound to a device when the other component devices have all
> probed. Now that the component driver code is a proper bus, we can
> simply register an aggregate driver with that bus via
> component_aggregate_register() and then attach the shutdown hook to that
> driver to be sure that the shutdown for the display pipeline is called
> before any of the component device driver shutdown hooks are called.
> 
> Cc: Daniel Vetter 
> Cc: "Rafael J. Wysocki" 
> Cc: Rob Clark 
> Cc: Russell King 
> Cc: Saravana Kannan 
> Signed-off-by: Stephen Boyd 
> ---
> 
> As stated in the cover letter, this isn't perfect but it still works. I
> get a warning from runtime PM that the parent device (e0.mdss) is
> not runtime PM enabled but the child device (the aggregate device) is
> being enabled by the bus logic. I need to move around the place that the
> parent device is runtime PM enabled and probably keep it powered up
> during the entire time that the driver is probed until the aggregate
> driver probes.
> 
>  drivers/gpu/drm/msm/msm_drv.c | 47 +++
>  1 file changed, 26 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index e1104d2454e2..0c64e6a2ce25 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -1265,19 +1265,35 @@ static int add_gpu_components(struct device *dev,
>   return 0;
>  }
>  
> -static int msm_drm_bind(struct device *dev)
> +static int msm_drm_bind(struct aggregate_device *adev)
>  {
> -	return msm_drm_init(dev, &msm_driver);
> +	return msm_drm_init(adev->dev.parent, &msm_driver);
>  }
>  
> -static void msm_drm_unbind(struct device *dev)
> +static void msm_drm_unbind(struct aggregate_device *adev)
>  {
> - msm_drm_uninit(dev);
> + msm_drm_uninit(adev->dev.parent);
> +}
> +
> +static void msm_drm_shutdown(struct aggregate_device *adev)
> +{
> + struct drm_device *drm = 
> platform_get_drvdata(to_platform_device(adev->dev.parent));
> + struct msm_drm_private *priv = drm ? drm->dev_private : NULL;
> +
> + if (!priv || !priv->kms)
> + return;
> +
> + drm_atomic_helper_shutdown(drm);
>  }
>  
> -static const 

Re: [Intel-gfx] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Daniel Vetter
On Thu, May 20, 2021 at 08:10:59AM -0700, Matthew Brost wrote:
> On Thu, May 20, 2021 at 11:54:25AM +0200, Daniel Vetter wrote:
> > On Wed, May 19, 2021 at 7:19 PM Matthew Brost  
> > wrote:
> > >
> > > On Wed, May 19, 2021 at 01:10:04PM +0200, Daniel Vetter wrote:
> > > > On Tue, May 18, 2021 at 04:58:30PM -0700, Matthew Brost wrote:
> > > > > Add entry fpr i915 new parallel submission uAPI plan.
> > > > >
> > > > > v2:
> > > > >  (Daniel Vetter):
> > > > >   - Expand logical order explaination
> > > > >   - Add dummy header
> > > > >   - Only allow N BBs in execbuf IOCTL
> > > > >   - Configure parallel submission per slot not per gem context
> > > > >
> > > > > Cc: Tvrtko Ursulin 
> > > > > Cc: Tony Ye 
> > > > > CC: Carl Zhang 
> > > > > Cc: Daniel Vetter 
> > > > > Cc: Jason Ekstrand 
> > > > > Signed-off-by: Matthew Brost 
> > > > > ---
> > > > >  Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 
> > > > > ++
> > > > >  Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > > > >  2 files changed, 196 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > >
> > > > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > > new file mode 100644
> > > > > index ..8c64b983ccad
> > > > > --- /dev/null
> > > > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > > @@ -0,0 +1,144 @@
> > > > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > > > i915_context_engines_parallel_submit */
> > > > > +
> > > > > +/*
> > > > > + * i915_context_engines_parallel_submit:
> > > > > + *
> > > > > + * Setup a slot to allow multiple BBs to be submitted in a single 
> > > > > execbuf IOCTL.
> > > > > + * Those BBs will then be scheduled to run on the GPU in parallel. 
> > > > > Multiple
> > > > > + * hardware contexts are created internally in the i915 run these 
> > > > > BBs. Once a
> > > > > + * slot is configured for N BBs only N BBs can be submitted in each 
> > > > > execbuf
> > > > > + * IOCTL and this is implict behavior (e.g. the user doesn't tell 
> > > > > the execbuf
> > > > > + * IOCTL there are N BBs, the execbuf IOCTL know how many BBs there 
> > > > > are based on
> > > > > + * the slots configuration).
> > > > > + *
> > > > > + * Their are two currently defined ways to control the placement of 
> > > > > the
> > > > > + * hardware contexts on physical engines: default behavior (no 
> > > > > flags) and
> > > > > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added the 
> > > > > in the
> > > > > + * future as new hardware / use cases arise. Details of how to use 
> > > > > this
> > > > > + * interface below above the flags.
> > > > > + *
> > > > > + * Returns -EINVAL if hardware context placement configuration 
> > > > > invalid or if the
> > > > > + * placement configuration isn't supported on the platform / 
> > > > > submission
> > > > > + * interface.
> > > > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > > > submission
> > > > > + * inteface.
> > > > > + */
> > > > > +struct i915_context_engines_parallel_submit {
> > > > > +   struct i915_user_extension base;
> > > > > +
> > > > > +   __u16 engine_index; /* slot for parallel engine */
> > > > > +   __u16 width;/* number of contexts per parallel engine 
> > > > > */
> > > > > +   __u16 num_siblings; /* number of siblings per context */
> > > > > +   __u16 mbz16;
> > > >
> > > > Ok the big picture looks reasonable now, the flags still confuse me.
> > > >
> > >
> > > Yea, it is a bit confusing.
> > >
> > > > > +/*
> > > > > + * Default placement behvavior (currently unsupported):
> > > > > + *
> > > > > + * Rather than restricting parallel submission to a single class 
> > > > > with a
> > > > > + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add 
> > > > > a mode that
> > > > > + * enables parallel submission across multiple engine classes. In 
> > > > > this case each
> > > > > + * context's logical engine mask indicates where that context can 
> > > > > placed. It is
> > > > > + * implied in this mode that all contexts have mutual exclusive 
> > > > > placement (e.g.
> > > > > + * if one context is running CS0 no other contexts can run on CS0).
> > > > > + *
> > > > > + * Example 1 pseudo code:
> > > > > + * CSX[Y] = engine class X, logical instance Y
> > > > > + * INVALID = I915_ENGINE_CLASS_INVALID, 
> > > > > I915_ENGINE_CLASS_INVALID_NONE
> > > > > + * set_engines(INVALID)
> > > > > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > > > > + * engines=CS0[0],CS0[1],CS1[0],CS1[1])
> > > > > + *
> > > > > + * Results in the following valid placements:
> > > > > + * CS0[0], CS1[0]
> > > > > + * CS0[0], CS1[1]
> > > > > + * CS0[1], CS1[0]
> > > > > + * CS0[1], CS1[1]
> > > > > + *
> > > > > + * This can also be though of as 2 virtual engines:
> > > > > + * VE[0] 

Re: [Intel-gfx] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Daniel Vetter
On Thu, May 20, 2021 at 11:57:44AM +0100, Tvrtko Ursulin wrote:
> 
> On 20/05/2021 10:54, Daniel Vetter wrote:
> > On Wed, May 19, 2021 at 7:19 PM Matthew Brost  
> > wrote:
> > > 
> > > On Wed, May 19, 2021 at 01:10:04PM +0200, Daniel Vetter wrote:
> > > > On Tue, May 18, 2021 at 04:58:30PM -0700, Matthew Brost wrote:
> > > > > Add entry fpr i915 new parallel submission uAPI plan.
> > > > > 
> > > > > v2:
> > > > >   (Daniel Vetter):
> > > > >- Expand logical order explaination
> > > > >- Add dummy header
> > > > >- Only allow N BBs in execbuf IOCTL
> > > > >- Configure parallel submission per slot not per gem context
> > > > > 
> > > > > Cc: Tvrtko Ursulin 
> > > > > Cc: Tony Ye 
> > > > > CC: Carl Zhang 
> > > > > Cc: Daniel Vetter 
> > > > > Cc: Jason Ekstrand 
> > > > > Signed-off-by: Matthew Brost 
> > > > > ---
> > > > >   Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 
> > > > > ++
> > > > >   Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > > > >   2 files changed, 196 insertions(+), 1 deletion(-)
> > > > >   create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > > 
> > > > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > > new file mode 100644
> > > > > index ..8c64b983ccad
> > > > > --- /dev/null
> > > > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > > @@ -0,0 +1,144 @@
> > > > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > > > i915_context_engines_parallel_submit */
> > > > > +
> > > > > +/*
> > > > > + * i915_context_engines_parallel_submit:
> > > > > + *
> > > > > + * Setup a slot to allow multiple BBs to be submitted in a single 
> > > > > execbuf IOCTL.
> > > > > + * Those BBs will then be scheduled to run on the GPU in parallel. 
> > > > > Multiple
> > > > > + * hardware contexts are created internally in the i915 run these 
> > > > > BBs. Once a
> > > > > + * slot is configured for N BBs only N BBs can be submitted in each 
> > > > > execbuf
> > > > > + * IOCTL and this is implict behavior (e.g. the user doesn't tell 
> > > > > the execbuf
> > > > > + * IOCTL there are N BBs, the execbuf IOCTL know how many BBs there 
> > > > > are based on
> > > > > + * the slots configuration).
> > > > > + *
> > > > > + * Their are two currently defined ways to control the placement of 
> > > > > the
> > > > > + * hardware contexts on physical engines: default behavior (no 
> > > > > flags) and
> > > > > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added the 
> > > > > in the
> > > > > + * future as new hardware / use cases arise. Details of how to use 
> > > > > this
> > > > > + * interface below above the flags.
> > > > > + *
> > > > > + * Returns -EINVAL if hardware context placement configuration 
> > > > > invalid or if the
> > > > > + * placement configuration isn't supported on the platform / 
> > > > > submission
> > > > > + * interface.
> > > > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > > > submission
> > > > > + * inteface.
> > > > > + */
> > > > > +struct i915_context_engines_parallel_submit {
> > > > > +   struct i915_user_extension base;
> > > > > +
> > > > > +   __u16 engine_index; /* slot for parallel engine */
> > > > > +   __u16 width;/* number of contexts per parallel engine 
> > > > > */
> > > > > +   __u16 num_siblings; /* number of siblings per context */
> > > > > +   __u16 mbz16;
> > > > 
> > > > Ok the big picture looks reasonable now, the flags still confuse me.
> > > > 
> > > 
> > > Yea, it is a bit confusing.
> > > 
> > > > > +/*
> > > > > + * Default placement behvavior (currently unsupported):
> > > > > + *
> > > > > + * Rather than restricting parallel submission to a single class 
> > > > > with a
> > > > > + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add 
> > > > > a mode that
> > > > > + * enables parallel submission across multiple engine classes. In 
> > > > > this case each
> > > > > + * context's logical engine mask indicates where that context can 
> > > > > placed. It is
> > > > > + * implied in this mode that all contexts have mutual exclusive 
> > > > > placement (e.g.
> > > > > + * if one context is running CS0 no other contexts can run on CS0).
> > > > > + *
> > > > > + * Example 1 pseudo code:
> > > > > + * CSX[Y] = engine class X, logical instance Y
> > > > > + * INVALID = I915_ENGINE_CLASS_INVALID, 
> > > > > I915_ENGINE_CLASS_INVALID_NONE
> > > > > + * set_engines(INVALID)
> > > > > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > > > > + * engines=CS0[0],CS0[1],CS1[0],CS1[1])
> > > > > + *
> > > > > + * Results in the following valid placements:
> > > > > + * CS0[0], CS1[0]
> > > > > + * CS0[0], CS1[1]
> > > > > + * CS0[1], CS1[0]
> > > > > + * CS0[1], CS1[1]
> > > > > + *
> > > > > + * This can also be though of as 2 virtual engines:
> > > > > + * VE[0] 

Re: [PATCH -next] drm/amdgpu: fix unused-but-set-variable warnings

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 9:32 AM Wei Yongjun  wrote:
>
> GCC reports the following warnings with W=1:
>
> drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c:190:22: warning:
>  variable 'ring' set but not used [-Wunused-but-set-variable]
>   190 |  struct amdgpu_ring *ring;
>   |  ^~~~
> drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c:162:22: warning:
>  variable 'ring' set but not used [-Wunused-but-set-variable]
>   162 |  struct amdgpu_ring *ring;
>   |  ^~~~
> drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:383:22: warning:
>  variable 'ring' set but not used [-Wunused-but-set-variable]
>   383 |  struct amdgpu_ring *ring;
>   |  ^~~~
>
> Those variables are not really used, so remove them
> to fix the warnings.
>
> Reported-by: Hulk Robot 
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 2 --
>  drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 2 --
>  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c  | 3 ---
>  3 files changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c 
> b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c
> index 938ef4ce5b76..af6f45c3f6fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c
> +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c
> @@ -187,14 +187,12 @@ static int jpeg_v2_5_hw_init(void *handle)
>  static int jpeg_v2_5_hw_fini(void *handle)
>  {
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -   struct amdgpu_ring *ring;
> int i;
>
> for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) {
> if (adev->jpeg.harvest_config & (1 << i))
> continue;
>
> -   ring = &adev->jpeg.inst[i].ring_dec;
> if (adev->jpeg.cur_state != AMD_PG_STATE_GATE &&
>   RREG32_SOC15(JPEG, i, mmUVD_JRBC_STATUS))
> jpeg_v2_5_set_powergating_state(adev, 
> AMD_PG_STATE_GATE);
> diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c 
> b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
> index 94be35357f7d..b4d53d1a6123 100644
> --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
> @@ -159,9 +159,7 @@ static int jpeg_v3_0_hw_init(void *handle)
>  static int jpeg_v3_0_hw_fini(void *handle)
>  {
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -   struct amdgpu_ring *ring;
>
> -   ring = &adev->jpeg.inst->ring_dec;
> if (adev->jpeg.cur_state != AMD_PG_STATE_GATE &&
>   RREG32_SOC15(JPEG, 0, mmUVD_JRBC_STATUS))
> jpeg_v3_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
> b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> index 946335d0f19c..d60358767d10 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> @@ -380,15 +380,12 @@ static int vcn_v3_0_hw_init(void *handle)
>  static int vcn_v3_0_hw_fini(void *handle)
>  {
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -   struct amdgpu_ring *ring;
> int i;
>
> for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
> if (adev->vcn.harvest_config & (1 << i))
> continue;
>
> -   ring = &adev->vcn.inst[i].ring_dec;
> -
> if (!amdgpu_sriov_vf(adev)) {
> if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
> (adev->vcn.cur_state != 
> AMD_PG_STATE_GATE &&


Re: [PATCH 34/38] drm/amd/amdgpu/amdgpu_vce: Fix a few incorrectly named functions

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:04 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:98: warning: expecting prototype for 
> amdgpu_vce_init(). Prototype was for amdgpu_vce_sw_init() instead
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:214: warning: expecting prototype 
> for amdgpu_vce_fini(). Prototype was for amdgpu_vce_sw_fini() instead
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:590: warning: expecting prototype 
> for amdgpu_vce_cs_validate_bo(). Prototype was for amdgpu_vce_validate_bo() 
> instead
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:724: warning: expecting prototype 
> for amdgpu_vce_cs_parse(). Prototype was for amdgpu_vce_ring_parse_cs() 
> instead
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:960: warning: expecting prototype 
> for amdgpu_vce_cs_parse_vm(). Prototype was for amdgpu_vce_ring_parse_cs_vm() 
> instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> index ea6a62f67e380..7ad83da613edd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
> @@ -87,7 +87,7 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring 
> *ring, uint32_t handle,
>   bool direct, struct dma_fence **fence);
>
>  /**
> - * amdgpu_vce_init - allocate memory, load vce firmware
> + * amdgpu_vce_sw_init - allocate memory, load vce firmware
>   *
>   * @adev: amdgpu_device pointer
>   * @size: size for the new BO
> @@ -204,7 +204,7 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, 
> unsigned long size)
>  }
>
>  /**
> - * amdgpu_vce_fini - free memory
> + * amdgpu_vce_sw_fini - free memory
>   *
>   * @adev: amdgpu_device pointer
>   *
> @@ -574,7 +574,7 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring 
> *ring, uint32_t handle,
>  }
>
>  /**
> - * amdgpu_vce_cs_validate_bo - make sure not to cross 4GB boundary
> + * amdgpu_vce_validate_bo - make sure not to cross 4GB boundary
>   *
>   * @p: parser context
>   * @ib_idx: indirect buffer to use
> @@ -715,7 +715,7 @@ static int amdgpu_vce_validate_handle(struct 
> amdgpu_cs_parser *p,
>  }
>
>  /**
> - * amdgpu_vce_cs_parse - parse and validate the command stream
> + * amdgpu_vce_ring_parse_cs - parse and validate the command stream
>   *
>   * @p: parser context
>   * @ib_idx: indirect buffer to use
> @@ -951,7 +951,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, 
> uint32_t ib_idx)
>  }
>
>  /**
> - * amdgpu_vce_cs_parse_vm - parse the command stream in VM mode
> + * amdgpu_vce_ring_parse_cs_vm - parse the command stream in VM mode
>   *
>   * @p: parser context
>   * @ib_idx: indirect buffer to use
> --
> 2.31.1
>


RE: [PATCH 07/38] drm/sti/sti_hda: Provide missing function names

2021-05-20 Thread Fabien DESSENNE
Hi Lee

Thank you for the patch

BR
Fabien


ST Restricted

> -Original Message-
> From: Lee Jones 
> Sent: jeudi 20 mai 2021 14:02
> To: lee.jo...@linaro.org
> Cc: linux-ker...@vger.kernel.org; Benjamin Gaignard
> ; David Airlie ; Daniel Vetter
> ; Fabien DESSENNE ; dri-
> de...@lists.freedesktop.org
> Subject: [PATCH 07/38] drm/sti/sti_hda: Provide missing function names
> 
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/gpu/drm/sti/sti_hda.c:283: warning: expecting prototype for Search 
> for
> a video mode in the supported modes table(). Prototype was for
> hda_get_mode_idx() instead
>  drivers/gpu/drm/sti/sti_hda.c:301: warning: expecting prototype for Enable 
> the
> HD DACS(). Prototype was for hda_enable_hd_dacs() instead
>  drivers/gpu/drm/sti/sti_hda.c:383: warning: This comment starts with '/**', 
> but
> isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
> 
> Cc: Benjamin Gaignard 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Fabien Dessenne 
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Lee Jones 
Reviewed-by: Fabien Dessenne 

> ---
>  drivers/gpu/drm/sti/sti_hda.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/sti/sti_hda.c b/drivers/gpu/drm/sti/sti_hda.c 
> index
> 5c2b650b561d5..03f3377f918c0 100644
> --- a/drivers/gpu/drm/sti/sti_hda.c
> +++ b/drivers/gpu/drm/sti/sti_hda.c
> @@ -272,7 +272,7 @@ static void hda_write(struct sti_hda *hda, u32 val, int
> offset)  }
> 
>  /**
> - * Search for a video mode in the supported modes table
> + * hda_get_mode_idx - Search for a video mode in the supported modes
> + table
>   *
>   * @mode: mode being searched
>   * @idx: index of the found mode
> @@ -292,7 +292,7 @@ static bool hda_get_mode_idx(struct
> drm_display_mode mode, int *idx)  }
> 
>  /**
> - * Enable the HD DACS
> + * hda_enable_hd_dacs - Enable the HD DACS
>   *
>   * @hda: pointer to HD analog structure
>   * @enable: true if HD DACS need to be enabled, else false @@ -380,7 +380,7
> @@ static void hda_debugfs_init(struct sti_hda *hda, struct drm_minor *minor)
> }
> 
>  /**
> - * Configure AWG, writing instructions
> + * sti_hda_configure_awg - Configure AWG, writing instructions
>   *
>   * @hda: pointer to HD analog structure
>   * @awg_instr: pointer to AWG instructions table
> --
> 2.31.1


Re: [PATCH 38/38] drm/amd/amdgpu/smuio_v13_0: Realign 'smuio_v13_0_is_host_gpu_xgmi_supported()' header

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/smuio_v13_0.c:99: warning: expecting prototype 
> for smuio_v13_0_supports_host_gpu_xgmi(). Prototype was for 
> smuio_v13_0_is_host_gpu_xgmi_supported() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Hawking Zhang 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/smuio_v13_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/smuio_v13_0.c 
> b/drivers/gpu/drm/amd/amdgpu/smuio_v13_0.c
> index 3c47c94846d6d..39b7c206770f6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/smuio_v13_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/smuio_v13_0.c
> @@ -106,7 +106,7 @@ static u32 smuio_v13_0_get_socket_id(struct amdgpu_device 
> *adev)
>  }
>
>  /**
> - * smuio_v13_0_supports_host_gpu_xgmi - detect xgmi interface between cpu 
> and gpu/s.
> + * smuio_v13_0_is_host_gpu_xgmi_supported - detect xgmi interface between 
> cpu and gpu/s.
>   *
>   * @adev: amdgpu device pointer
>   *
> --
> 2.31.1
>


Re: [PATCH 37/38] drm/amd/amdgpu/gfx_v10_0: Demote kernel-doc abuse

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:04 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:51: warning: This comment starts with 
> '/**', but isn't a kernel-doc comment. Refer 
> Documentation/doc-guide/kernel-doc.rst
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index fc12e3c3e9cae..c833be31e4ae6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -47,7 +47,7 @@
>  #include "gfx_v10_0.h"
>  #include "nbio_v2_3.h"
>
> -/**
> +/*
>   * Navi10 has two graphic rings to share each graphic pipe.
>   * 1. Primary ring
>   * 2. Async ring
> --
> 2.31.1
>


Re: [PATCH 36/38] drm/amd/amdgpu/vcn_v1_0: Fix some function naming disparity

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:775: warning: expecting prototype for 
> vcn_v1_0_start(). Prototype was for vcn_v1_0_start_spg_mode() instead
>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:: warning: expecting prototype for 
> vcn_v1_0_stop(). Prototype was for vcn_v1_0_stop_spg_mode() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> index 0c1beefa3e498..2c9af18683feb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> @@ -765,7 +765,7 @@ static void vcn_1_0_enable_static_power_gating(struct 
> amdgpu_device *adev)
>  }
>
>  /**
> - * vcn_v1_0_start - start VCN block
> + * vcn_v1_0_start_spg_mode - start VCN block
>   *
>   * @adev: amdgpu_device pointer
>   *
> @@ -1101,7 +1101,7 @@ static int vcn_v1_0_start(struct amdgpu_device *adev)
>  }
>
>  /**
> - * vcn_v1_0_stop - stop VCN block
> + * vcn_v1_0_stop_spg_mode - stop VCN block
>   *
>   * @adev: amdgpu_device pointer
>   *
> --
> 2.31.1
>


Re: [PATCH 35/38] drm/amd/amdgpu/sdma_v5_2: Repair typo in function name

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:04 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:501: warning: expecting prototype for 
> sdma_v_0_ctx_switch_enable(). Prototype was for sdma_v5_2_ctx_switch_enable() 
> instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> index ecb82c39b1062..deb907f960906 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> @@ -517,7 +517,7 @@ static void sdma_v5_2_rlc_stop(struct amdgpu_device *adev)
>  }
>
>  /**
> - * sdma_v_0_ctx_switch_enable - stop the async dma engines context switch
> + * sdma_v5_2_ctx_switch_enable - stop the async dma engines context switch
>   *
>   * @adev: amdgpu_device pointer
>   * @enable: enable/disable the DMA MEs context switch.
> --
> 2.31.1
>


Re: [PATCH 33/38] drm/amd/amdgpu/sdma_v5_0: Fix typo in function name

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:04 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:563: warning: expecting prototype for 
> sdma_v_0_ctx_switch_enable(). Prototype was for sdma_v5_0_ctx_switch_enable() 
> instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> index 75d7310f84392..2a2b9d50afb70 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> @@ -571,7 +571,7 @@ static void sdma_v5_0_rlc_stop(struct amdgpu_device *adev)
>  }
>
>  /**
> - * sdma_v_0_ctx_switch_enable - stop the async dma engines context switch
> + * sdma_v5_0_ctx_switch_enable - stop the async dma engines context switch
>   *
>   * @adev: amdgpu_device pointer
>   * @enable: enable/disable the DMA MEs context switch.
> --
> 2.31.1
>


Re: [PATCH 32/38] drm/amd/amdgpu/sdma_v4_0: Realign functions with their headers

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c:764: warning: expecting prototype for 
> sdma_v4_0_page_ring_set_wptr(). Prototype was for sdma_v4_0_ring_set_wptr() 
> instead
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c:830: warning: expecting prototype for 
> sdma_v4_0_ring_set_wptr(). Prototype was for sdma_v4_0_page_ring_set_wptr() 
> instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index d197185f77890..ae5464e2535a8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -754,7 +754,7 @@ static uint64_t sdma_v4_0_ring_get_wptr(struct 
> amdgpu_ring *ring)
>  }
>
>  /**
> - * sdma_v4_0_page_ring_set_wptr - commit the write pointer
> + * sdma_v4_0_ring_set_wptr - commit the write pointer
>   *
>   * @ring: amdgpu ring pointer
>   *
> @@ -820,7 +820,7 @@ static uint64_t sdma_v4_0_page_ring_get_wptr(struct 
> amdgpu_ring *ring)
>  }
>
>  /**
> - * sdma_v4_0_ring_set_wptr - commit the write pointer
> + * sdma_v4_0_page_ring_set_wptr - commit the write pointer
>   *
>   * @ring: amdgpu ring pointer
>   *
> --
> 2.31.1
>


Re: [PATCH 31/38] drm/amd/amdgpu/sdma_v2_4: Correct misnamed function 'sdma_v2_4_ring_emit_hdp_flush()'

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:281: warning: expecting prototype for 
> sdma_v2_4_hdp_flush_ring_emit(). Prototype was for 
> sdma_v2_4_ring_emit_hdp_flush() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> index 9f0dda040ec88..4509bd4cce2d6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> @@ -271,7 +271,7 @@ static void sdma_v2_4_ring_emit_ib(struct amdgpu_ring 
> *ring,
>  }
>
>  /**
> - * sdma_v2_4_hdp_flush_ring_emit - emit an hdp flush on the DMA ring
> + * sdma_v2_4_ring_emit_hdp_flush - emit an hdp flush on the DMA ring
>   *
>   * @ring: amdgpu ring pointer
>   *
> --
> 2.31.1
>


Re: [PATCH 30/38] drm/amd/amdgpu/gfx_v9_4_2: Mark functions called by reference as static

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1008:5: warning: no previous 
> prototype for ‘gfx_v9_4_2_query_ras_error_count’ [-Wmissing-prototypes]
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1054:6: warning: no previous 
> prototype for ‘gfx_v9_4_2_reset_ras_error_count’ [-Wmissing-prototypes]
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1063:5: warning: no previous 
> prototype for ‘gfx_v9_4_2_ras_error_inject’ [-Wmissing-prototypes]
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1133:6: warning: no previous 
> prototype for ‘gfx_v9_4_2_query_ras_error_status’ [-Wmissing-prototypes]
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1143:6: warning: no previous 
> prototype for ‘gfx_v9_4_2_reset_ras_error_status’ [-Wmissing-prototypes]
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1153:6: warning: no previous 
> prototype for ‘gfx_v9_4_2_enable_watchdog_timer’ [-Wmissing-prototypes]
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
> index dbad9ef002d59..87ec96a18a5dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
> @@ -1641,8 +1641,8 @@ static int gfx_v9_4_2_query_utc_edc_count(struct 
> amdgpu_device *adev,
> return 0;
>  }
>
> -int gfx_v9_4_2_query_ras_error_count(struct amdgpu_device *adev,
> -  void *ras_error_status)
> +static int gfx_v9_4_2_query_ras_error_count(struct amdgpu_device *adev,
> +   void *ras_error_status)
>  {
> struct ras_err_data *err_data = (struct ras_err_data 
> *)ras_error_status;
> uint32_t sec_count = 0, ded_count = 0;
> @@ -1690,7 +1690,7 @@ static void gfx_v9_4_2_reset_ea_err_status(struct 
> amdgpu_device *adev)
> mutex_unlock(&adev->grbm_idx_mutex);
>  }
>
> -void gfx_v9_4_2_reset_ras_error_count(struct amdgpu_device *adev)
> +static void gfx_v9_4_2_reset_ras_error_count(struct amdgpu_device *adev)
>  {
> if (!amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> return;
> @@ -1699,7 +1699,7 @@ void gfx_v9_4_2_reset_ras_error_count(struct 
> amdgpu_device *adev)
> gfx_v9_4_2_query_utc_edc_count(adev, NULL, NULL);
>  }
>
> -int gfx_v9_4_2_ras_error_inject(struct amdgpu_device *adev, void *inject_if)
> +static int gfx_v9_4_2_ras_error_inject(struct amdgpu_device *adev, void 
> *inject_if)
>  {
> struct ras_inject_if *info = (struct ras_inject_if *)inject_if;
> int ret;
> @@ -1772,7 +1772,7 @@ static void gfx_v9_4_2_query_utc_err_status(struct 
> amdgpu_device *adev)
> }
>  }
>
> -void gfx_v9_4_2_query_ras_error_status(struct amdgpu_device *adev)
> +static void gfx_v9_4_2_query_ras_error_status(struct amdgpu_device *adev)
>  {
> if (!amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> return;
> @@ -1782,7 +1782,7 @@ void gfx_v9_4_2_query_ras_error_status(struct 
> amdgpu_device *adev)
> gfx_v9_4_2_query_sq_timeout_status(adev);
>  }
>
> -void gfx_v9_4_2_reset_ras_error_status(struct amdgpu_device *adev)
> +static void gfx_v9_4_2_reset_ras_error_status(struct amdgpu_device *adev)
>  {
> if (!amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> return;
> @@ -1792,7 +1792,7 @@ void gfx_v9_4_2_reset_ras_error_status(struct 
> amdgpu_device *adev)
> gfx_v9_4_2_reset_sq_timeout_status(adev);
>  }
>
> -void gfx_v9_4_2_enable_watchdog_timer(struct amdgpu_device *adev)
> +static void gfx_v9_4_2_enable_watchdog_timer(struct amdgpu_device *adev)
>  {
> uint32_t i;
> uint32_t data;
> --
> 2.31.1
>


Re: [PATCH 29/38] drm/radeon/r100: Realign doc header with function 'r100_cs_packet_parse_vline()'

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/radeon/r100.c:1423: warning: expecting prototype for 
> r100_cs_packet_next_vline(). Prototype was for r100_cs_packet_parse_vline() 
> instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/radeon/r100.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
> index fcfcaec25a9ef..3c4e7c15fd159 100644
> --- a/drivers/gpu/drm/radeon/r100.c
> +++ b/drivers/gpu/drm/radeon/r100.c
> @@ -1406,7 +1406,7 @@ int r100_cs_parse_packet0(struct radeon_cs_parser *p,
>  }
>
>  /**
> - * r100_cs_packet_next_vline() - parse userspace VLINE packet
> + * r100_cs_packet_parse_vline() - parse userspace VLINE packet
>   * @p: parser structure holding parsing context.
>   *
>   * Userspace sends a special sequence for VLINE waits.
> --
> 2.31.1
>


Re: [PATCH 26/38] drm/amd/amdgpu/gmc_v10_0: Fix potential copy/paste issue

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:955: warning: expecting prototype for 
> gmc_v8_0_gart_fini(). Prototype was for gmc_v10_0_gart_fini() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index f02dc904e4cfe..105ed1bf4a88c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -947,7 +947,7 @@ static int gmc_v10_0_sw_init(void *handle)
>  }
>
>  /**
> - * gmc_v8_0_gart_fini - vm fini callback
> + * gmc_v10_0_gart_fini - vm fini callback
>   *
>   * @adev: amdgpu_device pointer
>   *
> --
> 2.31.1
>


Re: [PATCH 24/38] drm/amd/amdgpu/mmhub_v9_4: Fix naming disparity with 'mmhub_v9_4_set_fault_enable_default()'

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c:446: warning: expecting prototype 
> for mmhub_v1_0_set_fault_enable_default(). Prototype was for 
> mmhub_v9_4_set_fault_enable_default() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c 
> b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
> index 47c8dd9d1c78e..c4ef822bbe8c5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
> @@ -436,7 +436,7 @@ static void mmhub_v9_4_gart_disable(struct amdgpu_device 
> *adev)
>  }
>
>  /**
> - * mmhub_v1_0_set_fault_enable_default - update GART/VM fault handling
> + * mmhub_v9_4_set_fault_enable_default - update GART/VM fault handling
>   *
>   * @adev: amdgpu_device pointer
>   * @value: true redirects VM faults to the default page
> --
> 2.31.1
>


Re: [PATCH 23/38] drm/amd/amdgpu/gmc_v7_0: Fix potential copy/paste issue

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:526: warning: expecting prototype for 
> gmc_v8_0_set_fault_enable_default(). Prototype was for 
> gmc_v7_0_set_fault_enable_default() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> index 210ada2289ec9..8e282169f99eb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> @@ -516,7 +516,7 @@ static void gmc_v7_0_get_vm_pte(struct amdgpu_device 
> *adev,
>  }
>
>  /**
> - * gmc_v8_0_set_fault_enable_default - update VM fault handling
> + * gmc_v7_0_set_fault_enable_default - update VM fault handling
>   *
>   * @adev: amdgpu_device pointer
>   * @value: true redirects VM faults to the default page
> --
> 2.31.1
>


Re: [PATCH 21/38] drm/amd/include/aldebaran_ip_offset: Mark top-level IP_BASE as __maybe_unused

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:259:29: warning: 
> ‘XGMI2_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:252:29: warning: 
> ‘XGMI1_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:245:29: warning: 
> ‘XGMI0_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:238:29: warning: 
> ‘WAFL1_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:231:29: warning: 
> ‘WAFL0_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:161:29: warning: 
> ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:119:29: warning: 
> ‘L2IMU0_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:112:29: warning: 
> ‘L1IMUPCIE0_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:105:29: warning: 
> ‘L1IMUIOAGR0_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:98:29: warning: 
> ‘IOHC0_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:91:29: warning: 
> ‘IOAPIC0_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:84:29: warning: 
> ‘IOAGR0_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:63:29: warning: 
> ‘FUSE_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:49:29: warning: 
> ‘DBGU_IO0_BASE’ defined but not used [-Wunused-const-variable=]
>  drivers/gpu/drm/amd/amdgpu/../include/aldebaran_ip_offset.h:42:29: warning: 
> ‘CLK_BASE’ defined but not used [-Wunused-const-variable=]
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Hawking Zhang 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/include/aldebaran_ip_offset.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/include/aldebaran_ip_offset.h 
> b/drivers/gpu/drm/amd/include/aldebaran_ip_offset.h
> index 644ffec2b0ce8..cdd426b41c20e 100644
> --- a/drivers/gpu/drm/amd/include/aldebaran_ip_offset.h
> +++ b/drivers/gpu/drm/amd/include/aldebaran_ip_offset.h
> @@ -30,7 +30,7 @@ struct IP_BASE_INSTANCE {
>
>  struct IP_BASE {
>  struct IP_BASE_INSTANCE instance[MAX_INSTANCE];
> -};
> +} __maybe_unused;
>
>  static const struct IP_BASE ATHUB_BASE = { { { { 0x0C20, 0x02408C00, 0, 
> 0, 0, 0 } },
>  { { 0, 0, 0, 0, 0, 0 } },
> --
> 2.31.1
>


Re: [PATCH 20/38] drm/radeon/radeon_vm: Fix function naming disparities

2021-05-20 Thread Alex Deucher
Applied.  Thanks!


On Thu, May 20, 2021 at 8:05 AM Christian König
 wrote:
>
> Am 20.05.21 um 14:02 schrieb Lee Jones:
> > Fixes the following W=1 kernel build warning(s):
> >
> >   drivers/gpu/drm/radeon/radeon_vm.c:61: warning: expecting prototype for 
> > radeon_vm_num_pde(). Prototype was for radeon_vm_num_pdes() instead
> >   drivers/gpu/drm/radeon/radeon_vm.c:642: warning: expecting prototype for 
> > radeon_vm_update_pdes(). Prototype was for 
> > radeon_vm_update_page_directory() instead
> >
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: dri-devel@lists.freedesktop.org
> > Signed-off-by: Lee Jones 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/radeon/radeon_vm.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/radeon_vm.c 
> > b/drivers/gpu/drm/radeon/radeon_vm.c
> > index 2dc9c9f98049b..36a38adaaea96 100644
> > --- a/drivers/gpu/drm/radeon/radeon_vm.c
> > +++ b/drivers/gpu/drm/radeon/radeon_vm.c
> > @@ -51,7 +51,7 @@
> >*/
> >
> >   /**
> > - * radeon_vm_num_pde - return the number of page directory entries
> > + * radeon_vm_num_pdes - return the number of page directory entries
> >*
> >* @rdev: radeon_device pointer
> >*
> > @@ -626,7 +626,7 @@ static uint32_t radeon_vm_page_flags(uint32_t flags)
> >   }
> >
> >   /**
> > - * radeon_vm_update_pdes - make sure that page directory is valid
> > + * radeon_vm_update_page_directory - make sure that page directory is valid
> >*
> >* @rdev: radeon_device pointer
> >* @vm: requested vm
>


Re: [PATCH 19/38] drm/radeon/cik: Fix incorrectly named function 'cik_irq_suspend()'

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/radeon/cik.c:7450: warning: expecting prototype for 
> cik_irq_disable(). Prototype was for cik_irq_suspend() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/radeon/cik.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
> index 42a8afa839cbb..73ea5189dfb1a 100644
> --- a/drivers/gpu/drm/radeon/cik.c
> +++ b/drivers/gpu/drm/radeon/cik.c
> @@ -7439,7 +7439,7 @@ static void cik_irq_disable(struct radeon_device *rdev)
>  }
>
>  /**
> - * cik_irq_disable - disable interrupts for suspend
> + * cik_irq_suspend - disable interrupts for suspend
>   *
>   * @rdev: radeon_device pointer
>   *
> --
> 2.31.1
>


Re: [PATCH 17/38] drm/amd/amdgpu/dce_v6_0: Repair function name of 'si_get_number_of_dram_channels()'

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/dce_v6_0.c:468: warning: expecting prototype for 
> cik_get_number_of_dram_channels(). Prototype was for 
> si_get_number_of_dram_channels() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Luben Tuikov 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c 
> b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> index dbcb09cf83e63..c7803dc2b2d53 100644
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> @@ -456,7 +456,7 @@ static void dce_v6_0_program_fmt(struct drm_encoder 
> *encoder)
>  }
>
>  /**
> - * cik_get_number_of_dram_channels - get the number of dram channels
> + * si_get_number_of_dram_channels - get the number of dram channels
>   *
>   * @adev: amdgpu_device pointer
>   *
> --
> 2.31.1
>


Re: [RFC] Add DMA_RESV_USAGE flags

2021-05-20 Thread Daniel Vetter
On Thu, May 20, 2021 at 9:04 PM Jason Ekstrand  wrote:
>
> On Thu, May 20, 2021 at 12:23 PM Jason Ekstrand  wrote:
> >
> > On Thu, May 20, 2021 at 5:50 AM Christian König
> >  wrote:
> > >
> > > Am 20.05.21 um 09:55 schrieb Daniel Vetter:
> > > > On Wed, May 19, 2021 at 5:48 PM Michel Dänzer  
> > > > wrote:
> > > >> On 2021-05-19 5:21 p.m., Jason Ekstrand wrote:
> > > >>> On Wed, May 19, 2021 at 5:52 AM Michel Dänzer  
> > > >>> wrote:
> > >  On 2021-05-19 12:06 a.m., Jason Ekstrand wrote:
> > > > On Tue, May 18, 2021 at 4:17 PM Daniel Vetter  
> > > > wrote:
> > > >> On Tue, May 18, 2021 at 7:40 PM Christian König
> > > >>  wrote:
> > > >>> Am 18.05.21 um 18:48 schrieb Daniel Vetter:
> > >  On Tue, May 18, 2021 at 2:49 PM Christian König
> > >   wrote:
> > > 
> > > > And as long as we are all inside amdgpu we also don't have any 
> > > > oversync,
> > > > the issue only happens when we share dma-bufs with i915 (radeon 
> > > > and
> > > > AFAIK nouveau does the right thing as well).
> > >  Yeah because then you can't use the amdgpu dma_resv model 
> > >  anymore and
> > >  have to use the one atomic helpers use. Which is also the one 
> > >  that
> > >  e.g. Jason is threatening to bake in as uapi with his dma_buf 
> > >  ioctl,
> > >  so as soon as that lands and someone starts using it, something 
> > >  has to
> > >  adapt _anytime_ you have a dma-buf hanging around. Not just when 
> > >  it's
> > >  shared with another device.
> > > >>> Yeah, and that is exactly the reason why I will NAK this uAPI 
> > > >>> change.
>
> I just re-sent my dma-buf sync_file import/export series.  Assuming we
> can sort out what implicit sync looks like on the inside of dma-buf,
> would that alleviate some of your uAPI fears?  The idea would be that
> radeonsi and RADV would use amdgpu explicit sync primitives for
> everything and then, at the very end, fetch a sync_file and stuff it
> in the dma-buf's implicit sync container.  No nasty new uAPI for you.
> We still get implicit sync.  Everyone wins?

You still need the implicit fencing opt-out, which currently amdgpu
lacks completely.

But I also thought through the security implications of the patch set
(including the exclusive injection patch 4), and I think even with
current amdgpu that's perfectly fine. Not very useful since the fences
you get out aren't reflecting status accurately, but that's not a
correctness/security issue. You'll simply hit stalls when you don't
expect, because the kernel is allowed to throw random other exclusive
fences in whenever it feels like.

> Of course, that still leaves the question of what read and write
> fences are, what they mean, and where they go in the dma_resv.  But
> I'm trying to separate problems here.

Yeah I'll dump my patch set for clarifying status quo tomorrow for that.
-Daniel

>
> --Jason
>
>
> > > >>> This doesn't works for amdgpu at all for the reasons outlined 
> > > >>> above.
> > > >> Uh that's really not how uapi works. "my driver is right, everyone
> > > >> else is wrong" is not how cross driver contracts are defined. If 
> > > >> that
> > > >> means a perf impact until you've fixed your rules, that's on you.
> > > >>
> > > >> Also you're a few years too late with nacking this, it's already 
> > > >> uapi
> > > >> in the form of the dma-buf poll() support.
> > > > ^^  My fancy new ioctl doesn't expose anything that isn't already
> > > > there.  It just lets you take a snap-shot of a wait instead of doing
> > > > an active wait which might end up with more fences added depending 
> > > > on
> > > > interrupts and retries.  The dma-buf poll waits on all fences for
> > > > POLLOUT and only the exclusive fence for POLLIN.  It's already uAPI.
> > >  Note that the dma-buf poll support could be useful to Wayland 
> > >  compositors for the same purpose as Jason's new ioctl (only using 
> > >  client buffers which have finished drawing for an output frame, to 
> > >  avoid missing a refresh cycle due to client drawing), *if* it didn't 
> > >  work differently with amdgpu.
> > > 
> > >  Am I understanding correctly that Jason's new ioctl would also work 
> > >  differently with amdgpu as things stand currently? If so, that would 
> > >  be a real bummer and might hinder adoption of the ioctl by Wayland 
> > >  compositors.
> > > >>> My new ioctl has identical semantics to poll().  It just lets you take
> > > >>> a snapshot in time to wait on later instead of waiting on whatever
> > > >>> happens to be set right now.  IMO, having identical semantics to
> > > >>> poll() isn't something we want to change.
> > > >> Agreed.
> > > >>
> > > >> I'd argue then that making amdgpu poll semantics match those of other 
> > > >> drivers is a 

Re: [PATCH 16/38] drm/amd/amdgpu/si_dma: Fix some function name disparity

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, May 20, 2021 at 8:04 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/si_dma.c:320: warning: expecting prototype for 
> cik_dma_vm_copy_pte(). Prototype was for si_dma_vm_copy_pte() instead
>  drivers/gpu/drm/amd/amdgpu/si_dma.c:412: warning: expecting prototype for 
> si_dma_pad_ib(). Prototype was for si_dma_ring_pad_ib() instead
>  drivers/gpu/drm/amd/amdgpu/si_dma.c:425: warning: expecting prototype for 
> cik_sdma_ring_emit_pipeline_sync(). Prototype was for 
> si_dma_ring_emit_pipeline_sync() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/si_dma.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c 
> b/drivers/gpu/drm/amd/amdgpu/si_dma.c
> index cb703e307238d..195b45bcb8ad9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
> @@ -305,7 +305,7 @@ static int si_dma_ring_test_ib(struct amdgpu_ring *ring, 
> long timeout)
>  }
>
>  /**
> - * cik_dma_vm_copy_pte - update PTEs by copying them from the GART
> + * si_dma_vm_copy_pte - update PTEs by copying them from the GART
>   *
>   * @ib: indirect buffer to fill with commands
>   * @pe: addr of the page entry
> @@ -402,7 +402,7 @@ static void si_dma_vm_set_pte_pde(struct amdgpu_ib *ib,
>  }
>
>  /**
> - * si_dma_pad_ib - pad the IB to the required number of dw
> + * si_dma_ring_pad_ib - pad the IB to the required number of dw
>   *
>   * @ring: amdgpu_ring pointer
>   * @ib: indirect buffer to fill with padding
> @@ -415,7 +415,7 @@ static void si_dma_ring_pad_ib(struct amdgpu_ring *ring, 
> struct amdgpu_ib *ib)
>  }
>
>  /**
> - * cik_sdma_ring_emit_pipeline_sync - sync the pipeline
> + * si_dma_ring_emit_pipeline_sync - sync the pipeline
>   *
>   * @ring: amdgpu_ring pointer
>   *
> --
> 2.31.1
>


Re: [PATCH 14/38] drm/amd/amdgpu/gfx_v7_0: Repair function names in the documentation

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c:2126: warning: expecting prototype for 
> gfx_v7_0_ring_emit_hdp(). Prototype was for gfx_v7_0_ring_emit_hdp_flush() 
> instead
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c:2262: warning: expecting prototype for 
> gfx_v7_0_ring_emit_ib(). Prototype was for gfx_v7_0_ring_emit_ib_gfx() instead
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c:3207: warning: expecting prototype for 
> gfx_v7_0_ring_emit_vm_flush(). Prototype was for 
> gfx_v7_0_ring_emit_pipeline_sync() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> index c35fdd2ef2d4d..685212c3ddae5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> @@ -2116,7 +2116,7 @@ static int gfx_v7_0_ring_test_ring(struct amdgpu_ring 
> *ring)
>  }
>
>  /**
> - * gfx_v7_0_ring_emit_hdp - emit an hdp flush on the cp
> + * gfx_v7_0_ring_emit_hdp_flush - emit an hdp flush on the cp
>   *
>   * @ring: amdgpu_ring structure holding ring information
>   *
> @@ -2242,7 +2242,7 @@ static void gfx_v7_0_ring_emit_fence_compute(struct 
> amdgpu_ring *ring,
>   * IB stuff
>   */
>  /**
> - * gfx_v7_0_ring_emit_ib - emit an IB (Indirect Buffer) on the ring
> + * gfx_v7_0_ring_emit_ib_gfx - emit an IB (Indirect Buffer) on the ring
>   *
>   * @ring: amdgpu_ring structure holding ring information
>   * @job: job to retrieve vmid from
> @@ -3196,7 +3196,7 @@ static int gfx_v7_0_cp_resume(struct amdgpu_device 
> *adev)
>  }
>
>  /**
> - * gfx_v7_0_ring_emit_vm_flush - cik vm flush using the CP
> + * gfx_v7_0_ring_emit_pipeline_sync - cik vm flush using the CP
>   *
>   * @ring: the ring to emit the commands to
>   *
> --
> 2.31.1
>


Re: [PATCH 13/38] drm/amd/amdgpu/cik_sdma: Fix a few incorrectly named functions

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c:735: warning: expecting prototype for 
> cik_sdma_vm_copy_pages(). Prototype was for cik_sdma_vm_copy_pte() instead
>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c:762: warning: expecting prototype for 
> cik_sdma_vm_write_pages(). Prototype was for cik_sdma_vm_write_pte() instead
>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c:792: warning: expecting prototype for 
> cik_sdma_vm_set_pages(). Prototype was for cik_sdma_vm_set_pte_pde() instead
>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c:814: warning: expecting prototype for 
> cik_sdma_vm_pad_ib(). Prototype was for cik_sdma_ring_pad_ib() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: Evan Quan 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c 
> b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> index c4bb8eed246d6..c8ebd108548d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> @@ -720,7 +720,7 @@ static int cik_sdma_ring_test_ib(struct amdgpu_ring 
> *ring, long timeout)
>  }
>
>  /**
> - * cik_sdma_vm_copy_pages - update PTEs by copying them from the GART
> + * cik_sdma_vm_copy_pte - update PTEs by copying them from the GART
>   *
>   * @ib: indirect buffer to fill with commands
>   * @pe: addr of the page entry
> @@ -746,7 +746,7 @@ static void cik_sdma_vm_copy_pte(struct amdgpu_ib *ib,
>  }
>
>  /**
> - * cik_sdma_vm_write_pages - update PTEs by writing them manually
> + * cik_sdma_vm_write_pte - update PTEs by writing them manually
>   *
>   * @ib: indirect buffer to fill with commands
>   * @pe: addr of the page entry
> @@ -775,7 +775,7 @@ static void cik_sdma_vm_write_pte(struct amdgpu_ib *ib, 
> uint64_t pe,
>  }
>
>  /**
> - * cik_sdma_vm_set_pages - update the page tables using sDMA
> + * cik_sdma_vm_set_pte_pde - update the page tables using sDMA
>   *
>   * @ib: indirect buffer to fill with commands
>   * @pe: addr of the page entry
> @@ -804,7 +804,7 @@ static void cik_sdma_vm_set_pte_pde(struct amdgpu_ib *ib, 
> uint64_t pe,
>  }
>
>  /**
> - * cik_sdma_vm_pad_ib - pad the IB to the required number of dw
> + * cik_sdma_ring_pad_ib - pad the IB to the required number of dw
>   *
>   * @ring: amdgpu_ring structure holding ring information
>   * @ib: indirect buffer to fill with padding
> --
> 2.31.1
>


Re: [PATCH 12/38] drm/amd/amdgpu/amdgpu_gmc: Fix a little naming related doc-rot

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c:487: warning: expecting prototype 
> for amdgpu_tmz_set(). Prototype was for amdgpu_gmc_tmz_set() instead
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c:533: warning: expecting prototype 
> for amdgpu_noretry_set(). Prototype was for amdgpu_gmc_noretry_set() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index a129ecc738693..58feb0a422c34 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -528,7 +528,7 @@ int amdgpu_gmc_allocate_vm_inv_eng(struct amdgpu_device 
> *adev)
>  }
>
>  /**
> - * amdgpu_tmz_set -- check and set if a device supports TMZ
> + * amdgpu_gmc_tmz_set -- check and set if a device supports TMZ
>   * @adev: amdgpu_device pointer
>   *
>   * Check and set if an the device @adev supports Trusted Memory
> @@ -574,7 +574,7 @@ void amdgpu_gmc_tmz_set(struct amdgpu_device *adev)
>  }
>
>  /**
> - * amdgpu_noretry_set -- set per asic noretry defaults
> + * amdgpu_gmc_noretry_set -- set per asic noretry defaults
>   * @adev: amdgpu_device pointer
>   *
>   * Set a per asic default for the no-retry parameter.
> --
> 2.31.1
>


Re: [PATCH 11/38] drm/amd/amdgpu/amdgpu_debugfs: Fix a couple of misnamed functions

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, May 20, 2021 at 8:03 AM Lee Jones  wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1004: warning: expecting 
> prototype for amdgpu_debugfs_regs_gfxoff_write(). Prototype was for 
> amdgpu_debugfs_gfxoff_write() instead
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1053: warning: expecting 
> prototype for amdgpu_debugfs_regs_gfxoff_status(). Prototype was for 
> amdgpu_debugfs_gfxoff_read() instead
>
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Sumit Semwal 
> Cc: amd-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-me...@vger.kernel.org
> Cc: linaro-mm-...@lists.linaro.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index bcaf271b39bf5..a9bbb0034e1ec 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -990,7 +990,7 @@ static ssize_t amdgpu_debugfs_gpr_read(struct file *f, 
> char __user *buf,
>  }
>
>  /**
> - * amdgpu_debugfs_regs_gfxoff_write - Enable/disable GFXOFF
> + * amdgpu_debugfs_gfxoff_write - Enable/disable GFXOFF
>   *
>   * @f: open file handle
>   * @buf: User buffer to write data from
> @@ -1041,7 +1041,7 @@ static ssize_t amdgpu_debugfs_gfxoff_write(struct file 
> *f, const char __user *bu
>
>
>  /**
> - * amdgpu_debugfs_regs_gfxoff_status - read gfxoff status
> + * amdgpu_debugfs_gfxoff_read - read gfxoff status
>   *
>   * @f: open file handle
>   * @buf: User buffer to store read data in
> --
> 2.31.1
>


Re: [PATCH 10/38] drm/amd/amdgpu/amdgpu_ids: Correct some function name disparity

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, May 20, 2021 at 8:04 AM Christian König
 wrote:
>
> Am 20.05.21 um 14:02 schrieb Lee Jones:
> > Fixes the following W=1 kernel build warning(s):
> >
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c:200: warning: expecting prototype 
> > for amdgpu_vm_grab_idle(). Prototype was for amdgpu_vmid_grab_idle() instead
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c:272: warning: expecting prototype 
> > for amdgpu_vm_grab_reserved(). Prototype was for 
> > amdgpu_vmid_grab_reserved() instead
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c:337: warning: expecting prototype 
> > for amdgpu_vm_grab_used(). Prototype was for amdgpu_vmid_grab_used() instead
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c:410: warning: expecting prototype 
> > for amdgpu_vm_grab_id(). Prototype was for amdgpu_vmid_grab() instead
> >
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Sumit Semwal 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Signed-off-by: Lee Jones 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 8 
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > index b4971e90b98cf..c7f3aae23c625 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > @@ -183,7 +183,7 @@ bool amdgpu_vmid_had_gpu_reset(struct amdgpu_device 
> > *adev,
> >   }
> >
> >   /**
> > - * amdgpu_vm_grab_idle - grab idle VMID
> > + * amdgpu_vmid_grab_idle - grab idle VMID
> >*
> >* @vm: vm to allocate id for
> >* @ring: ring we want to submit job to
> > @@ -256,7 +256,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
> >   }
> >
> >   /**
> > - * amdgpu_vm_grab_reserved - try to assign reserved VMID
> > + * amdgpu_vmid_grab_reserved - try to assign reserved VMID
> >*
> >* @vm: vm to allocate id for
> >* @ring: ring we want to submit job to
> > @@ -325,7 +325,7 @@ static int amdgpu_vmid_grab_reserved(struct amdgpu_vm 
> > *vm,
> >   }
> >
> >   /**
> > - * amdgpu_vm_grab_used - try to reuse a VMID
> > + * amdgpu_vmid_grab_used - try to reuse a VMID
> >*
> >* @vm: vm to allocate id for
> >* @ring: ring we want to submit job to
> > @@ -397,7 +397,7 @@ static int amdgpu_vmid_grab_used(struct amdgpu_vm *vm,
> >   }
> >
> >   /**
> > - * amdgpu_vm_grab_id - allocate the next free VMID
> > + * amdgpu_vmid_grab - allocate the next free VMID
> >*
> >* @vm: vm to allocate id for
> >* @ring: ring we want to submit job to
>


Re: [PATCH 03/38] drm/radeon/radeon_cs: Fix incorrectly documented function 'radeon_cs_parser_fini'

2021-05-20 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, May 20, 2021 at 8:04 AM Christian König
 wrote:
>
> Am 20.05.21 um 14:02 schrieb Lee Jones:
> > Fixes the following W=1 kernel build warning(s):
> >
> >   drivers/gpu/drm/radeon/radeon_cs.c:417: warning: expecting prototype for 
> > cs_parser_fini(). Prototype was for radeon_cs_parser_fini() instead
> >
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Sumit Semwal 
> > Cc: Jerome Glisse 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Signed-off-by: Lee Jones 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/radeon/radeon_cs.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
> > b/drivers/gpu/drm/radeon/radeon_cs.c
> > index 48162501c1ee6..80a3bee933d6d 100644
> > --- a/drivers/gpu/drm/radeon/radeon_cs.c
> > +++ b/drivers/gpu/drm/radeon/radeon_cs.c
> > @@ -405,7 +405,7 @@ static int cmp_size_smaller_first(void *priv, const 
> > struct list_head *a,
> >   }
> >
> >   /**
> > - * cs_parser_fini() - clean parser states
> > + * radeon_cs_parser_fini() - clean parser states
> >* @parser: parser structure holding parsing context.
> >* @error:  error number
> >* @backoff:indicator to backoff the reservation
>


Re: [RFC] Add DMA_RESV_USAGE flags

2021-05-20 Thread Jason Ekstrand
On Thu, May 20, 2021 at 12:23 PM Jason Ekstrand  wrote:
>
> On Thu, May 20, 2021 at 5:50 AM Christian König
>  wrote:
> >
> > Am 20.05.21 um 09:55 schrieb Daniel Vetter:
> > > On Wed, May 19, 2021 at 5:48 PM Michel Dänzer  wrote:
> > >> On 2021-05-19 5:21 p.m., Jason Ekstrand wrote:
> > >>> On Wed, May 19, 2021 at 5:52 AM Michel Dänzer  
> > >>> wrote:
> >  On 2021-05-19 12:06 a.m., Jason Ekstrand wrote:
> > > On Tue, May 18, 2021 at 4:17 PM Daniel Vetter  wrote:
> > >> On Tue, May 18, 2021 at 7:40 PM Christian König
> > >>  wrote:
> > >>> Am 18.05.21 um 18:48 schrieb Daniel Vetter:
> >  On Tue, May 18, 2021 at 2:49 PM Christian König
> >   wrote:
> > 
> > > And as long as we are all inside amdgpu we also don't have any 
> > > oversync,
> > > the issue only happens when we share dma-bufs with i915 (radeon 
> > > and
> > > AFAIK nouveau does the right thing as well).
> >  Yeah because then you can't use the amdgpu dma_resv model anymore 
> >  and
> >  have to use the one atomic helpers use. Which is also the one that
> >  e.g. Jason is threatening to bake in as uapi with his dma_buf 
> >  ioctl,
> >  so as soon as that lands and someone starts using it, something 
> >  has to
> >  adapt _anytime_ you have a dma-buf hanging around. Not just when 
> >  it's
> >  shared with another device.
> > >>> Yeah, and that is exactly the reason why I will NAK this uAPI 
> > >>> change.

I just re-sent my dma-buf sync_file import/export series.  Assuming we
can sort out what implicit sync looks like on the inside of dma-buf,
would that alleviate some of your uAPI fears?  The idea would be that
radeonsi and RADV would use amdgpu explicit sync primitives for
everything and then, at the very end, fetch a sync_file and stuff it
in the dma-buf's implicit sync container.  No nasty new uAPI for you.
We still get implicit sync.  Everyone wins?

Of course, that still leaves the question of what read and write
fences are, what they mean, and where they go in the dma_resv.  But
I'm trying to separate problems here.
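
To make that flow concrete, here is a rough userspace sketch (illustrative
only, not part of the series): drmSyncobjExportSyncFile() is the existing
libdrm helper, while struct dma_buf_sync_file and the
DMA_BUF_IOCTL_IMPORT_SYNC_FILE request name are assumed from patch 4/4 of
that series, whose uapi define is not quoted here.

    /* After the last explicit-sync submit that touches the buffer ... */
    int sync_fd = -1;
    if (drmSyncobjExportSyncFile(drm_fd, syncobj, &sync_fd) == 0) {
            /* ... stuff the resulting sync_file into the dma-buf's
             * implicit sync container before handing it to the compositor. */
            struct dma_buf_sync_file arg = {
                    .flags = DMA_BUF_SYNC_RW,
                    .fd = sync_fd,
            };
            ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &arg);
            close(sync_fd);
    }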

--Jason


> > >>> This doesn't works for amdgpu at all for the reasons outlined above.
> > >> Uh that's really not how uapi works. "my driver is right, everyone
> > >> else is wrong" is not how cross driver contracts are defined. If that
> > >> means a perf impact until you've fixed your rules, that's on you.
> > >>
> > >> Also you're a few years too late with nacking this, it's already uapi
> > >> in the form of the dma-buf poll() support.
> > > ^^  My fancy new ioctl doesn't expose anything that isn't already
> > > there.  It just lets you take a snap-shot of a wait instead of doing
> > > an active wait which might end up with more fences added depending on
> > > interrupts and retries.  The dma-buf poll waits on all fences for
> > > POLLOUT and only the exclusive fence for POLLIN.  It's already uAPI.
> >  Note that the dma-buf poll support could be useful to Wayland 
> >  compositors for the same purpose as Jason's new ioctl (only using 
> >  client buffers which have finished drawing for an output frame, to 
> >  avoid missing a refresh cycle due to client drawing), *if* it didn't 
> >  work differently with amdgpu.
> > 
> >  Am I understanding correctly that Jason's new ioctl would also work 
> >  differently with amdgpu as things stand currently? If so, that would 
> >  be a real bummer and might hinder adoption of the ioctl by Wayland 
> >  compositors.
> > >>> My new ioctl has identical semantics to poll().  It just lets you take
> > >>> a snapshot in time to wait on later instead of waiting on whatever
> > >>> happens to be set right now.  IMO, having identical semantics to
> > >>> poll() isn't something we want to change.
> > >> Agreed.
> > >>
> > >> I'd argue then that making amdgpu poll semantics match those of other 
> > >> drivers is a pre-requisite for the new ioctl, otherwise it seems 
> > >> unlikely that the ioctl will be widely adopted.
> > > This seems backwards, because that means useful improvements in all
> > > other drivers are stalled until amdgpu is fixed.
> >
> > Well, there is nothing to fix in amdgpu; what we need to do is come up
> > with a DMA-buf implicit syncing model which works for everyone.
> >
> > I've pointed this problem out at FOSDEM roughly 6 years ago, before
> > DMA-buf was even merged upstream and way before amdgpu even existed. And
> > the response was yeah, maybe we need to look at this as well.
> >
> > Over the years I've mentioned now at least 5 times that this isn't going
> > to work in some situations and came up with different approaches how to
> > fix it.
> >
> > And you still have the nerve to tell me that this isn't a problem and
> > we should fix amdgpu instead? Sorry, but I'm really running out of ideas
> > 

[PATCH 3/4] dma-buf: Add an API for exporting sync files (v9)

2021-05-20 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit
synchronization model.  This doesn't always play nicely with the
implicit synchronization used in the kernel and assumed by X11 and
Wayland.  The client -> compositor half of the synchronization isn't too
bad, at least on intel, because we can control whether or not i915
synchronizes on the buffer and whether or not it's considered written.

The harder part is the compositor -> client synchronization when we get
the buffer back from the compositor.  We're required to be able to
provide the client with a VkSemaphore and VkFence representing the point
in time where the window system (compositor and/or display) finished
using the buffer.  With current APIs, it's very hard to do this in such
a way that we don't get confused by the Vulkan driver's access of the
buffer.  In particular, once we tell the kernel that we're rendering to
the buffer again, any CPU waits on the buffer or GPU dependencies will
wait on some of the client rendering and not just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of
the implicit synchronization state of a given dma-buf in the form of a
sync file.  It's effectively the same as a poll() or I915_GEM_WAIT only,
instead of CPU waiting directly, it encapsulates the wait operation, at
the current moment in time, in a sync_file so we can check/wait on it
later.  As long as the Vulkan driver does the sync_file export from the
dma-buf before we re-introduce it for rendering, it will only contain
fences from the compositor or display.  This allows us to accurately turn
it into a VkFence or VkSemaphore without any over-synchronization.
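
As a minimal illustration of the intended userspace usage (a sketch, not
part of this patch: the DMA_BUF_IOCTL_EXPORT_SYNC_FILE request name and the
usual <sys/ioctl.h>/<linux/dma-buf.h> includes are assumed; only struct
dma_buf_sync_file and the DMA_BUF_SYNC_* flags appear in the diff below):

    struct dma_buf_sync_file arg = { .flags = DMA_BUF_SYNC_READ };

    /* Snapshot the fences a reader would currently have to wait for. */
    if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &arg) == 0) {
            /* arg.fd is a sync_file; turn it into a VkSemaphore/VkFence
             * or poll() on it later, then close it. */
            close(arg.fd);
    }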

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
 - Drop the sync_file import as it was all-around sketchy and not nearly
   as useful as import.
 - Re-introduce READ/WRITE flag support for export
 - Rework the commit message

v7 (Jason Ekstrand):
 - Require at least one sync flag
 - Fix a refcounting bug: dma_resv_get_excl() doesn't take a reference
 - Use _rcu helpers since we're accessing the dma_resv read-only

v8 (Jason Ekstrand):
 - Return -ENOMEM if the sync_file_create fails
 - Predicate support on IS_ENABLED(CONFIG_SYNC_FILE)

v9 (Jason Ekstrand):
 - Add documentation for the new ioctl

v10 (Jason Ekstrand):
 - Go back to dma_buf_sync_file as the ioctl struct name

Signed-off-by: Jason Ekstrand 
Acked-by: Simon Ser 
---
 drivers/dma-buf/dma-buf.c| 62 
 include/uapi/linux/dma-buf.h | 24 ++
 2 files changed, 86 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f264b70c383eb..7679562b57bfc 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -362,6 +363,62 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const 
char __user *buf)
return ret;
 }
 
+#if IS_ENABLED(CONFIG_SYNC_FILE)
+static long dma_buf_export_sync_file(struct dma_buf *dmabuf,
+void __user *user_data)
+{
+   struct dma_buf_sync_file arg;
+   struct dma_fence *fence = NULL;
+   struct sync_file *sync_file;
+   int fd, ret;
+
+   if (copy_from_user(&arg, user_data, sizeof(arg)))
+   return -EFAULT;
+
+   if (arg.flags & ~DMA_BUF_SYNC_RW)
+   return -EINVAL;
+
+   if ((arg.flags & DMA_BUF_SYNC_RW) == 0)
+   return -EINVAL;
+
+   fd = get_unused_fd_flags(O_CLOEXEC);
+   if (fd < 0)
+   return fd;
+
+   if (arg.flags & DMA_BUF_SYNC_WRITE) {
+   ret = dma_resv_get_singleton_rcu(dmabuf->resv, NULL, &fence);
+   if (ret)
+   goto err_put_fd;
+   } else if (arg.flags & DMA_BUF_SYNC_READ) {
+   fence = dma_resv_get_excl_rcu(dmabuf->resv);
+   }
+
+   if (!fence)
+   fence = dma_fence_get_stub();
+
+   sync_file = sync_file_create(fence);
+
+   dma_fence_put(fence);
+
+   if (!sync_file) {
+   ret = -ENOMEM;
+   goto err_put_fd;
+   }
+
+   fd_install(fd, sync_file->file);
+
+   arg.fd = fd;
+   if (copy_to_user(user_data, &arg, sizeof(arg)))
+   return -EFAULT;
+
+   return 0;
+
+err_put_fd:
+   put_unused_fd(fd);
+   return ret;
+}
+#endif
+
 static long dma_buf_ioctl(struct file *file,
  unsigned int cmd, unsigned long 

[PATCH 2/4] dma-buf: add dma_resv_get_singleton_rcu (v4)

2021-05-20 Thread Jason Ekstrand
Add a helper function to get a single fence representing
all fences in a dma_resv object.

This fence is either the only one in the object or all of the not yet
signaled fences of the object, flattened out into a dma_fence_array.
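
As a rough kernel-side sketch of how a caller might use the helper
(illustrative only; error handling trimmed and the caller is assumed to be
allowed to sleep):

    struct dma_fence *fence = NULL;
    int ret;

    ret = dma_resv_get_singleton_rcu(obj, NULL, &fence);
    if (ret)
            return ret;

    if (fence) {
            /* One fence now stands for everything unsignaled in obj. */
            dma_fence_wait(fence, true);
            dma_fence_put(fence);
    }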

v2 (Jason Ekstrand):
 - Take reference of fences both for creating the dma_fence_array and in
   the case where we return one fence.
 - Handle the case where dma_resv_get_list() returns NULL

v3 (Jason Ekstrand):
 - Add an _rcu suffix because it is read-only
 - Rewrite to use dma_resv_get_fences_rcu so it's RCU-safe
 - Add an EXPORT_SYMBOL_GPL declaration
 - Re-author the patch to Jason since very little is left of Christian
   König's original patch
 - Remove the extra fence argument

v4 (Jason Ekstrand):
 - Restore the extra fence argument

Signed-off-by: Jason Ekstrand 

get_singleton
---
 drivers/dma-buf/dma-resv.c | 122 +
 include/linux/dma-resv.h   |   3 +
 2 files changed, 125 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 6ddbeb5dfbf65..25995fc15c370 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -33,6 +33,8 @@
  */
 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -49,6 +51,19 @@
  * write-side updates.
  */
 
+/**
+ * dma_fence_deep_dive_for_each - deep dive into the fence containers
+ * @fence: resulting fence
+ * @chain: variable for a dma_fence_chain
+ * @index: index into a dma_fence_array
+ * @head: starting point
+ *
+ * Helper to deep dive into the fence containers for flattening them.
+ */
+#define dma_fence_deep_dive_for_each(fence, chain, index, head)\
+   dma_fence_chain_for_each(chain, head)   \
+   dma_fence_array_for_each(fence, index, chain)
+
 DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
@@ -517,6 +532,113 @@ int dma_resv_get_fences_rcu(struct dma_resv *obj,
 }
 EXPORT_SYMBOL_GPL(dma_resv_get_fences_rcu);
 
+/**
+ * dma_resv_get_singleton - get a single fence for the dma_resv object
+ * @obj: the reservation object
+ * @extra: extra fence to add to the resulting array
+ * @result: resulting dma_fence
+ *
+ * Get a single fence representing all unsignaled fences in the dma_resv object
+ * plus the given extra fence. If we got only one fence return a new
+ * reference to that, otherwise return a dma_fence_array object.
+ *
+ * RETURNS
+ * Returns -NOMEM if allocations fail, zero otherwise.
+ */
+int dma_resv_get_singleton_rcu(struct dma_resv *obj, struct dma_fence *extra,
+  struct dma_fence **result)
+{
+   struct dma_fence **resv_fences, *fence, *chain, **fences;
+   struct dma_fence_array *array;
+   unsigned int num_resv_fences, num_fences;
+   unsigned int ret, i, j;
+
+   ret = dma_resv_get_fences_rcu(obj, NULL, &num_resv_fences, &resv_fences);
+   if (ret)
+   return ret;
+
+   num_fences = 0;
+   *result = NULL;
+
+   if (num_resv_fences == 0 && !extra)
+   return 0;
+
+   for (i = 0; i < num_resv_fences; ++i) {
+   dma_fence_deep_dive_for_each(fence, chain, j, resv_fences[i]) {
+   if (dma_fence_is_signaled(fence))
+   continue;
+
+   *result = fence;
+   ++num_fences;
+   }
+   }
+
+   if (extra) {
+   dma_fence_deep_dive_for_each(fence, chain, j, extra) {
+   if (dma_fence_is_signaled(fence))
+   continue;
+
+   *result = fence;
+   ++num_fences;
+   }
+   }
+
+   if (num_fences <= 1) {
+   *result = dma_fence_get(*result);
+   goto put_resv_fences;
+   }
+
+   fences = kmalloc_array(num_fences, sizeof(struct dma_fence*),
+  GFP_KERNEL);
+   if (!fences) {
+   *result = NULL;
+   ret = -ENOMEM;
+   goto put_resv_fences;
+   }
+
+   num_fences = 0;
+   for (i = 0; i < num_resv_fences; ++i) {
+   dma_fence_deep_dive_for_each(fence, chain, j, resv_fences[i]) {
+   if (!dma_fence_is_signaled(fence))
+   fences[num_fences++] = dma_fence_get(fence);
+   }
+   }
+
+   if (extra) {
+   dma_fence_deep_dive_for_each(fence, chain, j, extra) {
+   if (dma_fence_is_signaled(fence))
+   fences[num_fences++] = dma_fence_get(fence);
+   }
+   }
+
+   if (num_fences <= 1) {
+   *result = num_fences ? fences[0] : NULL;
+   kfree(fences);
+   goto put_resv_fences;
+   }
+
+   array = dma_fence_array_create(num_fences, fences,
+  dma_fence_context_alloc(1),
+  1, false);
+   if 

[PATCH 0/4] dma-buf: Add an API for exporting sync files (v8)

2021-05-20 Thread Jason Ekstrand
This is mostly a re-send of v8 only with a fourth patch which contains the
sync file import ioctl that I had in the original series.  I've not updated
the IGT tests yet for sync file import.  This resend is mostly intended to
aid in discussions around implicit sync in general.  I'll write up some IGT
tests if there is serious interest in patch 4.  I can also update the Mesa
MR to use it for Vulkan.

---

Modern userspace APIs like Vulkan are built on an explicit
synchronization model.  This doesn't always play nicely with the
implicit synchronization used in the kernel and assumed by X11 and
Wayland.  The client -> compositor half of the synchronization isn't too
bad, at least on intel, because we can control whether or not i915
synchronizes on the buffer and whether or not it's considered written.

The harder part is the compositor -> client synchronization when we get
the buffer back from the compositor.  We're required to be able to
provide the client with a VkSemaphore and VkFence representing the point
in time where the window system (compositor and/or display) finished
using the buffer.  With current APIs, it's very hard to do this in such
a way that we don't get confused by the Vulkan driver's access of the
buffer.  In particular, once we tell the kernel that we're rendering to
the buffer again, any CPU waits on the buffer or GPU dependencies will
wait on some of the client rendering and not just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of
the implicit synchronization state of a given dma-buf in the form of a
sync file.  It's effectively the same as a poll() or I915_GEM_WAIT only,
instead of CPU waiting directly, it encapsulates the wait operation, at
the current moment in time, in a sync_file so we can check/wait on it
later.  As long as the Vulkan driver does the sync_file export from the
dma-buf before we re-introduce it for rendering, it will only contain
fences from the compositor or display.  This allows us to accurately turn
it into a VkFence or VkSemaphore without any over-synchronization.

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037
IGT tests: https://lists.freedesktop.org/archives/igt-dev/2021-March/029825.html

Cc: Christian König 
Cc: Michel Dänzer 
Cc: Dave Airlie 
Cc: Bas Nieuwenhuizen 
Cc: Daniel Stone 

Christian König (1):
  dma-buf: add dma_fence_array_for_each (v2)

Jason Ekstrand (3):
  dma-buf: add dma_resv_get_singleton_rcu (v4)
  dma-buf: Add an API for exporting sync files (v9)
  RFC: dma-buf: Add an API for importing sync files (v6)

 drivers/dma-buf/dma-buf.c |  94 +++
 drivers/dma-buf/dma-fence-array.c |  27 +++
 drivers/dma-buf/dma-resv.c| 122 ++
 include/linux/dma-fence-array.h   |  17 +
 include/linux/dma-resv.h  |   3 +
 include/uapi/linux/dma-buf.h  |  25 ++
 6 files changed, 288 insertions(+)

-- 
2.31.1



[PATCH 4/4] RFC: dma-buf: Add an API for importing sync files (v6)

2021-05-20 Thread Jason Ekstrand
This patch is analogous to the previous sync file export patch in that
it allows you to import a sync_file into a dma-buf.  Unlike the previous
patch, however, this does add genuinely new functionality to dma-buf.
Without this, the only way to attach a sync_file to a dma-buf is to
submit a batch to your driver of choice which waits on the sync_file and
claims to write to the dma-buf.  Even if said batch is a no-op, a submit
is typically way more overhead than just attaching a fence.  A submit
may also imply extra synchronization with other work because it happens
on a hardware queue.

In the Vulkan world, this is useful for dealing with the out-fence from
vkQueuePresent.  Current Linux window-systems (X11, Wayland, etc.) all
rely on dma-buf implicit sync.  Since Vulkan is an explicit sync API, we
get a set of fences (VkSemaphores) in vkQueuePresent and have to stash
those as an exclusive (write) fence on the dma-buf.  We handle it in
Mesa today with the above mentioned dummy submit trick.  This ioctl
would allow us to set it directly without the dummy submit.

This may also open up possibilities for GPU drivers to move away from
implicit sync for their kernel driver uAPI and instead provide sync
files and rely on dma-buf import/export for communicating with other
implicit sync clients.

We make the explicit choice here to only allow setting RW fences which
translates to an exclusive fence on the dma_resv.  There's no use for
read-only fences for communicating with other implicit sync userspace
and any such attempts are likely to be racy at best.  When we go to
insert the RW fence, the actual fence we set as the new exclusive fence
is a combination of the sync_file provided by the user and all the other
fences on the dma_resv.  This ensures that the newly added exclusive
fence will never signal before the old one would have and ensures that
we don't break any dma_resv contracts.  We require userspace to specify
RW in the flags for symmetry with the export ioctl and in case we ever
want to support read fences in the future.

There is one downside here that's worth documenting:  If two clients
writing to the same dma-buf using this API race with each other, their
actions on the dma-buf may happen in parallel or in an undefined order.
Both with and without this API, the pattern is the same:  Collect all
the fences on dma-buf, submit work which depends on said fences, and
then set a new exclusive (write) fence on the dma-buf which depends on
said work.  The difference is that, when it's all handled by the GPU
driver's submit ioctl, the three operations happen atomically under the
dma_resv lock.  If two userspace submits race, one will happen before
the other.  You aren't guaranteed which but you are guaranteed that
they're strictly ordered.  If userspace manages the fences itself, then
these three operations happen separately and the two render operations
may happen genuinely in parallel or get interleaved.  However, this is a
case of userspace racing with itself.  As long as we ensure userspace
can't back the kernel into a corner, it should be fine.
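
For reference, the dummy-submit trick this replaces boils down to a single
ioctl in the new model; a hedged sketch (the DMA_BUF_IOCTL_IMPORT_SYNC_FILE
request name is assumed, only the struct and flags are shown in the diff):

    /* sync_fd holds the out-fences from vkQueuePresent, as a sync_file. */
    struct dma_buf_sync_file arg = {
            .flags = DMA_BUF_SYNC_RW,   /* import only accepts RW */
            .fd = sync_fd,
    };
    int ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &arg);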

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v5 (Jason Ekstrand):
 - Split import and export into separate patches
 - New commit message

Signed-off-by: Jason Ekstrand 
---
 drivers/dma-buf/dma-buf.c| 32 
 include/uapi/linux/dma-buf.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 7679562b57bfc..c9d6b9195c00c 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -417,6 +417,36 @@ static long dma_buf_export_sync_file(struct dma_buf 
*dmabuf,
put_unused_fd(fd);
return ret;
 }
+
+static long dma_buf_import_sync_file(struct dma_buf *dmabuf,
+const void __user *user_data)
+{
+   struct dma_buf_sync_file arg;
+   struct dma_fence *fence, *singleton = NULL;
+   int ret = 0;
+
+   if (copy_from_user(&arg, user_data, sizeof(arg)))
+   return -EFAULT;
+
+   if (arg.flags != DMA_BUF_SYNC_RW)
+   return -EINVAL;
+
+   fence = sync_file_get_fence(arg.fd);
+   if (!fence)
+   return -EINVAL;
+
+   dma_resv_lock(dmabuf->resv, NULL);
+
+   ret = dma_resv_get_singleton_rcu(dmabuf->resv, fence, &singleton);
+   if (!ret && singleton)
+   dma_resv_add_excl_fence(dmabuf->resv, singleton);
+
+   dma_resv_unlock(dmabuf->resv);
+
+   dma_fence_put(fence);
+
+   

[PATCH 1/4] dma-buf: add dma_fence_array_for_each (v2)

2021-05-20 Thread Jason Ekstrand
From: Christian König 

Add a helper to iterate over all fences in a dma_fence_array object.
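
For illustration, a small kernel-side sketch of the iterator in use (not
from this patch; head may or may not actually be a dma_fence_array, which
is exactly the case the helper is meant to hide):

    struct dma_fence *fence;
    unsigned int index;

    dma_fence_array_for_each(fence, index, head) {
            if (!dma_fence_is_signaled(fence))
                    pr_debug("fence %u still pending\n", index);
    }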

v2 (Jason Ekstrand)
 - Return NULL from dma_fence_array_first if head == NULL.  This matches
   the iterator behavior of dma_fence_chain_for_each in that it iterates
   zero times if head == NULL.
 - Return NULL from dma_fence_array_next if index > array->num_fences.

Signed-off-by: Jason Ekstrand 
---
 drivers/dma-buf/dma-fence-array.c | 27 +++
 include/linux/dma-fence-array.h   | 17 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-array.c 
b/drivers/dma-buf/dma-fence-array.c
index d3fbd950be944..2ac1afc697d0f 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -201,3 +201,30 @@ bool dma_fence_match_context(struct dma_fence *fence, u64 
context)
return true;
 }
 EXPORT_SYMBOL(dma_fence_match_context);
+
+struct dma_fence *dma_fence_array_first(struct dma_fence *head)
+{
+   struct dma_fence_array *array;
+
+   if (!head)
+   return NULL;
+
+   array = to_dma_fence_array(head);
+   if (!array)
+   return head;
+
+   return array->fences[0];
+}
+EXPORT_SYMBOL(dma_fence_array_first);
+
+struct dma_fence *dma_fence_array_next(struct dma_fence *head,
+  unsigned int index)
+{
+   struct dma_fence_array *array = to_dma_fence_array(head);
+
+   if (!array || index >= array->num_fences)
+   return NULL;
+
+   return array->fences[index];
+}
+EXPORT_SYMBOL(dma_fence_array_next);
diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
index 303dd712220fd..588ac8089dd61 100644
--- a/include/linux/dma-fence-array.h
+++ b/include/linux/dma-fence-array.h
@@ -74,6 +74,19 @@ to_dma_fence_array(struct dma_fence *fence)
return container_of(fence, struct dma_fence_array, base);
 }
 
+/**
+ * dma_fence_array_for_each - iterate over all fences in array
+ * @fence: current fence
+ * @index: index into the array
+ * @head: potential dma_fence_array object
+ *
+ * Test if @array is a dma_fence_array object and if yes iterate over all 
fences
+ * in the array. If not just iterate over the fence in @array itself.
+ */
+#define dma_fence_array_for_each(fence, index, head)   \
+   for (index = 0, fence = dma_fence_array_first(head); fence; \
+     ++(index), fence = dma_fence_array_next(head, index))
+
 struct dma_fence_array *dma_fence_array_create(int num_fences,
   struct dma_fence **fences,
   u64 context, unsigned seqno,
@@ -81,4 +94,8 @@ struct dma_fence_array *dma_fence_array_create(int num_fences,
 
 bool dma_fence_match_context(struct dma_fence *fence, u64 context);
 
+struct dma_fence *dma_fence_array_first(struct dma_fence *head);
+struct dma_fence *dma_fence_array_next(struct dma_fence *head,
+  unsigned int index);
+
 #endif /* __LINUX_DMA_FENCE_ARRAY_H */
-- 
2.31.1



Re: [PATCH] vgaarb: Use ACPI HID name to find integrated GPU

2021-05-20 Thread Alex Deucher
Pushed to drm-misc-next.  Thanks!

Alex

On Wed, May 19, 2021 at 12:45 PM Alex Deucher  wrote:
>
> On Wed, May 19, 2021 at 9:57 AM Kai-Heng Feng
>  wrote:
> >
> > Commit 3d42f1ddc47a ("vgaarb: Keep adding VGA device in queue") assumes
> > the first device is an integrated GPU. However, on AMD platforms an
> > integrated GPU can have higher PCI device number than a discrete GPU.
> >
> > An integrated GPU on an ACPI platform generally has the _DOD and _DOS
> > methods, so use that as a predicate to find the integrated GPU. If the new
> > strategy doesn't work, fall back to using the first device as the boot VGA.
> >
> > Signed-off-by: Kai-Heng Feng 
>
> Reviewed-by: Alex Deucher 
>
> Unless there are any other comments, I'll apply it tomorrow.
>
> Alex
>
> > ---
> >  drivers/gpu/vga/vgaarb.c | 31 ++-
> >  1 file changed, 26 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
> > index 5180c5687ee5..949fde433ea2 100644
> > --- a/drivers/gpu/vga/vgaarb.c
> > +++ b/drivers/gpu/vga/vgaarb.c
> > @@ -50,6 +50,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include 
> >
> > @@ -1450,9 +1451,23 @@ static struct miscdevice vga_arb_device = {
> > MISC_DYNAMIC_MINOR, "vga_arbiter", &vga_arb_device_fops
> >  };
> >
> > +#if defined(CONFIG_ACPI)
> > +static bool vga_arb_integrated_gpu(struct device *dev)
> > +{
> > +   struct acpi_device *adev = ACPI_COMPANION(dev);
> > +
> > +   return adev && !strcmp(acpi_device_hid(adev), ACPI_VIDEO_HID);
> > +}
> > +#else
> > +static bool vga_arb_integrated_gpu(struct device *dev)
> > +{
> > +   return false;
> > +}
> > +#endif
> > +
> >  static void __init vga_arb_select_default_device(void)
> >  {
> > -   struct pci_dev *pdev;
> > +   struct pci_dev *pdev, *found = NULL;
> > struct vga_device *vgadev;
> >
> >  #if defined(CONFIG_X86) || defined(CONFIG_IA64)
> > @@ -1505,20 +1520,26 @@ static void __init vga_arb_select_default_device(void)
> >  #endif
> >
> > if (!vga_default_device()) {
> > -   list_for_each_entry(vgadev, &vga_list, list) {
> > +   list_for_each_entry_reverse(vgadev, &vga_list, list) {
> > struct device *dev = >pdev->dev;
> > u16 cmd;
> >
> > pdev = vgadev->pdev;
> > pci_read_config_word(pdev, PCI_COMMAND, &cmd);
> > if (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {
> > -   vgaarb_info(dev, "setting as boot device (VGA legacy resources not available)\n");
> > -   vga_set_default_device(pdev);
> > -   break;
> > +   found = pdev;
> > +   if (vga_arb_integrated_gpu(dev))
> > +   break;
> > }
> > }
> > }
> >
> > +   if (found) {
> > +   vgaarb_info(&found->dev, "setting as boot device (VGA legacy resources not available)\n");
> > +   vga_set_default_device(found);
> > +   return;
> > +   }
> > +
> > if (!vga_default_device()) {
> > vgadev = list_first_entry_or_null(&vga_list,
> >   struct vga_device, list);
> > --
> > 2.31.1
> >
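
As a side note on the _DOD/_DOS wording in the commit message: the patch keys
off ACPI_VIDEO_HID, but a method-based predicate is also conceivable. A hedged
sketch for comparison only; vga_arb_has_dod is an invented name, while
acpi_has_method() and ACPI_COMPANION() are existing ACPI helpers.

    #include <linux/acpi.h>

    /* Illustrative alternative predicate, not part of the patch. */
    static bool vga_arb_has_dod(struct device *dev)
    {
            struct acpi_device *adev = ACPI_COMPANION(dev);

            return adev && acpi_has_method(adev->handle, "_DOD");
    }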


[Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

2021-05-20 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #33 from Jerome C (m...@jeromec.com) ---
(In reply to kolAflash from comment #32)
> In the meanwhile I performed test number 2.
> 
> > 2. amd-drm-next-5.14-2021-05-12* without ip_block_mask=0x0ff, with Xorg
> [...]
> 
> This time the crash was very different!
> 
> After some minutes (about 3) the graphical screen actually turned back on.
> I'm pretty sure that didn't happen with the other kernels I tested.
> (never tested amd-drm-next-5.14-2021-05-12 before)
> 
> Nevertheless everything graphical is lagging extremely. If I move the mouse
> or do anything else it takes more than 10 seconds until something happens on
> the screen.
> 
> On the other hand SSH access is smoothly possible. And I was able to save
> the dmesg output. (see attachment)
> Unlocking the screen via SSH (loginctl) or starting graphical programs
> (DISPLAY=:0 xterm) works, but is extremely slow too. (> 10 seconds waiting)

I experienced this lag too, although I didn't try the SSH thing (I don't have
it set up).

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 211277] sometimes crash at s2ram-wake (Ryzen 3500U): amdgpu, drm, commit_tail, amdgpu_dm_atomic_commit_tail

2021-05-20 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=211277

--- Comment #32 from kolAflash (kolafl...@kolahilft.de) ---
Created attachment 296901
  --> https://bugzilla.kernel.org/attachment.cgi?id=296901&action=edit
dmesg via SSH, running amd-drm-next-5.14-2021-05-12 without ip_block_mask=0x0ff
and with Xorg

(In reply to Jerome C from comment #31)
> [...]
> hiya, you may not know this, but use "amdgpu.ip_block_mask=0x0ff" and not
> "ip_block_mask=0x0ff"
> [...]
> I can see in your kernel logs that VCN is still enabled

Oops, you're right.
I know someone wrote that before. But it seems I somehow missed it while
editing my Grub parameters.

I'll give it another try!





In the meanwhile I performed test number 2.

> 2. amd-drm-next-5.14-2021-05-12* without ip_block_mask=0x0ff, with Xorg [...]

This time the crash was very different!

After some minutes (about 3) the graphical screen actually turned back on.
I'm pretty sure that didn't happen with the other kernels I tested.
(never tested amd-drm-next-5.14-2021-05-12 before)

Nevertheless everything graphical is lagging extremely. If I move the mouse or
do anything else it takes more than 10 seconds until something happens on the
screen.

On the other hand SSH access is smoothly possible. And I was able to save the
dmesg output. (see attachment)
Unlocking the screen via SSH (loginctl) or starting graphical programs
(DISPLAY=:0 xterm) works, but is extremely slow too. (> 10 seconds waiting)

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
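
For reference, on a typical Debian/Ubuntu-style setup (file path and the other
options shown are illustrative) the workaround being tested would be added to
the kernel command line roughly like this, followed by running update-grub and
rebooting:

    # /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.ip_block_mask=0x0ff"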

Re: [RFC 7/7] drm/i915: Expose client engine utilisation via fdinfo

2021-05-20 Thread Daniel Vetter
On Thu, May 20, 2021 at 6:31 PM Christian König
 wrote:
>
> Yeah, having the timestamp is a good idea as well.
>
>   drm-driver: i915
>
> I think we should rather add something like printing 
> file_operations->owner->name to the common fdinfo code.
>
> This way we would have something common for all drivers in the system. I'm 
> just not sure if that also works if they are compiled into the kernel.

Yeah common code could print driver name, busid and all that stuff. I
think the common code should also provide some helpers for the key:
value pair formatting (and maybe check for all lower-case and stuff
like that) because if we don't then this is going to be a complete
mess that's not parseable.

And value should be real semantic stuff, not "here's a string". So
accumulated time as a struct ktime as the example.
-Daniel

> Regards,
> Christian.
>
> Am 20.05.21 um 18:26 schrieb Nieto, David M:
>
> [AMD Official Use Only]
>
>
> I would like to add a unit marker for the stats that we monitor in the fd. As
> we discussed, currently we are displaying the usage percentage because we
> wanted to provide single-query percentages, but this may evolve with time.
>
> May I suggest to add two new fields
>
> drm-stat-interval: <64 bit> ns
> drm-stat-timestamp: <64 bit> ns
>
> If interval is set, engine utilization is calculated by doing <engine render> =
> 100 * <drm-engine-render> / <drm-stat-interval>
> If interval is not set, two reads are needed: <engine render> =
> 100 * <drm-engine-render1 - drm-engine-render0> / <drm-stat-timestamp1 - drm-stat-timestamp0>
>
>
> Regards,
>
> David
>
>
> 
> From: Tvrtko Ursulin 
> Sent: Thursday, May 20, 2021 8:12 AM
> To: intel-...@lists.freedesktop.org 
> Cc: dri-devel@lists.freedesktop.org ; Tvrtko 
> Ursulin ; Nieto, David M ; 
> Koenig, Christian ; Daniel Vetter 
> Subject: [RFC 7/7] drm/i915: Expose client engine utilisation via fdinfo
>
> From: Tvrtko Ursulin 
>
> Similar to AMD commit
> 874442541133 ("drm/amdgpu: Add show_fdinfo() interface"), using the
> infrastructure added in previous patches, we add basic client info
> and GPU engine utilisation for i915.
>
> Example of the output:
>
>   pos:0
>   flags:  012
>   mnt_id: 21
>   drm-driver: i915
>   drm-pdev:   :00:02.0
>   drm-client-id:  7
>   drm-engine-render:  9288864723 ns
>   drm-engine-copy:2035071108 ns
>   drm-engine-video:   0 ns
>   drm-engine-video-enhance:   0 ns
>
> DRM related fields are appropriately prefixed for easy parsing and
> separation from generic fdinfo fields.
>
> Idea is for some fields to become either fully or partially standardised
> in order to enable writing of generic top-like tools.
>
> Initial proposal for fully standardised common fields:
>
>  drm-driver: 
>  drm-pdev: 
>
> Optional fully standardised:
>
>  drm-client-id: 
>
> Optional partially standardised:
>
>  engine-:  ns
>  memory-:  KiB
>
> Once agreed the format would need to go to some README or kerneldoc in
> DRM core.
>
> Signed-off-by: Tvrtko Ursulin 
> Cc: David M Nieto 
> Cc: Christian König 
> Cc: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_drm_client.c | 68 ++
>  drivers/gpu/drm/i915/i915_drm_client.h |  4 ++
>  drivers/gpu/drm/i915/i915_drv.c|  3 ++
>  3 files changed, 75 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
> b/drivers/gpu/drm/i915/i915_drm_client.c
> index 1e5db7753276..5e9cfba1116b 100644
> --- a/drivers/gpu/drm/i915/i915_drm_client.c
> +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> @@ -9,6 +9,11 @@
>
>  #include 
>
> +#include 
> +
> +#include "gem/i915_gem_context.h"
> +#include "gt/intel_engine_user.h"
> +
>  #include "i915_drm_client.h"
>  #include "i915_drv.h"
>  #include "i915_gem.h"
> @@ -168,3 +173,66 @@ void i915_drm_clients_fini(struct i915_drm_clients 
> *clients)
>
>  xa_destroy(&clients->xarray);
>  }
> +
> +#ifdef CONFIG_PROC_FS
> +static const char * const uabi_class_names[] = {
> +   [I915_ENGINE_CLASS_RENDER] = "render",
> +   [I915_ENGINE_CLASS_COPY] = "copy",
> +   [I915_ENGINE_CLASS_VIDEO] = "video",
> +   [I915_ENGINE_CLASS_VIDEO_ENHANCE] = "video-enhance",
> +};
> +
> +static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
> +{
> +   struct i915_gem_engines_iter it;
> +   struct intel_context *ce;
> +   u64 total = 0;
> +
> +   for_each_gem_engine(ce, rcu_dereference(ctx->engines), it) {
> +   if (ce->engine->uabi_class != class)
> +   continue;
> +
> +   total += intel_context_get_total_runtime_ns(ce);
> +   }
> +
> +   return total;
> +}
> +
> +static void
> +show_client_class(struct seq_file *m,
> + struct i915_drm_client *client,
> + unsigned int class)
> +{
> +   const struct list_head *list = &client->ctx_list;
> +   u64 total = atomic64_read(&client->past_runtime[class]);
> +   struct i915_gem_context *ctx;
> +
> +   rcu_read_lock();
> +   list_for_each_entry_rcu(ctx, list, client_link)
> +   total += busy_add(ctx, class);
> +   
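
To make the proposed interval/timestamp arithmetic concrete, a small userspace
sketch (purely illustrative, no such tool exists in this series): given two
samples of an engine's accumulated busy time in ns, e.g. drm-engine-render,
taken some wall-clock interval apart, the utilisation over that window is:

    #include <stdint.h>

    /* Illustrative only: busy0/busy1 are the two fdinfo reads (ns),
     * wall_ns is the time between them (or the advertised
     * drm-stat-interval). */
    static double engine_busy_percent(uint64_t busy0_ns, uint64_t busy1_ns,
                                      uint64_t wall_ns)
    {
            if (!wall_ns)
                    return 0.0;
            return 100.0 * (double)(busy1_ns - busy0_ns) / (double)wall_ns;
    }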

Re: [PATCH] drm/panel: add rotation support for Elida KD35T133 panels

2021-05-20 Thread Lucas Stach
Am Freitag, dem 02.04.2021 um 15:48 -0500 schrieb Chris Morgan:
> Update the panel to allow setting the rotation value in device tree.
> Tested on an Odroid Go Advance, where the panel is by default rotated 270
> degrees.
> 
> Signed-off-by: Chris Morgan 

Reviewed-by: Lucas Stach 

> ---
>  drivers/gpu/drm/panel/panel-elida-kd35t133.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panel/panel-elida-kd35t133.c 
> b/drivers/gpu/drm/panel/panel-elida-kd35t133.c
> index bc36aa3c1123..d8534406d1ef 100644
> --- a/drivers/gpu/drm/panel/panel-elida-kd35t133.c
> +++ b/drivers/gpu/drm/panel/panel-elida-kd35t133.c
> @@ -42,6 +42,7 @@ struct kd35t133 {
>   struct gpio_desc *reset_gpio;
>   struct regulator *vdd;
>   struct regulator *iovcc;
> + enum drm_panel_orientation orientation;
>   bool prepared;
>  };
>  
> @@ -216,6 +217,7 @@ static int kd35t133_get_modes(struct drm_panel *panel,
>   connector->display_info.width_mm = mode->width_mm;
>   connector->display_info.height_mm = mode->height_mm;
>   drm_mode_probed_add(connector, mode);
> + drm_connector_set_panel_orientation(connector, ctx->orientation);
>  
>   return 1;
>  }
> @@ -258,6 +260,12 @@ static int kd35t133_probe(struct mipi_dsi_device *dsi)
>   return ret;
>   }
>  
> + ret = of_drm_get_panel_orientation(dev->of_node, >orientation);
> + if (ret < 0) {
> + dev_err(dev, "%pOF: failed to get orientation %d\n", 
> dev->of_node, ret);
> + return ret;
> + }
> +
>   mipi_dsi_set_drvdata(dsi, ctx);
>  
>   ctx->dev = dev;




Re: [RFC] Add DMA_RESV_USAGE flags

2021-05-20 Thread Jason Ekstrand
On Thu, May 20, 2021 at 5:50 AM Christian König
 wrote:
>
> Am 20.05.21 um 09:55 schrieb Daniel Vetter:
> > On Wed, May 19, 2021 at 5:48 PM Michel Dänzer  wrote:
> >> On 2021-05-19 5:21 p.m., Jason Ekstrand wrote:
> >>> On Wed, May 19, 2021 at 5:52 AM Michel Dänzer  wrote:
>  On 2021-05-19 12:06 a.m., Jason Ekstrand wrote:
> > On Tue, May 18, 2021 at 4:17 PM Daniel Vetter  wrote:
> >> On Tue, May 18, 2021 at 7:40 PM Christian König
> >>  wrote:
> >>> Am 18.05.21 um 18:48 schrieb Daniel Vetter:
>  On Tue, May 18, 2021 at 2:49 PM Christian König
>   wrote:
> 
> > And as long as we are all inside amdgpu we also don't have any 
> > oversync,
> > the issue only happens when we share dma-bufs with i915 (radeon and
> > AFAIK nouveau does the right thing as well).
>  Yeah because then you can't use the amdgpu dma_resv model anymore and
>  have to use the one atomic helpers use. Which is also the one that
>  e.g. Jason is threatening to bake in as uapi with his dma_buf ioctl,
>  so as soon as that lands and someone starts using it, something has 
>  to
>  adapt _anytime_ you have a dma-buf hanging around. Not just when it's
>  shared with another device.
> >>> Yeah, and that is exactly the reason why I will NAK this uAPI change.
> >>>
> >>> This doesn't work for amdgpu at all for the reasons outlined above.
> >> Uh that's really not how uapi works. "my driver is right, everyone
> >> else is wrong" is not how cross driver contracts are defined. If that
> >> means a perf impact until you've fixed your rules, that's on you.
> >>
> >> Also you're a few years too late with nacking this, it's already uapi
> >> in the form of the dma-buf poll() support.
> > ^^  My fancy new ioctl doesn't expose anything that isn't already
> > there.  It just lets you take a snap-shot of a wait instead of doing
> > an active wait which might end up with more fences added depending on
> > interrupts and retries.  The dma-buf poll waits on all fences for
> > POLLOUT and only the exclusive fence for POLLIN.  It's already uAPI.
>  Note that the dma-buf poll support could be useful to Wayland 
>  compositors for the same purpose as Jason's new ioctl (only using client 
>  buffers which have finished drawing for an output frame, to avoid 
>  missing a refresh cycle due to client drawing), *if* it didn't work 
>  differently with amdgpu.
> 
>  Am I understanding correctly that Jason's new ioctl would also work 
>  differently with amdgpu as things stand currently? If so, that would be 
>  a real bummer and might hinder adoption of the ioctl by Wayland 
>  compositors.
> >>> My new ioctl has identical semantics to poll().  It just lets you take
> >>> a snapshot in time to wait on later instead of waiting on whatever
> >>> happens to be set right now.  IMO, having identical semantics to
> >>> poll() isn't something we want to change.
> >> Agreed.
> >>
> >> I'd argue then that making amdgpu poll semantics match those of other 
> >> drivers is a pre-requisite for the new ioctl, otherwise it seems unlikely 
> >> that the ioctl will be widely adopted.
> > This seems backwards, because that means useful improvements in all
> > other drivers are stalled until amdgpu is fixed.
>
> Well, there is nothing to fix in amdgpu; what we need is to come up
> with a DMA-buf implicit syncing model which works for everyone.
>
> I've pointed this problem out at FOSDEM roughly 6 years ago, before
> DMA-buf was even merged upstream and way before amdgpu even existed. And
> the response was yeah, maybe we need to look at this as well.
>
> Over the years I've mentioned now at least 5 times that this isn't going
> to work in some situations and came up with different approaches how to
> fix it.
>
> And you still have the nerve to tell me that this isn't a problem and
> we should fix amdgpu instead? Sorry, but I'm really running out of ideas
> how to explain why this isn't working for everybody.

I'm trying really hard to not fuel a flame war here but I tend to lean
Daniel's direction on this.  Stepping back from the individual needs
of amdgpu and looking at things from the PoV of Linux as a whole, AMD
being a special snowflake here is bad.  I think we have two problems:
amdgpu doesn't play by the established rules, and the rules don't work
well for amdgpu.  We need to solve BOTH problems.  Does that mean we
need to smash something into amdgpu to force it into the dma-buf model
today?  Maybe not; stuff's working well enough, I guess.  But we can't
just rewrite all the rules and break everyone else either.

> That amdgpu wants to be special is true, but it is a fundamental problem
> that we have designed the implicit sync in DMA-buf only around the needs
> of DRM drivers at that time instead of going a step back and saying hey
> what would be 
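
For background on the poll() semantics referenced in this thread, a minimal
userspace sketch (illustrative only) of waiting for a dma-buf to become idle;
as stated above, POLLOUT waits on all fences, POLLIN only on the exclusive one:

    #include <poll.h>

    /* Illustrative: block until all fences on the dma-buf have signaled. */
    static int wait_dmabuf_idle(int dmabuf_fd)
    {
            struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLOUT };

            return poll(&pfd, 1, -1);
    }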

Re: [RFC] Add DMA_RESV_USAGE flags

2021-05-20 Thread Jason Ekstrand
On Thu, May 20, 2021 at 9:18 AM Daniel Vetter  wrote:
>
> On Thu, May 20, 2021 at 10:13:38AM +0200, Michel Dänzer wrote:
> > On 2021-05-20 9:55 a.m., Daniel Vetter wrote:
> > > On Wed, May 19, 2021 at 5:48 PM Michel Dänzer  wrote:
> > >>
> > >> On 2021-05-19 5:21 p.m., Jason Ekstrand wrote:
> > >>> On Wed, May 19, 2021 at 5:52 AM Michel Dänzer  
> > >>> wrote:
> > 
> >  On 2021-05-19 12:06 a.m., Jason Ekstrand wrote:
> > > On Tue, May 18, 2021 at 4:17 PM Daniel Vetter  wrote:
> > >>
> > >> On Tue, May 18, 2021 at 7:40 PM Christian König
> > >>  wrote:
> > >>>
> > >>> Am 18.05.21 um 18:48 schrieb Daniel Vetter:
> >  On Tue, May 18, 2021 at 2:49 PM Christian König
> >   wrote:
> > 
> > > And as long as we are all inside amdgpu we also don't have any 
> > > oversync,
> > > the issue only happens when we share dma-bufs with i915 (radeon 
> > > and
> > > AFAIK nouveau does the right thing as well).
> >  Yeah because then you can't use the amdgpu dma_resv model anymore 
> >  and
> >  have to use the one atomic helpers use. Which is also the one that
> >  e.g. Jason is threatening to bake in as uapi with his dma_buf 
> >  ioctl,
> >  so as soon as that lands and someone starts using it, something 
> >  has to
> >  adapt _anytime_ you have a dma-buf hanging around. Not just when 
> >  it's
> >  shared with another device.
> > >>>
> > >>> Yeah, and that is exactly the reason why I will NAK this uAPI 
> > >>> change.
> > >>>
> > >>> This doesn't work for amdgpu at all for the reasons outlined above.
> > >>
> > >> Uh that's really not how uapi works. "my driver is right, everyone
> > >> else is wrong" is not how cross driver contracts are defined. If that
> > >> means a perf impact until you've fixed your rules, that's on you.
> > >>
> > >> Also you're a few years too late with nacking this, it's already uapi
> > >> in the form of the dma-buf poll() support.
> > >
> > > ^^  My fancy new ioctl doesn't expose anything that isn't already
> > > there.  It just lets you take a snap-shot of a wait instead of doing
> > > an active wait which might end up with more fences added depending on
> > > interrupts and retries.  The dma-buf poll waits on all fences for
> > > POLLOUT and only the exclusive fence for POLLIN.  It's already uAPI.
> > 
> >  Note that the dma-buf poll support could be useful to Wayland 
> >  compositors for the same purpose as Jason's new ioctl (only using 
> >  client buffers which have finished drawing for an output frame, to 
> >  avoid missing a refresh cycle due to client drawing), *if* it didn't 
> >  work differently with amdgpu.
> > 
> >  Am I understanding correctly that Jason's new ioctl would also work 
> >  differently with amdgpu as things stand currently? If so, that would 
> >  be a real bummer and might hinder adoption of the ioctl by Wayland 
> >  compositors.
> > >>>
> > >>> My new ioctl has identical semantics to poll().  It just lets you take
> > >>> a snapshot in time to wait on later instead of waiting on whatever
> > >>> happens to be set right now.  IMO, having identical semantics to
> > >>> poll() isn't something we want to change.
> > >>
> > >> Agreed.
> > >>
> > >> I'd argue then that making amdgpu poll semantics match those of other 
> > >> drivers is a pre-requisite for the new ioctl, otherwise it seems 
> > >> unlikely that the ioctl will be widely adopted.
> > >
> > > This seems backwards, because that means useful improvements in all
> > > other drivers are stalled until amdgpu is fixed.
> > >
> > > I think we need agreement on what the rules are, reasonable plan to
> > > get there, and then that should be enough to unblock work in the wider
> > > community. Holding the community at large hostage because one driver
> > > is different is really not great.
> >
> > I think we're in violent agreement. :) The point I was trying to make is
> > that amdgpu really needs to be fixed to be consistent with other drivers
> > ASAP.
>
> It's not that easy at all. I think best case we're looking at about a one
> year plan to get this into shape, taking into account usual release/distro
> update latencies.
>
> Best case.
>
> But also it's not a really big issue, since this shouldn't stop
> compositors from using poll on dma-buf fd or the sync_file stuff from
> Jason: The use-case for this in compositors is to avoid a single client
> stalling the entire desktop. If a driver lies by not setting the exclusive
> fence when expected, you simply don't get this stall avoidance benefit of
> misbehaving clients. But also this needs a gpu scheduler and higher
> priority for the compositor (or a lot of hw planes so you can composite
> with them alone), so it's all a fairly academic issue.

That's not really the 

Re: [Linaro-mm-sig] [RFC 1/3] dma-fence: Add boost fence op

2021-05-20 Thread Daniel Vetter
On Thu, May 20, 2021 at 6:41 PM Christian König
 wrote:
>
> Am 20.05.21 um 18:34 schrieb Daniel Vetter:
> > On Thu, May 20, 2021 at 06:01:39PM +0200, Christian König wrote:
> >> Am 20.05.21 um 16:54 schrieb Rob Clark:
> >>> On Thu, May 20, 2021 at 7:11 AM Christian König
> >>>  wrote:
> 
>  Am 20.05.21 um 16:07 schrieb Rob Clark:
> > On Wed, May 19, 2021 at 11:47 PM Christian König
> >  wrote:
> >> Uff, that looks very hardware specific to me.
> > Howso?  I'm not sure I agree.. and even if it was not useful for some
> > hw, it should be useful for enough drivers (and harm no drivers), so I
> > still think it is a good idea
> >
> > The fallback plan is to go the i915 route and stop using atomic
> > helpers and do the same thing inside the driver, but that doesn't help
> > any of the cases where you have a separate kms and gpu driver.
>  Yeah, that's certainly not something we want.
> 
> >> As far as I can see you can also implement completely inside the 
> >> backend
> >> by starting a timer on enable_signaling, don't you?
> > Not really.. I mean, the fact that something waited on a fence could
> > be a useful input signal to gpu freq governor, but it is entirely
> > insufficient..
> >
> > If the cpu is spending a lot of time waiting on a fence, cpufreq will
> > clock down so you spend less time waiting.  And no problem has been
> > solved.  You absolutely need the concept of a missed deadline, and a
> > timer doesn't give you that.
>  Ok then I probably don't understand the use case here.
> 
>  What exactly do you try to solve?
> >>> Basically situations where you are ping-ponging between GPU and CPU..
> >>> for example if you are double buffering instead of triple buffering,
> >>> and doing vblank sync'd pageflips.  The GPU, without any extra signal,
> >>> could get stuck at 30fps and a low gpu freq, because it ends up idle
> >>> while waiting for an extra vblank cycle for the next back-buffer to
> >>> become available.  Whereas if it boosted up to a higher freq and
> >>> stopped missing a vblank deadline, it would be less idle due to
> >>> getting the next back-buffer sooner (due to not missing a vblank
> >>> deadline).
> >> Ok that is the why, but what about the how?
> >>
> >> How does it help to have this boost callback and not just start a timer on
> >> enable signaling and stop it when the signal arrives?
> > Because the render side (or drm/scheduler, if msm would use that) has no
> > idea for which vblank a rendering actually is for.
>
> AH! So we are basically telling the fence backend that we have just
> missed an event we waited for.
>
> So what we want to know is how long the frontend wanted to wait instead
> of how long the backend took for rendering.

tbh I'm not sure the timestamp matters at all. What we do in i915 is
boost quite aggressively, and then let the usual clock tuning whittle
it down if we overshot. Plus some cool-down to prevent
abuse/continuous boosting. I think we also differentiate between
display boosts and userspace waits.

On the display side we also wait until the vblank we aimed for has
passed (atm always the next one, we don't have target_frame support like
amdgpu), to avoid boosting when there's no point.

> > So boosting right when you've missed your frame (not what Rob implements
> > currently, but fixable) is the right semantics.
> >
> > The other issue is that for cpu waits, we want to differentiate from fence
> > waits that userspace does intentionally (e.g. wait ioctl) and waits that
> > random other things are doing within the kernel to keep track of progress.
> >
> > For the former we know that userspace is stuck waiting for the gpu, and we
> > probably want to boost. For the latter we most definitely do _not_ want to
> > boost.
> >
> > Otoh I do agree with you that the current api is a bit awkward, so perhaps
> > we do need a dma_fence_userspace_wait wrapper which boosts automatically
> > after a bit. And similarly perhaps a drm_vblank_dma_fence_wait, where you
> > give it a vblank target, and if the fence isn't signalled by then, we kick
> > it real hard.
>
> Yeah, something like a use-case-driven API would be nice to have.
>
> For this particular case I suggest that we somehow extend the enable
> signaling callback.
>
> > But otherwise yes this is absolutely a thing that matters a ton. If you
> > look at Matt Brost's scheduler rfc, there's also a line item in there
> > about adding this kind of boosting to drm/scheduler.
>
> BTW: I still can't see this in my inbox.

You've replied already:

https://lore.kernel.org/dri-devel/20210518235830.133834-1-matthew.br...@intel.com/

It's just the big picture plan of what areas we're all trying to
tackle with some why, so that everyone knows what's coming in the next
half year at least. Probably longer until this is all sorted. I think
Matt has some poc hacked-up pile, but nothing really to show.
-Daniel

> Do you have 

Re: [RFC PATCH 04/97] drm/i915/guc: skip disabling CTBs before sanitizing the GuC

2021-05-20 Thread Matthew Brost
On Thu, May 06, 2021 at 12:13:18PM -0700, Matthew Brost wrote:
> From: Daniele Ceraolo Spurio 
> 
> If we're about to sanitize the GuC, something might have gone wrong
> beforehand, so we should avoid trying to talk to it. Even if the GuC is
> still running fine, the sanitize will reset its internal state and clear
> the CTB registration, so there is still no need to explicitly do so.
> 
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/2469
> Signed-off-by: Daniele Ceraolo Spurio 
> Signed-off-by: Matthew Brost 
> Cc: Michal Wajdeczko 
> Cc: John Harrison 

Reviewed-by: Matthew Brost 

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 8 +---
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 6abb8f2dc33d..892c1315ce49 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -504,7 +504,7 @@ static int __uc_init_hw(struct intel_uc *uc)
>  
>   ret = intel_guc_sample_forcewake(guc);
>   if (ret)
> - goto err_communication;
> + goto err_log_capture;
>  
>   if (intel_uc_uses_guc_submission(uc))
>   intel_guc_submission_enable(guc);
> @@ -529,8 +529,6 @@ static int __uc_init_hw(struct intel_uc *uc)
>   /*
>* We've failed to load the firmware :(
>*/
> -err_communication:
> - guc_disable_communication(guc);
>  err_log_capture:
>   __uc_capture_load_err_log(uc);
>  err_out:
> @@ -558,9 +556,6 @@ static void __uc_fini_hw(struct intel_uc *uc)
>   if (intel_uc_uses_guc_submission(uc))
>   intel_guc_submission_disable(guc);
>  
> - if (guc_communication_enabled(guc))
> - guc_disable_communication(guc);
> -
>   __uc_sanitize(uc);
>  }
>  
> @@ -577,7 +572,6 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
>   if (!intel_guc_is_ready(guc))
>   return;
>  
> - guc_disable_communication(guc);
>   __uc_sanitize(uc);
>  }
>  
> -- 
> 2.28.0
> 


Re: [PATCH 0/4] drm/vc4: Add support for the BCM2711 VEC

2021-05-20 Thread Dave Stevenson
Hi Maxime

On Thu, 20 May 2021 at 16:03, Maxime Ripard  wrote:
>
> Hi,
>
> The composite output on the BCM2711 is handled using the VEC. While the earlier
> SoCs were properly supported, it wasn't functional on the BCM2711. Add the
> needed support from the RPi downstream kernel.

Thanks for upstreaming these.
As far as I'm concerned they're all good, but DT bindings aren't an
area of expertise for me.

Patches 1&3:
Reviewed-by: Dave Stevenson 
Patches 2&4:
Acked-by: Dave Stevenson 

This is going to need firmware from 23rd March 2021 or later in order
to ensure that the VEC can use an integer divider from the PLL. PLLC
was rejigged to run at 2592MHz so that /24 gives the VEC the 108MHz
clock it requires. (Before that it needed a special config.txt option
to set the PLLs appropriately).

 Dave

> Maxime
>
> Mateusz Kwiatkowski (4):
>   drm/vc4: Fix clock source for VEC PixelValve on BCM2711
>   dt-bindings: display: bcm2835-vec: Add BCM2711 compatible
>   drm/vc4: Separate VEC compatible variants
>   ARM: boot: dts: bcm2711: Add BCM2711 VEC compatible
>
>  .../bindings/display/brcm,bcm2835-vec.yaml|  4 ++-
>  arch/arm/boot/dts/bcm2711.dtsi|  1 +
>  drivers/gpu/drm/vc4/vc4_crtc.c|  2 +-
>  drivers/gpu/drm/vc4/vc4_vec.c | 27 +++
>  4 files changed, 27 insertions(+), 7 deletions(-)
>
> --
> 2.31.1
>


Re: [Linaro-mm-sig] [RFC 1/3] dma-fence: Add boost fence op

2021-05-20 Thread Christian König

Am 20.05.21 um 18:34 schrieb Daniel Vetter:

On Thu, May 20, 2021 at 06:01:39PM +0200, Christian König wrote:

Am 20.05.21 um 16:54 schrieb Rob Clark:

On Thu, May 20, 2021 at 7:11 AM Christian König
 wrote:


Am 20.05.21 um 16:07 schrieb Rob Clark:

On Wed, May 19, 2021 at 11:47 PM Christian König
 wrote:

Uff, that looks very hardware specific to me.

Howso?  I'm not sure I agree.. and even if it was not useful for some
hw, it should be useful for enough drivers (and harm no drivers), so I
still think it is a good idea

The fallback plan is to go the i915 route and stop using atomic
helpers and do the same thing inside the driver, but that doesn't help
any of the cases where you have a separate kms and gpu driver.

Yeah, that's certainly not something we want.


As far as I can see you can also implement completely inside the backend
by starting a timer on enable_signaling, don't you?

Not really.. I mean, the fact that something waited on a fence could
be a useful input signal to gpu freq governor, but it is entirely
insufficient..

If the cpu is spending a lot of time waiting on a fence, cpufreq will
clock down so you spend less time waiting.  And no problem has been
solved.  You absolutely need the concept of a missed deadline, and a
timer doesn't give you that.

Ok then I probably don't understand the use case here.

What exactly do you try to solve?

Basically situations where you are ping-ponging between GPU and CPU..
for example if you are double buffering instead of triple buffering,
and doing vblank sync'd pageflips.  The GPU, without any extra signal,
could get stuck at 30fps and a low gpu freq, because it ends up idle
while waiting for an extra vblank cycle for the next back-buffer to
become available.  Whereas if it boosted up to a higher freq and
stopped missing a vblank deadline, it would be less idle due to
getting the next back-buffer sooner (due to not missing a vblank
deadline).

Ok that is the why, but what about the how?

How does it help to have this boost callback and not just start a timer on
enable signaling and stop it when the signal arrives?

Because the render side (or drm/scheduler, if msm would use that) has no
idea for which vblank a rendering actually is for.


AH! So we are basically telling the fence backend that we have just 
missed an event we waited for.


So what we want to know is how long the frontend wanted to wait instead 
of how long the backend took for rendering.



So boosting right when you've missed your frame (not what Rob implements
currently, but fixable) is the right semantics.

The other issue is that for cpu waits, we want to differentiate from fence
waits that userspace does intentially (e.g. wait ioctl) and waits that
random other things are doing within the kernel to keep track of progress.

For the former we know that userspace is stuck waiting for the gpu, and we
probably want to boost. For the latter we most definitely do _not_ want to
boost.

Otoh I do agree with you that the current api is a bit awkward, so perhaps
we do need a dma_fence_userspace_wait wrapper which boosts automatically
after a bit. And similarly perhaps a drm_vblank_dma_fence_wait, where you
give it a vblank target, and if the fence isn't signalled by then, we kick
it real hard.


Yeah, something like a use-case-driven API would be nice to have.

For this particular case I suggest that we somehow extend the enable 
signaling callback.



But otherwise yes this is absolutely a thing that matters a ton. If you
look at Matt Brost's scheduler rfc, there's also a line item in there
about adding this kind of boosting to drm/scheduler.


BTW: I still can't see this in my inbox.

Do you have a link?

Christian.


-Daniel



Regards,
Christian.


BR,
-R


Thanks,
Christian.


BR,
-R


Christian.

Am 19.05.21 um 20:38 schrieb Rob Clark:

From: Rob Clark 

Add a way to hint to the fence signaler that a fence waiter has missed a
deadline waiting on the fence.

In some cases, missing a vblank can result in lower gpu utilization,
when really we want to go in the opposite direction and boost gpu freq.
The boost callback gives some feedback to the fence signaler that we
are missing deadlines, so it can take this into account in its freq/
utilization calculations.

Signed-off-by: Rob Clark 
---
 include/linux/dma-fence.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 9f12efaaa93a..172702521acc 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -231,6 +231,17 @@ struct dma_fence_ops {
 signed long (*wait)(struct dma_fence *fence,
 bool intr, signed long timeout);

+ /**
+  * @boost:
+  *
+  * Optional callback, to indicate that a fence waiter missed a deadline.
+  * This can serve as a signal that (if possible) whatever signals the
+  * fence should boost its clocks.
+  *
+  * This can be called in any 

Re: [Linaro-mm-sig] [RFC 1/3] dma-fence: Add boost fence op

2021-05-20 Thread Daniel Vetter
On Thu, May 20, 2021 at 06:01:39PM +0200, Christian König wrote:
> Am 20.05.21 um 16:54 schrieb Rob Clark:
> > On Thu, May 20, 2021 at 7:11 AM Christian König
> >  wrote:
> > > 
> > > 
> > > Am 20.05.21 um 16:07 schrieb Rob Clark:
> > > > On Wed, May 19, 2021 at 11:47 PM Christian König
> > > >  wrote:
> > > > > Uff, that looks very hardware specific to me.
> > > > Howso?  I'm not sure I agree.. and even if it was not useful for some
> > > > hw, it should be useful for enough drivers (and harm no drivers), so I
> > > > still think it is a good idea
> > > > 
> > > > The fallback plan is to go the i915 route and stop using atomic
> > > > helpers and do the same thing inside the driver, but that doesn't help
> > > > any of the cases where you have a separate kms and gpu driver.
> > > Yeah, that's certainly not something we want.
> > > 
> > > > > As far as I can see you can also implement completely inside the 
> > > > > backend
> > > > > by starting a timer on enable_signaling, don't you?
> > > > Not really.. I mean, the fact that something waited on a fence could
> > > > be a useful input signal to gpu freq governor, but it is entirely
> > > > insufficient..
> > > > 
> > > > If the cpu is spending a lot of time waiting on a fence, cpufreq will
> > > > clock down so you spend less time waiting.  And no problem has been
> > > > solved.  You absolutely need the concept of a missed deadline, and a
> > > > timer doesn't give you that.
> > > Ok then I probably don't understand the use case here.
> > > 
> > > What exactly do you try to solve?
> > Basically situations where you are ping-ponging between GPU and CPU..
> > for example if you are double buffering instead of triple buffering,
> > and doing vblank sync'd pageflips.  The GPU, without any extra signal,
> > could get stuck at 30fps and a low gpu freq, because it ends up idle
> > while waiting for an extra vblank cycle for the next back-buffer to
> > become available.  Whereas if it boosted up to a higher freq and
> > stopped missing a vblank deadline, it would be less idle due to
> > getting the next back-buffer sooner (due to not missing a vblank
> > deadline).
> 
> Ok that is the why, but what about the how?
> 
> How does it help to have this boost callback and not just start a timer on
> enable signaling and stop it when the signal arrives?

Because the render side (or drm/scheduler, if msm would use that) has no
idea for which vblank a rendering actually is for.

So boosting right when you've missed your frame (not what Rob implements
currently, but fixable) is the right semantics.

The other issue is that for cpu waits, we want to differentiate from fence
waits that userspace does intentionally (e.g. wait ioctl) and waits that
random other things are doing within the kernel to keep track of progress.

For the former we know that userspace is stuck waiting for the gpu, and we
probably want to boost. For the latter we most definitely do _not_ want to
boost.

Otoh I do agree with you that the current api is a bit awkward, so perhaps
we do need a dma_fence_userspace_wait wrapper which boosts automatically
after a bit. And similarly perhaps a drm_vblank_dma_fence_wait, where you
give it a vblank target, and if the fence isn't signalled by then, we kick
it real hard.

But otherwise yes this is absolutely a thing that matters a ton. If you
look at Matt Brost's scheduler rfc, there's also a line item in there
about adding this kind of boosting to drm/scheduler.
-Daniel


> 
> Regards,
> Christian.
> 
> > 
> > BR,
> > -R
> > 
> > > Thanks,
> > > Christian.
> > > 
> > > > BR,
> > > > -R
> > > > 
> > > > > Christian.
> > > > > 
> > > > > Am 19.05.21 um 20:38 schrieb Rob Clark:
> > > > > > From: Rob Clark 
> > > > > > 
> > > > > > Add a way to hint to the fence signaler that a fence waiter has 
> > > > > > missed a
> > > > > > deadline waiting on the fence.
> > > > > > 
> > > > > > In some cases, missing a vblank can result in lower gpu utilization,
> > > > > > when really we want to go in the opposite direction and boost gpu 
> > > > > > freq.
> > > > > > The boost callback gives some feedback to the fence signaler that we
> > > > > > are missing deadlines, so it can take this into account in its 
> > > > > > freq/
> > > > > > utilization calculations.
> > > > > > 
> > > > > > Signed-off-by: Rob Clark 
> > > > > > ---
> > > > > > include/linux/dma-fence.h | 26 ++
> > > > > > 1 file changed, 26 insertions(+)
> > > > > > 
> > > > > > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > > > > > index 9f12efaaa93a..172702521acc 100644
> > > > > > --- a/include/linux/dma-fence.h
> > > > > > +++ b/include/linux/dma-fence.h
> > > > > > @@ -231,6 +231,17 @@ struct dma_fence_ops {
> > > > > > signed long (*wait)(struct dma_fence *fence,
> > > > > > bool intr, signed long timeout);
> > > > > > 
> > > > > > + /**
> > > > > > +  * @boost:
> > > > > > +  *
> > > > > > +  * 

Re: [RFC 7/7] drm/i915: Expose client engine utilisation via fdinfo

2021-05-20 Thread Christian König

Yeah, having the timestamp is a good idea as well.

  drm-driver: i915

I think we should rather add something like printing 
file_operations->owner->name to the common fdinfo code.


This way we would have something common for all drivers in the system. 
I'm just not sure if that also works if they are compiled into the kernel.


Regards,
Christian.

Am 20.05.21 um 18:26 schrieb Nieto, David M:


[AMD Official Use Only]


I would like to add a unit marker for the stats that we monitor in the
fd. As we discussed, currently we are displaying the usage percentage
because we wanted to provide single-query percentages, but this may
evolve with time.


May I suggest to add two new fields

drm-stat-interval: <64 bit> ns
drm-stat-timestamp: <64 bit> ns

If interval is set, engine utilization is calculated by doing <engine render> =
100 * <drm-engine-render> / <drm-stat-interval>
If interval is not set, two reads are needed: <engine render> =
100 * <drm-engine-render1 - drm-engine-render0> / <drm-stat-timestamp1 - drm-stat-timestamp0>



Regards,

David



*From:* Tvrtko Ursulin 
*Sent:* Thursday, May 20, 2021 8:12 AM
*To:* intel-...@lists.freedesktop.org 
*Cc:* dri-devel@lists.freedesktop.org 
; Tvrtko Ursulin 
; Nieto, David M ; 
Koenig, Christian ; Daniel Vetter 

*Subject:* [RFC 7/7] drm/i915: Expose client engine utilisation via 
fdinfo

From: Tvrtko Ursulin 

Similar to AMD commit
874442541133 ("drm/amdgpu: Add show_fdinfo() interface"), using the
infrastructure added in previous patches, we add basic client info
and GPU engine utilisation for i915.

Example of the output:

  pos:    0
  flags:  012
  mnt_id: 21
  drm-driver: i915
  drm-pdev:   :00:02.0
  drm-client-id:  7
  drm-engine-render:  9288864723 ns
  drm-engine-copy:    2035071108 ns
  drm-engine-video:   0 ns
  drm-engine-video-enhance:   0 ns

DRM related fields are appropriately prefixed for easy parsing and
separation from generic fdinfo fields.

Idea is for some fields to become either fully or partially standardised
in order to enable writing of generic top-like tools.

Initial proposal for fully standardised common fields:

 drm-driver: 
 drm-pdev: 

Optional fully standardised:

 drm-client-id: 

Optional partially standardised:

 engine-:  ns
 memory-:  KiB

Once agreed the format would need to go to some README or kerneldoc in
DRM core.

Signed-off-by: Tvrtko Ursulin 
Cc: David M Nieto 
Cc: Christian König 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_drm_client.c | 68 ++
 drivers/gpu/drm/i915/i915_drm_client.h |  4 ++
 drivers/gpu/drm/i915/i915_drv.c    |  3 ++
 3 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c

index 1e5db7753276..5e9cfba1116b 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -9,6 +9,11 @@

 #include 

+#include 
+
+#include "gem/i915_gem_context.h"
+#include "gt/intel_engine_user.h"
+
 #include "i915_drm_client.h"
 #include "i915_drv.h"
 #include "i915_gem.h"
@@ -168,3 +173,66 @@ void i915_drm_clients_fini(struct 
i915_drm_clients *clients)


 xa_destroy(&clients->xarray);
 }
+
+#ifdef CONFIG_PROC_FS
+static const char * const uabi_class_names[] = {
+   [I915_ENGINE_CLASS_RENDER] = "render",
+   [I915_ENGINE_CLASS_COPY] = "copy",
+   [I915_ENGINE_CLASS_VIDEO] = "video",
+   [I915_ENGINE_CLASS_VIDEO_ENHANCE] = "video-enhance",
+};
+
+static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
+{
+   struct i915_gem_engines_iter it;
+   struct intel_context *ce;
+   u64 total = 0;
+
+   for_each_gem_engine(ce, rcu_dereference(ctx->engines), it) {
+   if (ce->engine->uabi_class != class)
+   continue;
+
+   total += intel_context_get_total_runtime_ns(ce);
+   }
+
+   return total;
+}
+
+static void
+show_client_class(struct seq_file *m,
+ struct i915_drm_client *client,
+ unsigned int class)
+{
+   const struct list_head *list = &client->ctx_list;
+   u64 total = atomic64_read(&client->past_runtime[class]);
+   struct i915_gem_context *ctx;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(ctx, list, client_link)
+   total += busy_add(ctx, class);
+   rcu_read_unlock();
+
+   return seq_printf(m, "drm-engine-%s:\t%llu ns\n",
+ uabi_class_names[class], total);
+}
+
+void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
+{
+   struct drm_file *file = f->private_data;
+   struct drm_i915_file_private *file_priv = file->driver_priv;
+   struct drm_i915_private *i915 = file_priv->dev_priv;
+   struct i915_drm_client *client = file_priv->client;
+   struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
+   unsigned int i;
+
+   seq_printf(m, "drm-driver:\ti915\n");
+   seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
+  pci_domain_nr(pdev->bus), pdev->bus->number,
+ 

Re: [RFC 2/3] drm/atomic: Call dma_fence_boost() when we've missed a vblank

2021-05-20 Thread Daniel Vetter
On Wed, May 19, 2021 at 11:38:53AM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> Signed-off-by: Rob Clark 
> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> b/drivers/gpu/drm/drm_atomic_helper.c
> index 560aaecba31b..fe10fc2e7f86 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1435,11 +1435,15 @@ int drm_atomic_helper_wait_for_fences(struct 
> drm_device *dev,
>   int i, ret;
>  
>   for_each_new_plane_in_state(state, plane, new_plane_state, i) {
> + u64 vblank_count;
> +
>   if (!new_plane_state->fence)
>   continue;
>  
>   WARN_ON(!new_plane_state->fb);
>  
> + vblank_count = drm_crtc_vblank_count(new_plane_state->crtc);
> +
>   /*
>* If waiting for fences pre-swap (ie: nonblock), userspace can
>* still interrupt the operation. Instead of blocking until the
> @@ -1449,6 +1453,13 @@ int drm_atomic_helper_wait_for_fences(struct 
> drm_device *dev,
>   if (ret)
>   return ret;
>  
> + /*
> +  * Check if we've missed a vblank while waiting, and if we have
> +  * signal the fence that its signaler should be boosted
> +  */
> + if (vblank_count != drm_crtc_vblank_count(new_plane_state->crtc))
> + dma_fence_boost(new_plane_state->fence);

I think we should do a lot better here:
- maybe only bother doing this for single-crtc updates, and only if
  modeset isn't set. No one else cares about latency.

- We should boost _right_ when we've missed the frame, so I think we
  should have a _timeout wait here that guesstimates when the vblank is
  over (might need to throw in a vblank wait if we missed) and then boost
  immediately. Not wait a bunch of frames (worst case) until we finally
  decide to boost.

Otherwise I really like this, I think it's about the only real reason i915
isn't using atomic helpers.

Also adding Matt B for this topic.
-Daniel

> +
>   dma_fence_put(new_plane_state->fence);
>   new_plane_state->fence = NULL;
>   }
> -- 
> 2.30.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
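
Following up on the suggestion of a vblank-target wait that boosts when the
deadline is missed, a hedged sketch of roughly what such a helper could look
like on top of the proposed dma_fence_boost(); the name and signature are
hypothetical and not part of this series, and the 16 ms timeout assumes one
60 Hz refresh:

    /*
     * Hypothetical sketch only: wait for @fence with a short timeout; if it
     * has not signalled once @target_vblank has passed, hint the signaler
     * via dma_fence_boost() and then finish the wait normally.
     */
    static int vblank_target_fence_wait(struct drm_crtc *crtc, u64 target_vblank,
                                        struct dma_fence *fence)
    {
            signed long ret;

            ret = dma_fence_wait_timeout(fence, true, msecs_to_jiffies(16));
            if (ret > 0)
                    return 0;       /* signalled before the deadline */
            if (ret < 0)
                    return ret;     /* interrupted */

            if (drm_crtc_vblank_count(crtc) > target_vblank)
                    dma_fence_boost(fence); /* missed the target vblank */

            return dma_fence_wait(fence, true);
    }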


Re: [RFC 7/7] drm/i915: Expose client engine utilisation via fdinfo

2021-05-20 Thread Nieto, David M
[AMD Official Use Only]

I would like to add a unit marker for the stats that we monitor in the fd. As
we discussed, currently we are displaying the usage percentage because we
wanted to provide single-query percentages, but this may evolve with time.

May I suggest to add two new fields

drm-stat-interval: <64 bit> ns
drm-stat-timestamp: <64 bit> ns

If interval is set, engine utilization is calculated by doing <engine render> =
100 * <drm-engine-render> / <drm-stat-interval>
If interval is not set, two reads are needed: <engine render> =
100 * <drm-engine-render1 - drm-engine-render0> / <drm-stat-timestamp1 - drm-stat-timestamp0>


Regards,

David



From: Tvrtko Ursulin 
Sent: Thursday, May 20, 2021 8:12 AM
To: intel-...@lists.freedesktop.org 
Cc: dri-devel@lists.freedesktop.org ; Tvrtko 
Ursulin ; Nieto, David M ; 
Koenig, Christian ; Daniel Vetter 
Subject: [RFC 7/7] drm/i915: Expose client engine utilisation via fdinfo

From: Tvrtko Ursulin 

Similar to AMD commit
874442541133 ("drm/amdgpu: Add show_fdinfo() interface"), using the
infrastructure added in previous patches, we add basic client info
and GPU engine utilisation for i915.

Example of the output:

  pos:0
  flags:  012
  mnt_id: 21
  drm-driver: i915
  drm-pdev:   :00:02.0
  drm-client-id:  7
  drm-engine-render:  9288864723 ns
  drm-engine-copy:2035071108 ns
  drm-engine-video:   0 ns
  drm-engine-video-enhance:   0 ns

DRM related fields are appropriately prefixed for easy parsing and
separation from generic fdinfo fields.

Idea is for some fields to become either fully or partially standardised
in order to enable writing of generic top-like tools.

Initial proposal for fully standardised common fields:

 drm-driver: 
 drm-pdev: 

Optional fully standardised:

 drm-client-id: 

Optional partially standardised:

 engine-:  ns
 memory-:  KiB

Once agreed the format would need to go to some README or kerneldoc in
DRM core.

Signed-off-by: Tvrtko Ursulin 
Cc: David M Nieto 
Cc: Christian König 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_drm_client.c | 68 ++
 drivers/gpu/drm/i915/i915_drm_client.h |  4 ++
 drivers/gpu/drm/i915/i915_drv.c|  3 ++
 3 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c
index 1e5db7753276..5e9cfba1116b 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -9,6 +9,11 @@

 #include 

+#include 
+
+#include "gem/i915_gem_context.h"
+#include "gt/intel_engine_user.h"
+
 #include "i915_drm_client.h"
 #include "i915_drv.h"
 #include "i915_gem.h"
@@ -168,3 +173,66 @@ void i915_drm_clients_fini(struct i915_drm_clients 
*clients)

 xa_destroy(&clients->xarray);
 }
+
+#ifdef CONFIG_PROC_FS
+static const char * const uabi_class_names[] = {
+   [I915_ENGINE_CLASS_RENDER] = "render",
+   [I915_ENGINE_CLASS_COPY] = "copy",
+   [I915_ENGINE_CLASS_VIDEO] = "video",
+   [I915_ENGINE_CLASS_VIDEO_ENHANCE] = "video-enhance",
+};
+
+static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
+{
+   struct i915_gem_engines_iter it;
+   struct intel_context *ce;
+   u64 total = 0;
+
+   for_each_gem_engine(ce, rcu_dereference(ctx->engines), it) {
+   if (ce->engine->uabi_class != class)
+   continue;
+
+   total += intel_context_get_total_runtime_ns(ce);
+   }
+
+   return total;
+}
+
+static void
+show_client_class(struct seq_file *m,
+ struct i915_drm_client *client,
+ unsigned int class)
+{
+   const struct list_head *list = &client->ctx_list;
+   u64 total = atomic64_read(&client->past_runtime[class]);
+   struct i915_gem_context *ctx;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(ctx, list, client_link)
+   total += busy_add(ctx, class);
+   rcu_read_unlock();
+
+   return seq_printf(m, "drm-engine-%s:\t%llu ns\n",
+ uabi_class_names[class], total);
+}
+
+void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
+{
+   struct drm_file *file = f->private_data;
+   struct drm_i915_file_private *file_priv = file->driver_priv;
+   struct drm_i915_private *i915 = file_priv->dev_priv;
+   struct i915_drm_client *client = file_priv->client;
+   struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
+   unsigned int i;
+
+   seq_printf(m, "drm-driver:\ti915\n");
+   seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
+  pci_domain_nr(pdev->bus), pdev->bus->number,
+  PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
+
+   seq_printf(m, "drm-client-id:\t%u\n", client->id);
+
+   for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
+   show_client_class(m, client, i);
+}
+#endif
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h 
b/drivers/gpu/drm/i915/i915_drm_client.h
index b2b69d6985e4..9885002433a0 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -98,6 +98,10 @@ 

Re: [RFC 1/3] dma-fence: Add boost fence op

2021-05-20 Thread Daniel Vetter
On Thu, May 20, 2021 at 4:03 PM Rob Clark  wrote:
>
> On Wed, May 19, 2021 at 11:47 PM Christian König
>  wrote:
> >
> > Uff, that looks very hardware specific to me.
>
> Howso?  I'm not sure I agree.. and even if it was not useful for some
> hw, it should be useful for enough drivers (and harm no drivers), so I
> still think it is a good idea
>
> The fallback plan is to go the i915 route and stop using atomic
> helpers and do the same thing inside the driver, but that doesn't help
> any of the cases where you have a separate kms and gpu driver.

Don't, because the i915 plan is to actually move towards drm/scheduler
and atomic helpers.

> > As far as I can see you can also implement completely inside the backend
> > by starting a timer on enable_signaling, don't you?
>
> Not really.. I mean, the fact that something waited on a fence could
> be a useful input signal to gpu freq governor, but it is entirely
> insufficient..
>
> If the cpu is spending a lot of time waiting on a fence, cpufreq will
> clock down so you spend less time waiting.  And no problem has been
> solved.  You absolutely need the concept of a missed deadline, and a
> timer doesn't give you that.

Yup agreed.

Adding Matt Brost, since he's planning all this boostback work.
-Daniel

>
> BR,
> -R
>
> > Christian.
> >
> > Am 19.05.21 um 20:38 schrieb Rob Clark:
> > > From: Rob Clark 
> > >
> > > Add a way to hint to the fence signaler that a fence waiter has missed a
> > > deadline waiting on the fence.
> > >
> > > In some cases, missing a vblank can result in lower gpu utilization,
> > > when really we want to go in the opposite direction and boost gpu freq.
> > > The boost callback gives some feedback to the fence signaler that we
> > > are missing deadlines, so it can take this into account in its freq/
> > > utilization calculations.
> > >
> > > Signed-off-by: Rob Clark 
> > > ---
> > >   include/linux/dma-fence.h | 26 ++
> > >   1 file changed, 26 insertions(+)
> > >
> > > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > > index 9f12efaaa93a..172702521acc 100644
> > > --- a/include/linux/dma-fence.h
> > > +++ b/include/linux/dma-fence.h
> > > @@ -231,6 +231,17 @@ struct dma_fence_ops {
> > >   signed long (*wait)(struct dma_fence *fence,
> > >   bool intr, signed long timeout);
> > >
> > > + /**
> > > +  * @boost:
> > > +  *
> > > +  * Optional callback, to indicate that a fence waiter missed a 
> > > deadline.
> > > +  * This can serve as a signal that (if possible) whatever signals 
> > > the
> > > +  * fence should boost its clocks.
> > > +  *
> > > +  * This can be called in any context that can call dma_fence_wait().
> > > +  */
> > > + void (*boost)(struct dma_fence *fence);
> > > +
> > >   /**
> > >* @release:
> > >*
> > > @@ -586,6 +597,21 @@ static inline signed long dma_fence_wait(struct 
> > > dma_fence *fence, bool intr)
> > >   return ret < 0 ? ret : 0;
> > >   }
> > >
> > > +/**
> > > + * dma_fence_boost - hint from waiter that it missed a deadline
> > > + *
> > > + * @fence: the fence that caused the missed deadline
> > > + *
> > > + * This function gives a hint from a fence waiter that a deadline was
> > > + * missed, so that the fence signaler can factor this in to device
> > > + * power state decisions
> > > + */
> > > +static inline void dma_fence_boost(struct dma_fence *fence)
> > > +{
> > > + if (fence->ops->boost)
> > > + fence->ops->boost(fence);
> > > +}
> > > +
> > >   struct dma_fence *dma_fence_get_stub(void);
> > >   u64 dma_fence_context_alloc(unsigned num);
> > >
> >



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
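
For the signaler side of the proposed op, a hedged sketch of what a driver
might do in its .boost implementation; every mygpu_* name is invented, while
dma_fence, container_of() and the workqueue API are existing kernel
interfaces:

    /* Hypothetical driver-side sketch, not part of the RFC. */
    struct mygpu_device {
            atomic_t missed_deadlines;
            struct work_struct boost_work;  /* raises the GPU clocks */
    };

    struct mygpu_fence {
            struct dma_fence base;
            struct mygpu_device *gpu;
    };

    static void mygpu_fence_boost(struct dma_fence *fence)
    {
            struct mygpu_fence *f = container_of(fence, struct mygpu_fence, base);

            /* Record the missed deadline and kick the freq governor. */
            atomic_inc(&f->gpu->missed_deadlines);
            queue_work(system_unbound_wq, &f->gpu->boost_work);
    }

The driver would then set .boost = mygpu_fence_boost in its dma_fence_ops.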


Re: [Linaro-mm-sig] [RFC 1/3] dma-fence: Add boost fence op

2021-05-20 Thread Christian König

Am 20.05.21 um 16:54 schrieb Rob Clark:

On Thu, May 20, 2021 at 7:11 AM Christian König
 wrote:



Am 20.05.21 um 16:07 schrieb Rob Clark:

On Wed, May 19, 2021 at 11:47 PM Christian König
 wrote:

Uff, that looks very hardware specific to me.

Howso?  I'm not sure I agree.. and even if it was not useful for some
hw, it should be useful for enough drivers (and harm no drivers), so I
still think it is a good idea

The fallback plan is to go the i915 route and stop using atomic
helpers and do the same thing inside the driver, but that doesn't help
any of the cases where you have a separate kms and gpu driver.

Yeah, that's certainly not something we want.


As far as I can see you can also implement completely inside the backend
by starting a timer on enable_signaling, don't you?

Not really.. I mean, the fact that something waited on a fence could
be a useful input signal to gpu freq governor, but it is entirely
insufficient..

If the cpu is spending a lot of time waiting on a fence, cpufreq will
clock down so you spend less time waiting.  And no problem has been
solved.  You absolutely need the concept of a missed deadline, and a
timer doesn't give you that.

Ok then I probably don't understand the use case here.

What exactly do you try to solve?

Basically situations where you are ping-ponging between GPU and CPU..
for example if you are double buffering instead of triple buffering,
and doing vblank sync'd pageflips.  The GPU, without any extra signal,
could get stuck at 30fps and a low gpu freq, because it ends up idle
while waiting for an extra vblank cycle for the next back-buffer to
become available.  Whereas if it boosted up to a higher freq and
stopped missing a vblank deadline, it would be less idle due to
getting the next back-buffer sooner (due to not missing a vblank
deadline).


Ok that is the why, but what about the how?

How does it help to have this boost callback and not just start a timer 
on enable_signaling and stop it when the signal arrives?


Regards,
Christian.



BR,
-R


Thanks,
Christian.


BR,
-R


Christian.

Am 19.05.21 um 20:38 schrieb Rob Clark:

From: Rob Clark 

Add a way to hint to the fence signaler that a fence waiter has missed a
deadline waiting on the fence.

In some cases, missing a vblank can result in lower gpu utilization,
when really we want to go in the opposite direction and boost gpu freq.
The boost callback gives some feedback to the fence signaler that we
are missing deadlines, so it can take this into account in its freq/
utilization calculations.

Signed-off-by: Rob Clark 
---
include/linux/dma-fence.h | 26 ++
1 file changed, 26 insertions(+)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 9f12efaaa93a..172702521acc 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -231,6 +231,17 @@ struct dma_fence_ops {
signed long (*wait)(struct dma_fence *fence,
bool intr, signed long timeout);

+ /**
+  * @boost:
+  *
+  * Optional callback, to indicate that a fence waiter missed a deadline.
+  * This can serve as a signal that (if possible) whatever signals the
+  * fence should boost its clocks.
+  *
+  * This can be called in any context that can call dma_fence_wait().
+  */
+ void (*boost)(struct dma_fence *fence);
+
/**
 * @release:
 *
@@ -586,6 +597,21 @@ static inline signed long dma_fence_wait(struct dma_fence 
*fence, bool intr)
return ret < 0 ? ret : 0;
}

+/**
+ * dma_fence_boost - hint from waiter that it missed a deadline
+ *
+ * @fence: the fence that caused the missed deadline
+ *
+ * This function gives a hint from a fence waiter that a deadline was
+ * missed, so that the fence signaler can factor this in to device
+ * power state decisions
+ */
+static inline void dma_fence_boost(struct dma_fence *fence)
+{
+ if (fence->ops->boost)
+ fence->ops->boost(fence);
+}
+
struct dma_fence *dma_fence_get_stub(void);
u64 dma_fence_context_alloc(unsigned num);


___
Linaro-mm-sig mailing list
linaro-mm-...@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-mm-sig
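
For illustration, a minimal sketch of how the two halves of this hook could be
wired up. Only the .boost op and dma_fence_boost() come from the patch quoted
above; every my_*() type and helper below is hypothetical and only stands in
for whatever freq/devfreq boost mechanism a driver already has.

/* Signaler side: hook .boost into the driver's fence ops and kick the
 * driver's existing boost mechanism. my_fence and my_gpu_devfreq_boost()
 * are placeholders.
 */
static void my_fence_boost(struct dma_fence *fence)
{
        struct my_fence *f = container_of(fence, struct my_fence, base);

        my_gpu_devfreq_boost(f->gpu);
}

static const struct dma_fence_ops my_fence_ops = {
        .get_driver_name   = my_fence_get_driver_name,
        .get_timeline_name = my_fence_get_timeline_name,
        .boost             = my_fence_boost,
};

/* Waiter side, e.g. a KMS driver flipping on vblank: if the fence gating
 * the flip did not signal in time, report the missed deadline back to
 * whoever will signal it.
 */
static void my_wait_for_flip_fence(struct dma_fence *fence,
                                   unsigned long timeout_jiffies)
{
        long ret;

        ret = dma_fence_wait_timeout(fence, false, timeout_jiffies);
        if (ret == 0)           /* timed out: vblank deadline missed */
                dma_fence_boost(fence);
}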




Re: [Mesa-dev] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Matthew Brost
On Thu, May 20, 2021 at 01:11:59PM +0200, Christian König wrote:
> Am 19.05.21 um 18:51 schrieb Matthew Brost:
> > On Wed, May 19, 2021 at 01:45:39PM +0200, Christian König wrote:
> > > Oh, yeah we call that gang submit on the AMD side.
> > > 
> > > Had already some internal discussions how to implement this, but so far
> > > couldn't figure out how to cleanly introduce that into the DRM scheduler.
> > > 
> > > Can you briefly describe in a few words how that is supposed to work on 
> > > the
> > > Intel side?
> > > 
> > Sure, I've done a quick PoC internally and have been able to hook this
> > into the DRM scheduler.
> > 
> > Basically each BB still maps to a single job as each job is somewhat
> > unique (e.g. each job has its own ring, lrc, seqno, etc...). However all
> > the jobs configured to run in parallel map to a single sched_entity
> > which maintains the order each job was generated from the execbuf IOCTL
> > (1 - N). When the backend receives jobs 1 to N - 1 it basically just
> > updates some internal state. When the backend sees job N (last job) it
> > actually does the submit for jobs 1 - N which with GuC submission is a
> > simple command moving the LRC tail of the N jobs.
> > 
> > Daniel has suggested that we create a single job for the N BBs but that
> > would be huge rework to the internals of the i915 and likely won't
> > happen by the time this code first lands.
> > 
> > Also worth noting one way a job isn't really treated individually is
> > the excl slot with dma-resv. In that case we create a composite fence of
> > all jobs (dma_fence_array).
> 
> Yeah, that's something we have discussed as well.
> 
> How do you prevent the scheduler from over committing to a single ring
> buffer in this scenario?
> 

Each job has its own ring, the execbuf IOCTL throttles itself for each
job if there isn't space in the ring. This is exactly the same as
non-parallel submits.

I think this is what you were asking? If not, maybe try explaining the
question a bit more.

Matt
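
A rough sketch of the submission flow described above, with entirely
hypothetical my_* names standing in for the real i915/GuC backend: jobs
1..N-1 only record state, and the arrival of job N triggers the actual
submit.

static void my_backend_submit(struct my_parallel_entity *pe,
                              struct my_job *job)
{
        int i;

        /* Jobs 1..N-1 only record state; nothing goes to the GuC yet. */
        pe->pending[pe->num_pending++] = job;
        if (pe->num_pending < pe->width)
                return;

        /* Job N arrived: move the LRC tails of all N contexts and submit. */
        for (i = 0; i < pe->width; i++)
                my_guc_move_lrc_tail(pe->pending[i]);
        my_guc_submit(pe);

        pe->num_pending = 0;
}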

> Christian.
> 
> > 
> > Matt
> > 
> > > Thanks,
> > > Christian.
> > > 
> > > Am 19.05.21 um 01:58 schrieb Matthew Brost:
> > > > Add entry for i915 new parallel submission uAPI plan.
> > > > 
> > > > v2:
> > > >(Daniel Vetter):
> > > > - Expand logical order explanation
> > > > - Add dummy header
> > > > - Only allow N BBs in execbuf IOCTL
> > > > - Configure parallel submission per slot not per gem context
> > > > 
> > > > Cc: Tvrtko Ursulin 
> > > > Cc: Tony Ye 
> > > > CC: Carl Zhang 
> > > > Cc: Daniel Vetter 
> > > > Cc: Jason Ekstrand 
> > > > Signed-off-by: Matthew Brost 
> > > > ---
> > > >Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 
> > > > ++
> > > >Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > > >2 files changed, 196 insertions(+), 1 deletion(-)
> > > >create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > 
> > > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > new file mode 100644
> > > > index ..8c64b983ccad
> > > > --- /dev/null
> > > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > @@ -0,0 +1,144 @@
> > > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > > i915_context_engines_parallel_submit */
> > > > +
> > > > +/*
> > > > + * i915_context_engines_parallel_submit:
> > > > + *
> > > > + * Setup a slot to allow multiple BBs to be submitted in a single 
> > > > execbuf IOCTL.
> > > > + * Those BBs will then be scheduled to run on the GPU in parallel. 
> > > > Multiple
> > > > + * hardware contexts are created internally in the i915 to run these BBs. 
> > > > Once a
> > > > + * slot is configured for N BBs only N BBs can be submitted in each 
> > > > execbuf
> > > > + * IOCTL and this is implicit behavior (e.g. the user doesn't tell the 
> > > > execbuf
> > > > + * IOCTL there are N BBs, the execbuf IOCTL knows how many BBs there 
> > > > are based on
> > > > + * the slots configuration).
> > > > + *
> > > > + * There are two currently defined ways to control the placement of the
> > > > + * hardware contexts on physical engines: default behavior (no flags) 
> > > > and
> > > > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added 
> > > > in the
> > > > + * future as new hardware / use cases arise. Details of how to use this
> > > > + * interface are described below, above the flags.
> > > > + *
> > > > + * Returns -EINVAL if the hardware context placement configuration is invalid 
> > > > or if the
> > > > + * placement configuration isn't supported on the platform / submission
> > > > + * interface.
> > > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > > submission
> > > > + * interface.
> > > > + */
> > > > +struct i915_context_engines_parallel_submit {
> > > > +   struct i915_user_extension base;
> > > > +
> > > > +   __u16 engine_index; /* slot for parallel engine */
> 

Re: [Intel-gfx] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Matthew Brost
On Thu, May 20, 2021 at 11:54:25AM +0200, Daniel Vetter wrote:
> On Wed, May 19, 2021 at 7:19 PM Matthew Brost  wrote:
> >
> > On Wed, May 19, 2021 at 01:10:04PM +0200, Daniel Vetter wrote:
> > > On Tue, May 18, 2021 at 04:58:30PM -0700, Matthew Brost wrote:
> > > > Add entry for i915 new parallel submission uAPI plan.
> > > >
> > > > v2:
> > > >  (Daniel Vetter):
> > > >   - Expand logical order explanation
> > > >   - Add dummy header
> > > >   - Only allow N BBs in execbuf IOCTL
> > > >   - Configure parallel submission per slot not per gem context
> > > >
> > > > Cc: Tvrtko Ursulin 
> > > > Cc: Tony Ye 
> > > > CC: Carl Zhang 
> > > > Cc: Daniel Vetter 
> > > > Cc: Jason Ekstrand 
> > > > Signed-off-by: Matthew Brost 
> > > > ---
> > > >  Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 ++
> > > >  Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > > >  2 files changed, 196 insertions(+), 1 deletion(-)
> > > >  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > >
> > > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > new file mode 100644
> > > > index ..8c64b983ccad
> > > > --- /dev/null
> > > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > @@ -0,0 +1,144 @@
> > > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > > i915_context_engines_parallel_submit */
> > > > +
> > > > +/*
> > > > + * i915_context_engines_parallel_submit:
> > > > + *
> > > > + * Setup a slot to allow multiple BBs to be submitted in a single 
> > > > execbuf IOCTL.
> > > > + * Those BBs will then be scheduled to run on the GPU in parallel. 
> > > > Multiple
> > > > + * hardware contexts are created internally in the i915 to run these BBs. 
> > > > Once a
> > > > + * slot is configured for N BBs only N BBs can be submitted in each 
> > > > execbuf
> > > > + * IOCTL and this is implicit behavior (e.g. the user doesn't tell the 
> > > > execbuf
> > > > + * IOCTL there are N BBs, the execbuf IOCTL knows how many BBs there 
> > > > are based on
> > > > + * the slots configuration).
> > > > + *
> > > > + * There are two currently defined ways to control the placement of the
> > > > + * hardware contexts on physical engines: default behavior (no flags) 
> > > > and
> > > > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added 
> > > > in the
> > > > + * future as new hardware / use cases arise. Details of how to use this
> > > > + * interface are described below, above the flags.
> > > > + *
> > > > + * Returns -EINVAL if the hardware context placement configuration is invalid 
> > > > or if the
> > > > + * placement configuration isn't supported on the platform / submission
> > > > + * interface.
> > > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > > submission
> > > > + * interface.
> > > > + */
> > > > +struct i915_context_engines_parallel_submit {
> > > > +   struct i915_user_extension base;
> > > > +
> > > > +   __u16 engine_index; /* slot for parallel engine */
> > > > +   __u16 width;/* number of contexts per parallel engine */
> > > > +   __u16 num_siblings; /* number of siblings per context */
> > > > +   __u16 mbz16;
> > >
> > > Ok the big picture looks reasonable now, the flags still confuse me.
> > >
> >
> > Yea, it is a bit confusing.
> >
> > > > +/*
> > > > + * Default placement behavior (currently unsupported):
> > > > + *
> > > > + * Rather than restricting parallel submission to a single class with a
> > > > + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a 
> > > > mode that
> > > > + * enables parallel submission across multiple engine classes. In this 
> > > > case each
> > > > + * context's logical engine mask indicates where that context can be 
> > > > placed. It is
> > > > + * implied in this mode that all contexts have mutually exclusive 
> > > > placement (e.g.
> > > > + * if one context is running CS0 no other contexts can run on CS0).
> > > > + *
> > > > + * Example 1 pseudo code:
> > > > + * CSX[Y] = engine class X, logical instance Y
> > > > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > > > + * set_engines(INVALID)
> > > > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > > > + * engines=CS0[0],CS0[1],CS1[0],CS1[1])
> > > > + *
> > > > + * Results in the following valid placements:
> > > > + * CS0[0], CS1[0]
> > > > + * CS0[0], CS1[1]
> > > > + * CS0[1], CS1[0]
> > > > + * CS0[1], CS1[1]
> > > > + *
> > > > + * This can also be thought of as 2 virtual engines:
> > > > + * VE[0] = CS0[0], CS0[1]
> > > > + * VE[1] = CS1[0], CS1[1]
> > > > + *
> > > > + * Example 2 pseudo code:
> > > > + * CS[X] = generic engine of same class, logical instance X
> > > > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > > > + * set_engines(INVALID)
> > > > + * set_parallel(engine_index=0, width=2, num_siblings=3,
> > > > + * 
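
To make the pseudo code of Example 1 concrete, here is a hedged userspace
sketch. Only engine_index/width/num_siblings and the extension name are taken
from the quoted header; the trailing engines[] layout is an assumption
modelled on the existing set_engines extensions and may differ in the final
uAPI.

struct {
        struct i915_context_engines_parallel_submit p;
        struct i915_engine_class_instance engines[4];
} __attribute__((packed)) parallel = {
        .p.base.name    = I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT,
        .p.engine_index = 0,    /* slot 0 */
        .p.width        = 2,    /* two contexts run in parallel */
        .p.num_siblings = 2,    /* two possible placements per context */
        .engines = {
                { 0, 0 }, { 0, 1 },     /* CS0[0], CS0[1] -> VE[0] */
                { 1, 0 }, { 1, 1 },     /* CS1[0], CS1[1] -> VE[1] */
        },
};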

Re: [RFC PATCH 5/5] drm/ttm, drm/amdgpu: Allow the driver some control over swapping

2021-05-20 Thread Thomas Hellström
On Thu, 2021-05-20 at 17:09 +0200, Thomas Hellström wrote:
> 
> +EXPORT_SYMBOL(ttm_tt_unpopulate);

Oh, this one was a leftover. It's not meant to be included anymore.

/Thomas


>  
>  #ifdef CONFIG_DEBUG_FS
>  




[RFC 0/7] Per client engine busyness

2021-05-20 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Continuing on the identically named thread. First six patches are i915 specific
so please mostly look at the last one only which discusses the common options
for DRM drivers.

I haven't updated intel_gpu_top to use this yet so can't report any performance
numbers.

Tvrtko Ursulin (7):
  drm/i915: Explicitly track DRM clients
  drm/i915: Update client name on context create
  drm/i915: Make GEM contexts track DRM clients
  drm/i915: Track runtime spent in closed and unreachable GEM contexts
  drm/i915: Track all user contexts per client
  drm/i915: Track context current active time
  drm/i915: Expose client engine utilisation via fdinfo

 drivers/gpu/drm/i915/Makefile |   5 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 -
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
 drivers/gpu/drm/i915/gt/intel_context.c   |  27 +-
 drivers/gpu/drm/i915/gt/intel_context.h   |  15 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
 .../gpu/drm/i915/gt/intel_gt_clock_utils.c|   4 +
 drivers/gpu/drm/i915/gt/intel_lrc.c   |  27 +-
 drivers/gpu/drm/i915/gt/intel_lrc.h   |  24 ++
 drivers/gpu/drm/i915/gt/selftest_lrc.c|  10 +-
 drivers/gpu/drm/i915/i915_drm_client.c| 238 ++
 drivers/gpu/drm/i915/i915_drm_client.h| 107 
 drivers/gpu/drm/i915/i915_drv.c   |   9 +
 drivers/gpu/drm/i915/i915_drv.h   |   5 +
 drivers/gpu/drm/i915/i915_gem.c   |  21 +-
 drivers/gpu/drm/i915/i915_gpu_error.c |  31 +--
 drivers/gpu/drm/i915/i915_gpu_error.h |   2 +-
 18 files changed, 568 insertions(+), 81 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h

-- 
2.30.2



[RFC 5/7] drm/i915: Track all user contexts per client

2021-05-20 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

We soon want to start answering questions like how much GPU time contexts
belonging to a client which has exited are still using.

To enable this we start tracking all contexts belonging to a client on a
separate list.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Aravind Iddamsetty 
Reviewed-by: Chris Wilson 
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 12 
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h |  3 +++
 drivers/gpu/drm/i915/i915_drm_client.c|  3 +++
 drivers/gpu/drm/i915/i915_drm_client.h|  5 +
 4 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index b8d8366a2cce..1595a608de92 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -573,6 +573,7 @@ static void set_closed_name(struct i915_gem_context *ctx)
 static void context_close(struct i915_gem_context *ctx)
 {
struct i915_address_space *vm;
+   struct i915_drm_client *client;
 
/* Flush any concurrent set_engines() */
	mutex_lock(&ctx->engines_mutex);
@@ -601,6 +602,13 @@ static void context_close(struct i915_gem_context *ctx)
	list_del(&ctx->link);
	spin_unlock(&ctx->i915->gem.contexts.lock);
 
+   client = ctx->client;
+   if (client) {
+   spin_lock(&client->ctx_lock);
+   list_del_rcu(&ctx->client_link);
+   spin_unlock(&client->ctx_lock);
+   }
+
	mutex_unlock(&ctx->mutex);
 
/*
@@ -943,6 +951,10 @@ static int gem_context_register(struct i915_gem_context 
*ctx,
 
ctx->client = client;
 
+   spin_lock(&client->ctx_lock);
+   list_add_tail_rcu(&ctx->client_link, &client->ctx_list);
+   spin_unlock(&client->ctx_lock);
+
	spin_lock(&i915->gem.contexts.lock);
	list_add_tail(&ctx->link, &i915->gem.contexts.list);
	spin_unlock(&i915->gem.contexts.lock);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index eb098f2896c5..8ea3fe3e7414 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -102,6 +102,9 @@ struct i915_gem_context {
/** client: struct i915_drm_client */
struct i915_drm_client *client;
 
+   /** link: _client.context_list */
+   struct list_head client_link;
+
/**
 * @ref: reference count
 *
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c
index 0b7a70ed61d0..1e5db7753276 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -100,6 +100,9 @@ i915_drm_client_add(struct i915_drm_clients *clients, 
struct task_struct *task)
 
	kref_init(&client->kref);
	mutex_init(&client->update_lock);
+   spin_lock_init(&client->ctx_lock);
+   INIT_LIST_HEAD(&client->ctx_list);
+
	client->clients = clients;
	INIT_RCU_WORK(&client->rcu, __rcu_i915_drm_client_free);
 
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h 
b/drivers/gpu/drm/i915/i915_drm_client.h
index db82180f5859..b2b69d6985e4 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -7,10 +7,12 @@
 #define __I915_DRM_CLIENT_H__
 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "gt/intel_engine_types.h"
@@ -42,6 +44,9 @@ struct i915_drm_client {
struct i915_drm_client_name __rcu *name;
bool closed;
 
+   spinlock_t ctx_lock; /* For add/remove from ctx_list. */
+   struct list_head ctx_list; /* List of contexts belonging to client. */
+
struct i915_drm_clients *clients;
 
/**
-- 
2.30.2



[RFC 7/7] drm/i915: Expose client engine utilisation via fdinfo

2021-05-20 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Similar to AMD commit
874442541133 ("drm/amdgpu: Add show_fdinfo() interface"), using the
infrastructure added in previous patches, we add basic client info
and GPU engine utilisation for i915.

Example of the output:

  pos:0
  flags:  012
  mnt_id: 21
  drm-driver: i915
  drm-pdev:   :00:02.0
  drm-client-id:  7
  drm-engine-render:  9288864723 ns
  drm-engine-copy:2035071108 ns
  drm-engine-video:   0 ns
  drm-engine-video-enhance:   0 ns

DRM related fields are appropriately prefixed for easy parsing and
separation from generic fdinfo fields.

Idea is for some fields to become either fully or partially standardised
in order to enable writing of generic top-like tools.

Initial proposal for fully standardised common fields:

 drm-driver: 
 drm-pdev: 

Optional fully standardised:

 drm-client-id: 

Optional partially standardised:

 engine-:  ns
 memory-:  KiB

Once agreed the format would need to go to some README or kerneldoc in
DRM core.
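
As a feel for how a generic top-like tool might consume the proposed keys,
here is a small illustrative userspace parser (not part of this series);
sampling it twice and diffing the values gives utilisation over the interval.

#include <stdio.h>

/* Print per-engine usage from one /proc/<pid>/fdinfo/<fd> file. */
static void parse_drm_fdinfo(const char *path)
{
        char line[256], name[64];
        unsigned long long ns;
        FILE *f = fopen(path, "r");

        if (!f)
                return;
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "drm-engine-%63[^:]: %llu ns", name, &ns) == 2)
                        printf("%-16s %llu ns\n", name, ns);
        }
        fclose(f);
}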

Signed-off-by: Tvrtko Ursulin 
Cc: David M Nieto 
Cc: Christian König 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_drm_client.c | 68 ++
 drivers/gpu/drm/i915/i915_drm_client.h |  4 ++
 drivers/gpu/drm/i915/i915_drv.c|  3 ++
 3 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c
index 1e5db7753276..5e9cfba1116b 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -9,6 +9,11 @@
 
 #include 
 
+#include 
+
+#include "gem/i915_gem_context.h"
+#include "gt/intel_engine_user.h"
+
 #include "i915_drm_client.h"
 #include "i915_drv.h"
 #include "i915_gem.h"
@@ -168,3 +173,66 @@ void i915_drm_clients_fini(struct i915_drm_clients 
*clients)
 
	xa_destroy(&clients->xarray);
 }
+
+#ifdef CONFIG_PROC_FS
+static const char * const uabi_class_names[] = {
+   [I915_ENGINE_CLASS_RENDER] = "render",
+   [I915_ENGINE_CLASS_COPY] = "copy",
+   [I915_ENGINE_CLASS_VIDEO] = "video",
+   [I915_ENGINE_CLASS_VIDEO_ENHANCE] = "video-enhance",
+};
+
+static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
+{
+   struct i915_gem_engines_iter it;
+   struct intel_context *ce;
+   u64 total = 0;
+
+   for_each_gem_engine(ce, rcu_dereference(ctx->engines), it) {
+   if (ce->engine->uabi_class != class)
+   continue;
+
+   total += intel_context_get_total_runtime_ns(ce);
+   }
+
+   return total;
+}
+
+static void
+show_client_class(struct seq_file *m,
+ struct i915_drm_client *client,
+ unsigned int class)
+{
+   const struct list_head *list = &client->ctx_list;
+   u64 total = atomic64_read(&client->past_runtime[class]);
+   struct i915_gem_context *ctx;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(ctx, list, client_link)
+   total += busy_add(ctx, class);
+   rcu_read_unlock();
+
+   seq_printf(m, "drm-engine-%s:\t%llu ns\n",
+ uabi_class_names[class], total);
+}
+
+void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
+{
+   struct drm_file *file = f->private_data;
+   struct drm_i915_file_private *file_priv = file->driver_priv;
+   struct drm_i915_private *i915 = file_priv->dev_priv;
+   struct i915_drm_client *client = file_priv->client;
+   struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
+   unsigned int i;
+
+   seq_printf(m, "drm-driver:\ti915\n");
+   seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
+  pci_domain_nr(pdev->bus), pdev->bus->number,
+  PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
+
+   seq_printf(m, "drm-client-id:\t%u\n", client->id);
+
+   for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
+   show_client_class(m, client, i);
+}
+#endif
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h 
b/drivers/gpu/drm/i915/i915_drm_client.h
index b2b69d6985e4..9885002433a0 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -98,6 +98,10 @@ i915_drm_client_pid(const struct i915_drm_client *client)
return __i915_drm_client_name(client)->pid;
 }
 
+#ifdef CONFIG_PROC_FS
+void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
+#endif
+
 void i915_drm_clients_fini(struct i915_drm_clients *clients);
 
 #endif /* !__I915_DRM_CLIENT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 33eb7b52b58b..6b63fe4b3c26 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1694,6 +1694,9 @@ static const struct file_operations i915_driver_fops = {
.read = drm_read,
.compat_ioctl = i915_ioc32_compat_ioctl,
.llseek = noop_llseek,
+#ifdef CONFIG_PROC_FS
+   .show_fdinfo = i915_drm_client_fdinfo,
+#endif
 };
 
 static int
-- 
2.30.2



[RFC 6/7] drm/i915: Track context current active time

2021-05-20 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Track context active (on hardware) status together with the start
timestamp.

This will be used to provide better granularity of context
runtime reporting in conjunction with already tracked pphwsp accumulated
runtime.

The latter is only updated on context save so does not give us visibility
to any currently executing work.

As part of the patch the existing runtime tracking data is moved under the
new ce->stats member and updated under the seqlock. This provides the
ability to atomically read out accumulated plus active runtime.
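
For reference, the read side such an "accumulated plus active" query relies
on is the classic seqlock retry loop, roughly as below. The struct name,
field names and the seqlock_t called "lock" are assumptions for illustration
only; the actual layout and locking live in the diff that follows.

static u64 my_read_total_runtime(const struct intel_context_stats *stats)
{
        unsigned int seq;
        u64 total, active;

        do {
                seq = read_seqbegin(&stats->lock);
                total = stats->runtime.total;
                active = READ_ONCE(stats->active);
                if (active)
                        active = intel_context_clock() - active;
        } while (read_seqretry(&stats->lock, seq));

        return total + active;
}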

v2:
 * Rename and make __intel_context_get_active_time unlocked.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Aravind Iddamsetty  #  v1
Reviewed-by: Chris Wilson 
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_context.c   | 27 ++-
 drivers/gpu/drm/i915/gt/intel_context.h   | 15 ---
 drivers/gpu/drm/i915/gt/intel_context_types.h | 24 +++--
 .../drm/i915/gt/intel_execlists_submission.c  | 23 
 .../gpu/drm/i915/gt/intel_gt_clock_utils.c|  4 +++
 drivers/gpu/drm/i915/gt/intel_lrc.c   | 27 ++-
 drivers/gpu/drm/i915/gt/intel_lrc.h   | 24 +
 drivers/gpu/drm/i915/gt/selftest_lrc.c| 10 +++
 drivers/gpu/drm/i915/i915_gpu_error.c |  9 +++
 drivers/gpu/drm/i915/i915_gpu_error.h |  2 +-
 10 files changed, 116 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 4033184f13b9..bc021244c3b2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -373,7 +373,7 @@ intel_context_init(struct intel_context *ce, struct 
intel_engine_cs *engine)
ce->sseu = engine->sseu;
ce->ring = __intel_context_ring_size(SZ_4K);
 
-   ewma_runtime_init(&ce->runtime.avg);
+   ewma_runtime_init(&ce->stats.runtime.avg);
 
ce->vm = i915_vm_get(engine->gt->vm);
 
@@ -499,6 +499,31 @@ struct i915_request *intel_context_create_request(struct 
intel_context *ce)
return rq;
 }
 
+u64 intel_context_get_total_runtime_ns(const struct intel_context *ce)
+{
+   u64 total, active;
+
+   total = ce->stats.runtime.total;
+   if (ce->ops->flags & COPS_RUNTIME_CYCLES)
+   total *= ce->engine->gt->clock_period_ns;
+
+   active = READ_ONCE(ce->stats.active);
+   if (active)
+   active = intel_context_clock() - active;
+
+   return total + active;
+}
+
+u64 intel_context_get_avg_runtime_ns(struct intel_context *ce)
+{
+   u64 avg = ewma_runtime_read(&ce->stats.runtime.avg);
+
+   if (ce->ops->flags & COPS_RUNTIME_CYCLES)
+   avg *= ce->engine->gt->clock_period_ns;
+
+   return avg;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index f83a73a2b39f..a9125768b1b4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -250,18 +250,13 @@ intel_context_clear_nopreempt(struct intel_context *ce)
	clear_bit(CONTEXT_NOPREEMPT, &ce->flags);
 }
 
-static inline u64 intel_context_get_total_runtime_ns(struct intel_context *ce)
-{
-   const u32 period = ce->engine->gt->clock_period_ns;
-
-   return READ_ONCE(ce->runtime.total) * period;
-}
+u64 intel_context_get_total_runtime_ns(const struct intel_context *ce);
+u64 intel_context_get_avg_runtime_ns(struct intel_context *ce);
 
-static inline u64 intel_context_get_avg_runtime_ns(struct intel_context *ce)
+static inline u64 intel_context_clock(void)
 {
-   const u32 period = ce->engine->gt->clock_period_ns;
-
-   return mul_u32_u32(ewma_runtime_read(&ce->runtime.avg), period);
+   /* As we mix CS cycles with CPU clocks, use the raw monotonic clock. */
+   return ktime_get_raw_fast_ns();
 }
 
 #endif /* __INTEL_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed8c447a7346..65a5730a4f5b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -33,6 +33,9 @@ struct intel_context_ops {
 #define COPS_HAS_INFLIGHT_BIT 0
 #define COPS_HAS_INFLIGHT BIT(COPS_HAS_INFLIGHT_BIT)
 
+#define COPS_RUNTIME_CYCLES_BIT 1
+#define COPS_RUNTIME_CYCLES BIT(COPS_RUNTIME_CYCLES_BIT)
+
int (*alloc)(struct intel_context *ce);
 
int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, 
void **vaddr);
@@ -110,14 +113,19 @@ struct intel_context {
} lrc;
u32 tag; /* cookie passed to HW to track this context on submission */
 
-   /* Time on GPU as tracked by the hw. */
-   struct {
-   struct ewma_runtime avg;
-   u64 total;
-   u32 last;
-   I915_SELFTEST_DECLARE(u32 num_underflow);
-   

[RFC 4/7] drm/i915: Track runtime spent in closed and unreachable GEM contexts

2021-05-20 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

As contexts are abandoned we want to remember how much GPU time they used
(per class) so later we can use it for smarter purposes.

As GEM contexts are closed we want to have the DRM client remember how
much GPU time they used (per class) so later we can use it for smarter
purposes.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Aravind Iddamsetty 
Reviewed-by: Chris Wilson 
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 24 +++--
 drivers/gpu/drm/i915/i915_drm_client.h  |  7 ++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5ea42d5b0b1a..b8d8366a2cce 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -262,23 +262,43 @@ static void free_engines_rcu(struct rcu_head *rcu)
free_engines(engines);
 }
 
+static void accumulate_runtime(struct i915_drm_client *client,
+  struct i915_gem_engines *engines)
+{
+   struct i915_gem_engines_iter it;
+   struct intel_context *ce;
+
+   if (!client)
+   return;
+
+   /* Transfer accumulated runtime to the parent GEM context. */
+   for_each_gem_engine(ce, engines, it) {
+   unsigned int class = ce->engine->uabi_class;
+
+   GEM_BUG_ON(class >= ARRAY_SIZE(client->past_runtime));
+   atomic64_add(intel_context_get_total_runtime_ns(ce),
+    &client->past_runtime[class]);
+   }
+}
+
 static int __i915_sw_fence_call
 engines_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 {
struct i915_gem_engines *engines =
container_of(fence, typeof(*engines), fence);
+   struct i915_gem_context *ctx = engines->ctx;
 
switch (state) {
case FENCE_COMPLETE:
	if (!list_empty(&engines->link)) {
-   struct i915_gem_context *ctx = engines->ctx;
	unsigned long flags;
 
	spin_lock_irqsave(&ctx->stale.lock, flags);
	list_del(&engines->link);
	spin_unlock_irqrestore(&ctx->stale.lock, flags);
}
-   i915_gem_context_put(engines->ctx);
+   accumulate_runtime(ctx->client, engines);
+   i915_gem_context_put(ctx);
break;
 
case FENCE_FREE:
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h 
b/drivers/gpu/drm/i915/i915_drm_client.h
index 6d55f77a08f1..db82180f5859 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -13,6 +13,8 @@
 #include 
 #include 
 
+#include "gt/intel_engine_types.h"
+
 struct drm_i915_private;
 
 struct i915_drm_clients {
@@ -41,6 +43,11 @@ struct i915_drm_client {
bool closed;
 
struct i915_drm_clients *clients;
+
+   /**
+* @past_runtime: Accumulation of pphwsp runtimes from closed contexts.
+*/
+   atomic64_t past_runtime[MAX_ENGINE_CLASS + 1];
 };
 
 void i915_drm_clients_init(struct i915_drm_clients *clients,
-- 
2.30.2



[RFC 3/7] drm/i915: Make GEM contexts track DRM clients

2021-05-20 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

If we make GEM contexts keep a reference to i915_drm_client for the whole
of their lifetime, we can consolidate the current task pid and name usage
by getting it from the client.

v2: Don't bother supporting selftests contexts from debugfs. (Chris)
v3 (Lucas): Finish constructing ctx before adding it to the list
v4 (Ram): Rebase.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Chris Wilson 
Reviewed-by: Aravind Iddamsetty 
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 20 -
 .../gpu/drm/i915/gem/i915_gem_context_types.h | 13 +++
 drivers/gpu/drm/i915/i915_gpu_error.c | 22 +++
 3 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index e5f8d94666e8..5ea42d5b0b1a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -345,13 +345,14 @@ void i915_gem_context_release(struct kref *ref)
trace_i915_context_free(ctx);
GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
 
-   mutex_destroy(>engines_mutex);
-   mutex_destroy(>lut_mutex);
+   if (ctx->client)
+   i915_drm_client_put(ctx->client);
 
if (ctx->timeline)
intel_timeline_put(ctx->timeline);
 
-   put_pid(ctx->pid);
+   mutex_destroy(>engines_mutex);
+   mutex_destroy(>lut_mutex);
mutex_destroy(>mutex);
 
kfree_rcu(ctx, rcu);
@@ -895,6 +896,7 @@ static int gem_context_register(struct i915_gem_context 
*ctx,
u32 *id)
 {
struct drm_i915_private *i915 = ctx->i915;
+   struct i915_drm_client *client;
struct i915_address_space *vm;
int ret;
 
@@ -906,15 +908,21 @@ static int gem_context_register(struct i915_gem_context 
*ctx,
WRITE_ONCE(vm->file, fpriv); /* XXX */
mutex_unlock(>mutex);
 
-   ctx->pid = get_task_pid(current, PIDTYPE_PID);
+   client = i915_drm_client_get(fpriv->client);
+
+   rcu_read_lock();
snprintf(ctx->name, sizeof(ctx->name), "%s[%d]",
-current->comm, pid_nr(ctx->pid));
+i915_drm_client_name(client),
+pid_nr(i915_drm_client_pid(client)));
+   rcu_read_unlock();
 
/* And finally expose ourselves to userspace via the idr */
ret = xa_alloc(>context_xa, id, ctx, xa_limit_32b, GFP_KERNEL);
if (ret)
goto err_pid;
 
+   ctx->client = client;
+
spin_lock(>gem.contexts.lock);
list_add_tail(>link, >gem.contexts.list);
spin_unlock(>gem.contexts.lock);
@@ -922,7 +930,7 @@ static int gem_context_register(struct i915_gem_context 
*ctx,
return 0;
 
 err_pid:
-   put_pid(fetch_and_zero(>pid));
+   i915_drm_client_put(client);
return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 340473aa70de..eb098f2896c5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -96,19 +96,12 @@ struct i915_gem_context {
 */
struct i915_address_space __rcu *vm;
 
-   /**
-* @pid: process id of creator
-*
-* Note that who created the context may not be the principle user,
-* as the context may be shared across a local socket. However,
-* that should only affect the default context, all contexts created
-* explicitly by the client are expected to be isolated.
-*/
-   struct pid *pid;
-
/** link: place with _i915_private.context_list */
struct list_head link;
 
+   /** client: struct i915_drm_client */
+   struct i915_drm_client *client;
+
/**
 * @ref: reference count
 *
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index 8b964e355cb5..f5dfc15f5f76 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1235,7 +1235,9 @@ static void record_request(const struct i915_request 
*request,
 
ctx = rcu_dereference(request->context->gem_context);
if (ctx)
-   erq->pid = pid_nr(ctx->pid);
+   erq->pid = I915_SELFTEST_ONLY(!ctx->client) ?
+  0 :
+  pid_nr(i915_drm_client_pid(ctx->client));
}
rcu_read_unlock();
 }
@@ -1256,23 +1258,25 @@ static bool record_context(struct 
i915_gem_context_coredump *e,
   const struct i915_request *rq)
 {
struct i915_gem_context *ctx;
-   struct task_struct *task;
bool simulated;
 
rcu_read_lock();
+
ctx = rcu_dereference(rq->context->gem_context);
if (ctx && !kref_get_unless_zero(>ref))
 

Re: i915 and swiotlb_max_segment

2021-05-20 Thread Konrad Rzeszutek Wilk
On Mon, May 10, 2021 at 05:25:25PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> swiotlb_max_segment is a rather strange "API" export by swiotlb.c,
> and i915 is the only (remaining) user.
> 
> swiotlb_max_segment returns 0 if swiotlb is not in use, 1 if
> SWIOTLB_FORCE is set or swiotlb-zen is set, and the swiotlb segment
> size when swiotlb is otherwise enabled.
> 
> i915 then uses it to:
> 
>  a) decided on the max order in i915_gem_object_get_pages_internal
>  b) decide on a max segment size in i915_sg_segment_size
> 
> for a) it really seems i915 should switch to dma_alloc_noncoherent
> or dma_alloc_noncontigous ASAP instead of using alloc_page and
> streaming DMA mappings.  Any chance I could trick one of the i915
> maintaines into doing just that given that the callchain is not
> exactly trivial?
> 
> For b) I'm not sure swiotlb and i915 really agree on the meaning
> of the value.  swiotlb_set_max_segment basically returns the entire
> size of the swiotlb buffer, while i915 seems to use it to limit
> the size each scatterlist entry.  It seems like dma_max_mapping_size
> might be the best value to use here.

Yes. The background behind that was SWIOTLB would fail because well, the
size of the sg was too large. And some way to limit it to max size
was needed - the dma_max_mapping_size "should" be just fine.

> 
> Once that is fixed I'd like to kill off swiotlb_max_segment as it is
> a horribly confusing API.
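
For b), a hedged sketch of what the swap could look like on the i915 side.
Note the current i915_sg_segment_size() takes no device argument, so the
variant below is only an illustration of using dma_max_mapping_size(), not
the actual change.

static inline unsigned int i915_sg_segment_size(struct device *dev)
{
        size_t max = min_t(size_t, UINT_MAX, dma_max_mapping_size(dev));

        return round_down(max, PAGE_SIZE);
}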


[RFC 2/7] drm/i915: Update client name on context create

2021-05-20 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Some clients have the DRM fd passed to them over a socket by the X server.

Grab the real client and pid when they create their first context and
update the exposed data for more useful enumeration.

To enable lockless access to client name and pid data from the following
patches, we also make these fields rcu protected. In this way asynchronous
code paths can safely deal both with contexts which remain after the client
exits, and with the client name and pid being updated due to context
creation running in parallel with name/pid queries.

v2:
 * Do not leak the pid reference and borrow context idr_lock. (Chris)

v3:
 * More avoiding leaks. (Chris)

v4:
 * Move update completely to drm client. (Chris)
 * Do not lose previous client data on failure to re-register and simplify
   update to only touch what it needs.

v5:
 * Reuse ext_data local. (Chris)

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Chris Wilson 
Reviewed-by: Aravind Iddamsetty 
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c |  5 ++
 drivers/gpu/drm/i915/i915_drm_client.c  | 66 +++--
 drivers/gpu/drm/i915/i915_drm_client.h  | 34 ++-
 3 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 188dee13e017..e5f8d94666e8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -76,6 +76,7 @@
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_ring.h"
 
+#include "i915_drm_client.h"
 #include "i915_gem_context.h"
 #include "i915_globals.h"
 #include "i915_trace.h"
@@ -2321,6 +2322,10 @@ int i915_gem_context_create_ioctl(struct drm_device 
*dev, void *data,
return -EIO;
}
 
+   ret = i915_drm_client_update(ext_data.fpriv->client, current);
+   if (ret)
+   return ret;
+
ext_data.ctx = i915_gem_create_context(i915, args->flags);
if (IS_ERR(ext_data.ctx))
return PTR_ERR(ext_data.ctx);
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c
index 83080d9836b0..0b7a70ed61d0 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -7,7 +7,10 @@
 #include 
 #include 
 
+#include 
+
 #include "i915_drm_client.h"
+#include "i915_drv.h"
 #include "i915_gem.h"
 #include "i915_utils.h"
 
@@ -20,26 +23,57 @@ void i915_drm_clients_init(struct i915_drm_clients *clients,
xa_init_flags(>xarray, XA_FLAGS_ALLOC);
 }
 
+static struct i915_drm_client_name *get_name(struct i915_drm_client *client,
+struct task_struct *task)
+{
+   struct i915_drm_client_name *name;
+   int len = strlen(task->comm);
+
+   name = kmalloc(struct_size(name, name, len + 1), GFP_KERNEL);
+   if (!name)
+   return NULL;
+
+   init_rcu_head(>rcu);
+   name->client = client;
+   name->pid = get_task_pid(task, PIDTYPE_PID);
+   memcpy(name->name, task->comm, len + 1);
+
+   return name;
+}
+
+static void free_name(struct rcu_head *rcu)
+{
+   struct i915_drm_client_name *name =
+   container_of(rcu, typeof(*name), rcu);
+
+   put_pid(name->pid);
+   kfree(name);
+}
+
 static int
 __i915_drm_client_register(struct i915_drm_client *client,
   struct task_struct *task)
 {
-   char *name;
+   struct i915_drm_client_name *name;
 
-   name = kstrdup(task->comm, GFP_KERNEL);
+   name = get_name(client, task);
if (!name)
return -ENOMEM;
 
-   client->pid = get_task_pid(task, PIDTYPE_PID);
-   client->name = name;
+   RCU_INIT_POINTER(client->name, name);
 
return 0;
 }
 
 static void __i915_drm_client_unregister(struct i915_drm_client *client)
 {
-   put_pid(fetch_and_zero(>pid));
-   kfree(fetch_and_zero(>name));
+   struct i915_drm_client_name *name;
+
+   mutex_lock(>update_lock);
+   name = rcu_replace_pointer(client->name, NULL, true);
+   mutex_unlock(>update_lock);
+
+   call_rcu(>rcu, free_name);
 }
 
 static void __rcu_i915_drm_client_free(struct work_struct *wrk)
@@ -65,6 +99,7 @@ i915_drm_client_add(struct i915_drm_clients *clients, struct 
task_struct *task)
return ERR_PTR(-ENOMEM);
 
kref_init(>kref);
+   mutex_init(>update_lock);
client->clients = clients;
INIT_RCU_WORK(>rcu, __rcu_i915_drm_client_free);
 
@@ -102,6 +137,25 @@ void i915_drm_client_close(struct i915_drm_client *client)
i915_drm_client_put(client);
 }
 
+int
+i915_drm_client_update(struct i915_drm_client *client,
+  struct task_struct *task)
+{
+   struct i915_drm_client_name *name;
+
+   name = get_name(client, task);
+   if (!name)
+   return -ENOMEM;
+
+   mutex_lock(>update_lock);
+   if 

[RFC 1/7] drm/i915: Explicitly track DRM clients

2021-05-20 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Tracking DRM clients more explicitly will allow later patches to
accumulate past and current GPU usage in a centralised place and also
consolidate access to owning task pid/name.

Unique client id is also assigned for the purpose of distinguishing/
consolidating between multiple file descriptors owned by the same process.

v2:
 Chris Wilson:
 * Enclose new members into dedicated structs.
 * Protect against failed sysfs registration.

v3:
 * sysfs_attr_init.

v4:
 * Fix for internal clients.

v5:
 * Use cyclic ida for client id. (Chris)
 * Do not leak pid reference. (Chris)
 * Tidy code with some locals.

v6:
 * Use xa_alloc_cyclic to simplify locking. (Chris)
 * No need to unregister individual sysfs files. (Chris)
 * Rebase on top of fpriv kref.
 * Track client closed status and reflect in sysfs.

v7:
 * Make drm_client more standalone concept.

v8:
 * Simplify sysfs show. (Chris)
 * Always track name and pid.

v9:
 * Fix cyclic id assignment.

v10:
 * No need for a mutex around xa_alloc_cyclic.
 * Refactor sysfs into own function.
 * Unregister sysfs before freeing pid and name.
 * Move clients setup into own function.

v11:
 * Call clients init directly from driver init. (Chris)

v12:
 * Do not fail client add on id wrap. (Maciej)

v13 (Lucas): Rebase.

v14:
 * Dropped sysfs bits.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Chris Wilson  # v11
Reviewed-by: Aravind Iddamsetty  # v11
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/Makefile  |   5 +-
 drivers/gpu/drm/i915/i915_drm_client.c | 113 +
 drivers/gpu/drm/i915/i915_drm_client.h |  61 +
 drivers/gpu/drm/i915/i915_drv.c|   6 ++
 drivers/gpu/drm/i915/i915_drv.h|   5 ++
 drivers/gpu/drm/i915/i915_gem.c|  21 -
 6 files changed, 206 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 6947495bf34b..f3f5c4571623 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -33,8 +33,9 @@ subdir-ccflags-y += -I$(srctree)/$(src)
 # Please keep these build lists sorted!
 
 # core driver code
-i915-y += i915_drv.o \
- i915_config.o \
+i915-y += i915_config.o \
+ i915_drm_client.o \
+ i915_drv.o \
  i915_irq.o \
  i915_getparam.o \
  i915_mitigations.o \
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c
new file mode 100644
index ..83080d9836b0
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include 
+#include 
+#include 
+
+#include "i915_drm_client.h"
+#include "i915_gem.h"
+#include "i915_utils.h"
+
+void i915_drm_clients_init(struct i915_drm_clients *clients,
+  struct drm_i915_private *i915)
+{
+   clients->i915 = i915;
+
+   clients->next_id = 0;
+   xa_init_flags(>xarray, XA_FLAGS_ALLOC);
+}
+
+static int
+__i915_drm_client_register(struct i915_drm_client *client,
+  struct task_struct *task)
+{
+   char *name;
+
+   name = kstrdup(task->comm, GFP_KERNEL);
+   if (!name)
+   return -ENOMEM;
+
+   client->pid = get_task_pid(task, PIDTYPE_PID);
+   client->name = name;
+
+   return 0;
+}
+
+static void __i915_drm_client_unregister(struct i915_drm_client *client)
+{
+   put_pid(fetch_and_zero(>pid));
+   kfree(fetch_and_zero(>name));
+}
+
+static void __rcu_i915_drm_client_free(struct work_struct *wrk)
+{
+   struct i915_drm_client *client =
+   container_of(wrk, typeof(*client), rcu.work);
+
+   xa_erase(>clients->xarray, client->id);
+
+   __i915_drm_client_unregister(client);
+
+   kfree(client);
+}
+
+struct i915_drm_client *
+i915_drm_client_add(struct i915_drm_clients *clients, struct task_struct *task)
+{
+   struct i915_drm_client *client;
+   int ret;
+
+   client = kzalloc(sizeof(*client), GFP_KERNEL);
+   if (!client)
+   return ERR_PTR(-ENOMEM);
+
+   kref_init(>kref);
+   client->clients = clients;
+   INIT_RCU_WORK(>rcu, __rcu_i915_drm_client_free);
+
+   ret = xa_alloc_cyclic(>xarray, >id, client,
+ xa_limit_32b, >next_id, GFP_KERNEL);
+   if (ret < 0)
+   goto err_id;
+
+   ret = __i915_drm_client_register(client, task);
+   if (ret)
+   goto err_register;
+
+   return client;
+
+err_register:
+   xa_erase(>xarray, client->id);
+err_id:
+   kfree(client);
+
+   return ERR_PTR(ret);
+}
+
+void __i915_drm_client_free(struct kref *kref)
+{
+   struct i915_drm_client *client =
+   container_of(kref, typeof(*client), kref);
+
+   queue_rcu_work(system_wq, >rcu);
+}

[RFC PATCH 3/5] drm/ttm: Use drm_memcpy_from_wc for TTM bo moves

2021-05-20 Thread Thomas Hellström
Use fast wc memcpy for reading out of wc memory for TTM bo moves.

Cc: Dave Airlie 
Cc: Christian König 
Cc: Daniel Vetter 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index bad9b16e96ba..919ee03f7eb3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -31,6 +31,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -185,6 +186,7 @@ void ttm_move_memcpy(struct ttm_buffer_object *bo,
struct ttm_resource *old_mem = >mem;
struct ttm_resource_manager *old_man = ttm_manager_type(bdev, 
old_mem->mem_type);
struct dma_buf_map old_map, new_map;
+   bool wc_memcpy;
pgoff_t i;
 
/* Single TTM move. NOP */
@@ -208,11 +210,25 @@ void ttm_move_memcpy(struct ttm_buffer_object *bo,
return;
}
 
+   wc_memcpy = ((!old_man->use_tt || bo->ttm->caching != ttm_cached) &&
+drm_has_memcpy_from_wc());
+
+   /*
+* We use some nasty aliasing for drm_memcpy_from_wc, but assuming
+* that we can move to memremapping in the not too distant future,
+* reduce the fragility for now with a build assert.
+*/
+   BUILD_BUG_ON(offsetof(typeof(old_map), vaddr) !=
+offsetof(typeof(old_map), vaddr_iomem));
+
for (i = 0; i < new_mem->num_pages; ++i) {
	new_iter->ops->kmap_local(new_iter, &new_map, i);
	old_iter->ops->kmap_local(old_iter, &old_map, i);
 
-   if (!old_map.is_iomem && !new_map.is_iomem) {
+   if (wc_memcpy) {
+   drm_memcpy_from_wc(new_map.vaddr, old_map.vaddr,
+  PAGE_SIZE);
+   } else if (!old_map.is_iomem && !new_map.is_iomem) {
memcpy(new_map.vaddr, old_map.vaddr, PAGE_SIZE);
} else if (!old_map.is_iomem) {
	dma_buf_map_memcpy_to(&new_map, old_map.vaddr,
-- 
2.31.1



[RFC PATCH 2/5] drm, drm/i915: Move the memcpy_from_wc functionality to core drm

2021-05-20 Thread Thomas Hellström
Memcpy from wc will be used as well by TTM memcpy.
Move it to core drm, and make the interface do the right thing
even on !X86.

Cc: Christian König 
Cc: Daniel Vetter 
Cc: Dave Airlie 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/Makefile  |  2 +-
 drivers/gpu/drm/drm_drv.c |  2 +
 .../drm/{i915/i915_memcpy.c => drm_memcpy.c}  | 31 +++---
 drivers/gpu/drm/i915/Makefile |  1 -
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c|  5 ++-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  7 ++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c| 11 ++---
 drivers/gpu/drm/i915/i915_cmd_parser.c|  4 +-
 drivers/gpu/drm/i915/i915_drv.c   |  2 -
 drivers/gpu/drm/i915/i915_gpu_error.c |  8 ++--
 drivers/gpu/drm/i915/i915_memcpy.h| 34 ---
 .../drm/i915/selftests/intel_memory_region.c  |  7 ++--
 include/drm/drm_memcpy.h  | 41 +++
 14 files changed, 83 insertions(+), 76 deletions(-)
 rename drivers/gpu/drm/{i915/i915_memcpy.c => drm_memcpy.c} (84%)
 delete mode 100644 drivers/gpu/drm/i915/i915_memcpy.h
 create mode 100644 include/drm/drm_memcpy.h

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index a91cc7684904..f3ab8586c3d7 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -18,7 +18,7 @@ drm-y   :=drm_aperture.o drm_auth.o drm_cache.o \
drm_dumb_buffers.o drm_mode_config.o drm_vblank.o \
drm_syncobj.o drm_lease.o drm_writeback.o drm_client.o \
drm_client_modeset.o drm_atomic_uapi.o drm_hdcp.o \
-   drm_managed.o drm_vblank_work.o
+   drm_managed.o drm_vblank_work.o drm_memcpy.o \
 
 drm-$(CONFIG_DRM_LEGACY) += drm_agpsupport.o drm_bufs.o drm_context.o 
drm_dma.o \
drm_legacy_misc.o drm_lock.o drm_memory.o 
drm_scatter.o \
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 3d8d68a98b95..351cc2900cf1 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1041,6 +1042,7 @@ static int __init drm_core_init(void)
 
drm_connector_ida_init();
idr_init(_minors_idr);
+   drm_memcpy_init_early();
 
ret = drm_sysfs_init();
if (ret < 0) {
diff --git a/drivers/gpu/drm/i915/i915_memcpy.c b/drivers/gpu/drm/drm_memcpy.c
similarity index 84%
rename from drivers/gpu/drm/i915/i915_memcpy.c
rename to drivers/gpu/drm/drm_memcpy.c
index 1b021a4902de..03688425a096 100644
--- a/drivers/gpu/drm/i915/i915_memcpy.c
+++ b/drivers/gpu/drm/drm_memcpy.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: MIT
 /*
  * Copyright © 2016 Intel Corporation
  *
@@ -22,16 +23,11 @@
  *
  */
 
+#ifdef CONFIG_X86
 #include 
 #include 
 
-#include "i915_memcpy.h"
-
-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
-#define CI_BUG_ON(expr) BUG_ON(expr)
-#else
-#define CI_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
-#endif
+#include "drm/drm_memcpy.h"
 
 static DEFINE_STATIC_KEY_FALSE(has_movntdqa);
 
@@ -94,23 +90,23 @@ static void __memcpy_ntdqu(void *dst, const void *src, 
unsigned long len)
 }
 
 /**
- * i915_memcpy_from_wc: perform an accelerated *aligned* read from WC
+ * drm_memcpy_from_wc: perform an accelerated *aligned* read from WC
  * @dst: destination pointer
  * @src: source pointer
  * @len: how many bytes to copy
  *
- * i915_memcpy_from_wc copies @len bytes from @src to @dst using
+ * drm_memcpy_from_wc copies @len bytes from @src to @dst using
  * non-temporal instructions where available. Note that all arguments
  * (@src, @dst) must be aligned to 16 bytes and @len must be a multiple
  * of 16.
  *
  * To test whether accelerated reads from WC are supported, use
- * i915_memcpy_from_wc(NULL, NULL, 0);
+ * drm_memcpy_from_wc(NULL, NULL, 0);
  *
  * Returns true if the copy was successful, false if the preconditions
  * are not met.
  */
-bool i915_memcpy_from_wc(void *dst, const void *src, unsigned long len)
+bool drm_memcpy_from_wc(void *dst, const void *src, unsigned long len)
 {
if (unlikely(((unsigned long)dst | (unsigned long)src | len) & 15))
return false;
@@ -123,24 +119,23 @@ bool i915_memcpy_from_wc(void *dst, const void *src, 
unsigned long len)
 
return false;
 }
+EXPORT_SYMBOL(drm_memcpy_from_wc);
 
 /**
- * i915_unaligned_memcpy_from_wc: perform a mostly accelerated read from WC
+ * drm_unaligned_memcpy_from_wc: perform a mostly accelerated read from WC
  * @dst: destination pointer
  * @src: source pointer
  * @len: how many bytes to copy
  *
- * Like i915_memcpy_from_wc(), the unaligned variant copies @len bytes from
+ * Like drm_memcpy_from_wc(), the unaligned variant copies @len bytes from
  * @src to @dst using * non-temporal instructions where available, but
  * accepts that its arguments may not 

[RFC PATCH 5/5] drm/ttm, drm/amdgpu: Allow the driver some control over swapping

2021-05-20 Thread Thomas Hellström
We are calling the eviction_valuable driver callback at eviction time to
determine whether we actually can evict a buffer object.
The upcoming i915 TTM backend needs the same functionality for swapout,
and that might actually be beneficial to other drivers as well.

Add an eviction_valuable call also in the swapout path. Try to keep the
current behaviour for all drivers by returning true if the buffer object
is already in the TTM_PL_SYSTEM placement. We change behaviour for the
case where a buffer object is in a TT backed placement when swapped out,
in which case the driver's normal eviction_valuable path is run.

Finally make sure we don't try to swapout a bo that was recently purged
and therefore unpopulated.
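
For a driver that instead wants to veto swapout of certain buffers, the
callback could distinguish the two cases along these lines; this is only a
sketch of the convention introduced below (bo already in TTM_PL_SYSTEM plus a
SYSTEM placement meaning "about to swap out"), and my_bo_must_stay_resident()
is hypothetical.

static bool my_eviction_valuable(struct ttm_buffer_object *bo,
                                 const struct ttm_place *place)
{
        /* bo already in SYSTEM + SYSTEM requested placement == swapout */
        bool swapout = bo->mem.mem_type == TTM_PL_SYSTEM &&
                       place->mem_type == TTM_PL_SYSTEM;

        if (swapout && my_bo_must_stay_resident(bo))
                return false;

        return ttm_bo_eviction_valuable(bo, place);
}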

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  4 +++
 drivers/gpu/drm/ttm/ttm_bo.c| 41 +++--
 drivers/gpu/drm/ttm/ttm_tt.c|  4 +++
 3 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8c7ec09eb1a4..d5a9d7a88315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1399,6 +1399,10 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct 
ttm_buffer_object *bo,
struct dma_fence *f;
int i;
 
+   /* Swapout? */
+   if (bo->mem.mem_type == TTM_PL_SYSTEM)
+   return true;
+
if (bo->type == ttm_bo_type_kernel &&
!amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo)))
return false;
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index a8fa3375b8aa..3f7c64a1cda1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -536,6 +536,10 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
  const struct ttm_place *place)
 {
+   dma_resv_assert_held(bo->base.resv);
+   if (bo->mem.mem_type == TTM_PL_SYSTEM)
+   return true;
+
/* Don't evict this BO if it's outside of the
 * requested placement range
 */
@@ -558,7 +562,9 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
  * b. Otherwise, trylock it.
  */
 static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
-   struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
+  struct ttm_operation_ctx *ctx,
+  const struct ttm_place *place,
+  bool *locked, bool *busy)
 {
bool ret = false;
 
@@ -576,6 +582,12 @@ static bool ttm_bo_evict_swapout_allowable(struct 
ttm_buffer_object *bo,
*busy = !ret;
}
 
+   if (ret && place && !bo->bdev->funcs->eviction_valuable(bo, place)) {
+   ret = false;
+   if (locked)
+   dma_resv_unlock(bo->base.resv);
+   }
+
return ret;
 }
 
@@ -630,20 +642,14 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
list_for_each_entry(bo, >lru[i], lru) {
bool busy;
 
-   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
-   &busy)) {
+   if (!ttm_bo_evict_swapout_allowable(bo, ctx, place,
+   &locked, &busy)) {
if (busy && !busy_bo && ticket !=
dma_resv_locking_ctx(bo->base.resv))
busy_bo = bo;
continue;
}
 
-   if (place && !bdev->funcs->eviction_valuable(bo,
- place)) {
-   if (locked)
-   dma_resv_unlock(bo->base.resv);
-   continue;
-   }
if (!ttm_bo_get_unless_zero(bo)) {
if (locked)
dma_resv_unlock(bo->base.resv);
@@ -1138,10 +1144,18 @@ EXPORT_SYMBOL(ttm_bo_wait);
 int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
   gfp_t gfp_flags)
 {
+   struct ttm_place place = {};
bool locked;
int ret;
 
-   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL))
+   /*
+* While the bo may already reside in SYSTEM placement, set
+* SYSTEM as new placement to cover also the move further below.
+* The driver may use the fact that we're moving from SYSTEM
+* as an indication that we're about to swap out.
+*/
+   place.mem_type = TTM_PL_SYSTEM;
+   if (!ttm_bo_evict_swapout_allowable(bo, 

[RFC PATCH 4/5] drm/ttm: Document and optimize ttm_bo_pipeline_gutting()

2021-05-20 Thread Thomas Hellström
If the bo is idle when calling ttm_bo_pipeline_gutting(), we unnecessarily
create a ghost object and push it out to delayed destroy.
Fix this by adding a separate path for the idle case, and document the
function.

Also avoid having the bo end up in a bad state, vulnerable to user-space
triggered kernel BUGs, if the call to ttm_tt_create() fails.

Finally, reuse ttm_bo_pipeline_gutting() in ttm_bo_evict().
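
As a caller-side sketch (an assumption for illustration, not code from
this patch): after this change a driver can purge a bo's backing store
simply by validating it against an empty placement, which now routes
through ttm_bo_pipeline_gutting() and leaves the bo unpopulated in
SYSTEM placement. my_bo_purge() is a made-up helper name.

#include <linux/dma-resv.h>
#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_placement.h>

static int my_bo_purge(struct ttm_buffer_object *bo)
{
        struct ttm_operation_ctx ctx = {
                .interruptible = false,
                .no_wait_gpu = false,
        };
        /* No placements at all: ttm_bo_validate() treats this as a purge. */
        struct ttm_placement empty = {};

        dma_resv_assert_held(bo->base.resv);
        return ttm_bo_validate(bo, &empty, &ctx);
}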

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_bo.c  | 20 +-
 drivers/gpu/drm/ttm/ttm_bo_util.c | 63 ---
 drivers/gpu/drm/ttm/ttm_tt.c  |  5 +++
 include/drm/ttm/ttm_tt.h  | 10 +
 4 files changed, 75 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ca1b098b6a56..a8fa3375b8aa 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -501,10 +501,15 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
bdev->funcs->evict_flags(bo, &placement);
 
if (!placement.num_placement && !placement.num_busy_placement) {
-   ttm_bo_wait(bo, false, false);
+   ret = ttm_bo_wait(bo, true, false);
+   if (ret)
+   return ret;
 
-   ttm_bo_cleanup_memtype_use(bo);
-   return ttm_tt_create(bo, false);
+   /*
+* Since we've already synced, this frees backing store
+* immediately.
+*/
+   return ttm_bo_pipeline_gutting(bo);
}
 
ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx);
@@ -974,13 +979,8 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
/*
 * Remove the backing store if no placement is given.
 */
-   if (!placement->num_placement && !placement->num_busy_placement) {
-   ret = ttm_bo_pipeline_gutting(bo);
-   if (ret)
-   return ret;
-
-   return ttm_tt_create(bo, false);
-   }
+   if (!placement->num_placement && !placement->num_busy_placement)
+   return ttm_bo_pipeline_gutting(bo);
 
/*
 * Check whether we need to move buffer.
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 919ee03f7eb3..1860e2e7563f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -479,7 +479,8 @@ static void ttm_transfered_destroy(struct ttm_buffer_object 
*bo)
  */
 
 static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
- struct ttm_buffer_object **new_obj)
+ struct ttm_buffer_object **new_obj,
+ bool realloc_tt)
 {
struct ttm_transfer_obj *fbo;
int ret;
@@ -493,6 +494,17 @@ static int ttm_buffer_object_transfer(struct 
ttm_buffer_object *bo,
ttm_bo_get(bo);
fbo->bo = bo;
 
+   if (realloc_tt) {
+   bo->ttm = NULL;
+   ret = ttm_tt_create(bo, true);
+   if (ret) {
+   bo->ttm = fbo->base.ttm;
+   kfree(fbo);
+   ttm_bo_put(bo);
+   return ret;
+   }
+   }
+
/**
 * Fix up members that we shouldn't copy directly:
 * TODO: Explicit member copy would probably be better here.
@@ -763,7 +775,7 @@ static int ttm_bo_move_to_ghost(struct ttm_buffer_object 
*bo,
dma_fence_put(bo->moving);
bo->moving = dma_fence_get(fence);
 
-   ret = ttm_buffer_object_transfer(bo, &ghost_obj);
+   ret = ttm_buffer_object_transfer(bo, &ghost_obj, false);
if (ret)
return ret;
 
@@ -836,26 +848,51 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object 
*bo,
 }
 EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
 
+/**
+ * ttm_bo_pipeline_gutting - purge the contents of a bo
+ * @bo: The buffer object
+ *
+ * Purge the contents of a bo, async if the bo is not idle.
+ * After a successful call, the bo is left unpopulated in
+ * system placement. The function may wait uninterruptible
+ * for idle on OOM.
+ *
+ * Return: 0 if successful, negative error code on failure.
+ */
 int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
 {
static const struct ttm_place sys_mem = { .mem_type = TTM_PL_SYSTEM };
struct ttm_buffer_object *ghost;
int ret;
 
-   ret = ttm_buffer_object_transfer(bo, &ghost);
-   if (ret)
-   return ret;
+   /* If already idle, no need for ghost object dance. */
+   ret = ttm_bo_wait(bo, false, true);
+   if (ret == -EBUSY) {
+   ret = ttm_buffer_object_transfer(bo, &ghost, true);
+   if (ret)
+   return ret;
 
-   ret = dma_resv_copy_fences(&ghost->base._resv, bo->base.resv);
-   /* Last resort, wait for the BO to be idle when we are OOM */
-   if (ret)
-   ttm_bo_wait(bo, false, false);
+   

[RFC PATCH 1/5] drm/ttm: Add a generic TTM memcpy move for page-based iomem

2021-05-20 Thread Thomas Hellström
The internal ttm_bo_util memcpy uses ioremap functionality, and while it
might be possible to use it for copying in and out of sglist-represented
io memory using the io_mem_reserve() / io_mem_free() callbacks, that
would cause problems with fault(). Instead, implement a method that maps
page-by-page using kmap_local() semantics. As an additional benefit we
then avoid the occasional global TLB flushes of ioremap() and the
consumption of ioremap space, we eliminate a critical point of failure,
and with a slight change of semantics we could also push the memcpy out
async for testing and async driver development purposes.

A special linear iomem iterator is introduced internally to mimic the
old ioremap behaviour for code paths that can't immediately be ported
over. This adds to the code size and should be considered a temporary
solution.

Looking at the code, we have a lot of checks for iomap-tagged pointers.
Ideally we should extend the core memremap functions to also accept
uncached memory and kmap_local functionality. Then we could strip a
lot of code.
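
A rough sketch of the kmap_local() mapping scheme the new iterators are
built on, here for the system-memory (ttm_tt) side. This is not the
patch's actual iterator code, and my_copy_from_tt() is a made-up helper:
each page is mapped only for as long as it takes to copy it, so no
long-lived mapping of the whole resource is needed.

#include <linux/highmem.h>
#include <linux/string.h>
#include <drm/ttm/ttm_tt.h>

static void my_copy_from_tt(struct ttm_tt *tt, void *dst, pgoff_t num_pages)
{
        pgoff_t i;

        for (i = 0; i < num_pages; ++i) {
                /* Short-lived, CPU-local mapping of a single page. */
                void *src = kmap_local_page(tt->pages[i]);

                memcpy(dst + i * PAGE_SIZE, src, PAGE_SIZE);
                kunmap_local(src);
        }
}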

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 468 --
 include/drm/ttm/ttm_bo_driver.h   |  94 ++
 2 files changed, 407 insertions(+), 155 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index ae8b61460724..bad9b16e96ba 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -35,11 +35,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 struct ttm_transfer_obj {
struct ttm_buffer_object base;
@@ -72,190 +74,366 @@ void ttm_mem_io_free(struct ttm_device *bdev,
mem->bus.addr = NULL;
 }
 
-static int ttm_resource_ioremap(struct ttm_device *bdev,
-  struct ttm_resource *mem,
-  void **virtual)
+static pgprot_t ttm_prot_from_caching(enum ttm_caching caching, pgprot_t tmp)
 {
-   int ret;
-   void *addr;
+   /* Cached mappings need no adjustment */
+   if (caching == ttm_cached)
+   return tmp;
 
-   *virtual = NULL;
-   ret = ttm_mem_io_reserve(bdev, mem);
-   if (ret || !mem->bus.is_iomem)
-   return ret;
+#if defined(__i386__) || defined(__x86_64__)
+   if (caching == ttm_write_combined)
+   tmp = pgprot_writecombine(tmp);
+   else if (boot_cpu_data.x86 > 3)
+   tmp = pgprot_noncached(tmp);
+#endif
+#if defined(__ia64__) || defined(__arm__) || defined(__aarch64__) || \
+   defined(__powerpc__) || defined(__mips__)
+   if (caching == ttm_write_combined)
+   tmp = pgprot_writecombine(tmp);
+   else
+   tmp = pgprot_noncached(tmp);
+#endif
+#if defined(__sparc__)
+   tmp = pgprot_noncached(tmp);
+#endif
+   return tmp;
+}
 
-   if (mem->bus.addr) {
-   addr = mem->bus.addr;
-   } else {
-   size_t bus_size = (size_t)mem->num_pages << PAGE_SHIFT;
+static void ttm_kmap_iter_tt_kmap_local(struct ttm_kmap_iter *iter,
+   struct dma_buf_map *dmap,
+   pgoff_t i)
+{
+   struct ttm_kmap_iter_tt *iter_tt =
+   container_of(iter, typeof(*iter_tt), base);
 
-   if (mem->bus.caching == ttm_write_combined)
-   addr = ioremap_wc(mem->bus.offset, bus_size);
-#ifdef CONFIG_X86
-   else if (mem->bus.caching == ttm_cached)
-   addr = ioremap_cache(mem->bus.offset, bus_size);
-#endif
-   else
-   addr = ioremap(mem->bus.offset, bus_size);
-   if (!addr) {
-   ttm_mem_io_free(bdev, mem);
-   return -ENOMEM;
-   }
+   dma_buf_map_set_vaddr(dmap, kmap_local_page_prot(iter_tt->tt->pages[i],
+iter_tt->prot));
+}
+
+static void ttm_kmap_iter_iomap_kmap_local(struct ttm_kmap_iter *iter,
+  struct dma_buf_map *dmap,
+  pgoff_t i)
+{
+   struct ttm_kmap_iter_iomap *iter_io =
+   container_of(iter, typeof(*iter_io), base);
+   void __iomem *addr;
+
+retry:
+   while (i >= iter_io->cache.end) {
+   iter_io->cache.sg = iter_io->cache.sg ?
+   sg_next(iter_io->cache.sg) : iter_io->st->sgl;
+   iter_io->cache.i = iter_io->cache.end;
+   iter_io->cache.end += sg_dma_len(iter_io->cache.sg) >>
+   PAGE_SHIFT;
+   iter_io->cache.offs = sg_dma_address(iter_io->cache.sg) -
+   iter_io->start;
}
-   *virtual = addr;
-   return 0;
+
+   if (i < iter_io->cache.i) {
+   iter_io->cache.end = 0;
+   

[RFC PATCH 0/5] Core TTM changes for i915 TTM enabling

2021-05-20 Thread Thomas Hellström
This is mainly a pre-check that the core TTM changes for the initial
i915 TTM patch series look reasonably OK.

The main thing is that we add the new page-based iomem memcpy utility to
TTM and, for some extra speed, move the x86-only prefetching
copy-from-wc memcpy to core drm. Note that the legacy memcpy path is
largely untested; perhaps we can give it some testing on vmwgfx.

There is also a bugfix and some minor optimization for the
ttm_bo_pipeline_gutting() idle case.

Finally, allow the frequently-pinning i915 driver to block swapping of
pinned memory that is still on the LRU.

If OK, I'd like to include these as part of the i915 series.

Cc: Christian König 
Cc: Dave Airlie 
Cc: Daniel Vetter 

Thomas Hellström (5):
  drm/ttm: Add a generic TTM memcpy move for page-based iomem
  drm, drm/i915: Move the memcpy_from_wc functionality to core drm
  drm/ttm: Use drm_memcpy_from_wc for TTM bo moves
  drm/ttm: Document and optimize ttm_bo_pipeline_gutting()
  drm/ttm, drm/amdgpu: Allow the driver some control over swapping

 drivers/gpu/drm/Makefile  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   4 +
 drivers/gpu/drm/drm_drv.c |   2 +
 .../drm/{i915/i915_memcpy.c => drm_memcpy.c}  |  31 +-
 drivers/gpu/drm/i915/Makefile |   1 -
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   5 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |   7 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c|  11 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c|   4 +-
 drivers/gpu/drm/i915/i915_drv.c   |   2 -
 drivers/gpu/drm/i915/i915_gpu_error.c |   8 +-
 drivers/gpu/drm/i915/i915_memcpy.h|  34 --
 .../drm/i915/selftests/intel_memory_region.c  |   7 +-
 drivers/gpu/drm/ttm/ttm_bo.c  |  61 +-
 drivers/gpu/drm/ttm/ttm_bo_util.c | 547 --
 drivers/gpu/drm/ttm/ttm_tt.c  |   9 +
 include/drm/drm_memcpy.h  |  41 ++
 include/drm/ttm/ttm_bo_driver.h   |  94 +++
 include/drm/ttm/ttm_tt.h  |  10 +
 20 files changed, 614 insertions(+), 270 deletions(-)
 rename drivers/gpu/drm/{i915/i915_memcpy.c => drm_memcpy.c} (84%)
 delete mode 100644 drivers/gpu/drm/i915/i915_memcpy.h
 create mode 100644 include/drm/drm_memcpy.h

-- 
2.31.1



[PATCH 2/4] dt-bindings: display: bcm2835-vec: Add BCM2711 compatible

2021-05-20 Thread Maxime Ripard
From: Mateusz Kwiatkowski 

The BCM2711 VEC uses a slightly different, incompatible setup compared to
the one used in earlier SoCs. Add a new compatible for it.

Signed-off-by: Mateusz Kwiatkowski 
Signed-off-by: Maxime Ripard 
---
 .../devicetree/bindings/display/brcm,bcm2835-vec.yaml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2835-vec.yaml 
b/Documentation/devicetree/bindings/display/brcm,bcm2835-vec.yaml
index d900cc57b4ec..9b24081a0dbd 100644
--- a/Documentation/devicetree/bindings/display/brcm,bcm2835-vec.yaml
+++ b/Documentation/devicetree/bindings/display/brcm,bcm2835-vec.yaml
@@ -11,7 +11,9 @@ maintainers:
 
 properties:
   compatible:
-const: brcm,bcm2835-vec
+enum:
+  - brcm,bcm2711-vec
+  - brcm,bcm2835-vec
 
   reg:
 maxItems: 1
-- 
2.31.1



[PATCH 1/4] drm/vc4: Fix clock source for VEC PixelValve on BCM2711

2021-05-20 Thread Maxime Ripard
From: Mateusz Kwiatkowski 

On the BCM2711 (Raspberry Pi 4), the VEC is actually connected to
output 2 of pixelvalve3.

NOTE: This contradicts the Broadcom docs, but has been empirically
tested and confirmed by Raspberry Pi firmware devs.

Signed-off-by: Mateusz Kwiatkowski 
Signed-off-by: Maxime Ripard 
---
 drivers/gpu/drm/vc4/vc4_crtc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index 76657dcdf9b0..665ddf8f347f 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -994,7 +994,7 @@ static const struct vc4_pv_data bcm2711_pv3_data = {
.fifo_depth = 64,
.pixels_per_clock = 1,
.encoder_types = {
-   [0] = VC4_ENCODER_TYPE_VEC,
+   [PV_CONTROL_CLK_SELECT_VEC] = VC4_ENCODER_TYPE_VEC,
},
 };
 
-- 
2.31.1



[PATCH 4/4] ARM: boot: dts: bcm2711: Add BCM2711 VEC compatible

2021-05-20 Thread Maxime Ripard
From: Mateusz Kwiatkowski 

The BCM2711 has a slightly different VEC than the one found in the older
SoCs. Now that we support the new variant, add its compatible to the
device tree.

Signed-off-by: Mateusz Kwiatkowski 
Signed-off-by: Maxime Ripard 
---
 arch/arm/boot/dts/bcm2711.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/bcm2711.dtsi b/arch/arm/boot/dts/bcm2711.dtsi
index 720beec54d61..0b6900815d19 100644
--- a/arch/arm/boot/dts/bcm2711.dtsi
+++ b/arch/arm/boot/dts/bcm2711.dtsi
@@ -1087,5 +1087,6 @@  {
 };
 
 &vec {
+   compatible = "brcm,bcm2711-vec";
interrupts = ;
 };
-- 
2.31.1



[PATCH 3/4] drm/vc4: Separate VEC compatible variants

2021-05-20 Thread Maxime Ripard
From: Mateusz Kwiatkowski 

The VEC's DAC on the BCM2711 is slightly different from the one on
BCM283x and needs a different configuration. In particular, bit 3
(mask 0x8) switches the BCM2711 DAC input to "self-test input data",
which makes the output unusable. Separating the two compatible variants
in the devicetrees and the DRM driver was therefore necessary.

The configurations used for both variants have been borrowed from the
Raspberry Pi (model 3B for BCM283x, 4B for BCM2711) firmware defaults.

Signed-off-by: Mateusz Kwiatkowski 
Signed-off-by: Maxime Ripard 
---
 drivers/gpu/drm/vc4/vc4_vec.c | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/vc4/vc4_vec.c b/drivers/gpu/drm/vc4/vc4_vec.c
index bd5b8eb58b18..a467ceba75e4 100644
--- a/drivers/gpu/drm/vc4/vc4_vec.c
+++ b/drivers/gpu/drm/vc4/vc4_vec.c
@@ -154,9 +154,14 @@
 #define VEC_DAC_MISC_DAC_RST_N BIT(0)
 
 
+struct vc4_vec_variant {
+   u32 dac_config;
+};
+
 /* General VEC hardware state. */
 struct vc4_vec {
struct platform_device *pdev;
+   const struct vc4_vec_variant *variant;
 
struct drm_encoder *encoder;
struct drm_connector *connector;
@@ -451,10 +456,7 @@ static void vc4_vec_encoder_enable(struct drm_encoder 
*encoder)
VEC_WRITE(VEC_CONFIG2,
  VEC_CONFIG2_UV_DIG_DIS | VEC_CONFIG2_RGB_DIG_DIS);
VEC_WRITE(VEC_CONFIG3, VEC_CONFIG3_HORIZ_LEN_STD);
-   VEC_WRITE(VEC_DAC_CONFIG,
- VEC_DAC_CONFIG_DAC_CTRL(0xc) |
- VEC_DAC_CONFIG_DRIVER_CTRL(0xc) |
- VEC_DAC_CONFIG_LDO_BIAS_CTRL(0x46));
+   VEC_WRITE(VEC_DAC_CONFIG, vec->variant->dac_config);
 
/* Mask all interrupts. */
VEC_WRITE(VEC_MASK0, 0);
@@ -507,8 +509,21 @@ static const struct drm_encoder_helper_funcs 
vc4_vec_encoder_helper_funcs = {
.atomic_mode_set = vc4_vec_encoder_atomic_mode_set,
 };
 
+static const struct vc4_vec_variant bcm2835_vec_variant = {
+   .dac_config = VEC_DAC_CONFIG_DAC_CTRL(0xc) |
+ VEC_DAC_CONFIG_DRIVER_CTRL(0xc) |
+ VEC_DAC_CONFIG_LDO_BIAS_CTRL(0x46)
+};
+
+static const struct vc4_vec_variant bcm2711_vec_variant = {
+   .dac_config = VEC_DAC_CONFIG_DAC_CTRL(0x0) |
+ VEC_DAC_CONFIG_DRIVER_CTRL(0x80) |
+ VEC_DAC_CONFIG_LDO_BIAS_CTRL(0x61)
+};
+
 static const struct of_device_id vc4_vec_dt_match[] = {
-   { .compatible = "brcm,bcm2835-vec", .data = NULL },
+   { .compatible = "brcm,bcm2835-vec", .data = _vec_variant },
+   { .compatible = "brcm,bcm2711-vec", .data = _vec_variant },
{ /* sentinel */ },
 };
 
@@ -546,6 +561,8 @@ static int vc4_vec_bind(struct device *dev, struct device 
*master, void *data)
vec->encoder = &vc4_vec_encoder->base.base;
 
vec->pdev = pdev;
+   vec->variant = (const struct vc4_vec_variant *)
+   of_device_get_match_data(dev);
vec->regs = vc4_ioremap_regs(pdev, 0);
if (IS_ERR(vec->regs))
return PTR_ERR(vec->regs);
-- 
2.31.1


