Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages

2023-02-15 Thread Thomas Hellström



On 2/16/23 08:11, Christian König wrote:

On 15.02.23 at 20:00, Thomas Hellström wrote:

On Wed, 2023-02-15 at 19:30 +0100, Christian König wrote:

On 15.02.23 at 19:12, Thomas Hellström wrote:

On Wed, 2023-02-15 at 18:42 +0100, Christian König wrote:

On 15.02.23 at 17:13, Thomas Hellström wrote:

When swapping out, we will split multi-order pages both in order to
move them to the swap-cache and to be able to return memory to the
system as soon as possible on a page-by-page basis. By reducing the
page max order to the system PMD size, we can be nicer to the system
and avoid splitting gigantic pages.

On top of this we also include the 64K page size in the page sizes
tried, since that appears to be a common size for GPU applications.

Please completely drop that.

You mean the 64K page size, or the whole patch?

The 64K page size. This was an invention from Microsoft to standardize
GPU handling ~15-20 years ago.

It turned out to be a complete shipwreck, and by now 2MiB and 1GiB
pages, or just flexible hardware which can handle everything, seem to
have become the standard.


This is just nonsense spilling in from the Windows drivers.

Agreed, but IIRC on the last RFC you asked me not to drop the 64K
pages, so that's why they are here. I can remove them if needed.

We could keep it if it's in any way beneficial, but I'm pretty sure I
must have been drunk to ask for that.


The only reason for keeping them from a performance point of view is
better efficiency on GPUs with 64K page size when not using a
coalescing IOMMU for dma-mapping.

Are any of those still produced? As far as I know, neither NVidia,
Intel nor AMD has assumed that page size in their hardware for quite a
while now.

Intel still supports 64K PTEs, so we use them where possible, otherwise
falling back to 4K. Typically we have coalescing IOMMU enabled when
testing, so can't really see the impact, but TBH I was surprised by the
number of 64K page allocations TTM spat out with this patch series, so
I definitely think there is a performance impact with !IOMMU, although
I can't quantify it ATM.

So then if it's OK with you I'll keep that size for now.


If it makes 64K pages preferred then this is a pretty clear NAK.

What we can do is to support any page size up to at least 2MiB here.


OK, I'll use that latter approach then. I don't have any strong
preferences here, except that the swapin helper wants to keep the max
pagesize as low as possible, since it needs to store one page worth of
4K swap entries.
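For illustration, a back-of-the-envelope sketch of that constraint (the
macro name below is hypothetical and not part of this series):

#include <linux/log2.h>
#include <linux/swap.h>

/*
 * If the swapin helper tracks one swp_entry_t per 4K page and the
 * tracking array must itself fit in one page, the largest supportable
 * order follows directly:
 *   PAGE_SIZE / sizeof(swp_entry_t) = 4096 / 8 = 512 entries,
 * i.e. order 9, which is exactly the 2MiB PMD size on x86-64.
 */
#define HYPOTHETICAL_MAX_SWAP_ORDER ilog2(PAGE_SIZE / sizeof(swp_entry_t))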


/Thomas



Christian.



/Thomas




Regards,
Christian.


Let me know what you think is best and I'll adjust accordingly.

/Thomas



Christian.


Looking forward to when we might be able to swap out PMD-size folios
without splitting, this will also be a benefit.

Signed-off-by: Thomas Hellström

---
 drivers/gpu/drm/ttm/ttm_pool.c | 58 ++++++++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 1cc7591a9542..8787fb6a218b 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -31,6 +31,8 @@
  * cause they are rather slow compared to alloc_pages+map.
  */
 
+#define pr_fmt(fmt) "[TTM POOL] " fmt
+
 #include 
 #include 
 #include 
@@ -47,6 +49,18 @@
 
 #include "ttm_module.h"
 
+#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
+#define TTM_64K_ORDER (16 - PAGE_SHIFT)
+#if (TTM_MAX_ORDER < TTM_64K_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER TTM_64K_ORDER
+#endif
+#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER (MAX_ORDER - 1)
+#endif
+#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
+
 /**
  * struct ttm_pool_dma - Helper object for coherent DMA mappings
  *
@@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
+static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
+
 /* Allocate pages of size 1 << order with the given gfp_flags */
 static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 					unsigned int order)
@@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,
 	}
 }
 
+static unsigned int 

Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages

2023-02-15 Thread Christian König

On 15.02.23 at 20:00, Thomas Hellström wrote:

On Wed, 2023-02-15 at 19:30 +0100, Christian König wrote:

On 15.02.23 at 19:12, Thomas Hellström wrote:

On Wed, 2023-02-15 at 18:42 +0100, Christian König wrote:

On 15.02.23 at 17:13, Thomas Hellström wrote:

When swapping out, we will split multi-order pages both in order to
move them to the swap-cache and to be able to return memory to the
system as soon as possible on a page-by-page basis. By reducing the
page max order to the system PMD size, we can be nicer to the system
and avoid splitting gigantic pages.

On top of this we also include the 64K page size in the page sizes
tried, since that appears to be a common size for GPU applications.

Please completely drop that.

You mean the 64K page size, or the whole patch?

The 64K page size. This was an invention from Microsoft to standardize
GPU handling ~15-20 years ago.

It turned out to be a complete shipwreck, and by now 2MiB and 1GiB
pages, or just flexible hardware which can handle everything, seem to
have become the standard.


This is just nonsense spilling in from the Windows drivers.

Agreed, but IIRC on the last RFC you asked me not to drop the 64K
pages, so that's why they are here. I can remove them if needed.

We could keep it if it's in any way beneficial, but I'm pretty sure I
must have been drunk to ask for that.


The only reason for keeping them from a performance point of view is
better efficiency on GPUs with 64K page size when not using a
coalescing IOMMU for dma-mapping.

Are any of those still produced? As far as I know, neither NVidia,
Intel nor AMD has assumed that page size in their hardware for quite a
while now.

Intel still supports 64K PTEs, so we use them where possible, otherwise
falling back to 4K. Typically we have coalescing IOMMU enabled when
testing, so can't really see the impact, but TBH I was surprised by the
number of 64K page allocations TTM spat out with this patch series, so
I definitely think there is a performance impact with !IOMMU, although
I can't quantify it ATM.

So then if it's OK with you I'll keep that size for now.


If it makes 64K pages preferred then this is a pretty clear NAK.

What we can do is to support any page size up to at least 2MiB here.

Christian.



/Thomas




Regards,
Christian.


Let me know what you think is best and I'll adjust accordingly.

/Thomas



Christian.


Looking forward to when we might be able to swap out PMD-size folios
without splitting, this will also be a benefit.

Signed-off-by: Thomas Hellström

---
 drivers/gpu/drm/ttm/ttm_pool.c | 58 ++++++++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 1cc7591a9542..8787fb6a218b 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -31,6 +31,8 @@
  * cause they are rather slow compared to alloc_pages+map.
  */
 
+#define pr_fmt(fmt) "[TTM POOL] " fmt
+
 #include 
 #include 
 #include 
@@ -47,6 +49,18 @@
 
 #include "ttm_module.h"
 
+#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
+#define TTM_64K_ORDER (16 - PAGE_SHIFT)
+#if (TTM_MAX_ORDER < TTM_64K_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER TTM_64K_ORDER
+#endif
+#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER (MAX_ORDER - 1)
+#endif
+#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
+
 /**
  * struct ttm_pool_dma - Helper object for coherent DMA mappings
  *
@@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
+static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
+
 /* Allocate pages of size 1 << order with the given gfp_flags */
 static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 					unsigned int order)
@@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,
 	}
 }
 
+static unsigned int ttm_pool_select_order(unsigned int order,
+					  pgoff_t num_pages)
+{
+	unsigned int *cur_order = ttm_pool_orders;
+
+	order = min_t(unsigned int, __fls(num_pages), order);
+	while (order < *cur_order)
+		++cur_order;
+
+	return *cur_order;
+}
+
    /**
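As a worked example of ttm_pool_select_order() above (illustrative
only; this is a standalone userspace model, and the {9, 4, 0} table
assumes a 4K-page x86-64 build where the 64K order has been plugged in
at init as described earlier in the thread):

#include <stdio.h>

/* Standalone model of ttm_pool_select_order(); num_pages must be
 * non-zero. */
static unsigned int select_order(unsigned int order, unsigned long num_pages,
				 const unsigned int *orders)
{
	/* __fls(num_pages): index of the highest set bit */
	unsigned int fls = 8 * sizeof(num_pages) - __builtin_clzl(num_pages) - 1;

	if (fls < order)
		order = fls;		/* never allocate past the request */
	while (order < *orders)		/* snap down to a supported order */
		++orders;
	return *orders;
}

int main(void)
{
	static const unsigned int orders[] = { 9, 4, 0 };

	/* 40 pages: __fls(40) = 5, snapped down to order 4 (64K) */
	printf("%u\n", select_order(9, 40, orders));
	/* 1024 pages: stays at the maximum order 9 (2MiB) */
	printf("%u\n", select_order(9, 1024, orders));
	return 0;
}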
     

Re: [PATCH] drm/amdgpu: make kobj_type structures constant

2023-02-15 Thread Christian König

On 16.02.23 at 02:07, Thomas Weißschuh wrote:

Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh 


Reviewed-by: Christian König 


---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 10 +++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c      |  2 +-
  2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 1bbd56029a4f..8e04952e5144 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -704,7 +704,7 @@ static void ip_hw_instance_release(struct kobject *kobj)
kfree(ip_hw_instance);
  }
  
-static struct kobj_type ip_hw_instance_ktype = {
+static const struct kobj_type ip_hw_instance_ktype = {
 	.release = ip_hw_instance_release,
 	.sysfs_ops = &ip_hw_instance_sysfs_ops,
 	.default_groups = ip_hw_instance_groups,
@@ -723,7 +723,7 @@ static void ip_hw_id_release(struct kobject *kobj)
kfree(ip_hw_id);
  }
  
-static struct kobj_type ip_hw_id_ktype = {
+static const struct kobj_type ip_hw_id_ktype = {
 	.release = ip_hw_id_release,
 	.sysfs_ops = &kobj_sysfs_ops,
  };
@@ -786,18 +786,18 @@ static const struct sysfs_ops ip_die_entry_sysfs_ops = {
.show = ip_die_entry_attr_show,
  };
  
-static struct kobj_type ip_die_entry_ktype = {
+static const struct kobj_type ip_die_entry_ktype = {
 	.release = ip_die_entry_release,
 	.sysfs_ops = &ip_die_entry_sysfs_ops,
.default_groups = ip_die_entry_groups,
  };
  
-static struct kobj_type die_kobj_ktype = {
+static const struct kobj_type die_kobj_ktype = {
 	.release = die_kobj_release,
 	.sysfs_ops = &kobj_sysfs_ops,
  };
  
-static struct kobj_type ip_discovery_ktype = {
+static const struct kobj_type ip_discovery_ktype = {
 	.release = ip_disc_release,
 	.sysfs_ops = &kobj_sysfs_ops,
  };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 4b9e7b050ccd..6d13ce6ec9cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -228,7 +228,7 @@ static const struct sysfs_ops amdgpu_xgmi_hive_ops = {
.show = amdgpu_xgmi_show_attrs,
  };
  
-struct kobj_type amdgpu_xgmi_hive_type = {
+static const struct kobj_type amdgpu_xgmi_hive_type = {
 	.release = amdgpu_xgmi_hive_release,
 	.sysfs_ops = &amdgpu_xgmi_hive_ops,
.default_groups = amdgpu_xgmi_hive_groups,

---
base-commit: 033c40a89f55525139fd5b6342281b09b97d05bf
change-id: 20230216-kobj_type-amdgpu-4d3f0e1e05d4

Best regards,




[pull] amdgpu drm-fixes-6.2

2023-02-15 Thread Alex Deucher
Hi Dave, Daniel,

A couple of warning fixes for 6.2.

The following changes since commit ceaa837f96adb69c0df0397937cd74991d5d821a:

  Linux 6.2-rc8 (2023-02-12 14:10:17 -0800)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-fixes-6.2-2023-02-15

for you to fetch changes up to 2a00299e7447395d0898e7c6214817c06a61a8e8:

  drm/amd/display: Fail atomic_check early on normalize_zpos error (2023-02-15 22:46:42 -0500)


amd-drm-fixes-6.2-2023-02-15:

amdgpu:
- Fix GC11.x suspend warning
- Fix display warning


Jack Xiao (1):
  drm/amd/amdgpu: fix warning during suspend

Leo Li (1):
  drm/amd/display: Fail atomic_check early on normalize_zpos error

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c        | 3 +++
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c            | 2 +-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 +++++-
 3 files changed, 9 insertions(+), 2 deletions(-)


Re: [PATCH 0/3] drm/msm/dpu: Initialize SSPP scaler version (from register read)

2023-02-15 Thread Dmitry Baryshkov
On Thu, 16 Feb 2023 at 01:02, Marijn Suijten wrote:
>
> Random inspection of the SSPP code surfaced that the version field of
> dpu_scaler_blk was never assigned in the catalog, resulting in the
> wrong codepaths being taken within dpu_hw_setup_scaler3 based on a 0
> version.  Rectify this by reading an accurate value from a register
> (that is not equal to the values represented by the
> DPU_SSPP_SCALER_QSEEDx enum variants) and deleting dead code around
> QSEED versioning.
>
> Future changes should likely get rid of the distinction between QSEED3
> and up, as these are now purely determined from the register value.
> Furthermore implementations could look at the scaler subblk .id field
> rather than the SSPP feature bits, which currently hold redundant
> information.
>
> ---
> Marijn Suijten (3):
>   drm/msm/dpu: Read previously-uninitialized SSPP scaler version from hw
>   drm/msm/dpu: Drop unused get_scaler_ver callback from SSPP
>   drm/msm/dpu: Drop unused qseed_type from catalog dpu_caps

The cleanup looks good. However as you are on it, maybe you can also
add patch 4, dropping DPU_SSPP_SCALER_QSEED3LITE and
DPU_SSPP_SCALER_QSEED4 in favour of using QSEED3 for all these
scalers? As we are going to use scaler_version to distinguish between
them, it would be logical not to duplicate that bit of information
(not to mention all the possible troubles if scaler_version disagrees
with the sblk->scaler_blk.id).

>
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 12 
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h |  4 
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c| 12 
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h|  9 +++--
>  4 files changed, 11 insertions(+), 26 deletions(-)
> ---
> base-commit: 9d9019bcea1aac7eed64a1a4966282b6b7b141c8
> change-id: 20230215-sspp-scaler-version-19f221585c5e
>
> Best regards,
> --
> Marijn Suijten 
>


-- 
With best wishes
Dmitry


Re: [PATCH 3/3] drm/msm/dpu: Drop unused qseed_type from catalog dpu_caps

2023-02-15 Thread Dmitry Baryshkov
On Thu, 16 Feb 2023 at 01:02, Marijn Suijten wrote:
>
> The SSPP scaler subblk is responsible for reporting its version (via the
> .id field, feature bits on the parent SSPP block, and since recently
> also from reading a register to supersede a read-but-unset version field
> in the catalog), leaving this global qseed_type field logically unused.
> Remove this dead code to lighten the catalog and bringup-overhead.
>
> Signed-off-by: Marijn Suijten 

Reviewed-by: Dmitry Baryshkov 

> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 12 
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h |  2 --
>  2 files changed, 14 deletions(-)

-- 
With best wishes
Dmitry


Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: Don't use BAR mappings for ring buffers with LLC

2023-02-15 Thread kernel test robot
Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on drm-tip/drm-tip]

url:    https://github.com/intel-lab-lkp/linux/commits/John-C-Harrison-Intel-com/drm-i915-Don-t-use-stolen-memory-for-ring-buffers-with-LLC/20230216-082552
base:   git://anongit.freedesktop.org/drm/drm-tip drm-tip
patch link:    https://lore.kernel.org/r/20230216002248.1851966-3-John.C.Harrison%40Intel.com
patch subject: [Intel-gfx] [PATCH v2 2/2] drm/i915: Don't use BAR mappings for ring buffers with LLC
config: i386-randconfig-a011-20230213 (https://download.01.org/0day-ci/archive/20230216/202302161021.tjavhrph-...@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/fa748ad303922e4138a246d4db247dfa96e45651
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review John-C-Harrison-Intel-com/drm-i915-Don-t-use-stolen-memory-for-ring-buffers-with-LLC/20230216-082552
        git checkout fa748ad303922e4138a246d4db247dfa96e45651
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302161021.tjavhrph-...@intel.com/

All errors (new ones prefixed by >>):

   drivers/gpu/drm/i915/gt/intel_ring.c: In function 'intel_ring_unpin':
>> drivers/gpu/drm/i915/gt/intel_ring.c:103:9: error: expected '}' before 'else'
 103 | else
 | ^~~~


vim +103 drivers/gpu/drm/i915/gt/intel_ring.c

2871ea85c119e6f Chris Wilson           2019-10-24   92  
2871ea85c119e6f Chris Wilson           2019-10-24   93  void intel_ring_unpin(struct intel_ring *ring)
2871ea85c119e6f Chris Wilson           2019-10-24   94  {
2871ea85c119e6f Chris Wilson           2019-10-24   95  	struct i915_vma *vma = ring->vma;
2871ea85c119e6f Chris Wilson           2019-10-24   96  
2871ea85c119e6f Chris Wilson           2019-10-24   97  	if (!atomic_dec_and_test(&ring->pin_count))
2871ea85c119e6f Chris Wilson           2019-10-24   98  		return;
2871ea85c119e6f Chris Wilson           2019-10-24   99  
2871ea85c119e6f Chris Wilson           2019-10-24  100  	i915_vma_unset_ggtt_write(vma);
fa748ad303922e4 Daniele Ceraolo Spurio 2023-02-15  101  	if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915)) {
2871ea85c119e6f Chris Wilson           2019-10-24  102  		i915_vma_unpin_iomap(vma);
2871ea85c119e6f Chris Wilson           2019-10-24 @103  	else
2871ea85c119e6f Chris Wilson           2019-10-24  104  		i915_gem_object_unpin_map(vma->obj);
2871ea85c119e6f Chris Wilson           2019-10-24  105  
2871ea85c119e6f Chris Wilson           2019-10-24  106  	i915_vma_make_purgeable(vma);
a266bf420060043 Chris Wilson           2019-11-18  107  	i915_vma_unpin(vma);
2871ea85c119e6f Chris Wilson           2019-10-24  108  }
2871ea85c119e6f Chris Wilson           2019-10-24  109  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


Re: [PATCH 2/3] drm/msm/dpu: Drop unused get_scaler_ver callback from SSPP

2023-02-15 Thread Dmitry Baryshkov
On Thu, 16 Feb 2023 at 01:02, Marijn Suijten wrote:
>
> This pointer callback is never used and should be removed.  The helper
> _dpu_hw_sspp_get_scaler3_ver function is retained as it is being used by
> dpu_hw_sspp_init which didn't itself compute _sspp_subblk_offset yet.
>
> Signed-off-by: Marijn Suijten 

Reviewed-by: Dmitry Baryshkov 

> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c | 4 +---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h | 6 --
>  2 files changed, 1 insertion(+), 9 deletions(-)


-- 
With best wishes
Dmitry


Re: [PATCH 1/3] drm/msm/dpu: Read previously-uninitialized SSPP scaler version from hw

2023-02-15 Thread Dmitry Baryshkov
On Thu, 16 Feb 2023 at 01:02, Marijn Suijten wrote:
>
> DPU's catalog never assigned dpu_scaler_blk::version, causing the
> initialization code in dpu_hw_setup_scaler3 to wander down the wrong
> codepaths.  Instead of hardcoding the correct QSEED algorithm version,
> read it back from a hardware register.
>
> Note that this register is only available starting with QSEED3, where
> 0x1002 corresponds to QSEED3, 0x2004 to QSEED3LITE and 0x3000 to QSEED4.

This is not purely accurate. 0x1003 (sdm845) also corresponds to QSEED3.
I'd say instead that there are several variations of QSEED3 scalers,
where starting from 0x2004 it is called QSEED3LITE and starting from
0x3000 it is called QSEED4.
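For reference, a sketch of how such a readback could be classified (the
helper name is hypothetical; the thresholds are the values quoted
above):

#include <linux/types.h>

/* Hypothetical helper: classify a QSEED scaler generation from the
 * version register readback, per the ranges described above. */
static inline const char *qseed_generation(u32 version)
{
	if (version >= 0x3000)
		return "QSEED4";
	if (version >= 0x2004)
		return "QSEED3LITE";
	return "QSEED3";	/* e.g. 0x1002, or 0x1003 on sdm845 */
}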

>
> Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support")
> Signed-off-by: Marijn Suijten 
> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h | 2 --
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c| 8 +++-
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h| 3 +++
>  3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
> index ddab9caebb18..96ce1766f4a1 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
> @@ -324,11 +324,9 @@ struct dpu_src_blk {
>  /**
>   * struct dpu_scaler_blk: Scaler information
>   * @info:   HW register and features supported by this sub-blk
> - * @version: qseed block revision
>   */
>  struct dpu_scaler_blk {
> DPU_HW_SUBBLK_INFO;
> -   u32 version;

No. Please keep the version in the scaler subblk.  It is a version of
the QSEED (scaler block), not the SSPP's version.

There is a block called DS (destination scaler), which can be used to
scale the resulting image after the LM. This block also uses the
QSEED3(,LITE,4) scaler block.

>  };
>
>  struct dpu_csc_blk {
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
> index 4246ab0b3bee..d4e181e1378c 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
> @@ -430,7 +430,7 @@ static void _dpu_hw_sspp_setup_scaler3(struct dpu_hw_pipe 
> *ctx,
> return;
>
> 	dpu_hw_setup_scaler3(&ctx->hw, scaler3_cfg, idx,
> -   ctx->cap->sblk->scaler_blk.version,
> +   ctx->version,
> sspp->layout.format);
>  }
>
> @@ -807,6 +807,12 @@ struct dpu_hw_pipe *dpu_hw_sspp_init(enum dpu_sspp idx,
> 	hw_pipe->mdp = &catalog->mdp[0];
> hw_pipe->idx = idx;
> hw_pipe->cap = cfg;
> +
> +	if (test_bit(DPU_SSPP_SCALER_QSEED3, &cfg->features) ||
> +	    test_bit(DPU_SSPP_SCALER_QSEED3LITE, &cfg->features) ||
> +	    test_bit(DPU_SSPP_SCALER_QSEED4, &cfg->features))
> +		hw_pipe->version = _dpu_hw_sspp_get_scaler3_ver(hw_pipe);
> +
> _setup_layer_ops(hw_pipe, hw_pipe->cap->features);
>
> return hw_pipe;
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h
> index 0c95b7e64f6c..eeaf16c6af15 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h
> @@ -352,6 +352,7 @@ struct dpu_hw_sspp_ops {
>   * @hw: block hardware details
>   * @catalog: back pointer to catalog
>   * @mdp: pointer to associated mdp portion of the catalog
> + * @version: qseed block revision
>   * @idx: pipe index
>   * @cap: pointer to layer_cfg
>   * @ops: pointer to operations possible for this pipe
> @@ -362,6 +363,8 @@ struct dpu_hw_pipe {
> const struct dpu_mdss_cfg *catalog;
> const struct dpu_mdp_cfg *mdp;
>
> +   u32 version;
> +
> /* Pipe */
> enum dpu_sspp idx;
> const struct dpu_sspp_cfg *cap;
>
> --
> 2.39.2
>


-- 
With best wishes
Dmitry


[PATCH v3 2/2] drm/i915: Don't use BAR mappings for ring buffers with LLC

2023-02-15 Thread John . C . Harrison
From: John Harrison 

Direction from hardware is that ring buffers should never be mapped
via the BAR on systems with LLC. There are too many caching pitfalls
due to the way BAR accesses are routed. So it is safest to just not
use it.

Signed-off-by: John Harrison 
Fixes: 9d80841ea4c9 ("drm/i915: Allow ringbuffers to be bound anywhere")
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Jani Nikula 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: intel-...@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.9+
---
 drivers/gpu/drm/i915/gt/intel_ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index fb1d2595392ed..fb99143be98e7 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -53,7 +53,7 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
 	if (unlikely(ret))
 		goto err_unpin;
 
-	if (i915_vma_is_map_and_fenceable(vma)) {
+	if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915)) {
 		addr = (void __force *)i915_vma_pin_iomap(vma);
 	} else {
 		int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
@@ -98,7 +98,7 @@ void intel_ring_unpin(struct intel_ring *ring)
 		return;
 
 	i915_vma_unset_ggtt_write(vma);
-	if (i915_vma_is_map_and_fenceable(vma))
+	if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915))
 		i915_vma_unpin_iomap(vma);
 	else
 		i915_gem_object_unpin_map(vma->obj);
-- 
2.39.1



[PATCH v3 1/2] drm/i915: Don't use stolen memory for ring buffers with LLC

2023-02-15 Thread John . C . Harrison
From: John Harrison 

Direction from hardware is that stolen memory should never be used for
ring buffer allocations on platforms with LLC. There are too many
caching pitfalls due to the way stolen memory accesses are routed. So
it is safest to just not use it.

Signed-off-by: John Harrison 
Fixes: c58b735fc762 ("drm/i915: Allocate rings from stolen")
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Jani Nikula 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: intel-...@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.9+
---
 drivers/gpu/drm/i915/gt/intel_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 15ec64d881c44..fb1d2595392ed 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -116,7 +116,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 
 	obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE |
 					  I915_BO_ALLOC_PM_VOLATILE);
-	if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt))
+	if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt) && !HAS_LLC(i915))
 		obj = i915_gem_object_create_stolen(i915, size);
 	if (IS_ERR(obj))
 		obj = i915_gem_object_create_internal(i915, size);
-- 
2.39.1



[PATCH v3 0/2] Don't use stolen memory or BAR mappings for ring buffers

2023-02-15 Thread John . C . Harrison
From: John Harrison 

Instruction from hardware arch is that stolen memory and BAR mappings
are unsafe for use as ring buffers. There can be issues with cache
aliasing due to the CPU access going to memory via the BAR. So, don't
do it.

v2: Don't use BAR mappings either.
Make conditional on LLC so as not to change platforms that don't need
to change (Daniele).
Add 'Fixes' tags (Tvrtko).
v3: Fix dumb typo.

Signed-off-by: John Harrison 


John Harrison (2):
  drm/i915: Don't use stolen memory for ring buffers with LLC
  drm/i915: Don't use BAR mappings for ring buffers with LLC

 drivers/gpu/drm/i915/gt/intel_ring.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.39.1



[PATCH] drm/amdkfd: Make kobj_type structures constant

2023-02-15 Thread Thomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.
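For context, a minimal sketch of what the driver-core change enables
(the names here are hypothetical): the kobject core now accepts a const
ktype, so definitions like these can live in read-only memory.

#include <linux/kobject.h>
#include <linux/slab.h>

struct example_obj {
	struct kobject kobj;
};

static void example_release(struct kobject *kobj)
{
	kfree(container_of(kobj, struct example_obj, kobj));
}

/* const is accepted since commit ee6d3dd4ed48 constified the kobject
 * core APIs, e.g. kobject_init_and_add(..., const struct kobj_type *,
 * ...). */
static const struct kobj_type example_ktype = {
	.release   = example_release,
	.sysfs_ops = &kobj_sysfs_ops,
};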

Signed-off-by: Thomas Weißschuh 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  8 ++++----
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 +++++-----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 51b1683ac5c1..8d719f90db40 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -344,7 +344,7 @@ static const struct sysfs_ops kfd_procfs_ops = {
.show = kfd_procfs_show,
 };
 
-static struct kobj_type procfs_type = {
+static const struct kobj_type procfs_type = {
.release = kfd_procfs_kobj_release,
	.sysfs_ops = &kfd_procfs_ops,
 };
@@ -469,7 +469,7 @@ static const struct sysfs_ops procfs_queue_ops = {
.show = kfd_procfs_queue_show,
 };
 
-static struct kobj_type procfs_queue_type = {
+static const struct kobj_type procfs_queue_type = {
	.sysfs_ops = &procfs_queue_ops,
.default_groups = procfs_queue_groups,
 };
@@ -478,7 +478,7 @@ static const struct sysfs_ops procfs_stats_ops = {
.show = kfd_procfs_stats_show,
 };
 
-static struct kobj_type procfs_stats_type = {
+static const struct kobj_type procfs_stats_type = {
	.sysfs_ops = &procfs_stats_ops,
.release = kfd_procfs_kobj_release,
 };
@@ -487,7 +487,7 @@ static const struct sysfs_ops sysfs_counters_ops = {
.show = kfd_sysfs_counters_show,
 };
 
-static struct kobj_type sysfs_counters_type = {
+static const struct kobj_type sysfs_counters_type = {
	.sysfs_ops = &sysfs_counters_ops,
.release = kfd_procfs_kobj_release,
 };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3fdaba56be6f..8e4124dcb6e4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -278,7 +278,7 @@ static const struct sysfs_ops sysprops_ops = {
.show = sysprops_show,
 };
 
-static struct kobj_type sysprops_type = {
+static const struct kobj_type sysprops_type = {
.release = kfd_topology_kobj_release,
	.sysfs_ops = &sysprops_ops,
 };
@@ -318,7 +318,7 @@ static const struct sysfs_ops iolink_ops = {
.show = iolink_show,
 };
 
-static struct kobj_type iolink_type = {
+static const struct kobj_type iolink_type = {
.release = kfd_topology_kobj_release,
	.sysfs_ops = &iolink_ops,
 };
@@ -350,7 +350,7 @@ static const struct sysfs_ops mem_ops = {
.show = mem_show,
 };
 
-static struct kobj_type mem_type = {
+static const struct kobj_type mem_type = {
.release = kfd_topology_kobj_release,
	.sysfs_ops = &mem_ops,
 };
@@ -395,7 +395,7 @@ static const struct sysfs_ops cache_ops = {
.show = kfd_cache_show,
 };
 
-static struct kobj_type cache_type = {
+static const struct kobj_type cache_type = {
.release = kfd_topology_kobj_release,
	.sysfs_ops = &cache_ops,
 };
@@ -566,7 +566,7 @@ static const struct sysfs_ops node_ops = {
.show = node_show,
 };
 
-static struct kobj_type node_type = {
+static const struct kobj_type node_type = {
.release = kfd_topology_kobj_release,
	.sysfs_ops = &node_ops,
 };

---
base-commit: 033c40a89f55525139fd5b6342281b09b97d05bf
change-id: 20230216-kobj_type-amdkfd-abd9fe9ab060

Best regards,
-- 
Thomas Weißschuh 



[PATCH] drm/amdgpu: make kobj_type structures constant

2023-02-15 Thread Thomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 10 +++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c      |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 1bbd56029a4f..8e04952e5144 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -704,7 +704,7 @@ static void ip_hw_instance_release(struct kobject *kobj)
kfree(ip_hw_instance);
 }
 
-static struct kobj_type ip_hw_instance_ktype = {
+static const struct kobj_type ip_hw_instance_ktype = {
.release = ip_hw_instance_release,
	.sysfs_ops = &ip_hw_instance_sysfs_ops,
.default_groups = ip_hw_instance_groups,
@@ -723,7 +723,7 @@ static void ip_hw_id_release(struct kobject *kobj)
kfree(ip_hw_id);
 }
 
-static struct kobj_type ip_hw_id_ktype = {
+static const struct kobj_type ip_hw_id_ktype = {
.release = ip_hw_id_release,
	.sysfs_ops = &kobj_sysfs_ops,
 };
@@ -786,18 +786,18 @@ static const struct sysfs_ops ip_die_entry_sysfs_ops = {
.show = ip_die_entry_attr_show,
 };
 
-static struct kobj_type ip_die_entry_ktype = {
+static const struct kobj_type ip_die_entry_ktype = {
.release = ip_die_entry_release,
	.sysfs_ops = &ip_die_entry_sysfs_ops,
.default_groups = ip_die_entry_groups,
 };
 
-static struct kobj_type die_kobj_ktype = {
+static const struct kobj_type die_kobj_ktype = {
.release = die_kobj_release,
	.sysfs_ops = &kobj_sysfs_ops,
 };
 
-static struct kobj_type ip_discovery_ktype = {
+static const struct kobj_type ip_discovery_ktype = {
.release = ip_disc_release,
	.sysfs_ops = &kobj_sysfs_ops,
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 4b9e7b050ccd..6d13ce6ec9cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -228,7 +228,7 @@ static const struct sysfs_ops amdgpu_xgmi_hive_ops = {
.show = amdgpu_xgmi_show_attrs,
 };
 
-struct kobj_type amdgpu_xgmi_hive_type = {
+static const struct kobj_type amdgpu_xgmi_hive_type = {
.release = amdgpu_xgmi_hive_release,
	.sysfs_ops = &amdgpu_xgmi_hive_ops,
.default_groups = amdgpu_xgmi_hive_groups,

---
base-commit: 033c40a89f55525139fd5b6342281b09b97d05bf
change-id: 20230216-kobj_type-amdgpu-4d3f0e1e05d4

Best regards,
-- 
Thomas Weißschuh 



[PATCH] drm/i915: Make kobj_type structures constant

2023-02-15 Thread Thomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh 
---
 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c | 2 +-
 drivers/gpu/drm/i915/gt/sysfs_engines.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
index 9486dd3bed99..df15b17caf89 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
@@ -71,7 +71,7 @@ static void kobj_gt_release(struct kobject *kobj)
 {
 }
 
-static struct kobj_type kobj_gt_type = {
+static const struct kobj_type kobj_gt_type = {
.release = kobj_gt_release,
	.sysfs_ops = &kobj_sysfs_ops,
.default_groups = id_groups,
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index f2d9858d827c..b5e0fe5dbf6c 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -421,7 +421,7 @@ static void kobj_engine_release(struct kobject *kobj)
kfree(kobj);
 }
 
-static struct kobj_type kobj_engine_type = {
+static const struct kobj_type kobj_engine_type = {
.release = kobj_engine_release,
	.sysfs_ops = &kobj_sysfs_ops
 };

---
base-commit: 033c40a89f55525139fd5b6342281b09b97d05bf
change-id: 20230216-kobj_type-i915-886bebc36129

Best regards,
-- 
Thomas Weißschuh 



[PATCH v2 0/2] Don't use stolen memory or BAR mappings for ring buffers

2023-02-15 Thread John . C . Harrison
From: John Harrison 

Instruction from hardware arch is that stolen memory and BAR mappings
are unsafe for use as ring buffers. There can be issues with cache
aliasing due to the CPU access going to memory via the BAR. So, don't
do it.

v2: Don't use BAR mappings either.
Make conditional on LLC so as not to change platforms that don't need
to change (Daniele).
Add 'Fixes' tags (Tvrtko).

Signed-off-by: John Harrison 


Daniele Ceraolo Spurio (1):
  drm/i915: Don't use BAR mappings for ring buffers with LLC

John Harrison (1):
  drm/i915: Don't use stolen memory for ring buffers with LLC

 drivers/gpu/drm/i915/gt/intel_ring.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.39.1



[PATCH v2 2/2] drm/i915: Don't use BAR mappings for ring buffers with LLC

2023-02-15 Thread John . C . Harrison
From: Daniele Ceraolo Spurio 

Direction from hardware is that ring buffers should never be mapped
via the BAR on systems with LLC. There are too many caching pitfalls
due to the way BAR accesses are routed. So it is safest to just not
use it.

Signed-off-by: John Harrison 
Fixes: 9d80841ea4c9 ("drm/i915: Allow ringbuffers to be bound anywhere")
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Jani Nikula 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: intel-...@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.9+
---
 drivers/gpu/drm/i915/gt/intel_ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index fb1d2595392ed..8675ec8ead353 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -53,7 +53,7 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
 	if (unlikely(ret))
 		goto err_unpin;
 
-	if (i915_vma_is_map_and_fenceable(vma)) {
+	if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915)) {
 		addr = (void __force *)i915_vma_pin_iomap(vma);
 	} else {
 		int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
@@ -98,7 +98,7 @@ void intel_ring_unpin(struct intel_ring *ring)
 		return;
 
 	i915_vma_unset_ggtt_write(vma);
-	if (i915_vma_is_map_and_fenceable(vma))
+	if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915)) {
 		i915_vma_unpin_iomap(vma);
 	else
 		i915_gem_object_unpin_map(vma->obj);
-- 
2.39.1



[PATCH v2 1/2] drm/i915: Don't use stolen memory for ring buffers with LLC

2023-02-15 Thread John . C . Harrison
From: John Harrison 

Direction from hardware is that stolen memory should never be used for
ring buffer allocations on platforms with LLC. There are too many
caching pitfalls due to the way stolen memory accesses are routed. So
it is safest to just not use it.

Signed-off-by: John Harrison 
Fixes: c58b735fc762 ("drm/i915: Allocate rings from stolen")
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Jani Nikula 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: intel-...@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.9+
---
 drivers/gpu/drm/i915/gt/intel_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 15ec64d881c44..fb1d2595392ed 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -116,7 +116,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 
 	obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE |
 					  I915_BO_ALLOC_PM_VOLATILE);
-	if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt))
+	if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt) && !HAS_LLC(i915))
 		obj = i915_gem_object_create_stolen(i915, size);
 	if (IS_ERR(obj))
 		obj = i915_gem_object_create_internal(i915, size);
-- 
2.39.1



[PATCH] drm/msm: Fix potential invalid ptr free

2023-02-15 Thread Rob Clark
From: Rob Clark 

The error path cleanup expects that chain and syncobj are either NULL or
valid pointers.  But post_deps was not allocated with __GFP_ZERO.
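A sketch of the hazard (simplified; msm_submit_post_dep is the real
struct, everything else here is illustrative, and the extra GFP flags
are elided):

struct msm_submit_post_dep *post_deps;

/* Before: kmalloc_array() returns uninitialized memory. If parsing
 * entry i fails, the cleanup loop walks *all* entries and would
 * kfree()/put whatever garbage sits in .chain/.syncobj of entries
 * that were never initialized. */
post_deps = kmalloc_array(nr_syncobjs, sizeof(*post_deps), GFP_KERNEL);

/* After: kcalloc() zero-fills the array, so .chain/.syncobj of
 * untouched entries are NULL and the error path can hand them to the
 * NULL-tolerant cleanup helpers unconditionally. */
post_deps = kcalloc(nr_syncobjs, sizeof(*post_deps), GFP_KERNEL);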

Fixes: ab723b7a992a ("drm/msm: Add syncobj support.")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 6503220e5a4b..e4d13540300e 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -640,8 +640,8 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
int ret = 0;
uint32_t i, j;
 
-   post_deps = kmalloc_array(nr_syncobjs, sizeof(*post_deps),
- GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
+   post_deps = kcalloc(nr_syncobjs, sizeof(*post_deps),
+   GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
if (!post_deps)
return ERR_PTR(-ENOMEM);
 
@@ -656,7 +656,6 @@ static struct msm_submit_post_dep *msm_parse_post_deps(struct drm_device *dev,
}
 
post_deps[i].point = syncobj_desc.point;
-   post_deps[i].chain = NULL;
 
if (syncobj_desc.flags) {
ret = -EINVAL;
-- 
2.39.1



[PATCH 0/3] drm/msm/dpu: Initialize SSPP scaler version (from register read)

2023-02-15 Thread Marijn Suijten
Random inspection of the SSPP code surfaced that the version field of
dpu_scaler_blk was never assigned in the catalog, resulting in the
wrong codepaths being taken within dpu_hw_setup_scaler3 based on a 0
version.  Rectify this by reading an accurate value from a register
(that is not equal to the values represented by the
DPU_SSPP_SCALER_QSEEDx enum variants) and deleting dead code around
QSEED versioning.

Future changes should likely get rid of the distinction between QSEED3
and up, as these are now purely determined from the register value.
Furthermore implementations could look at the scaler subblk .id field
rather than the SSPP feature bits, which currently hold redundant
information.

---
Marijn Suijten (3):
  drm/msm/dpu: Read previously-uninitialized SSPP scaler version from hw
  drm/msm/dpu: Drop unused get_scaler_ver callback from SSPP
  drm/msm/dpu: Drop unused qseed_type from catalog dpu_caps

 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 12 ------------
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h |  4 ----
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c    | 12 ++++++++----
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h    |  9 +++------
 4 files changed, 11 insertions(+), 26 deletions(-)
---
base-commit: 9d9019bcea1aac7eed64a1a4966282b6b7b141c8
change-id: 20230215-sspp-scaler-version-19f221585c5e

Best regards,
-- 
Marijn Suijten 



[PATCH 3/3] drm/msm/dpu: Drop unused qseed_type from catalog dpu_caps

2023-02-15 Thread Marijn Suijten
The SSPP scaler subblk is responsible for reporting its version (via the
.id field, feature bits on the parent SSPP block, and since recently
also from reading a register to supersede a read-but-unset version field
in the catalog), leaving this global qseed_type field logically unused.
Remove this dead code to lighten the catalog and bringup-overhead.

Signed-off-by: Marijn Suijten 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 12 ------------
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h |  2 --
 2 files changed, 14 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
index cf053e8f081e..bd57a4cce4a9 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
@@ -300,7 +300,6 @@ static const uint32_t wb2_formats[] = {
 static const struct dpu_caps msm8998_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0x7,
-   .qseed_type = DPU_SSPP_SCALER_QSEED3,
.smart_dma_rev = DPU_SSPP_SMART_DMA_V1,
.ubwc_version = DPU_HW_UBWC_VER_10,
.has_src_split = true,
@@ -327,7 +326,6 @@ static const struct dpu_caps qcm2290_dpu_caps = {
 static const struct dpu_caps sdm845_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0xb,
-   .qseed_type = DPU_SSPP_SCALER_QSEED3,
.smart_dma_rev = DPU_SSPP_SMART_DMA_V2,
.ubwc_version = DPU_HW_UBWC_VER_20,
.has_src_split = true,
@@ -343,7 +341,6 @@ static const struct dpu_caps sdm845_dpu_caps = {
 static const struct dpu_caps sc7180_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0x9,
-   .qseed_type = DPU_SSPP_SCALER_QSEED4,
.smart_dma_rev = DPU_SSPP_SMART_DMA_V2,
.ubwc_version = DPU_HW_UBWC_VER_20,
.has_dim_layer = true,
@@ -355,7 +352,6 @@ static const struct dpu_caps sc7180_dpu_caps = {
 static const struct dpu_caps sm6115_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0x4,
-   .qseed_type = DPU_SSPP_SCALER_QSEED3LITE,
.smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
.ubwc_version = DPU_HW_UBWC_VER_10,
.has_dim_layer = true,
@@ -367,7 +363,6 @@ static const struct dpu_caps sm6115_dpu_caps = {
 static const struct dpu_caps sm8150_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0xb,
-   .qseed_type = DPU_SSPP_SCALER_QSEED3,
.smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
.ubwc_version = DPU_HW_UBWC_VER_30,
.has_src_split = true,
@@ -383,7 +378,6 @@ static const struct dpu_caps sm8150_dpu_caps = {
 static const struct dpu_caps sc8180x_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0xb,
-   .qseed_type = DPU_SSPP_SCALER_QSEED3,
.smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
.ubwc_version = DPU_HW_UBWC_VER_30,
.has_src_split = true,
@@ -399,7 +393,6 @@ static const struct dpu_caps sc8180x_dpu_caps = {
 static const struct dpu_caps sc8280xp_dpu_caps = {
.max_mixer_width = 2560,
.max_mixer_blendstages = 11,
-   .qseed_type = DPU_SSPP_SCALER_QSEED3LITE,
.smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
.ubwc_version = DPU_HW_UBWC_VER_40,
.has_src_split = true,
@@ -413,7 +406,6 @@ static const struct dpu_caps sc8280xp_dpu_caps = {
 static const struct dpu_caps sm8250_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0xb,
-   .qseed_type = DPU_SSPP_SCALER_QSEED3LITE,
.smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
.ubwc_version = DPU_HW_UBWC_VER_40,
.has_src_split = true,
@@ -427,7 +419,6 @@ static const struct dpu_caps sm8250_dpu_caps = {
 static const struct dpu_caps sm8350_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0xb,
-   .qseed_type = DPU_SSPP_SCALER_QSEED3LITE,
.smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
.ubwc_version = DPU_HW_UBWC_VER_40,
.has_src_split = true,
@@ -441,7 +432,6 @@ static const struct dpu_caps sm8350_dpu_caps = {
 static const struct dpu_caps sm8450_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0xb,
-   .qseed_type = DPU_SSPP_SCALER_QSEED4,
.smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
.ubwc_version = DPU_HW_UBWC_VER_40,
.has_src_split = true,
@@ -455,7 +445,6 @@ static const struct dpu_caps sm8450_dpu_caps = {
 static const struct dpu_caps sm8550_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0xb,
-   .qseed_type = 

[PATCH 2/3] drm/msm/dpu: Drop unused get_scaler_ver callback from SSPP

2023-02-15 Thread Marijn Suijten
This pointer callback is never used and should be removed.  The helper
_dpu_hw_sspp_get_scaler3_ver function is retained as it is being used by
dpu_hw_sspp_init which didn't itself compute _sspp_subblk_offset yet.

Signed-off-by: Marijn Suijten 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c | 4 +---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h | 6 ------
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
index d4e181e1378c..00e5dc2318db 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
@@ -689,10 +689,8 @@ static void _setup_layer_ops(struct dpu_hw_pipe *c,
 
 	if (test_bit(DPU_SSPP_SCALER_QSEED3, &features) ||
 	    test_bit(DPU_SSPP_SCALER_QSEED3LITE, &features) ||
-	    test_bit(DPU_SSPP_SCALER_QSEED4, &features)) {
+	    test_bit(DPU_SSPP_SCALER_QSEED4, &features))
 		c->ops.setup_scaler = _dpu_hw_sspp_setup_scaler3;
-		c->ops.get_scaler_ver = _dpu_hw_sspp_get_scaler3_ver;
-	}
 
 	if (test_bit(DPU_SSPP_CDP, &features))
 		c->ops.setup_cdp = dpu_hw_sspp_setup_cdp;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h
index eeaf16c6af15..bebb62c09dd8 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h
@@ -329,12 +329,6 @@ struct dpu_hw_sspp_ops {
struct dpu_hw_pipe_cfg *pipe_cfg,
void *scaler_cfg);
 
-   /**
-* get_scaler_ver - get scaler h/w version
-* @ctx: Pointer to pipe context
-*/
-   u32 (*get_scaler_ver)(struct dpu_hw_pipe *ctx);
-
/**
 * setup_cdp - setup client driven prefetch
 * @ctx: Pointer to pipe context

-- 
2.39.2



[PATCH 1/3] drm/msm/dpu: Read previously-uninitialized SSPP scaler version from hw

2023-02-15 Thread Marijn Suijten
DPU's catalog never assigned dpu_scaler_blk::version, causing the
initialization code in dpu_hw_setup_scaler3 to wander down the wrong
codepaths.  Instead of hardcoding the correct QSEED algorithm version,
read it back from a hardware register.

Note that this register is only available starting with QSEED3, where
0x1002 corresponds to QSEED3, 0x2004 to QSEED3LITE and 0x3000 to QSEED4.

Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support")
Signed-off-by: Marijn Suijten 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h | 2 --
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c    | 8 +++++++-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h    | 3 +++
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
index ddab9caebb18..96ce1766f4a1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
@@ -324,11 +324,9 @@ struct dpu_src_blk {
 /**
  * struct dpu_scaler_blk: Scaler information
  * @info:   HW register and features supported by this sub-blk
- * @version: qseed block revision
  */
 struct dpu_scaler_blk {
DPU_HW_SUBBLK_INFO;
-   u32 version;
 };
 
 struct dpu_csc_blk {
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
index 4246ab0b3bee..d4e181e1378c 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
@@ -430,7 +430,7 @@ static void _dpu_hw_sspp_setup_scaler3(struct dpu_hw_pipe *ctx,
return;
 
 	dpu_hw_setup_scaler3(&ctx->hw, scaler3_cfg, idx,
-   ctx->cap->sblk->scaler_blk.version,
+   ctx->version,
sspp->layout.format);
 }
 
@@ -807,6 +807,12 @@ struct dpu_hw_pipe *dpu_hw_sspp_init(enum dpu_sspp idx,
 	hw_pipe->mdp = &catalog->mdp[0];
hw_pipe->idx = idx;
hw_pipe->cap = cfg;
+
+	if (test_bit(DPU_SSPP_SCALER_QSEED3, &cfg->features) ||
+	    test_bit(DPU_SSPP_SCALER_QSEED3LITE, &cfg->features) ||
+	    test_bit(DPU_SSPP_SCALER_QSEED4, &cfg->features))
+   hw_pipe->version = _dpu_hw_sspp_get_scaler3_ver(hw_pipe);
+
_setup_layer_ops(hw_pipe, hw_pipe->cap->features);
 
return hw_pipe;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h
index 0c95b7e64f6c..eeaf16c6af15 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.h
@@ -352,6 +352,7 @@ struct dpu_hw_sspp_ops {
  * @hw: block hardware details
  * @catalog: back pointer to catalog
  * @mdp: pointer to associated mdp portion of the catalog
+ * @version: qseed block revision
  * @idx: pipe index
  * @cap: pointer to layer_cfg
  * @ops: pointer to operations possible for this pipe
@@ -362,6 +363,8 @@ struct dpu_hw_pipe {
const struct dpu_mdss_cfg *catalog;
const struct dpu_mdp_cfg *mdp;
 
+   u32 version;
+
/* Pipe */
enum dpu_sspp idx;
const struct dpu_sspp_cfg *cap;

-- 
2.39.2



Re: [PATCH] drm/nouveau/led: explicitly include linux/leds.h

2023-02-15 Thread Lyude Paul
Reviewed-by: Lyude Paul 

Will push to drm-misc-next in a moment

On Wed, 2023-02-15 at 01:04 +, Thomas Weißschuh wrote:
> Instead of relying on an accidental, transitive inclusion of linux/leds.h,
> use it directly.
> 
> Also drop the forward declaration of struct led_classdev that is now
> provided by linux/leds.h.
> 
> Signed-off-by: Thomas Weißschuh 
> ---
>  drivers/gpu/drm/nouveau/nouveau_led.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_led.h 
> b/drivers/gpu/drm/nouveau/nouveau_led.h
> index 21a5775028cc..bc9bc7208da3 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_led.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_led.h
> @@ -27,7 +27,7 @@
>  
>  #include "nouveau_drv.h"
>  
> -struct led_classdev;
> +#include <linux/leds.h>
>  
>  struct nouveau_led {
>   struct drm_device *dev;
> 
> ---
> base-commit: e1c04510f521e853019afeca2a5991a5ef8d6a5b
> change-id: 20230215-power_supply-leds-nouveau-ff4995ba0794
> 
> Best regards,

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat



Re: [PATCH 3/3] drm/connector: Deprecate split for BT.2020 in drm_colorspace enum

2023-02-15 Thread Daniel Stone
On Wed, 15 Feb 2023 at 20:54, Harry Wentland  wrote:
> On 2/15/23 06:46, Daniel Stone wrote:
> > On Tue, 14 Feb 2023 at 16:57, Harry Wentland  wrote:
> >> On 2/14/23 10:49, Sebastian Wick wrote:
> >> From what I've seen recently I am inclined to favor an incremental
> >> approach more. The reason is that any API, or portion thereof, is
> >> useless unless it's enabled full stack. When it isn't it becomes
> >> dead code quickly, or never really works because we overlooked
> >> one thing. The colorspace debacle shows how even something as
> >> simple as extra enum values in KMS APIs shouldn't be added unless
> >> someone in a canonical upstream project actually uses them. I
> >> would argue that such a canonical upstream project actually has
> >> to be a production environment and not something like Weston.
> >
> > Just to chime in as well that it is a real production environment;
> > it's probably actually shipped the most of any compositor by a long
> > way. It doesn't have much place on the desktop, but it does live in
> > planes, trains, automobiles, digital signage, kiosks, STBs/TVs, and
> > about a billion other places you might not have expected.
> >
>
> Understood.
>
> Curious if there's a list of some concrete examples.

If I was allowed to name them, I'd definitely be doing a much better
job of promoting it ... but if you've bought a car in the last 7-8
years, it's much more likely than not that its console display is
using Weston. Probably about 50% odds that you've flown on a plane
whose IFE is driven by Weston. You've definitely walked past a lot of
digital signage advertisements and display walls which are driven by
Weston. There are a huge number of consumer products (and other modes
of transport, would you believe?) that are too, but I can't name them
because it gets too specific.

The cars are probably using a 10+ year old (and frankly awful) SoC.
The display walls are probably using a 6ish-year-old SoC with
notoriously poor memory bandwidth. Or TVs trying to make 4K UIs fly on
an ancient (pre-unified-shader) GPU. The hits go on. We do ship things
on nice and capable new hardware as well, but keeping old hardware
working with new software stacks is non-negotiable for us, and we have
to bend over backwards to make that happen.

> >> We should look at this from a use-case angle, similar to what
> >> the gamescope guys are doing. Small steps, like:
> >> 1) Add HDR10 output (PQ, BT.2020) to the display
> >> 2) Add ability to do sRGB linear blending
> >> 3) Add ability to do sRGB and PQ linear blending
> >> 4) Post-blending 3D LUT
> >> 5) Pre-blending 3D LUT
> >>
> >> At each stage the whole stack needs to work together in production.
> >
> > Personally, I do think at this stage we probably have enough of an
> > understanding to be able to work with an intermediate solution. We
> > just need to think hard about what that intermediate solution is -
> > making sure that we don't end up in the same tangle of impossible
> > semantics like the old 'broadcast RGB' / colorspace / HDR properties
> > which were never thought through - so that it is something we can
> > build on rather than something we have to work around. But it would be
> > really good to make HDR10/HDR10+ media and HDR games work on HDR
> > displays, yeah.
>
> I have a feeling we'll make some progress here this year. I definitely
> think the whole HDR/Colour work is on the right track in Weston and
> Wayland which will hopefully give us a good base to work with over
> many years.

Yep!

Coming to the point you were making in the other mail - Weston was
traditionally used as _the_ enablement vehicle for KMS, because we
cared about using the depth of hardware much more than anyone else
(e.g. being years ahead on planes), and the vendor who wanted to
enable it either wanted to enable Weston specifically or just didn't
have an open userspace stack for it. The other compositors couldn't be
that vehicle, either because they were more focused on desktop UI, or
they could just afford to throw the GPU at it and suck up the
occasional frame hitch / thermal burn / etc. I like to think we had a
reputation for being pretty thoughtful and careful with our review as
well, and didn't give it lightly to misguided ideas which caused
long-term problems.

But we've got a greater diversity in userspace these days, and that's
no bad thing. If the best vehicle to demonstrate HDR GPU rendering is
gamescope, then use gamescope as that vehicle. We'll be there if we
can, and if it makes sense for us, but it's not a requirement.

Cheers,
Daniel
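
[Editorial aside: as a concrete reference for step 1) in the list quoted
above, here is a minimal sketch of how userspace can already request HDR10
(PQ, BT.2020) output through the existing KMS connector properties. The
property names ("Colorspace", "HDR_OUTPUT_METADATA") are the upstream ones;
the property IDs and the enum value for BT.2020 RGB are assumed to have been
looked up by name beforehand, and the luminance numbers are placeholders
rather than values from any real display.]

#include <stdint.h>
#include <linux/hdmi.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* colorspace_prop / hdr_meta_prop: property IDs for "Colorspace" and
 * "HDR_OUTPUT_METADATA"; bt2020_rgb: the enum value named "BT2020_RGB"
 * on the Colorspace property. All looked up by the caller beforehand.
 */
static int set_hdr10_output(int fd, uint32_t connector_id,
			    uint32_t colorspace_prop, uint64_t bt2020_rgb,
			    uint32_t hdr_meta_prop)
{
	struct hdr_output_metadata meta = {
		.metadata_type = HDMI_STATIC_METADATA_TYPE1,
		.hdmi_metadata_type1 = {
			.eotf = HDMI_EOTF_SMPTE_ST2084,	/* PQ */
			.metadata_type = HDMI_STATIC_METADATA_TYPE1,
			.max_display_mastering_luminance = 1000, /* placeholder */
			.min_display_mastering_luminance = 1,	 /* placeholder */
			.max_cll = 1000,
			.max_fall = 400,
		},
	};
	uint32_t blob_id;
	int ret;

	ret = drmModeCreatePropertyBlob(fd, &meta, sizeof(meta), &blob_id);
	if (ret)
		return ret;

	ret = drmModeObjectSetProperty(fd, connector_id,
				       DRM_MODE_OBJECT_CONNECTOR,
				       colorspace_prop, bt2020_rgb);
	if (ret)
		return ret;

	return drmModeObjectSetProperty(fd, connector_id,
					DRM_MODE_OBJECT_CONNECTOR,
					hdr_meta_prop, blob_id);
}

Steps 2) and onwards are exactly the parts this thread agrees are still
missing: nothing above says anything about how the compositor blends.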


Re: [PATCH] drm/i915/xelpmp: Consider GSI offset when doing MCR lookups

2023-02-15 Thread Matt Roper
On Wed, Feb 15, 2023 at 11:48:13AM -0800, Sripada, Radhakrishna wrote:
> 
> 
> > -Original Message-
> > From: dri-devel  On Behalf Of Matt
> > Roper
> > Sent: Monday, February 13, 2023 4:19 PM
> > To: intel-...@lists.freedesktop.org
> > Cc: dri-devel@lists.freedesktop.org
> > Subject: [PATCH] drm/i915/xelpmp: Consider GSI offset when doing MCR
> > lookups
> > 
> > MCR range tables use the final MMIO offset of a register (including the
> > 0x380000 GSI offset when applicable). Since the i915_mcr_reg_t passed
> > as a parameter during steering lookup does not include the GSI offset,
> > we need to add it back in for GSI registers before searching the tables.
> > 
> > Fixes: a7ec65fc7e83 ("drm/i915/xelpmp: Add multicast steering for media GT")
> 
> LGTM,
> Reviewed-by: Radhakrishna Sripada 

Thanks, applied to drm-intel-gt-next.


Matt

> 
> > Signed-off-by: Matt Roper 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > index a4a8b8bc5737..03632df27de3 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> > @@ -561,12 +561,15 @@ static bool reg_needs_read_steering(struct intel_gt
> > *gt,
> > i915_mcr_reg_t reg,
> > enum intel_steering_type type)
> >  {
> > -   const u32 offset = i915_mmio_reg_offset(reg);
> > +   u32 offset = i915_mmio_reg_offset(reg);
> > const struct intel_mmio_range *entry;
> > 
> > if (likely(!gt->steering_table[type]))
> > return false;
> > 
> > +   if (IS_GSI_REG(offset))
> > +   offset += gt->uncore->gsi_offset;
> > +
> > for (entry = gt->steering_table[type]; entry->end; entry++) {
> > if (offset >= entry->start && offset <= entry->end)
> > return true;
> > --
> > 2.39.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
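
[Editorial aside: a worked example of the offset adjustment in the patch
above. The register value is made up for illustration; 0x380000 is the
media GT's GSI offset on Meteor Lake.]

/*
 * Suppose a media-GT MCR register is defined as 0x00b100 in the i915
 * register headers. The MCR range tables store final MMIO offsets, so
 * the entry covering it contains:
 *
 *	0x00b100 + 0x380000 (gt->uncore->gsi_offset) = 0x38b100
 *
 * Searching the table with the raw 0x00b100 therefore misses the entry,
 * which is what the IS_GSI_REG() adjustment above fixes.
 */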


Re: [PATCH] drm/amd/display: only warn once in dce110_edp_wait_for_hpd_ready()

2023-02-15 Thread Harry Wentland




On 2/14/23 16:12, Hamza Mahfooz wrote:

Since hot plugging eDP displays isn't supported, it is sufficient for
us to warn about the lack of a connected display once. So, use ASSERT()
in dce110_edp_wait_for_hpd_ready() instead of DC_LOG_WARNING().

Signed-off-by: Hamza Mahfooz 


Reviewed-by: Harry Wentland 

Harry


---
  drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index fb3fd5b7c78b..0d4d3d586166 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -779,10 +779,8 @@ void dce110_edp_wait_for_hpd_ready(
 
 	dal_gpio_destroy_irq(&hpd);
 
-	if (false == edp_hpd_high) {
-		DC_LOG_WARNING(
-			"%s: wait timed out!\n", __func__);
-	}
+	/* ensure that the panel is detected */
+	ASSERT(edp_hpd_high);
 }
 
  void dce110_edp_power_control(


Re: [PATCH AUTOSEL 6.1 24/24] drm/amd/display: disable S/G display on DCN 3.1.2/3

2023-02-15 Thread Sasha Levin

On Wed, Feb 15, 2023 at 03:55:07PM -0500, Alex Deucher wrote:

On Wed, Feb 15, 2023 at 3:46 PM Sasha Levin  wrote:


From: Alex Deucher 

[ Upstream commit 077e9659581acab70f2dcc04b5bc799aca3a056b ]

Causes flickering or white screens in some configurations.
Disable it for now until we can fix the issue.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2352
Cc: roman...@amd.com
Cc: yifan1.zh...@amd.com
Reviewed-by: Yifan Zhang 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 


This was reverted upstream and should be dropped.


Ack, I'll drop it. Thanks!

--
Thanks,
Sasha


Re: [PATCH AUTOSEL 6.1 24/24] drm/amd/display: disable S/G display on DCN 3.1.2/3

2023-02-15 Thread Alex Deucher
On Wed, Feb 15, 2023 at 3:46 PM Sasha Levin  wrote:
>
> From: Alex Deucher 
>
> [ Upstream commit 077e9659581acab70f2dcc04b5bc799aca3a056b ]
>
> Causes flickering or white screens in some configurations.
> Disable it for now until we can fix the issue.
>
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2352
> Cc: roman...@amd.com
> Cc: yifan1.zh...@amd.com
> Reviewed-by: Yifan Zhang 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 

This was reverted upstream and should be dropped.

Alex

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 988b1c947aefc..c026ba532b733 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -1524,8 +1524,6 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
> break;
> case IP_VERSION(2, 1, 0):
> case IP_VERSION(3, 0, 1):
> -   case IP_VERSION(3, 1, 2):
> -   case IP_VERSION(3, 1, 3):
> case IP_VERSION(3, 1, 6):
> init_data.flags.gpu_vm_support = true;
> break;
> --
> 2.39.0
>


Re: [PATCH 3/3] drm/connector: Deprecate split for BT.2020 in drm_colorspace enum

2023-02-15 Thread Harry Wentland



On 2/15/23 06:46, Daniel Stone wrote:
> Hi,
> 
> On Tue, 14 Feb 2023 at 16:57, Harry Wentland  wrote:
>> On 2/14/23 10:49, Sebastian Wick wrote:
>> From what I've seen recently I am inclined to favor an incremental
>> approach more. The reason is that any API, or portion thereof, is
>> useless unless it's enabled full stack. When it isn't it becomes
>> dead code quickly, or never really works because we overlooked
>> one thing. The colorspace debacle shows how even something as
>> simple as extra enum values in KMS APIs shouldn't be added unless
>> someone in a canonical upstream project actually uses them. I
>> would argue that such a canonical upstream project actually has
>> to be a production environment and not something like Weston.
> 
> Just to chime in as well that it is a real production environment;
> it's probably actually shipped the most of any compositor by a long
> way. It doesn't have much place on the desktop, but it does live in
> planes, trains, automobiles, digital signage, kiosks, STBs/TVs, and
> about a billion other places you might not have expected.
> 

Understood.

Curious if there's a list of some concrete examples.

> Probably the main factor that joins all these together - apart from
> not having much desktop-style click-and-drag reconfigurable UI - is
> that we need to use the hardware pipeline as efficiently as possible,
> because either we don't have the memory bandwidth to burn like
> desktops, or we need to minimise it for power/thermal reasons.
> 

I think we're very much aligned here.

> Given that, we don't really want to paint ourselves into a corner with
> incremental solutions that mean we can't do fully efficient things
> later. We're also somewhat undermanned, and we've been using our
> effort to try to make sure that the full solution - including full
> colour-managed pathways for things like movie and TV post-prod
> composition, design, etc - is possible at some point through the full
> Wayland ecosystem at some point. The X11 experience was so horribly
> botched that it wasn't really possible without a complete professional
> setup, and that's something I personally don't want to see. However
> ...

Agreed.

> 
>> I could see us getting to a fully new color pipeline API but
>> the only way to do that is with a development model that supports
>> it. While upstream needs to be our ultimate goal, a good way
>> to bring in new APIs and ensure a full-stack implementation is
>> to develop them in a downstream production kernel, alongside
>> userspace that makes use of it. Once the implementation is
>> proven in the downstream repos it can then go upstream. This
>> brings new challenges, though, as things don't get wide
>> testing and get out of sync with upstream quickly. The
>> alternative is the incremental approach.
>>
>> We should look at this from a use-case angle, similar to what
>> the gamescope guys are doing. Small steps, like:
>> 1) Add HDR10 output (PQ, BT.2020) to the display
>> 2) Add ability to do sRGB linear blending
>> 3) Add ability to do sRGB and PQ linear blending
>> 4) Post-blending 3D LUT
>> 5) Pre-blending 3D LUT
>>
>> At each stage the whole stack needs to work together in production.
> 
> Personally, I do think at this stage we probably have enough of an
> understanding to be able to work with an intermediate solution. We
> just need to think hard about what that intermediate solution is -
> making sure that we don't end up in the same tangle of impossible
> semantics like the old 'broadcast RGB' / colorspace / HDR properties
> which were never thought through - so that it is something we can
> build on rather than something we have to work around. But it would be
> really good to make HDR10/HDR10+ media and HDR games work on HDR
> displays, yeah.
> 

I have a feeling we'll make some progress here this year. I definitely
think the whole HDR/Colour work is on the right track in Weston and
Wayland which will hopefully give us a good base to work with over
many years.

Harry

> Cheers,
> Daniel



[PATCH AUTOSEL 6.1 24/24] drm/amd/display: disable S/G display on DCN 3.1.2/3

2023-02-15 Thread Sasha Levin
From: Alex Deucher 

[ Upstream commit 077e9659581acab70f2dcc04b5bc799aca3a056b ]

Causes flickering or white screens in some configurations.
Disable it for now until we can fix the issue.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2352
Cc: roman...@amd.com
Cc: yifan1.zh...@amd.com
Reviewed-by: Yifan Zhang 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 988b1c947aefc..c026ba532b733 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1524,8 +1524,6 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
break;
case IP_VERSION(2, 1, 0):
case IP_VERSION(3, 0, 1):
-   case IP_VERSION(3, 1, 2):
-   case IP_VERSION(3, 1, 3):
case IP_VERSION(3, 1, 6):
init_data.flags.gpu_vm_support = true;
break;
-- 
2.39.0



Re: [PATCH 3/3] drm/connector: Deprecate split for BT.2020 in drm_colorspace enum

2023-02-15 Thread Harry Wentland



On 2/15/23 04:40, Pekka Paalanen wrote:
> On Tue, 14 Feb 2023 15:04:52 -0500
> Harry Wentland  wrote:
> 
>> On 2/14/23 14:45, Sebastian Wick wrote:
>>> On Tue, Feb 14, 2023 at 5:57 PM Harry Wentland  
>>> wrote:  



 On 2/14/23 10:49, Sebastian Wick wrote:  
> On Fri, Feb 3, 2023 at 5:00 PM Ville Syrjälä
>  wrote:  
>>
>> On Fri, Feb 03, 2023 at 10:24:52AM -0500, Harry Wentland wrote:  
>>>
>>>
>>> On 2/3/23 10:19, Ville Syrjälä wrote:  
 On Fri, Feb 03, 2023 at 09:39:42AM -0500, Harry Wentland wrote:  
>
>
> On 2/3/23 07:59, Sebastian Wick wrote:  
>> On Fri, Feb 3, 2023 at 11:40 AM Ville Syrjälä
>>  wrote:  
>>>
>>> On Fri, Feb 03, 2023 at 02:07:44AM +, Joshua Ashton wrote:  
 Userspace has no way of controlling or knowing the pixel encoding
 currently, so there is no way for it to ever get the right values 
 here.  
>>>
>>> That applies to a lot of the other values as well (they are
>>> explicitly RGB or YCC). The idea was that this property sets the
>>> infoframe/MSA/SDP value exactly, and other properties should be
>>> added to for use userspace to control the pixel encoding/colorspace
>>> conversion(if desired, or userspace just makes sure to
>>> directly feed in correct kind of data).  
>>
>> I'm all for getting userspace control over pixel encoding but even
>> then the kernel always knows which pixel encoding is selected and
>> which InfoFrame has to be sent. Is there a reason why userspace would
>> want to control the variant explicitly to the wrong value?
>>  
>
> I've asked this before but haven't seen an answer: Is there an 
> existing
> upstream userspace project that makes use of this property (other than
> what Joshua is working on in gamescope right now)? That would help us
> understand the intent better.  

 The intent was to control the infoframe colorimetry bits,
 nothing more. No idea what real userspace there was, if any.
  
>
> I don't think giving userspace explicit control over the exact 
> infoframe
> values is the right thing to do.  

 Only userspace knows what kind of data it's stuffing into
 the pixels (and/or how it configures the csc units/etc.) to
 generate them.
  
>>>
>>> Yes, but userspace doesn't control or know whether we drive
>>> RGB or YCbCr on the wire. In fact, in some cases our driver
>>> needs to fallback to YCbCr420 for bandwidth reasons. There
>>> is currently no way for userspace to know that and I don't
>>> think it makes sense.  
>>
>> People want that control as well for whatever reason. We've
>> been asked to allow YCbCr 4:4:4 output many times.  
>
> I don't really think it's a question of if we want it but rather how
> we get there. Harry is completely right that if we would make the
> subsampling controllable by user space instead of the kernel handling
> it magically, user space which does not adapt to the new control won't
> be able to light up some modes which worked before.
>  

 Thanks for continuing this discussion and touching on the model of how
 we get to where we want to go.
  
> This is obviously a problem and not one we can easily fix. We would
> need a new cap for user space to signal "I know that I can control
> bpc, subsampling and compression to lower the bandwidth and light up
> modes which otherwise fail". That cap would also remove all the
> properties which require kernel magic to work (that's also what I
> proposed for my KMS color pipeline API).
>
> We all want to expose more of the scanout capability and give user
> space more control but I don't think an incremental approach works
> here and we would all do better if we accept that the current API
> requires kernel magic to work and has a few implicit assumptions baked
> in.
>
> With all that being said, I think the right decision here is to
>
> 1. Ignore subsampling for now
> 2. Let the kernel select YCC or RGB on the cable
> 3. Let the kernel figure out the conversion between RGB and YCC based
> on the color space selected
> 4. Let the kernel send the correct infoframe based on the selected
> color space and cable encoding
> 5. Only expose color spaces for which the kernel can do the conversion
> and send the infoframe  

 I agree. We don't want to break or change existing behavior (that is
 used by userspace) and this will get us far without breaking things.
  
> 6. Work on the new API which is hidden behind a cap
> 
> Hi,
> 
> I agree on all that, too.
> 
>  

 I assume you 

Re: [PATCH v5 4/8] drm/i915/pxp: Add GSC-CS backend to send GSC fw messages

2023-02-15 Thread Teres Alexis, Alan Previn
On Tue, 2023-02-14 at 13:38 -0800, Teres Alexis, Alan Previn wrote:
alan:snip
> +static int gsccs_send_message(struct intel_pxp *pxp,
> +   void *msg_in, size_t msg_in_size,
> +   void *msg_out, size_t msg_out_size_max,
> +   size_t *msg_out_len,
> +   u64 *gsc_msg_handle_retry)
> +{
> + struct intel_gt *gt = pxp->ctrl_gt;
> + struct drm_i915_private *i915 = gt->i915;
> + struct gsccs_session_resources *exec = &pxp->gsccs_res;
> + struct intel_gsc_mtl_header *header = exec->pkt_vaddr;
> + struct intel_gsc_heci_non_priv_pkt pkt;
> + bool null_pkt = !msg_in && !msg_out;
> + size_t max_msg_size;
> + u32 reply_size;
> + int ret;
> +
> + if (!exec->ce)
> + return -ENODEV;
> +
> + max_msg_size = PXP43_MAX_HECI_IN_SIZE - sizeof(*header);
> +
> + if (msg_in_size > max_msg_size || msg_out_size_max > max_msg_size)
> + return -ENOSPC;
> +
> + mutex_lock(&pxp->tee_mutex);
> +
> + if (!exec->pkt_vma || !exec->bb_vma)
> + return -ENOENT;
> +
alan: nack - I need to move the tee_mutex lock to after these pkt_vma / bb_vma checks
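
[Editorial aside: a sketch of the reordering that the self-nack implies,
reusing the declarations from the quoted patch; this is not the actual
respin. The point is that the -ENOENT path must not return with tee_mutex
still held.]

	/* validate session resources before taking any locks */
	if (!exec->ce)
		return -ENODEV;

	if (!exec->pkt_vma || !exec->bb_vma)
		return -ENOENT;

	max_msg_size = PXP43_MAX_HECI_IN_SIZE - sizeof(*header);

	if (msg_in_size > max_msg_size || msg_out_size_max > max_msg_size)
		return -ENOSPC;

	/* all early returns are done; now it is safe to lock */
	mutex_lock(&pxp->tee_mutex);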


RE: [PATCH] drm/i915/xelpmp: Consider GSI offset when doing MCR lookups

2023-02-15 Thread Sripada, Radhakrishna



> -Original Message-
> From: dri-devel  On Behalf Of Matt
> Roper
> Sent: Monday, February 13, 2023 4:19 PM
> To: intel-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Subject: [PATCH] drm/i915/xelpmp: Consider GSI offset when doing MCR
> lookups
> 
> MCR range tables use the final MMIO offset of a register (including the
> 0x380000 GSI offset when applicable). Since the i915_mcr_reg_t passed
> as a parameter during steering lookup does not include the GSI offset,
> we need to add it back in for GSI registers before searching the tables.
> 
> Fixes: a7ec65fc7e83 ("drm/i915/xelpmp: Add multicast steering for media GT")

LGTM,
Reviewed-by: Radhakrishna Sripada 

> Signed-off-by: Matt Roper 
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_mcr.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> index a4a8b8bc5737..03632df27de3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
> @@ -561,12 +561,15 @@ static bool reg_needs_read_steering(struct intel_gt
> *gt,
>   i915_mcr_reg_t reg,
>   enum intel_steering_type type)
>  {
> - const u32 offset = i915_mmio_reg_offset(reg);
> + u32 offset = i915_mmio_reg_offset(reg);
>   const struct intel_mmio_range *entry;
> 
>   if (likely(!gt->steering_table[type]))
>   return false;
> 
> + if (IS_GSI_REG(offset))
> + offset += gt->uncore->gsi_offset;
> +
>   for (entry = gt->steering_table[type]; entry->end; entry++) {
>   if (offset >= entry->start && offset <= entry->end)
>   return true;
> --
> 2.39.1



Re: [PATCH v2 8/9] dt-bindings: display/msm: dsi-controller-main: Add SM6115

2023-02-15 Thread Rob Herring


On Mon, 13 Feb 2023 13:10:11 +0100, Konrad Dybcio wrote:
> Add a compatible for the DSI on SM6115.
> 
> Signed-off-by: Konrad Dybcio 
> ---
>  .../devicetree/bindings/display/msm/dsi-controller-main.yaml| 2 ++
>  1 file changed, 2 insertions(+)
> 

Acked-by: Rob Herring 



Re: [PATCH v2 1/9] dt-bindings: display/msm: dsi-controller-main: Fix deprecated QCM2290 compatible

2023-02-15 Thread Rob Herring


On Mon, 13 Feb 2023 13:10:04 +0100, Konrad Dybcio wrote:
> The qcom, prefix was missed previously. Fix it.
> 
> Fixes: 0c0f65c6dd44 ("dt-bindings: msm: dsi-controller-main: Add compatible 
> strings for every current SoC")
> Signed-off-by: Konrad Dybcio 
> ---
>  .../devicetree/bindings/display/msm/dsi-controller-main.yaml| 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Acked-by: Rob Herring 



Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages

2023-02-15 Thread Thomas Hellström
On Wed, 2023-02-15 at 19:30 +0100, Christian König wrote:
> Am 15.02.23 um 19:12 schrieb Thomas Hellström:
> > On Wed, 2023-02-15 at 18:42 +0100, Christian König wrote:
> > > Am 15.02.23 um 17:13 schrieb Thomas Hellström:
> > > > When swapping out, we will split multi-order pages both in
> > > > order to
> > > > move them to the swap-cache and to be able to return memory to
> > > > the
> > > > swap cache as soon as possible on a page-by-page basis.
> > > > By reducing the page max order to the system PMD size, we can
> > > > be
> > > > nicer
> > > > to the system and avoid splitting gigantic pages.
> > > 
> > > > On top of this we also
> > > > include the 64K page size in the page sizes tried, since that
> > > > appears to
> > > > be a common size for GPU applications.
> > > Please completely drop that.
> > You mean the 64K page size, or the whole patch?
> 
> The 64K page size. This was an invention from Microsoft to
> standardize 
> GPU handling ~15-20 years ago.
> 
> It turned out to be a complete shipwreck and by now 2MiB and 1GiB
> pages 
> or just flexible hardware which can handle everything seem to become 
> standard.
> 
> > > This is just nonsense spilling in from the
> > > Windows drivers.
> > Agreed, but IIRC on the last RFC you asked me not to drop the 64K
> > pages, so that's why they are here. I can remove them if needed.
> 
> We could keep it if it's in any way beneficial, but I'm pretty sure I
> must have been drunk to ask for that.
> 
> > The only reason for keeping them from a performance point of view
> > is
> > better efficiency on GPUs with 64K page size if not using a
> > coalescing
> > IOMMU for dma-mapping.
> 
> Are any of those still produced? As far as I know neither NVidia,
> Intel 
> nor AMD still assumes that page size in their hardware for quite a
> while 
> now.

Intel still supports 64K PTEs, so we use them where possible, otherwise
falling back to 4K. Typically we have coalescing IOMMU enabled when
testing, so can't really see the impact, but TBH I was surprised by the
number of 64K page allocations TTM spat out with this patch series, so
I definitely think there is a performance impact with !IOMMU, although
I can't quantify it ATM.

So then if it's OK with you I'll keep that size for now.

/Thomas
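
[Editorial aside: for reference, what the patch's order macros work out to
on a typical x86-64 configuration (4 KiB base pages, 2 MiB PMDs). This is a
standalone sanity check with the shift values hard-coded as assumptions for
that configuration, not code from TTM.]

#include <stdio.h>

#define PAGE_SHIFT 12			/* 4 KiB base pages */
#define PMD_SHIFT  21			/* 2 MiB PMDs */

#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)	/* 9: 512 pages = 2 MiB */
#define TTM_64K_ORDER (16 - PAGE_SHIFT)		/* 4: 16 pages = 64 KiB */
#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)	/* 10 pool types per caching */

int main(void)
{
	printf("max order %d (%lu KiB), 64K order %d (%lu KiB), %d pools\n",
	       TTM_MAX_ORDER, (1UL << TTM_MAX_ORDER) * 4,
	       TTM_64K_ORDER, (1UL << TTM_64K_ORDER) * 4,
	       TTM_DIM_ORDER);
	return 0;
}

On such a configuration the top allocation order drops from MAX_ORDER - 1
(order 10, i.e. 4 MiB with the then-default MAX_ORDER of 11) to order 9
(2 MiB), and the per-caching pool arrays shrink by one entry.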



> 
> Regards,
> Christian.
> 
> > 
> > Let me know what you think is best and I'll adjust accordingly.
> > 
> > /Thomas
> > 
> > 
> > > Christian.
> > > 
> > > > Looking forward to when we might be able to swap out PMD size
> > > > folios
> > > > without splitting, this will also be a benefit.
> > > > 
> > > > Signed-off-by: Thomas Hellström
> > > > 
> > > > ---
> > > >    drivers/gpu/drm/ttm/ttm_pool.c | 58
> > > > ++---
> > > > -
> > > >    1 file changed, 45 insertions(+), 13 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > > > b/drivers/gpu/drm/ttm/ttm_pool.c
> > > > index 1cc7591a9542..8787fb6a218b 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > > @@ -31,6 +31,8 @@
> > > >     * cause they are rather slow compared to alloc_pages+map.
> > > >     */
> > > >    
> > > > +#define pr_fmt(fmt) "[TTM POOL] " fmt
> > > > +
> > > >    #include 
> > > >    #include 
> > > >    #include 
> > > > @@ -47,6 +49,18 @@
> > > >    
> > > >    #include "ttm_module.h"
> > > >    
> > > > +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
> > > > +#define TTM_64K_ORDER (16 - PAGE_SHIFT)
> > > > +#if (TTM_MAX_ORDER < TTM_64K_ORDER)
> > > > +#undef TTM_MAX_ORDER
> > > > +#define TTM_MAX_ORDER TTM_64K_ORDER
> > > > +#endif
> > > > +#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
> > > > +#undef TTM_MAX_ORDER
> > > > +#define TTM_MAX_ORDER (MAX_ORDER - 1)
> > > > +#endif
> > > > +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
> > > > +
> > > >    /**
> > > >     * struct ttm_pool_dma - Helper object for coherent DMA
> > > > mappings
> > > >     *
> > > > @@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
> > > >    
> > > >    static atomic_long_t allocated_pages;
> > > >    
> > > > -static struct ttm_pool_type global_write_combined[MAX_ORDER];
> > > > -static struct ttm_pool_type global_uncached[MAX_ORDER];
> > > > +static struct ttm_pool_type
> > > > global_write_combined[TTM_DIM_ORDER];
> > > > +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
> > > >    
> > > > -static struct ttm_pool_type
> > > > global_dma32_write_combined[MAX_ORDER];
> > > > -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
> > > > +static struct ttm_pool_type
> > > > global_dma32_write_combined[TTM_DIM_ORDER];
> > > > +static struct ttm_pool_type
> > > > global_dma32_uncached[TTM_DIM_ORDER];
> > > >    
> > > >    static spinlock_t shrinker_lock;
> > > >    static struct list_head shrinker_list;
> > > >    static struct shrinker mm_shrinker;
> > > >    
> > > > +static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
> > > > +
> > > >    /* Allocate pages of size 1 << order with the 

Re: [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path

2023-02-15 Thread Thomas Hellström
On Wed, 2023-02-15 at 19:26 +0100, Christian König wrote:
> Am 15.02.23 um 19:02 schrieb Thomas Hellström:
> > On Wed, 2023-02-15 at 18:31 +0100, Christian König wrote:
> > > Am 15.02.23 um 17:13 schrieb Thomas Hellström:
> > > > When hitting an error, the error path forgot to unmap dma
> > > > mappings
> > > > and
> > > I don't see where this happens?
> >  From what I can tell, ttm_pool_page_allocated() maps the page for
> > dma.
> > If we later hit an error, ttm_pool_free_page() will leak the
> > mapping.
> 
> Ah, I see. Good point.
> 
> > 
> > > > could call set_pages_wb() on already uncached pages.
> > > Yeah, but what's the problem?
> > Umm, at least if you try to set WC on an already WC'd page, the
> > set_pages_* code will spam dmesg with warnings.
> > Not sure if set_pages_wb() on WB pages does the same, nor if it
> > issues unnecessary global cache / tlb flushes or whether that will
> > change in the future.
> > The point of avoiding the set_pages_wb() when already WB is you
> > don't
> > have to check, and you don't have to care.
> 
> Please just open code the error handling then. That helper function 
> looks horrible complicated to me.
> 
> Alternatively we could have a free function for a range of pages.

OK, I'll see if this is doable without adding a tremendous amount of
code.

/Thomas


> 
> Regards,
> Christian.
> 
> 
> > 
> > That said, the __ttm_pool_free() is used also in upcoming patches.
> > 
> > /Thomas
> > 
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Fix this by introducing a common __ttm_pool_free() function
> > > > that
> > > > does the right thing.
> > > > 
> > > > Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool
> > > > v3")
> > > > Cc: Christian König 
> > > > Cc: Dave Airlie 
> > > > Cc: Madhav Chauhan 
> > > > Cc: Christian Koenig 
> > > > Cc: Huang Rui 
> > > > Cc: dri-devel@lists.freedesktop.org
> > > > Signed-off-by: Thomas Hellström
> > > > 
> > > > ---
> > > >    drivers/gpu/drm/ttm/ttm_pool.c | 74 +---
> > > > -
> > > > -
> > > >    1 file changed, 45 insertions(+), 29 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > > > b/drivers/gpu/drm/ttm/ttm_pool.c
> > > > index aa116a7bbae3..1cc7591a9542 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > > @@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct
> > > > ttm_pool *pool, unsigned int order,
> > > >  return 0;
> > > >    }
> > > >    
> > > > +static void __ttm_pool_free(struct ttm_pool *pool, struct
> > > > ttm_tt
> > > > *tt,
> > > > +   struct page **caching_divide,
> > > > +   enum ttm_caching initial_caching,
> > > > +   enum ttm_caching subseq_caching,
> > > > +   pgoff_t num_pages)
> > > > +{
> > > > +   enum ttm_caching caching = subseq_caching;
> > > > +   struct page **pages = tt->pages;
> > > > +   unsigned int order;
> > > > +   pgoff_t i, nr;
> > > > +
> > > > +   if (pool && caching_divide)
> > > > +   caching = initial_caching;
> > > > +
> > > > +   for (i = 0; i < num_pages; i += nr, pages += nr) {
> > > > +   struct ttm_pool_type *pt = NULL;
> > > > +
> > > > +   if (unlikely(caching_divide == pages))
> > > > +   caching = subseq_caching;
> > > > +
> > > > +   order = ttm_pool_page_order(pool, *pages);
> > > > +   nr = (1UL << order);
> > > > +   if (tt->dma_address)
> > > > +   ttm_pool_unmap(pool, tt->dma_address[i], nr);
> > > > +
> > > > +   pt = ttm_pool_select_type(pool, caching,
> > > > order);
> > > > +   if (pt)
> > > > +   ttm_pool_type_give(pt, *pages);
> > > > +   else
> > > > +   ttm_pool_free_page(pool, caching,
> > > > order,
> > > > *pages);
> > > > +   }
> > > > +}
> > > > +
> > > >    /**
> > > >     * ttm_pool_alloc - Fill a ttm_tt object
> > > >     *
> > > > @@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > > > struct ttm_tt *tt,
> > > >  dma_addr_t *dma_addr = tt->dma_address;
> > > >  struct page **caching = tt->pages;
> > > >  struct page **pages = tt->pages;
> > > > +   enum ttm_caching page_caching;
> > > >  gfp_t gfp_flags = GFP_USER;
> > > > -   unsigned int i, order;
> > > > +   unsigned int order;
> > > >  struct page *p;
> > > >  int r;
> > > >    
> > > > @@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > > > struct ttm_tt *tt,
> > > >   order = min_t(unsigned int, order,
> > > > __fls(num_pages)))
> > > > {
> > > >  struct ttm_pool_type *pt;
> > > >    
> > > > +   page_caching = tt->caching;
> > > >  pt = ttm_pool_select_type(pool, tt->caching,
> > > > 

Re: [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface

2023-02-15 Thread Christian König

Am 15.02.23 um 19:19 schrieb Thomas Hellström:

On Wed, 2023-02-15 at 18:39 +0100, Christian König wrote:

Am 15.02.23 um 17:13 schrieb Thomas Hellström:

Update the TTM swapout interfaces for better compatibility with a
shrinker.
- Replace number-of-pages int return with a long to better match the
    kernel's shrinker interface.
- The gfp_flags parameter to ttm_xx_swapout() currently only takes the
    GFP_KERNEL value and shouldn't really be needed since the shrinker we
    hook up in upcoming patches sets an allocation context to match reclaim.
- Introduce a shrink reason enumeration and a driver callback to shrink
    buffer objects.

Is that really necessary? This is mid-layering once more.

If drivers want to implement driver specific shrinking they should
register their own shrinker callback.

Yes, a choice needs to be made here. If TTM registers the shrinker, the
driver needs to be called at least to unbind and to remove dma-
mappings.

If the driver registers the shrinker it can still (I think) use the
pool helpers, but needs TTM for LRU traversal and accounting.

I can have a look at the latter if you think that will be a better
solution.


Yeah, that's what I had in mind as well. Something like the driver 
registers the shrinker and TTM provides the function to give a candidate 
for eviction.


Christian.
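
[Editorial aside: a directional sketch of that split, to make the proposal
concrete. Both ttm_device_next_swap_candidate() and the xxx_* names are
hypothetical, invented here for illustration -- no such TTM helper exists
at this point in the series.]

#include <linux/shrinker.h>
#include <drm/ttm/ttm_bo.h>
#include <drm/ttm/ttm_device.h>

struct xxx_device {
	struct ttm_device ttm;
	struct shrinker shrinker;
};

/* hypothetical TTM helper: hand back the next referenced LRU candidate */
struct ttm_buffer_object *ttm_device_next_swap_candidate(struct ttm_device *bdev);

/* hypothetical driver side: unbind, drop DMA mappings, swap out;
 * returns the number of pages freed */
long xxx_bo_shrink(struct ttm_buffer_object *bo, struct shrink_control *sc);

static unsigned long xxx_shrink_scan(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	struct xxx_device *xdev =
		container_of(shrink, struct xxx_device, shrinker);
	unsigned long freed = 0;

	while (freed < sc->nr_to_scan) {
		struct ttm_buffer_object *bo;
		long ret;

		bo = ttm_device_next_swap_candidate(&xdev->ttm);
		if (!bo)
			break;

		ret = xxx_bo_shrink(bo, sc);
		ttm_bo_put(bo);
		if (ret > 0)
			freed += ret;
	}

	return freed ? freed : SHRINK_STOP;
}

TTM keeps the LRU and the accounting, the driver keeps everything
hardware-specific, and no mid-layer callback table is needed.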



/Thomas



Christian.



    The TTM_SHRINK_WATERMARK reason is going to still be handled using the
    existing shmem copy, and will be used by pool types that don't lend
    themselves well to shrinking (dma_alloc pool) and when drivers
    explicitly request swapout.
    The TTM_SHRINK_SWAP and TTM_SHRINK_PURGE reasons originate from a
    shrinker and are to be handled by a new driver callback, bo_shrink().
    Helpers for the new driver callback are provided in upcoming patches.

Cc: linux-graphics-maintai...@vmware.com
Signed-off-by: Thomas Hellström 
---
   drivers/gpu/drm/ttm/ttm_bo.c    | 38 
   drivers/gpu/drm/ttm/ttm_device.c    | 55 +---
-
   drivers/gpu/drm/ttm/ttm_tt.c    | 23 ++--
   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  3 +-
   include/drm/ttm/ttm_bo.h    |  4 +--
   include/drm/ttm/ttm_device.h    | 36 +--
   include/drm/ttm/ttm_tt.h    | 17 +++--
   7 files changed, 136 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
b/drivers/gpu/drm/ttm/ttm_bo.c
index 882c2fa346f3..e5c0970564c0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1114,13 +1114,29 @@ int ttm_bo_wait_ctx(struct
ttm_buffer_object *bo, struct ttm_operation_ctx *ctx)
   }
   EXPORT_SYMBOL(ttm_bo_wait_ctx);
   
-int ttm_bo_swapout(struct ttm_buffer_object *bo, struct

ttm_operation_ctx *ctx,
-  gfp_t gfp_flags)
+/**
+ * ttm_bo_swapout() - Swap out or purge a buffer object
+ * @bo: The buffer object.
+ * @ctx: The ttm operation context.
+ * @reason: The swapout reason.
+ *
+ * Try to swap out or purge the contents of a system memory backed
buffer
+ * object. The function needs to be called with the device's LRU
lock held.
+ *
+ * Return: -EBUSY if the bo lock could not be grabbed or the
object was
+ * otherwise busy. Otherwise the number of pages swapped out or
negative
+ * error code on error. Iff the function didn't return -EBUSY, the
+ * LRU lock was dropped, and LRU traversal needs to restart.
+ */
+long ttm_bo_swapout(struct ttm_buffer_object *bo, struct
ttm_operation_ctx *ctx,
+   enum ttm_shrink_reason reason)
   {
 struct ttm_place place;
 bool locked;
 long ret;
   
+   lockdep_assert_held(&bo->bdev->lru_lock);
+
 /*
  * While the bo may already reside in SYSTEM placement, set
  * SYSTEM as new placement to cover also the move further
below.
@@ -1142,8 +1158,12 @@ int ttm_bo_swapout(struct ttm_buffer_object
*bo, struct ttm_operation_ctx *ctx,
 }
   
 if (bo->deleted) {

+   long num_pages = bo->ttm->num_pages;
+
 ret = ttm_bo_cleanup_refs(bo, false, false,
locked);
 ttm_bo_put(bo);
+   if (!ret)
+   return num_pages;
 return ret == -EBUSY ? -ENOSPC : ret;
 }
   
@@ -1184,13 +1204,17 @@ int ttm_bo_swapout(struct ttm_buffer_object

*bo, struct ttm_operation_ctx *ctx,
  * Swap out. Buffer will be swapped in again as soon as
  * anyone tries to access a ttm page.
  */
-   if (bo->bdev->funcs->swap_notify)
-   bo->bdev->funcs->swap_notify(bo);
+   if (bo->bdev->funcs->bo_shrink && reason !=
TTM_SHRINK_WATERMARK) {
+   ret = bo->bdev->funcs->bo_shrink(bo, ctx);
+   } else {
+   if (bo->bdev->funcs->swap_notify)
+   bo->bdev->funcs->swap_notify(bo);
+   ret = ttm_tt_swapout(bo->bdev, bo->ttm);
+   if (!ret)
+   ret = 

Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages

2023-02-15 Thread Christian König

Am 15.02.23 um 19:12 schrieb Thomas Hellström:

On Wed, 2023-02-15 at 18:42 +0100, Christian König wrote:

Am 15.02.23 um 17:13 schrieb Thomas Hellström:

When swapping out, we will split multi-order pages both in order to
move them to the swap-cache and to be able to return memory to the
swap cache as soon as possible on a page-by-page basis.
By reducing the page max order to the system PMD size, we can be
nicer
to the system and avoid splitting gigantic pages.



On top of this we also
include the 64K page size in the page sizes tried, since that
appears to
be a common size for GPU applications.

Please completely drop that.

You mean the 64K page size, or the whole patch?


The 64K page size. This was an invention from Microsoft to standardize 
GPU handling ~15-20 years ago.


It turned out to be a complete shipwreck and by now 2MiB and 1GiB pages 
or just flexible hardware which can handle everything seem to become 
standard.



This is just nonsense spilling in from the
Windows drivers.

Agreed, but IIRC on the last RFC you asked me not to drop the 64K
pages, so that's why they are here. I can remove them if needed.


We could keep it if it's in any way beneficial, but I'm pretty sure I 
must have been drunk to ask for that.



The only reason for keeping them from a performance point of view is
better efficiency on GPUs with 64K page size if not using a coalescing
IOMMU for dma-mapping.


Are any of those still produced? As far as I know neither NVidia, Intel 
nor AMD still assumes that page size in their hardware for quite a while 
now.


Regards,
Christian.



Let me know what you think is best and I'll adjust accordingly.

/Thomas



Christian.


Looking forward to when we might be able to swap out PMD size
folios
without splitting, this will also be a benefit.

Signed-off-by: Thomas Hellström 
---
   drivers/gpu/drm/ttm/ttm_pool.c | 58 ++---
-
   1 file changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
b/drivers/gpu/drm/ttm/ttm_pool.c
index 1cc7591a9542..8787fb6a218b 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -31,6 +31,8 @@
    * cause they are rather slow compared to alloc_pages+map.
    */
   
+#define pr_fmt(fmt) "[TTM POOL] " fmt

+
   #include 
   #include 
   #include 
@@ -47,6 +49,18 @@
   
   #include "ttm_module.h"
   
+#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)

+#define TTM_64K_ORDER (16 - PAGE_SHIFT)
+#if (TTM_MAX_ORDER < TTM_64K_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER TTM_64K_ORDER
+#endif
+#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER (MAX_ORDER - 1)
+#endif
+#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
+
   /**
    * struct ttm_pool_dma - Helper object for coherent DMA mappings
    *
@@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
   
   static atomic_long_t allocated_pages;
   
-static struct ttm_pool_type global_write_combined[MAX_ORDER];

-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
   
-static struct ttm_pool_type

global_dma32_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type
global_dma32_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
   
   static spinlock_t shrinker_lock;

   static struct list_head shrinker_list;
   static struct shrinker mm_shrinker;
   
+static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};

+
   /* Allocate pages of size 1 << order with the given gfp_flags */
   static struct page *ttm_pool_alloc_page(struct ttm_pool *pool,
gfp_t gfp_flags,
 unsigned int order)
@@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool
*pool, struct ttm_tt *tt,
 }
   }
   
+static unsigned int ttm_pool_select_order(unsigned int order,

pgoff_t num_pages)
+{
+   unsigned int *cur_order = ttm_pool_orders;
+
+   order = min_t(unsigned int, __fls(num_pages), order);
+   while (order < *cur_order)
+   ++cur_order;
+
+   return *cur_order;
+}
+
   /**
    * ttm_pool_alloc - Fill a ttm_tt object
    *
@@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool,
struct ttm_tt *tt,
 else
 gfp_flags |= GFP_HIGHUSER;
   
-   for (order = min_t(unsigned int, MAX_ORDER - 1,

__fls(num_pages));
-    num_pages;
-    order = min_t(unsigned int, order, __fls(num_pages)))
{
+   order = ttm_pool_select_order(ttm_pool_orders[0],
num_pages);
+   for (; num_pages; order = ttm_pool_select_order(order,
num_pages)) {
 struct ttm_pool_type *pt;
   
 page_caching = tt->caching;

@@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool,
struct device *dev,
   
 if (use_dma_alloc) {

   

Re: [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path

2023-02-15 Thread Christian König

Am 15.02.23 um 19:02 schrieb Thomas Hellström:

On Wed, 2023-02-15 at 18:31 +0100, Christian König wrote:

Am 15.02.23 um 17:13 schrieb Thomas Hellström:

When hitting an error, the error path forgot to unmap dma mappings
and

I don't see where this happens?

From what I can tell, ttm_pool_page_allocated() maps the page for dma.
If we later hit an error, ttm_pool_free_page() will leak the mapping.


Ah, I see. Good point.




could call set_pages_wb() on already uncached pages.

Yeah, but what's the problem?

Umm, at least if you try to set WC on an already WC'd page, the
set_pages_* code will spam dmesg with warnings.
Not sure if set_pages_wb() on WB pages does the same, nor if it
issues unnecessary global cache / tlb flushes or whether that will
change in the future.
The point of avoiding the set_pages_wb() when already WB is you don't
have to check, and you don't have to care.


Please just open code the error handling then. That helper function 
looks horrible complicated to me.


Alternatively we could have a free function for a range of pages.

Regards,
Christian.




That said, the __ttm_pool_free() is used also in upcoming patches.

/Thomas
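
[Editorial aside: a note on how __ttm_pool_free(), quoted further down,
appears to be structured -- stated as an interpretation of the patch, not
taken from its description.]

/*
 * Pages in [tt->pages, caching_divide) have already been switched to
 * the initial (possibly WC/UC) caching attribute; pages in
 * [caching_divide, tt->pages + num_pages) are still plain cached (WB).
 * Freeing each range with its actual caching means set_pages_wb() is
 * only issued for pages that really left WB -- avoiding exactly the
 * redundant-transition warning spam discussed above.
 */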



Regards,
Christian.


Fix this by introducing a common __ttm_pool_free() function that
does the right thing.

Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
Cc: Christian König 
Cc: Dave Airlie 
Cc: Madhav Chauhan 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thomas Hellström 
---
   drivers/gpu/drm/ttm/ttm_pool.c | 74 +
-
   1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
b/drivers/gpu/drm/ttm/ttm_pool.c
index aa116a7bbae3..1cc7591a9542 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct
ttm_pool *pool, unsigned int order,
 return 0;
   }
   
+static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt

*tt,
+   struct page **caching_divide,
+   enum ttm_caching initial_caching,
+   enum ttm_caching subseq_caching,
+   pgoff_t num_pages)
+{
+   enum ttm_caching caching = subseq_caching;
+   struct page **pages = tt->pages;
+   unsigned int order;
+   pgoff_t i, nr;
+
+   if (pool && caching_divide)
+   caching = initial_caching;
+
+   for (i = 0; i < num_pages; i += nr, pages += nr) {
+   struct ttm_pool_type *pt = NULL;
+
+   if (unlikely(caching_divide == pages))
+   caching = subseq_caching;
+
+   order = ttm_pool_page_order(pool, *pages);
+   nr = (1UL << order);
+   if (tt->dma_address)
+   ttm_pool_unmap(pool, tt->dma_address[i],
nr);
+
+   pt = ttm_pool_select_type(pool, caching, order);
+   if (pt)
+   ttm_pool_type_give(pt, *pages);
+   else
+   ttm_pool_free_page(pool, caching, order,
*pages);
+   }
+}
+
   /**
    * ttm_pool_alloc - Fill a ttm_tt object
    *
@@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool,
struct ttm_tt *tt,
 dma_addr_t *dma_addr = tt->dma_address;
 struct page **caching = tt->pages;
 struct page **pages = tt->pages;
+   enum ttm_caching page_caching;
 gfp_t gfp_flags = GFP_USER;
-   unsigned int i, order;
+   unsigned int order;
 struct page *p;
 int r;
   
@@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,

struct ttm_tt *tt,
  order = min_t(unsigned int, order, __fls(num_pages)))
{
 struct ttm_pool_type *pt;
   
+   page_caching = tt->caching;

 pt = ttm_pool_select_type(pool, tt->caching,
order);
 p = pt ? ttm_pool_type_take(pt) : NULL;
 if (p) {
@@ -418,6 +453,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
struct ttm_tt *tt,
 if (r)
 goto error_free_page;
   
+   caching = pages;

 do {
 r = ttm_pool_page_allocated(pool,
order, p,

&dma_addr,

@@ -426,14 +462,15 @@ int ttm_pool_alloc(struct ttm_pool *pool,
struct ttm_tt *tt,
 if (r)
 goto error_free_page;
   
+   caching = pages;

 if (num_pages < (1 << order))
 break;
   
 p = ttm_pool_type_take(pt);

 } while (p);
-   caching = pages;
 }
   
+   page_caching = ttm_cached;

 while 

Re: [PATCH v5 1/2] dt-bindings: display: imx: Describe drm binding for fsl,imx-lcdc

2023-02-15 Thread Rob Herring


On Fri, 10 Feb 2023 19:00:13 +0100, Uwe Kleine-König wrote:
> Modify the existing (fb-like) binding to support the drm-like binding in
> parallel.
> 
> Signed-off-by: Uwe Kleine-König 
> ---
>  .../bindings/display/imx/fsl,imx-lcdc.yaml| 46 ++-
>  1 file changed, 45 insertions(+), 1 deletion(-)
> 

Reviewed-by: Rob Herring 



Re: [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface

2023-02-15 Thread Thomas Hellström
On Wed, 2023-02-15 at 18:39 +0100, Christian König wrote:
> Am 15.02.23 um 17:13 schrieb Thomas Hellström:
> > Update the TTM swapout interfaces for better compatibility with a
> > shrinker.
> > - Replace number-of-pages int return with a long to better match
> > the
> >    kernel's shrinker interface.
> > - The gfp_flags parameter to ttm_xx_swapout() currently only takes
> > the
> >    GFP_KERNEL value and shouldn't really be needed since the
> > shrinker we
> >    hook up in upcoming patches sets an allocation context to match
> > reclaim.
> 
> > - Introduce a shrink reason enumeration and a driver callback to
> > shrink
> >    buffer objects.
> 
> Is that really necessary? This is mid-layering once more.
> 
> If drivers want to implement driver specific shrinking they should 
> register their own shrinker callback.

Yes, a choice needs to be made here. If TTM registers the shrinker, the
driver needs to be called at least to unbind and to remove dma-
mappings.

If the driver registers the shrinker it can still (I think) use the
pool helpers, but needs TTM for LRU traversal and accounting.

I can have a look at the latter if you think that will be a better
solution.

/Thomas


> 
> Christian.
> 
> 
> >    The TTM_SHRINK_WATERMARK reason is going to still be handled
> > using the
> >    existing shmem copy, and will be used by pool types that don't
> > lend
> >    themselves well to shrinking (dma_alloc pool) and when drivers
> > explicitly
> >    request swapout.
> >    The TTM_SHRINK_SWAP and TTM_SHRINK_PURGE reasons originate from
> > a
> >    shrinker and are to be handled by a new driver callback,
> > bo_shrink().
> >    Helpers for the new driver callback are provided in upcoming
> > patches.
> > 
> > Cc: linux-graphics-maintai...@vmware.com
> > Signed-off-by: Thomas Hellström 
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c    | 38 
> >   drivers/gpu/drm/ttm/ttm_device.c    | 55 +---
> > -
> >   drivers/gpu/drm/ttm/ttm_tt.c    | 23 ++--
> >   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  3 +-
> >   include/drm/ttm/ttm_bo.h    |  4 +--
> >   include/drm/ttm/ttm_device.h    | 36 +--
> >   include/drm/ttm/ttm_tt.h    | 17 +++--
> >   7 files changed, 136 insertions(+), 40 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> > b/drivers/gpu/drm/ttm/ttm_bo.c
> > index 882c2fa346f3..e5c0970564c0 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -1114,13 +1114,29 @@ int ttm_bo_wait_ctx(struct
> > ttm_buffer_object *bo, struct ttm_operation_ctx *ctx)
> >   }
> >   EXPORT_SYMBOL(ttm_bo_wait_ctx);
> >   
> > -int ttm_bo_swapout(struct ttm_buffer_object *bo, struct
> > ttm_operation_ctx *ctx,
> > -  gfp_t gfp_flags)
> > +/**
> > + * ttm_bo_swapout() - Swap out or purge a buffer object
> > + * @bo: The buffer object.
> > + * @ctx: The ttm operation context.
> > + * @reason: The swapout reason.
> > + *
> > + * Try to swap out or purge the contents of a system memory backed
> > buffer
> > + * object. The function needs to be called with the device's LRU
> > lock held.
> > + *
> > + * Return: -EBUSY if the bo lock could not be grabbed or the
> > object was
> > + * otherwise busy. Otherwise the number of pages swapped out or
> > negative
> > + * error code on error. Iff the function didn't return -EBUSY, the
> > + * LRU lock was dropped, and LRU traversal needs to restart.
> > + */
> > +long ttm_bo_swapout(struct ttm_buffer_object *bo, struct
> > ttm_operation_ctx *ctx,
> > +   enum ttm_shrink_reason reason)
> >   {
> > struct ttm_place place;
> > bool locked;
> > long ret;
> >   
> > +   lockdep_assert_held(&bo->bdev->lru_lock);
> > +
> > /*
> >  * While the bo may already reside in SYSTEM placement, set
> >  * SYSTEM as new placement to cover also the move further
> > below.
> > @@ -1142,8 +1158,12 @@ int ttm_bo_swapout(struct ttm_buffer_object
> > *bo, struct ttm_operation_ctx *ctx,
> > }
> >   
> > if (bo->deleted) {
> > +   long num_pages = bo->ttm->num_pages;
> > +
> > ret = ttm_bo_cleanup_refs(bo, false, false,
> > locked);
> > ttm_bo_put(bo);
> > +   if (!ret)
> > +   return num_pages;
> > return ret == -EBUSY ? -ENOSPC : ret;
> > }
> >   
> > @@ -1184,13 +1204,17 @@ int ttm_bo_swapout(struct ttm_buffer_object
> > *bo, struct ttm_operation_ctx *ctx,
> >  * Swap out. Buffer will be swapped in again as soon as
> >  * anyone tries to access a ttm page.
> >  */
> > -   if (bo->bdev->funcs->swap_notify)
> > -   bo->bdev->funcs->swap_notify(bo);
> > +   if (bo->bdev->funcs->bo_shrink && reason !=
> > TTM_SHRINK_WATERMARK) {
> > +   ret = bo->bdev->funcs->bo_shrink(bo, ctx);
> > +   } else {
> > +   if 

Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages

2023-02-15 Thread Thomas Hellström
On Wed, 2023-02-15 at 18:42 +0100, Christian König wrote:
> Am 15.02.23 um 17:13 schrieb Thomas Hellström:
> > When swapping out, we will split multi-order pages both in order to
> > move them to the swap-cache and to be able to return memory to the
> > swap cache as soon as possible on a page-by-page basis.
> > By reducing the page max order to the system PMD size, we can be
> > nicer
> > to the system and avoid splitting gigantic pages.
> 
> 
> > On top of this we also
> > include the 64K page size in the page sizes tried, since that
> > appears to
> > be a common size for GPU applications.
> 
> Please completely drop that. 
You mean the 64K page size, or the whole patch?

> This is just nonsense spilling in from the 
> Windows drivers.

Agreed, but IIRC on the last RFC you asked me not to drop the 64K
pages, so that's why they are here. I can remove them if needed.

The only reason for keeping them from a performance point of view is
better efficiency on GPUs with 64K page size if not using a coalescing
IOMMU for dma-mapping.

Let me know what you think is best and I'll adjust accordingly.

/Thomas


> 
> Christian.
> 
> > 
> > Looking forward to when we might be able to swap out PMD size
> > folios
> > without splitting, this will also be a benefit.
> > 
> > Signed-off-by: Thomas Hellström 
> > ---
> >   drivers/gpu/drm/ttm/ttm_pool.c | 58 ++---
> > -
> >   1 file changed, 45 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > b/drivers/gpu/drm/ttm/ttm_pool.c
> > index 1cc7591a9542..8787fb6a218b 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -31,6 +31,8 @@
> >    * cause they are rather slow compared to alloc_pages+map.
> >    */
> >   
> > +#define pr_fmt(fmt) "[TTM POOL] " fmt
> > +
> >   #include 
> >   #include 
> >   #include 
> > @@ -47,6 +49,18 @@
> >   
> >   #include "ttm_module.h"
> >   
> > +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
> > +#define TTM_64K_ORDER (16 - PAGE_SHIFT)
> > +#if (TTM_MAX_ORDER < TTM_64K_ORDER)
> > +#undef TTM_MAX_ORDER
> > +#define TTM_MAX_ORDER TTM_64K_ORDER
> > +#endif
> > +#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
> > +#undef TTM_MAX_ORDER
> > +#define TTM_MAX_ORDER (MAX_ORDER - 1)
> > +#endif
> > +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
> > +
> >   /**
> >    * struct ttm_pool_dma - Helper object for coherent DMA mappings
> >    *
> > @@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
> >   
> >   static atomic_long_t allocated_pages;
> >   
> > -static struct ttm_pool_type global_write_combined[MAX_ORDER];
> > -static struct ttm_pool_type global_uncached[MAX_ORDER];
> > +static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
> > +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
> >   
> > -static struct ttm_pool_type
> > global_dma32_write_combined[MAX_ORDER];
> > -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
> > +static struct ttm_pool_type
> > global_dma32_write_combined[TTM_DIM_ORDER];
> > +static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
> >   
> >   static spinlock_t shrinker_lock;
> >   static struct list_head shrinker_list;
> >   static struct shrinker mm_shrinker;
> >   
> > +static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
> > +
> >   /* Allocate pages of size 1 << order with the given gfp_flags */
> >   static struct page *ttm_pool_alloc_page(struct ttm_pool *pool,
> > gfp_t gfp_flags,
> > unsigned int order)
> > @@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool
> > *pool, struct ttm_tt *tt,
> > }
> >   }
> >   
> > +static unsigned int ttm_pool_select_order(unsigned int order,
> > pgoff_t num_pages)
> > +{
> > +   unsigned int *cur_order = ttm_pool_orders;
> > +
> > +   order = min_t(unsigned int, __fls(num_pages), order);
> > +   while (order < *cur_order)
> > +   ++cur_order;
> > +
> > +   return *cur_order;
> > +}
> > +
> >   /**
> >    * ttm_pool_alloc - Fill a ttm_tt object
> >    *
> > @@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> > else
> > gfp_flags |= GFP_HIGHUSER;
> >   
> > -   for (order = min_t(unsigned int, MAX_ORDER - 1,
> > __fls(num_pages));
> > -    num_pages;
> > -    order = min_t(unsigned int, order, __fls(num_pages)))
> > {
> > +   order = ttm_pool_select_order(ttm_pool_orders[0],
> > num_pages);
> > +   for (; num_pages; order = ttm_pool_select_order(order,
> > num_pages)) {
> > struct ttm_pool_type *pt;
> >   
> > page_caching = tt->caching;
> > @@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool,
> > struct device *dev,
> >   
> > if (use_dma_alloc) {
> > for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> > -   for (j = 0; j < MAX_ORDER; ++j)
> > +   for 

Re: [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path

2023-02-15 Thread Thomas Hellström
On Wed, 2023-02-15 at 18:31 +0100, Christian König wrote:
> Am 15.02.23 um 17:13 schrieb Thomas Hellström:
> > When hitting an error, the error path forgot to unmap dma mappings
> > and
> 
> I don't see where this happens?

From what I can tell, ttm_pool_page_allocated() maps the page for dma.
If we later hit an error, ttm_pool_free_page() will leak the mapping.

> 
> > could call set_pages_wb() on already uncached pages.
> 
> Yeah, but what's the problem?

Umm, at least if you try to set WC on an already WC'd page, the
set_pages_* code will spam dmesg with warnings. 
Not sure if set_pages_wb() on WB pages does the same, nor if it
issues unnecessary global cache / tlb flushes or whether that will
change in the future.
The point of avoiding the set_pages_wb() when already WB is you don't
have to check, and you don't have to care.

That said, the __ttm_pool_free() is used also in upcoming patches.

/Thomas
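
[Editorial aside: to make the leaked-mapping scenario above concrete, the
failure flow looks roughly like this -- a schematic simplified from
ttm_pool_alloc(), where "later step" stands for any subsequent failure.]

/*
 *	p = ttm_pool_alloc_page(pool, gfp_flags, order);
 *	ttm_pool_page_allocated(pool, order, p, &dma_addr, ...);
 *		-> dma_map_page() has run, handle stored in
 *		   tt->dma_address
 *	r = <some later step fails>;
 *	goto error_free_page;
 *
 * error_free_page:
 *	ttm_pool_free_page(pool, tt->caching, order, p);
 *		-> frees the page without a ttm_pool_unmap(), so the
 *		   DMA mapping created above is leaked. __ttm_pool_free()
 *		   in the patch unmaps before handing pages back.
 */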


> 
> Regards,
> Christian.
> 
> > 
> > Fix this by introducing a common __ttm_pool_free() function that
> > does the right thing.
> > 
> > Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
> > Cc: Christian König 
> > Cc: Dave Airlie 
> > Cc: Madhav Chauhan 
> > Cc: Christian Koenig 
> > Cc: Huang Rui 
> > Cc: dri-devel@lists.freedesktop.org
> > Signed-off-by: Thomas Hellström 
> > ---
> >   drivers/gpu/drm/ttm/ttm_pool.c | 74 +
> > -
> >   1 file changed, 45 insertions(+), 29 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > b/drivers/gpu/drm/ttm/ttm_pool.c
> > index aa116a7bbae3..1cc7591a9542 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct
> > ttm_pool *pool, unsigned int order,
> > return 0;
> >   }
> >   
> > +static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt
> > *tt,
> > +   struct page **caching_divide,
> > +   enum ttm_caching initial_caching,
> > +   enum ttm_caching subseq_caching,
> > +   pgoff_t num_pages)
> > +{
> > +   enum ttm_caching caching = subseq_caching;
> > +   struct page **pages = tt->pages;
> > +   unsigned int order;
> > +   pgoff_t i, nr;
> > +
> > +   if (pool && caching_divide)
> > +   caching = initial_caching;
> > +
> > +   for (i = 0; i < num_pages; i += nr, pages += nr) {
> > +   struct ttm_pool_type *pt = NULL;
> > +
> > +   if (unlikely(caching_divide == pages))
> > +   caching = subseq_caching;
> > +
> > +   order = ttm_pool_page_order(pool, *pages);
> > +   nr = (1UL << order);
> > +   if (tt->dma_address)
> > +   ttm_pool_unmap(pool, tt->dma_address[i],
> > nr);
> > +
> > +   pt = ttm_pool_select_type(pool, caching, order);
> > +   if (pt)
> > +   ttm_pool_type_give(pt, *pages);
> > +   else
> > +   ttm_pool_free_page(pool, caching, order,
> > *pages);
> > +   }
> > +}
> > +
> >   /**
> >    * ttm_pool_alloc - Fill a ttm_tt object
> >    *
> > @@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> > dma_addr_t *dma_addr = tt->dma_address;
> > struct page **caching = tt->pages;
> > struct page **pages = tt->pages;
> > +   enum ttm_caching page_caching;
> > gfp_t gfp_flags = GFP_USER;
> > -   unsigned int i, order;
> > +   unsigned int order;
> > struct page *p;
> > int r;
> >   
> > @@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> >  order = min_t(unsigned int, order, __fls(num_pages)))
> > {
> > struct ttm_pool_type *pt;
> >   
> > +   page_caching = tt->caching;
> > pt = ttm_pool_select_type(pool, tt->caching,
> > order);
> > p = pt ? ttm_pool_type_take(pt) : NULL;
> > if (p) {
> > @@ -418,6 +453,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> > if (r)
> > goto error_free_page;
> >   
> > +   caching = pages;
> > do {
> > r = ttm_pool_page_allocated(pool,
> > order, p,
> >    
> > _addr,
> > @@ -426,14 +462,15 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> > if (r)
> > goto error_free_page;
> >   
> > +   caching = pages;
> > if (num_pages < (1 << order))
> > break;
> >   
> > p = ttm_pool_type_take(pt);
> >  

Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages

2023-02-15 Thread Christian König

Am 15.02.23 um 17:13 schrieb Thomas Hellström:

When swapping out, we will split multi-order pages both in order to
move them to the swap-cache and to be able to return memory to the
swap cache as soon as possible on a page-by-page basis.
By reducing the page max order to the system PMD size, we can be nicer
to the system and avoid splitting gigantic pages.




On top of this we also
include the 64K page size in the page sizes tried, since that appears to
be a common size for GPU applications.


Please completely drop that. This is just nonsense spilling in from the 
Windows drivers.


Christian.



Looking forward to when we might be able to swap out PMD size folios
without splitting, this will also be a benefit.

Signed-off-by: Thomas Hellström 
---
  drivers/gpu/drm/ttm/ttm_pool.c | 58 ++
  1 file changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 1cc7591a9542..8787fb6a218b 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -31,6 +31,8 @@
   * cause they are rather slow compared to alloc_pages+map.
   */
  
+#define pr_fmt(fmt) "[TTM POOL] " fmt

+
  #include 
  #include 
  #include 
@@ -47,6 +49,18 @@
  
  #include "ttm_module.h"
  
+#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)

+#define TTM_64K_ORDER (16 - PAGE_SHIFT)
+#if (TTM_MAX_ORDER < TTM_64K_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER TTM_64K_ORDER
+#endif
+#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER (MAX_ORDER - 1)
+#endif
+#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
+
  /**
   * struct ttm_pool_dma - Helper object for coherent DMA mappings
   *
@@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
  
  static atomic_long_t allocated_pages;
  
-static struct ttm_pool_type global_write_combined[MAX_ORDER];

-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
  
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];

-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
  
  static spinlock_t shrinker_lock;

  static struct list_head shrinker_list;
  static struct shrinker mm_shrinker;
  
+static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};

+
  /* Allocate pages of size 1 << order with the given gfp_flags */
  static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t 
gfp_flags,
unsigned int order)
@@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool *pool, struct 
ttm_tt *tt,
}
  }
  
+static unsigned int ttm_pool_select_order(unsigned int order, pgoff_t num_pages)

+{
+   unsigned int *cur_order = ttm_pool_orders;
+
+   order = min_t(unsigned int, __fls(num_pages), order);
+   while (order < *cur_order)
+   ++cur_order;
+
+   return *cur_order;
+}
+
  /**
   * ttm_pool_alloc - Fill a ttm_tt object
   *
@@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
else
gfp_flags |= GFP_HIGHUSER;
  
-	for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));

-num_pages;
-order = min_t(unsigned int, order, __fls(num_pages))) {
+   order = ttm_pool_select_order(ttm_pool_orders[0], num_pages);
+   for (; num_pages; order = ttm_pool_select_order(order, num_pages)) {
struct ttm_pool_type *pt;
  
  		page_caching = tt->caching;

@@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device 
*dev,
  
  	if (use_dma_alloc) {

for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-   for (j = 0; j < MAX_ORDER; ++j)
+   for (j = 0; j < TTM_DIM_ORDER; ++j)
ttm_pool_type_init(&pool->caching[i].orders[j],
   pool, i, j);
}
@@ -578,7 +604,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
  
  	if (pool->use_dma_alloc) {

for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-   for (j = 0; j < MAX_ORDER; ++j)
+   for (j = 0; j < TTM_DIM_ORDER; ++j)
ttm_pool_type_fini(&pool->caching[i].orders[j]);
}
  
@@ -632,7 +658,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)

unsigned int i;
  
  	seq_puts(m, "\t ");

-   for (i = 0; i < MAX_ORDER; ++i)
+   for (i = 0; i < TTM_DIM_ORDER; ++i)
seq_printf(m, " ---%2u---", i);
seq_puts(m, "\n");
  }
@@ -643,7 +669,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type 
*pt,
  {
unsigned int i;
  
-	for (i = 0; i < MAX_ORDER; ++i)

+   for (i = 0; 
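
A hedged worked example of the order selection above, assuming x86-64
values (PMD order 9 for 2MiB, order 4 for 64K) and that ttm_pool_orders
ends up as {9, 4, 0} once TTM_64K_ORDER is slotted in:

/*
 * ttm_pool_select_order(order, num_pages) clamps order to
 * __fls(num_pages) and then walks the list until an entry no larger
 * than the clamp is found:
 *
 *   num_pages = 300: __fls(300) = 8, order 9 is skipped, 4 is
 *   returned, so the chunk is allocated as a 64K page.
 *
 *   num_pages = 3: __fls(3) = 1, orders 9 and 4 are skipped, 0 is
 *   returned, so single 4K pages are used.
 *
 * Only the listed sizes are ever tried (2M, 64K and 4K) instead of
 * every order up to MAX_ORDER - 1.
 */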

Re: [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface

2023-02-15 Thread Christian König

Am 15.02.23 um 17:13 schrieb Thomas Hellström:

Update the TTM swapout interfaces for better compatibility with a shrinker.
- Replace number-of-pages int return with a long to better match the
   kernel's shrinker interface.
- The gfp_flags parameter to ttm_xx_swapout() currently only takes the
   GFP_KERNEL value and shouldn't really be needed since the shrinker we
   hook up in upcoming patches sets an allocation context to match reclaim.



- Introduce a shrink reason enumeration and a driver callback to shrink
   buffer objects.


Is that really necessary? This is mid-layering once more.

If drivers want to implement driver specific shrinking they should 
register their own shrinker callback.
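
For reference, a hedged sketch of what such a driver-owned shrinker
could look like; all names are illustrative, and the two-argument
register_shrinker() is the signature used by recent kernels:

static unsigned long drv_shrink_count(struct shrinker *shrink,
				      struct shrink_control *sc)
{
	unsigned long nr = 0;	/* the driver's count of shrinkable pages */

	return nr ? nr : SHRINK_EMPTY;
}

static unsigned long drv_shrink_scan(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	/* Walk the driver LRU, swap out up to sc->nr_to_scan pages and
	 * return the number actually freed, or SHRINK_STOP.
	 */
	return SHRINK_STOP;
}

static struct shrinker drv_shrinker = {
	.count_objects	= drv_shrink_count,
	.scan_objects	= drv_shrink_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* at driver init: register_shrinker(&drv_shrinker, "drm-example"); */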


Christian.



   The TTM_SHRINK_WATERMARK reason is still going to be handled using the
   existing shmem copy, and will be used by pool types that don't lend
   themselves well to shrinking (dma_alloc pool) and when drivers explicitly
   request swapout.
   The TTM_SHRINK_SWAP and TTM_SHRINK_PURGE reasons originate from a
   shrinker and are to be handled by a new driver callback, bo_shrink().
   Helpers for the new driver callback are provided in upcoming patches.

Cc: linux-graphics-maintai...@vmware.com
Signed-off-by: Thomas Hellström 
---
  drivers/gpu/drm/ttm/ttm_bo.c| 38 
  drivers/gpu/drm/ttm/ttm_device.c| 55 +
  drivers/gpu/drm/ttm/ttm_tt.c| 23 ++--
  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  3 +-
  include/drm/ttm/ttm_bo.h|  4 +--
  include/drm/ttm/ttm_device.h| 36 +--
  include/drm/ttm/ttm_tt.h| 17 +++--
  7 files changed, 136 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 882c2fa346f3..e5c0970564c0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1114,13 +1114,29 @@ int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, 
struct ttm_operation_ctx *ctx)
  }
  EXPORT_SYMBOL(ttm_bo_wait_ctx);
  
-int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,

-  gfp_t gfp_flags)
+/**
+ * ttm_bo_swapout() - Swap out or purge a buffer object
+ * @bo: The buffer object.
+ * @ctx: The ttm operation context.
+ * @reason: The swapout reason.
+ *
+ * Try to swap out or purge the contents of a system memory backed buffer
+ * object. The function needs to be called with the device's LRU lock held.
+ *
+ * Return: -EBUSY if the bo lock could not be grabbed or the object was
+ * otherwise busy. Otherwise the number of pages swapped out or negative
+ * error code on error. Iff the function didn't return -EBUSY, the
+ * LRU lock was dropped, and LRU traversal needs to restart.
+ */
+long ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx 
*ctx,
+   enum ttm_shrink_reason reason)
  {
struct ttm_place place;
bool locked;
long ret;
  
+	lockdep_assert_held(&bo->bdev->lru_lock);

+
/*
 * While the bo may already reside in SYSTEM placement, set
 * SYSTEM as new placement to cover also the move further below.
@@ -1142,8 +1158,12 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
}
  
  	if (bo->deleted) {

+   long num_pages = bo->ttm->num_pages;
+
ret = ttm_bo_cleanup_refs(bo, false, false, locked);
ttm_bo_put(bo);
+   if (!ret)
+   return num_pages;
return ret == -EBUSY ? -ENOSPC : ret;
}
  
@@ -1184,13 +1204,17 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,

 * Swap out. Buffer will be swapped in again as soon as
 * anyone tries to access a ttm page.
 */
-   if (bo->bdev->funcs->swap_notify)
-   bo->bdev->funcs->swap_notify(bo);
+   if (bo->bdev->funcs->bo_shrink && reason != TTM_SHRINK_WATERMARK) {
+   ret = bo->bdev->funcs->bo_shrink(bo, ctx);
+   } else {
+   if (bo->bdev->funcs->swap_notify)
+   bo->bdev->funcs->swap_notify(bo);
+   ret = ttm_tt_swapout(bo->bdev, bo->ttm);
+   if (!ret)
+   ret = bo->ttm->num_pages;
+   }
  
-	if (ttm_tt_is_populated(bo->ttm))

-   ret = ttm_tt_swapout(bo->bdev, bo->ttm, gfp_flags);
  out:
-
/*
 * Unreserve without putting on LRU to avoid swapping out an
 * already swapped buffer.
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index ae2f19dc9f81..7eadea07027f 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -116,19 +116,28 @@ static int ttm_global_init(void)
return ret;
  }
  
-/*

- * A buffer object shrink method that tries to swap out the first
- * buffer object on the global::swap_lru list.
+/**
+ * ttm_global_swapout() - Select 

Re: [RFC PATCH 03/16] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs

2023-02-15 Thread Christian König




Am 15.02.23 um 17:13 schrieb Thomas Hellström:

New code is recommended to use the BIT macro instead of the explicit
shifts. Change the older defines so that we can keep the style consistent
with upcoming changes.

Signed-off-by: Thomas Hellström 
---
  include/drm/ttm/ttm_tt.h | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index b7d3f3843f1e..cc54be1912e1 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -83,12 +83,12 @@ struct ttm_tt {
 * set by TTM after ttm_tt_populate() has successfully returned, and is
 * then unset when TTM calls ttm_tt_unpopulate().
 */
-#define TTM_TT_FLAG_SWAPPED(1 << 0)
-#define TTM_TT_FLAG_ZERO_ALLOC (1 << 1)
-#define TTM_TT_FLAG_EXTERNAL   (1 << 2)
-#define TTM_TT_FLAG_EXTERNAL_MAPPABLE  (1 << 3)
+#define TTM_TT_FLAG_SWAPPEDBIT(0)
+#define TTM_TT_FLAG_ZERO_ALLOC BIT(1)
+#define TTM_TT_FLAG_EXTERNAL   BIT(2)
+#define TTM_TT_FLAG_EXTERNAL_MAPPABLE  BIT(3)
  
-#define TTM_TT_FLAG_PRIV_POPULATED  (1U << 31)

+#define TTM_TT_FLAG_PRIV_POPULATED BIT(31)


While at it please just use BIT(4) for this, there is actually nothing 
special about it.


Christian.


uint32_t page_flags;
/** @num_pages: Number of pages in the page array. */
uint32_t num_pages;




Re: [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path

2023-02-15 Thread Christian König

Am 15.02.23 um 17:13 schrieb Thomas Hellström:

When hitting an error, the error path forgot to unmap dma mappings and


I don't see where this happens?


could call set_pages_wb() on already uncached pages.


Yeah, but what's the problem?

Regards,
Christian.



Fix this by introducing a common __ttm_pool_free() function that
does the right thing.

Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
Cc: Christian König 
Cc: Dave Airlie 
Cc: Madhav Chauhan 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thomas Hellström 
---
  drivers/gpu/drm/ttm/ttm_pool.c | 74 +-
  1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index aa116a7bbae3..1cc7591a9542 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, 
unsigned int order,
return 0;
  }
  
+static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,

+   struct page **caching_divide,
+   enum ttm_caching initial_caching,
+   enum ttm_caching subseq_caching,
+   pgoff_t num_pages)
+{
+   enum ttm_caching caching = subseq_caching;
+   struct page **pages = tt->pages;
+   unsigned int order;
+   pgoff_t i, nr;
+
+   if (pool && caching_divide)
+   caching = initial_caching;
+
+   for (i = 0; i < num_pages; i += nr, pages += nr) {
+   struct ttm_pool_type *pt = NULL;
+
+   if (unlikely(caching_divide == pages))
+   caching = subseq_caching;
+
+   order = ttm_pool_page_order(pool, *pages);
+   nr = (1UL << order);
+   if (tt->dma_address)
+   ttm_pool_unmap(pool, tt->dma_address[i], nr);
+
+   pt = ttm_pool_select_type(pool, caching, order);
+   if (pt)
+   ttm_pool_type_give(pt, *pages);
+   else
+   ttm_pool_free_page(pool, caching, order, *pages);
+   }
+}
+
  /**
   * ttm_pool_alloc - Fill a ttm_tt object
   *
@@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
dma_addr_t *dma_addr = tt->dma_address;
struct page **caching = tt->pages;
struct page **pages = tt->pages;
+   enum ttm_caching page_caching;
gfp_t gfp_flags = GFP_USER;
-   unsigned int i, order;
+   unsigned int order;
struct page *p;
int r;
  
@@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,

 order = min_t(unsigned int, order, __fls(num_pages))) {
struct ttm_pool_type *pt;
  
+		page_caching = tt->caching;

pt = ttm_pool_select_type(pool, tt->caching, order);
p = pt ? ttm_pool_type_take(pt) : NULL;
if (p) {
@@ -418,6 +453,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
if (r)
goto error_free_page;
  
+			caching = pages;

do {
r = ttm_pool_page_allocated(pool, order, p,
&dma_addr,
@@ -426,14 +462,15 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt 
*tt,
if (r)
goto error_free_page;
  
+caching = pages;

if (num_pages < (1 << order))
break;
  
  p = ttm_pool_type_take(pt);

} while (p);
-   caching = pages;
}
  
+		page_caching = ttm_cached;

while (num_pages >= (1 << order) &&
   (p = ttm_pool_alloc_page(pool, gfp_flags, order))) {
  
@@ -442,6 +479,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,

   tt->caching);
if (r)
goto error_free_page;
+   caching = pages;
}
r = ttm_pool_page_allocated(pool, order, p, &dma_addr,
&num_pages, &pages);
@@ -468,15 +506,12 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt 
*tt,
return 0;
  
  error_free_page:

-   ttm_pool_free_page(pool, tt->caching, order, p);
+   ttm_pool_free_page(pool, page_caching, order, p);
  
  error_free_all:

num_pages = tt->num_pages - num_pages;
-   for (i = 0; i < num_pages; ) {
-   order = ttm_pool_page_order(pool, tt->pages[i]);
-   ttm_pool_free_page(pool, tt->caching, order, tt->pages[i]);
- 

Re: [RFC PATCH 01/16] drm/ttm: Fix a NULL pointer dereference

2023-02-15 Thread Christian König

Am 15.02.23 um 17:13 schrieb Thomas Hellström:

The LRU mechanism may look up a resource in the process of being removed
from an object. The locking rules here are a bit unclear, but it
currently looks like res->bo assignment is protected by the LRU lock, whereas
bo->resource is protected by the object lock, while *clearing* of
bo->resource is also protected by the LRU lock. This means that if
we check that bo->resource points to the LRU resource under the LRU
lock we should be safe.
So perform that check before deciding to swap out a bo. That avoids
dereferencing a NULL bo->resource in ttm_bo_swapout().

Fixes: 6a9b02899402 ("drm/ttm: move the LRU into resource handling v4")
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Philip Yang 
Cc: Qiang Yu 
Cc: Matthew Auld 
Cc: Nirmoy Das 
Cc: Tvrtko Ursulin 
Cc: "Thomas Hellström" 
Cc: Anshuman Gupta 
Cc: Ramalingam C 
Cc: Arunpravin Paneer Selvam 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thomas Hellström 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/ttm/ttm_device.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index c7a1862f322a..ae2f19dc9f81 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -158,7 +158,7 @@ int ttm_device_swapout(struct ttm_device *bdev, struct 
ttm_operation_ctx *ctx,
struct ttm_buffer_object *bo = res->bo;
uint32_t num_pages;
  
-			if (!bo)

+   if (!bo || bo->resource != res)
continue;
  
  			num_pages = PFN_UP(bo->base.size);




Re: [Intel-gfx] [PATCH] Revert "drm/i915/hwmon: Enable PL1 power limit"

2023-02-15 Thread Rodrigo Vivi
On Wed, Feb 15, 2023 at 08:24:51AM -0800, Dixit, Ashutosh wrote:
> On Wed, 15 Feb 2023 07:37:30 -0800, Jani Nikula wrote:
> >
> > On Wed, 08 Feb 2023, Rodrigo Vivi  wrote:
> > > On Wed, Feb 08, 2023 at 11:03:12AM -0800, Ashutosh Dixit wrote:
> > >> This reverts commit 0349c41b05968befaffa5fbb7e73d0ee6004f610.
> > >>
> > >> 0349c41b0596 ("drm/i915/hwmon: Enable PL1 power limit") is incorrect and
> > >> caused a major regression on ATSM. The change enabled the PL1 power limit
> > >> but FW sets the default value of the PL1 limit to 0 which implies HW now
> > >> works at minimum power and therefore the lowest effective frequency. This
> > >> means all workloads now run slower resulting in even GuC FW load 
> > >> operations
> > >> timing out, rendering ATSM unusable.
> > >>
> > >> A different solution to the original issue of the PL1 limit being 
> > >> disabled
> > >> on ATSM is needed but till that is developed, revert 0349c41b0596.
> > >>
> > >> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> > >
> > > pushed to drm-intel-next and removed from drm-intel-fixes.
> > >
> > > Thanks for the quick reaction.
> >
> > Please always add Fixes: tags also to reverts.
> >
> > I suppose we should fix dim to also detect reverts, but I ended up
> > cherry-picking and pushing the original commit out to
> > drm-intel-next-fixes before realizing it's been reverted.
> 
> Oops, sorry!

That's my mistake. I should have thought about this when pushing it
and removing it from the fixes. I only realized it when this patch
showed up in my -fixes cherry-pick again, but without the revert.

I'm sorry.


Re: [PATCH] drm: document expectations for GETFB2 handles

2023-02-15 Thread Simon Ser
On Wednesday, February 15th, 2023 at 14:41, Pekka Paalanen 
 wrote:

> I didn't know it was at all possible to have different GEM handles
> pointing to the same object. DMABUF import is guaranteed to return the
> existing GEM handle, right? Why is GETFB2 different? Why does it not
> have the same problem as what forced DMABUF import to return existing
> handles?

drm_gem_prime_fd_to_handle() explicitly checks whether the memory object
already has a GEM handle via drm_prime_lookup_buf_handle(). OTOH,
drm_mode_getfb() and drm_mode_getfb2_ioctl() just unconditionally call
drm_gem_handle_create().

Yes, it's a rather inconsistent detail, one which becomes very
important when ref'counting and trying not to leak GEM handles from
user-space. Fortunately, GETFB/GETFB2 usage is pretty rare.
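
A hedged userspace sketch of the consequence; drmCloseBufferHandle() is
assumed to be available (newer libdrm), otherwise DRM_IOCTL_GEM_CLOSE
does the same:

	drmModeFB2Ptr fb = drmModeGetFB2(fd, fb_id);

	if (fb) {
		for (int i = 0; i < 4; i++) {
			uint32_t handle = fb->handles[i];
			bool seen = false;

			if (!handle)
				continue;
			/* Several planes may alias one object; close each
			 * freshly created handle exactly once. */
			for (int j = 0; j < i; j++)
				seen |= (fb->handles[j] == handle);
			if (!seen)
				drmCloseBufferHandle(fd, handle);
		}
		drmModeFreeFB2(fb);
	}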


Re: [Intel-gfx] [PATCH] Revert "drm/i915/hwmon: Enable PL1 power limit"

2023-02-15 Thread Dixit, Ashutosh
On Wed, 15 Feb 2023 07:37:30 -0800, Jani Nikula wrote:
>
> On Wed, 08 Feb 2023, Rodrigo Vivi  wrote:
> > On Wed, Feb 08, 2023 at 11:03:12AM -0800, Ashutosh Dixit wrote:
> >> This reverts commit 0349c41b05968befaffa5fbb7e73d0ee6004f610.
> >>
> >> 0349c41b0596 ("drm/i915/hwmon: Enable PL1 power limit") is incorrect and
> >> caused a major regression on ATSM. The change enabled the PL1 power limit
> >> but FW sets the default value of the PL1 limit to 0 which implies HW now
> >> works at minimum power and therefore the lowest effective frequency. This
> >> means all workloads now run slower resulting in even GuC FW load operations
> >> timing out, rendering ATSM unusable.
> >>
> >> A different solution to the original issue of the PL1 limit being disabled
> >> on ATSM is needed but till that is developed, revert 0349c41b0596.
> >>
> >> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> >
> > pushed to drm-intel-next and removed from drm-intel-fixes.
> >
> > Thanks for the quick reaction.
>
> Please always add Fixes: tags also to reverts.
>
> I suppose we should fix dim to also detect reverts, but I ended up
> cherry-picking and pushing the original commit out to
> drm-intel-next-fixes before realizing it's been reverted.

Oops, sorry!


[RFC PATCH 16/16] drm/i915, drm/ttm: Use the TTM shrinker rather than the external shmem pool

2023-02-15 Thread Thomas Hellström
Remove the external i915 TTM shmem pool and replace it with the
normal TTM page allocation. Also provide a callback for the TTM
shrinker functionality.
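
A hedged sketch of the shape of such a callback, expressed with the
bo_shrink() hook and the ttm_pool_shrink_tt() helper proposed earlier in
this series; the actual i915 callback in the diff below is more involved:

static int example_bo_shrink(struct ttm_buffer_object *bo,
			     struct ttm_operation_ctx *ctx)
{
	long ret = ttm_pool_shrink_tt(&bo->bdev->pool, bo->ttm);

	return ret < 0 ? ret : 0;
}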

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h|   6 -
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   6 -
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |   5 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 273 +++---
 drivers/gpu/drm/i915/i915_gem.c   |   3 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |   6 +-
 drivers/gpu/drm/ttm/ttm_tt.c  |   3 -
 include/drm/ttm/ttm_tt.h  |  15 +-
 8 files changed, 53 insertions(+), 264 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index f9a8acbba715..f694b5d479e5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -282,12 +282,6 @@ i915_gem_object_is_shrinkable(const struct 
drm_i915_gem_object *obj)
return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
 }
 
-static inline bool
-i915_gem_object_has_self_managed_shrink_list(const struct drm_i915_gem_object 
*obj)
-{
-   return i915_gem_object_type_has(obj, 
I915_GEM_OBJECT_SELF_MANAGED_SHRINK_LIST);
-}
-
 static inline bool
 i915_gem_object_is_proxy(const struct drm_i915_gem_object *obj)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 19c9bdd8f905..511dc1384a9c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -544,12 +544,6 @@ struct drm_i915_gem_object {
 */
atomic_t shrink_pin;
 
-   /**
-* @ttm_shrinkable: True when the object is using shmem pages
-* underneath. Protected by the object lock.
-*/
-   bool ttm_shrinkable;
-
/**
 * @unknown_state: Indicate that the object is effectively
 * borked. This is write-once and set if we somehow encounter a
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index ecd86130b74f..c39d45661b84 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -73,7 +73,7 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object 
*obj,
shrinkable = false;
}
 
-   if (shrinkable && !i915_gem_object_has_self_managed_shrink_list(obj)) {
+   if (shrinkable) {
struct list_head *list;
unsigned long flags;
 
@@ -216,8 +216,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object 
*obj)
if (i915_gem_object_is_volatile(obj))
obj->mm.madv = I915_MADV_WILLNEED;
 
-   if (!i915_gem_object_has_self_managed_shrink_list(obj))
-   i915_gem_object_make_unshrinkable(obj);
+   i915_gem_object_make_unshrinkable(obj);
 
if (obj->mm.mapping) {
unmap_object(obj, page_mask_bits(obj->mm.mapping));
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 341b94672abc..f9bd4f50d495 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -3,8 +3,6 @@
  * Copyright © 2021 Intel Corporation
  */
 
-#include 
-
 #include 
 #include 
 #include 
@@ -37,8 +35,6 @@
  * @ttm: The base TTM page vector.
  * @dev: The struct device used for dma mapping and unmapping.
  * @cached_rsgt: The cached scatter-gather table.
- * @is_shmem: Set if using shmem.
- * @filp: The shmem file, if using shmem backend.
  *
  * Note that DMA may be going on right up to the point where the page-
  * vector is unpopulated in delayed destroy. Hence keep the
@@ -50,9 +46,6 @@ struct i915_ttm_tt {
struct ttm_tt ttm;
struct device *dev;
struct i915_refct_sgt cached_rsgt;
-
-   bool is_shmem;
-   struct file *filp;
 };
 
 static const struct ttm_place sys_placement_flags = {
@@ -185,75 +178,6 @@ i915_ttm_placement_from_obj(const struct 
drm_i915_gem_object *obj,
placement->busy_placement = busy;
 }
 
-static int i915_ttm_tt_shmem_populate(struct ttm_device *bdev,
- struct ttm_tt *ttm,
- struct ttm_operation_ctx *ctx)
-{
-   struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
-   struct intel_memory_region *mr = i915->mm.regions[INTEL_MEMORY_SYSTEM];
-   struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
-   const unsigned int max_segment = i915_sg_segment_size(i915->drm.dev);
-   const size_t size = (size_t)ttm->num_pages << PAGE_SHIFT;
-   struct file *filp = i915_tt->filp;
-   struct sgt_iter sgt_iter;
-   struct sg_table *st;
-   struct 

[RFC PATCH 15/16] drm/ttm: Use fault-injection to test error paths

2023-02-15 Thread Thomas Hellström
Use fault-injection to test partial TTM swapout and interrupted swapin.
Return -EINTR for swapin to test the caller's ability to handle and
restart the swapin, and on swapout perform a partial swapout to test
the swapin and release_shrunken functionality.

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/Kconfig| 10 ++
 drivers/gpu/drm/ttm/ttm_pool.c | 17 -
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 1efd33411a92..a78eed9af2c1 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -202,6 +202,16 @@ config DRM_TTM
  GPU memory types. Will be enabled automatically if a device driver
  uses it.
 
+config DRM_TTM_SHRINK_FAULT_INJECT
+   bool "Enable fault injection during TTM shrinking"
+   depends on DRM_TTM
+   default n
+   help
+ Inject recoverable failures during TTM shrinking and recovery of
+ shrunken objects. For DRM driver developers only.
+
+ If in doubt, choose N.
+
 config DRM_BUDDY
tristate
depends on DRM
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 319998b4a325..d7c604593689 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -453,6 +453,7 @@ static bool ttm_pool_restore_valid(const struct 
ttm_pool_tt_restore *restore)
 static int ttm_pool_swapin(struct ttm_pool_tt_restore *restore,
   struct ttm_operation_ctx *ctx)
 {
+   static unsigned long __maybe_unused swappedin;
unsigned int i, nr = 1 << restore->order;
int ret = 0;
 
@@ -468,6 +469,13 @@ static int ttm_pool_swapin(struct ttm_pool_tt_restore 
*restore,
if (swap.val == 0)
continue;
 
+   if (IS_ENABLED(CONFIG_DRM_TTM_SHRINK_FAULT_INJECT) &&
+   ctx->interruptible &&
+   ++swappedin % 100 == 0) {
+   ret = -EINTR;
+   break;
+   }
+
ret = swap_copy_folio(swap, restore->first_page[i], 0,
  ctx->interruptible);
if (ret)
@@ -905,7 +913,14 @@ long ttm_pool_shrink_tt(struct ttm_pool *pool, struct 
ttm_tt *ttm)
if (current_is_kswapd())
alloc_gfp |= __GFP_NOMEMALLOC;
 
-   for (i = 0; i < ttm->num_pages; ++i) {
+   num_pages = ttm->num_pages;
+
+   /* Pretend doing fault injection by shrinking only half of the pages. */
+
+   if (IS_ENABLED(CONFIG_DRM_TTM_SHRINK_FAULT_INJECT))
+   num_pages = DIV_ROUND_UP(num_pages, 2);
+
+   for (i = 0; i < num_pages; ++i) {
page = ttm->pages[i];
if (unlikely(!page))
continue;
-- 
2.34.1
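
A hedged generalization of the injection pattern used above; the patch
itself uses a plain static counter, which is racy but harmless for a
test-only option:

static int maybe_inject_eintr(bool interruptible)
{
	static atomic_t calls = ATOMIC_INIT(0);

	if (IS_ENABLED(CONFIG_DRM_TTM_SHRINK_FAULT_INJECT) &&
	    interruptible && atomic_inc_return(&calls) % 100 == 0)
		return -EINTR;

	return 0;
}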



[PATCH 15/17] drm/cirrus: Introduce struct cirrus_primary_plane_state

2023-02-15 Thread Thomas Zimmermann
The cirrus driver maintains plane state, format and pitch, in its
device structure. Introduce a plane state for the primary plane to
store the values.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 59 ++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index 8a1ae94d9106..ec6b918dce7b 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -74,6 +74,16 @@ struct cirrus_device {
 
 #define to_cirrus(_dev) container_of(_dev, struct cirrus_device, dev)
 
+struct cirrus_primary_plane_state {
+   struct drm_shadow_plane_state base;
+};
+
+static inline struct cirrus_primary_plane_state *
+to_cirrus_primary_plane_state(struct drm_plane_state *plane_state)
+{
+   return container_of(plane_state, struct cirrus_primary_plane_state, 
base.base);
+};
+
 /* -- */
 /*
  * The meat of this driver. The core passes us a mode and we have to program
@@ -406,11 +416,58 @@ static const struct drm_plane_helper_funcs 
cirrus_primary_plane_helper_funcs = {
.atomic_update = cirrus_primary_plane_helper_atomic_update,
 };
 
+static struct drm_plane_state *
+cirrus_primary_plane_atomic_duplicate_state(struct drm_plane *plane)
+{
+   struct drm_plane_state *plane_state = plane->state;
+   struct cirrus_primary_plane_state *new_primary_plane_state;
+   struct drm_shadow_plane_state *new_shadow_plane_state;
+
+   if (!plane_state)
+   return NULL;
+
+   new_primary_plane_state = kzalloc(sizeof(*new_primary_plane_state), 
GFP_KERNEL);
+   if (!new_primary_plane_state)
+   return NULL;
   new_shadow_plane_state = &new_primary_plane_state->base;
+
+   __drm_gem_duplicate_shadow_plane_state(plane, new_shadow_plane_state);
+
   return &new_shadow_plane_state->base;
+}
+
+static void cirrus_primary_plane_atomic_destroy_state(struct drm_plane *plane,
+ struct drm_plane_state 
*plane_state)
+{
+   struct cirrus_primary_plane_state *primary_plane_state =
+   to_cirrus_primary_plane_state(plane_state);
+
   __drm_gem_destroy_shadow_plane_state(&primary_plane_state->base);
+   kfree(primary_plane_state);
+}
+
+static void cirrus_reset_primary_plane(struct drm_plane *plane)
+{
+   struct cirrus_primary_plane_state *primary_plane_state;
+
+   if (plane->state) {
+   cirrus_primary_plane_atomic_destroy_state(plane, plane->state);
+   plane->state = NULL; /* must be set to NULL here */
+   }
+
+   primary_plane_state = kzalloc(sizeof(*primary_plane_state), GFP_KERNEL);
+   if (!primary_plane_state)
+   return;
   __drm_gem_reset_shadow_plane(plane, &primary_plane_state->base);
+}
+
 static const struct drm_plane_funcs cirrus_primary_plane_funcs = {
.update_plane = drm_atomic_helper_update_plane,
.disable_plane = drm_atomic_helper_disable_plane,
.destroy = drm_plane_cleanup,
-   DRM_GEM_SHADOW_PLANE_FUNCS,
+   .reset = cirrus_reset_primary_plane,
+   .atomic_duplicate_state = cirrus_primary_plane_atomic_duplicate_state,
+   .atomic_destroy_state = cirrus_primary_plane_atomic_destroy_state,
 };
 
 static int cirrus_crtc_helper_atomic_check(struct drm_crtc *crtc, struct 
drm_atomic_state *state)
-- 
2.39.1



[RFC PATCH 14/16] drm/ttm: Provide helpers for shrinking

2023-02-15 Thread Thomas Hellström
Provide a helper to be used by the driver bo_shrink() callback to either
insert the pages of a struct ttm_tt into the swap-cache or to purge them
if the struct ttm_tt is purgeable. For pages with write-combined or
uncached linear kernel map, that linear kernel map is first changed to
cached.

Release pages with as little intermediate memory allocation as
possible; however, some memory might be allocated during swapout for the
swap-space radix tree.

Due to swapout or swapin errors, allow partially swapped out struct
ttm_tt's, although they are marked as swapped out, which stops them from
being swapped out a second time. More details in the ttm_pool.c DOC section.
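
One detail worth calling out, as a hedged reading of the helpers in the
diff below: a partially swapped ttm_tt stores swap entries directly in
its page array, tagging them with bit 0, which a real struct page
pointer (always at least word-aligned) never has set:

/*
 *   encode: (struct page *)(swap.val << 1 | 1)
 *   decode: swap.val = (unsigned long)p >> 1
 *   test:   (unsigned long)p & 1
 *
 * The encode shifts swap.val up by one bit, so this relies on the top
 * bit of a swp_entry_t value being unused, an assumption this excerpt
 * does not prove.
 */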

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/Kconfig|   1 +
 drivers/gpu/drm/ttm/ttm_pool.c | 403 +++--
 drivers/gpu/drm/ttm/ttm_tt.c   |  34 +++
 include/drm/ttm/ttm_pool.h |   4 +
 include/drm/ttm/ttm_tt.h   |  10 +
 5 files changed, 437 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index dc0f94f02a82..1efd33411a92 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -196,6 +196,7 @@ source "drivers/gpu/drm/display/Kconfig"
 config DRM_TTM
tristate
depends on DRM && MMU
+   select SWAP_BACKUP_FOLIO
help
  GPU memory management subsystem for devices with multiple
  GPU memory types. Will be enabled automatically if a device driver
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 8787fb6a218b..319998b4a325 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86
 #include 
@@ -72,6 +73,32 @@ struct ttm_pool_dma {
unsigned long vaddr;
 };
 
+/**
+ * struct ttm_pool_tt_restore - State representing restore from swap.
+ * @alloced_pages: Total number of already allocated pages for the ttm_tt.
+ * @restored_pages: Number of (sub) pages restored from swap for this
+ *  chunk of 1 << @order pages.
+ * @first_page: The ttm page ptr representing @old_pages[0].
+ * @caching_divide: Page pointer where subsequent pages are cached.
+ * @old_pages: Backup copy of page pointers that were replaced by the new
+ *page allocation.
+ * @pool: The pool used for page allocation while restoring.
+ * @order: The order of the last page allocated while restoring.
+ *
+ * Recovery from swap space might fail when we've recovered less than the
+ * full ttm_tt. In order not to lose any data (yet), keep information
+ * around that allows us to restart a failed ttm swap-space recovery.
+ */
+struct ttm_pool_tt_restore {
+   pgoff_t alloced_pages;
+   pgoff_t restored_pages;
+   struct page **first_page;
+   struct page **caching_divide;
+   struct page *old_pages[1 << TTM_MAX_ORDER];
+   struct ttm_pool *pool;
+   unsigned int order;
+};
+
 static unsigned long page_pool_size;
 
 MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool");
@@ -91,6 +118,23 @@ static struct shrinker mm_shrinker;
 
 static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
 
+static struct page *ttm_pool_swap_to_page_ptr(swp_entry_t swap)
+{
+   return (struct page *)(swap.val << 1 | 1);
+}
+
+static swp_entry_t ttm_pool_page_ptr_to_swap(const struct page *p)
+{
+   swp_entry_t swap = {.val = ((unsigned long)p) >> 1};
+
+   return swap;
+}
+
+static bool ttm_pool_page_ptr_is_swap(const struct page *p)
+{
+   return ((unsigned long)p) & 1;
+}
+
 /* Allocate pages of size 1 << order with the given gfp_flags */
 static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
unsigned int order)
@@ -361,11 +405,99 @@ static unsigned int ttm_pool_page_order(struct ttm_pool 
*pool, struct page *p)
return p->private;
 }
 
+/*
+ * To be able to insert single pages into the swap cache directly,
+ * we need to split multi-order page allocations and make them look
+ * like single page-allocations.
+ */
+static void ttm_pool_split_for_swap(struct ttm_pool *pool, struct page *p)
+{
+   unsigned int order = ttm_pool_page_order(pool, p);
+   pgoff_t nr;
+
+   if (!order)
+   return;
+
+   split_page(p, order);
+   nr = 1UL << order;
+   while (nr--)
+   (p++)->private = 0;
+}
+
+/**
+ * DOC: Partial shrinking and restoration of a struct ttm_tt.
+ *
+ * Swapout using swap_backup_folio() and swapin using swap_copy_folio() may 
fail.
+ * The former most likely due to lack of swap-space or memory, the latter due
+ * to lack of memory or because of signal interruption during waits.
+ *
+ * Swapout failure is easily handled by using a ttm_tt pages vector that holds
+ * both swap entries and page pointers. This has to be taken into account when
+ * restoring such a ttm_tt from swap, and when freeing it while swapped 

[PATCH 12/17] drm/cirrus: Remove size test from cirrus_fb_create()

2023-02-15 Thread Thomas Zimmermann
The DRM core implements a size check against the mode config's
limits when creating a framebuffer. [1] Remove the unnecessary
test from cirrus_fb_create() and remove the now-empty function.
Create framebuffers with drm_gem_fb_create_with_dirty().

Signed-off-by: Thomas Zimmermann 
Link: 
https://elixir.bootlin.com/linux/v6.1/source/drivers/gpu/drm/drm_framebuffer.c#L287
 # [1]
---
 drivers/gpu/drm/tiny/cirrus.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index c1ffbbe1d545..c2d7bb775629 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -555,17 +555,8 @@ static int cirrus_pipe_init(struct cirrus_device *cirrus)
 /* -- */
 /* cirrus framebuffers & mode config */
 
-static struct drm_framebuffer*
-cirrus_fb_create(struct drm_device *dev, struct drm_file *file_priv,
-const struct drm_mode_fb_cmd2 *mode_cmd)
-{
-   if (cirrus_check_size(mode_cmd->width, mode_cmd->height, NULL) < 0)
-   return ERR_PTR(-EINVAL);
-   return drm_gem_fb_create_with_dirty(dev, file_priv, mode_cmd);
-}
-
 static const struct drm_mode_config_funcs cirrus_mode_config_funcs = {
-   .fb_create = cirrus_fb_create,
+   .fb_create = drm_gem_fb_create_with_dirty,
.atomic_check = drm_atomic_helper_check,
.atomic_commit = drm_atomic_helper_commit,
 };
-- 
2.39.1
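
A hedged sketch of what the removed test now relies on: the limits that
the core check in [1] compares against are configured once at init time.
The exact cirrus values below are assumptions, not taken from this patch:

	dev->mode_config.min_width = 0;
	dev->mode_config.min_height = 0;
	dev->mode_config.max_width = CIRRUS_MAX_PITCH / 2;
	dev->mode_config.max_height = 1024;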



[PATCH 14/17] drm/cirrus: Inline cirrus_check_size() into primary-plane atomic_check

2023-02-15 Thread Thomas Zimmermann
Inline the framebuffer size check into the primary plane's atomic_check
cirrus_primary_plane_atomic_check(). No functional changes.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 26 ++
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index 6c2be39d79a5..8a1ae94d9106 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -317,21 +317,6 @@ static void cirrus_pitch_set(struct cirrus_device *cirrus,
cirrus_set_start_address(cirrus, 0);
 }
 
-static int cirrus_check_size(int width, int height,
-struct drm_framebuffer *fb)
-{
-   int pitch = width * 2;
-
-   if (fb)
-   pitch = cirrus_pitch(fb);
-
-   if (pitch > CIRRUS_MAX_PITCH)
-   return -EINVAL;
-   if (pitch * height > CIRRUS_VRAM_SIZE)
-   return -EINVAL;
-   return 0;
-}
-
 /* -- */
 /* cirrus display pipe   */
 
@@ -354,6 +339,7 @@ static int cirrus_primary_plane_helper_atomic_check(struct 
drm_plane *plane,
struct drm_crtc *new_crtc = new_plane_state->crtc;
struct drm_crtc_state *new_crtc_state = NULL;
int ret;
+   unsigned int pitch;
 
if (new_crtc)
new_crtc_state = drm_atomic_get_new_crtc_state(state, new_crtc);
@@ -367,7 +353,15 @@ static int cirrus_primary_plane_helper_atomic_check(struct 
drm_plane *plane,
else if (!new_plane_state->visible)
return 0;
 
-   return cirrus_check_size(fb->width, fb->height, fb);
+   pitch = cirrus_pitch(fb);
+
+   /* validate size constraints */
+   if (pitch > CIRRUS_MAX_PITCH)
+   return -EINVAL;
+   else if (pitch * fb->height > CIRRUS_VRAM_SIZE)
+   return -EINVAL;
+
+   return 0;
 }
 
 static void cirrus_primary_plane_helper_atomic_update(struct drm_plane *plane,
-- 
2.39.1



[PATCH 16/17] drm/cirrus: Store HW format/pitch in primary-plane state

2023-02-15 Thread Thomas Zimmermann
The hardware settings for color format and pitch are state of the
primary plane. Store the values in the primary plane's state structure
struct cirrus_primary_plane_state. Adapt all callers.

All fields in struct cirrus_device are now considered immutable after
initialization. Plane updates consider the difference between the old
and the new plane state before updating format or pitch.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 51 +--
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index ec6b918dce7b..ad67fb895213 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -63,10 +63,6 @@ struct cirrus_device {
struct drm_encoder encoder;
struct drm_connector   connector;
 
-   /* HW scanout buffer */
-   const struct drm_format_info   *format;
-   unsigned int   pitch;
-
/* HW resources */
void __iomem   *vram;
void __iomem   *mmio;
@@ -76,6 +72,10 @@ struct cirrus_device {
 
 struct cirrus_primary_plane_state {
struct drm_shadow_plane_state base;
+
+   /* HW scanout buffer */
+   const struct drm_format_info   *format;
+   unsigned int   pitch;
 };
 
 static inline struct cirrus_primary_plane_state *
@@ -268,15 +268,14 @@ static void cirrus_mode_set(struct cirrus_device *cirrus,
 }
 
 static void cirrus_format_set(struct cirrus_device *cirrus,
- struct drm_framebuffer *fb)
+ const struct drm_format_info *format)
 {
u8 sr07, hdr;
 
sr07 = rreg_seq(cirrus, 0x07);
sr07 &= 0xe0;
 
-   cirrus->format = cirrus_format(fb);
-   switch (cirrus->format->format) {
+   switch (format->format) {
case DRM_FORMAT_C8:
sr07 |= 0x11;
hdr = 0x00;
@@ -308,20 +307,18 @@ static void cirrus_format_set(struct cirrus_device 
*cirrus,
wreg_hdr(cirrus, hdr);
 }
 
-static void cirrus_pitch_set(struct cirrus_device *cirrus,
-struct drm_framebuffer *fb)
+static void cirrus_pitch_set(struct cirrus_device *cirrus, unsigned int pitch)
 {
u8 cr13, cr1b;
 
/* Program the pitch */
-   cirrus->pitch = cirrus_pitch(fb);
-   cr13 = cirrus->pitch / 8;
+   cr13 = pitch / 8;
wreg_crt(cirrus, VGA_CRTC_OFFSET, cr13);
 
/* Enable extended blanking and pitch bits, and enable full memory */
cr1b = 0x22;
-   cr1b |= (cirrus->pitch >> 7) & 0x10;
-   cr1b |= (cirrus->pitch >> 6) & 0x40;
+   cr1b |= (pitch >> 7) & 0x10;
+   cr1b |= (pitch >> 6) & 0x40;
wreg_crt(cirrus, 0x1b, cr1b);
 
cirrus_set_start_address(cirrus, 0);
@@ -345,6 +342,8 @@ static int cirrus_primary_plane_helper_atomic_check(struct 
drm_plane *plane,
struct drm_atomic_state 
*state)
 {
struct drm_plane_state *new_plane_state = 
drm_atomic_get_new_plane_state(state, plane);
+   struct cirrus_primary_plane_state *new_primary_plane_state =
+   to_cirrus_primary_plane_state(new_plane_state);
struct drm_framebuffer *fb = new_plane_state->fb;
struct drm_crtc *new_crtc = new_plane_state->crtc;
struct drm_crtc_state *new_crtc_state = NULL;
@@ -371,6 +370,9 @@ static int cirrus_primary_plane_helper_atomic_check(struct 
drm_plane *plane,
else if (pitch * fb->height > CIRRUS_VRAM_SIZE)
return -EINVAL;
 
+   new_primary_plane_state->format = cirrus_format(fb);
+   new_primary_plane_state->pitch = pitch;
+
return 0;
 }
 
@@ -379,9 +381,15 @@ static void 
cirrus_primary_plane_helper_atomic_update(struct drm_plane *plane,
 {
struct cirrus_device *cirrus = to_cirrus(plane->dev);
struct drm_plane_state *plane_state = 
drm_atomic_get_new_plane_state(state, plane);
+   struct cirrus_primary_plane_state *primary_plane_state =
+   to_cirrus_primary_plane_state(plane_state);
struct drm_shadow_plane_state *shadow_plane_state = 
to_drm_shadow_plane_state(plane_state);
struct drm_framebuffer *fb = plane_state->fb;
+   const struct drm_format_info *format = primary_plane_state->format;
+   unsigned int pitch = primary_plane_state->pitch;
struct drm_plane_state *old_plane_state = 
drm_atomic_get_old_plane_state(state, plane);
+   struct cirrus_primary_plane_state *old_primary_plane_state =
+   to_cirrus_primary_plane_state(old_plane_state);
struct iosys_map vaddr = IOSYS_MAP_INIT_VADDR_IOMEM(cirrus->vram);
struct drm_atomic_helper_damage_iter iter;
struct drm_rect damage;
@@ -393,18 +401,17 @@ static void 
cirrus_primary_plane_helper_atomic_update(struct drm_plane *plane,
	if (!drm_dev_enter(&cirrus->dev, &idx))
 

[PATCH 13/17] drm/cirrus: Test mode against video-memory size in device-wide mode_valid

2023-02-15 Thread Thomas Zimmermann
Test a display mode against the available amount of video memory in
struct drm_mode_config_funcs.mode_valid, which cirrus implements in
cirrus_mode_config_mode_valid(). This helper tests display modes against
device-wide limits. Remove the now-obsolete per-CRTC test.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index c2d7bb775629..6c2be39d79a5 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -419,15 +419,6 @@ static const struct drm_plane_funcs 
cirrus_primary_plane_funcs = {
DRM_GEM_SHADOW_PLANE_FUNCS,
 };
 
-static enum drm_mode_status cirrus_crtc_helper_mode_valid(struct drm_crtc 
*crtc,
- const struct 
drm_display_mode *mode)
-{
-   if (cirrus_check_size(mode->hdisplay, mode->vdisplay, NULL) < 0)
-   return MODE_BAD;
-
-   return MODE_OK;
-}
-
 static int cirrus_crtc_helper_atomic_check(struct drm_crtc *crtc, struct 
drm_atomic_state *state)
 {
struct drm_crtc_state *crtc_state = 
drm_atomic_get_new_crtc_state(state, crtc);
@@ -462,7 +453,6 @@ static void cirrus_crtc_helper_atomic_enable(struct 
drm_crtc *crtc,
 }
 
 static const struct drm_crtc_helper_funcs cirrus_crtc_helper_funcs = {
-   .mode_valid = cirrus_crtc_helper_mode_valid,
.atomic_check = cirrus_crtc_helper_atomic_check,
.atomic_enable = cirrus_crtc_helper_atomic_enable,
 };
@@ -555,8 +545,21 @@ static int cirrus_pipe_init(struct cirrus_device *cirrus)
 /* -- */
 /* cirrus framebuffers & mode config */
 
+static enum drm_mode_status cirrus_mode_config_mode_valid(struct drm_device 
*dev,
+ const struct 
drm_display_mode *mode)
+{
+   const struct drm_format_info *format = 
drm_format_info(DRM_FORMAT_XRGB8888);
+   uint64_t pitch = drm_format_info_min_pitch(format, 0, mode->hdisplay);
+
+   if (pitch * mode->vdisplay > CIRRUS_VRAM_SIZE)
+   return MODE_MEM;
+
+   return MODE_OK;
+}
+
 static const struct drm_mode_config_funcs cirrus_mode_config_funcs = {
.fb_create = drm_gem_fb_create_with_dirty,
+   .mode_valid = cirrus_mode_config_mode_valid,
.atomic_check = drm_atomic_helper_check,
.atomic_commit = drm_atomic_helper_commit,
 };
-- 
2.39.1
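
A hedged worked example of the new check, assuming CIRRUS_VRAM_SIZE is
4 MiB and the 4-byte-per-pixel XRGB8888 format used for validation:

/*
 *   1024x768:  pitch = 1024 * 4 = 4096; 4096 * 768  = 3 MiB -> MODE_OK
 *   1280x1024: pitch = 1280 * 4 = 5120; 5120 * 1024 = 5 MiB -> MODE_MEM
 */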



[PATCH 17/17] drm/cirrus: Use VGA macro constants to unblank

2023-02-15 Thread Thomas Zimmermann
Set the VGA bit for unblanking with macro constants instead of magic
values. No functional changes.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index ad67fb895213..594bc472862f 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -509,7 +509,7 @@ static void cirrus_crtc_helper_atomic_enable(struct 
drm_crtc *crtc,
cirrus_mode_set(cirrus, _state->mode);
 
/* Unblank (needed on S3 resume, vgabios doesn't do it then) */
-   outb(0x20, 0x3c0);
+   outb(VGA_AR_ENABLE_DISPLAY, VGA_ATT_W);
 
drm_dev_exit(idx);
 }
-- 
2.39.1



[PATCH 10/17] drm/cirrus: Inline cirrus_fb_blit_rect()

2023-02-15 Thread Thomas Zimmermann
Inline cirrus_fb_blit_rect into its only caller. While at it, update
the code to use IOSYS_MAP_INIT_OFFSET(), which is the idiomatic way
of initializing struct iosys_map with an offset.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index 46c6aa34ba79..a483abc2e6ba 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -317,21 +317,6 @@ static void cirrus_pitch_set(struct cirrus_device *cirrus,
cirrus_set_start_address(cirrus, 0);
 }
 
-static int cirrus_fb_blit_rect(struct drm_framebuffer *fb,
-  const struct iosys_map *vmap,
-  struct drm_rect *rect)
-{
-   struct cirrus_device *cirrus = to_cirrus(fb->dev);
-   struct iosys_map dst;
-
-   iosys_map_set_vaddr_iomem(&dst, cirrus->vram);
-   iosys_map_incr(&dst, drm_fb_clip_offset(cirrus->pitch, fb->format, rect));
-
-   drm_fb_blit(&dst, &cirrus->pitch, cirrus->format->format, vmap, fb, rect);
-
-   return 0;
-}
-
 static int cirrus_check_size(int width, int height,
 struct drm_framebuffer *fb)
 {
@@ -393,6 +378,7 @@ static void 
cirrus_primary_plane_helper_atomic_update(struct drm_plane *plane,
struct drm_shadow_plane_state *shadow_plane_state = 
to_drm_shadow_plane_state(plane_state);
struct drm_framebuffer *fb = plane_state->fb;
struct drm_plane_state *old_plane_state = 
drm_atomic_get_old_plane_state(state, plane);
+   struct iosys_map vaddr = IOSYS_MAP_INIT_VADDR_IOMEM(cirrus->vram);
struct drm_atomic_helper_damage_iter iter;
struct drm_rect damage;
int idx;
@@ -410,7 +396,11 @@ static void 
cirrus_primary_plane_helper_atomic_update(struct drm_plane *plane,
 
drm_atomic_helper_damage_iter_init(&iter, old_plane_state, plane_state);
drm_atomic_for_each_plane_damage(&iter, &damage) {
-   cirrus_fb_blit_rect(fb, &shadow_plane_state->data[0], &damage);
+   unsigned int offset = drm_fb_clip_offset(cirrus->pitch, fb->format, &damage);
+   struct iosys_map dst = IOSYS_MAP_INIT_OFFSET(&vaddr, offset);
+
+   drm_fb_blit(&dst, &cirrus->pitch, cirrus->format->format,
+   &shadow_plane_state->data[0], fb, &damage);
}
 
drm_dev_exit(idx);
-- 
2.39.1
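
The idiom in question, as a hedged standalone sketch: IOSYS_MAP_INIT_OFFSET()
derives a new mapping at an offset without mutating the base map, which is
what the removed iosys_map_incr() call did in place:

	struct iosys_map base = IOSYS_MAP_INIT_VADDR_IOMEM(vram);
	struct iosys_map dst = IOSYS_MAP_INIT_OFFSET(&base, offset);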



[PATCH 08/17] drm/cirrus: Convert to regular atomic helpers

2023-02-15 Thread Thomas Zimmermann
Replace simple-KMS helpers with DRM's regular helpers for atomic
modesetting. Avoids the mid-layer and the additional wrappers around
GEM's shadow-plane helpers.

Most of the simple-KMS code is just wrappers around regular atomic
helpers. The conversion is therefore equivalent to pulling the
simple-KMS helpers into cirrus and removing all the intermediate
code and data structures between the driver and the atomic helpers.
As the simple-KMS helpers lump primary plane, CRTC and encoder into a
single data structure, the conversion to regular helpers allows
splitting modesetting from plane updates and handling each individually.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 202 +++---
 1 file changed, 138 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index 7ca6a897a2b2..af26de9ef329 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -24,6 +24,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -43,7 +44,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #define DRIVER_NAME "cirrus"
 #define DRIVER_DESC "qemu cirrus vga"
@@ -56,10 +56,18 @@
 
 struct cirrus_device {
struct drm_device  dev;
-   struct drm_simple_display_pipe pipe;
+
+   /* modesetting pipeline */
+   struct drm_plane   primary_plane;
+   struct drm_crtccrtc;
+   struct drm_encoder encoder;
struct drm_connector   connector;
+
+   /* HW scanout buffer */
const struct drm_format_info   *format;
unsigned int   pitch;
+
+   /* HW resources */
void __iomem   *vram;
void __iomem   *mmio;
 };
@@ -324,18 +332,6 @@ static int cirrus_fb_blit_rect(struct drm_framebuffer *fb,
return 0;
 }
 
-static int cirrus_fb_blit_fullscreen(struct drm_framebuffer *fb,
-const struct iosys_map *map)
-{
-   struct drm_rect fullscreen = {
-   .x1 = 0,
-   .x2 = fb->width,
-   .y1 = 0,
-   .y2 = fb->height,
-   };
-   return cirrus_fb_blit_rect(fb, map, &fullscreen);
-}
-
 static int cirrus_check_size(int width, int height,
 struct drm_framebuffer *fb)
 {
@@ -365,78 +361,130 @@ static const uint64_t 
cirrus_primary_plane_format_modifiers[] = {
DRM_FORMAT_MOD_INVALID
 };
 
-static enum drm_mode_status cirrus_pipe_mode_valid(struct 
drm_simple_display_pipe *pipe,
-  const struct 
drm_display_mode *mode)
+static int cirrus_primary_plane_helper_atomic_check(struct drm_plane *plane,
+   struct drm_atomic_state 
*state)
 {
-   if (cirrus_check_size(mode->hdisplay, mode->vdisplay, NULL) < 0)
-   return MODE_BAD;
-   return MODE_OK;
-}
+   struct drm_plane_state *new_plane_state = 
drm_atomic_get_new_plane_state(state, plane);
+   struct drm_framebuffer *fb = new_plane_state->fb;
+   struct drm_crtc *new_crtc = new_plane_state->crtc;
+   struct drm_crtc_state *new_crtc_state = NULL;
+   int ret;
 
-static int cirrus_pipe_check(struct drm_simple_display_pipe *pipe,
-struct drm_plane_state *plane_state,
-struct drm_crtc_state *crtc_state)
-{
-   struct drm_framebuffer *fb = plane_state->fb;
+   if (new_crtc)
+   new_crtc_state = drm_atomic_get_new_crtc_state(state, new_crtc);
 
-   if (!fb)
+   ret = drm_atomic_helper_check_plane_state(new_plane_state, 
new_crtc_state,
+ DRM_PLANE_NO_SCALING,
+ DRM_PLANE_NO_SCALING,
+ false, false);
+   if (ret)
+   return ret;
+   else if (!new_plane_state->visible)
return 0;
+
return cirrus_check_size(fb->width, fb->height, fb);
 }
 
-static void cirrus_pipe_enable(struct drm_simple_display_pipe *pipe,
-  struct drm_crtc_state *crtc_state,
-  struct drm_plane_state *plane_state)
+static void cirrus_primary_plane_helper_atomic_update(struct drm_plane *plane,
+ struct drm_atomic_state 
*state)
 {
-   struct cirrus_device *cirrus = to_cirrus(pipe->crtc.dev);
+   struct cirrus_device *cirrus = to_cirrus(plane->dev);
+   struct drm_plane_state *plane_state = 
drm_atomic_get_new_plane_state(state, plane);
struct drm_shadow_plane_state *shadow_plane_state = 
to_drm_shadow_plane_state(plane_state);
+   struct drm_framebuffer *fb = plane_state->fb;
+   struct drm_plane_state *old_plane_state = 
drm_atomic_get_old_plane_state(state, plane);
+   struct 

[RFC PATCH 12/16] mm: Add interfaces to back up and recover folio contents using swap

2023-02-15 Thread Thomas Hellström
GPU drivers have traditionally used shmem to back up GPU buffer contents
for swap on physical memory shortage. Some integrated GPU drivers use
shmem files as the backing storage for their GPU buffers, other drivers,
in particular drivers that need a Write-Combining caching strategy on
system pages, (but also drivers for discrete gpus in general) need to copy
to shmem on anticipated memory shortage.

The latter strategy does not lend itself very well to shrinker usage,
since shmem memory needs to be allocated and page trylocking of pagecache
pages need to be performed from reclaim context and both are prone to
failures. That makes the approach very fragile at best.

Add interfaces for GPU drivers to directly insert pages into the
swap-cache, thereby bypassing shmem and avoiding the shmem page
allocation and locking at shrink time completely, as well as the
content copy.

Also add a kunit test for experimenting with the interface functionality;
currently it seems PMD-size folios don't work properly. Further
investigation is needed to determine whether this is a viable approach.
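
A hedged usage sketch of the proposed interface; swap_backup_folio(),
swap_copy_folio() and swap_drop_folio() are this patch's additions, not
existing kernel API, error handling is elided, and a zero swp_entry_t is
assumed to signal failure:

	swp_entry_t entry;
	int err;

	entry = swap_backup_folio(folio, false, GFP_KERNEL,
				  GFP_KERNEL | __GFP_NOWARN);
	if (entry.val) {
		folio_put(folio); /* content now lives in the swap cache */

		/* later: copy the content back into a fresh page */
		err = swap_copy_folio(entry, page, 0, true);
		if (!err)
			swap_drop_folio(entry);
	}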

Cc: Andrew Morton 
Cc: "Matthew Wilcox (Oracle)" 
Cc: Miaohe Lin 
Cc: David Hildenbrand 
Cc: Johannes Weiner 
Cc: Peter Xu 
Cc: NeilBrown 
Cc: linux...@kvack.org

Signed-off-by: Thomas Hellström 
---
 include/linux/swap.h|  10 ++
 mm/Kconfig  |  18 
 mm/Makefile |   2 +
 mm/swap_backup_folio.c  | 178 
 mm/swap_backup_folio_test.c | 111 ++
 5 files changed, 319 insertions(+)
 create mode 100644 mm/swap_backup_folio.c
 create mode 100644 mm/swap_backup_folio_test.c

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0ceed49516ad..fc38c72fe9ab 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -706,5 +706,15 @@ static inline bool mem_cgroup_swap_full(struct folio 
*folio)
 }
 #endif
 
+#ifdef CONFIG_SWAP_BACKUP_FOLIO
+swp_entry_t swap_backup_folio(struct folio *folio, bool writeback,
+ gfp_t folio_gfp, gfp_t alloc_gfp);
+
+int swap_copy_folio(swp_entry_t swap, struct page *page, unsigned long index,
+   bool killable);
+
+void swap_drop_folio(swp_entry_t swap);
+#endif
+
 #endif /* __KERNEL__*/
 #endif /* _LINUX_SWAP_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..b9e0a40e9e1a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -191,6 +191,10 @@ config ZSMALLOC_STAT
  information to userspace via debugfs.
  If unsure, say N.
 
+config SWAP_BACKUP_FOLIO
+   bool
+   default n
+
 menu "SLAB allocator options"
 
 choice
@@ -1183,6 +1187,20 @@ config LRU_GEN_STATS
  This option has a per-memcg and per-node memory overhead.
 # }
 
+config SWAP_BACKUP_FOLIO_KUNIT_TEST
+   tristate "KUnit tests for swap_backup_folio() functionality" if 
!KUNIT_ALL_TESTS
+   depends on SWAP && KUNIT && SWAP_BACKUP_FOLIO
+   help
+This builds unit tests for the swap_backup_folio() functionality.
+This option is not useful for distributions or general kernels,
+but only for kernel developers working on MM swap functionality.
+
+For more information on KUnit and unit tests in general,
+please refer to the KUnit documentation in
+Documentation/dev-tools/kunit/.
+
+If in doubt, say "N".
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 8e105e5b3e29..91cb9c73e16e 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -138,3 +138,5 @@ obj-$(CONFIG_IO_MAPPING) += io-mapping.o
 obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
 obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
 obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
+obj-$(CONFIG_SWAP_BACKUP_FOLIO) += swap_backup_folio.o
+obj-$(CONFIG_SWAP_BACKUP_FOLIO_KUNIT_TEST) += swap_backup_folio_test.o
diff --git a/mm/swap_backup_folio.c b/mm/swap_backup_folio.c
new file mode 100644
index ..f77ca478e625
--- /dev/null
+++ b/mm/swap_backup_folio.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "swap.h"
+
+/**
+ * swap_backup_folio() - Insert an isolated folio into the swap-cache.
+ * @folio: The folio to insert.
+ * @writeback: Whether to perform immediate writeback.
+ * @folio_gfp: The gfp value used when the folio was allocated. Used for
+ * cgroup charging only.
+ * @alloc_gfp: The gfp value used for swap cache radix tree memory allocations.
+ *
+ * Insert a folio into the swap cache and get a swp_entry_t back as a
+ * reference.
+ * If the swap cache folio should be subject to immediate writeback to
+ * a swap device, @writeback should be set to true.
+ * After a call to swap_backup_folio() the caller can
+ * drop its folio reference and use swap_copy_folio() to get the folio
+ * content back, or swap_drop_folio() to drop it completely.
+ * Currently only PAGE_SIZE folios work, or if CONFIG_THP_SWAP is
+ * enabled, HPAGE_PMD_NR*PAGE_SIZE folios may also work.

[PATCH 09/17] drm/cirrus: Enable damage clipping on primary plane

2023-02-15 Thread Thomas Zimmermann
Enable damage clipping on the primary plane and iterate over small
areas of reported framebuffer damage. Avoid the overhead of permanent
full-screen updates that cirrus currently implements.

This problem is indicated by the warning

  drm_plane_enable_fb_damage_clips() not called

in the kernel's log. Without damage clipping, drivers do full updates
of the screen area. This is costly as many screen updates, such as
cursor movement or command-line input, only change a small portion
of the output. Damage clipping allows renderers to inform drivers about
the changed areas.

With the damage information known, cirrus now iterates over a list of
change areas and only flushes those to the hardware's scanout buffer.
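
For completeness, the damage originates in userspace via the plane's
FB_DAMAGE_CLIPS property. A minimal libdrm-based sketch (the plane and
property IDs are assumed to have been looked up beforehand):

	/* Attach one damage rectangle to an atomic request. */
	static int add_damage_clip(int fd, drmModeAtomicReq *req,
				   uint32_t plane_id, uint32_t damage_prop_id,
				   int32_t x1, int32_t y1, int32_t x2, int32_t y2)
	{
		struct drm_mode_rect clip = {
			.x1 = x1, .y1 = y1, .x2 = x2, .y2 = y2,
		};
		uint32_t blob_id;
		int ret;

		ret = drmModeCreatePropertyBlob(fd, &clip, sizeof(clip), &blob_id);
		if (ret)
			return ret;

		/* drmModeAtomicAddProperty() returns a cursor (>= 0) on success. */
		ret = drmModeAtomicAddProperty(req, plane_id, damage_prop_id, blob_id);
		return ret < 0 ? ret : 0;
	}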

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index af26de9ef329..46c6aa34ba79 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -393,7 +393,8 @@ static void 
cirrus_primary_plane_helper_atomic_update(struct drm_plane *plane,
struct drm_shadow_plane_state *shadow_plane_state = 
to_drm_shadow_plane_state(plane_state);
struct drm_framebuffer *fb = plane_state->fb;
struct drm_plane_state *old_plane_state = 
drm_atomic_get_old_plane_state(state, plane);
-   struct drm_rect rect;
+   struct drm_atomic_helper_damage_iter iter;
+   struct drm_rect damage;
int idx;
 
if (!fb)
@@ -407,8 +408,10 @@ static void 
cirrus_primary_plane_helper_atomic_update(struct drm_plane *plane,
if (cirrus->pitch != cirrus_pitch(fb))
cirrus_pitch_set(cirrus, fb);
 
-   if (drm_atomic_helper_damage_merged(old_plane_state, plane_state, &rect))
-   cirrus_fb_blit_rect(fb, &shadow_plane_state->data[0], &rect);
+   drm_atomic_helper_damage_iter_init(&iter, old_plane_state, plane_state);
+   drm_atomic_for_each_plane_damage(&iter, &damage) {
+   cirrus_fb_blit_rect(fb, &shadow_plane_state->data[0], &damage);
+   }
 
drm_dev_exit(idx);
 }
@@ -529,6 +532,7 @@ static int cirrus_pipe_init(struct cirrus_device *cirrus)
if (ret)
return ret;
drm_plane_helper_add(primary_plane, &cirrus_primary_plane_helper_funcs);
+   drm_plane_enable_fb_damage_clips(primary_plane);
 
crtc = &cirrus->crtc;
ret = drm_crtc_init_with_planes(dev, crtc, primary_plane, NULL,
-- 
2.39.1



[RFC PATCH 13/16] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting

2023-02-15 Thread Thomas Hellström
When swapping in, or under memory pressure, ttm_tt_populate() may sleep
for a substantial amount of time. Allow interrupts during the sleep.
This will also allow us to inject -EINTR errors during swapin in upcoming
patches.

Also avoid returning VM_FAULT_OOM, since that will confuse the core
mm, making it print a confusing message and retry the fault.
Return VM_FAULT_SIGBUS also under OOM conditions.

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 3ecda6db24b8..80f106bfe385 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -218,14 +218,21 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
prot = ttm_io_prot(bo, bo->resource, prot);
if (!bo->resource->bus.is_iomem) {
struct ttm_operation_ctx ctx = {
-   .interruptible = false,
+   .interruptible = true,
.no_wait_gpu = false,
.force_alloc = true
};
 
ttm = bo->ttm;
-   if (ttm_tt_populate(bdev, bo->ttm, &ctx))
-   return VM_FAULT_OOM;
+   err = ttm_tt_populate(bdev, bo->ttm, &ctx);
+   if (err) {
+   if (err == -EINTR || err == -ERESTARTSYS ||
+   err == -EAGAIN)
+   return VM_FAULT_NOPAGE;
+
+   pr_debug("TTM fault hit %pe.\n", ERR_PTR(err));
+   return VM_FAULT_SIGBUS;
+   }
} else {
/* Iomem should not be marked encrypted */
prot = pgprot_decrypted(prot);
-- 
2.34.1



[PATCH 11/17] drm/cirrus: Remove format test from cirrus_fb_create()

2023-02-15 Thread Thomas Zimmermann
The DRM core implements a format check when setting a framebuffer
for a plane. [1] Remove the unnecessary test from cirrus_fb_create().

Signed-off-by: Thomas Zimmermann 
Link: 
https://elixir.bootlin.com/linux/v6.1/source/drivers/gpu/drm/drm_atomic.c#L629 
# [1]
---
 drivers/gpu/drm/tiny/cirrus.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index a483abc2e6ba..c1ffbbe1d545 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -559,10 +559,6 @@ static struct drm_framebuffer*
 cirrus_fb_create(struct drm_device *dev, struct drm_file *file_priv,
 const struct drm_mode_fb_cmd2 *mode_cmd)
 {
-   if (mode_cmd->pixel_format != DRM_FORMAT_RGB565 &&
-   mode_cmd->pixel_format != DRM_FORMAT_RGB888 &&
-   mode_cmd->pixel_format != DRM_FORMAT_XRGB8888)
-   return ERR_PTR(-EINVAL);
if (cirrus_check_size(mode_cmd->width, mode_cmd->height, NULL) < 0)
return ERR_PTR(-EINVAL);
return drm_gem_fb_create_with_dirty(dev, file_priv, mode_cmd);
-- 
2.39.1



[PATCH 07/17] drm/cirrus: Move primary-plane format arrays

2023-02-15 Thread Thomas Zimmermann
Move the primary plane's format and modifier arrays within the
source file and adapt naming slightly. No functional changes.

Done in preparation of converting cirrus to regular atomic helpers.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index cc1d45ea1f62..7ca6a897a2b2 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -354,6 +354,17 @@ static int cirrus_check_size(int width, int height,
 /* -- */
 /* cirrus display pipe   */
 
+static const uint32_t cirrus_primary_plane_formats[] = {
+   DRM_FORMAT_RGB565,
+   DRM_FORMAT_RGB888,
+   DRM_FORMAT_XRGB8888,
+};
+
+static const uint64_t cirrus_primary_plane_format_modifiers[] = {
+   DRM_FORMAT_MOD_LINEAR,
+   DRM_FORMAT_MOD_INVALID
+};
+
 static enum drm_mode_status cirrus_pipe_mode_valid(struct 
drm_simple_display_pipe *pipe,
   const struct 
drm_display_mode *mode)
 {
@@ -428,17 +439,6 @@ static const struct drm_simple_display_pipe_funcs 
cirrus_pipe_funcs = {
DRM_GEM_SIMPLE_DISPLAY_PIPE_SHADOW_PLANE_FUNCS,
 };
 
-static const uint32_t cirrus_formats[] = {
-   DRM_FORMAT_RGB565,
-   DRM_FORMAT_RGB888,
-   DRM_FORMAT_XRGB8888,
-};
-
-static const uint64_t cirrus_modifiers[] = {
-   DRM_FORMAT_MOD_LINEAR,
-   DRM_FORMAT_MOD_INVALID
-};
-
 static int cirrus_connector_helper_get_modes(struct drm_connector *connector)
 {
int count;
@@ -478,9 +478,9 @@ static int cirrus_pipe_init(struct cirrus_device *cirrus)
return drm_simple_display_pipe_init(dev,
&cirrus->pipe,
&cirrus_pipe_funcs,
-   cirrus_formats,
-   ARRAY_SIZE(cirrus_formats),
-   cirrus_modifiers,
+   cirrus_primary_plane_formats,
+   ARRAY_SIZE(cirrus_primary_plane_formats),
+   cirrus_primary_plane_format_modifiers,
connector);
 }
 
-- 
2.39.1



[PATCH 05/17] drm/cirrus: Split cirrus_mode_set() into smaller functions

2023-02-15 Thread Thomas Zimmermann
Split cirrus_mode_set() into smaller functions that set the display
mode, color format and scanline pitch individually. This better reflects
the design of the DRM modesetting pipeline.

Done in preparation of converting cirrus to regular atomic helpers.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 63 +--
 1 file changed, 38 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index 0b02244bd9f1..60488e49bdb5 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -178,14 +178,12 @@ static void cirrus_set_start_address(struct cirrus_device 
*cirrus, u32 offset)
wreg_crt(cirrus, 0x1d, tmp);
 }
 
-static int cirrus_mode_set(struct cirrus_device *cirrus,
-  struct drm_display_mode *mode,
-  struct drm_framebuffer *fb)
+static void cirrus_mode_set(struct cirrus_device *cirrus,
+   struct drm_display_mode *mode)
 {
int hsyncstart, hsyncend, htotal, hdispend;
int vtotal, vdispend;
int tmp;
-   int sr07 = 0, hdr = 0;
 
htotal = mode->htotal / 8;
hsyncend = mode->hsync_end / 8;
@@ -249,15 +247,21 @@ static int cirrus_mode_set(struct cirrus_device *cirrus,
 
/* Disable Hercules/CGA compatibility */
wreg_crt(cirrus, VGA_CRTC_MODE, 0x03);
+}
+
+static void cirrus_format_set(struct cirrus_device *cirrus,
+ struct drm_framebuffer *fb)
+{
+   u8 sr07, hdr;
 
sr07 = rreg_seq(cirrus, 0x07);
sr07 &= 0xe0;
-   hdr = 0;
 
cirrus->format = cirrus_format(fb);
switch (cirrus->format->format) {
case DRM_FORMAT_C8:
sr07 |= 0x11;
+   hdr = 0x00;
break;
case DRM_FORMAT_RGB565:
sr07 |= 0x17;
@@ -272,22 +276,11 @@ static int cirrus_mode_set(struct cirrus_device *cirrus,
hdr = 0xc5;
break;
default:
-   return -1;
+   return;
}
 
wreg_seq(cirrus, 0x7, sr07);
 
-   /* Program the pitch */
-   cirrus->pitch = cirrus_pitch(fb);
-   tmp = cirrus->pitch / 8;
-   wreg_crt(cirrus, VGA_CRTC_OFFSET, tmp);
-
-   /* Enable extended blanking and pitch bits, and enable full memory */
-   tmp = 0x22;
-   tmp |= (cirrus->pitch >> 7) & 0x10;
-   tmp |= (cirrus->pitch >> 6) & 0x40;
-   wreg_crt(cirrus, 0x1b, tmp);
-
/* Enable high-colour modes */
wreg_gfx(cirrus, VGA_GFX_MODE, 0x40);
 
@@ -295,13 +288,25 @@ static int cirrus_mode_set(struct cirrus_device *cirrus,
wreg_gfx(cirrus, VGA_GFX_MISC, 0x01);
 
wreg_hdr(cirrus, hdr);
+}
 
-   cirrus_set_start_address(cirrus, 0);
+static void cirrus_pitch_set(struct cirrus_device *cirrus,
+struct drm_framebuffer *fb)
+{
+   u8 cr13, cr1b;
 
-   /* Unblank (needed on S3 resume, vgabios doesn't do it then) */
-   outb(0x20, 0x3c0);
+   /* Program the pitch */
+   cirrus->pitch = cirrus_pitch(fb);
+   cr13 = cirrus->pitch / 8;
+   wreg_crt(cirrus, VGA_CRTC_OFFSET, cr13);
 
-   return 0;
+   /* Enable extended blanking and pitch bits, and enable full memory */
+   cr1b = 0x22;
+   cr1b |= (cirrus->pitch >> 7) & 0x10;
+   cr1b |= (cirrus->pitch >> 6) & 0x40;
+   wreg_crt(cirrus, 0x1b, cr1b);
+
+   cirrus_set_start_address(cirrus, 0);
 }
 
 static int cirrus_fb_blit_rect(struct drm_framebuffer *fb,
@@ -413,9 +418,14 @@ static void cirrus_pipe_enable(struct 
drm_simple_display_pipe *pipe,
if (!drm_dev_enter(&cirrus->dev, &idx))
return;
 
-   cirrus_mode_set(cirrus, &crtc_state->mode, plane_state->fb);
+   cirrus_mode_set(cirrus, &crtc_state->mode);
+   cirrus_format_set(cirrus, plane_state->fb);
+   cirrus_pitch_set(cirrus, plane_state->fb);
cirrus_fb_blit_fullscreen(plane_state->fb, &shadow_plane_state->data[0]);
 
+   /* Unblank (needed on S3 resume, vgabios doesn't do it then) */
+   outb(0x20, 0x3c0);
+
drm_dev_exit(idx);
 }
 
@@ -425,15 +435,18 @@ static void cirrus_pipe_update(struct 
drm_simple_display_pipe *pipe,
struct cirrus_device *cirrus = to_cirrus(pipe->crtc.dev);
struct drm_plane_state *state = pipe->plane.state;
struct drm_shadow_plane_state *shadow_plane_state = 
to_drm_shadow_plane_state(state);
-   struct drm_crtc *crtc = &pipe->crtc;
struct drm_rect rect;
int idx;
 
if (!drm_dev_enter(&cirrus->dev, &idx))
return;
 
-   if (state->fb && cirrus->format != cirrus_format(state->fb))
-   cirrus_mode_set(cirrus, &crtc->mode, state->fb);
+   if (state->fb) {
+   if (cirrus->format != cirrus_format(state->fb))
+   cirrus_format_set(cirrus, state->fb);
+   if (cirrus->pitch != cirrus_pitch(state->fb))
+   cirrus_pitch_set(cirrus, state->fb);

[PATCH 06/17] drm/cirrus: Integrate connector into pipeline code

2023-02-15 Thread Thomas Zimmermann
Integrate the connector with the rest of the pipeline setup code.
Move some helpers within the file and adapt naming slightly. No
functional changes.

Done in preparation of converting cirrus to regular atomic helpers.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 80 +--
 1 file changed, 38 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index 60488e49bdb5..cc1d45ea1f62 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -57,7 +57,7 @@
 struct cirrus_device {
struct drm_device  dev;
struct drm_simple_display_pipe pipe;
-   struct drm_connector   conn;
+   struct drm_connector   connector;
const struct drm_format_info   *format;
unsigned int   pitch;
void __iomem   *vram;
@@ -352,41 +352,7 @@ static int cirrus_check_size(int width, int height,
 }
 
 /* -- */
-/* cirrus connector  */
-
-static int cirrus_conn_get_modes(struct drm_connector *conn)
-{
-   int count;
-
-   count = drm_add_modes_noedid(conn,
-conn->dev->mode_config.max_width,
-conn->dev->mode_config.max_height);
-   drm_set_preferred_mode(conn, 1024, 768);
-   return count;
-}
-
-static const struct drm_connector_helper_funcs cirrus_conn_helper_funcs = {
-   .get_modes = cirrus_conn_get_modes,
-};
-
-static const struct drm_connector_funcs cirrus_conn_funcs = {
-   .fill_modes = drm_helper_probe_single_connector_modes,
-   .destroy = drm_connector_cleanup,
-   .reset = drm_atomic_helper_connector_reset,
-   .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
-   .atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
-};
-
-static int cirrus_conn_init(struct cirrus_device *cirrus)
-{
-   drm_connector_helper_add(&cirrus->conn, &cirrus_conn_helper_funcs);
-   return drm_connector_init(&cirrus->dev, &cirrus->conn,
- &cirrus_conn_funcs, DRM_MODE_CONNECTOR_VGA);
-
-}
-
-/* -- */
-/* cirrus (simple) display pipe  */
+/* cirrus display pipe   */
 
 static enum drm_mode_status cirrus_pipe_mode_valid(struct 
drm_simple_display_pipe *pipe,
   const struct 
drm_display_mode *mode)
@@ -473,15 +439,49 @@ static const uint64_t cirrus_modifiers[] = {
DRM_FORMAT_MOD_INVALID
 };
 
+static int cirrus_connector_helper_get_modes(struct drm_connector *connector)
+{
+   int count;
+
+   count = drm_add_modes_noedid(connector,
+connector->dev->mode_config.max_width,
+connector->dev->mode_config.max_height);
+   drm_set_preferred_mode(connector, 1024, 768);
+   return count;
+}
+
+static const struct drm_connector_helper_funcs cirrus_connector_helper_funcs = 
{
+   .get_modes = cirrus_connector_helper_get_modes,
+};
+
+static const struct drm_connector_funcs cirrus_connector_funcs = {
+   .fill_modes = drm_helper_probe_single_connector_modes,
+   .destroy = drm_connector_cleanup,
+   .reset = drm_atomic_helper_connector_reset,
+   .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
+   .atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
+};
+
 static int cirrus_pipe_init(struct cirrus_device *cirrus)
 {
-   return drm_simple_display_pipe_init(&cirrus->dev,
+   struct drm_device *dev = >dev;
+   struct drm_connector *connector;
+   int ret;
+
+   connector = &cirrus->connector;
+   ret = drm_connector_init(&cirrus->dev, connector, &cirrus_connector_funcs,
+DRM_MODE_CONNECTOR_VGA);
+   if (ret)
+   return ret;
+   drm_connector_helper_add(connector, &cirrus_connector_helper_funcs);
+
+   return drm_simple_display_pipe_init(dev,
&cirrus->pipe,
&cirrus_pipe_funcs,
cirrus_formats,
ARRAY_SIZE(cirrus_formats),
cirrus_modifiers,
-   &cirrus->conn);
+   connector);
 }
 
 /* -- */
@@ -584,10 +584,6 @@ static int cirrus_pci_probe(struct pci_dev *pdev,
if (ret)
return ret;
 
-   ret = cirrus_conn_init(cirrus);
-   if (ret < 0)
-   return ret;
-
ret = cirrus_pipe_init(cirrus);
if (ret < 0)
return ret;

[RFC PATCH 11/16] drm/ttm: Add a simple api to set / clear purgeable ttm_tt content

2023-02-15 Thread Thomas Hellström
In the absence of free swap space, a shrinker could still efficiently
free memory the content of which is no longer needed, and graphics
drivers typically have an interface to mark buffer object content as
no longer needed.

Add a possibility to propagate this to TTM, so that the shrinker
accounting and shrinker actions can be updated accordingly.

Moving forward, we will probably want this interface on the bo level and
have bo move support for it, but for now we strictly only need it for
the shrinker. Another option would be to have the drivers do the
purgeable vs shrinkable accounting.

This still leaves it to the driver to assign a proper LRU priority to
purgeable buffer objects so that the shrinker finds those objects early
during LRU traversal.
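
A minimal sketch of the intended driver-side use, assuming a
hypothetical madvise-style ioctl; only the two ttm_tt_set_*() calls
come from this patch, the rest is illustrative:

	static int foo_bo_madvise(struct ttm_buffer_object *bo, bool dontneed)
	{
		int ret;

		ret = dma_resv_lock(bo->base.resv, NULL);
		if (ret)
			return ret;

		if (dontneed) {
			ret = ttm_tt_set_dontneed(bo->bdev, bo->ttm);
			/* Also bump the bo to an LRU priority that the
			 * shrinker visits early (elided here). */
		} else {
			ret = ttm_tt_set_willneed(bo->bdev, bo->ttm);
		}

		dma_resv_unlock(bo->base.resv);

		/* -EALREADY means the content was never present or already
		 * purged; for willneed the caller must then recreate it. */
		return ret;
	}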

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_tt.c | 59 
 include/drm/ttm/ttm_tt.h |  3 ++
 2 files changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index a39c617c7a8e..c63be8f5ed2a 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -105,6 +105,65 @@ void ttm_tt_set_unpinned(const struct ttm_device *bdev, 
const struct ttm_tt *tt)
ttm_tt_mod_shrinkable_pages(tt->num_pages, 0);
 }
 
+/**
+ * ttm_tt_set_dontneed() - Mark ttm_tt content as not needed.
+ * @bdev: The ttm device.
+ * @tt: The struct ttm_tt.
+ *
+ * Mark the ttm_tt content as not needed for the shrinker accounting.
+ * This also means that the content will not be backed up on shrinking,
+ * but rather freed immediately.
+ *
+ * Return: 0 if successful, -EALREADY if content was never present or
+ * already backed up and was purged by this call.
+ */
+int ttm_tt_set_dontneed(const struct ttm_device *bdev, struct ttm_tt *tt)
+{
+   if (ttm_tt_is_populated(tt)) {
+   if (!ttm_tt_purgeable(tt)) {
+   tt->page_flags |= TTM_TT_FLAG_DONTNEED;
+   if (ttm_tt_shrinkable(bdev, tt))
+   ttm_tt_mod_shrinkable_pages(-(long)tt->num_pages,
+   tt->num_pages);
+   }
+   return 0;
+   }
+
+   if (tt->swap_storage)
+   fput(tt->swap_storage);
+   tt->swap_storage = NULL;
+
+   return -EALREADY;
+}
+EXPORT_SYMBOL(ttm_tt_set_dontneed);
+
+/**
+ * ttm_tt_set_willneed() - Mark ttm_tt content as needed.
+ * @bdev: The ttm device.
+ * @tt: The struct ttm_tt.
+ *
+ * Mark the ttm_tt content as needed and update the shrinker accounting
+ * accordingly.
+ *
+ * Return: 0 if successful, -EALREADY if content was never present or
+ * was already purged.
+ */
+int ttm_tt_set_willneed(const struct ttm_device *bdev, struct ttm_tt *tt)
+{
+   if (ttm_tt_is_populated(tt)) {
+   if (ttm_tt_purgeable(tt)) {
+   tt->page_flags &= ~TTM_TT_FLAG_DONTNEED;
+   if (ttm_tt_shrinkable(bdev, tt))
+   ttm_tt_mod_shrinkable_pages(tt->num_pages,
+   -(long)tt->num_pages);
+   }
+   return 0;
+   }
+
+   return -EALREADY;
+}
+EXPORT_SYMBOL(ttm_tt_set_willneed);
+
 /*
  * Allocates a ttm structure for the given BO.
  */
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 69467671c2dd..abb17527f76c 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -241,6 +241,9 @@ static inline bool ttm_tt_purgeable(struct ttm_tt *tt)
 void ttm_tt_set_pinned(const struct ttm_device *bdev, const struct ttm_tt *tt);
 
 void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt 
*tt);
+int ttm_tt_set_dontneed(const struct ttm_device *bdev, struct ttm_tt *tt);
+
+int ttm_tt_set_willneed(const struct ttm_device *bdev, struct ttm_tt *tt);
 
 #if IS_ENABLED(CONFIG_AGP)
 #include 
-- 
2.34.1



[PATCH 03/17] drm/cirrus: Use drm_fb_blit() to update scanout buffer

2023-02-15 Thread Thomas Zimmermann
Cirrus' blit helper reimplements code from the shared blit helper
drm_fb_blit(). Use the helper instead.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index 67e83fa42a32..71fa07535298 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -329,20 +329,7 @@ static int cirrus_fb_blit_rect(struct drm_framebuffer *fb,
iosys_map_set_vaddr_iomem(&dst, cirrus->vram);
iosys_map_incr(&dst, drm_fb_clip_offset(cirrus->pitch, fb->format, rect));
 
-   if (cirrus->format == fb->format) {
-   drm_fb_memcpy(&dst, fb->pitches, vmap, fb, rect);
-
-   } else if (fb->format->format == DRM_FORMAT_XRGB8888 &&
-  cirrus->format->format == DRM_FORMAT_RGB565) {
-   drm_fb_xrgb8888_to_rgb565(&dst, &cirrus->pitch, vmap, fb, rect, false);
-
-   } else if (fb->format->format == DRM_FORMAT_XRGB8888 &&
-  cirrus->format->format == DRM_FORMAT_RGB565) {
-   drm_fb_xrgb8888_to_rgb888(&dst, &cirrus->pitch, vmap, fb, rect);
-
-   } else {
-   WARN_ON_ONCE("cpp mismatch");
-   }
+   drm_fb_blit(&dst, &cirrus->pitch, cirrus->format->format, vmap, fb, rect);
 
drm_dev_exit(idx);
 
-- 
2.39.1



[PATCH 04/17] drm/cirrus: Move drm_dev_{enter, exit}() into DRM helpers

2023-02-15 Thread Thomas Zimmermann
Call drm_dev_enter() and drm_dev_exit() immediately after entering
cirrus' DRM helper functions. Remove these calls from other functions.
Each enter/exit block in the DRM helpers covers the full hardware
update. No functional changes.
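
Schematically, each helper now brackets the whole hardware update with
the standard unplug guard (fragment; error handling elided):

	int idx;

	if (!drm_dev_enter(&cirrus->dev, &idx))
		return;	/* device unplugged; skip all hardware access */

	/* ... full hardware update ... */

	drm_dev_exit(idx);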

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 31 +--
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index 71fa07535298..0b02244bd9f1 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -159,13 +159,9 @@ static int cirrus_pitch(struct drm_framebuffer *fb)
 
 static void cirrus_set_start_address(struct cirrus_device *cirrus, u32 offset)
 {
-   int idx;
u32 addr;
u8 tmp;
 
-   if (!drm_dev_enter(&cirrus->dev, &idx))
-   return;
-
addr = offset >> 2;
wreg_crt(cirrus, 0x0c, (u8)((addr >> 8) & 0xff));
wreg_crt(cirrus, 0x0d, (u8)(addr & 0xff));
@@ -180,8 +176,6 @@ static void cirrus_set_start_address(struct cirrus_device 
*cirrus, u32 offset)
tmp &= 0x7f;
tmp |= (addr >> 12) & 0x80;
wreg_crt(cirrus, 0x1d, tmp);
-
-   drm_dev_exit(idx);
 }
 
 static int cirrus_mode_set(struct cirrus_device *cirrus,
@@ -190,12 +184,9 @@ static int cirrus_mode_set(struct cirrus_device *cirrus,
 {
int hsyncstart, hsyncend, htotal, hdispend;
int vtotal, vdispend;
-   int tmp, idx;
+   int tmp;
int sr07 = 0, hdr = 0;
 
-   if (!drm_dev_enter(&cirrus->dev, &idx))
-   return -1;
-
htotal = mode->htotal / 8;
hsyncend = mode->hsync_end / 8;
hsyncstart = mode->hsync_start / 8;
@@ -281,7 +272,6 @@ static int cirrus_mode_set(struct cirrus_device *cirrus,
hdr = 0xc5;
break;
default:
-   drm_dev_exit(idx);
return -1;
}
 
@@ -311,7 +301,6 @@ static int cirrus_mode_set(struct cirrus_device *cirrus,
/* Unblank (needed on S3 resume, vgabios doesn't do it then) */
outb(0x20, 0x3c0);
 
-   drm_dev_exit(idx);
return 0;
 }
 
@@ -321,18 +310,12 @@ static int cirrus_fb_blit_rect(struct drm_framebuffer *fb,
 {
struct cirrus_device *cirrus = to_cirrus(fb->dev);
struct iosys_map dst;
-   int idx;
-
-   if (!drm_dev_enter(&cirrus->dev, &idx))
-   return -ENODEV;
 
iosys_map_set_vaddr_iomem(&dst, cirrus->vram);
iosys_map_incr(&dst, drm_fb_clip_offset(cirrus->pitch, fb->format, rect));
 
drm_fb_blit(&dst, &cirrus->pitch, cirrus->format->format, vmap, fb, rect);
 
-   drm_dev_exit(idx);
-
return 0;
 }
 
@@ -425,9 +408,15 @@ static void cirrus_pipe_enable(struct 
drm_simple_display_pipe *pipe,
 {
struct cirrus_device *cirrus = to_cirrus(pipe->crtc.dev);
struct drm_shadow_plane_state *shadow_plane_state = 
to_drm_shadow_plane_state(plane_state);
+   int idx;
+
+   if (!drm_dev_enter(&cirrus->dev, &idx))
+   return;
 
cirrus_mode_set(cirrus, &crtc_state->mode, plane_state->fb);
cirrus_fb_blit_fullscreen(plane_state->fb, &shadow_plane_state->data[0]);
+
+   drm_dev_exit(idx);
 }
 
 static void cirrus_pipe_update(struct drm_simple_display_pipe *pipe,
@@ -438,12 +427,18 @@ static void cirrus_pipe_update(struct 
drm_simple_display_pipe *pipe,
struct drm_shadow_plane_state *shadow_plane_state = 
to_drm_shadow_plane_state(state);
struct drm_crtc *crtc = &pipe->crtc;
struct drm_rect rect;
+   int idx;
+
+   if (!drm_dev_enter(&cirrus->dev, &idx))
+   return;
 
if (state->fb && cirrus->format != cirrus_format(state->fb))
cirrus_mode_set(cirrus, &crtc->mode, state->fb);
 
if (drm_atomic_helper_damage_merged(old_state, state, &rect))
cirrus_fb_blit_rect(state->fb, &shadow_plane_state->data[0], &rect);
+
+   drm_dev_exit(idx);
 }
 
 static const struct drm_simple_display_pipe_funcs cirrus_pipe_funcs = {
-- 
2.39.1



[PATCH 02/17] drm/cirrus: Replace cpp value with format

2023-02-15 Thread Thomas Zimmermann
Using components per pixel to describe a color format is obsolete.
Use the format info and 4CC value instead.
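
For illustration, drm_format_info() resolves a 4CC code to its format
description, so the cpp value remains available where it is needed:

	const struct drm_format_info *info = drm_format_info(DRM_FORMAT_RGB565);

	/* info->format == DRM_FORMAT_RGB565, info->cpp[0] == 2, and
	 * drm_format_info_min_pitch(info, 0, width) == 2 * width. */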

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 50 ++-
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index 7fb21db8416d..67e83fa42a32 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -58,7 +58,7 @@ struct cirrus_device {
struct drm_device  dev;
struct drm_simple_display_pipe pipe;
struct drm_connector   conn;
-   unsigned int   cpp;
+   const struct drm_format_info   *format;
unsigned int   pitch;
void __iomem   *vram;
void __iomem   *mmio;
@@ -126,34 +126,34 @@ static void wreg_hdr(struct cirrus_device *cirrus, u8 val)
iowrite8(val, cirrus->mmio + VGA_DAC_MASK);
 }
 
-static int cirrus_convert_to(struct drm_framebuffer *fb)
+static const struct drm_format_info *cirrus_convert_to(struct drm_framebuffer 
*fb)
 {
-   if (fb->format->cpp[0] == 4 && fb->pitches[0] > CIRRUS_MAX_PITCH) {
+   if (fb->format->format == DRM_FORMAT_XRGB8888 && fb->pitches[0] > CIRRUS_MAX_PITCH) {
if (fb->width * 3 <= CIRRUS_MAX_PITCH)
/* convert from XR24 to RG24 */
-   return 3;
+   return drm_format_info(DRM_FORMAT_RGB888);
else
/* convert from XR24 to RG16 */
-   return 2;
+   return drm_format_info(DRM_FORMAT_RGB565);
}
-   return 0;
+   return NULL;
 }
 
-static int cirrus_cpp(struct drm_framebuffer *fb)
+static const struct drm_format_info *cirrus_format(struct drm_framebuffer *fb)
 {
-   int convert_cpp = cirrus_convert_to(fb);
+   const struct drm_format_info *format = cirrus_convert_to(fb);
 
-   if (convert_cpp)
-   return convert_cpp;
-   return fb->format->cpp[0];
+   if (format)
+   return format;
+   return fb->format;
 }
 
 static int cirrus_pitch(struct drm_framebuffer *fb)
 {
-   int convert_cpp = cirrus_convert_to(fb);
+   const struct drm_format_info *format = cirrus_convert_to(fb);
 
-   if (convert_cpp)
-   return convert_cpp * fb->width;
+   if (format)
+   return drm_format_info_min_pitch(format, 0, fb->width);
return fb->pitches[0];
 }
 
@@ -263,20 +263,20 @@ static int cirrus_mode_set(struct cirrus_device *cirrus,
sr07 &= 0xe0;
hdr = 0;
 
-   cirrus->cpp = cirrus_cpp(fb);
-   switch (cirrus->cpp * 8) {
-   case 8:
+   cirrus->format = cirrus_format(fb);
+   switch (cirrus->format->format) {
+   case DRM_FORMAT_C8:
sr07 |= 0x11;
break;
-   case 16:
+   case DRM_FORMAT_RGB565:
sr07 |= 0x17;
hdr = 0xc1;
break;
-   case 24:
+   case DRM_FORMAT_RGB888:
sr07 |= 0x15;
hdr = 0xc5;
break;
-   case 32:
+   case DRM_FORMAT_XRGB8888:
sr07 |= 0x19;
hdr = 0xc5;
break;
@@ -329,13 +329,15 @@ static int cirrus_fb_blit_rect(struct drm_framebuffer *fb,
iosys_map_set_vaddr_iomem(&dst, cirrus->vram);
iosys_map_incr(&dst, drm_fb_clip_offset(cirrus->pitch, fb->format, rect));
 
-   if (cirrus->cpp == fb->format->cpp[0]) {
+   if (cirrus->format == fb->format) {
drm_fb_memcpy(&dst, fb->pitches, vmap, fb, rect);
 
-   } else if (fb->format->cpp[0] == 4 && cirrus->cpp == 2) {
+   } else if (fb->format->format == DRM_FORMAT_XRGB8888 &&
+  cirrus->format->format == DRM_FORMAT_RGB565) {
drm_fb_xrgb8888_to_rgb565(&dst, &cirrus->pitch, vmap, fb, rect, false);
 
-   } else if (fb->format->cpp[0] == 4 && cirrus->cpp == 3) {
+   } else if (fb->format->format == DRM_FORMAT_XRGB8888 &&
+  cirrus->format->format == DRM_FORMAT_RGB565) {
drm_fb_xrgb8888_to_rgb888(&dst, &cirrus->pitch, vmap, fb, rect);
 
} else {
@@ -450,7 +452,7 @@ static void cirrus_pipe_update(struct 
drm_simple_display_pipe *pipe,
struct drm_crtc *crtc = >crtc;
struct drm_rect rect;
 
-   if (state->fb && cirrus->cpp != cirrus_cpp(state->fb))
+   if (state->fb && cirrus->format != cirrus_format(state->fb))
cirrus_mode_set(cirrus, >mode, state->fb);
 
if (drm_atomic_helper_damage_merged(old_state, state, ))
-- 
2.39.1



[PATCH 01/17] drm/cirrus: Compute blit destination offset in single location

2023-02-15 Thread Thomas Zimmermann
The calculation for the scanout-buffer blit offset is independent
from the color format. In the one case where the current code uses
fb->pitches[0] instead of cirrus->pitch, their values are identical.
Hence merge all into a single line.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/tiny/cirrus.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c
index cf35b6090503..7fb21db8416d 100644
--- a/drivers/gpu/drm/tiny/cirrus.c
+++ b/drivers/gpu/drm/tiny/cirrus.c
@@ -327,17 +327,15 @@ static int cirrus_fb_blit_rect(struct drm_framebuffer *fb,
return -ENODEV;
 
iosys_map_set_vaddr_iomem(&dst, cirrus->vram);
+   iosys_map_incr(&dst, drm_fb_clip_offset(cirrus->pitch, fb->format, rect));
 
if (cirrus->cpp == fb->format->cpp[0]) {
-   iosys_map_incr(&dst, drm_fb_clip_offset(fb->pitches[0], fb->format, rect));
drm_fb_memcpy(&dst, fb->pitches, vmap, fb, rect);
 
} else if (fb->format->cpp[0] == 4 && cirrus->cpp == 2) {
-   iosys_map_incr(&dst, drm_fb_clip_offset(cirrus->pitch, fb->format, rect));
drm_fb_xrgb8888_to_rgb565(&dst, &cirrus->pitch, vmap, fb, rect, false);
 
} else if (fb->format->cpp[0] == 4 && cirrus->cpp == 3) {
-   iosys_map_incr(&dst, drm_fb_clip_offset(cirrus->pitch, fb->format, rect));
drm_fb_xrgb8888_to_rgb888(&dst, &cirrus->pitch, vmap, fb, rect);
 
} else {
-- 
2.39.1



[PATCH 00/17] cirrus: Modernize the cirrus driver

2023-02-15 Thread Thomas Zimmermann
Update the cirrus driver to follow current best practices. While the
driver's hardware is obsolete, the cirrus driver is still one of the
go-to modules to learn about writing a DRM driver. So keep it in good
shape.

Patches 1 to 3 simplify blitting and convert it to the DRM's current
helpers.

Patches 4 to 8 replace simple-KMS helpers with DRM's regular atomic
helpers. The former are midlayers on top of the latter, and should
be replaced entirely.

Patches 9 and 10 further improve blitting. This enables damage clipping
for userspace and the console. Until now, cirrus' mandatory fullscreen
updates have added unnecessary overhead to every screen update.

Patches 11 to 14 simplify mode and framebuffer tests. With the use
of regular helpers, these tests can now be implemented in the places
they belong.

Patches 15 and 16 move hardware color format and pitch into plane
state of the primary plane. These fields have been kept in the device
structure itself, where they don't belong.

Patch 17 replaces two magic values by macro constants. There are
more such cases within cirrus, but those two values stuck out as
specifically hard to interpret.

Tested with qemu's cirrus emulation.

Thomas Zimmermann (17):
  drm/cirrus: Compute blit destination offset in single location
  drm/cirrus: Replace cpp value with format
  drm/cirrus: Use drm_fb_blit() to update scanout buffer
  drm/cirrus: Move drm_dev_{enter,exit}() into DRM helpers
  drm/cirrus: Split cirrus_mode_set() into smaller functions
  drm/cirrus: Integrate connector into pipeline code
  drm/cirrus: Move primary-plane format arrays
  drm/cirrus: Convert to regular atomic helpers
  drm/cirrus: Enable damage clipping on primary plane
  drm/cirrus: Inline cirrus_fb_blit_rect()
  drm/cirrus: Remove format test from cirrus_fb_create()
  drm/cirrus: Remove size test from cirrus_fb_create()
  drm/cirrus: Test mode against video-memory size in device-wide
mode_valid
  drm/cirrus: Inline cirrus_check_size() into primary-plane atomic_check
  drm/cirrus: Introduce struct cirrus_primary_plane_state
  drm/cirrus: Store HW format/pitch in primary-plane state
  drm/cirrus: Use VGA macro constants to unblank

 drivers/gpu/drm/tiny/cirrus.c | 499 +-
 1 file changed, 305 insertions(+), 194 deletions(-)

-- 
2.39.1



[RFC PATCH 10/16] drm/ttm: Remove pinned bos from shrinkable accounting

2023-02-15 Thread Thomas Hellström
Pinned bos aren't shrinkable and need to be removed from the shrinkable
accounting. Do that, and in the process constify the tt argument to
ttm_tt_is_populated().
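
A minimal sketch of the resulting accounting, assuming a populated,
shrinkable bo and a hypothetical submission helper:

	static int foo_run_job(struct ttm_buffer_object *bo)
	{
		int ret;

		ret = dma_resv_lock(bo->base.resv, NULL);
		if (ret)
			return ret;

		ttm_bo_pin(bo);		/* shrinkable_pages -= bo->ttm->num_pages */
		ret = foo_submit(bo);	/* hypothetical GPU submission */
		ttm_bo_unpin(bo);	/* shrinkable_pages += bo->ttm->num_pages */

		dma_resv_unlock(bo->base.resv);
		return ret;
	}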

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_bo.c |  7 +++
 drivers/gpu/drm/ttm/ttm_tt.c | 22 ++
 include/drm/ttm/ttm_tt.h |  6 +-
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index e5c0970564c0..e59e2a4605d0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -650,6 +650,10 @@ void ttm_bo_pin(struct ttm_buffer_object *bo)
 {
dma_resv_assert_held(bo->base.resv);
WARN_ON_ONCE(!kref_read(&bo->kref));
+
+   if (!bo->pin_count && bo->ttm)
+   ttm_tt_set_pinned(bo->bdev, bo->ttm);
+
spin_lock(&bo->bdev->lru_lock);
if (bo->resource)
ttm_resource_del_bulk_move(bo->resource, bo);
@@ -671,6 +675,9 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo)
if (WARN_ON_ONCE(!bo->pin_count))
return;
 
+   if (bo->pin_count == 1 && bo->ttm)
+   ttm_tt_set_unpinned(bo->bdev, bo->ttm);
+
spin_lock(&bo->bdev->lru_lock);
--bo->pin_count;
if (bo->resource)
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 848adf2a623e..a39c617c7a8e 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -83,6 +83,28 @@ static void ttm_tt_mod_shrinkable_pages(long shrinkable, 
long purgeable)
write_unlock(_lock);
 }
 
+/**
+ * ttm_tt_set_pinned() - Modify the shrinkable accounting when pinning a bo.
+ * @bdev: The TTM device.
+ * @tt: The struct ttm_tt used by the pinned bo.
+ */
+void ttm_tt_set_pinned(const struct ttm_device *bdev, const struct ttm_tt *tt)
+{
+   if (ttm_tt_shrinkable(bdev, tt) && ttm_tt_is_populated(tt))
+   ttm_tt_mod_shrinkable_pages(-(long)tt->num_pages, 0);
+}
+
+/**
+ * ttm_tt_set_unpinned() - Modify the shrinkable accounting when unpinning a bo.
+ * @bdev: The TTM device.
+ * @tt: The struct ttm_tt used by the no longer pinned bo.
+ */
+void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt 
*tt)
+{
+   if (ttm_tt_shrinkable(bdev, tt) && ttm_tt_is_populated(tt))
+   ttm_tt_mod_shrinkable_pages(tt->num_pages, 0);
+}
+
 /*
  * Allocates a ttm structure for the given BO.
  */
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 3f99787e2b93..69467671c2dd 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -118,7 +118,7 @@ struct ttm_kmap_iter_tt {
pgprot_t prot;
 };
 
-static inline bool ttm_tt_is_populated(struct ttm_tt *tt)
+static inline bool ttm_tt_is_populated(const struct ttm_tt *tt)
 {
return tt->page_flags & TTM_TT_FLAG_PRIV_POPULATED;
 }
@@ -238,6 +238,10 @@ static inline bool ttm_tt_purgeable(struct ttm_tt *tt)
return tt->page_flags & TTM_TT_FLAG_DONTNEED;
 }
 
+void ttm_tt_set_pinned(const struct ttm_device *bdev, const struct ttm_tt *tt);
+
+void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt 
*tt);
+
 #if IS_ENABLED(CONFIG_AGP)
 #include 
 
-- 
2.34.1



[RFC PATCH 09/16] drm/ttm: Introduce shrink throttling.

2023-02-15 Thread Thomas Hellström
Since pages are not immediately freed by the TTM shrinker but rather
inserted into the swap cache, the system will keep on calling the
shrinker, rapidly filling the swap cache, which has a negative impact
on system performance.

When shrinking, throttle on the number of pages present in the swap
cache.
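
Worked example of the bounds used below, assuming 4 KiB pages (a
standalone mock of the throttle arithmetic, not kernel code):

	#include <stdio.h>

	#define MIN_SWAP_PAGES ((128UL << 20) >> 12)     /* 32768 pages */
	#define MAX_SWAPCACHE_PAGES ((1UL << 30) >> 12)  /* 262144 pages */

	static unsigned long throttle(unsigned long pages,
				      unsigned long nr_free_swap,
				      unsigned long nr_swapcache)
	{
		unsigned long headroom = nr_free_swap > MIN_SWAP_PAGES ?
					 nr_free_swap - MIN_SWAP_PAGES : 0;

		if (headroom < pages)
			pages = headroom;
		return nr_swapcache > MAX_SWAPCACHE_PAGES ? 0 : pages;
	}

	int main(void)
	{
		/* 40000 free swap pages, 100000 swap-cache pages: the scan
		 * target is capped at 40000 - 32768 = 7232 pages. */
		printf("%lu\n", throttle(100000, 40000, 100000));
		return 0;
	}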

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_tt.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 5a57117c21ec..848adf2a623e 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -432,6 +432,42 @@ static unsigned long ttm_tt_shrinker_count(struct shrinker 
*shrink,
return num_pages ? num_pages : SHRINK_EMPTY;
 }
 
+#define TTM_SWAP_MIN_SWAP_PAGES (SZ_128M >> PAGE_SHIFT)
+#define TTM_SWAP_MAX_SWAPCACHE_PAGES (SZ_1G >> PAGE_SHIFT)
+static unsigned long ttm_tt_shrinker_throttle(unsigned long pages)
+{
+   unsigned long tmp = get_nr_swap_pages();
+
+   /*
+* Draining available swap space too far will trigger
+* systemd-oomd even if there are a huge number of dirty pages
+* available for laundry and free in the swap cache. Don't drain
+* the available swap-space too far.
+*/
+   if (tmp > TTM_SWAP_MIN_SWAP_PAGES)
+   tmp -= TTM_SWAP_MIN_SWAP_PAGES;
+   else
+   tmp = 0;
+
+   pages = min(tmp, pages);
+
+   /*
+* Our shrinker doesn't immediately free pages unless they belong
+* to purgeable objects. Rather they are inserted into the swap-cache.
+* But the system doesn't really get this and continues to call our
+* shrinker thinking it's still out of memory, when it could just
+* laundry pages in the swap cache and free them. So throttle on the
+* number of pages in the swap cache.
+*/
+
+   tmp = total_swapcache_pages();
+   if (tmp > TTM_SWAP_MAX_SWAPCACHE_PAGES)
+   pages = 0;
+
+   return pages;
+}
+
 static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
  struct shrink_control *sc)
 {
@@ -459,6 +495,10 @@ static unsigned long ttm_tt_shrinker_scan(struct shrinker 
*shrink,
nr_to_scan -= freed;
else
nr_to_scan = 0;
+
+   if (nr_to_scan)
+   nr_to_scan = ttm_tt_shrinker_throttle(nr_to_scan);
+
if (!nr_to_scan)
return freed ? freed : SHRINK_STOP;
 
-- 
2.34.1



[RFC PATCH 08/16] drm/ttm: Add a shrinker and shrinker accounting

2023-02-15 Thread Thomas Hellström
Register a TTM system memory-backed object shrinker and add
accounting for shrinkable and purgeable pages. For the shrinker to work,
the driver needs to register the bo_shrink callback, which is responsible
for unbinding from the GPU and the dma layer if needed.
callback to actually perform shrinking will be introduced in upcoming
patches.

Note that we can't lock the ttm_global_mutex from within the shrinker
scan() function as that might cause a deadlock issue. To fix that, add and
use a mutex which is used for global device list manipulation only and
make sure it isn't held when registering the shrinker.
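
The registration itself follows the usual shrinker pattern; a sketch
consistent with the fragments visible below (the scan callback appears
in the next patch, and the shrinker name string is an assumption):

	static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink,
						   struct shrink_control *sc)
	{
		long num_pages;

		read_lock(&shrinkable_lock);
		num_pages = shrinkable_pages + purgeable_pages;
		read_unlock(&shrinkable_lock);

		return num_pages ? num_pages : SHRINK_EMPTY;
	}

	int ttm_tt_mgr_init(void)
	{
		mm_shrinker.count_objects = ttm_tt_shrinker_count;
		mm_shrinker.scan_objects = ttm_tt_shrinker_scan;
		mm_shrinker.seeks = DEFAULT_SEEKS;

		return register_shrinker(&mm_shrinker, "drm-ttm-tt");
	}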

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_device.c |  26 ---
 drivers/gpu/drm/ttm/ttm_tt.c | 112 +--
 include/drm/ttm/ttm_tt.h |   2 +
 3 files changed, 125 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index e0a2be3ed13d..ce98752d2d32 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -36,10 +36,10 @@
 
 #include "ttm_module.h"
 
-/*
- * ttm_global_mutex - protecting the global state
- */
+/* ttm_global_mutex - protects the global state init and fini. */
 static DEFINE_MUTEX(ttm_global_mutex);
+/* ttm_global_list_mutex - protects the device list. */
+static DEFINE_MUTEX(ttm_global_list_mutex);
 static unsigned ttm_glob_use_count;
 struct ttm_global ttm_glob;
 EXPORT_SYMBOL(ttm_glob);
@@ -54,6 +54,7 @@ static void ttm_global_release(void)
if (--ttm_glob_use_count > 0)
goto out;
 
+   ttm_tt_mgr_fini();
ttm_pool_mgr_fini();
debugfs_remove(ttm_debugfs_root);
 
@@ -102,7 +103,10 @@ static int ttm_global_init(void)
goto out;
}
 
+   mutex_lock(&ttm_global_list_mutex);
INIT_LIST_HEAD(&glob->device_list);
+   mutex_unlock(&ttm_global_list_mutex);
+
atomic_set(&glob->bo_count, 0);
 
debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
			&glob->bo_count);
@@ -135,7 +139,7 @@ long ttm_global_swapout(struct ttm_operation_ctx *ctx,
struct ttm_device *bdev;
long ret = 0;
 
-   mutex_lock(&ttm_global_mutex);
+   mutex_lock(&ttm_global_list_mutex);
list_for_each_entry(bdev, &glob->device_list, device_list) {
ret = ttm_device_swapout(bdev, ctx, reason);
if (ret > 0) {
@@ -143,7 +147,7 @@ long ttm_global_swapout(struct ttm_operation_ctx *ctx,
break;
}
}
-   mutex_unlock(&ttm_global_mutex);
+   mutex_unlock(&ttm_global_list_mutex);
return ret;
 }
 
@@ -247,9 +251,9 @@ int ttm_device_init(struct ttm_device *bdev, struct 
ttm_device_funcs *funcs,
spin_lock_init(&bdev->lru_lock);
INIT_LIST_HEAD(&bdev->pinned);
bdev->dev_mapping = mapping;
-   mutex_lock(&ttm_global_mutex);
+   mutex_lock(&ttm_global_list_mutex);
list_add_tail(&bdev->device_list, &glob->device_list);
-   mutex_unlock(&ttm_global_mutex);
+   mutex_unlock(&ttm_global_list_mutex);
 
return 0;
 }
@@ -260,14 +264,14 @@ void ttm_device_fini(struct ttm_device *bdev)
struct ttm_resource_manager *man;
unsigned i;
 
+   mutex_lock(&ttm_global_list_mutex);
+   list_del(&bdev->device_list);
+   mutex_unlock(&ttm_global_list_mutex);
+
man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
ttm_resource_manager_set_used(man, false);
ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
 
-   mutex_lock(&ttm_global_mutex);
-   list_del(&bdev->device_list);
-   mutex_unlock(&ttm_global_mutex);
-
drain_workqueue(bdev->wq);
destroy_workqueue(bdev->wq);
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 771e5f3c2fee..5a57117c21ec 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "ttm_module.h"
@@ -54,6 +55,11 @@ module_param_named(dma32_pages_limit, ttm_dma32_pages_limit, 
ulong, 0644);
 static atomic_long_t ttm_pages_allocated;
 static atomic_long_t ttm_dma32_pages_allocated;
 
+static long shrinkable_pages;
+static long purgeable_pages;
+static DEFINE_RWLOCK(shrinkable_lock);
+static struct shrinker mm_shrinker;
+
 static bool ttm_tt_shrinkable(const struct ttm_device *bdev,
  const struct ttm_tt *tt)
 {
@@ -69,6 +75,14 @@ static void ttm_tt_mod_allocated(bool dma32, long value)
atomic_long_add(value, &ttm_dma32_pages_allocated);
 }
 
+static void ttm_tt_mod_shrinkable_pages(long shrinkable, long purgeable)
+{
+   write_lock(&shrinkable_lock);
+   shrinkable_pages += shrinkable;
+   purgeable_pages += purgeable;
+   write_unlock(&shrinkable_lock);
+}
+
 /*
  * Allocates a ttm structure for the given BO.
  */
@@ -352,6 +366,9 @@ int ttm_tt_populate(struct ttm_device *bdev,
}
}
 
+   if (ttm_tt_shrinkable(bdev, ttm))
+   ttm_tt_mod_shrinkable_pages(ttm->num_pages, 0);

[RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages

2023-02-15 Thread Thomas Hellström
When swapping out, we will split multi-order pages both in order to
move them to the swap-cache and to be able to return memory to the
swap cache as soon as possible on a page-by-page basis.
By reducing the page max order to the system PMD size, we can be nicer
to the system and avoid splitting gigantic pages. On top of this we also
include the 64K page size in the page sizes tried, since that appears to
be a common size for GPU applications.

Looking forward to when we might be able to swap out PMD size folios
without splitting, this will also be a benefit.
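
Worked example, assuming x86-64 with 4 KiB pages and a 2 MiB PMD:
TTM_MAX_ORDER becomes 9 and TTM_64K_ORDER becomes 4, so the tried
orders are {9, 4, 0}. A standalone mock of the selection loop below
shows how a 3 MiB (768-page) allocation is chunked into one order-9
block plus sixteen order-4 blocks:

	#include <stdio.h>

	static unsigned int orders[] = {9, 4, 0};

	static unsigned int fls_ul(unsigned long v)	/* mimics __fls() */
	{
		unsigned int r = 0;

		while (v >>= 1)
			++r;
		return r;
	}

	static unsigned int select_order(unsigned int order,
					 unsigned long num_pages)
	{
		unsigned int *cur = orders;

		if (fls_ul(num_pages) < order)
			order = fls_ul(num_pages);
		while (order < *cur)
			++cur;
		return *cur;
	}

	int main(void)
	{
		unsigned long num_pages = 768;
		unsigned int order = select_order(orders[0], num_pages);

		for (; num_pages; order = select_order(order, num_pages)) {
			printf("alloc order %u (%lu pages)\n", order, 1UL << order);
			num_pages -= 1UL << order;
		}
		return 0;
	}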

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_pool.c | 58 ++
 1 file changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 1cc7591a9542..8787fb6a218b 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -31,6 +31,8 @@
  * cause they are rather slow compared to alloc_pages+map.
  */
 
+#define pr_fmt(fmt) "[TTM POOL] " fmt
+
 #include 
 #include 
 #include 
@@ -47,6 +49,18 @@
 
 #include "ttm_module.h"
 
+#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
+#define TTM_64K_ORDER (16 - PAGE_SHIFT)
+#if (TTM_MAX_ORDER < TTM_64K_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER TTM_64K_ORDER
+#endif
+#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER (MAX_ORDER - 1)
+#endif
+#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
+
 /**
  * struct ttm_pool_dma - Helper object for coherent DMA mappings
  *
@@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
+static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
+
 /* Allocate pages of size 1 << order with the given gfp_flags */
 static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
unsigned int order)
@@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool *pool, struct 
ttm_tt *tt,
}
 }
 
+static unsigned int ttm_pool_select_order(unsigned int order, pgoff_t 
num_pages)
+{
+   unsigned int *cur_order = ttm_pool_orders;
+
+   order = min_t(unsigned int, __fls(num_pages), order);
+   while (order < *cur_order)
+   ++cur_order;
+
+   return *cur_order;
+}
+
 /**
  * ttm_pool_alloc - Fill a ttm_tt object
  *
@@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
else
gfp_flags |= GFP_HIGHUSER;
 
-   for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));
-num_pages;
-order = min_t(unsigned int, order, __fls(num_pages))) {
+   order = ttm_pool_select_order(ttm_pool_orders[0], num_pages);
+   for (; num_pages; order = ttm_pool_select_order(order, num_pages)) {
struct ttm_pool_type *pt;
 
page_caching = tt->caching;
@@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device 
*dev,
 
if (use_dma_alloc) {
for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-   for (j = 0; j < MAX_ORDER; ++j)
+   for (j = 0; j < TTM_DIM_ORDER; ++j)
ttm_pool_type_init(>caching[i].orders[j],
   pool, i, j);
}
@@ -578,7 +604,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
 
if (pool->use_dma_alloc) {
for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-   for (j = 0; j < MAX_ORDER; ++j)
+   for (j = 0; j < TTM_DIM_ORDER; ++j)
ttm_pool_type_fini(>caching[i].orders[j]);
}
 
@@ -632,7 +658,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)
unsigned int i;
 
seq_puts(m, "\t ");
-   for (i = 0; i < MAX_ORDER; ++i)
+   for (i = 0; i < TTM_DIM_ORDER; ++i)
seq_printf(m, " ---%2u---", i);
seq_puts(m, "\n");
 }
@@ -643,7 +669,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type 
*pt,
 {
unsigned int i;
 
-   for (i = 0; i < MAX_ORDER; ++i)
+   for (i = 0; i < TTM_DIM_ORDER; ++i)
seq_printf(m, " %8u", ttm_pool_type_count([i]));
seq_puts(m, "\n");
 }
@@ -749,10 +775,16 @@ int ttm_pool_mgr_init(unsigned long num_pages)

[RFC PATCH 06/16] drm/ttm: Don't use watermark accounting on shrinkable pools

2023-02-15 Thread Thomas Hellström
Clarify the meaning of the ttm_tt pages_limit watermarks as the max
number of pages not accessible by shrinkers, and update accordingly so that
memory allocated by TTM devices that support shrinking is not
accounted against those limits. In particular this means that devices
using the dma_alloc pool will still be using the watermark method.

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_device.c |  3 ++-
 drivers/gpu/drm/ttm/ttm_tt.c | 43 +++-
 include/drm/ttm/ttm_pool.h   | 15 +++
 3 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index a3cac42bb456..e0a2be3ed13d 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -168,7 +168,8 @@ long ttm_device_swapout(struct ttm_device *bdev, struct 
ttm_operation_ctx *ctx,
unsigned i;
long ret;
 
-   if (reason != TTM_SHRINK_WATERMARK && !bdev->funcs->bo_shrink)
+   if (reason != TTM_SHRINK_WATERMARK &&
+   (!bdev->funcs->bo_shrink || !ttm_pool_can_shrink(&bdev->pool)))
return 0;
 
spin_lock(>lru_lock);
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index a68c14de0161..771e5f3c2fee 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -54,6 +54,21 @@ module_param_named(dma32_pages_limit, ttm_dma32_pages_limit, 
ulong, 0644);
 static atomic_long_t ttm_pages_allocated;
 static atomic_long_t ttm_dma32_pages_allocated;
 
+static bool ttm_tt_shrinkable(const struct ttm_device *bdev,
+ const struct ttm_tt *tt)
+{
+   return !!bdev->funcs->bo_shrink &&
+   ttm_pool_can_shrink(&bdev->pool) &&
+   !(tt->page_flags & TTM_TT_FLAG_EXTERNAL);
+}
+
+static void ttm_tt_mod_allocated(bool dma32, long value)
+{
+   atomic_long_add(value, _pages_allocated);
+   if (dma32)
+   atomic_long_add(value, _dma32_pages_allocated);
+}
+
 /*
  * Allocates a ttm structure for the given BO.
  */
@@ -304,12 +319,9 @@ int ttm_tt_populate(struct ttm_device *bdev,
if (ttm_tt_is_populated(ttm))
return 0;
 
-   if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
-   atomic_long_add(ttm->num_pages, _pages_allocated);
-   if (bdev->pool.use_dma32)
-   atomic_long_add(ttm->num_pages,
-   _dma32_pages_allocated);
-   }
+   if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL) &&
+   !ttm_tt_shrinkable(bdev, ttm))
+   ttm_tt_mod_allocated(bdev->pool.use_dma32, ttm->num_pages);
 
while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
   atomic_long_read(&ttm_dma32_pages_allocated) >
   ttm_dma32_pages_limit) {
@@ -343,12 +355,10 @@ int ttm_tt_populate(struct ttm_device *bdev,
return 0;
 
 error:
-   if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
-   atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-   if (bdev->pool.use_dma32)
-   atomic_long_sub(ttm->num_pages,
-   &ttm_dma32_pages_allocated);
-   }
+   if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL) &&
+   !ttm_tt_shrinkable(bdev, ttm))
+   ttm_tt_mod_allocated(bdev->pool.use_dma32, -(long)ttm->num_pages);
+
return ret;
 }
 EXPORT_SYMBOL(ttm_tt_populate);
@@ -363,12 +373,9 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct 
ttm_tt *ttm)
else
ttm_pool_free(&bdev->pool, ttm);
 
-   if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
-   atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-   if (bdev->pool.use_dma32)
-   atomic_long_sub(ttm->num_pages,
-   &ttm_dma32_pages_allocated);
-   }
+   if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL) &&
+   !ttm_tt_shrinkable(bdev, ttm))
+   ttm_tt_mod_allocated(bdev->pool.use_dma32, -(long)ttm->num_pages);
 
ttm->page_flags &= ~TTM_TT_FLAG_PRIV_POPULATED;
 }
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index ef09b23d29e3..c1200552892e 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -89,4 +89,19 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file 
*m);
 int ttm_pool_mgr_init(unsigned long num_pages);
 void ttm_pool_mgr_fini(void);
 
+/**
+ * ttm_pool_can_shrink - Whether page allocations from this pool are shrinkable
+ * @pool: The pool.
+ *
+ * Return: true if shrinkable, false if not.
+ */
+static inline bool ttm_pool_can_shrink(const struct ttm_pool *pool)
+{
+   /*
+* The dma_alloc pool pages can't be inserted into the
+* swap cache. Nor can they be split.
+*/
+   return !pool->use_dma_alloc;
+}
+
 #endif
-- 
2.34.1



[RFC PATCH 05/16] drm/ttm: Unexport ttm_global_swapout()

2023-02-15 Thread Thomas Hellström
Unexport ttm_global_swapout() since it is not used outside of TTM.

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_device.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 7eadea07027f..a3cac42bb456 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -146,7 +146,6 @@ long ttm_global_swapout(struct ttm_operation_ctx *ctx,
mutex_unlock(&ttm_global_mutex);
return ret;
 }
-EXPORT_SYMBOL(ttm_global_swapout);
 
 /**
  * ttm_device_swapout() - Select and swap out a system-memory-backed bo.
-- 
2.34.1



[RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface

2023-02-15 Thread Thomas Hellström
Update the TTM swapout interfaces for better compatibility with a shrinker.
- Replace number-of-pages int return with a long to better match the
  kernel's shrinker interface.
- The gfp_flags parameter to ttm_xx_swapout() currently only takes the
  GFP_KERNEL value and shouldn't really be needed since the shrinker we
  hook up in upcoming patches sets an allocation context to match reclaim.
- Introduce a shrink reason enumeration and a driver callback to shrink
  buffer objects.
  The TTM_SHRINK_WATERMARK reason is going to still be handled using the
  existing shmem copy, and will be used by pool types that don't lend
  themselves well to shrinking (dma_alloc pool) and when drivers explicitly
  request swapout.
  The TTM_SHRINK_SWAP and TTM_SHRINK_PURGE reasons originate from a
  shrinker and are to be handled by a new driver callback, bo_shrink().
  Helpers for the new driver callback are provided in upcoming patches.
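
  A driver-side bo_shrink() might then look like this sketch (the unbind
  and purge helpers are hypothetical stand-ins for the helpers added
  later in the series):

	static long foo_bo_shrink(struct ttm_buffer_object *bo,
				  struct ttm_operation_ctx *ctx)
	{
		long ret;

		/* Unbind from the GPU and the dma layer first. */
		ret = foo_bo_unbind(bo);	/* hypothetical */
		if (ret)
			return ret;

		/* Purgeable content is freed outright, the rest is swapped. */
		if (ttm_tt_purgeable(bo->ttm))
			return foo_tt_purge(bo->bdev, bo->ttm);	/* hypothetical */

		return ttm_tt_swapout(bo->bdev, bo->ttm);
	}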

Cc: linux-graphics-maintai...@vmware.com
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_bo.c| 38 
 drivers/gpu/drm/ttm/ttm_device.c| 55 +
 drivers/gpu/drm/ttm/ttm_tt.c| 23 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  3 +-
 include/drm/ttm/ttm_bo.h|  4 +--
 include/drm/ttm/ttm_device.h| 36 +--
 include/drm/ttm/ttm_tt.h| 17 +++--
 7 files changed, 136 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 882c2fa346f3..e5c0970564c0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1114,13 +1114,29 @@ int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, 
struct ttm_operation_ctx *ctx)
 }
 EXPORT_SYMBOL(ttm_bo_wait_ctx);
 
-int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
-  gfp_t gfp_flags)
+/**
+ * ttm_bo_swapout() - Swap out or purge a buffer object
+ * @bo: The buffer object.
+ * @ctx: The ttm operation context.
+ * @reason: The swapout reason.
+ *
+ * Try to swap out or purge the contents of a system memory backed buffer
+ * object. The function needs to be called with the device's LRU lock held.
+ *
+ * Return: -EBUSY if the bo lock could not be grabbed or the object was
+ * otherwise busy. Otherwise the number of pages swapped out or negative
+ * error code on error. Iff the function didn't return -EBUSY, the
+ * LRU lock was dropped, and LRU traversal needs to restart.
+ */
+long ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
+   enum ttm_shrink_reason reason)
 {
struct ttm_place place;
bool locked;
long ret;
 
+   lockdep_assert_held(&bo->bdev->lru_lock);
+
/*
 * While the bo may already reside in SYSTEM placement, set
 * SYSTEM as new placement to cover also the move further below.
@@ -1142,8 +1158,12 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
}
 
if (bo->deleted) {
+   long num_pages = bo->ttm->num_pages;
+
ret = ttm_bo_cleanup_refs(bo, false, false, locked);
ttm_bo_put(bo);
+   if (!ret)
+   return num_pages;
return ret == -EBUSY ? -ENOSPC : ret;
}
 
@@ -1184,13 +1204,17 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct 
ttm_operation_ctx *ctx,
 * Swap out. Buffer will be swapped in again as soon as
 * anyone tries to access a ttm page.
 */
-   if (bo->bdev->funcs->swap_notify)
-   bo->bdev->funcs->swap_notify(bo);
+   if (bo->bdev->funcs->bo_shrink && reason != TTM_SHRINK_WATERMARK) {
+   ret = bo->bdev->funcs->bo_shrink(bo, ctx);
+   } else {
+   if (bo->bdev->funcs->swap_notify)
+   bo->bdev->funcs->swap_notify(bo);
+   ret = ttm_tt_swapout(bo->bdev, bo->ttm);
+   if (!ret)
+   ret = bo->ttm->num_pages;
+   }
 
-   if (ttm_tt_is_populated(bo->ttm))
-   ret = ttm_tt_swapout(bo->bdev, bo->ttm, gfp_flags);
 out:
-
/*
 * Unreserve without putting on LRU to avoid swapping out an
 * already swapped buffer.
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index ae2f19dc9f81..7eadea07027f 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -116,19 +116,28 @@ static int ttm_global_init(void)
return ret;
 }
 
-/*
- * A buffer object shrink method that tries to swap out the first
- * buffer object on the global::swap_lru list.
+/**
+ * ttm_global_swapout() - Select and swap out a system-memory-backed bo.
+ * @ctx: The operation context.
+ * @reason: The reason for swapout.
+ *
+ * Select, based on round-robin, a TTM device and traverse the LRUs of
+ * that specific device until a suitable bo backed by system memory is
+ * found, and swap it out or purge it.

[RFC PATCH 03/16] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs

2023-02-15 Thread Thomas Hellström
New code is recommended to use the BIT macro instead of the explicit
shifts. Change the older defines so that we can keep the style consistent
with upcoming changes.

Signed-off-by: Thomas Hellström 
---
 include/drm/ttm/ttm_tt.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index b7d3f3843f1e..cc54be1912e1 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -83,12 +83,12 @@ struct ttm_tt {
 * set by TTM after ttm_tt_populate() has successfully returned, and is
 * then unset when TTM calls ttm_tt_unpopulate().
 */
-#define TTM_TT_FLAG_SWAPPED		(1 << 0)
-#define TTM_TT_FLAG_ZERO_ALLOC		(1 << 1)
-#define TTM_TT_FLAG_EXTERNAL		(1 << 2)
-#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	(1 << 3)
+#define TTM_TT_FLAG_SWAPPED		BIT(0)
+#define TTM_TT_FLAG_ZERO_ALLOC		BIT(1)
+#define TTM_TT_FLAG_EXTERNAL		BIT(2)
+#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	BIT(3)
 
-#define TTM_TT_FLAG_PRIV_POPULATED	(1U << 31)
+#define TTM_TT_FLAG_PRIV_POPULATED	BIT(31)
uint32_t page_flags;
/** @num_pages: Number of pages in the page array. */
uint32_t num_pages;
-- 
2.34.1



[RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path

2023-02-15 Thread Thomas Hellström
When hitting an error, the error path forgot to unmap dma mappings and
could call set_pages_wb() on already uncached pages.

Fix this by introducing a common __ttm_pool_free() function that
does the right thing.

Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
Cc: Christian König 
Cc: Dave Airlie 
Cc: Madhav Chauhan 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_pool.c | 74 +-
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index aa116a7bbae3..1cc7591a9542 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
return 0;
 }
 
+static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,
+   struct page **caching_divide,
+   enum ttm_caching initial_caching,
+   enum ttm_caching subseq_caching,
+   pgoff_t num_pages)
+{
+   enum ttm_caching caching = subseq_caching;
+   struct page **pages = tt->pages;
+   unsigned int order;
+   pgoff_t i, nr;
+
+   if (pool && caching_divide)
+   caching = initial_caching;
+
+   for (i = 0; i < num_pages; i += nr, pages += nr) {
+   struct ttm_pool_type *pt = NULL;
+
+   if (unlikely(caching_divide == pages))
+   caching = subseq_caching;
+
+   order = ttm_pool_page_order(pool, *pages);
+   nr = (1UL << order);
+   if (tt->dma_address)
+   ttm_pool_unmap(pool, tt->dma_address[i], nr);
+
+   pt = ttm_pool_select_type(pool, caching, order);
+   if (pt)
+   ttm_pool_type_give(pt, *pages);
+   else
+   ttm_pool_free_page(pool, caching, order, *pages);
+   }
+}
+
 /**
  * ttm_pool_alloc - Fill a ttm_tt object
  *
@@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
dma_addr_t *dma_addr = tt->dma_address;
struct page **caching = tt->pages;
struct page **pages = tt->pages;
+   enum ttm_caching page_caching;
gfp_t gfp_flags = GFP_USER;
-   unsigned int i, order;
+   unsigned int order;
struct page *p;
int r;
 
@@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 order = min_t(unsigned int, order, __fls(num_pages))) {
struct ttm_pool_type *pt;
 
+   page_caching = tt->caching;
pt = ttm_pool_select_type(pool, tt->caching, order);
p = pt ? ttm_pool_type_take(pt) : NULL;
if (p) {
@@ -418,6 +453,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
if (r)
goto error_free_page;
 
+   caching = pages;
do {
r = ttm_pool_page_allocated(pool, order, p,
						    &dma_addr,
@@ -426,14 +462,15 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
if (r)
goto error_free_page;
 
+   caching = pages;
if (num_pages < (1 << order))
break;
 
p = ttm_pool_type_take(pt);
} while (p);
-   caching = pages;
}
 
+   page_caching = ttm_cached;
while (num_pages >= (1 << order) &&
   (p = ttm_pool_alloc_page(pool, gfp_flags, order))) {
 
@@ -442,6 +479,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
   tt->caching);
if (r)
goto error_free_page;
+   caching = pages;
}
		r = ttm_pool_page_allocated(pool, order, p, &dma_addr,
					    &num_pages, &pages);
@@ -468,15 +506,12 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
return 0;
 
 error_free_page:
-   ttm_pool_free_page(pool, tt->caching, order, p);
+   ttm_pool_free_page(pool, page_caching, order, p);
 
 error_free_all:
num_pages = tt->num_pages - num_pages;
-   for (i = 0; i < num_pages; ) {
-   order = ttm_pool_page_order(pool, tt->pages[i]);
-   ttm_pool_free_page(pool, tt->caching, order, tt->pages[i]);
-   i += 1 << order;
-   }
+   

[RFC PATCH 01/16] drm/ttm: Fix a NULL pointer dereference

2023-02-15 Thread Thomas Hellström
The LRU mechanism may look up a resource in the process of being removed
from an object. The locking rules here are a bit unclear but it looks
currently like res->bo assignment is protected by the LRU lock, whereas
bo->resource is protected by the object lock, while *clearing* of
bo->resource is also protected by the LRU lock. This means that if
we check that bo->resource points to the LRU resource under the LRU
lock we should be safe.
So perform that check before deciding to swap out a bo. That avoids
dereferencing a NULL bo->resource in ttm_bo_swapout().

Fixes: 6a9b02899402 ("drm/ttm: move the LRU into resource handling v4")
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Philip Yang 
Cc: Qiang Yu 
Cc: Matthew Auld 
Cc: Nirmoy Das 
Cc: Tvrtko Ursulin 
Cc: "Thomas Hellström" 
Cc: Anshuman Gupta 
Cc: Ramalingam C 
Cc: Arunpravin Paneer Selvam 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index c7a1862f322a..ae2f19dc9f81 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -158,7 +158,7 @@ int ttm_device_swapout(struct ttm_device *bdev, struct 
ttm_operation_ctx *ctx,
struct ttm_buffer_object *bo = res->bo;
uint32_t num_pages;
 
-   if (!bo)
+   if (!bo || bo->resource != res)
continue;
 
num_pages = PFN_UP(bo->base.size);
-- 
2.34.1



[RFC PATCH 00/16] Add a TTM shrinker

2023-02-15 Thread Thomas Hellström
This series introduces a TTM shrinker.

Currently the TTM subsystem allows a certain watermark fraction of
system memory to be pinned by GPUs. Any allocation beyond that will
cause TTM to attempt to copy memory to shmem objects for possible
later swapout so that that fraction is fulfilled. That unnecessarily
happens also on systems where swapping is not available, but still
works reasonably well in many cases.

However there is no way for the system to swap out all of graphics
memory even in situations where graphics processes are suspended.

So add a TTM shrinker capable of moving graphics memory pages to the
swap cache for later laundering and freeing, and, in case there is no
swap available, freeing graphics memory that is kept around for
caching purposes.

For devices where the shrinker is active, the watermark fraction is
disabled, but it is kept around for devices that don't (yet) support
shrinking or that use dma_alloc'ed memory, which we can't insert into
the swap-cache.

Each driver needs to implement a callback to enable the shrinker for
its devices. Enable it for i915 as a POC. Will also be used by the
new Intel xe driver if accepted.
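
For context, the driver callback sits on top of the kernel's generic
shrinker interface; a minimal sketch of such a registration is below,
where shrinkable_pages and ttm_try_shrink() are illustrative stand-ins
rather than the series' actual code:

  static unsigned long ttm_shrink_count(struct shrinker *shrink,
					struct shrink_control *sc)
  {
	/* Report how many pages could plausibly be freed. */
	return shrinkable_pages ?: SHRINK_EMPTY;
  }

  static unsigned long ttm_shrink_scan(struct shrinker *shrink,
				       struct shrink_control *sc)
  {
	/* Try to free up to sc->nr_to_scan pages, return the number freed. */
	return ttm_try_shrink(sc->nr_to_scan);
  }

  static struct shrinker ttm_shrinker = {
	.count_objects	= ttm_shrink_count,
	.scan_objects	= ttm_shrink_scan,
	.seeks		= DEFAULT_SEEKS,
  };

  /* At init time: */
  int err = register_shrinker(&ttm_shrinker, "drm-ttm");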

The parts of the series mostly needing consideration and feedback are

*) The mm part, inserting pages into the swap-cache. Is it acceptable and,
   if so, correct? It *might* be possible we can do without this part,
   but then we'd have to be able to call read_mapping_page() and
   trylock_page() on non-isolated shmem pages from reclaim context,
   and need to be able to recover from failures.

*) The TTM driver callback for shrinking

*) The additional TTM functions to mark buffer-objects as not needed, but
   good to have around for caching purposes.

*) Swapin doesn't lose content on error and is also interruptible or at
   least killable ATM. This complicates helpers. Should we
   drop this and just drop content on error, and wait for swapin
   uninterruptible? The TTM pool code could indeed do without additional
   complication...

*) Is there a better way to do shrink throttling to avoid filling the
   swap-cache completely?

*) Is it good enough for real-world workloads?

The series has been tested using the i915 driver with a 4GiB
VRAM DG1 on a system with 14GiB system memory and 16GiB SSD Swap, and using
an old igt-gpu-tools version, 8c0bb07b7b4d, of gem_lmem_swapping
which overcommits system memory quite extensively

Patch walkthrough:

Initial bugfixes, could be decoupled from the series.
drm/ttm: Fix a NULL pointer dereference.
drm/ttm/pool: Fix ttm_pool_alloc error path.

Cleanups and restructuring:
drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
drm/ttm, drm/vmwgfx: Update the TTM swapout interface
drm/ttm: Unexport ttm_global_swapout()

Adding shrinker without enabling it:
drm/ttm: Don't use watermark accounting on shrinkable pools
drm/ttm: Reduce the number of used allocation orders for TTM pages
drm/ttm: Add a shrinker and shrinker accounting
drm/ttm: Introduce shrink throttling
drm/ttm: Remove pinned bos from shrinkable accounting
drm/ttm: Add a simple api to set/ clear purgeable ttm_tt content

Adding the core mm part to insert and read-back pages from the swap-cache:
mm: Add interfaces to back up and recover folio contents using swap.

TTM helpers for shrinking:
drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting.
drm/ttm: Provide helpers for shrinking.
drm/ttm: Use fault-injection to test error paths.

Enable i915:
drm/i915, drm/ttm: Use the TTM shrinker rather than the external shmem pool

Any feedback greatly appreciated.
Thomas

Cc: Andrew Morton 
Cc: "Matthew Wilcox (Oracle)" 
Cc: Miaohe Lin 
Cc: David Hildenbrand 
Cc: Johannes Weiner 
Cc: Peter Xu 
Cc: NeilBrown 
Cc: Daniel Vetter 
Cc: Christian Koenig 
Cc: Dave Airlie 
Cc: 
Cc: 
Cc: 


Thomas Hellström (16):
  drm/ttm: Fix a NULL pointer dereference
  drm/ttm/pool: Fix ttm_pool_alloc error path
  drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
  drm/ttm, drm/vmwgfx: Update the TTM swapout interface
  drm/ttm: Unexport ttm_global_swapout()
  drm/ttm: Don't use watermark accounting on shrinkable pools
  drm/ttm: Reduce the number of used allocation orders for TTM pages
  drm/ttm: Add a shrinker and shrinker accounting
  drm/ttm: Introduce shrink throttling.
  drm/ttm: Remove pinned bos from shrinkable accounting
  drm/ttm: Add a simple api to set / clear purgeable ttm_tt content
  mm: Add interfaces to back up and recover folio contents using swap
  drm/ttm: Make the call to ttm_tt_populate() interruptible when
faulting
  drm/ttm: Provide helpers for shrinking
  drm/ttm: Use fault-injection to test error paths
  drm/i915, drm/ttm: Use the TTM shrinker rather than the external shmem
pool

 drivers/gpu/drm/Kconfig   |  11 +
 drivers/gpu/drm/i915/gem/i915_gem_object.h|   6 -
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   6 -
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |   5 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 273 ++---
 

Re: [PATCH v2 1/8] accel/qaic: Add documentation for AIC100 accelerator driver

2023-02-15 Thread Jeffrey Hugo

On 2/14/2023 4:08 AM, Jacek Lawrynowicz wrote:

Hi,


Thank you for the review.


On 06.02.2023 16:41, Jeffrey Hugo wrote:

The Qualcomm Cloud AI 100 (AIC100) device is an Artificial Intelligence
accelerator PCIe card.  It contains a number of components both in the
SoC and on the card which facilitate running workloads:

QSM: management processor
NSPs: workload compute units
DMA Bridge: dedicated data mover for the workloads
MHI: multiplexed communication channels
DDR: workload storage and memory

The Linux kernel driver for AIC100 is called "QAIC" and is located in the
accel subsystem.

Signed-off-by: Jeffrey Hugo 
Reviewed-by: Carl Vanderlip 
---
  Documentation/accel/index.rst   |   1 +
  Documentation/accel/qaic/aic100.rst | 498 
  Documentation/accel/qaic/index.rst  |  13 +
  Documentation/accel/qaic/qaic.rst   | 169 
  4 files changed, 681 insertions(+)
  create mode 100644 Documentation/accel/qaic/aic100.rst
  create mode 100644 Documentation/accel/qaic/index.rst
  create mode 100644 Documentation/accel/qaic/qaic.rst

diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index 2b43c9a..e94a016 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -8,6 +8,7 @@ Compute Accelerators
    :maxdepth: 1
 
    introduction
+   qaic/index
 
 .. only::  subproject and html
  
diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst

new file mode 100644
index 0000000..773aa54
--- /dev/null
+++ b/Documentation/accel/qaic/aic100.rst
@@ -0,0 +1,498 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+===============================
+ Qualcomm Cloud AI 100 (AIC100)
+===============================
+
+Overview
+========
+
+The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
+Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
+the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
+inference workloads.  They are AI accelerators.


There are multiple double spaces in this document like this one above.


I presume you are referring to the double space after a period. 
Universally, that was the recommended style (APA guidebook, etc) until a 
little while ago.  Old habits are hard to break.  Will scrub.





+The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
+(x8).  An individual SoC on a card can have up to 16 NSPs for running workloads.
+Each SoC has an A53 management CPU.  On card, there can be up to 32 GB of DDR.
+
+Multiple AIC100 cards can be hosted in a single system to scale overall
+performance.
+
+Hardware Description
+====================
+
+An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
+peripherals (PMICs, etc).
+
+An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
+or a Dual M.2 card.  Both use PCIe to connect to the host system.


Dual M.2 card? Is it a single PCB with two M.2 connectors? This requires a custom
motherboard with x4 lanes from two connectors combined as a single PCIe device,
right?


Yes.  There is a specification for this, although it hasn't gotten 
widespread adoption.  In addition to more lanes, you also get to draw 
more power.  Single M.2 is around 11W.  Dual M.2 is capped at 25W.


It tends to be a handy form factor for "edge" applications where the 
physical size and power draw of a "normal" PCIe slot (what you'd find on 
a regular ATX motherboard) is not desirable.





+As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
+DeviceID(DID) combination to uniquely identify itself to the host.  AIC100
+uses the standard Qualcomm VID (0x17cb).  All AIC100 instances use the same
+AIC100 DID (0xa100).


Maybe "SKUs" would fit better here then "instances".


Sure.




+AIC100 does not implement FLR (function level reset).
+
+AIC100 implements MSI but does not implement MSI-X.  AIC100 requires 17 MSIs to
+operate (1 for MHI, 16 for the DMA Bridge).
+
+As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
+hardware.  AIC100 provides 3, 64-bit BARs.
+
+* The first BAR is 4K in size, and exposes the MHI interface to the host.
+
+* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
+  host.
+
+* The third BAR is variable in size based on an individual AIC100's
+  configuration, but defaults to 64K.  This BAR currently has no purpose.
+
+From the host perspective, AIC100 has several key hardware components-


Typo in "components-".


?
You want "components -"?




+* QSM (QAIC Service Manager)
+* NSPs (Neural Signal Processor)
+* DMA Bridge
+* DDR
+* MHI (Modem Host Interface)
+
+QSM
+---
+
+QAIC Service Manager.  This is an ARM A53 CPU that runs the primary
+firmware of the card and performs on-card management tasks.  It also
+communicates with the host via MHI.  Each AIC100 has one of
+these.


I would put description of MHI at the top 

Re: [Intel-gfx] [PATCH] Revert "drm/i915/hwmon: Enable PL1 power limit"

2023-02-15 Thread Jani Nikula
On Wed, 08 Feb 2023, Rodrigo Vivi  wrote:
> On Wed, Feb 08, 2023 at 11:03:12AM -0800, Ashutosh Dixit wrote:
>> This reverts commit 0349c41b05968befaffa5fbb7e73d0ee6004f610.
>> 
>> 0349c41b0596 ("drm/i915/hwmon: Enable PL1 power limit") is incorrect and
>> caused a major regression on ATSM. The change enabled the PL1 power limit
>> but FW sets the default value of the PL1 limit to 0 which implies HW now
>> works at minimum power and therefore the lowest effective frequency. This
>> means all workloads now run slower resulting in even GuC FW load operations
>> timing out, rendering ATSM unusable.
>> 
>> A different solution to the original issue of the PL1 limit being disabled
>> on ATSM is needed but till that is developed, revert 0349c41b0596.
>> 
>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
>
> pushed to drm-intel-next and removed from drm-intel-fixes.
>
> Thanks for the quick reaction.

Please always add Fixes: tags also to reverts.
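
For this revert that would have been e.g.:

  Fixes: 0349c41b0596 ("drm/i915/hwmon: Enable PL1 power limit")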

I suppose we should fix dim to also detect reverts, but I ended up
cherry-picking and pushing the original commit out to
drm-intel-next-fixes before realizing it's been reverted.


BR,
Jani.


>
>> Signed-off-by: Ashutosh Dixit 
>> ---
>>  drivers/gpu/drm/i915/i915_hwmon.c | 5 -
>>  1 file changed, 5 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
>> index 4683a5b96eff1..1225bc432f0d5 100644
>> --- a/drivers/gpu/drm/i915/i915_hwmon.c
>> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
>> @@ -687,11 +687,6 @@ hwm_get_preregistration_info(struct drm_i915_private *i915)
>>  for_each_gt(gt, i915, i)
>>  hwm_energy(>ddat_gt[i], );
>>  }
>> -
>> -/* Enable PL1 power limit */
>> -if (i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
>> -		hwm_locked_with_pm_intel_uncore_rmw(ddat, hwmon->rg.pkg_rapl_limit,
>> -						    PKG_PWR_LIM_1_EN, PKG_PWR_LIM_1_EN);
>>  }
>>  
>>  void i915_hwmon_register(struct drm_i915_private *i915)
>> -- 
>> 2.38.0
>> 

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: Question: partial transfers of DMABUFs

2023-02-15 Thread Christian König

Am 15.02.23 um 14:52 schrieb Paul Cercueil:

Le mercredi 15 février 2023 à 14:46 +0100, Christian König a écrit :

Am 15.02.23 um 14:24 schrieb Paul Cercueil:

Hi Christian,

Le mercredi 15 février 2023 à 13:58 +0100, Christian König a
écrit :

Hi Paul,

Am 15.02.23 um 11:48 schrieb Paul Cercueil:

Hi,

I am working on adding support for DMABUFs in the IIO
subsystem.

One thing we want there, is the ability to specify the number
of
bytes
to transfer (while still defaulting to the DMABUF size).

Since dma_buf_map_attachment() returns a sg_table,

Please don't assume that this is an sg_table. We just used it as
a container for DMA addresses, but this has proven to be a mistake.

TL/DR, why was it a mistake? Just curious.

The sg_table should have just contained DMA addresses, but we had
multiple people who tried to use the pages instead.

This works to some extent, but goes boom as soon as somebody messes
with
the pages reference counts or tries to map it into an address space
or
something like that.

We got so far that we now intentionally mangle the page addresses in
the
sg_table to prevent people from using it:
https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/dma-buf.c#L763

Isn't that breaking the chains though? I'd expect page_link to be
mangled only if !sg_is_chain(sg).


Those are filtered out by for_each_sgtable_sg if I'm not completely 
mistaken.
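
For reference, the debug mangling boils down to roughly this (under
CONFIG_DMABUF_DEBUG, with sgt being the table handed to the importer):

  struct scatterlist *sg;
  int i;

  /* for_each_sgtable_sg() walks via sg_next(), which never yields the
   * chain entries themselves, so chain links stay intact; the ~0xffUL
   * mask additionally preserves the low SG_ bits of visited entries. */
  for_each_sgtable_sg(sgt, sg, i)
	sg->page_link ^= ~0xffUL;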



There is work underway to replace the sg_table with (for example)
just
an array of DMA addresses.

Ok, so I believe at some point we will need an equivalent of
dmaengine_prep_slave_sg() which takes an array of DMA addresses.

Well we will probably come up with a new container for this, but
yeah.

Understood.

You said there was work underway, could you point me to the
corresponding mailing list threads and/or code?


That's not really released yet. We just discussed it a bit when Daniel 
added the sg_table mangling after this went boom for the third time so :)


Just use git blame to find the patch of the mangling and read up on the 
mailing list discussion around that.


Regards,
Christian.




Regards,
Christian.

Cheers,
-Paul


I basically have two options, and I can't decide which one is
the
best (or the less ugly):

- Either I add a new API function similar to
dmaengine_prep_slave_sg(),
which still takes a scatterlist as argument but also takes the
number
of bytes as argument;

- Or I add a function to duplicate the scatterlist and then
shrink
it
manually, which doesn't sound like a good idea either.

What would be the recommended way?

I strongly recommend to come up with a new function which only
takes
DMA
addresses and separate segment length.

Alright, thanks for your input.

So I would add a new dma_device.dma_prep_slave_dma_array() callback
with a corresponding API function, and then the drivers can be
converted from using .dma_prep_slave_sg() to this new function in
due
time.

Vinod, that works for you?

Cheers,
-Paul




Re: Question: partial transfers of DMABUFs

2023-02-15 Thread Paul Cercueil
Le mercredi 15 février 2023 à 14:46 +0100, Christian König a écrit :
> Am 15.02.23 um 14:24 schrieb Paul Cercueil:
> > Hi Christian,
> > 
> > Le mercredi 15 février 2023 à 13:58 +0100, Christian König a
> > écrit :
> > > Hi Paul,
> > > 
> > > Am 15.02.23 um 11:48 schrieb Paul Cercueil:
> > > > Hi,
> > > > 
> > > > I am working on adding support for DMABUFs in the IIO
> > > > subsystem.
> > > > 
> > > > One thing we want there, is the ability to specify the number
> > > > of
> > > > bytes
> > > > to transfer (while still defaulting to the DMABUF size).
> > > > 
> > > > Since dma_buf_map_attachment() returns a sg_table,
> > > Please don't assume that this is an sg_table. We just used it as
> > > a container for DMA addresses, but this has proven to be a mistake.
> > TL/DR, why was it a mistake? Just curious.
> 
> The sg_table should have just contained DMA addresses, but we had 
> multiple people who tried to use the pages instead.
> 
> This works to some extent, but goes boom as soon as somebody messes
> with 
> the pages reference counts or tries to map it into an address space
> or 
> something like that.
> 
> We got so far that we now intentionally mangle the page addresses in
> the 
> sg_table to prevent people from using it: 
> https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/dma-buf.c#L763

Isn't that breaking the chains though? I'd expect page_link to be
mangled only if !sg_is_chain(sg).

> > > There is work underway to replace the sg_table with (for example)
> > > just
> > > an array of DMA addresses.
> > Ok, so I believe at some point we will need an equivalent of
> > dmaengine_prep_slave_sg() which takes an array of DMA addresses.
> 
> Well we will probably come up with a new container for this, but
> yeah.

Understood.

You said there was work underway, could you point me to the
corresponding mailing list threads and/or code?

> Regards,
> Christian.

Cheers,
-Paul

> > 
> > > > I basically have two options, and I can't decide which one is
> > > > the
> > > > best (or the less ugly):
> > > > 
> > > > - Either I add a new API function similar to
> > > > dmaengine_prep_slave_sg(),
> > > > which still takes a scatterlist as argument but also takes the
> > > > number
> > > > of bytes as argument;
> > > > 
> > > > - Or I add a function to duplicate the scatterlist and then
> > > > shrink
> > > > it
> > > > manually, which doesn't sound like a good idea either.
> > > > 
> > > > What would be the recommended way?
> > > I strongly recommend to come up with a new function which only
> > > takes
> > > DMA
> > > addresses and separate segment length.
> > Alright, thanks for your input.
> > 
> > So I would add a new dma_device.dma_prep_slave_dma_array() callback
> > with a corresponding API function, and then the drivers can be
> > converted from using .dma_prep_slave_sg() to this new function in
> > due
> > time.
> > 
> > Vinod, that works for you?
> > 
> > Cheers,
> > -Paul
> 



Re: Question: partial transfers of DMABUFs

2023-02-15 Thread Christian König

Am 15.02.23 um 14:24 schrieb Paul Cercueil:

Hi Christian,

Le mercredi 15 février 2023 à 13:58 +0100, Christian König a écrit :

Hi Paul,

Am 15.02.23 um 11:48 schrieb Paul Cercueil:

Hi,

I am working on adding support for DMABUFs in the IIO subsystem.

One thing we want there, is the ability to specify the number of
bytes
to transfer (while still defaulting to the DMABUF size).

Since dma_buf_map_attachment() returns a sg_table,

Please don't assume that this is an sg_table. We just used it as
a container for DMA addresses, but this has proven to be a mistake.

TL/DR, why was it a mistake? Just curious.


The sg_table should have just contained DMA addresses, but we had 
multiple people who tried to use the pages instead.


This works to some extent, but goes boom as soon as somebody messes with 
the pages reference counts or tries to map it into an address space or 
something like that.


We got so far that we now intentionally mangle the page addresses in the 
sg_table to prevent people from using it: 
https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/dma-buf.c#L763



There is work underway to replace the sg_table with (for example)
just
an array of DMA addresses.

Ok, so I believe at some point we will need an equivalent of
dmaengine_prep_slave_sg() which takes an array of DMA addresses.


Well we will probably come up with a new container for this, but yeah.

Regards,
Christian.




I basically have two options, and I can't decide which one is the
best (or the less ugly):

- Either I add a new API function similar to
dmaengine_prep_slave_sg(),
which still takes a scatterlist as argument but also takes the
number
of bytes as argument;

- Or I add a function to duplicate the scatterlist and then shrink
it
manually, which doesn't sound like a good idea either.

What would be the recommended way?

I strongly recommend to come up with a new function which only takes
DMA
addresses and separate segment length.

Alright, thanks for your input.

So I would add a new dma_device.dma_prep_slave_dma_array() callback
with a corresponding API function, and then the drivers can be
converted from using .dma_prep_slave_sg() to this new function in due
time.

Vinod, that works for you?

Cheers,
-Paul




Re: [PATCH] drm: document expectations for GETFB2 handles

2023-02-15 Thread Pekka Paalanen
On Wed, 15 Feb 2023 12:42:00 +
Simon Ser  wrote:

> There are two important details missing from the docs:
> 
> - If the memory object backing the FB already has a GEM handle,
>   it's not re-used, a new one is generated.
> - Aliased planes will return the same GEM handle.
> 
> Signed-off-by: Simon Ser 
> Cc: Daniel Vetter 
> Cc: Pekka Paalanen 
> ---
>  include/uapi/drm/drm.h | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
> index 642808520d92..4cb956a52aee 100644
> --- a/include/uapi/drm/drm.h
> +++ b/include/uapi/drm/drm.h
> @@ -1104,8 +1104,13 @@ extern "C" {
>   * struct as the output.
>   *
>   * If the client is DRM master or has &CAP_SYS_ADMIN, &drm_mode_fb_cmd2.handles
> - * will be filled with GEM buffer handles. Planes are valid until one has a
> - * zero handle -- this can be used to compute the number of planes.
> + * will be filled with GEM buffer handles. Fresh new GEM handles are always
> + * returned, even if another GEM handle referring to the same memory object
> + * already exists on the DRM file description. The caller is responsible for
> + * removing the new handles, e.g. via the &DRM_IOCTL_GEM_CLOSE IOCTL. The same
> + * new handle will be returned for multiple planes in case they use the same
> + * memory object. Planes are valid until one has a zero handle -- this can be
> + * used to compute the number of planes.
>   *
>   * Otherwise, &drm_mode_fb_cmd2.handles will be zeroed and planes are valid
>   * until one has a zero &drm_mode_fb_cmd2.pitches.

It is well-written, clear, and a surprise to me.

Acked-by: Pekka Paalanen 

I didn't know it was at all possible to have different GEM handles
pointing to the same object. DMABUF import is guaranteed to return the
existing GEM handle, right? Why is GETFB2 different? Why does it not
have the same problem as what forced DMABUF import to return existing
handles?


Thanks,
pq




Re: Question: partial transfers of DMABUFs

2023-02-15 Thread Paul Cercueil
Hi Christian,

Le mercredi 15 février 2023 à 13:58 +0100, Christian König a écrit :
> Hi Paul,
> 
> Am 15.02.23 um 11:48 schrieb Paul Cercueil:
> > Hi,
> > 
> > I am working on adding support for DMABUFs in the IIO subsystem.
> > 
> > One thing we want there, is the ability to specify the number of
> > bytes
> > to transfer (while still defaulting to the DMABUF size).
> > 
> > Since dma_buf_map_attachment() returns a sg_table,
> 
> Please don't assume that this is an sg_table. We just used it as 
> a container for DMA addresses, but this has proven to be a mistake.

TL/DR, why was it a mistake? Just curious.

> There is work underway to replace the sg_table with (for example)
> just 
> an array of DMA addresses.

Ok, so I believe at some point we will need an equivalent of
dmaengine_prep_slave_sg() which takes an array of DMA addresses.

> > I basically have two options, and I can't decide which one is the
> > best (or the less ugly):
> > 
> > - Either I add a new API function similar to
> > dmaengine_prep_slave_sg(),
> > which still takes a scatterlist as argument but also takes the
> > number
> > of bytes as argument;
> > 
> > - Or I add a function to duplicate the scatterlist and then shrink
> > it
> > manually, which doesn't sound like a good idea either.
> > 
> > What would be the recommended way?
> 
> I strongly recommend to come up with a new function which only takes
> DMA 
> addresses and separate segment length.

Alright, thanks for your input.

So I would add a new dma_device.dma_prep_slave_dma_array() callback
with a corresponding API function, and then the drivers can be
converted from using .dma_prep_slave_sg() to this new function in due
time.
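
A possible prototype for such a helper, purely as an illustration (the
name and signature are not settled anywhere yet), could be:

  struct dma_async_tx_descriptor *
  dmaengine_prep_slave_dma_array(struct dma_chan *chan,
				 const dma_addr_t *addrs,
				 const size_t *lengths,
				 unsigned int nents,
				 enum dma_transfer_direction direction,
				 unsigned long flags);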

Vinod, that works for you?

Cheers,
-Paul


Re: [PATCH v2 2/4] drm/bridge: imx: add bridge wrapper driver for i.MX8MP DWC HDMI

2023-02-15 Thread Adam Ford
On Sat, Dec 17, 2022 at 2:30 AM Liu Ying  wrote:

> On Fri, 2022-12-16 at 22:07 +0100, Lucas Stach wrote:
> > Add a simple wrapper driver for the DWC HDMI bridge driver that
> > implements the few bits that are necessary to abstract the i.MX8MP
> > SoC integration.
> >
> > Signed-off-by: Lucas Stach 
> > Reviewed-by: Laurent Pinchart 
> > Tested-by: Marek Vasut 
>

Tested-by: Adam Ford  #imx8mp-beacon


> > ---
> >  drivers/gpu/drm/bridge/imx/Kconfig   |   9 ++
> >  drivers/gpu/drm/bridge/imx/Makefile  |   2 +
> >  drivers/gpu/drm/bridge/imx/imx8mp-hdmi.c | 140
> > +++
> >  3 files changed, 151 insertions(+)
> >  create mode 100644 drivers/gpu/drm/bridge/imx/imx8mp-hdmi.c
> >
>
> Can you please provide a changelog since this is v2?
>
> > diff --git a/drivers/gpu/drm/bridge/imx/Kconfig
> > b/drivers/gpu/drm/bridge/imx/Kconfig
> > index 608f47f41bcd..d828d8bfd893 100644
> > --- a/drivers/gpu/drm/bridge/imx/Kconfig
> > +++ b/drivers/gpu/drm/bridge/imx/Kconfig
> > @@ -44,4 +44,13 @@ config DRM_IMX8QXP_PIXEL_LINK_TO_DPI
> > Choose this to enable pixel link to display pixel
> > interface(PXL2DPI)
> > found in Freescale i.MX8qxp processor.
> >
> > +config DRM_IMX8MP_DW_HDMI_BRIDGE
>
> Sort the config names alphabetically please.
>
> > + tristate "i.MX8MP HDMI bridge support"
>
> To show the prompts in this Kconfig file in a consistent fashion,
> please add 'Freescale' before 'i.MX8MP'.
>
> > + depends on OF
> > + depends on COMMON_CLK
> > + select DRM_DW_HDMI
> > + help
> > +   Choose this to enable support for the internal HDMI encoder
> > found
> > +   on the i.MX8MP SoC.
> > +
> >  endif # ARCH_MXC || COMPILE_TEST
> > diff --git a/drivers/gpu/drm/bridge/imx/Makefile
> > b/drivers/gpu/drm/bridge/imx/Makefile
> > index aa90ec8d5433..03b0074ae538 100644
> > --- a/drivers/gpu/drm/bridge/imx/Makefile
> > +++ b/drivers/gpu/drm/bridge/imx/Makefile
> > @@ -7,3 +7,5 @@ obj-$(CONFIG_DRM_IMX8QXP_LDB) += imx8qxp-ldb.o
> >  obj-$(CONFIG_DRM_IMX8QXP_PIXEL_COMBINER) += imx8qxp-pixel-combiner.o
> >  obj-$(CONFIG_DRM_IMX8QXP_PIXEL_LINK) += imx8qxp-pixel-link.o
> >  obj-$(CONFIG_DRM_IMX8QXP_PIXEL_LINK_TO_DPI) += imx8qxp-pxl2dpi.o
> > +
> > +obj-$(CONFIG_DRM_IMX8MP_DW_HDMI_BRIDGE) += imx8mp-hdmi.o
>
> Sort the config names alphabetically.
>
> > diff --git a/drivers/gpu/drm/bridge/imx/imx8mp-hdmi.c
> > b/drivers/gpu/drm/bridge/imx/imx8mp-hdmi.c
> > new file mode 100644
> > index ..06849b817aed
> > --- /dev/null
> > +++ b/drivers/gpu/drm/bridge/imx/imx8mp-hdmi.c
> > @@ -0,0 +1,140 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +
> > +/*
> > + * Copyright (C) 2022 Pengutronix, Lucas Stach <ker...@pengutronix.de>
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
>
> Header files in linux/ come before those in drm/.
>
> > +
> > +struct imx8mp_hdmi {
> > + struct dw_hdmi_plat_data plat_data;
> > + struct dw_hdmi *dw_hdmi;
> > + struct clk *pixclk;
> > + struct clk *fdcc;
> > +};
> > +
> > +static enum drm_mode_status
> > +imx8mp_hdmi_mode_valid(struct dw_hdmi *dw_hdmi, void *data,
> > +const struct drm_display_info *info,
> > +const struct drm_display_mode *mode)
> > +{
> > + struct imx8mp_hdmi *hdmi = (struct imx8mp_hdmi *)data;
> > +
> > + if (mode->clock < 13500)
> > + return MODE_CLOCK_LOW;
> > +
> > + if (mode->clock > 297000)
> > + return MODE_CLOCK_HIGH;
> > +
> > + if (clk_round_rate(hdmi->pixclk, mode->clock * 1000) !=
> > + mode->clock * 1000)
> > + return MODE_CLOCK_RANGE;
> > +
> > + /* We don't support double-clocked and Interlaced modes */
> > + if ((mode->flags & DRM_MODE_FLAG_DBLCLK) ||
> > + (mode->flags & DRM_MODE_FLAG_INTERLACE))
> > + return MODE_BAD;
> > +
> > + return MODE_OK;
> > +}
> > +
> > +static int imx8mp_hdmi_phy_init(struct dw_hdmi *dw_hdmi, void *data,
> > + const struct drm_display_info *display,
> > + const struct drm_display_mode *mode)
> > +{
> > + return 0;
> > +}
> > +
> > +static void imx8mp_hdmi_phy_disable(struct dw_hdmi *dw_hdmi, void *data)
> > +{
> > +}
> > +
> > +static void im8mp_hdmi_phy_setup_hpd(struct dw_hdmi *hdmi, void *data)
> > +{
> > + /*
> > +  * Just release PHY core from reset, all other power management
> > is done
> > +  * by the PHY driver.
> > +  */
> > + dw_hdmi_phy_gen1_reset(hdmi);
> > +
> > + dw_hdmi_phy_setup_hpd(hdmi, data);
> > +}
> > +
> > +static const struct dw_hdmi_phy_ops imx8mp_hdmi_phy_ops = {
> > + .init   = imx8mp_hdmi_phy_init,
> > + .disable= imx8mp_hdmi_phy_disable,
> > + .setup_hpd  = im8mp_hdmi_phy_setup_hpd,
> > + .read_hpd   = dw_hdmi_phy_read_hpd,
> > + .update_hpd = dw_hdmi_phy_update_hpd,
> > +};
> > +
> > +static int 

Re: Question: partial transfers of DMABUFs

2023-02-15 Thread Paul Cercueil
Le mercredi 15 février 2023 à 13:13 +0100, Maarten Lankhorst a écrit :
> 
> On 2023-02-15 13:00, Paul Cercueil wrote:
> > Hi Maarten,
> > 
> > Le mercredi 15 février 2023 à 12:52 +0100, Maarten Lankhorst a
> > écrit :
> > > Hey,
> > > 
> > > On 2023-02-15 12:47, Paul Cercueil wrote:
> > > > Hi Maarten,
> > > > 
> > > > Le mercredi 15 février 2023 à 12:30 +0100, Maarten Lankhorst a
> > > > écrit :
> > > > > Hey,
> > > > > 
> > > > > On 2023-02-15 11:48, Paul Cercueil wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I am working on adding support for DMABUFs in the IIO
> > > > > > subsystem.
> > > > > > 
> > > > > > One thing we want there, is the ability to specify the
> > > > > > number
> > > > > > of
> > > > > > bytes
> > > > > > to transfer (while still defaulting to the DMABUF size).
> > > > > > 
> > > > > > Since dma_buf_map_attachment() returns a sg_table, I
> > > > > > basically
> > > > > > have
> > > > > > two
> > > > > > options, and I can't decide which one is the best (or the
> > > > > > less
> > > > > > ugly):
> > > > > > 
> > > > > > - Either I add a new API function similar to
> > > > > > dmaengine_prep_slave_sg(),
> > > > > > which still takes a scatterlist as argument but also takes
> > > > > > the
> > > > > > number
> > > > > > of bytes as argument;
> > > > > > 
> > > > > > - Or I add a function to duplicate the scatterlist and then
> > > > > > shrink
> > > > > > it
> > > > > > manually, which doesn't sound like a good idea either.
> > > > > > 
> > > > > > What would be the recommended way?
> > > > > Does this need an api change? If you create a DMA-BUF of size
> > > > > X,
> > > > > it
> > > > > has
> > > > > to be of size X. You can pad with a dummy page probably if
> > > > > you
> > > > > know
> > > > > it
> > > > > in advance. But after it has been imported, it cannot change
> > > > > size.
> > > > Yes, the sizes are fixed.
> > > > 
> > > > > You don´t have to write the entire dma-buf either, so if you
> > > > > want
> > > > > to
> > > > > create a 1GB buf and only use the first 4K, that is allowed.
> > > > > The
> > > > > contents of  the remainder of the DMA-BUF are undefined. It's
> > > > > up
> > > > > to
> > > > > userspace to assign a meaning to it.
> > > > > 
> > > > > I think I'm missing something here that makes the whole
> > > > > question
> > > > > m,ake
> > > > > more sense.
> > > > I want my userspace to be able to specify how much of the
> > > > DMABUF is
> > > > to
> > > > be read from or written to.
> > > > 
> > > > So in my new "dmabuf enqueue" IOCTL that I want to add to IIO,
> > > > I
> > > > added
> > > > a parameter to specify the number of bytes to transfer (where 0
> > > > means
> > > > the whole buffer).
> > > > 
> > > > The problem I have now, is that the current dmaengine core does
> > > > not
> > > > have a API function that takes a scatterlist (returned by
> > > > dma_map_attachment()) and a transfer size in bytes, it will
> > > > always
> > > > transfer the whole scatterlist.
> > > > 
> > > > So my two options would be to add a new API function to support
> > > > specifying a bytes count, or add a mechanism to duplicate a
> > > > scatterlist, so that I can tweak it to the right size.
> > > This doesn't have to happen through DMA-BUF. Presumably you are
> > > both
> > > the
> > > importer and the exporter, so after you know how much is read,
> > > you
> > > can
> > > tell this to the importer that X number of bytes can be read from
> > > DMA-BUF Y.
> > Yes, I do that already as it is an argument in my ioctl.
> > 
> > > In your case, when enqueing you will get a full SG list, but if
> > > you
> > > know
> > > only X bytes are read/written you only have to map the first X
> > > bytes
> > > to
> > > your IIO device. The rest of the SG list could be ignored safely.
> > Yes. But I don't know how to "ignore the rest of the SG list".
> > 
> > - dma_buf_map_attachment() does not have a parameter to specify
> > that I
> > only need the first X bytes mapped;
> > 
> > - if I map the whole thing, dmaengine_prep_slave_sg() does not have
> > an
> > option to specify that I only want the first X bytes transferred.
> 
> sg_split apppears to allow you to split it? I'm not 100% sure whether
> it 
> leaves the original SG untouched, but you can try to put it in
> between 
> those 2 calls to get a smaller SG to pass to prep_slave_sg.

I overlooked sg_split. It looks like it could work for me.
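
Something along these lines, assuming CONFIG_SG_SPLIT is available and
sgt is the mapped table (sketch only, error handling trimmed):

  size_t split_sizes[] = { nbytes };	/* bytes to actually transfer */
  struct scatterlist *xfer_sg;
  int xfer_nents, ret;

  ret = sg_split(sgt->sgl, sgt->nents, 0, 1, split_sizes,
		 &xfer_sg, &xfer_nents, GFP_KERNEL);
  if (ret)
	return ret;

  /* xfer_sg covers only the first nbytes and the original sgt is left
   * untouched; kfree(xfer_sg) once the transfer completes. */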

Thanks!

Cheers,
-Paul


Re: Question: partial transfers of DMABUFs

2023-02-15 Thread Christian König

Hi Paul,

Am 15.02.23 um 11:48 schrieb Paul Cercueil:

Hi,

I am working on adding support for DMABUFs in the IIO subsystem.

One thing we want there, is the ability to specify the number of bytes
to transfer (while still defaulting to the DMABUF size).

Since dma_buf_map_attachment() returns a sg_table,


Please don't assume that this is an sg_table. We just used it as 
a container for DMA addresses, but this has proven to be a mistake.


There is work underway to replace the sg_table with (for example) just 
an array of DMA addresses.



I basically have two options, and I can't decide which one is the best (or the 
less ugly):

- Either I add a new API function similar to dmaengine_prep_slave_sg(),
which still takes a scatterlist as argument but also takes the number
of bytes as argument;

- Or I add a function to duplicate the scatterlist and then shrink it
manually, which doesn't sound like a good idea either.

What would be the recommended way?


I strongly recommend to come up with a new function which only takes DMA 
addresses and separate segment length.


Regards,
Christian.



Cheers,
-Paul




[PATCH] drm: document expectations for GETFB2 handles

2023-02-15 Thread Simon Ser
There are two important details missing from the docs:

- If the memory object backing the FB already has a GEM handle,
  it's not re-used, a new one is generated.
- Aliased planes will return the same GEM handle.

Signed-off-by: Simon Ser 
Cc: Daniel Vetter 
Cc: Pekka Paalanen 
---
 include/uapi/drm/drm.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 642808520d92..4cb956a52aee 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -1104,8 +1104,13 @@ extern "C" {
  * struct as the output.
  *
  * If the client is DRM master or has &CAP_SYS_ADMIN, &drm_mode_fb_cmd2.handles
- * will be filled with GEM buffer handles. Planes are valid until one has a
- * zero handle -- this can be used to compute the number of planes.
+ * will be filled with GEM buffer handles. Fresh new GEM handles are always
+ * returned, even if another GEM handle referring to the same memory object
+ * already exists on the DRM file description. The caller is responsible for
+ * removing the new handles, e.g. via the &DRM_IOCTL_GEM_CLOSE IOCTL. The same
+ * new handle will be returned for multiple planes in case they use the same
+ * memory object. Planes are valid until one has a zero handle -- this can be
+ * used to compute the number of planes.
  *
  * Otherwise, &drm_mode_fb_cmd2.handles will be zeroed and planes are valid
  * until one has a zero &drm_mode_fb_cmd2.pitches.
-- 
2.39.1
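
As an illustration of the documented semantics, a hypothetical
DRM-master caller would do something like this (sketch only, error
handling trimmed):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <drm/drm.h>
  #include <drm/drm_mode.h>

  static void getfb2_and_close(int fd, uint32_t fb_id)
  {
	struct drm_mode_fb_cmd2 fb = { .fb_id = fb_id };
	int i, j, seen;

	if (ioctl(fd, DRM_IOCTL_MODE_GETFB2, &fb))
		return;

	/* Planes are valid until the first zero handle. */
	for (i = 0; i < 4 && fb.handles[i]; i++) {
		/* Aliased planes share one fresh handle: close it once. */
		seen = 0;
		for (j = 0; j < i; j++)
			if (fb.handles[j] == fb.handles[i])
				seen = 1;
		if (!seen) {
			struct drm_gem_close gem_close = {
				.handle = fb.handles[i],
			};
			ioctl(fd, DRM_IOCTL_GEM_CLOSE, &gem_close);
		}
	}
  }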




Re: [Intel-gfx] [PATCH] drm/i915: Don't use stolen memory for ring buffers

2023-02-15 Thread Tvrtko Ursulin



On 15/02/2023 01:56, Ceraolo Spurio, Daniele wrote:



On 2/14/2023 3:48 PM, john.c.harri...@intel.com wrote:

From: John Harrison 

Direction from hardware is that stolen memory should never be used for
ring buffer allocations. There are too many caching pitfalls due to the
way stolen memory accesses are routed. So it is safest to just not use
it.


I'm wondering if this applies to machines in ringbuffer mode as well, as 
some of the caching stuff that, according to the HW team, may not work 
properly with stolen mem accesses from the CS (mocs, ppat) came with 
gen8/gen9.
Maybe limit this change to gen8+, to avoid changing the behavior for 
very old platforms?


If Gen8+ can have bugs due to this then:

Fixes: c58b735fc762 ("drm/i915: Allocate rings from stolen")
Cc:  # v4.9+

Or even before:

Fixes: ebc052e0c65f ("drm/i915: Allocate ringbuffers from stolen memory")
Cc:  # v3.9+

Hm, let's see when BDW went out of force probe:

Fixes: babb1903511f ("drm/i915/bdw: remove preliminary_hw_support flag from BDW")
Cc:  # v3.14+

It also depends on how the problem statement interacts with LLC. If !LLC platforms 
are okay then the first one from the above list is enough.

Because

Regards,

Tvrtko



Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_ring.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c

index 15ec64d881c44..d1a47e1ae6452 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -116,8 +116,6 @@ static struct i915_vma *create_ring_vma(struct 
i915_ggtt *ggtt, int size)
	obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE |
					  I915_BO_ALLOC_PM_VOLATILE);
-    if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt))
-    obj = i915_gem_object_create_stolen(i915, size);


There is code in ring_pin/unpin() that only applies to rings in stolen 
memory, so you need to remove that as well if you drop stolen for rings 
on all platforms.


Daniele


  if (IS_ERR(obj))
  obj = i915_gem_object_create_internal(i915, size);
  if (IS_ERR(obj))



