Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4

2018-09-13 Thread Felix Kuehling
On 2018-09-13 04:52 PM, Philip Yang wrote:
> Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
> callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
> DRM_AMDGPU_USERPTR Kconfig.
>
> It supports both KFD userptr and gfx userptr paths.
>
> This depends on several HMM patchsets from Jérôme Glisse queued for
> upstream.
>
> Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
> Signed-off-by: Philip Yang 
> ---
>  drivers/gpu/drm/amd/amdgpu/Kconfig |   6 +-
>  drivers/gpu/drm/amd/amdgpu/Makefile|   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 121 
> ++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h |   2 +-
>  4 files changed, 56 insertions(+), 75 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
> b/drivers/gpu/drm/amd/amdgpu/Kconfig
> index 9221e54..960a633 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
> @@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
>  config DRM_AMDGPU_USERPTR
>   bool "Always enable userptr write support"
>   depends on DRM_AMDGPU
> - select MMU_NOTIFIER
> + select HMM_MIRROR
>   help
> -   This option selects CONFIG_MMU_NOTIFIER if it isn't already
> -   selected to enabled full userptr support.
> +   This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
> +   isn't already selected to enabled full userptr support.
>  
>  config DRM_AMDGPU_GART_DEBUGFS
>   bool "Allow GART access through debugfs"
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
> b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 138cb78..c1e5d43 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -171,7 +171,7 @@ endif
>  amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
>  amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
>  amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
> -amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
> +amdgpu-$(CONFIG_HMM) += amdgpu_mn.o
>  
>  include $(FULL_AMD_PATH)/powerplay/Makefile
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index e55508b..ad52f34 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -45,7 +45,7 @@
>  
>  #include 
>  #include 
> -#include <linux/mmu_notifier.h>
> +#include <linux/hmm.h>
>  #include 
>  #include 
>  #include 
> @@ -66,6 +66,7 @@

Need to remove @mn documentation.

>   * @objects: interval tree containing amdgpu_mn_nodes
>   * @read_lock: mutex for recursive locking of @lock
>   * @recursion: depth of recursion
> + * @mirror: HMM mirror function support
>   *
>   * Data for each amdgpu device and process address space.
>   */
> @@ -73,7 +74,6 @@ struct amdgpu_mn {
>   /* constant after initialisation */
>   struct amdgpu_device*adev;
>   struct mm_struct*mm;
> - struct mmu_notifier mn;
>   enum amdgpu_mn_type type;
>  
>   /* only used on destruction */
> @@ -87,6 +87,9 @@ struct amdgpu_mn {
>   struct rb_root_cached   objects;
>   struct mutexread_lock;
>   atomic_trecursion;
> +
> + /* HMM mirror */
> + struct hmm_mirror   mirror;
>  };
>  
>  /**
> @@ -103,7 +106,7 @@ struct amdgpu_mn_node {
>  };
>  
>  /**
> - * amdgpu_mn_destroy - destroy the MMU notifier
> + * amdgpu_mn_destroy - destroy the HMM mirror
>   *
>   * @work: previously sheduled work item
>   *
> @@ -129,28 +132,26 @@ static void amdgpu_mn_destroy(struct work_struct *work)
>   }
>   up_write(&amn->lock);
>   mutex_unlock(&adev->mn_lock);
> - mmu_notifier_unregister_no_release(&amn->mn, amn->mm);
> + hmm_mirror_unregister(&amn->mirror);
> +
>   kfree(amn);
>  }
>  
>  /**
>   * amdgpu_mn_release - callback to notify about mm destruction

Update the function name in the comment.

>   *
> - * @mn: our notifier
> - * @mm: the mm this callback is about
> + * @mirror: the HMM mirror (mm) this callback is about
>   *
> - * Shedule a work item to lazy destroy our notifier.
> + * Shedule a work item to lazy destroy HMM mirror.
>   */
> -static void amdgpu_mn_release(struct mmu_notifier *mn,
> -   struct mm_struct *mm)
> +static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
>  {
> - struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
> + struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);
>  
>   INIT_WORK(&amn->work, amdgpu_mn_destroy);
>   schedule_work(&amn->work);
>  }
>  
> -
>  /**
>   * amdgpu_mn_lock - take the write side lock for this notifier
>   *
> @@ -237,21 +238,19 @@ static void amdgpu_mn_invalidate_node(struct 
> amdgpu_mn_node *node,
>  /**
>   * amdgpu_mn_invalidate_range_start_gfx - callback to notify about mm change
>   *
> - * @mn: our notifier
> - * @mm: the mm this callback is about
> - * @start: start of updated range
> - * @end: end of updated range
> + * @mirror: the hmm_mirror (mm) is about to update

Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-13 Thread Marek Olšák
To be fair, since we have only 7 user VMIDs and 8 chunks of GDS, we
can make the 8th GDS chunk global and allocatable and use it based on
a CS flag. It would need more work and a lot of testing though. I
don't think we can do the testing part now because of the complexity
of interactions between per-VMID GDS and global GDS, but it's
certainly something that people could add in the future.

Marek

On Thu, Sep 13, 2018 at 3:04 PM, Marek Olšák  wrote:
> I was thinking about that too, but it would be too much trouble for
> something we don't need.
>
> Marek
>
> On Thu, Sep 13, 2018 at 2:57 PM, Deucher, Alexander
>  wrote:
>> Why don't we just fix up the current GDS code so it works the same as vram
>> and then we can add a new CS or context flag to ignore the current static
>> allocation for gfx.  We can ignore data persistence if it's too much
>> trouble.  Assume you always have to init the memory before you use it.
>> That's already the case.
>>
>>
>> Alex


[PATCH v2] drm/amdgpu: No action when VCN PG state is unchanged

2018-09-13 Thread James Zhu
When the VCN PG state is unchanged, it is unnecessary to reset the power
gating state

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c   | 12 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
index 0b0b863..d2219ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
@@ -69,6 +69,7 @@ struct amdgpu_vcn {
struct amdgpu_ring  ring_jpeg;
struct amdgpu_irq_src   irq;
unsignednum_enc_rings;
+   enum amd_powergating_state cur_state;
 };
 
 int amdgpu_vcn_sw_init(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index 2664bb2..2cde0b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -1633,12 +1633,20 @@ static int vcn_v1_0_set_powergating_state(void *handle,
 * revisit this when there is a cleaner line between
 * the smc and the hw blocks
 */
+   int ret;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   if(state == adev->vcn.cur_state)
+   return 0;
+
if (state == AMD_PG_STATE_GATE)
-   return vcn_v1_0_stop(adev);
+   ret = vcn_v1_0_stop(adev);
else
-   return vcn_v1_0_start(adev);
+   ret = vcn_v1_0_start(adev);
+
+   if(!ret)
+   adev->vcn.cur_state = state;
+   return ret;
 }
 
 static const struct amd_ip_funcs vcn_v1_0_ip_funcs = {
-- 
2.7.4



[PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4

2018-09-13 Thread Philip Yang
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.

This depends on several HMM patchsets from Jérôme Glisse queued for
upstream.
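
For orientation, the shape of the hmm_mirror hookup can be sketched as follows.
This is not part of the patch: the my_*() names are placeholders, and the
sync_cpu_device_pagetables() signature (a single hmm_update argument carrying
the start/end of the range) is assumed from the queued HMM series mentioned
above.

#include <linux/hmm.h>
#include <linux/mm.h>

/* Hypothetical per-process context, standing in for struct amdgpu_mn. */
struct my_mirror_ctx {
	struct hmm_mirror	mirror;
	/* ... device state that shadows CPU page tables ... */
};

/* Signature assumed from the queued HMM series this patch depends on. */
static int my_sync_cpu_device_pagetables(struct hmm_mirror *mirror,
					 const struct hmm_update *update)
{
	struct my_mirror_ctx *ctx =
		container_of(mirror, struct my_mirror_ctx, mirror);

	/* Invalidate device mappings covering [update->start, update->end). */
	return 0;
}

static void my_mirror_release(struct hmm_mirror *mirror)
{
	/* The mm is going away; schedule lazy teardown of the context. */
}

static const struct hmm_mirror_ops my_mirror_ops = {
	.sync_cpu_device_pagetables	= my_sync_cpu_device_pagetables,
	.release			= my_mirror_release,
};

static int my_mirror_init(struct my_mirror_ctx *ctx, struct mm_struct *mm)
{
	ctx->mirror.ops = &my_mirror_ops;
	/* Paired with hmm_mirror_unregister() on teardown. */
	return hmm_mirror_register(&ctx->mirror, mm);
}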

Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/Kconfig |   6 +-
 drivers/gpu/drm/amd/amdgpu/Makefile|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 121 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h |   2 +-
 4 files changed, 56 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 9221e54..960a633 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
 config DRM_AMDGPU_USERPTR
bool "Always enable userptr write support"
depends on DRM_AMDGPU
-   select MMU_NOTIFIER
+   select HMM_MIRROR
help
- This option selects CONFIG_MMU_NOTIFIER if it isn't already
- selected to enabled full userptr support.
+ This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+ isn't already selected to enabled full userptr support.
 
 config DRM_AMDGPU_GART_DEBUGFS
bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 138cb78..c1e5d43 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -171,7 +171,7 @@ endif
 amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
 amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
 amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
-amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM) += amdgpu_mn.o
 
 include $(FULL_AMD_PATH)/powerplay/Makefile
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index e55508b..ad52f34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -45,7 +45,7 @@
 
 #include 
 #include 
-#include <linux/mmu_notifier.h>
+#include <linux/hmm.h>
 #include 
 #include 
 #include 
@@ -66,6 +66,7 @@
  * @objects: interval tree containing amdgpu_mn_nodes
  * @read_lock: mutex for recursive locking of @lock
  * @recursion: depth of recursion
+ * @mirror: HMM mirror function support
  *
  * Data for each amdgpu device and process address space.
  */
@@ -73,7 +74,6 @@ struct amdgpu_mn {
/* constant after initialisation */
struct amdgpu_device*adev;
struct mm_struct*mm;
-   struct mmu_notifier mn;
enum amdgpu_mn_type type;
 
/* only used on destruction */
@@ -87,6 +87,9 @@ struct amdgpu_mn {
struct rb_root_cached   objects;
struct mutexread_lock;
atomic_trecursion;
+
+   /* HMM mirror */
+   struct hmm_mirror   mirror;
 };
 
 /**
@@ -103,7 +106,7 @@ struct amdgpu_mn_node {
 };
 
 /**
- * amdgpu_mn_destroy - destroy the MMU notifier
+ * amdgpu_mn_destroy - destroy the HMM mirror
  *
  * @work: previously sheduled work item
  *
@@ -129,28 +132,26 @@ static void amdgpu_mn_destroy(struct work_struct *work)
}
up_write(&amn->lock);
mutex_unlock(&adev->mn_lock);
-   mmu_notifier_unregister_no_release(&amn->mn, amn->mm);
+   hmm_mirror_unregister(&amn->mirror);
+
kfree(amn);
 }
 
 /**
  * amdgpu_mn_release - callback to notify about mm destruction
  *
- * @mn: our notifier
- * @mm: the mm this callback is about
+ * @mirror: the HMM mirror (mm) this callback is about
  *
- * Shedule a work item to lazy destroy our notifier.
+ * Shedule a work item to lazy destroy HMM mirror.
  */
-static void amdgpu_mn_release(struct mmu_notifier *mn,
- struct mm_struct *mm)
+static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
 {
-   struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
+   struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);
 
INIT_WORK(&amn->work, amdgpu_mn_destroy);
schedule_work(&amn->work);
 }
 
-
 /**
  * amdgpu_mn_lock - take the write side lock for this notifier
  *
@@ -237,21 +238,19 @@ static void amdgpu_mn_invalidate_node(struct 
amdgpu_mn_node *node,
 /**
  * amdgpu_mn_invalidate_range_start_gfx - callback to notify about mm change
  *
- * @mn: our notifier
- * @mm: the mm this callback is about
- * @start: start of updated range
- * @end: end of updated range
+ * @mirror: the hmm_mirror (mm) is about to update
+ * @update: the update start, end address
  *
  * Block for operations on BOs to finish and mark pages as accessed and
  * potentially dirty.
  */
-static int amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
-struct mm_struct *mm,
-

Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v3

2018-09-13 Thread Philip Yang

On 2018-09-13 02:24 PM, Christian König wrote:

On 13.09.2018 at 20:00, Philip Yang wrote:

Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.

This depends on several HMM patchsets from Jérôme Glisse queued for
upstream. See
http://172.27.226.38/root/kernel_amd/commits/hmm-dev-v01 (for AMD 
intranet)


Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig |  6 +--
  drivers/gpu/drm/amd/amdgpu/Makefile    |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 88 
+++---

  3 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig

index 9221e54..960a633 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
  config DRM_AMDGPU_USERPTR
  bool "Always enable userptr write support"
  depends on DRM_AMDGPU
-    select MMU_NOTIFIER
+    select HMM_MIRROR
  help
-  This option selects CONFIG_MMU_NOTIFIER if it isn't already
-  selected to enabled full userptr support.
+  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+  isn't already selected to enabled full userptr support.
    config DRM_AMDGPU_GART_DEBUGFS
  bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 138cb78..c1e5d43 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -171,7 +171,7 @@ endif
  amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
  amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
  amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
-amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM) += amdgpu_mn.o
    include $(FULL_AMD_PATH)/powerplay/Makefile
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c

index e55508b..ea8671f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -46,6 +46,7 @@
  #include 
  #include 
  #include 
+#include <linux/hmm.h>


Can we now drop including linux/mmu_notifier.h?

Yes, I will use hmm_mirror_ops to replace the gfx and kfd mmu_notifier_ops.

  #include 
  #include 
  #include 
@@ -66,6 +67,7 @@
   * @objects: interval tree containing amdgpu_mn_nodes
   * @read_lock: mutex for recursive locking of @lock
   * @recursion: depth of recursion
+ * @mirror: HMM mirror function support
   *
   * Data for each amdgpu device and process address space.
   */
@@ -87,6 +89,9 @@ struct amdgpu_mn {
  struct rb_root_cached    objects;
  struct mutex    read_lock;
  atomic_t    recursion;
+
+    /* HMM mirror */
+    struct hmm_mirror    mirror;
  };
    /**
@@ -103,7 +108,7 @@ struct amdgpu_mn_node {
  };
    /**
- * amdgpu_mn_destroy - destroy the MMU notifier
+ * amdgpu_mn_destroy - destroy the HMM mirror
   *
   * @work: previously sheduled work item
   *
@@ -129,28 +134,27 @@ static void amdgpu_mn_destroy(struct 
work_struct *work)

  }
  up_write(&amn->lock);
  mutex_unlock(&adev->mn_lock);
-    mmu_notifier_unregister_no_release(&amn->mn, amn->mm);
+
+    hmm_mirror_unregister(&amn->mirror);
+
  kfree(amn);
  }
    /**
   * amdgpu_mn_release - callback to notify about mm destruction
   *
- * @mn: our notifier
- * @mm: the mm this callback is about
+ * @mirror: the HMM mirror (mm) this callback is about
   *
- * Shedule a work item to lazy destroy our notifier.
+ * Shedule a work item to lazy destroy HMM mirror.
   */
-static void amdgpu_mn_release(struct mmu_notifier *mn,
-  struct mm_struct *mm)
+static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
  {
-    struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
+    struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, 
mirror);

    INIT_WORK(&amn->work, amdgpu_mn_destroy);
  schedule_work(&amn->work);
  }
  -
  /**
   * amdgpu_mn_lock - take the write side lock for this notifier
   *
@@ -355,12 +359,10 @@ static void 
amdgpu_mn_invalidate_range_end(struct mmu_notifier *mn,

    static const struct mmu_notifier_ops amdgpu_mn_ops[] = {
  [AMDGPU_MN_TYPE_GFX] = {
-    .release = amdgpu_mn_release,
  .invalidate_range_start = 
amdgpu_mn_invalidate_range_start_gfx,

  .invalidate_range_end = amdgpu_mn_invalidate_range_end,
  },
  [AMDGPU_MN_TYPE_HSA] = {
-    .release = amdgpu_mn_release,
  .invalidate_range_start = 
amdgpu_mn_invalidate_range_start_hsa,

  .invalidate_range_end = amdgpu_mn_invalidate_range_end,
  },
@@ -373,12 +375,63 @@ static const struct mmu_notifier_ops 
amdgpu_mn_ops[] = {

  #define AMDGPU_MN_KEY(mm, type) ((unsigned long)(mm) + (type))
    /**
- *

[PATCH] drm/amdgpu: simplify Raven, Raven2, and Picasso handling

2018-09-13 Thread Alex Deucher
Treat them all as Raven rather than adding a new picasso
asic type.  This simplifies a lot of code and also handles the
case of rv2 chips with the 0x15d8 pci id.  It also fixes dmcu
fw handling for picasso.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c|  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c|  7 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  4 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 32 ++-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 --
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c| 11 ++--
 drivers/gpu/drm/amd/amdgpu/psp_v10_0.c |  5 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 11 +---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 66 ++
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  |  8 +--
 drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c|  1 -
 .../gpu/drm/amd/powerplay/hwmgr/processpptables.c  |  8 +--
 include/drm/amd_asic_type.h|  1 -
 16 files changed, 60 insertions(+), 113 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 762dc5f886cd..354f0557d697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -91,7 +91,6 @@ static const char *amdgpu_asic_name[] = {
"VEGA12",
"VEGA20",
"RAVEN",
-   "PICASSO",
"LAST",
 };
 
@@ -1337,12 +1336,11 @@ static int amdgpu_device_parse_gpu_info_fw(struct 
amdgpu_device *adev)
case CHIP_RAVEN:
if (adev->rev_id >= 8)
chip_name = "raven2";
+   else if (adev->pdev->device == 0x15d8)
+   chip_name = "picasso";
else
chip_name = "raven";
break;
-   case CHIP_PICASSO:
-   chip_name = "picasso";
-   break;
}
 
snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_gpu_info.bin", chip_name);
@@ -1468,8 +1466,7 @@ static int amdgpu_device_ip_early_init(struct 
amdgpu_device *adev)
case CHIP_VEGA12:
case CHIP_VEGA20:
case CHIP_RAVEN:
-   case CHIP_PICASSO:
-   if ((adev->asic_type == CHIP_RAVEN) || (adev->asic_type == 
CHIP_PICASSO))
+   if (adev->asic_type == CHIP_RAVEN)
adev->family = AMDGPU_FAMILY_RV;
else
adev->family = AMDGPU_FAMILY_AI;
@@ -2183,7 +2180,6 @@ bool amdgpu_device_asic_has_dc_support(enum amd_asic_type 
asic_type)
case CHIP_VEGA20:
 #if defined(CONFIG_DRM_AMD_DC_DCN1_0)
case CHIP_RAVEN:
-   case CHIP_PICASSO:
 #endif
return amdgpu_dc != 0;
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 33e1856fb8cc..ff10df4f50d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -874,8 +874,7 @@ static const struct pci_device_id pciidlist[] = {
{0x1002, 0x66AF, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA20},
/* Raven */
{0x1002, 0x15dd, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RAVEN|AMD_IS_APU},
-   /* Picasso */
-   {0x1002, 0x15d8, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_PICASSO|AMD_IS_APU},
+   {0x1002, 0x15d8, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RAVEN|AMD_IS_APU},
 
{0, 0, 0}
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 611c06d3600a..bd397d2916fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -56,7 +56,6 @@ static int psp_sw_init(void *handle)
psp_v3_1_set_psp_funcs(psp);
break;
case CHIP_RAVEN:
-   case CHIP_PICASSO:
psp_v10_0_set_psp_funcs(psp);
break;
case CHIP_VEGA20:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index acb4c66fe89b..1fa8bc337859 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -303,7 +303,6 @@ amdgpu_ucode_get_load_type(struct amdgpu_device *adev, int 
load_type)
return AMDGPU_FW_LOAD_SMU;
case CHIP_VEGA10:
case CHIP_RAVEN:
-   case CHIP_PICASSO:
case CHIP_VEGA12:
case CHIP_VEGA20:
if (!load_type)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index a74498ce87ff..a73674f9a0f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -63,14 +63,13 @@ int amdgpu_vcn_sw_init(struct amdgpu_device

[PATCH] drm/amdgpu/soc15: clean up picasso support

2018-09-13 Thread Alex Deucher
It's the same as raven so remove the duplicate case.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 17 -
 1 file changed, 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index f5a44d1fe5da..f930e09071d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -546,23 +546,6 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
amdgpu_device_ip_block_add(adev, &vce_v4_0_ip_block);
break;
case CHIP_RAVEN:
-   amdgpu_device_ip_block_add(adev, &vega10_common_ip_block);
-   amdgpu_device_ip_block_add(adev, &gmc_v9_0_ip_block);
-   amdgpu_device_ip_block_add(adev, &vega10_ih_ip_block);
-   amdgpu_device_ip_block_add(adev, &psp_v10_0_ip_block);
-   amdgpu_device_ip_block_add(adev, &pp_smu_ip_block);
-   if (adev->enable_virtual_display || amdgpu_sriov_vf(adev))
-   amdgpu_device_ip_block_add(adev, &dce_virtual_ip_block);
-#if defined(CONFIG_DRM_AMD_DC)
-   else if (amdgpu_device_has_dc_support(adev))
-   amdgpu_device_ip_block_add(adev, &dm_ip_block);
-#else
-#  warning "Enable CONFIG_DRM_AMD_DC for display support on SOC15."
-#endif
-   amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
-   amdgpu_device_ip_block_add(adev, &sdma_v4_0_ip_block);
-   amdgpu_device_ip_block_add(adev, &vcn_v1_0_ip_block);
-   break;
case CHIP_PICASSO:
amdgpu_device_ip_block_add(adev, &vega10_common_ip_block);
amdgpu_device_ip_block_add(adev, &gmc_v9_0_ip_block);
-- 
2.13.6



[PATCH 1/2] drm/amd/display: Add DMCU firmware version

2018-09-13 Thread David Francis
Read the version number from the common firmware header and store
it in the dm struct

Signed-off-by: David Francis 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 ++
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index f0ae11802e9a..d21d738c8356 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -589,6 +589,8 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
adev->firmware.fw_size +=
ALIGN(le32_to_cpu(hdr->intv_size_bytes), PAGE_SIZE);
 
+   adev->dm.dmcu_fw_version = le32_to_cpu(hdr->header.ucode_version);
+
DRM_DEBUG_KMS("PSP loading DMCU firmware\n");
 
return 0;
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index 9a57c654943a..b6fe9adf4b93 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -131,6 +131,7 @@ struct amdgpu_display_manager {
struct dm_comressor_info compressor;
 
const struct firmware *fw_dmcu;
+   uint32_t dmcu_fw_version;
 };
 
 struct amdgpu_dm_connector {
-- 
2.17.1



[PATCH 2/2] drm/amdgpu: Add DMCU to firmware query interface

2018-09-13 Thread David Francis
The DMCU firmware version can be read using the AMDGPU_INFO ioctl
or the amdgpu_firmware_info debugfs entry.
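
As a usage illustration, a userspace query of the new subquery could look
roughly like this. This is a hedged sketch, not part of the series: it assumes
the libdrm headers and the drm_amdgpu_info/query_fw layout from
include/uapi/drm/amdgpu_drm.h, and omits the error handling a real tool needs.

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>

/* Query the DMCU firmware version; fd is an open amdgpu render node. */
static int query_dmcu_fw(int fd, struct drm_amdgpu_info_firmware *fw)
{
	struct drm_amdgpu_info request;

	memset(&request, 0, sizeof(request));
	request.return_pointer = (uintptr_t)fw;
	request.return_size = sizeof(*fw);
	request.query = AMDGPU_INFO_FW_VERSION;
	request.query_fw.fw_type = AMDGPU_INFO_FW_DMCU;	/* new subquery */

	return drmIoctl(fd, DRM_IOCTL_AMDGPU_INFO, &request);
}

On success, fw->ver holds the value that patch 1/2 stored in
adev->dm.dmcu_fw_version, matching what the amdgpu_firmware_info debugfs
file prints.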

Signed-off-by: David Francis 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 
 include/uapi/drm/amdgpu_drm.h   |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 29ac3873eeb0..f5caf873f008 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -257,6 +257,10 @@ static int amdgpu_firmware_info(struct 
drm_amdgpu_info_firmware *fw_info,
fw_info->ver = adev->psp.asd_fw_version;
fw_info->feature = adev->psp.asd_feature_version;
break;
+   case AMDGPU_INFO_FW_DMCU:
+   fw_info->ver = adev->dm.dmcu_fw_version;
+   fw_info->feature = 0;
+   break;
default:
return -EINVAL;
}
@@ -1296,6 +1300,14 @@ static int amdgpu_debugfs_firmware_info(struct seq_file 
*m, void *data)
seq_printf(m, "VCN feature version: %u, firmware version: 0x%08x\n",
   fw_info.feature, fw_info.ver);
 
+   /* DMCU */
+   query_fw.fw_type = AMDGPU_INFO_FW_DMCU;
+   ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);
+   if (ret)
+   return ret;
+   seq_printf(m, "DMCU feature version: %u, firmware version: 0x%08x\n",
+  fw_info.feature, fw_info.ver);
+
 
seq_printf(m, "VBIOS version: %s\n", ctx->vbios_version);
 
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 9eeba55b..6a0d77dcfc47 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -668,6 +668,8 @@ struct drm_amdgpu_cs_chunk_data {
#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_GPM_MEM 0x10
/* Subquery id: Query GFX RLC SRLS firmware version */
#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_SRM_MEM 0x11
+   /* Subquery id: Query DMCU firmware version */
+   #define AMDGPU_INFO_FW_DMCU 0x12
 /* number of bytes moved for TTM migration */
 #define AMDGPU_INFO_NUM_BYTES_MOVED0x0f
 /* the used VRAM size */
-- 
2.17.1



[PATCH 0/2] DMCU firmware version storing and access

2018-09-13 Thread David Francis
David Francis (2):
  drm/amd/display: Add DMCU firmware version
  drm/amdgpu: Add DMCU to firmware query interface

 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   | 12 
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  2 ++
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  1 +
 include/uapi/drm/amdgpu_drm.h |  2 ++
 4 files changed, 17 insertions(+)

-- 
2.17.1



Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-13 Thread Marek Olšák
I was thinking about that too, but it would be too much trouble for
something we don't need.

Marek

On Thu, Sep 13, 2018 at 2:57 PM, Deucher, Alexander
 wrote:
> Why don't we just fix up the current GDS code so it works the same as vram
> and then we can add a new CS or context flag to ignore the current static
> allocation for gfx.  We can ignore data persistence if it's too much
> trouble.  Assume you always have to init the memory before you use it.
> That's already the case.
>
>
> Alex


Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v3

2018-09-13 Thread Christian König

On 13.09.2018 at 20:00, Philip Yang wrote:

Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.

This depends on several HMM patchsets from Jérôme Glisse queued for
upstream. See
http://172.27.226.38/root/kernel_amd/commits/hmm-dev-v01 (for AMD intranet)

Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig |  6 +--
  drivers/gpu/drm/amd/amdgpu/Makefile|  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 88 +++---
  3 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 9221e54..960a633 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
  config DRM_AMDGPU_USERPTR
bool "Always enable userptr write support"
depends on DRM_AMDGPU
-   select MMU_NOTIFIER
+   select HMM_MIRROR
help
- This option selects CONFIG_MMU_NOTIFIER if it isn't already
- selected to enabled full userptr support.
+ This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+ isn't already selected to enabled full userptr support.
  
  config DRM_AMDGPU_GART_DEBUGFS

bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 138cb78..c1e5d43 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -171,7 +171,7 @@ endif
  amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
  amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
  amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
-amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM) += amdgpu_mn.o
  
  include $(FULL_AMD_PATH)/powerplay/Makefile
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c

index e55508b..ea8671f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -46,6 +46,7 @@
  #include 
  #include 
  #include 
+#include <linux/hmm.h>


Can we now drop including linux/mmu_notifier.h?


  #include 
  #include 
  #include 
@@ -66,6 +67,7 @@
   * @objects: interval tree containing amdgpu_mn_nodes
   * @read_lock: mutex for recursive locking of @lock
   * @recursion: depth of recursion
+ * @mirror: HMM mirror function support
   *
   * Data for each amdgpu device and process address space.
   */
@@ -87,6 +89,9 @@ struct amdgpu_mn {
struct rb_root_cached   objects;
struct mutexread_lock;
atomic_trecursion;
+
+   /* HMM mirror */
+   struct hmm_mirror   mirror;
  };
  
  /**

@@ -103,7 +108,7 @@ struct amdgpu_mn_node {
  };
  
  /**

- * amdgpu_mn_destroy - destroy the MMU notifier
+ * amdgpu_mn_destroy - destroy the HMM mirror
   *
   * @work: previously sheduled work item
   *
@@ -129,28 +134,27 @@ static void amdgpu_mn_destroy(struct work_struct *work)
}
up_write(&amn->lock);
mutex_unlock(&adev->mn_lock);
-   mmu_notifier_unregister_no_release(&amn->mn, amn->mm);
+
+   hmm_mirror_unregister(&amn->mirror);
+
kfree(amn);
  }
  
  /**

   * amdgpu_mn_release - callback to notify about mm destruction
   *
- * @mn: our notifier
- * @mm: the mm this callback is about
+ * @mirror: the HMM mirror (mm) this callback is about
   *
- * Shedule a work item to lazy destroy our notifier.
+ * Shedule a work item to lazy destroy HMM mirror.
   */
-static void amdgpu_mn_release(struct mmu_notifier *mn,
- struct mm_struct *mm)
+static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
  {
-   struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
+   struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);
  
  	INIT_WORK(&amn->work, amdgpu_mn_destroy);

schedule_work(&amn->work);
  }
  
-

  /**
   * amdgpu_mn_lock - take the write side lock for this notifier
   *
@@ -355,12 +359,10 @@ static void amdgpu_mn_invalidate_range_end(struct 
mmu_notifier *mn,
  
  static const struct mmu_notifier_ops amdgpu_mn_ops[] = {

[AMDGPU_MN_TYPE_GFX] = {
-   .release = amdgpu_mn_release,
.invalidate_range_start = amdgpu_mn_invalidate_range_start_gfx,
.invalidate_range_end = amdgpu_mn_invalidate_range_end,
},
[AMDGPU_MN_TYPE_HSA] = {
-   .release = amdgpu_mn_release,
.invalidate_range_start = amdgpu_mn_invalidate_range_start_hsa,
.invalidate_range_end = amdgpu_mn_invalidate_range_end,
},
@@ -373,12 +375,63 @@ static const struct mmu_notifier_ops amdgpu_mn_ops[] = {
  #define AMDGPU_MN_KEY(mm, type) ((unsigned long)(mm) + (type))
  

Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-13 Thread Marek Olšák
GDS is a temporary memory. Its purpose depends on the job, but most of
the time, the idea is:
- beginning of IB
- initialize GDS variables
- dispatch compute that works with GDS variables
- when done, copy GDS variables to memory
- repeat ...
- end of IB

GDS is like a pool of global shader GPRs.

GDS is too small for persistent data.

Marek
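
To make the lifetime described above concrete, the per-IB pattern looks
roughly like the sketch below. This is purely illustrative: the ib_builder
type and the emit_*() helpers are hypothetical placeholders for a user-mode
driver's packet builders, not an existing API.

/* All names here are hypothetical; only the ordering matters. */
static void build_ib_using_gds(struct ib_builder *ib)
{
	/* beginning of IB */
	emit_gds_init(ib);              /* initialize GDS variables */
	emit_compute_dispatch(ib);      /* shaders use GDS like global GPRs */
	emit_copy_gds_to_memory(ib);    /* persist results before the IB ends */
	/* ... repeat as needed; nothing in GDS survives past the IB ... */
}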

On Thu, Sep 13, 2018 at 1:26 PM, Christian König
 wrote:
> Are you sure of that? I mean it is rather pointless to have a Global Data
> Share when it can't be used to share anything?
>
> On the other hand I'm not opposed to getting rid of all that stuff if we really
> don't need it.
>
> Christian.
>
> On 13.09.2018 at 17:27, Marek Olšák wrote:
>>
>> That's OK. We don't need IBs to get the same VMID.
>>
>> Marek
>>
>> On Thu, Sep 13, 2018 at 4:40 AM, Christian König
>>  wrote:
>>>
>>> As discussed internally that doesn't work because threads don't necessarily
>>> get the same VMID assigned.
>>>
>>> Christian.
>>>
>>> On 12.09.2018 at 22:33, Marek Olšák wrote:

 From: Marek Olšák 

 I've chosen to do it like this because it's easy and allows an arbitrary
 number of processes.

 Signed-off-by: Marek Olšák 
 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  10 --
drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   3 -
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  20 
drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  19 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  24 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c |   6 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h |   7 --
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |   3 -
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  14 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  21 
drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |   6 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h|   5 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  61 --
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |   8 --
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  34 +-
drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 125
 +---
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 123 +--
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 124 ++-
include/uapi/drm/amdgpu_drm.h   |  15 +--
19 files changed, 109 insertions(+), 519 deletions(-)

 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
 index b80243d3972e..7264a4930b88 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
 @@ -71,23 +71,20 @@ int amdgpu_bo_list_create(struct amdgpu_device
 *adev,
 struct drm_file *filp,
  / sizeof(struct amdgpu_bo_list_entry))
  return -EINVAL;
  size = sizeof(struct amdgpu_bo_list);
  size += num_entries * sizeof(struct amdgpu_bo_list_entry);
  list = kvmalloc(size, GFP_KERNEL);
  if (!list)
  return -ENOMEM;
  kref_init(&list->refcount);
 -   list->gds_obj = adev->gds.gds_gfx_bo;
 -   list->gws_obj = adev->gds.gws_gfx_bo;
 -   list->oa_obj = adev->gds.oa_gfx_bo;
  array = amdgpu_bo_list_array_entry(list, 0);
  memset(array, 0, num_entries * sizeof(struct
 amdgpu_bo_list_entry));
  for (i = 0; i < num_entries; ++i) {
  struct amdgpu_bo_list_entry *entry;
  struct drm_gem_object *gobj;
  struct amdgpu_bo *bo;
  struct mm_struct *usermm;
@@ -111,27 +108,20 @@ int amdgpu_bo_list_create(struct amdgpu_device
 *adev, struct drm_file *filp,
  } else {
  entry = &array[last_entry++];
  }
  entry->robj = bo;
  entry->priority = min(info[i].bo_priority,
AMDGPU_BO_LIST_MAX_PRIORITY);
  entry->tv.bo = &entry->robj->tbo;
  entry->tv.shared = !entry->robj->prime_shared_count;
- if (entry->robj->preferred_domains ==
 AMDGPU_GEM_DOMAIN_GDS)
 -   list->gds_obj = entry->robj;
 -   if (entry->robj->preferred_domains ==
 AMDGPU_GEM_DOMAIN_GWS)
 -   list->gws_obj = entry->robj;
 -   if (entry->robj->preferred_domains ==
 AMDGPU_GEM_DOMAIN_OA)
 -   list->oa_obj = entry->robj;
 -
  total_size += amdgpu_bo_size(entry->robj);
  trace_amdgpu_bo_list_set(list, entry->robj);
  }
  list->first_userptr = first_userptr;

Re: [PATCH 1/2] drm/amdgpu: remove amdgpu_bo_list_entry.robj

2018-09-13 Thread Felix Kuehling
On 2018-09-13 01:50 PM, Christian König wrote:
> Am 12.09.2018 um 22:21 schrieb Felix Kuehling:
>> On 2018-09-12 04:55 AM, Christian König wrote:
>>> We can get that just by casting tv.bo.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 42
>>> -
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 -
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 58
>>> -
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  3 +-
>>>   4 files changed, 58 insertions(+), 46 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> index b80243d3972e..14d2982a47cc 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>> @@ -49,8 +49,11 @@ static void amdgpu_bo_list_free(struct kref *ref)
>>>  refcount);
>>>   struct amdgpu_bo_list_entry *e;
>>>   -    amdgpu_bo_list_for_each_entry(e, list)
>>> -    amdgpu_bo_unref(&e->robj);
>>> +    amdgpu_bo_list_for_each_entry(e, list) {
>>> +    struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>>> +
>>> +    amdgpu_bo_unref(&bo);
>>> +    }
>>>     call_rcu(&list->rhead, amdgpu_bo_list_free_rcu);
>>>   }
>>> @@ -112,21 +115,20 @@ int amdgpu_bo_list_create(struct amdgpu_device
>>> *adev, struct drm_file *filp,
>>>   entry = &array[last_entry++];
>>>   }
>>>   -    entry->robj = bo;
>>>   entry->priority = min(info[i].bo_priority,
>>>     AMDGPU_BO_LIST_MAX_PRIORITY);
>>> -    entry->tv.bo = &entry->robj->tbo;
>>> -    entry->tv.shared = !entry->robj->prime_shared_count;
>>> -
>>> -    if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
>>> -    list->gds_obj = entry->robj;
>>> -    if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_GWS)
>>> -    list->gws_obj = entry->robj;
>>> -    if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_OA)
>>> -    list->oa_obj = entry->robj;
>>> -
>>> -    total_size += amdgpu_bo_size(entry->robj);
>>> -    trace_amdgpu_bo_list_set(list, entry->robj);
>>> +    entry->tv.bo = &bo->tbo;
>>> +    entry->tv.shared = !bo->prime_shared_count;
>> You're no longer initializing entry->priority here. Is that intentional?
>
> Huh? Please take another look; the initialization of entry->priority
> is not touched by this patch.

Sorry, I misread it. I saw a "-" where there was none between all the
other lines being removed/replaced.

Reviewed-by: Felix Kuehling 

>
> Christian.
>
>
>>
>> Regards,
>>    Felix
>>
>>> +
>>> +    if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
>>> +    list->gds_obj = bo;
>>> +    if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GWS)
>>> +    list->gws_obj = bo;
>>> +    if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_OA)
>>> +    list->oa_obj = bo;
>>> +
>>> +    total_size += amdgpu_bo_size(bo);
>>> +    trace_amdgpu_bo_list_set(list, bo);
>>>   }
>>>     list->first_userptr = first_userptr;
>>> @@ -138,8 +140,11 @@ int amdgpu_bo_list_create(struct amdgpu_device
>>> *adev, struct drm_file *filp,
>>>   return 0;
>>>     error_free:
>>> -    while (i--)
>>> -    amdgpu_bo_unref(&array[i].robj);
>>> +    while (i--) {
>>> +    struct amdgpu_bo *bo = ttm_to_amdgpu_bo(array[i].tv.bo);
>>> +
>>> +    amdgpu_bo_unref(&bo);
>>> +    }
>>>   kvfree(list);
>>>   return r;
>>>   @@ -191,9 +196,10 @@ void amdgpu_bo_list_get_list(struct
>>> amdgpu_bo_list *list,
>>>    * with the same priority, i.e. it must be stable.
>>>    */
>>>   amdgpu_bo_list_for_each_entry(e, list) {
>>> +    struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>>>   unsigned priority = e->priority;
>>>   -    if (!e->robj->parent)
>>> +    if (!bo->parent)
>>>   list_add_tail(&e->tv.head, &bucket[priority]);
>>>     e->user_pages = NULL;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>>> index 61b089768e1c..7c5f5d1601e6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>>> @@ -32,7 +32,6 @@ struct amdgpu_bo_va;
>>>   struct amdgpu_fpriv;
>>>     struct amdgpu_bo_list_entry {
>>> -    struct amdgpu_bo    *robj;
>>>   struct ttm_validate_buffer    tv;
>>>   struct amdgpu_bo_va    *bo_va;
>>>   uint32_t    priority;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index c5cc648a1b4e..2e488c6f9562 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -39,6 +39,7 @@ static int amdgpu_cs_user_fence_chunk(struct
>>> amdgpu_cs_parser *p,
>>>     uint32_t *offset

[PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v3

2018-09-13 Thread Philip Yang
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.

This depends on several HMM patchsets from Jérôme Glisse queued for
upstream. See
http://172.27.226.38/root/kernel_amd/commits/hmm-dev-v01 (for AMD intranet)

Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/Kconfig |  6 +--
 drivers/gpu/drm/amd/amdgpu/Makefile|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 88 +++---
 3 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 9221e54..960a633 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
 config DRM_AMDGPU_USERPTR
bool "Always enable userptr write support"
depends on DRM_AMDGPU
-   select MMU_NOTIFIER
+   select HMM_MIRROR
help
- This option selects CONFIG_MMU_NOTIFIER if it isn't already
- selected to enabled full userptr support.
+ This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+ isn't already selected to enabled full userptr support.
 
 config DRM_AMDGPU_GART_DEBUGFS
bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 138cb78..c1e5d43 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -171,7 +171,7 @@ endif
 amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
 amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
 amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
-amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM) += amdgpu_mn.o
 
 include $(FULL_AMD_PATH)/powerplay/Makefile
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index e55508b..ea8671f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include <linux/hmm.h>
 #include 
 #include 
 #include 
@@ -66,6 +67,7 @@
  * @objects: interval tree containing amdgpu_mn_nodes
  * @read_lock: mutex for recursive locking of @lock
  * @recursion: depth of recursion
+ * @mirror: HMM mirror function support
  *
  * Data for each amdgpu device and process address space.
  */
@@ -87,6 +89,9 @@ struct amdgpu_mn {
struct rb_root_cached   objects;
struct mutexread_lock;
atomic_trecursion;
+
+   /* HMM mirror */
+   struct hmm_mirror   mirror;
 };
 
 /**
@@ -103,7 +108,7 @@ struct amdgpu_mn_node {
 };
 
 /**
- * amdgpu_mn_destroy - destroy the MMU notifier
+ * amdgpu_mn_destroy - destroy the HMM mirror
  *
  * @work: previously sheduled work item
  *
@@ -129,28 +134,27 @@ static void amdgpu_mn_destroy(struct work_struct *work)
}
up_write(&amn->lock);
mutex_unlock(&adev->mn_lock);
-   mmu_notifier_unregister_no_release(&amn->mn, amn->mm);
+
+   hmm_mirror_unregister(&amn->mirror);
+
kfree(amn);
 }
 
 /**
  * amdgpu_mn_release - callback to notify about mm destruction
  *
- * @mn: our notifier
- * @mm: the mm this callback is about
+ * @mirror: the HMM mirror (mm) this callback is about
  *
- * Shedule a work item to lazy destroy our notifier.
+ * Shedule a work item to lazy destroy HMM mirror.
  */
-static void amdgpu_mn_release(struct mmu_notifier *mn,
- struct mm_struct *mm)
+static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
 {
-   struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
+   struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);
 
INIT_WORK(&amn->work, amdgpu_mn_destroy);
schedule_work(&amn->work);
 }
 
-
 /**
  * amdgpu_mn_lock - take the write side lock for this notifier
  *
@@ -355,12 +359,10 @@ static void amdgpu_mn_invalidate_range_end(struct 
mmu_notifier *mn,
 
 static const struct mmu_notifier_ops amdgpu_mn_ops[] = {
[AMDGPU_MN_TYPE_GFX] = {
-   .release = amdgpu_mn_release,
.invalidate_range_start = amdgpu_mn_invalidate_range_start_gfx,
.invalidate_range_end = amdgpu_mn_invalidate_range_end,
},
[AMDGPU_MN_TYPE_HSA] = {
-   .release = amdgpu_mn_release,
.invalidate_range_start = amdgpu_mn_invalidate_range_start_hsa,
.invalidate_range_end = amdgpu_mn_invalidate_range_end,
},
@@ -373,12 +375,63 @@ static const struct mmu_notifier_ops amdgpu_mn_ops[] = {
 #define AMDGPU_MN_KEY(mm, type) ((unsigned long)(mm) + (type))
 
 /**
- * amdgpu_mn_get - create notifier context
+ * amdgpu_hmm_sync_cpu_device_pagetables - synchronize CPU/GPU page tables
+ *
+ * @mirror: the hmm_m

Re: [PATCH 1/2] drm/amdgpu: remove amdgpu_bo_list_entry.robj

2018-09-13 Thread Christian König

On 12.09.2018 at 22:21, Felix Kuehling wrote:

On 2018-09-12 04:55 AM, Christian König wrote:

We can get that just by casting tv.bo.
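
Concretely, the cast in question is presumably the container_of()-based
helper from amdgpu_object.h, roughly:

static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
{
	/* amdgpu_bo embeds its ttm_buffer_object as the "tbo" member
	 * (see "entry->tv.bo = &bo->tbo" below), so container_of()
	 * recovers the wrapping amdgpu_bo from the embedded pointer. */
	return container_of(tbo, struct amdgpu_bo, tbo);
}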

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 42 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 58 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  3 +-
  4 files changed, 58 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index b80243d3972e..14d2982a47cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -49,8 +49,11 @@ static void amdgpu_bo_list_free(struct kref *ref)
   refcount);
struct amdgpu_bo_list_entry *e;
  
-	amdgpu_bo_list_for_each_entry(e, list)

-   amdgpu_bo_unref(&e->robj);
+   amdgpu_bo_list_for_each_entry(e, list) {
+   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+
+   amdgpu_bo_unref(&bo);
+   }
  
  	call_rcu(&list->rhead, amdgpu_bo_list_free_rcu);

  }
@@ -112,21 +115,20 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, 
struct drm_file *filp,
entry = &array[last_entry++];
}
  
-		entry->robj = bo;

entry->priority = min(info[i].bo_priority,
  AMDGPU_BO_LIST_MAX_PRIORITY);
-   entry->tv.bo = &entry->robj->tbo;
-   entry->tv.shared = !entry->robj->prime_shared_count;
-
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
-   list->gds_obj = entry->robj;
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_GWS)
-   list->gws_obj = entry->robj;
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_OA)
-   list->oa_obj = entry->robj;
-
-   total_size += amdgpu_bo_size(entry->robj);
-   trace_amdgpu_bo_list_set(list, entry->robj);
+   entry->tv.bo = &bo->tbo;
+   entry->tv.shared = !bo->prime_shared_count;

You're no longer initializing entry->priority here. Is that intentional?


Huh? Please take another look; the initialization of entry->priority is
not touched by this patch.


Christian.




Regards,
   Felix


+
+   if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
+   list->gds_obj = bo;
+   if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GWS)
+   list->gws_obj = bo;
+   if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_OA)
+   list->oa_obj = bo;
+
+   total_size += amdgpu_bo_size(bo);
+   trace_amdgpu_bo_list_set(list, bo);
}
  
  	list->first_userptr = first_userptr;

@@ -138,8 +140,11 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, 
struct drm_file *filp,
return 0;
  
  error_free:

-   while (i--)
-   amdgpu_bo_unref(&array[i].robj);
+   while (i--) {
+   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(array[i].tv.bo);
+
+   amdgpu_bo_unref(&bo);
+   }
kvfree(list);
return r;
  
@@ -191,9 +196,10 @@ void amdgpu_bo_list_get_list(struct amdgpu_bo_list *list,

 * with the same priority, i.e. it must be stable.
 */
amdgpu_bo_list_for_each_entry(e, list) {
+   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
unsigned priority = e->priority;
  
-		if (!e->robj->parent)

+   if (!bo->parent)
list_add_tail(&e->tv.head, &bucket[priority]);
  
  		e->user_pages = NULL;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
index 61b089768e1c..7c5f5d1601e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
@@ -32,7 +32,6 @@ struct amdgpu_bo_va;
  struct amdgpu_fpriv;
  
  struct amdgpu_bo_list_entry {

-   struct amdgpu_bo*robj;
struct ttm_validate_buffer  tv;
struct amdgpu_bo_va *bo_va;
uint32_tpriority;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index c5cc648a1b4e..2e488c6f9562 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -39,6 +39,7 @@ static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser 
*p,
  uint32_t *offset)
  {
struct drm_gem_object *gobj;
+   struct amdgpu_bo *bo;
unsigned long size;
int r;
  
@@ -46,21 +47,21 @@ static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser *p,

if (gobj == NULL)
return

Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail

2018-09-13 Thread Christian König

On 13.09.2018 at 16:25, Alex Deucher wrote:

On Wed, Sep 12, 2018 at 3:25 PM Christian König
 wrote:

While cutting the lists we sometimes accidentally added a list_head from
the stack to the LRUs, effectively corrupting the list.

Remove the list cutting and use explicit list manipulation instead.

Signed-off-by: Christian König 

Can you apply this patch first and then do the clean up as a later
series or temporarily disable the feature?  I want to send out a -next
pull this week and I don't want to leave this broken.


Yeah, that should work as well.

Christian.



Alex


---
  drivers/gpu/drm/ttm/ttm_bo.c | 51 ++--
  1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 138c98902033..b2a33bf1ef10 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -247,23 +247,18 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo,
  }
  EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);

-static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos,
-   struct list_head *lru, bool is_swap)
+static void ttm_list_move_bulk_tail(struct list_head *list,
+   struct list_head *first,
+   struct list_head *last)
  {
-   struct list_head *list;
-   LIST_HEAD(entries);
-   LIST_HEAD(before);
+   first->prev->next = last->next;
+   last->next->prev = first->prev;

-   reservation_object_assert_held(pos->last->resv);
-   list = is_swap ? &pos->last->swap : &pos->last->lru;
-   list_cut_position(&entries, lru, list);
+   list->prev->next = first;
+   first->prev = list->prev;

-   reservation_object_assert_held(pos->first->resv);
-   list = is_swap ? pos->first->swap.prev : pos->first->lru.prev;
-   list_cut_position(&before, &entries, list);
-
-   list_splice(&before, lru);
-   list_splice_tail(&entries, lru);
+   last->next = list;
+   list->prev = last;
  }

  void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk)
@@ -271,23 +266,33 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
*bulk)
 unsigned i;

 for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
+   struct ttm_lru_bulk_move_pos *pos = &bulk->tt[i];
 struct ttm_mem_type_manager *man;

-   if (!bulk->tt[i].first)
+   if (!pos->first)
 continue;

-   man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
-   ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
+   reservation_object_assert_held(pos->first->resv);
+   reservation_object_assert_held(pos->last->resv);
+
+   man = &pos->first->bdev->man[TTM_PL_TT];
+   ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
+   &pos->last->lru);
 }

 for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
+   struct ttm_lru_bulk_move_pos *pos = &bulk->vram[i];
 struct ttm_mem_type_manager *man;

-   if (!bulk->vram[i].first)
+   if (!pos->first)
 continue;

-   man = &bulk->vram[i].first->bdev->man[TTM_PL_VRAM];
-   ttm_bo_bulk_move_helper(&bulk->vram[i], &man->lru[i], false);
+   reservation_object_assert_held(pos->first->resv);
+   reservation_object_assert_held(pos->last->resv);
+
+   man = &pos->first->bdev->man[TTM_PL_VRAM];
+   ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
+   &pos->last->lru);
 }

 for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
@@ -297,8 +302,12 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
*bulk)
 if (!pos->first)
 continue;

+   reservation_object_assert_held(pos->first->resv);
+   reservation_object_assert_held(pos->last->resv);
+
 lru = &pos->first->bdev->glob->swap_lru[i];
-   ttm_bo_bulk_move_helper(&bulk->swap[i], lru, true);
+   ttm_list_move_bulk_tail(lru, &pos->first->swap,
+   &pos->last->swap);
 }
  }
  EXPORT_SYMBOL(ttm_bo_bulk_move_lru_tail);
--
2.14.1
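
To see the pointer surgery in isolation, here is a self-contained userspace
sketch of the same "[first, last] to tail" splice that ttm_list_move_bulk_tail()
performs above. struct list_head is re-declared locally just for the demo; this
is an illustration, not part of the patch.

#include <stdio.h>

struct list_head { struct list_head *prev, *next; };	/* demo-local copy */

/* Move the already-linked sub-chain [first, last] to the tail of "list",
 * mirroring ttm_list_move_bulk_tail() above. */
static void move_bulk_tail(struct list_head *list,
			   struct list_head *first, struct list_head *last)
{
	/* unlink [first, last] from wherever it currently sits */
	first->prev->next = last->next;
	last->next->prev = first->prev;

	/* splice it in just before "list", i.e. at the tail of the ring */
	list->prev->next = first;
	first->prev = list->prev;
	last->next = list;
	list->prev = last;
}

int main(void)
{
	struct list_head head = { &head, &head };
	struct list_head a, b, c;

	/* build the ring head -> a -> b -> c by hand */
	head.next = &a; a.prev = &head;
	a.next = &b;    b.prev = &a;
	b.next = &c;    c.prev = &b;
	c.next = &head; head.prev = &c;

	move_bulk_tail(&head, &a, &b);

	/* prints "c a b": the [a, b] chain moved to the tail in one splice */
	for (struct list_head *p = head.next; p != &head; p = p->next)
		printf("%c ", p == &a ? 'a' : p == &b ? 'b' : 'c');
	printf("\n");
	return 0;
}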



Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-13 Thread Christian König
Are you sure of that? I mean it is rather pointless to have a Global 
Data Share when it can't be used to share anything?


On the other hand I'm not opposed to getting rid of all that stuff if we
really don't need it.


Christian.

On 13.09.2018 at 17:27, Marek Olšák wrote:

That's OK. We don't need IBs to get the same VMID.

Marek

On Thu, Sep 13, 2018 at 4:40 AM, Christian König
 wrote:

As discussed internally that doesn't work because threads don't necessarily
get the same VMID assigned.

Christian.

On 12.09.2018 at 22:33, Marek Olšák wrote:

From: Marek Olšák 

I've chosen to do it like this because it's easy and allows an arbitrary
number of processes.

Signed-off-by: Marek Olšák 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  10 --
   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   3 -
   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  20 
   drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  19 +--
   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  24 +---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c |   6 -
   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h |   7 --
   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |   3 -
   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  14 +--
   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  21 
   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |   6 -
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h|   5 -
   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  61 --
   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |   8 --
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  34 +-
   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 125 +---
   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 123 +--
   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 124 ++-
   include/uapi/drm/amdgpu_drm.h   |  15 +--
   19 files changed, 109 insertions(+), 519 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index b80243d3972e..7264a4930b88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -71,23 +71,20 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev,
struct drm_file *filp,
 / sizeof(struct amdgpu_bo_list_entry))
 return -EINVAL;
 size = sizeof(struct amdgpu_bo_list);
 size += num_entries * sizeof(struct amdgpu_bo_list_entry);
 list = kvmalloc(size, GFP_KERNEL);
 if (!list)
 return -ENOMEM;
 kref_init(&list->refcount);
-   list->gds_obj = adev->gds.gds_gfx_bo;
-   list->gws_obj = adev->gds.gws_gfx_bo;
-   list->oa_obj = adev->gds.oa_gfx_bo;
 array = amdgpu_bo_list_array_entry(list, 0);
 memset(array, 0, num_entries * sizeof(struct
amdgpu_bo_list_entry));
 for (i = 0; i < num_entries; ++i) {
 struct amdgpu_bo_list_entry *entry;
 struct drm_gem_object *gobj;
 struct amdgpu_bo *bo;
 struct mm_struct *usermm;
   @@ -111,27 +108,20 @@ int amdgpu_bo_list_create(struct amdgpu_device
*adev, struct drm_file *filp,
 } else {
 entry = &array[last_entry++];
 }
 entry->robj = bo;
 entry->priority = min(info[i].bo_priority,
   AMDGPU_BO_LIST_MAX_PRIORITY);
 entry->tv.bo = &entry->robj->tbo;
 entry->tv.shared = !entry->robj->prime_shared_count;
   - if (entry->robj->preferred_domains ==
AMDGPU_GEM_DOMAIN_GDS)
-   list->gds_obj = entry->robj;
-   if (entry->robj->preferred_domains ==
AMDGPU_GEM_DOMAIN_GWS)
-   list->gws_obj = entry->robj;
-   if (entry->robj->preferred_domains ==
AMDGPU_GEM_DOMAIN_OA)
-   list->oa_obj = entry->robj;
-
 total_size += amdgpu_bo_size(entry->robj);
 trace_amdgpu_bo_list_set(list, entry->robj);
 }
 list->first_userptr = first_userptr;
 list->num_entries = num_entries;
 trace_amdgpu_cs_bo_status(list->num_entries, total_size);
 *result = list;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
index 61b089768e1c..30f12a60aa28 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
@@ -36,23 +36,20 @@ struct amdgpu_bo_list_entry {
 struct ttm_validate_buffer  tv;
 struct amdgpu_bo_va *bo_va;
 uint32_tpriority;
 struct page **user_pages;
 int user_invalidated;
   };
 struct amdgpu_bo_list {
 struct rcu_head rhead;
 struct kref refcount;
-   struct amdgpu_bo *gds_obj;
-   struct amdgpu_bo *gws_obj;
-   str

[PATCH] drm/amdgpu/display: return proper error codes in dm

2018-09-13 Thread Alex Deucher
Replace -1 with proper error codes.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index eccae63d3ef1..541f33749961 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -493,7 +493,7 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
 error:
amdgpu_dm_fini(adev);
 
-   return -1;
+   return -EINVAL;
 }
 
 static void amdgpu_dm_fini(struct amdgpu_device *adev)
@@ -548,7 +548,7 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
break;
default:
DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
-   return -1;
+   return -EINVAL;
}
 
if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
@@ -1537,7 +1537,7 @@ static int amdgpu_dm_initialize_drm_device(struct 
amdgpu_device *adev)
link_cnt = dm->dc->caps.max_links;
if (amdgpu_dm_mode_config_init(dm->adev)) {
DRM_ERROR("DM: Failed to initialize mode config\n");
-   return -1;
+   return -EINVAL;
}
 
/* Identify the number of planes to be initialized */
@@ -1652,7 +1652,7 @@ static int amdgpu_dm_initialize_drm_device(struct 
amdgpu_device *adev)
kfree(aconnector);
for (i = 0; i < dm->dc->caps.max_planes; i++)
kfree(mode_info->planes[i]);
-   return -1;
+   return -EINVAL;
 }
 
 static void amdgpu_dm_destroy_drm_device(struct amdgpu_display_manager *dm)
-- 
2.13.6

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail

2018-09-13 Thread Michel Dänzer

[ Moving to dri-devel, where TTM patches are reviewed ]

On 2018-09-12 9:23 p.m., Christian König wrote:
> While cutting the lists we sometimes accidentally added a list_head from
> the stack to the LRUs, effectively corrupting the list.
> 
> Remove the list cutting and use explicit list manipulation instead.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 51 
> ++--
>  1 file changed, 30 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 138c98902033..b2a33bf1ef10 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -247,23 +247,18 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object 
> *bo,
>  }
>  EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
>  
> -static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos,
> - struct list_head *lru, bool is_swap)
> +static void ttm_list_move_bulk_tail(struct list_head *list,
> + struct list_head *first,
> + struct list_head *last)
>  {
> - struct list_head *list;
> - LIST_HEAD(entries);
> - LIST_HEAD(before);
> + first->prev->next = last->next;
> + last->next->prev = first->prev;
>  
> - reservation_object_assert_held(pos->last->resv);
> - list = is_swap ? &pos->last->swap : &pos->last->lru;
> - list_cut_position(&entries, lru, list);
> + list->prev->next = first;
> + first->prev = list->prev;
>  
> - reservation_object_assert_held(pos->first->resv);
> - list = is_swap ? pos->first->swap.prev : pos->first->lru.prev;
> - list_cut_position(&before, &entries, list);
> -
> - list_splice(&before, lru);
> - list_splice_tail(&entries, lru);
> + last->next = list;
> + list->prev = last;
>  }
>  
>  void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk)
> @@ -271,23 +266,33 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
> *bulk)
>   unsigned i;
>  
>   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> + struct ttm_lru_bulk_move_pos *pos = &bulk->tt[i];
>   struct ttm_mem_type_manager *man;
>  
> - if (!bulk->tt[i].first)
> + if (!pos->first)
>   continue;
>  
> - man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
> - ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
> + reservation_object_assert_held(pos->first->resv);
> + reservation_object_assert_held(pos->last->resv);
> +
> + man = &pos->first->bdev->man[TTM_PL_TT];
> + ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
> + &pos->last->lru);
>   }
>  
>   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> + struct ttm_lru_bulk_move_pos *pos = &bulk->vram[i];
>   struct ttm_mem_type_manager *man;
>  
> - if (!bulk->vram[i].first)
> + if (!pos->first)
>   continue;
>  
> - man = &bulk->vram[i].first->bdev->man[TTM_PL_VRAM];
> - ttm_bo_bulk_move_helper(&bulk->vram[i], &man->lru[i], false);
> + reservation_object_assert_held(pos->first->resv);
> + reservation_object_assert_held(pos->last->resv);
> +
> + man = &pos->first->bdev->man[TTM_PL_VRAM];
> + ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
> + &pos->last->lru);
>   }
>  
>   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> @@ -297,8 +302,12 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
> *bulk)
>   if (!pos->first)
>   continue;
>  
> + reservation_object_assert_held(pos->first->resv);
> + reservation_object_assert_held(pos->last->resv);
> +
>   lru = &pos->first->bdev->glob->swap_lru[i];
> - ttm_bo_bulk_move_helper(&bulk->swap[i], lru, true);
> + ttm_list_move_bulk_tail(lru, &pos->first->swap,
> + &pos->last->swap);
>   }
>  }
>  EXPORT_SYMBOL(ttm_bo_bulk_move_lru_tail);
> 

Seems like more code could be kept in / moved into the shared helper
function. That would keep the ttm_bo_bulk_move_lru_tail code leaner, and
might make the helper function changes easier to review as well.
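
For illustration, one way that could look (a rough, untested sketch only, not
part of the patch; it simply folds the NULL check and the reservation asserts
back into a helper that takes the bulk-move position):

static void ttm_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
				   struct list_head *lru, bool is_swap)
{
	/* Skip empty bulk-move slots, as the per-priority loops do today. */
	if (!pos->first)
		return;

	reservation_object_assert_held(pos->first->resv);
	reservation_object_assert_held(pos->last->resv);

	if (is_swap)
		ttm_list_move_bulk_tail(lru, &pos->first->swap,
					&pos->last->swap);
	else
		ttm_list_move_bulk_tail(lru, &pos->first->lru,
					&pos->last->lru);
}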


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/6] drm/amdgpu/sriov: Correct the setting about sdma doorbell offset of Vega10

2018-09-13 Thread Alex Deucher
On Wed, Sep 12, 2018 at 10:06 PM Felix Kuehling  wrote:
>
> On 2018-09-12 09:55 PM, Alex Deucher wrote:
> > On Wed, Sep 12, 2018 at 9:45 PM Felix Kuehling  
> > wrote:
> >> From: Emily Deng 
> >>
> >> Correct the format
> >>
> >> For vega10 sriov, the sdma doorbell must be fixed as follow to keep the
> >> same setting with host driver, or it will happen conflicts.
> >>
> >> Signed-off-by: Emily Deng 
> >> Acked-by: Alex Deucher 
> >> Signed-off-by: Felix Kuehling 
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  9 +
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 27 
> >> +++
> >>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 12 +---
> >>  3 files changed, 37 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> index afa9e77..e60de88 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> @@ -420,6 +420,15 @@ typedef enum _AMDGPU_DOORBELL64_ASSIGNMENT
> >> AMDGPU_DOORBELL64_sDMA_ENGINE1= 0xE8,
> >> AMDGPU_DOORBELL64_sDMA_HI_PRI_ENGINE1 = 0xE9,
> >>
> >> +   /* For vega10 sriov, the sdma doorbell must be fixed as follow
> >> +* to keep the same setting with host driver, or it will
> >> +* happen conflicts
> >> +*/
> >> +   AMDGPU_VEGA10_DOORBELL64_sDMA_ENGINE0= 0xF0,
> >> +   AMDGPU_VEGA10_DOORBELL64_sDMA_HI_PRI_ENGINE0 = 0xF1,
> >> +   AMDGPU_VEGA10_DOORBELL64_sDMA_ENGINE1= 0xF2,
> >> +   AMDGPU_VEGA10_DOORBELL64_sDMA_HI_PRI_ENGINE1 = 0xF3,
> >> +
> >> /* Interrupt handler */
> >> AMDGPU_DOORBELL64_IH  = 0xF4,  /* For legacy 
> >> interrupt ring buffer */
> >> AMDGPU_DOORBELL64_IH_RING1= 0xF5,  /* For page 
> >> migration request log */
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >> index bf0b012..7a165a9 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >> @@ -181,14 +181,25 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device 
> >> *adev)
> >>  * process in case of 64-bit doorbells so we
> >>  * can use each doorbell assignment twice.
> >>  */
> >> -   gpu_resources.sdma_doorbell[0][i] =
> >> -   AMDGPU_DOORBELL64_sDMA_ENGINE0 + (i >> 1);
> >> -   gpu_resources.sdma_doorbell[0][i+1] =
> >> -   AMDGPU_DOORBELL64_sDMA_ENGINE0 + 0x200 + 
> >> (i >> 1);
> >> -   gpu_resources.sdma_doorbell[1][i] =
> >> -   AMDGPU_DOORBELL64_sDMA_ENGINE1 + (i >> 1);
> >> -   gpu_resources.sdma_doorbell[1][i+1] =
> >> -   AMDGPU_DOORBELL64_sDMA_ENGINE1 + 0x200 + 
> >> (i >> 1);
> >> +   if (adev->asic_type == CHIP_VEGA10) {
> >> +   gpu_resources.sdma_doorbell[0][i] =
> >> +   
> >> AMDGPU_VEGA10_DOORBELL64_sDMA_ENGINE0 + (i >> 1);
> >> +   gpu_resources.sdma_doorbell[0][i+1] =
> >> +   
> >> AMDGPU_VEGA10_DOORBELL64_sDMA_ENGINE0 + 0x200 + (i >> 1);
> >> +   gpu_resources.sdma_doorbell[1][i] =
> >> +   
> >> AMDGPU_VEGA10_DOORBELL64_sDMA_ENGINE1 + (i >> 1);
> >> +   gpu_resources.sdma_doorbell[1][i+1] =
> >> +   
> >> AMDGPU_VEGA10_DOORBELL64_sDMA_ENGINE1 + 0x200 + (i >> 1);
> >> +   } else {
> >> +   gpu_resources.sdma_doorbell[0][i] =
> >> +   AMDGPU_DOORBELL64_sDMA_ENGINE0 + 
> >> (i >> 1);
> >> +   gpu_resources.sdma_doorbell[0][i+1] =
> >> +   AMDGPU_DOORBELL64_sDMA_ENGINE0 + 
> >> 0x200 + (i >> 1);
> >> +   gpu_resources.sdma_doorbell[1][i] =
> >> +   AMDGPU_DOORBELL64_sDMA_ENGINE1 + 
> >> (i >> 1);
> >> +   gpu_resources.sdma_doorbell[1][i+1] =
> >> +   AMDGPU_DOORBELL64_sDMA_ENGINE1 + 
> >> 0x200 + (i >> 1);
> >> +   }
> > It would probably make more sense to reverse the conditions here so we
> > retain the old behavior for all previous non-vega20 asics rather than
> > just vega10.  E.g.,
> >
> > if (vega20) {
> > // use new doorbell mapping
> > } else {
> > // use the old doorbell mapping
> > }
> >
> > Same thing below.
>
> This code is only applicable to GFXv9 and later GPUs with 64-bit
> doorbells. It does n

Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier (v2)

2018-09-13 Thread Philip Yang

On 2018-09-13 04:45 AM, Christian König wrote:

Am 13.09.2018 um 00:02 schrieb Philip Yang:

Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.

This depends on several HMM patchset from Jérôme Glisse queued for
upstream. See
http://172.27.226.38/root/kernel_amd/commits/hmm-dev-v01 (for AMD 
intranet)


Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig  |  6 +--
  drivers/gpu/drm/amd/amdgpu/Makefile |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 78 
+

  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.h | 41 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 50 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h  |  7 +++


Please either fully merge amdgpu_mn.* into amdgpu_hmm.* or the other 
way around.


Agree, I will merge hmm changes and future hmm changes into amdgpu_mn.*.
This will make things simpler.
In addition to that you should add documentation to all newly added 
functions.



I will submit new version v3 for review.

Regards,
Philip

Regards,
Christian.


  6 files changed, 178 insertions(+), 5 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig

index 9221e54..960a633 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
  config DRM_AMDGPU_USERPTR
  bool "Always enable userptr write support"
  depends on DRM_AMDGPU
-    select MMU_NOTIFIER
+    select HMM_MIRROR
  help
-  This option selects CONFIG_MMU_NOTIFIER if it isn't already
-  selected to enabled full userptr support.
+  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+  isn't already selected to enabled full userptr support.
    config DRM_AMDGPU_GART_DEBUGFS
  bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 138cb78..ee691e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -172,6 +172,7 @@ amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
  amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
  amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
  amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM) += amdgpu_hmm.o
    include $(FULL_AMD_PATH)/powerplay/Makefile
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c

new file mode 100644
index 000..6c506f6
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -0,0 +1,78 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom 
the

+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be 
included in

+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
USE OR

+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "amdgpu.h"
+#include "amdgpu_mn.h"
+
+static void amdgpu_hmm_release(struct hmm_mirror *mirror)
+{
+    pr_debug("mirror=%p\n", mirror);
+    amdgpu_hmm_mn_release(mirror);
+}
+
+static int amdgpu_hmm_sync_cpu_device_pagetables(struct hmm_mirror 
*mirror,

+    const struct hmm_update *update)
+{
+    struct hmm *hmm;
+    struct mm_struct *mm;
+    unsigned long start;
+    unsigned long end;
+
+    start = update->start;
+    end = update->end;
+
+    pr_debug("mirror %p start %lx end %lx\n", mirror, start, end);
+
+    hmm = mirror->hmm;
+    mm = *(struct mm_struct **)hmm;
+
+    return amdgpu_mn_invalidate_range(mirror, mm, start, end,
+    update->blockable);
+}
+
+static struct hmm_mirror_ops amdgpu_hmm_mirror_ops = {
+    .sync_cpu_device_pagetables = 
amdgpu_hmm_sync_cpu_device_pagetab

Re: [PATCH 2/2] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail v2

2018-09-13 Thread Mike Lothian
Hi

This fixes a boot issue I had on Raven (
https://bugs.freedesktop.org/show_bug.cgi?id=107922)

Feel free to add to both patches:

Tested-by: Mike Lothian 

Cheers

Mike

On Thu, 13 Sep 2018 at 12:22 Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> While cutting the lists we sometimes accidentally added a list_head from
> the stack to the LRUs, effectively corrupting the list.
>
> Remove the list cutting and use explicit list manipulation instead.
>
> v2: separate out new list_bulk_move_tail helper
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 46
> +++-
>  1 file changed, 20 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 138c98902033..26b889f86670 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -247,47 +247,38 @@ void ttm_bo_move_to_lru_tail(struct
> ttm_buffer_object *bo,
>  }
>  EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
>
> -static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos,
> -   struct list_head *lru, bool is_swap)
> -{
> -   struct list_head *list;
> -   LIST_HEAD(entries);
> -   LIST_HEAD(before);
> -
> -   reservation_object_assert_held(pos->last->resv);
> -   list = is_swap ? &pos->last->swap : &pos->last->lru;
> -   list_cut_position(&entries, lru, list);
> -
> -   reservation_object_assert_held(pos->first->resv);
> -   list = is_swap ? pos->first->swap.prev : pos->first->lru.prev;
> -   list_cut_position(&before, &entries, list);
> -
> -   list_splice(&before, lru);
> -   list_splice_tail(&entries, lru);
> -}
> -
>  void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk)
>  {
> unsigned i;
>
> for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> +   struct ttm_lru_bulk_move_pos *pos = &bulk->tt[i];
> struct ttm_mem_type_manager *man;
>
> -   if (!bulk->tt[i].first)
> +   if (!pos->first)
> continue;
>
> -   man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
> -   ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
> +   reservation_object_assert_held(pos->first->resv);
> +   reservation_object_assert_held(pos->last->resv);
> +
> +   man = &pos->first->bdev->man[TTM_PL_TT];
> +   list_bulk_move_tail(&man->lru[i], &pos->first->lru,
> +   &pos->last->lru);
> }
>
> for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> +   struct ttm_lru_bulk_move_pos *pos = &bulk->vram[i];
> struct ttm_mem_type_manager *man;
>
> -   if (!bulk->vram[i].first)
> +   if (!pos->first)
> continue;
>
> -   man = &bulk->vram[i].first->bdev->man[TTM_PL_VRAM];
> -   ttm_bo_bulk_move_helper(&bulk->vram[i], &man->lru[i],
> false);
> +   reservation_object_assert_held(pos->first->resv);
> +   reservation_object_assert_held(pos->last->resv);
> +
> +   man = &pos->first->bdev->man[TTM_PL_VRAM];
> +   list_bulk_move_tail(&man->lru[i], &pos->first->lru,
> +   &pos->last->lru);
> }
>
> for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> @@ -297,8 +288,11 @@ void ttm_bo_bulk_move_lru_tail(struct
> ttm_lru_bulk_move *bulk)
> if (!pos->first)
> continue;
>
> +   reservation_object_assert_held(pos->first->resv);
> +   reservation_object_assert_held(pos->last->resv);
> +
> lru = &pos->first->bdev->glob->swap_lru[i];
> -   ttm_bo_bulk_move_helper(&bulk->swap[i], lru, true);
> +   list_bulk_move_tail(lru, &pos->first->swap,
> &pos->last->swap);
> }
>  }
>  EXPORT_SYMBOL(ttm_bo_bulk_move_lru_tail);
> --
> 2.14.1
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-13 Thread Marek Olšák
That's OK. We don't need IBs to get the same VMID.

Marek

On Thu, Sep 13, 2018 at 4:40 AM, Christian König
 wrote:
> As discussed internally that doesn't work because threads don't necessarily
> get the same VMID assigned.
>
> Christian.
>
> Am 12.09.2018 um 22:33 schrieb Marek Olšák:
>>
>> From: Marek Olšák 
>>
>> I've chosen to do it like this because it's easy and allows an arbitrary
>> number of processes.
>>
>> Signed-off-by: Marek Olšák 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  10 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   3 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  20 
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  19 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  24 +---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c |   6 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h |   7 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |   3 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  14 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  21 
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |   6 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h|   5 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  61 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |   8 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  34 +-
>>   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 125 +---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 123 +--
>>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 124 ++-
>>   include/uapi/drm/amdgpu_drm.h   |  15 +--
>>   19 files changed, 109 insertions(+), 519 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> index b80243d3972e..7264a4930b88 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> @@ -71,23 +71,20 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev,
>> struct drm_file *filp,
>> / sizeof(struct amdgpu_bo_list_entry))
>> return -EINVAL;
>> size = sizeof(struct amdgpu_bo_list);
>> size += num_entries * sizeof(struct amdgpu_bo_list_entry);
>> list = kvmalloc(size, GFP_KERNEL);
>> if (!list)
>> return -ENOMEM;
>> kref_init(&list->refcount);
>> -   list->gds_obj = adev->gds.gds_gfx_bo;
>> -   list->gws_obj = adev->gds.gws_gfx_bo;
>> -   list->oa_obj = adev->gds.oa_gfx_bo;
>> array = amdgpu_bo_list_array_entry(list, 0);
>> memset(array, 0, num_entries * sizeof(struct
>> amdgpu_bo_list_entry));
>> for (i = 0; i < num_entries; ++i) {
>> struct amdgpu_bo_list_entry *entry;
>> struct drm_gem_object *gobj;
>> struct amdgpu_bo *bo;
>> struct mm_struct *usermm;
>>   @@ -111,27 +108,20 @@ int amdgpu_bo_list_create(struct amdgpu_device
>> *adev, struct drm_file *filp,
>> } else {
>> entry = &array[last_entry++];
>> }
>> entry->robj = bo;
>> entry->priority = min(info[i].bo_priority,
>>   AMDGPU_BO_LIST_MAX_PRIORITY);
>> entry->tv.bo = &entry->robj->tbo;
>> entry->tv.shared = !entry->robj->prime_shared_count;
>>   - if (entry->robj->preferred_domains ==
>> AMDGPU_GEM_DOMAIN_GDS)
>> -   list->gds_obj = entry->robj;
>> -   if (entry->robj->preferred_domains ==
>> AMDGPU_GEM_DOMAIN_GWS)
>> -   list->gws_obj = entry->robj;
>> -   if (entry->robj->preferred_domains ==
>> AMDGPU_GEM_DOMAIN_OA)
>> -   list->oa_obj = entry->robj;
>> -
>> total_size += amdgpu_bo_size(entry->robj);
>> trace_amdgpu_bo_list_set(list, entry->robj);
>> }
>> list->first_userptr = first_userptr;
>> list->num_entries = num_entries;
>> trace_amdgpu_cs_bo_status(list->num_entries, total_size);
>> *result = list;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>> index 61b089768e1c..30f12a60aa28 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>> @@ -36,23 +36,20 @@ struct amdgpu_bo_list_entry {
>> struct ttm_validate_buffer  tv;
>> struct amdgpu_bo_va *bo_va;
>> uint32_tpriority;
>> struct page **user_pages;
>> int user_invalidated;
>>   };
>> struct amdgpu_bo_list {
>> struct rcu_head rhead;
>> struct kref refcount;
>> -   struct amdgpu_bo *gds_obj;
>> -   struct amdgpu_bo *gws_obj;
>> -   struct amdgpu_bo *oa_obj;
>>  

Re: [PATCH] drm/amd/display: Fix pflip IRQ status after gpu reset.

2018-09-13 Thread Harry Wentland
On 2018-09-12 04:51 PM, Andrey Grodzovsky wrote:
> Problem:
> After GPU reset pflip completion IRQ is disabled and hence
> any subsequent mode set or plane update leads to hang.
> 
> Fix:
> Unless acrtc->otg_inst is initialized to -1 during display
> block initialization then during resume from GPU reset
> amdgpu_irq_gpu_reset_resume_helper will override CRTC 0 pflip
> IRQ value with whatever value was on every other unused CRTC because
> dm_irq_state will do irq_source = dal_irq_type + acrtc->otg_inst
> where acrtc->otg_inst will be 0 for every unused CRTC.
> 
> Signed-off-by: Andrey Grodzovsky 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 5103eba..75c4b80 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -3351,6 +3351,7 @@ static int amdgpu_dm_crtc_init(struct 
> amdgpu_display_manager *dm,
>  
>   acrtc->crtc_id = crtc_index;
>   acrtc->base.enabled = false;
> + acrtc->otg_inst = -1;
>  
>   dm->adev->mode_info.crtcs[crtc_index] = acrtc;
>   drm_crtc_enable_color_mgmt(&acrtc->base, MAX_COLOR_LUT_ENTRIES,
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail

2018-09-13 Thread Alex Deucher
On Wed, Sep 12, 2018 at 3:25 PM Christian König
 wrote:
>
> While cutting the lists we sometimes accidentally added a list_head from
> the stack to the LRUs, effectively corrupting the list.
>
> Remove the list cutting and use explicit list manipulation instead.
>
> Signed-off-by: Christian König 

Can you apply this patch first and then do the clean up as a later
series or temporarily disable the feature?  I want to send out a -next
pull this week and I don't want to leave this broken.

Alex

> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 51 
> ++--
>  1 file changed, 30 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 138c98902033..b2a33bf1ef10 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -247,23 +247,18 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object 
> *bo,
>  }
>  EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
>
> -static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos,
> -   struct list_head *lru, bool is_swap)
> +static void ttm_list_move_bulk_tail(struct list_head *list,
> +   struct list_head *first,
> +   struct list_head *last)
>  {
> -   struct list_head *list;
> -   LIST_HEAD(entries);
> -   LIST_HEAD(before);
> +   first->prev->next = last->next;
> +   last->next->prev = first->prev;
>
> -   reservation_object_assert_held(pos->last->resv);
> -   list = is_swap ? &pos->last->swap : &pos->last->lru;
> -   list_cut_position(&entries, lru, list);
> +   list->prev->next = first;
> +   first->prev = list->prev;
>
> -   reservation_object_assert_held(pos->first->resv);
> -   list = is_swap ? pos->first->swap.prev : pos->first->lru.prev;
> -   list_cut_position(&before, &entries, list);
> -
> -   list_splice(&before, lru);
> -   list_splice_tail(&entries, lru);
> +   last->next = list;
> +   list->prev = last;
>  }
>
>  void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk)
> @@ -271,23 +266,33 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
> *bulk)
> unsigned i;
>
> for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> +   struct ttm_lru_bulk_move_pos *pos = &bulk->tt[i];
> struct ttm_mem_type_manager *man;
>
> -   if (!bulk->tt[i].first)
> +   if (!pos->first)
> continue;
>
> -   man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
> -   ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
> +   reservation_object_assert_held(pos->first->resv);
> +   reservation_object_assert_held(pos->last->resv);
> +
> +   man = &pos->first->bdev->man[TTM_PL_TT];
> +   ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
> +   &pos->last->lru);
> }
>
> for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> +   struct ttm_lru_bulk_move_pos *pos = &bulk->vram[i];
> struct ttm_mem_type_manager *man;
>
> -   if (!bulk->vram[i].first)
> +   if (!pos->first)
> continue;
>
> -   man = &bulk->vram[i].first->bdev->man[TTM_PL_VRAM];
> -   ttm_bo_bulk_move_helper(&bulk->vram[i], &man->lru[i], false);
> +   reservation_object_assert_held(pos->first->resv);
> +   reservation_object_assert_held(pos->last->resv);
> +
> +   man = &pos->first->bdev->man[TTM_PL_VRAM];
> +   ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
> +   &pos->last->lru);
> }
>
> for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> @@ -297,8 +302,12 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
> *bulk)
> if (!pos->first)
> continue;
>
> +   reservation_object_assert_held(pos->first->resv);
> +   reservation_object_assert_held(pos->last->resv);
> +
> lru = &pos->first->bdev->glob->swap_lru[i];
> -   ttm_bo_bulk_move_helper(&bulk->swap[i], lru, true);
> +   ttm_list_move_bulk_tail(lru, &pos->first->swap,
> +   &pos->last->swap);
> }
>  }
>  EXPORT_SYMBOL(ttm_bo_bulk_move_lru_tail);
> --
> 2.14.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH xf86-video-amdgpu] Bail from drmmode_cm_init if there's no CRTC

2018-09-13 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 


From: amd-gfx  on behalf of Michel 
Dänzer 
Sent: Thursday, September 13, 2018 5:47:29 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH xf86-video-amdgpu] Bail from drmmode_cm_init if there's no CRTC

From: Michel Dänzer 

We would crash due to dereferencing the NULL mode_res->crtc pointer.

Bugzilla: https://bugs.freedesktop.org/107913
Signed-off-by: Michel Dänzer 
---
 src/drmmode_display.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/drmmode_display.c b/src/drmmode_display.c
index 6ef6a98e2..cbda8ad35 100644
--- a/src/drmmode_display.c
+++ b/src/drmmode_display.c
@@ -3198,6 +3198,9 @@ static void drmmode_cm_init(int drm_fd, drmmode_ptr 
drmmode,
 memset(drmmode->cm_prop_ids, 0, sizeof(drmmode->cm_prop_ids));
 drmmode->gamma_lut_size = drmmode->degamma_lut_size = 0;

+   if (!mode_res->crtcs)
+   return;
+
 /* AMD hardware has color management support on all pipes. It is
  * therefore sufficient to only check the first CRTC.
  */
--
2.19.0.rc2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] tests/amdgpu: add unaligned VM test

2018-09-13 Thread Christian König

Am 11.09.2018 um 04:06 schrieb Zhang, Jerry (Junwei):

On 09/10/2018 05:33 PM, Christian König wrote:

Am 10.09.2018 um 04:44 schrieb Zhang, Jerry (Junwei):

On 09/10/2018 02:04 AM, Christian König wrote:

Make a VM mapping which is as unaligned as possible.


Is it going to test unaligned address between BO allocation and BO mapping
and skip huge page mapping?


Yes and no.

Huge page handling works by mapping at least 2MB of contiguous memory 
on a 2MB aligned address.


What I do here is I allocate 4GB of VRAM and try to map it to an 
address which is aligned to 1GB + 4KB.


In other words the VM subsystem will add a single PTE to align the 
entry to 8KB, then it add two PTEs to align it to 16KB, then four to 
get to 32KB and so on until we have the maximum alignment of 2GB

which Vega/Raven support in the L1.
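
For illustration, a small standalone sketch of that pattern (editor's example
only, not the amdgpu code; the start offset, mapping size and maximum fragment
size below are made-up values):

#include <stdio.h>

/* Illustrates why a mapping that starts 4KB past a large alignment is
 * written as PTE runs of 1, 2, 4, ... pages: each run doubles the
 * alignment of the next start address until an assumed maximum fragment
 * size is reached. All values are in units of 4KB pages. */
int main(void)
{
	unsigned long start = (1UL << 18) + 1;	/* hypothetical: 1GB + 4KB */
	unsigned long pages = 2048;		/* hypothetical: an 8MB mapping */
	const unsigned long max_frag = 512;	/* assumption: 2MB fragments */

	while (pages) {
		unsigned long run = start & -start;	/* current alignment */

		if (!run || run > max_frag)
			run = max_frag;
		if (run > pages)
			run = pages;
		printf("set_ptes: count=%lu at page 0x%lx\n", run, start);
		start += run;
		pages -= run;
	}
	return 0;
}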


Thanks for explaining that.

From the trace log, it will map 1*4KB, 2*4KB, ..., 256*4KB, then back 
to 1*4KB.


 amdgpu_test-1384  [005]    110.634466: amdgpu_vm_bo_update: 
soffs=11, eoffs=1f, flags=70
 amdgpu_test-1384  [005]    110.634467: amdgpu_vm_set_ptes: 
pe=f5feffd008, addr=01fec0, incr=4096, flags=71, count=1
 amdgpu_test-1384  [005]    110.634468: amdgpu_vm_set_ptes: 
pe=f5feffd010, addr=01fec01000, incr=4096, flags=f1, count=2
 amdgpu_test-1384  [005]    110.634468: amdgpu_vm_set_ptes: 
pe=f5feffd020, addr=01fec03000, incr=4096, flags=171, count=4
 amdgpu_test-1384  [005]    110.634468: amdgpu_vm_set_ptes: 
pe=f5feffd040, addr=01fec07000, incr=4096, flags=1f1, count=8
 amdgpu_test-1384  [005]    110.634468: amdgpu_vm_set_ptes: 
pe=f5feffd080, addr=01fec0f000, incr=4096, flags=271, count=16
 amdgpu_test-1384  [005]    110.634468: amdgpu_vm_set_ptes: 
pe=f5feffd100, addr=01fec1f000, incr=4096, flags=2f1, count=32
 amdgpu_test-1384  [005]    110.634469: amdgpu_vm_set_ptes: 
pe=f5feffd200, addr=01fec3f000, incr=4096, flags=371, count=64
 amdgpu_test-1384  [005]    110.634469: amdgpu_vm_set_ptes: 
pe=f5feffd400, addr=01fec7f000, incr=4096, flags=3f1, count=128
 amdgpu_test-1384  [005]    110.634469: amdgpu_vm_set_ptes: 
pe=f5feffd800, addr=01fecff000, incr=4096, flags=471, count=256
 amdgpu_test-1384  [005]    110.634469: amdgpu_vm_set_ptes: 
pe=f5feffc000, addr=01fedff000, incr=4096, flags=71, count=1
 amdgpu_test-1384  [005]    110.634470: amdgpu_vm_set_ptes: 
pe=f5feffc008, addr=01fea0, incr=4096, flags=71, count=1
 amdgpu_test-1384  [005]    110.634470: amdgpu_vm_set_ptes: 
pe=f5feffc010, addr=01fea01000, incr=4096, flags=f1, count=2


Yes, that is exactly the expected result with the old code.



And it sounds like a performance test for Vega and later.
If so, shall we add some time stamp in the log?


Well I used it as a performance test, but the resulting numbers are not 
very comparable.


It is useful to push to libdrm because it also exercises the VM code and 
makes sure that the code doesn't crash on corner cases.


Regards,
Christian.



Regards,
Jerry



Regards,
Christian.





Signed-off-by: Christian König 
---
  tests/amdgpu/vm_tests.c | 45 
-

  1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/tests/amdgpu/vm_tests.c b/tests/amdgpu/vm_tests.c
index 7b6dc5d6..fada2987 100644
--- a/tests/amdgpu/vm_tests.c
+++ b/tests/amdgpu/vm_tests.c
@@ -31,8 +31,8 @@ static  amdgpu_device_handle device_handle;
  static  uint32_t  major_version;
  static  uint32_t  minor_version;

-
  static void amdgpu_vmid_reserve_test(void);
+static void amdgpu_vm_unaligned_map(void);

  CU_BOOL suite_vm_tests_enable(void)
  {
@@ -84,6 +84,7 @@ int suite_vm_tests_clean(void)

  CU_TestInfo vm_tests[] = {
  { "resere vmid test",  amdgpu_vmid_reserve_test },
+    { "unaligned map",  amdgpu_vm_unaligned_map },
  CU_TEST_INFO_NULL,
  };

@@ -167,3 +168,45 @@ static void amdgpu_vmid_reserve_test(void)
  r = amdgpu_cs_ctx_free(context_handle);
  CU_ASSERT_EQUAL(r, 0);
  }
+
+static void amdgpu_vm_unaligned_map(void)
+{
+    const uint64_t map_size = (4ULL << 30) - (2 << 12);
+    struct amdgpu_bo_alloc_request request = {};
+    amdgpu_bo_handle buf_handle;
+    amdgpu_va_handle handle;
+    uint64_t vmc_addr;
+    int r;
+
+    request.alloc_size = 4ULL << 30;
+    request.phys_alignment = 4096;
+    request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM;
+    request.flags = AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
+
+    r = amdgpu_bo_alloc(device_handle, &request, &buf_handle);
+    /* Don't let the test fail if the device doesn't have enough 
VRAM */


We may print some info to the console here.

Regards,
Jerry


+    if (r)
+    return;
+
+    r = amdgpu_va_range_alloc(device_handle, 
amdgpu_gpu_va_range_general,

+  4ULL << 30, 1ULL << 30, 0, &vmc_addr,
+  &handle, 0);
+    CU_ASSERT_EQUAL(r, 0);
+    if (r)
+    goto error_va_alloc;
+
+    vmc_addr += 1 << 12;
+
+    r = amdgpu_

[PATCH 5/5] drm/amd: rename ADOBE to OP

2018-09-13 Thread Hans Verkuil
From: Hans Verkuil 

The CTA-861 standard renamed ADOBE to OP, so do the same to remain
in sync with the standard.

Signed-off-by: Hans Verkuil 
Cc: amd-gfx@lists.freedesktop.org
---
 drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c   | 4 ++--
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c   | 4 ++--
 drivers/gpu/drm/amd/display/dc/dc_hw_types.h| 2 +-
 drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c | 2 +-
 drivers/gpu/drm/amd/display/dc/inc/hw/transform.h   | 4 ++--
 7 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
index 83d121510ef5..c7709711d3c3 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
@@ -99,7 +99,7 @@ static bool is_rgb_type(
color_space == COLOR_SPACE_XR_RGB   ||
color_space == COLOR_SPACE_MSREF_SCRGB  ||
color_space == COLOR_SPACE_2020_RGB_FULLRANGE   ||
-   color_space == COLOR_SPACE_ADOBERGB ||
+   color_space == COLOR_SPACE_OPRGB||
color_space == COLOR_SPACE_DCIP3||
color_space == COLOR_SPACE_DOLBYVISION)
ret = true;
@@ -230,7 +230,7 @@ void color_space_to_black_color(
case COLOR_SPACE_XV_YCC_601:
case COLOR_SPACE_2020_RGB_FULLRANGE:
case COLOR_SPACE_2020_RGB_LIMITEDRANGE:
-   case COLOR_SPACE_ADOBERGB:
+   case COLOR_SPACE_OPRGB:
case COLOR_SPACE_DCIP3:
case COLOR_SPACE_DISPLAYNATIVE:
case COLOR_SPACE_DOLBYVISION:
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index ea6beccfd89d..2e454e905ee2 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -2151,8 +2151,8 @@ static void set_avi_info_frame(
color_space == COLOR_SPACE_2020_YCBCR) {
hdmi_info.bits.EC0_EC2 = COLORIMETRYEX_BT2020RGBYCBCR;
hdmi_info.bits.C0_C1   = COLORIMETRY_EXTENDED;
-   } else if (color_space == COLOR_SPACE_ADOBERGB) {
-   hdmi_info.bits.EC0_EC2 = COLORIMETRYEX_ADOBERGB;
+   } else if (color_space == COLOR_SPACE_OPRGB) {
+   hdmi_info.bits.EC0_EC2 = COLORIMETRYEX_OPRGB;
hdmi_info.bits.C0_C1   = COLORIMETRY_EXTENDED;
}
 
diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h 
b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
index b789cb2b354b..ddaaf17a3bc6 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
@@ -524,7 +524,7 @@ enum dc_color_space {
COLOR_SPACE_2020_RGB_FULLRANGE,
COLOR_SPACE_2020_RGB_LIMITEDRANGE,
COLOR_SPACE_2020_YCBCR,
-   COLOR_SPACE_ADOBERGB,
+   COLOR_SPACE_OPRGB,
COLOR_SPACE_DCIP3,
COLOR_SPACE_DISPLAYNATIVE,
COLOR_SPACE_DOLBYVISION,
diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c 
b/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c
index 91642e684858..d37d7a20ef54 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c
@@ -423,7 +423,7 @@ static void dce110_stream_encoder_dp_set_stream_attribute(
case COLOR_SPACE_2020_YCBCR:
case COLOR_SPACE_XR_RGB:
case COLOR_SPACE_MSREF_SCRGB:
-   case COLOR_SPACE_ADOBERGB:
+   case COLOR_SPACE_OPRGB:
case COLOR_SPACE_DCIP3:
case COLOR_SPACE_XV_YCC_709:
case COLOR_SPACE_XV_YCC_601:
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index cfcc54f2ce65..99282c6c91c3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -1729,7 +1729,7 @@ bool is_rgb_cspace(enum dc_color_space output_color_space)
case COLOR_SPACE_SRGB_LIMITED:
case COLOR_SPACE_2020_RGB_FULLRANGE:
case COLOR_SPACE_2020_RGB_LIMITEDRANGE:
-   case COLOR_SPACE_ADOBERGB:
+   case COLOR_SPACE_OPRGB:
return true;
case COLOR_SPACE_YCBCR601:
case COLOR_SPACE_YCBCR709:
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c
index 6f9078f3c4d3..17e5d287aca7 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c
@@ -394,7 +394,7 @@ 

Re: [PATCH 2/2] drm/amdgpu: use a single linked list for amdgpu_vm_bo_base

2018-09-13 Thread Christian König

Am 13.09.2018 um 00:58 schrieb Felix Kuehling:

Is the small reduction in memory footprint (8 bytes per page table on a
64-bit kernel) really worth the trouble of open-coding a single-linked
list implementation?


Well the key point is it is now a power of two again. So we don't waste 
28KB per page table (on 4 levels) any more because it is rounded up to 
the next order size :)
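
(Rough back-of-the-envelope, assuming 512 entries per directory level and the
usual power-of-two allocation orders: at the old 72-byte size (64 bytes plus
the extra 8 bytes Felix mentions) the entries array is 512 * 72 = ~36KB, which
gets rounded up to a 64KB allocation and so carries roughly 28KB of padding;
at 64 bytes it is exactly 32KB and fits an order with no waste.)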



I guess this change makes a bigger difference for
2-level page tables than it does for 4-level, because the amdgpu_vm_pt
array is allocated at the page directory level and includes page tables
that don't even exist yet and may never exist. The amount of memory you
save is the same as the size of the page directory.

I wonder if the overhead could be reduced more effectively by allocating
struct amdgpu_vm_pt with the page table, rather than with the page
directory. Then the amdgpu_vm_pt.entries array would be an array of
pointers instead. It could be an array[0] at the end of the structure
since the number of entries is known when the page directory is
allocated. The BO could also be embedded in the amdgpu_vm_pt structure
so it doesn't need to be a separate allocation from the amdgpu_vm_pt.


Yeah, thought about that as well. But the change looked too invasive at 
first glance.
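
For illustration, a rough sketch of the layout Felix describes (hypothetical
struct name and layout, not a proposed patch):

struct amdgpu_vm_dir {
	struct amdgpu_vm_bo_base	base;
	struct amdgpu_bo		bo;		/* embedded instead of separately allocated */
	unsigned int			num_entries;	/* known when the directory is allocated */
	struct amdgpu_vm_dir		*entries[];	/* one pointer per child page table */
};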



Acked-by: Felix Kuehling 


Thanks,
Christian.



Regards,
   Felix


On 2018-09-12 04:55 AM, Christian König wrote:

Instead of the double linked list. Gets the size of amdgpu_vm_pt down to
64 bytes again.

We could even reduce it down to 32 bytes, but that would require some
rather extreme hacks.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 38 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +-
  4 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index de990bdcdd6c..e6909252aefa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -448,7 +448,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
return -ENOMEM;
drm_gem_private_object_init(adev->ddev, &bo->gem_base, size);
INIT_LIST_HEAD(&bo->shadow_list);
-   INIT_LIST_HEAD(&bo->va);
+   bo->vm_bo = NULL;
bo->preferred_domains = bp->preferred_domain ? bp->preferred_domain :
bp->domain;
bo->allowed_domains = bo->preferred_domains;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 907fdf46d895..64337ff2ad63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -89,8 +89,8 @@ struct amdgpu_bo {
void*metadata;
u32 metadata_size;
unsignedprime_shared_count;
-   /* list of all virtual address to which this bo is associated to */
-   struct list_headva;
+   /* per VM structure for page tables and with virtual addresses */
+   struct amdgpu_vm_bo_base*vm_bo;
/* Constant after initialization */
struct drm_gem_object   gem_base;
struct amdgpu_bo*parent;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index cb6a5114128e..fb6b16273c54 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -309,12 +309,13 @@ static void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,
  {
base->vm = vm;
base->bo = bo;
-   INIT_LIST_HEAD(&base->bo_list);
+   base->next = NULL;
INIT_LIST_HEAD(&base->vm_status);
  
  	if (!bo)

return;
-   list_add_tail(&base->bo_list, &bo->va);
+   base->next = bo->vm_bo;
+   bo->vm_bo = base;
  
  	if (bo->tbo.resv != vm->root.base.bo->tbo.resv)

return;
@@ -352,7 +353,7 @@ static struct amdgpu_vm_pt *amdgpu_vm_pt_parent(struct 
amdgpu_vm_pt *pt)
if (!parent)
return NULL;
  
-	return list_first_entry(&parent->va, struct amdgpu_vm_pt, base.bo_list);

+   return container_of(parent->vm_bo, struct amdgpu_vm_pt, base);
  }
  
  /**

@@ -954,7 +955,7 @@ static void amdgpu_vm_free_pts(struct amdgpu_device *adev,
for_each_amdgpu_vm_pt_dfs_safe(adev, vm, cursor, entry) {
  
  		if (entry->base.bo) {

-   list_del(&entry->base.bo_list);
+   entry->base.bo->vm_bo = NULL;
list_del(&entry->base.vm_status);
amdgpu_bo_unref(&entry->base.bo->shadow);
amdgpu_bo_unref(&entry->base.bo);
@@ -1162,12 +1163,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
amdgpu_job *job, bool need_
 

Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail

2018-09-13 Thread Christian König

Am 13.09.2018 um 10:31 schrieb Huang Rui:

On Wed, Sep 12, 2018 at 09:23:55PM +0200, Christian König wrote:

While cutting the lists we sometimes accidentally added a list_head from
the stack to the LRUs, effectively corrupting the list.

Remove the list cutting and use explicit list manipulation instead.

This patch actually fixes the corruption bug. Was it a defect of
list_cut_position or list_splice handlers?


We somehow did something illegal with list_cut_position. I haven't 
narrowed it down till the end, but we ended up with list_heads from the 
stack to the lru.


Anyway adding a specialized list bulk move function is much simpler and 
avoids the issue.


I've just split that up as Michel suggested and sent it out to the 
mailing lists; please review that version once more.


Thanks,
Christian.



Reviewed-and-Tested: Huang Rui 


Signed-off-by: Christian König 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 51 ++--
  1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 138c98902033..b2a33bf1ef10 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -247,23 +247,18 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo,
  }
  EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
  
-static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos,

-   struct list_head *lru, bool is_swap)
+static void ttm_list_move_bulk_tail(struct list_head *list,
+   struct list_head *first,
+   struct list_head *last)
  {
-   struct list_head *list;
-   LIST_HEAD(entries);
-   LIST_HEAD(before);
+   first->prev->next = last->next;
+   last->next->prev = first->prev;
  
-	reservation_object_assert_held(pos->last->resv);

-   list = is_swap ? &pos->last->swap : &pos->last->lru;
-   list_cut_position(&entries, lru, list);
+   list->prev->next = first;
+   first->prev = list->prev;
  
-	reservation_object_assert_held(pos->first->resv);

-   list = is_swap ? pos->first->swap.prev : pos->first->lru.prev;
-   list_cut_position(&before, &entries, list);
-
-   list_splice(&before, lru);
-   list_splice_tail(&entries, lru);
+   last->next = list;
+   list->prev = last;
  }
  
  void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk)

@@ -271,23 +266,33 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
*bulk)
unsigned i;
  
  	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {

+   struct ttm_lru_bulk_move_pos *pos = &bulk->tt[i];
struct ttm_mem_type_manager *man;
  
-		if (!bulk->tt[i].first)

+   if (!pos->first)
continue;
  
-		man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];

-   ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
+   reservation_object_assert_held(pos->first->resv);
+   reservation_object_assert_held(pos->last->resv);
+
+   man = &pos->first->bdev->man[TTM_PL_TT];
+   ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
+   &pos->last->lru);
}
  
  	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {

+   struct ttm_lru_bulk_move_pos *pos = &bulk->vram[i];
struct ttm_mem_type_manager *man;
  
-		if (!bulk->vram[i].first)

+   if (!pos->first)
continue;
  
-		man = &bulk->vram[i].first->bdev->man[TTM_PL_VRAM];

-   ttm_bo_bulk_move_helper(&bulk->vram[i], &man->lru[i], false);
+   reservation_object_assert_held(pos->first->resv);
+   reservation_object_assert_held(pos->last->resv);
+
+   man = &pos->first->bdev->man[TTM_PL_VRAM];
+   ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
+   &pos->last->lru);
}
  
  	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {

@@ -297,8 +302,12 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
*bulk)
if (!pos->first)
continue;
  
+		reservation_object_assert_held(pos->first->resv);

+   reservation_object_assert_held(pos->last->resv);
+
lru = &pos->first->bdev->glob->swap_lru[i];
-   ttm_bo_bulk_move_helper(&bulk->swap[i], lru, true);
+   ttm_list_move_bulk_tail(lru, &pos->first->swap,
+   &pos->last->swap);
}
  }
  EXPORT_SYMBOL(ttm_bo_bulk_move_lru_tail);
--
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[PATCH 1/2] list: introduce list_bulk_move_tail helper

2018-09-13 Thread Christian König
Move all entries between @first and including @last before @head.

This is useful for LRU lists where a whole block of entries should be
moved to the end of a list.

Signed-off-by: Christian König 
---
 include/linux/list.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/list.h b/include/linux/list.h
index de04cc5ed536..edb7628e46ed 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -183,6 +183,29 @@ static inline void list_move_tail(struct list_head *list,
list_add_tail(list, head);
 }
 
+/**
+ * list_bulk_move_tail - move a subsection of a list to its tail
+ * @head: the head that will follow our entry
+ * @first: first entry to move
+ * @last: last entry to move, can be the same as first
+ *
+ * Move all entries between @first and including @last before @head.
+ * All three entries must belong to the same linked list.
+ */
+static inline void list_bulk_move_tail(struct list_head *head,
+  struct list_head *first,
+  struct list_head *last)
+{
+   first->prev->next = last->next;
+   last->next->prev = first->prev;
+
+   head->prev->next = first;
+   first->prev = head->prev;
+
+   last->next = head;
+   head->prev = last;
+}
+
 /**
  * list_is_last - tests whether @list is the last entry in list @head
  * @list: the entry to test
-- 
2.14.1
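
For reference, a small usage sketch of the new helper (hypothetical example,
not part of the patch; it only needs <linux/list.h>):

static void example(void)
{
	struct item { struct list_head node; } a, b, c, d, e;
	LIST_HEAD(lru);

	/* build the list in the order a, b, c, d, e */
	list_add_tail(&a.node, &lru);
	list_add_tail(&b.node, &lru);
	list_add_tail(&c.node, &lru);
	list_add_tail(&d.node, &lru);
	list_add_tail(&e.node, &lru);

	/* move the block b..d to the tail; the order is now a, e, b, c, d */
	list_bulk_move_tail(&lru, &b.node, &d.node);
}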

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/2] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail v2

2018-09-13 Thread Christian König
While cutting the lists we sometimes accidentally added a list_head from
the stack to the LRUs, effectively corrupting the list.

Remove the list cutting and use explicit list manipulation instead.

v2: separate out new list_bulk_move_tail helper

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 46 +++-
 1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 138c98902033..26b889f86670 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -247,47 +247,38 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo,
 }
 EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
 
-static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos,
-   struct list_head *lru, bool is_swap)
-{
-   struct list_head *list;
-   LIST_HEAD(entries);
-   LIST_HEAD(before);
-
-   reservation_object_assert_held(pos->last->resv);
-   list = is_swap ? &pos->last->swap : &pos->last->lru;
-   list_cut_position(&entries, lru, list);
-
-   reservation_object_assert_held(pos->first->resv);
-   list = is_swap ? pos->first->swap.prev : pos->first->lru.prev;
-   list_cut_position(&before, &entries, list);
-
-   list_splice(&before, lru);
-   list_splice_tail(&entries, lru);
-}
-
 void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk)
 {
unsigned i;
 
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
+   struct ttm_lru_bulk_move_pos *pos = &bulk->tt[i];
struct ttm_mem_type_manager *man;
 
-   if (!bulk->tt[i].first)
+   if (!pos->first)
continue;
 
-   man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
-   ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
+   reservation_object_assert_held(pos->first->resv);
+   reservation_object_assert_held(pos->last->resv);
+
+   man = &pos->first->bdev->man[TTM_PL_TT];
+   list_bulk_move_tail(&man->lru[i], &pos->first->lru,
+   &pos->last->lru);
}
 
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
+   struct ttm_lru_bulk_move_pos *pos = &bulk->vram[i];
struct ttm_mem_type_manager *man;
 
-   if (!bulk->vram[i].first)
+   if (!pos->first)
continue;
 
-   man = &bulk->vram[i].first->bdev->man[TTM_PL_VRAM];
-   ttm_bo_bulk_move_helper(&bulk->vram[i], &man->lru[i], false);
+   reservation_object_assert_held(pos->first->resv);
+   reservation_object_assert_held(pos->last->resv);
+
+   man = &pos->first->bdev->man[TTM_PL_VRAM];
+   list_bulk_move_tail(&man->lru[i], &pos->first->lru,
+   &pos->last->lru);
}
 
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
@@ -297,8 +288,11 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
*bulk)
if (!pos->first)
continue;
 
+   reservation_object_assert_held(pos->first->resv);
+   reservation_object_assert_held(pos->last->resv);
+
lru = &pos->first->bdev->glob->swap_lru[i];
-   ttm_bo_bulk_move_helper(&bulk->swap[i], lru, true);
+   list_bulk_move_tail(lru, &pos->first->swap, &pos->last->swap);
}
 }
 EXPORT_SYMBOL(ttm_bo_bulk_move_lru_tail);
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-13 Thread Christian König

Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing):



-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 5:20 PM
To: Zhou, David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):

-Original Message-
From: Christian König 
Sent: Thursday, September 13, 2018 4:50 PM
To: Zhou, David(ChunMing) ; Koenig, Christian
; dri-de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 04:15 schrieb zhoucm1:

On 2018年09月12日 19:05, Christian König wrote:

[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct
drm_syncobj *syncobj,
+   struct drm_syncobj_wait_pt
+*wait_pt) {

That whole approach still looks horrible complicated to me.

It's already very close to what you said before.


Especially the separation of signal and wait pt is completely
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the
signal point which it will trigger.

Yeah, I tried this, but when I implement the cpu wait ioctl on a
specific point, we need an advanced wait pt fence; otherwise we
could still need the old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and when
that one returns NULL you use wait_event_*() to wait for a signal
point >= your wait point to appear and try again.

e.g. when there are 3 syncobjs (A, B, C) to wait on and none of them
has a fence yet, then, as you said, while drm_syncobj_find_fence(A) is
waiting in wait_event, syncobjB and syncobjC could already be
signaled; then we don't know which one was signaled first, which is
needed when the wait ioctl returns.

I don't really see a problem with that. When you wait for the first
one you need to wait for A,B,C at the same time anyway.

So what you do is to register a fence callback on the fences you
already have and for the syncobj which doesn't yet have a fence you
make sure that they wake up your thread when they get one.

So essentially exactly what drm_syncobj_fence_get_or_add_callback()
already does today.
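
For illustration, a rough sketch of that callback based wait (hypothetical
helper names; only dma_fence_add_callback() and the wait queue pattern are
meant literally, this is not the actual drm_syncobj code):

#include <linux/atomic.h>
#include <linux/dma-fence.h>
#include <linux/wait.h>

struct first_signaled_waiter {
	struct dma_fence_cb cb;	/* node handed to dma_fence_add_callback() */
	wait_queue_head_t *wq;	/* wait queue the sleeping thread blocks on */
	atomic_t *first;	/* index of the first syncobj that signaled */
	int index;		/* position of this entry in the wait array */
};

static void first_signaled_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
	struct first_signaled_waiter *w =
		container_of(cb, struct first_signaled_waiter, cb);

	/* remember which entry fired first, then wake the waiting thread */
	atomic_cmpxchg(w->first, -1, w->index);
	wake_up(w->wq);
}

/*
 * For each syncobj: if it already has a fence, install the callback with
 * dma_fence_add_callback(fence, &w->cb, first_signaled_cb).  If it has no
 * fence yet, use the existing syncobj callback to get notified once a
 * fence (signal point >= the requested wait point) is attached, then retry.
 * The waiting thread itself just sleeps with
 * wait_event_interruptible(wq, atomic_read(&first) != -1).
 */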

So do you mean we need still use old syncobj CB for that?

Yes, as far as I can see it should work.


Advanced wait pt is bad?

Well it isn't bad, I just don't see any advantage in it.

The advantage is to replace old syncobj cb.


The existing mechanism
should already be able to handle that.

I thought about it a bit more; we don't need that mechanism at all. If we use
an advanced wait pt, we can easily use a fence array to achieve it for the
wait ioctl. We should use existing kernel features as much as possible, not
invent another, shouldn't we? I remember you said that before.

Yeah, but the syncobj cb is an existing feature.

This is obviously a workaround for the wait ioctl. Do you see it used in any
other place?


And I absolutely don't see a
need to modify that and replace it with something far more complex.

The wait ioctl is simplified much more by the fence array, not made more
complex, and we just need to allocate a wait pt. If we keep the old syncobj cb
workaround, all the wait pt logic is still there; it only saves the allocation
and wait pt handling, which in fact isn't the complex part at all. And
compared with the ugly syncobj cb, this is simpler.


I strongly disagree on that. You just need to extend the syncobj cb with 
the sequence number and you are done.


We could clean that up in the long term by adding some wait_multi event 
macro, but for now just adding the sequence number should do the trick.
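
A minimal sketch of that direction, purely illustrative (the struct and field
names are made up; only struct drm_syncobj_cb is the existing callback node):

struct drm_syncobj_timeline_wait_cb {
	struct drm_syncobj_cb base;	/* existing syncobj callback node */
	u64 point;			/* sequence number this waiter needs */
	struct task_struct *task;	/* thread to wake once point is reached */
};

The callback attached through the existing mechanism would then only wake the
waiter when the newly attached signal point is >= point, instead of waking it
on every fence change.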


Regards,
Christian.



Thanks,
David Zhou

Regards,
Christian.


Thanks,
David Zhou

Christian.


Thanks,
David Zhou

Regards,
Christian.


Back to my implementation, it already fixes all your concerns
before, and can be able to easily used in wait_ioctl. When you
feel that is complicated, I guess that is because we merged all
logic to that and much clean up in one patch. In fact, it already
is very simple, timeline_init/fini, create signal/wait_pt, find
signal_pt for wait_pt, garbage collection, just them.

Thanks,
David Zhou

Regards,
Christian.



[PATCH xf86-video-amdgpu] Bail from drmmode_cm_init if there's no CRTC

2018-09-13 Thread Michel Dänzer
From: Michel Dänzer 

We would crash due to dereferencing the NULL mode_res->crtc pointer.

Bugzilla: https://bugs.freedesktop.org/107913
Signed-off-by: Michel Dänzer 
---
 src/drmmode_display.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/drmmode_display.c b/src/drmmode_display.c
index 6ef6a98e2..cbda8ad35 100644
--- a/src/drmmode_display.c
+++ b/src/drmmode_display.c
@@ -3198,6 +3198,9 @@ static void drmmode_cm_init(int drm_fd, drmmode_ptr 
drmmode,
memset(drmmode->cm_prop_ids, 0, sizeof(drmmode->cm_prop_ids));
drmmode->gamma_lut_size = drmmode->degamma_lut_size = 0;
 
+   if (!mode_res->crtcs)
+   return;
+
/* AMD hardware has color management support on all pipes. It is
 * therefore sufficient to only check the first CRTC.
 */
-- 
2.19.0.rc2



RE: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-13 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Koenig, Christian
> Sent: Thursday, September 13, 2018 5:20 PM
> To: Zhou, David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: Dave Airlie ; Rakos, Daniel
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> 
> Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):
> >
> >> -Original Message-
> >> From: Christian König 
> >> Sent: Thursday, September 13, 2018 4:50 PM
> >> To: Zhou, David(ChunMing) ; Koenig, Christian
> >> ; dri-de...@lists.freedesktop.org
> >> Cc: Dave Airlie ; Rakos, Daniel
> >> ; amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> >>
> >> Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):
>  -Original Message-
>  From: Koenig, Christian
>  Sent: Thursday, September 13, 2018 2:56 PM
>  To: Zhou, David(ChunMing) ; Zhou,
>  David(ChunMing) ; dri-
>  de...@lists.freedesktop.org
>  Cc: Dave Airlie ; Rakos, Daniel
>  ; amd-gfx@lists.freedesktop.org
>  Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> 
>  Am 13.09.2018 um 04:15 schrieb zhoucm1:
> > On 2018年09月12日 19:05, Christian König wrote:
> >> [SNIP]
> >> +static void drm_syncobj_find_signal_pt_for_wait_pt(struct
> >> drm_syncobj *syncobj,
> >> +   struct drm_syncobj_wait_pt
> >> +*wait_pt) {
> > That whole approach still looks horrible complicated to me.
> >>> It's already very close to what you said before.
> >>>
> > Especially the separation of signal and wait pt is completely
> > unnecessary as far as I can see.
> > When a wait pt is requested we just need to search for the
> > signal point which it will trigger.
> >>> Yeah, I tried this, but when I implement cpu wait ioctl on
> >>> specific point, we need a advanced wait pt fence, otherwise, we
> >>> could still need old syncobj cb.
> >> Why? I mean you just need to call drm_syncobj_find_fence() and
> >> when
> >> that one returns NULL you use wait_event_*() to wait for a signal
> >> point >= your wait point to appear and try again.
> > e.g. when there are 3 syncobjs(A,B,C) to wait, all syncobjABC have
> > no fence yet, as you said, during drm_syncobj_find_fence(A) is
> > working on wait_event, syncobjB and syncobjC could already be
> > signaled, then we don't know which one is first signaled, which is
> > need when wait ioctl returns.
>  I don't really see a problem with that. When you wait for the first
>  one you need to wait for A,B,C at the same time anyway.
> 
>  So what you do is to register a fence callback on the fences you
>  already have and for the syncobj which doesn't yet have a fence you
>  make sure that they wake up your thread when they get one.
> 
>  So essentially exactly what drm_syncobj_fence_get_or_add_callback()
>  already does today.
> >>> So do you mean we need still use old syncobj CB for that?
> >> Yes, as far as I can see it should work.
> >>
> >>>Advanced wait pt is bad?
> >> Well it isn't bad, I just don't see any advantage in it.
> >
> > The advantage is to replace old syncobj cb.
> >
> >> The existing mechanism
> >> should already be able to handle that.
> > I thought more a bit, we don't that mechanism at all, if use advanced wait
> pt, we can easily use fence array to achieve it for wait ioctl, we should use
> kernel existing feature as much as possible, not invent another, shouldn't we?
> I remember  you said  it before.
> 
> Yeah, but the syncobj cb is an existing feature.

This is obviously a workaround for the wait ioctl. Do you see it used in any
other place?

> And I absolutely don't see a
> need to modify that and replace it with something far more complex.
The wait ioctl is simplified much more by the fence array, not made more
complex, and we just need to allocate a wait pt. If we keep the old syncobj cb
workaround, all the wait pt logic is still there; it only saves the allocation
and wait pt handling, which in fact isn't the complex part at all. And
compared with the ugly syncobj cb, this is simpler.

Thanks,
David Zhou
> 
> Regards,
> Christian.
> 
> >
> > Thanks,
> > David Zhou
> >> Christian.
> >>
> >>> Thanks,
> >>> David Zhou
>  Regards,
>  Christian.
> 
> > Back to my implementation, it already fixes all your concerns
> > before, and can be able to easily used in wait_ioctl. When you
> > feel that is complicated, I guess that is because we merged all
> > logic to that and much clean up in one patch. In fact, it already
> > is very simple, timeline_init/fini, create signal/wait_pt, find
> > signal_pt for wait_pt, garbage collection, just them.
> >
> > Thanks,
> > David Zhou
> >> Regards,
> >> Christian.
> >>> ___
> >>> amd-gfx mailing list
> 

Re: [PATCH 5/5] drm/amdgpu: fix shadow BO restoring

2018-09-13 Thread Zhang, Jerry(Junwei)

On 09/11/2018 05:56 PM, Christian König wrote:

Don't grab the reservation lock any more and simplify the handling quite
a bit.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
  3 files changed, 43 insertions(+), 120 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5eba66ecf668..20bb702f5c7f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2940,54 +2940,6 @@ static int amdgpu_device_ip_post_soft_reset(struct 
amdgpu_device *adev)
return 0;
  }
  
-/**

- * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM buffers
- *
- * @adev: amdgpu_device pointer
- * @ring: amdgpu_ring for the engine handling the buffer operations
- * @bo: amdgpu_bo buffer whose shadow is being restored
- * @fence: dma_fence associated with the operation
- *
- * Restores the VRAM buffer contents from the shadow in GTT.  Used to
- * restore things like GPUVM page tables after a GPU reset where
- * the contents of VRAM might be lost.
- * Returns 0 on success, negative error code on failure.
- */
-static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
- struct amdgpu_ring *ring,
- struct amdgpu_bo *bo,
- struct dma_fence **fence)
-{
-   uint32_t domain;
-   int r;
-
-   if (!bo->shadow)
-   return 0;
-
-   r = amdgpu_bo_reserve(bo, true);
-   if (r)
-   return r;
-   domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
-   /* if bo has been evicted, then no need to recover */
-   if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
-   r = amdgpu_bo_validate(bo->shadow);
-   if (r) {
-   DRM_ERROR("bo validate failed!\n");
-   goto err;
-   }
-
-   r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
-NULL, fence, true);
-   if (r) {
-   DRM_ERROR("recover page table failed!\n");
-   goto err;
-   }
-   }
-err:
-   amdgpu_bo_unreserve(bo);
-   return r;
-}
-
  /**
   * amdgpu_device_recover_vram - Recover some VRAM contents
   *
@@ -2996,16 +2948,15 @@ static int 
amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
   * Restores the contents of VRAM buffers from the shadows in GTT.  Used to
   * restore things like GPUVM page tables after a GPU reset where
   * the contents of VRAM might be lost.
- * Returns 0 on success, 1 on failure.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
   */
  static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
  {
-   struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
-   struct amdgpu_bo *bo, *tmp;
struct dma_fence *fence = NULL, *next = NULL;
-   long r = 1;
-   int i = 0;
-   long tmo;
+   struct amdgpu_bo *shadow;
+   long r = 1, tmo;
  
  	if (amdgpu_sriov_runtime(adev))

tmo = msecs_to_jiffies(8000);
@@ -3014,44 +2965,40 @@ static int amdgpu_device_recover_vram(struct 
amdgpu_device *adev)
  
  	DRM_INFO("recover vram bo from shadow start\n");

mutex_lock(&adev->shadow_list_lock);
-   list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
-   next = NULL;
-   amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
+   list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
+
+   /* No need to recover an evicted BO */
+   if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
+   shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)

is there a chance that the shadow bo gets evicted to another domain,
like SYSTEM?

Regards,
Jerry

+   continue;
+
+   r = amdgpu_bo_restore_shadow(shadow, &next);
+   if (r)
+   break;
+
if (fence) {
r = dma_fence_wait_timeout(fence, false, tmo);
-   if (r == 0)
-   pr_err("wait fence %p[%d] timeout\n", fence, i);
-   else if (r < 0)
-   pr_err("wait fence %p[%d] interrupted\n", 
fence, i);
-   if (r < 1) {
-   dma_fence_put(fence);
-   fence = next;
+   dma_fence_put(fence);
+   fence = next;
+   if (r <= 0)
break;
-   }
-   i++;
+   

Re: [PATCH 3/5] drm/amdgpu: shadow BOs don't need any alignment

2018-09-13 Thread Zhang, Jerry(Junwei)

On 09/11/2018 05:56 PM, Christian König wrote:

They aren't directly used by the hardware.

Signed-off-by: Christian König 

Reviewed-by: Junwei Zhang 

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 7db0040ca145..3a6f92de5504 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -516,7 +516,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
  }
  
  static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,

-  unsigned long size, int byte_align,
+  unsigned long size,
   struct amdgpu_bo *bo)
  {
struct amdgpu_bo_param bp;
@@ -527,7 +527,6 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device 
*adev,
  
  	memset(&bp, 0, sizeof(bp));

bp.size = size;
-   bp.byte_align = byte_align;
bp.domain = AMDGPU_GEM_DOMAIN_GTT;
bp.flags = AMDGPU_GEM_CREATE_CPU_GTT_USWC |
AMDGPU_GEM_CREATE_SHADOW;
@@ -576,7 +575,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
WARN_ON(reservation_object_lock((*bo_ptr)->tbo.resv,
NULL));
  
-		r = amdgpu_bo_create_shadow(adev, bp->size, bp->byte_align, (*bo_ptr));

+   r = amdgpu_bo_create_shadow(adev, bp->size, *bo_ptr);
  
  		if (!bp->resv)

reservation_object_unlock((*bo_ptr)->tbo.resv);




Re: [PATCH 4/5] drm/amdgpu: always recover VRAM during GPU recovery

2018-09-13 Thread Zhang, Jerry(Junwei)

On 09/11/2018 05:56 PM, Christian König wrote:

It shouldn't add much overhead and we should make sure that critical
VRAM content is always restored.

Signed-off-by: Christian König 

Acked-by: Junwei Zhang 

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 93476b8c2e72..5eba66ecf668 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2989,7 +2989,7 @@ static int amdgpu_device_recover_vram_from_shadow(struct 
amdgpu_device *adev,
  }
  
  /**

- * amdgpu_device_handle_vram_lost - Handle the loss of VRAM contents
+ * amdgpu_device_recover_vram - Recover some VRAM contents
   *
   * @adev: amdgpu_device pointer
   *
@@ -2998,7 +2998,7 @@ static int amdgpu_device_recover_vram_from_shadow(struct 
amdgpu_device *adev,
   * the contents of VRAM might be lost.
   * Returns 0 on success, 1 on failure.
   */
-static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
+static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
  {
struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
struct amdgpu_bo *bo, *tmp;
@@ -3125,8 +3125,8 @@ static int amdgpu_device_reset(struct amdgpu_device *adev)
}
}
  
-	if (!r && ((need_full_reset && !(adev->flags & AMD_IS_APU)) || vram_lost))

-   r = amdgpu_device_handle_vram_lost(adev);
+   if (!r)
+   r = amdgpu_device_recover_vram(adev);
  
  	return r;

  }
@@ -3172,7 +3172,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,
amdgpu_virt_release_full_gpu(adev, true);
if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
atomic_inc(&adev->vram_lost_counter);
-   r = amdgpu_device_handle_vram_lost(adev);
+   r = amdgpu_device_recover_vram(adev);
}
  
  	return r;




Re: [PATCH 1/5] drm/amdgpu: stop pipelining VM PDs/PTs moves

2018-09-13 Thread Zhang, Jerry(Junwei)


On 09/11/2018 05:55 PM, Christian König wrote:

We are going to need this for recoverable page fault handling and it
makes shadow handling during GPU reset much easier.

Signed-off-by: Christian König 

Acked-by: Junwei Zhang 

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 6 +-
  2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b5f20b42439e..a7e39c9dd14b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1363,7 +1363,7 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
  {
WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_SYSTEM);
WARN_ON_ONCE(!ww_mutex_is_locked(&bo->tbo.resv->lock) &&
-!bo->pin_count);
+!bo->pin_count && bo->tbo.type != ttm_bo_type_kernel);
WARN_ON_ONCE(bo->tbo.mem.start == AMDGPU_BO_INVALID_OFFSET);
WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_VRAM &&
 !(bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS));
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d9f3201c9e5c..2f32dc692d40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -524,7 +524,11 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
if (r)
goto error;
  
-	r = ttm_bo_pipeline_move(bo, fence, evict, new_mem);

+   /* Always block for VM page tables before committing the new location */
+   if (bo->type == ttm_bo_type_kernel)
+   r = ttm_bo_move_accel_cleanup(bo, fence, true, new_mem);
+   else
+   r = ttm_bo_pipeline_move(bo, fence, evict, new_mem);
dma_fence_put(fence);
return r;
  




Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-13 Thread Christian König

Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):



-Original Message-
From: Christian König 
Sent: Thursday, September 13, 2018 4:50 PM
To: Zhou, David(ChunMing) ; Koenig, Christian
; dri-de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 04:15 schrieb zhoucm1:

On 2018年09月12日 19:05, Christian König wrote:

[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct
drm_syncobj *syncobj,
+   struct drm_syncobj_wait_pt
+*wait_pt) {

That whole approach still looks horrible complicated to me.

It's already very close to what you said before.


Especially the separation of signal and wait pt is completely
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the
signal point which it will trigger.

Yeah, I tried this, but when I implement the cpu wait ioctl on a
specific point, we need an advanced wait pt fence; otherwise we
could still need the old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and

when

that one returns NULL you use wait_event_*() to wait for a signal
point >= your wait point to appear and try again.

e.g. when there are 3 syncobjs (A, B, C) to wait on and none of them
has a fence yet, then, as you said, while drm_syncobj_find_fence(A) is
waiting in wait_event, syncobjB and syncobjC could already be
signaled, and then we don't know which one was signaled first, which
is needed when the wait ioctl returns.

I don't really see a problem with that. When you wait for the first
one you need to wait for A,B,C at the same time anyway.

So what you do is to register a fence callback on the fences you
already have and for the syncobj which doesn't yet have a fence you
make sure that they wake up your thread when they get one.

So essentially exactly what drm_syncobj_fence_get_or_add_callback()
already does today.

So do you mean we need still use old syncobj CB for that?

Yes, as far as I can see it should work.


   Advanced wait pt is bad?

Well it isn't bad, I just don't see any advantage in it.


The advantage is to replace old syncobj cb.


The existing mechanism
should already be able to handle that.

I thought about it a bit more; we don't need that mechanism at all. If we use
an advanced wait pt, we can easily use a fence array to achieve it for the
wait ioctl. We should use existing kernel features as much as possible, not
invent another, shouldn't we? I remember you said that before.


Yeah, but the syncobj cb is an existing feature. And I absolutely don't 
see a need to modify that and replace it with something far more complex.


Regards,
Christian.



Thanks,
David Zhou

Christian.


Thanks,
David Zhou

Regards,
Christian.


Back to my implementation, it already fixes all your concerns
before, and can be able to easily used in wait_ioctl. When you feel
that is complicated, I guess that is because we merged all logic to
that and much clean up in one patch. In fact, it already is very
simple, timeline_init/fini, create signal/wait_pt, find signal_pt
for wait_pt, garbage collection, just them.

Thanks,
David Zhou

Regards,
Christian.



RE: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-13 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Christian König 
> Sent: Thursday, September 13, 2018 4:50 PM
> To: Zhou, David(ChunMing) ; Koenig, Christian
> ; dri-de...@lists.freedesktop.org
> Cc: Dave Airlie ; Rakos, Daniel
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> 
> Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):
> >
> >> -Original Message-
> >> From: Koenig, Christian
> >> Sent: Thursday, September 13, 2018 2:56 PM
> >> To: Zhou, David(ChunMing) ; Zhou,
> >> David(ChunMing) ; dri-
> >> de...@lists.freedesktop.org
> >> Cc: Dave Airlie ; Rakos, Daniel
> >> ; amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> >>
> >> Am 13.09.2018 um 04:15 schrieb zhoucm1:
> >>> On 2018年09月12日 19:05, Christian König wrote:
>  [SNIP]
>  +static void drm_syncobj_find_signal_pt_for_wait_pt(struct
>  drm_syncobj *syncobj,
>  +   struct drm_syncobj_wait_pt
>  +*wait_pt) {
> >>> That whole approach still looks horrible complicated to me.
> > It's already very close to what you said before.
> >
> >>> Especially the separation of signal and wait pt is completely
> >>> unnecessary as far as I can see.
> >>> When a wait pt is requested we just need to search for the
> >>> signal point which it will trigger.
> > Yeah, I tried this, but when I implement cpu wait ioctl on
> > specific point, we need a advanced wait pt fence, otherwise, we
> > could still need old syncobj cb.
>  Why? I mean you just need to call drm_syncobj_find_fence() and
> when
>  that one returns NULL you use wait_event_*() to wait for a signal
>  point >= your wait point to appear and try again.
> >>> e.g. when there are 3 syncobjs(A,B,C) to wait, all syncobjABC have
> >>> no fence yet, as you said, during drm_syncobj_find_fence(A) is
> >>> working on wait_event, syncobjB and syncobjC could already be
> >>> signaled, then we don't know which one is first signaled, which is
> >>> need when wait ioctl returns.
> >> I don't really see a problem with that. When you wait for the first
> >> one you need to wait for A,B,C at the same time anyway.
> >>
> >> So what you do is to register a fence callback on the fences you
> >> already have and for the syncobj which doesn't yet have a fence you
> >> make sure that they wake up your thread when they get one.
> >>
> >> So essentially exactly what drm_syncobj_fence_get_or_add_callback()
> >> already does today.
> > So do you mean we need still use old syncobj CB for that?
> 
> Yes, as far as I can see it should work.
> 
> >   Advanced wait pt is bad?
> 
> Well it isn't bad, I just don't see any advantage in it.


The advantage is to replace old syncobj cb.

> The existing mechanism
> should already be able to handle that.

I thought about it a bit more; we don't need that mechanism at all. If we use
an advanced wait pt, we can easily use a fence array to achieve it for the
wait ioctl. We should use existing kernel features as much as possible, not
invent another, shouldn't we? I remember you said that before.

Thanks,
David Zhou
> 
> Christian.
> 
> >
> > Thanks,
> > David Zhou
> >> Regards,
> >> Christian.
> >>
> >>> Back to my implementation, it already fixes all your concerns
> >>> before, and can be able to easily used in wait_ioctl. When you feel
> >>> that is complicated, I guess that is because we merged all logic to
> >>> that and much clean up in one patch. In fact, it already is very
> >>> simple, timeline_init/fini, create signal/wait_pt, find signal_pt
> >>> for wait_pt, garbage collection, just them.
> >>>
> >>> Thanks,
> >>> David Zhou
>  Regards,
>  Christian.
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-13 Thread Christian König

Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):



-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 04:15 schrieb zhoucm1:

On 2018年09月12日 19:05, Christian König wrote:

[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct
drm_syncobj *syncobj,
+   struct drm_syncobj_wait_pt *wait_pt)
+{

That whole approach still looks horrible complicated to me.

It's already very close to what you said before.


Especially the separation of signal and wait pt is completely
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the signal
point which it will trigger.

Yeah, I tried this, but when I implement the cpu wait ioctl on a
specific point, we need an advanced wait pt fence; otherwise we could
still need the old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and when
that one returns NULL you use wait_event_*() to wait for a signal
point >= your wait point to appear and try again.

e.g. when there are 3 syncobjs (A, B, C) to wait on and none of them has
a fence yet, then, as you said, while drm_syncobj_find_fence(A) is
waiting in wait_event, syncobjB and syncobjC could already be signaled,
and then we don't know which one was signaled first, which is needed
when the wait ioctl returns.

I don't really see a problem with that. When you wait for the first one you
need to wait for A,B,C at the same time anyway.

So what you do is to register a fence callback on the fences you already have
and for the syncobj which doesn't yet have a fence you make sure that they
wake up your thread when they get one.

So essentially exactly what drm_syncobj_fence_get_or_add_callback()
already does today.

So do you mean we need still use old syncobj CB for that?


Yes, as far as I can see it should work.


  Advanced wait pt is bad?


Well it isn't bad, I just don't see any advantage in it. The existing 
mechanism should already be able to handle that.


Christian.



Thanks,
David Zhou

Regards,
Christian.


Back to my implementation, it already fixes all your concerns before,
and can be able to easily used in wait_ioctl. When you feel that is
complicated, I guess that is because we merged all logic to that and
much clean up in one patch. In fact, it already is very simple,
timeline_init/fini, create signal/wait_pt, find signal_pt for wait_pt,
garbage collection, just them.

Thanks,
David Zhou

Regards,
Christian.

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier (v2)

2018-09-13 Thread Christian König

Am 13.09.2018 um 00:02 schrieb Philip Yang:

Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.

This depends on several HMM patchset from Jérôme Glisse queued for
upstream. See
http://172.27.226.38/root/kernel_amd/commits/hmm-dev-v01 (for AMD intranet)

Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig  |  6 +--
  drivers/gpu/drm/amd/amdgpu/Makefile |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 78 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.h | 41 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 50 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h  |  7 +++


Please either fully merge amdgpu_mn.* into amdgpu_hmm.* or the other way 
around.


In addition to that you should add documentation to all newly added
functions.
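
For example, a kernel-doc stub for one of the new functions could look like
this (illustrative wording only):

/**
 * amdgpu_hmm_sync_cpu_device_pagetables - invalidate a mirrored range
 * @mirror: HMM mirror the CPU page table update applies to
 * @update: range (start/end) and blockable flag provided by HMM
 *
 * Called by HMM when the CPU page tables change, so that the GPU copies of
 * the affected range can be invalidated.
 *
 * Returns 0 on success or a negative error code.
 */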


Regards,
Christian.


  6 files changed, 178 insertions(+), 5 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 9221e54..960a633 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
  config DRM_AMDGPU_USERPTR
bool "Always enable userptr write support"
depends on DRM_AMDGPU
-   select MMU_NOTIFIER
+   select HMM_MIRROR
help
- This option selects CONFIG_MMU_NOTIFIER if it isn't already
- selected to enabled full userptr support.
+ This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+ isn't already selected to enabled full userptr support.
  
  config DRM_AMDGPU_GART_DEBUGFS

bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 138cb78..ee691e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -172,6 +172,7 @@ amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
  amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
  amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
  amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM) += amdgpu_hmm.o
  
  include $(FULL_AMD_PATH)/powerplay/Makefile
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c

new file mode 100644
index 000..6c506f6
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -0,0 +1,78 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "amdgpu.h"
+#include "amdgpu_mn.h"
+
+static void amdgpu_hmm_release(struct hmm_mirror *mirror)
+{
+   pr_debug("mirror=%p\n", mirror);
+   amdgpu_hmm_mn_release(mirror);
+}
+
+static int amdgpu_hmm_sync_cpu_device_pagetables(struct hmm_mirror *mirror,
+   const struct hmm_update *update)
+{
+   struct hmm *hmm;
+   struct mm_struct *mm;
+   unsigned long start;
+   unsigned long end;
+
+   start = update->start;
+   end = update->end;
+
+   pr_debug("mirror %p start %lx end %lx\n", mirror, start, end);
+
+   hmm = mirror->hmm;
+   mm = *(struct mm_struct **)hmm;
+
+   return amdgpu_mn_invalidate_range(mirror, mm, start, end,
+   update->blockable);
+}
+
+static struct hmm_mirror_ops amdgpu_hmm_mirror_ops = {
+   .sync_cpu_device_pagetables = amdgpu_hmm_sync_cpu_device_pagetables,
+   .release = amdgpu_hmm_release
+};
+
+int amdgpu_hmm_register(struct hmm_mirror *mirror, struct mm_struct *mm)
+{
+   pr_deb

Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-13 Thread Christian König
As discussed internally that doesn't work because threads don't
necessarily get the same VMID assigned.


Christian.

Am 12.09.2018 um 22:33 schrieb Marek Olšák:

From: Marek Olšák 

I've chosen to do it like this because it's easy and allows an arbitrary
number of processes.

Signed-off-by: Marek Olšák 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  10 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   3 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  20 
  drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  19 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  24 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c |   6 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h |   7 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |   3 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  14 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  21 
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |   6 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h|   5 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  61 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |   8 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  34 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 125 +---
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 123 +--
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 124 ++-
  include/uapi/drm/amdgpu_drm.h   |  15 +--
  19 files changed, 109 insertions(+), 519 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index b80243d3972e..7264a4930b88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -71,23 +71,20 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, 
struct drm_file *filp,
/ sizeof(struct amdgpu_bo_list_entry))
return -EINVAL;
  
  	size = sizeof(struct amdgpu_bo_list);

size += num_entries * sizeof(struct amdgpu_bo_list_entry);
list = kvmalloc(size, GFP_KERNEL);
if (!list)
return -ENOMEM;
  
  	kref_init(&list->refcount);

-   list->gds_obj = adev->gds.gds_gfx_bo;
-   list->gws_obj = adev->gds.gws_gfx_bo;
-   list->oa_obj = adev->gds.oa_gfx_bo;
  
  	array = amdgpu_bo_list_array_entry(list, 0);

memset(array, 0, num_entries * sizeof(struct amdgpu_bo_list_entry));
  
  	for (i = 0; i < num_entries; ++i) {

struct amdgpu_bo_list_entry *entry;
struct drm_gem_object *gobj;
struct amdgpu_bo *bo;
struct mm_struct *usermm;
  
@@ -111,27 +108,20 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,

} else {
entry = &array[last_entry++];
}
  
  		entry->robj = bo;

entry->priority = min(info[i].bo_priority,
  AMDGPU_BO_LIST_MAX_PRIORITY);
entry->tv.bo = &entry->robj->tbo;
entry->tv.shared = !entry->robj->prime_shared_count;
  
-		if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)

-   list->gds_obj = entry->robj;
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_GWS)
-   list->gws_obj = entry->robj;
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_OA)
-   list->oa_obj = entry->robj;
-
total_size += amdgpu_bo_size(entry->robj);
trace_amdgpu_bo_list_set(list, entry->robj);
}
  
  	list->first_userptr = first_userptr;

list->num_entries = num_entries;
  
  	trace_amdgpu_cs_bo_status(list->num_entries, total_size);
  
  	*result = list;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
index 61b089768e1c..30f12a60aa28 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
@@ -36,23 +36,20 @@ struct amdgpu_bo_list_entry {
struct ttm_validate_buffer  tv;
struct amdgpu_bo_va *bo_va;
uint32_tpriority;
struct page **user_pages;
int user_invalidated;
  };
  
  struct amdgpu_bo_list {

struct rcu_head rhead;
struct kref refcount;
-   struct amdgpu_bo *gds_obj;
-   struct amdgpu_bo *gws_obj;
-   struct amdgpu_bo *oa_obj;
unsigned first_userptr;
unsigned num_entries;
  };
  
  int amdgpu_bo_list_get(struct amdgpu_fpriv *fpriv, int id,

   struct amdgpu_bo_list **result);
  void amdgpu_bo_list_get_list(struct amdgpu_bo_list *list,
 struct list_head *validated);
  void amdgpu_bo_list_put(struct amdgpu_bo_list *list);
  int amdgpu_bo_create_list_entry_array(struct drm_amdgpu_bo_list_in *in,
diff --gi

Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail

2018-09-13 Thread Huang Rui
On Wed, Sep 12, 2018 at 09:23:55PM +0200, Christian König wrote:
> While cutting the lists we sometimes accidentally added a list_head from
> the stack to the LRUs, effectively corrupting the list.
> 
> Remove the list cutting and use explicit list manipulation instead.

This patch actually fixes the corruption bug. Was it a defect of
list_cut_position or list_splice handlers?
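
The failure mode the commit message describes boils down to a stack-allocated
list head ending up linked into a persistent list; a generic illustration of
that pattern (not the exact TTM call chain):

#include <linux/list.h>

static void broken_requeue(struct list_head *lru)
{
	LIST_HEAD(tmp);		/* lives in this stack frame only */

	/*
	 * WRONG: the persistent lru now points at &tmp.  Once this function
	 * returns, the LRU contains a dangling pointer and every later
	 * traversal walks dead stack memory.
	 */
	list_add_tail(&tmp, lru);
}

The patch below avoids temporary list heads entirely and just re-links the
first/last entries into the destination list.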

Reviewed-and-Tested: Huang Rui 

> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 51 
> ++--
>  1 file changed, 30 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 138c98902033..b2a33bf1ef10 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -247,23 +247,18 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object 
> *bo,
>  }
>  EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
>  
> -static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos,
> - struct list_head *lru, bool is_swap)
> +static void ttm_list_move_bulk_tail(struct list_head *list,
> + struct list_head *first,
> + struct list_head *last)
>  {
> - struct list_head *list;
> - LIST_HEAD(entries);
> - LIST_HEAD(before);
> + first->prev->next = last->next;
> + last->next->prev = first->prev;
>  
> - reservation_object_assert_held(pos->last->resv);
> - list = is_swap ? &pos->last->swap : &pos->last->lru;
> - list_cut_position(&entries, lru, list);
> + list->prev->next = first;
> + first->prev = list->prev;
>  
> - reservation_object_assert_held(pos->first->resv);
> - list = is_swap ? pos->first->swap.prev : pos->first->lru.prev;
> - list_cut_position(&before, &entries, list);
> -
> - list_splice(&before, lru);
> - list_splice_tail(&entries, lru);
> + last->next = list;
> + list->prev = last;
>  }
>  
>  void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk)
> @@ -271,23 +266,33 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
> *bulk)
>   unsigned i;
>  
>   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> + struct ttm_lru_bulk_move_pos *pos = &bulk->tt[i];
>   struct ttm_mem_type_manager *man;
>  
> - if (!bulk->tt[i].first)
> + if (!pos->first)
>   continue;
>  
> - man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
> - ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
> + reservation_object_assert_held(pos->first->resv);
> + reservation_object_assert_held(pos->last->resv);
> +
> + man = &pos->first->bdev->man[TTM_PL_TT];
> + ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
> + &pos->last->lru);
>   }
>  
>   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> + struct ttm_lru_bulk_move_pos *pos = &bulk->vram[i];
>   struct ttm_mem_type_manager *man;
>  
> - if (!bulk->vram[i].first)
> + if (!pos->first)
>   continue;
>  
> - man = &bulk->vram[i].first->bdev->man[TTM_PL_VRAM];
> - ttm_bo_bulk_move_helper(&bulk->vram[i], &man->lru[i], false);
> + reservation_object_assert_held(pos->first->resv);
> + reservation_object_assert_held(pos->last->resv);
> +
> + man = &pos->first->bdev->man[TTM_PL_VRAM];
> + ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
> + &pos->last->lru);
>   }
>  
>   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> @@ -297,8 +302,12 @@ void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move 
> *bulk)
>   if (!pos->first)
>   continue;
>  
> + reservation_object_assert_held(pos->first->resv);
> + reservation_object_assert_held(pos->last->resv);
> +
>   lru = &pos->first->bdev->glob->swap_lru[i];
> - ttm_bo_bulk_move_helper(&bulk->swap[i], lru, true);
> + ttm_list_move_bulk_tail(lru, &pos->first->swap,
> + &pos->last->swap);
>   }
>  }
>  EXPORT_SYMBOL(ttm_bo_bulk_move_lru_tail);
> -- 
> 2.14.1
> 


RE: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-13 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Koenig, Christian
> Sent: Thursday, September 13, 2018 2:56 PM
> To: Zhou, David(ChunMing) ; Zhou,
> David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: Dave Airlie ; Rakos, Daniel
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> 
> Am 13.09.2018 um 04:15 schrieb zhoucm1:
> > On 2018年09月12日 19:05, Christian König wrote:
> >
> >> [SNIP]
> >> +static void drm_syncobj_find_signal_pt_for_wait_pt(struct
> >> drm_syncobj *syncobj,
> >> +   struct drm_syncobj_wait_pt *wait_pt)
> >> +{
> >
> > That whole approach still looks horrible complicated to me.
> >>> It's already very close to what you said before.
> >>>
> >
> > Especially the separation of signal and wait pt is completely
> > unnecessary as far as I can see.
> > When a wait pt is requested we just need to search for the signal
> > point which it will trigger.
> >>> Yeah, I tried this, but when I implement cpu wait ioctl on specific
> >>> point, we need a advanced wait pt fence, otherwise, we could still
> >>> need old syncobj cb.
> >>
> >> Why? I mean you just need to call drm_syncobj_find_fence() and when
> >> that one returns NULL you use wait_event_*() to wait for a signal
> >> point >= your wait point to appear and try again.
> > e.g. when there are 3 syncobjs(A,B,C) to wait, all syncobjABC have no
> > fence yet, as you said, during drm_syncobj_find_fence(A) is working on
> > wait_event, syncobjB and syncobjC could already be signaled, then we
> > don't know which one is first signaled, which is need when wait ioctl
> > returns.
> 
> I don't really see a problem with that. When you wait for the first one you
> need to wait for A,B,C at the same time anyway.
> 
> So what you do is to register a fence callback on the fences you already have
> and for the syncobj which doesn't yet have a fence you make sure that they
> wake up your thread when they get one.
> 
> So essentially exactly what drm_syncobj_fence_get_or_add_callback()
> already does today.

So do you mean we need still use old syncobj CB for that? Advanced wait pt is 
bad?

Thanks,
David Zhou
> 
> Regards,
> Christian.
> 
> >
> > Back to my implementation, it already fixes all your concerns before,
> > and can be able to easily used in wait_ioctl. When you feel that is
> > complicated, I guess that is because we merged all logic to that and
> > much clean up in one patch. In fact, it already is very simple,
> > timeline_init/fini, create signal/wait_pt, find signal_pt for wait_pt,
> > garbage collection, just them.
> >
> > Thanks,
> > David Zhou
> >>
> >> Regards,
> >> Christian.



Re: [PATCH 3/3] drm/amd: Add DM DMCU support

2018-09-13 Thread Huang Rui
On Tue, Sep 11, 2018 at 03:15:13PM -0400, David Francis wrote:
> DMCU (Display Microcontroller Unit) is a GPU chip involved in
> eDP features like Adaptive Backlight Modulation and Panel Self
> Refresh.
> 
> DC is already fully equipped to initialize DMCU as long as the
> firmware is loaded.
> 
> At the moment only the raven firmware is available.
> 
> A single .bin file is loaded by the kernel's loading mechanism
> and split into two ucodes according to the header.
> 
> DMCU is optional, so if the firmware is not found, no error or
> warning is raised.
> 
> Signed-off-by: David Francis 
> ---
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 91 ++-
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +
>  2 files changed, 92 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 5103eba75cb3..7541c849a417 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -30,6 +30,7 @@
>  #include "vid.h"
>  #include "amdgpu.h"
>  #include "amdgpu_display.h"
> +#include "amdgpu_ucode.h"
>  #include "atom.h"
>  #include "amdgpu_dm.h"
>  #include "amdgpu_pm.h"
> @@ -50,6 +51,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -71,6 +73,9 @@
>  
>  #include "modules/inc/mod_freesync.h"
>  
> +#define FIRMWARE_RAVEN_DMCU  "amdgpu/raven_dmcue.bin"
> +MODULE_FIRMWARE(FIRMWARE_RAVEN_DMCU);
> +
>  /* basic init/fini API */
>  static int amdgpu_dm_init(struct amdgpu_device *adev);
>  static void amdgpu_dm_fini(struct amdgpu_device *adev);
> @@ -514,13 +519,97 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
>   return;
>  }
>  
> -static int dm_sw_init(void *handle)
> +static int load_dmcu_fw(struct amdgpu_device *adev)
>  {
> + const char *fw_name_dmcu;
> + int r;
> + const struct dmcu_firmware_header_v1_0 *hdr;
> +
> + switch(adev->asic_type) {
> + case CHIP_BONAIRE:
> + case CHIP_HAWAII:
> + case CHIP_KAVERI:
> + case CHIP_KABINI:
> + case CHIP_MULLINS:
> + case CHIP_TONGA:
> + case CHIP_FIJI:
> + case CHIP_CARRIZO:
> + case CHIP_STONEY:
> + case CHIP_POLARIS11:
> + case CHIP_POLARIS10:
> + case CHIP_POLARIS12:
> + case CHIP_VEGAM:
> + case CHIP_VEGA10:
> + case CHIP_VEGA12:
> + case CHIP_VEGA20:
> + return 0;
> + case CHIP_RAVEN:
> + fw_name_dmcu = FIRMWARE_RAVEN_DMCU;
> + break;
> + default:
> + DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
> + return -1;

We can use -EINVAL instead of a magic number as the return value.

For others, it looks good for me.
With this fixed, feel free to add

Reviewed-by: Huang Rui 
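
I.e. something like this for the default case (illustrative only):

	default:
		DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
		return -EINVAL;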

> + }
> +
> + if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
> + DRM_DEBUG_KMS("dm: DMCU firmware not supported on direct or SMU 
> loading\n");
> + return 0;
> + }
> +
> + r = request_firmware_direct(&adev->dm.fw_dmcu, fw_name_dmcu, adev->dev);
> + if (r == -ENOENT) {
> + /* DMCU firmware is not necessary, so don't raise a fuss if 
> it's missing */
> + DRM_DEBUG_KMS("dm: DMCU firmware not found\n");
> + adev->dm.fw_dmcu = NULL;
> + return 0;
> + }
> + if (r) {
> + dev_err(adev->dev, "amdgpu_dm: Can't load firmware \"%s\"\n",
> + fw_name_dmcu);
> + return r;
> + }
> +
> + r = amdgpu_ucode_validate(adev->dm.fw_dmcu);
> + if (r) {
> + dev_err(adev->dev, "amdgpu_dm: Can't validate firmware 
> \"%s\"\n",
> + fw_name_dmcu);
> + release_firmware(adev->dm.fw_dmcu);
> + adev->dm.fw_dmcu = NULL;
> + return r;
> + }
> +
> + hdr = (const struct dmcu_firmware_header_v1_0 *)adev->dm.fw_dmcu->data;
> + adev->firmware.ucode[AMDGPU_UCODE_ID_DMCU_ERAM].ucode_id = 
> AMDGPU_UCODE_ID_DMCU_ERAM;
> + adev->firmware.ucode[AMDGPU_UCODE_ID_DMCU_ERAM].fw = adev->dm.fw_dmcu;
> + adev->firmware.fw_size +=
> + ALIGN(le32_to_cpu(hdr->header.ucode_size_bytes) - 
> le32_to_cpu(hdr->intv_size) * 4, PAGE_SIZE);
> +
> + adev->firmware.ucode[AMDGPU_UCODE_ID_DMCU_INTV].ucode_id = 
> AMDGPU_UCODE_ID_DMCU_INTV;
> + adev->firmware.ucode[AMDGPU_UCODE_ID_DMCU_INTV].fw = adev->dm.fw_dmcu;
> + adev->firmware.fw_size +=
> + ALIGN(le32_to_cpu(hdr->intv_size) * 4, PAGE_SIZE);
> +
> + DRM_DEBUG_KMS("PSP loading DMCU firmware\n");
> +
>   return 0;
>  }
>  
> +static int dm_sw_init(void *handle)
> +{
> + struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +
> + return load_dmcu_fw(adev);
> +}
> +
>  static int dm_sw_fini(void *handle)
>  {
> + struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +
> + if(adev->dm.fw_dmcu) {
>

Re: [PATCH 2/3] drm/amd: Add PSP DMCU support

2018-09-13 Thread Huang Rui
On Tue, Sep 11, 2018 at 03:15:12PM -0400, David Francis wrote:
> DMCU (Display Microcontroller Unit) is a GPU chip involved in
> eDP features like Adaptive Backlight Modulation and Panel Self
> Refresh.
> 
> PSP is already equipped to handle DMCU firmware loading, all
> that is needed is to translate between the new DMCU ucode ID and
> the equivalent psp_gfx_fw_type.
> 
> Signed-off-by: David Francis 

Reviewed-by: Huang Rui 

> ---
>  drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> index 02be34e72ed9..240dc8c85867 100644
> --- a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> @@ -91,6 +91,12 @@ psp_v10_0_get_fw_type(struct amdgpu_firmware_info *ucode, 
> enum psp_gfx_fw_type *
>   case AMDGPU_UCODE_ID_VCN:
>   *type = GFX_FW_TYPE_VCN;
>   break;
> + case AMDGPU_UCODE_ID_DMCU_ERAM:
> + *type = GFX_FW_TYPE_DMCU_ERAM;
> + break;
> + case AMDGPU_UCODE_ID_DMCU_INTV:
> + *type = GFX_FW_TYPE_DMCU_ISR;
> + break;
>   case AMDGPU_UCODE_ID_MAXIMUM:
>   default:
>   return -EINVAL;
> -- 
> 2.17.1
> 


Re: [PATCH 1/3] drm/amd: Add ucode DMCU support

2018-09-13 Thread Huang Rui
On Tue, Sep 11, 2018 at 03:15:11PM -0400, David Francis wrote:
> DMCU (Display Microcontroller Unit) is a GPU chip involved in
> eDP features like Adaptive Backlight Modulation and Panel Self
> Refresh.
> 
> DMCU has two pieces of firmware: the ERAM and the interrupt
> vectors, which must be loaded separately.
> 
> To this end, the DMCU firmware has a custom header and parsing
> logic similar to MEC, to extract the two ucodes from a single
> struct firmware.
> 
> Signed-off-by: David Francis 

Reviewed-by: Huang Rui 
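
For reference, the split this header implies, written out as byte offsets
into the firmware image (derived from the hunks below; sizes in the header
are in words, i.e. multiples of 4 bytes):

	const struct dmcu_firmware_header_v1_0 *hdr =
		(const struct dmcu_firmware_header_v1_0 *)fw->data;
	u32 eram_offset = le32_to_cpu(hdr->header.ucode_array_offset_bytes);
	u32 eram_size   = le32_to_cpu(hdr->header.ucode_size_bytes) -
			  le32_to_cpu(hdr->intv_size) * 4;
	u32 intv_offset = eram_offset + le32_to_cpu(hdr->intv_offset) * 4;
	u32 intv_size   = le32_to_cpu(hdr->intv_size) * 4;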

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 21 +++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h | 10 ++
>  2 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> index a942fd28dae8..3b1af1cecf14 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> @@ -322,6 +322,7 @@ static int amdgpu_ucode_init_single_fw(struct 
> amdgpu_device *adev,
>  {
>   const struct common_firmware_header *header = NULL;
>   const struct gfx_firmware_header_v1_0 *cp_hdr = NULL;
> + const struct dmcu_firmware_header_v1_0 *dmcu_hdr = NULL;
>  
>   if (NULL == ucode->fw)
>   return 0;
> @@ -333,8 +334,8 @@ static int amdgpu_ucode_init_single_fw(struct 
> amdgpu_device *adev,
>   return 0;
>  
>   header = (const struct common_firmware_header *)ucode->fw->data;
> -
>   cp_hdr = (const struct gfx_firmware_header_v1_0 *)ucode->fw->data;
> + dmcu_hdr = (const struct dmcu_firmware_header_v1_0 *)ucode->fw->data;
>  
>   if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP ||
>   (ucode->ucode_id != AMDGPU_UCODE_ID_CP_MEC1 &&
> @@ -343,7 +344,9 @@ static int amdgpu_ucode_init_single_fw(struct 
> amdgpu_device *adev,
>ucode->ucode_id != AMDGPU_UCODE_ID_CP_MEC2_JT &&
>ucode->ucode_id != AMDGPU_UCODE_ID_RLC_RESTORE_LIST_CNTL &&
>ucode->ucode_id != AMDGPU_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM &&
> -  ucode->ucode_id != AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM)) {
> +  ucode->ucode_id != AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM &&
> +  ucode->ucode_id != AMDGPU_UCODE_ID_DMCU_ERAM &&
> +  ucode->ucode_id != AMDGPU_UCODE_ID_DMCU_INTV)) {
>   ucode->ucode_size = le32_to_cpu(header->ucode_size_bytes);
>  
>   memcpy(ucode->kaddr, (void *)((uint8_t *)ucode->fw->data +
> @@ -365,6 +368,20 @@ static int amdgpu_ucode_init_single_fw(struct 
> amdgpu_device *adev,
> 
> le32_to_cpu(header->ucode_array_offset_bytes) +
> le32_to_cpu(cp_hdr->jt_offset) * 
> 4),
>  ucode->ucode_size);
> + } else if (ucode->ucode_id == AMDGPU_UCODE_ID_DMCU_ERAM) {
> + ucode->ucode_size = le32_to_cpu(header->ucode_size_bytes) -
> + le32_to_cpu(dmcu_hdr->intv_size) * 4;
> +
> + memcpy(ucode->kaddr, (void *)((uint8_t *)ucode->fw->data +
> +   
> le32_to_cpu(header->ucode_array_offset_bytes)),
> +ucode->ucode_size);
> + } else if (ucode->ucode_id == AMDGPU_UCODE_ID_DMCU_INTV) {
> + ucode->ucode_size = le32_to_cpu(dmcu_hdr->intv_size) * 4;
> +
> + memcpy(ucode->kaddr, (void *)((uint8_t *)ucode->fw->data +
> +   
> le32_to_cpu(header->ucode_array_offset_bytes) +
> +   
> le32_to_cpu(dmcu_hdr->intv_offset) * 4),
> +ucode->ucode_size);
>   } else if (ucode->ucode_id == AMDGPU_UCODE_ID_RLC_RESTORE_LIST_CNTL) {
>   ucode->ucode_size = 
> adev->gfx.rlc.save_restore_list_cntl_size_bytes;
>   memcpy(ucode->kaddr, adev->gfx.rlc.save_restore_list_cntl,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
> index b358e7519987..13bd540709b6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
> @@ -157,6 +157,13 @@ struct gpu_info_firmware_header_v1_0 {
>   uint16_t version_minor; /* version */
>  };
>  
> +/* version_major=1, version_minor=0 */
> +struct dmcu_firmware_header_v1_0 {
> + struct common_firmware_header header;
> + uint32_t intv_offset; /* interrupt vectors offset from end of header, 
> in words */
> + uint32_t intv_size;  /* size of interrupt vectors, in words */
> +};
> +
>  /* header is fixed size */
>  union amdgpu_firmware_header {
>   struct common_firmware_header common;
> @@ -170,6 +177,7 @@ union amdgpu_firmware_header {
>   struct sdma_firmware_header_v1_0 sdma;
>   struct sdma_firmware_header_v1_1 sdma_v1_1;
>   struct gpu_info_firmware_header_v1_0 gpu_info;
> + struct dmcu_firmware_header_v1_0 dmcu;

Re: [PATCH 1/2] drm/amdgpu: Moved fault hash table to amdgpu vm

2018-09-13 Thread Christian König

I think I'm missing something.
Actually I'm thinking more that I'm missing something :) But the result 
is the same.



You're raising some potential problems that I think must be about swapping out 
page tables that don't have fences on them (otherwise they wouldn't be evicted) 
and that are managed by HMM.
Wait a second, exactly that's the point. The page tables do have fences 
on them when they are evicted.


We currently just pipeline the eviction so that it happens only after 
those fences are signaled.



I think we should have a phone call to sort this out.

Yeah, that is a really good idea.

Regards,
Christian.

Am 13.09.2018 um 06:45 schrieb Kuehling, Felix:

I think I'm missing something. I thought the whole discussion we were having 
about tracking current and future locations of page tables was about swapping 
out page tables that are managed by HMM. We currently do swap out page tables 
that don't have a fence on them and that are not managed by HMM. And pipelining 
that is not a problem today.

You're raising some potential problems that I think must be about swapping out 
page tables that don't have fences on them (otherwise they wouldn't be evicted) 
and that are managed by HMM. If there is no fence on the page table, no engine 
that can't handle page faults can depend on the page tables. So we can discard 
the contents and set the parent page directory entry to invalid. Setting the 
parent PDE needs to be synchronous so that the GPU doesn't try to access a page 
table that is no longer there. No pipelining, no problem.

Then you came back with an argument "what about engines that don't support page 
faults". Those engines can't use memory mapped by HMM anyway. And they can't be 
evicted while they have a fence on them that indicates active usage by such an engine.

You seem to see some problems that require not pipelining page table evictions. 
I still don't understand what the problem is. I think we should have a phone 
call to sort this out.

A few more comments inline, but I think we're misunderstanding each other ...


5. The new system memory location of the page table is noted in its BO.

You mean in the parent page table? You can just invalidate the entry in
the parent page table and let it fault.

I'm repeating myself, but exactly that is what won't work.

See we still have engines which can't handle page faults which uses
the same VM at the same time. This means that we can't just fault in
page tables.

And I don't understand why that is a problem. Those clients rely on
fences to keep their BOs resident, including the page tables. Are you
planning to change that?

No, but how do you want to swap out page tables when there is a fence added?

I don't. That's my point. If the page table has fences on it, it can't be 
swapped out. So there is no problem.


Or do you want to stop adding fences to page tables for engines with
recoverable faults?

Yes, that was my assumption, coming from a KFD perspective where with HMM we 
want to get away from our eviction fences that enforce BO residency, including 
page tables. Right now signaling an eviction fence means stopping user mode 
queues. We want to stop doing that.

If page tables get evicted, that's fine as long as those virtual addresses 
aren't accessed. Once an access happens, the page table needs to be restored or 
recreated. Once that's done, the retrying hardware block will get a successful 
translation and continue executing.


I think that is completely unrealistic considering the performance
penalty of faults.

I agree that evicting a page table that's in active use is bad. For amdgpu_cs 
you can prevent that with a fence, no problem.

But a fence is not a good way to prevent that for KFD, because it forces us to 
keep using our eviction fence and preempting user mode queues on evictions. You 
see, the eviction fence does not prevent the page table from being evicted, but 
it forces us to preempt all our queues when an eviction happens. That is a 
worse performance penalty than dealing with a page fault because the page fault 
is much more selective. A page fault can interrupt a compute shader at wave 
granularity. A preemption interrupts compute shaders with process granularity 
and at much longer time scales.

For KFD I would try to find a better way to avoid evictions of page tables, 
maybe by bumping them up in the LRU at appropriate times. But even without any 
such improvements, page faults are still better for KFD than the current 
eviction-fence-based approach.
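
As a rough sketch, bumping a page table BO in the LRU could look like this
(untested; assumes the two-argument ttm_bo_move_to_lru_tail() from the
bulk-move series elsewhere in this digest and the usual glob->lru_lock
protection):

	spin_lock(&bo->tbo.bdev->glob->lru_lock);
	ttm_bo_move_to_lru_tail(&bo->tbo, NULL);
	spin_unlock(&bo->tbo.bdev->glob->lru_lock);

Doing that for example at command submission time would keep actively used
page tables away from the eviction end of the LRU without putting a fence on
them.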

Regards,
   Felix


At least for currently available hardware we should limit page faults to
be used in as few cases as possible, e.g. SVM and userptr.

Regards,
Christian.


From: Koenig, Christian
Sent: Wednesday, September 12, 2018 11:59 AM
To: Kuehling, Felix; Zeng, Oak; Oak Zeng; amd-gfx@lists.freedesktop.org; Yang, 
Philip
Subject: Re: [PATCH 1/2] drm/amdgpu: Moved fault hash table to amdgpu vm

Am 12.09.2018 um 17:29