Re: deprecated register issues

2018-03-07 Thread Mao, David
We require the base driver to provide the mask of disabled RBs.
This is why the kernel reads CC_RB_BACKEND_DISABLE to collect the harvest
configuration.
Where did you learn that the register is deprecated?
I think it should still be there.

Best Regards,
David

On Mar 7, 2018, at 9:49 PM, Liu, Monk 
> wrote:

+ UMD guys

Hi David

Do you know if GC_USER_RB_BACKEND_DISABLE still exists for gfx9/Vega10?

We found that CC_RB_BACKEND_DISABLE was deprecated, but it looks like it is still
in use in the KMD, so I want to check both of the above registers with you.

Thanks
/Monk

From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of
Christian König
Sent: March 7, 2018 20:26
To: Liu, Monk >; Deucher, Alexander 
>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: deprecated register issues

Hi Monk,

I honestly don't have the slightest idea why we are still accessing 
CC_RB_BACKEND_DISABLE. Maybe it still contains some useful values?

The key point was that we needed to stop accessing it all the time to avoid
triggering problems.

Regards,
Christian.

Am 07.03.2018 um 13:11 schrieb Liu, Monk:

Hi Christian



I remember you and AlexD mentioned that a handful of registers are deprecated for
Greenland (gfx9), e.g. CC_RB_BACKEND_DISABLE.

Do you know why we still have this routine?


static u32 gfx_v9_0_get_rb_active_bitmap(struct amdgpu_device *adev)
{
	u32 data, mask;

	data = RREG32_SOC15(GC, 0, mmCC_RB_BACKEND_DISABLE);
	data |= RREG32_SOC15(GC, 0, mmGC_USER_RB_BACKEND_DISABLE);

	data &= CC_RB_BACKEND_DISABLE__BACKEND_DISABLE_MASK;
	data >>= GC_USER_RB_BACKEND_DISABLE__BACKEND_DISABLE__SHIFT;

	mask = amdgpu_gfx_create_bitmask(adev->gfx.config.max_backends_per_se /
					 adev->gfx.config.max_sh_per_se);

	return (~data) & mask;
}



See that it still reads CC_RB_BACKEND_DISABLE.
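
For reference, the mask in that routine comes from amdgpu_gfx_create_bitmask(),
which (if I remember correctly; this is a sketch, not a verbatim copy of
amdgpu_gfx.c) is just a bit-width to bitmask conversion:

static u32 amdgpu_gfx_create_bitmask(u32 bit_width)
{
	/* e.g. bit_width = 4 -> 0xF: one bit per possible RB per SH */
	return (u32)((1ULL << bit_width) - 1);
}

So the routine returns the RBs that exist per SH minus whatever the CC_ (factory)
and GC_USER_ (user) harvest registers disable.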



thanks



/Monk





___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: Initial release of AMD Open Source Driver for Vulkan

2017-12-28 Thread Mao, David
Hi Dave, 
- How does AMD envisage development on this going forward, how will AMD
internal development team engage with community development efforts on this
code base?
The purpose of open-sourcing our Vulkan driver is to accelerate Vulkan
development through engagement between our internal development team and the community.
We encourage users and developers to report issues and raise feature requests
on the GitHub issue tracker.
For details, please refer to the "How to Contribute" section of README.md
(https://github.com/GPUOpen-Drivers/AMDVLK).

- How will code be integrated from the AMD internal codebase into this
codebase?
We will merge regularly to keep the code bases aligned.
For issues reported on the GitHub issue tracker, we will do the development work
directly on the GitHub dev branch.

- How will external contributions be taken into this code base and merged 
internally?
External contributors would open a pull request against the dev branch.
After it passes code review and basic testing, a maintainer will merge the commits
to the dev branch.
The changes in the dev branch will be tested internally and regularly merged back
to the master branch.

- How often will this codebase be updated? every day/week/month/hardware 
release?
Tentatively, the code changes in our internal branch will be merged into the
GitHub master branch weekly.

- Will llvm master eventually be shippable? Will new llvm features be developed 
in the open?
For changes in LLVM, we will normally contribute them to master eventually, as
long as the changes are accepted by the reviewers.

- At the moment radv and anv cooperate on developing new features under 
Khronos, will AMD be open to developing new features internally at Khronos 
while they are still pre-ratification?
IIRC, KHX extensions are unlikely to be drafted anymore, which is good for the
open-source driver.
In general, we would prefer to work on unratified extensions in a private branch
and push them to the open-source branch once they are ratified.

Thanks. 
Best Regards,
David

-Original Message-
From: Dave Airlie [mailto:airl...@gmail.com] 
Sent: Wednesday, December 27, 2017 3:50 AM
To: Mao, David <david@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: Initial release of AMD Open Source Driver for Vulkan

On 22 December 2017 at 21:03, Mao, David <david@amd.com> wrote:
> We are pleased to announce the initial release of AMD Open Source 
> Driver for Vulkan.
>
>
>
> The AMD Open Source Driver for Vulkan is an open-source Vulkan driver 
> for Radeon graphics adapters on Linux. It is built on top of AMD's 
> Platform Abstraction Library (PAL), a shared component that is 
> designed to encapsulate certain hardware and OS-specific programming 
> details for many of AMD's 3D and compute drivers. Leveraging PAL can 
> help provide a consistent experience across platforms, including 
> support for recently released GPUs and compatibility with AMD developer tools.
>
>
>
> The driver uses the LLVM-Based Pipeline Compiler (LLPC) library to 
> compile shaders that compose a particular VkPipeline object. LLPC 
> builds on LLVM's existing shader compilation infrastructure for AMD 
> GPUs to generate code objects compatible with PAL's pipeline ABI.
>
>
>
> The AMD Open Source Driver for Vulkan is designed to support the 
> following
> features:
>
> - Vulkan 1.0
>
> - More than 30 extensions
>
> - Radeon GPUProfiler tracing
>
> - Built-in debug and profiling tools
>
> - Mid-command buffer preemption and SR-IOV virtualization
>
>
>
> The following features and improvements are planned in future releases:
>
> - Upcoming versions of the Vulkan API
>
> - Hardware performance counter collection through RenderDoc
>
> - LLPC optimizations to improve GPU-limited performance and compile 
> time
>
> - Optimizations to improve CPU-limited performance
>
>
>
> Please refer to  the README file under
> https://github.com/GPUOpen-Drivers/AMDVLK   for more information.  Looking
> forward to hearing your feedback.

Excellent!

Before I spend much time digging into the code, I am wondering how much thought
on the development model and future development has been put into this from AMD's
perspective.

How does AMD envisage development on this going forward, how will AMD internal
development team engage with community development efforts on this code base?

How will code be integrated from the AMD internal codebase into this
codebase?
How will external contributions be taken into this code base and merged 
internally?
How often will this codebase be updated? every day/week/month/hardware release?
Will llvm master eventually be shippable? Will new llvm features be developed 
in the open?

At the moment radv and anv cooperate on developing new features under Khronos, 
will AMD be open to developing new features internally at

Re: Initial release of AMD Open Source Driver for Vulkan

2017-12-22 Thread Mao, David
Hi Lothian,
Thanks for testing out our driver!
Officially we recommend sticking to GCC 5 for now; however, we do have a fix
for the constexpr issue mentioned below that just didn't make it into this first
release.
Judging from your diff, are you using ICC?
Could you let us know the compiler version as well as your distro?

Thanks.
Best Regards,
David

On Dec 22, 2017, at 9:48 PM, Mike Lothian 
> wrote:

Congratulations on getting this out the door

It didn't compile for me without these changes:

In pal:

diff --git a/src/util/math.cpp b/src/util/math.cpp
index 46e9ede..3af4259 100644
--- a/src/util/math.cpp
+++ b/src/util/math.cpp
@@ -54,7 +54,7 @@ static uint32 Float32ToFloatN(float f, const NBitFloatInfo& 
info);
 static float FloatNToFloat32(uint32 fBits, const NBitFloatInfo& info);

 // Initialize the descriptors for various N-bit floating point representations:
-static constexpr NBitFloatInfo Float16Info =
+static NBitFloatInfo Float16Info =
 {
 16,   // numBits
 10,   // numFracBits
@@ -72,7 +72,7 @@ static constexpr NBitFloatInfo Float16Info =
 (23 - 10),// fracBitsDiff
 };

-static constexpr NBitFloatInfo Float11Info =
+static NBitFloatInfo Float11Info =
 {
 11,   // numBits
 6,// numFracBits
@@ -90,7 +90,7 @@ static constexpr NBitFloatInfo Float11Info =
 23 - 6,   // fracBitsDiff
 };

-static constexpr NBitFloatInfo Float10Info =
+static NBitFloatInfo Float10Info =
 {
 10,   // numBits
 5,// numFracBits

In xgl:

diff --git a/icd/CMakeLists.txt b/icd/CMakeLists.txt
index 4e4d669..5006184 100644
--- a/icd/CMakeLists.txt
+++ b/icd/CMakeLists.txt
@@ -503,16 +503,16 @@ if (UNIX)

 target_link_libraries(xgl PRIVATE c stdc++ ${CMAKE_DL_LIBS} pthread)

-if(NOT ICD_USE_GCC)
-message(WARNING "Intel ICC untested in CMake.")
-target_link_libraries(xgl PRIVATE -fabi-version=0 -static-intel)
-endif()
+#if(NOT ICD_USE_GCC)
+#message(WARNING "Intel ICC untested in CMake.")
+#target_link_libraries(xgl PRIVATE -fabi-version=0 -static-intel)
+#endif()

 if(CMAKE_BUILD_TYPE_RELEASE)
 if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
 execute_process(COMMAND ${CMAKE_C_COMPILER} -dumpversion 
OUTPUT_VARIABLE GCC_VERSION)
 if (GCC_VERSION VERSION_GREATER 5.3 OR GCC_VERSION VERSION_EQUAL 
5.3)
-target_link_libraries(xgl PRIVATE -flto=4  -fuse-linker-plugin 
-Wno-odr)
+target_link_libraries(xgl PRIVATE -Wno-odr)
 message(WARNING "LTO enabled for Linking")
 endif()
 endif()
@@ -530,17 +530,17 @@ if (UNIX)

 # CMAKE-TODO: What is whole-archive used for?
 #target_link_libraries(xgl -Wl,--whole-archive ${ICD_LIBS} 
-Wl,--no-whole-archive)
-if(CMAKE_BUILD_TYPE_RELEASE)
-execute_process(COMMAND ${CMAKE_C_COMPILER} -dumpversion 
OUTPUT_VARIABLE GCC_VERSION)
-if (GCC_VERSION VERSION_GREATER 5.3 OR GCC_VERSION VERSION_EQUAL 5.3)
-target_link_libraries(xgl PRIVATE -Wl,--whole-archive 
${PROJECT_BINARY_DIR}/pal/src/libpal.a -Wl,--no-whole-archive)
-target_link_libraries(xgl PUBLIC -Wl,--whole-archive 
${PROJECT_BINARY_DIR}/pal/metrohash/libmetrohash.a -Wl,--no-whole-archive)
-target_link_libraries(xgl PUBLIC -Wl,--whole-archive 
${PROJECT_BINARY_DIR}/pal/gpuopen/libgpuopen.a -Wl,--no-whole-archive)
-target_link_libraries(xgl PUBLIC -Wl,--whole-archive 
${PROJECT_BINARY_DIR}/pal/vam/libvam.a -Wl,--no-whole-archive)
-target_link_libraries(xgl PUBLIC -Wl,--whole-archive 
${PROJECT_BINARY_DIR}/pal/addrlib/libaddrlib.a -Wl,--no-whole-archive)
-target_link_libraries(xgl PUBLIC -Wl,--whole-archive 
${PROJECT_BINARY_DIR}/pal/jemalloc/libjemalloc.a -Wl,--no-whole-archive)
-endif()
-endif()
+#if(CMAKE_BUILD_TYPE_RELEASE)
+#execute_process(COMMAND ${CMAKE_C_COMPILER} -dumpversion 
OUTPUT_VARIABLE GCC_VERSION)
+#if (GCC_VERSION VERSION_GREATER 5.3 OR GCC_VERSION VERSION_EQUAL 5.3)
+#target_link_libraries(xgl PRIVATE -Wl,--whole-archive 
${PROJECT_BINARY_DIR}/pal/src/libpal.a -Wl,--no-whole-archive)
+#target_link_libraries(xgl PUBLIC -Wl,--whole-archive 
${PROJECT_BINARY_DIR}/pal/metrohash/libmetrohash.a -Wl,--no-whole-archive)
+#target_link_libraries(xgl PUBLIC -Wl,--whole-archive 
${PROJECT_BINARY_DIR}/pal/gpuopen/libgpuopen.a -Wl,--no-whole-archive)
+#target_link_libraries(xgl PUBLIC -Wl,--whole-archive 

Initial release of AMD Open Source Driver for Vulkan

2017-12-22 Thread Mao, David
We are pleased to announce the initial release of AMD Open Source Driver for 
Vulkan.



The AMD Open Source Driver for Vulkan is an open-source Vulkan driver for 
Radeon graphics adapters on Linux. It is built on top of AMD's Platform 
Abstraction Library (PAL), a shared component that is designed to encapsulate 
certain hardware and OS-specific programming details for many of AMD's 3D and 
compute drivers. Leveraging PAL can help provide a consistent experience across 
platforms, including support for recently released GPUs and compatibility with 
AMD developer tools.



The driver uses the LLVM-Based Pipeline Compiler (LLPC) library to compile 
shaders that compose a particular VkPipeline object. LLPC builds on LLVM's 
existing shader compilation infrastructure for AMD GPUs to generate code 
objects compatible with PAL's pipeline ABI.



The AMD Open Source Driver for Vulkan is designed to support the following 
features:

- Vulkan 1.0

- More than 30 extensions

- Radeon GPUProfiler tracing

- Built-in debug and profiling tools

- Mid-command buffer preemption and SR-IOV virtualization



The following features and improvements are planned in future releases:

- Upcoming versions of the Vulkan API

- Hardware performance counter collection through RenderDoc

- LLPC optimizations to improve GPU-limited performance and compile time

- Optimizations to improve CPU-limited performance



Please refer to  the README file under 
https://github.com/GPUOpen-Drivers/AMDVLK   for more information.  Looking 
forward to hearing your feedback.



Thanks,



The AMD driver team for Vulkan

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH libdrm] [drm] - Adding amdgpu_cs_create_syncobj2 to create syncobj as signaled initially

2017-11-28 Thread Mao, David
I have never tried to commit a change before, so I guess the answer is no.
Could you let me know how I can apply for commit rights?

Thanks. 
Best Regards,
David

-Original Message-
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] 
Sent: Tuesday, November 28, 2017 9:29 PM
To: Mao, David <david@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH libdrm] [drm] - Adding amdgpu_cs_create_syncobj2 to create 
syncobj as signaled initially

Reviewed-by: Christian König <christian.koe...@amd.com>

But in general for libdrm changes I would ping Marek, Nicolai, Michel and in 
this special case Dave Airlie because he added the patch with the missing flags 
field.

And I strongly assume you don't have commit rights, don't you?

Regards,
Christian.

Am 28.11.2017 um 14:22 schrieb Mao, David:
> Anyone can help to review the change?
> Thanks.
>
> Best Regards,
> David
>
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf 
> Of David Mao
> Sent: Tuesday, November 28, 2017 11:26 AM
> To: amd-gfx@lists.freedesktop.org
> Subject: [PATCH libdrm] [drm] - Adding amdgpu_cs_create_syncobj2 to 
> create syncobj as signaled initially
>
> Change-Id: Icf8d29bd4b50ee76936faacbbe099492cf0557cc
> Signed-off-by: David Mao <david@amd.com>
> ---
>   amdgpu/amdgpu.h| 15 +++
>   amdgpu/amdgpu_cs.c | 10 ++
>   2 files changed, 25 insertions(+)
>
> diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h index 78fbd1e..47bdb3a 
> 100644
> --- a/amdgpu/amdgpu.h
> +++ b/amdgpu/amdgpu.h
> @@ -1727,6 +1727,21 @@ const char 
> *amdgpu_get_marketing_name(amdgpu_device_handle dev);
>   /**
>*  Create kernel sync object
>*
> + * \param   dev - \c [in]  device handle
> + * \param   flags   - \c [in]  flags that affect creation
> + * \param   syncobj - \c [out] sync object handle
> + *
> + * \return   0 on success\n
> + *  <0 - Negative POSIX Error code
> + *
> +*/
> +int amdgpu_cs_create_syncobj2(amdgpu_device_handle dev,
> +   uint32_t  flags,
> +   uint32_t *syncobj);
> +
> +/**
> + *  Create kernel sync object
> + *
>* \param   dev   - \c [in]  device handle
>* \param   syncobj   - \c [out] sync object handle
>*
> diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c index 
> 64ad911..a9fbab9 100644
> --- a/amdgpu/amdgpu_cs.c
> +++ b/amdgpu/amdgpu_cs.c
> @@ -606,6 +606,16 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem)
>   return amdgpu_cs_unreference_sem(sem);
>  }
>   
> +int amdgpu_cs_create_syncobj2(amdgpu_device_handle dev,
> +   uint32_t  flags,
> +   uint32_t *handle)
> +{
> + if (NULL == dev)
> + return -EINVAL;
> +
> + return drmSyncobjCreate(dev->fd, flags, handle);
> +}
> +
>   int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
>uint32_t *handle)
>   {
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH libdrm] [drm] - Adding amdgpu_cs_create_syncobj2 to create syncobj as signaled initially

2017-11-28 Thread Mao, David
Anyone can help to review the change?
Thanks.

Best Regards,
David

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of David 
Mao
Sent: Tuesday, November 28, 2017 11:26 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH libdrm] [drm] - Adding amdgpu_cs_create_syncobj2 to create 
syncobj as signaled initially

Change-Id: Icf8d29bd4b50ee76936faacbbe099492cf0557cc
Signed-off-by: David Mao <david@amd.com>
---
 amdgpu/amdgpu.h| 15 +++
 amdgpu/amdgpu_cs.c | 10 ++
 2 files changed, 25 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h index 78fbd1e..47bdb3a 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1727,6 +1727,21 @@ const char 
*amdgpu_get_marketing_name(amdgpu_device_handle dev);
 /**
  *  Create kernel sync object
  *
+ * \param   dev - \c [in]  device handle
+ * \param   flags   - \c [in]  flags that affect creation
+ * \param   syncobj - \c [out] sync object handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_create_syncobj2(amdgpu_device_handle dev,
+ uint32_t  flags,
+ uint32_t *syncobj);
+
+/**
+ *  Create kernel sync object
+ *
  * \param   dev  - \c [in]  device handle
  * \param   syncobj   - \c [out] sync object handle
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c index 64ad911..a9fbab9 
100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -606,6 +606,16 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem)
	return amdgpu_cs_unreference_sem(sem);
 }
 
+int amdgpu_cs_create_syncobj2(amdgpu_device_handle dev,
+ uint32_t  flags,
+ uint32_t *handle)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjCreate(dev->fd, flags, handle);
+}
+
 int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
 uint32_t *handle)
 {
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm: Adding amdgpu_cs_create_syncobj2 to be able to create sync object as signaled initially

2017-11-23 Thread Mao, David
Signed-off-by: David Mao <david@amd.com>
---
 amdgpu/amdgpu.h| 15 +++
 amdgpu/amdgpu_cs.c | 10 ++
 2 files changed, 25 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 78fbd1e..47bdb3a 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1727,6 +1727,21 @@ const char 
*amdgpu_get_marketing_name(amdgpu_device_handle dev);
 /**
  *  Create kernel sync object
  *
+ * \param   dev - \c [in]  device handle
+ * \param   flags   - \c [in]  flags that affect creation
+ * \param   syncobj - \c [out] sync object handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_create_syncobj2(amdgpu_device_handle dev,
+ uint32_t  flags,
+ uint32_t *syncobj);
+
+/**
+ *  Create kernel sync object
+ *
  * \param   dev  - \c [in]  device handle
  * \param   syncobj   - \c [out] sync object handle
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 64ad911..76ce7fc 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -606,6 +606,16 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle 
sem)
return amdgpu_cs_unreference_sem(sem);
 }

+int amdgpu_cs_create_syncobj2(amdgpu_device_handle dev,
+   uint32_t  flags,
+   uint32_t *handle)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjCreate(dev->fd, flags, handle);
+}
+
 int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
 uint32_t *handle)
 {
--
2.7.4
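
For anyone wondering how this is meant to be used, a minimal sketch (assuming the
DRM_SYNCOBJ_CREATE_SIGNALED flag from drm.h and an already-initialized
amdgpu_device_handle; error handling elided):

#include <stdint.h>
#include <amdgpu.h>
#include <drm.h>	/* DRM_SYNCOBJ_CREATE_SIGNALED */

static int create_signaled_syncobj(amdgpu_device_handle dev, uint32_t *syncobj)
{
	/* The flag makes the sync object start out signaled, so the very
	 * first wait on it does not block. */
	return amdgpu_cs_create_syncobj2(dev, DRM_SYNCOBJ_CREATE_SIGNALED,
					 syncobj);
}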
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: TDR and VRAM lost handling in KMD (v2)

2017-10-12 Thread Mao, David
Thanks Monk for the summary!

Hi Nicolai,
In order to block a new context from referencing the old allocations, I think we
need to do something in the UMD so that the KMD doesn't need to monitor the
resource list.
I want to make sure we are on the same page.
If you agree, then there are two options to do that in the UMD (you can do
whatever you want, I just want to elaborate the idea a little bit to facilitate
the discussion):
- If the share list is valid, the driver needs to compare the current
vram_lost_count against the share list's vram_lost_count; context creation will
fail if the share list was created before the reset.
- Or, we can copy the vram_lost_count from the share list, and the kernel will
fail the submission if the vram_lost_count is smaller than the current one.
I personally want to go with the first option for OrcaGL.
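
Just to illustrate the first option, a rough UMD-side sketch (all names here are
made up for the example; vram_lost_counter is assumed to come from the new query
IOCTL proposed below):

#include <stdint.h>
#include <errno.h>

static int umd_create_context(struct umd_device *dev,           /* hypothetical */
			      struct umd_context *share_list,    /* hypothetical */
			      struct umd_context **out_ctx)
{
	uint64_t current = umd_query_vram_lost_counter(dev);     /* hypothetical */

	/* Fail context creation if the share list predates the last VRAM loss. */
	if (share_list && share_list->vram_lost_counter != current)
		return -ECANCELED;

	return umd_context_alloc(dev, current, out_ctx);         /* hypothetical */
}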

Thanks.
Best Regards,
David
-
On 12 Oct 2017, at 4:03 PM, Liu, Monk 
> wrote:

V2 summary

Hi team

please give your comments

•  When a job times out (set from the lockup_timeout kernel parameter), what KMD
should do in the TDR routine:

1.Update adev->gpu_reset_counter, and stop scheduler first
2.Set its fence error status to “ECANCELED”,
3.Find the context behind this job, and mark this context as “guilty”
(there will be a new member field in the context structure – bool guilty)
a)   There will be a “bool *guilty” in the entity structure, which points to its
parent context’s member “bool guilty” when the context is initialized, so no
matter whether we have the context or the entity, we always know if it is “guilty”
b)   For the kernel entity used for VM updates, there is no context behind
it, so the kernel entity’s “bool *guilty” is always NULL.
c)   The idea of skipping the whole context is for consistency, because we’ll
fake-signal the hung job in job_run(), so all jobs in its context shall be
dropped; otherwise we get either bad drawing/computing results or another GPU
hang.

4.Do the GPU reset, which can be a set of callbacks so that bare-metal and
SR-IOV can each implement it in their favored style
5.After reset, KMD needs to know whether VRAM loss happened or not;
bare-metal can implement some function to judge, while for SR-IOV I prefer to
read it from the GIM side (for the initial version we consider it always VRAM
lost, until the GIM-side change is aligned)
6.If VRAM lost hit, update adev->vram_lost_counter.
7.Do GTT recovery and shadow buffer recovery.
8.Re-schedule all JOBs in mirror list and restart scheduler

•  For GPU scheduler function --- job_run()
1.Before schedule a job to ring, checks if job->vram_lost_counter == 
adev->vram_lost_counter, and drop this job if mismatch
2.Before schedule a job to ring, checks if job->entity->guilty is NULL 
or not, and drop this job if (guilty!=NULL && *guilty == TRUE)
3.if a job is dropped:
a)   set job’s sched_fence status to “ECANCELED”
b)   fake/force signal job’s hw fence (no need to set hw fence’s status)

•  For cs_wait() IOCTL:
After it found fence signaled, it should check if there is error on this fence 
and return the error status of this fence

•  For cs_wait_fences() IOCTL:
Similar with above approach

•  For cs_submit() IOCTL:
1.check if current ctx been marked “guilty” and return “ECANCELED”  if 
so.
2.set job->vram_lost_counter with adev->vram_lost_counter, and return 
“ECANCELED” if ctx->vram_lost_counter != job->vram_lost_counter(Christian 
already submitted this patch)
a)   discussion: can we return “ENODEV” if vram_lost_counter mismatch ? 
that way UMD know this context is under “device lost”

•  Introduce a new IOCTL to let UMD query latest adev->vram_lost_counter:

•  For amdgpu_ctx_query():
•  Don’t update ctx->reset_counter when querying this function, otherwise the 
query result is not consistent
•  Set out->state.reset_status to “AMDGPU_CTX_GUILTY_RESET” if the ctx is 
“guilty”, no need to check “ctx->reset_counter”
•  Set out->state.reset_status to “AMDGPU_CTX_INNOCENT_RESET”if the ctx isn’t 
“guilty” && ctx->reset_counter != adev->reset_counter
•  Set out->state.reset_status to “AMDGPU_CTX_NO_RESET” if ctx->reset_counter 
== adev->reset_counter
•  Set out->state.flags to “AMDGPU_CTX_FLAG_VRAM_LOST” if 
ctx->vram_lost_counter != adev->vram_lost_counter
•  discussion: can we return “ENODEV” for amdgpu_ctx_query() if 
ctx->vram_lost_counter != adev->vram_lost_counter ? that way UMD know this 
context is under “device lost”
•  UMD shall release this context if it is AMDGPU_CTX_GUILTY_RESET or its flags 
is “AMDGPU_CTX_FLAG_VRAM_LOST”

For UMD behavior we still have something to consider:
If MESA creates a new context from an old context (a share list? I'm not
familiar with the UMD; David Mao shall discuss it with Nicolai), the newly
created context's vram_lost_counter and reset_counter shall all be ported from
that old context, otherwise CS_SUBMIT will not block it, which isn't correct



Need your feedback, thx


From: amd-gfx 

Re: [PATCH] drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXPLICIT_SYNC v2

2017-09-19 Thread Mao, David
Hi Andres,
The explicit sync should not be used for DRI3 and DRI2 but for cross-process
memory sharing, right?
We still have to rely on implicit sync to guarantee the correct order of
rendering and present.
Could you confirm?
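
For context, a minimal sketch of how a UMD would opt a shared BO out of implicit
sync with the proposed flag (using the libdrm amdgpu wrapper; dev and size are
assumptions of the example, error handling omitted):

#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

static int alloc_explicitly_synced_bo(amdgpu_device_handle dev, uint64_t size,
				      amdgpu_bo_handle *bo)
{
	struct amdgpu_bo_alloc_request req = {
		.alloc_size     = size,
		.phys_alignment = 4096,
		.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM,
		/* new flag from this patch: skip implicit CS sync for this BO */
		.flags          = AMDGPU_GEM_CREATE_EXPLICIT_SYNC,
	};

	return amdgpu_bo_alloc(dev, &req, bo);
}

The DRI2/DRI3 presentation path would keep relying on implicit sync, which is
exactly the question above.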

Thanks.

Sent from my iPhone

On 19 Sep 2017, at 9:57 PM, Andres Rodriguez 
> wrote:



On 2017-09-19 09:24 AM, Christian König wrote:
Am 19.09.2017 um 14:59 schrieb Andres Rodriguez:
Introduce a flag to signal that access to a BO will be synchronized
through an external mechanism.

Currently all buffers shared between contexts are subject to implicit
synchronization. However, this is only required for protocols that
currently don't support an explicit synchronization mechanism (DRI2/3).

This patch introduces the AMDGPU_GEM_CREATE_EXPLICIT_SYNC, so that
users can specify when it is safe to disable implicit sync.

v2: only disable explicit sync in amdgpu_cs_ioctl

Signed-off-by: Andres Rodriguez >
---

Hey Christian,

I kept the amdgpu_bo_explicit_sync() function since it makes it easier
to maintain an 80 line wrap in amdgpu_cs_sync_rings()
Looks good to me, but I would like to see the matching user space code as well.
Especially I have no idea how you want to have DRI3 compatibility with that?

No problem. I'm fixing the radv patch atm and I'll re-send it for your 
reference.

Regards,
Andres

Regards,
Christian.

Regards,
Andres

  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 4 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 8 
  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c   | 7 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h   | 3 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 5 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8 
  include/uapi/drm/amdgpu_drm.h  | 2 ++
  8 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index db97e78..bc8a403 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -704,7 +704,8 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
  list_for_each_entry(e, &p->validated, tv.head) {
  struct reservation_object *resv = e->robj->tbo.resv;
-r = amdgpu_sync_resv(p->adev, &p->job->sync, resv, p->filp);
+r = amdgpu_sync_resv(p->adev, &p->job->sync, resv, p->filp,
+ amdgpu_bo_explicit_sync(e->robj));
  if (r)
  return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index b0d45c8..21e9936 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -212,7 +212,9 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
AMDGPU_GEM_CREATE_NO_CPU_ACCESS |
AMDGPU_GEM_CREATE_CPU_GTT_USWC |
AMDGPU_GEM_CREATE_VRAM_CLEARED |
-  AMDGPU_GEM_CREATE_VM_ALWAYS_VALID))
+  AMDGPU_GEM_CREATE_VM_ALWAYS_VALID |
+  AMDGPU_GEM_CREATE_EXPLICIT_SYNC))
+
  return -EINVAL;
  /* reject invalid gem domains */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index c26ef53..428aae0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -193,6 +193,14 @@ static inline bool amdgpu_bo_gpu_accessible(struct 
amdgpu_bo *bo)
  }
  }
+/**
+ * amdgpu_bo_explicit_sync - return whether the bo is explicitly synced
+ */
+static inline bool amdgpu_bo_explicit_sync(struct amdgpu_bo *bo)
+{
+return bo->flags & AMDGPU_GEM_CREATE_EXPLICIT_SYNC;
+}
+
  int amdgpu_bo_create(struct amdgpu_device *adev,
  unsigned long size, int byte_align,
  bool kernel, u32 domain, u64 flags,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index c586f44..a4bf21f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -169,14 +169,14 @@ int amdgpu_sync_fence(struct amdgpu_device *adev, struct 
amdgpu_sync *sync,
   *
   * @sync: sync object to add fences from reservation object to
   * @resv: reservation object with embedded fence
- * @shared: true if we should only sync to the exclusive fence
+ * @explicit_sync: true if we should only sync to the exclusive fence
   *
   * Sync to the fence
   */
  int amdgpu_sync_resv(struct amdgpu_device *adev,
   struct amdgpu_sync *sync,
   struct reservation_object *resv,
- void *owner)
+ void *owner, bool explicit_sync)
  {
  struct reservation_object_list *flist;
  struct dma_fence *f;
@@ -191,6 +191,9 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
  f = reservation_object_get_excl(resv);
 

Re: [PATCH libdrm] amdgpu: revert semaphore support

2017-07-11 Thread Mao, David
Hi Christian,
When will sync object support land in the upstream kernel, and in which version
specifically?
We still rely on the legacy semaphore implementation, and we have to keep using
it if sync objects still take time.
Thanks.  
Best Regards,
David
> On 11 Jul 2017, at 5:15 PM, Christian König  wrote:
> 
> From: Christian König 
> 
> This reverts commit 6b79c66b841dded6ffa6b56f14e4eb10a90a7c07
> and commit 6afadeaf13279fcdbc48999f522e1dc90a9dfdaf.
> 
> Semaphore support was never used by any open source project and
> not even widely by any closed source driver.
> 
> This should be replaced by sync object support.
> 
> Signed-off-by: Christian König 
> ---
> amdgpu/amdgpu.h  |  65 -
> amdgpu/amdgpu_cs.c   | 237 +--
> amdgpu/amdgpu_internal.h |  15 ---
> 3 files changed, 5 insertions(+), 312 deletions(-)
> 
> diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
> index 1901fa8..8bf57a4 100644
> --- a/amdgpu/amdgpu.h
> +++ b/amdgpu/amdgpu.h
> @@ -128,11 +128,6 @@ typedef struct amdgpu_bo_list *amdgpu_bo_list_handle;
>  */
> typedef struct amdgpu_va *amdgpu_va_handle;
> 
> -/**
> - * Define handle for semaphore
> - */
> -typedef struct amdgpu_semaphore *amdgpu_semaphore_handle;
> -
> /*--*/
> /* -- Structures -- */
> /*--*/
> @@ -1259,66 +1254,6 @@ int amdgpu_bo_va_op_raw(amdgpu_device_handle dev,
>   uint32_t ops);
> 
> /**
> - *  create semaphore
> - *
> - * \param   sem - \c [out] semaphore handle
> - *
> - * \return   0 on success\n
> - *  <0 - Negative POSIX Error code
> - *
> -*/
> -int amdgpu_cs_create_semaphore(amdgpu_semaphore_handle *sem);
> -
> -/**
> - *  signal semaphore
> - *
> - * \param   context- \c [in] GPU Context
> - * \param   ip_type- \c [in] Hardware IP block type = AMDGPU_HW_IP_*
> - * \param   ip_instance- \c [in] Index of the IP block of the same type
> - * \param   ring   - \c [in] Specify ring index of the IP
> - * \param   sem - \c [in] semaphore handle
> - *
> - * \return   0 on success\n
> - *  <0 - Negative POSIX Error code
> - *
> -*/
> -int amdgpu_cs_signal_semaphore(amdgpu_context_handle ctx,
> -uint32_t ip_type,
> -uint32_t ip_instance,
> -uint32_t ring,
> -amdgpu_semaphore_handle sem);
> -
> -/**
> - *  wait semaphore
> - *
> - * \param   context- \c [in] GPU Context
> - * \param   ip_type- \c [in] Hardware IP block type = AMDGPU_HW_IP_*
> - * \param   ip_instance- \c [in] Index of the IP block of the same type
> - * \param   ring   - \c [in] Specify ring index of the IP
> - * \param   sem - \c [in] semaphore handle
> - *
> - * \return   0 on success\n
> - *  <0 - Negative POSIX Error code
> - *
> -*/
> -int amdgpu_cs_wait_semaphore(amdgpu_context_handle ctx,
> -  uint32_t ip_type,
> -  uint32_t ip_instance,
> -  uint32_t ring,
> -  amdgpu_semaphore_handle sem);
> -
> -/**
> - *  destroy semaphore
> - *
> - * \param   sem  - \c [in] semaphore handle
> - *
> - * \return   0 on success\n
> - *  <0 - Negative POSIX Error code
> - *
> -*/
> -int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem);
> -
> -/**
>  *  Get the ASIC marketing name
>  *
>  * \param   dev - \c [in] Device handle. See 
> #amdgpu_device_initialize()
> diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
> index 868eb7b..c0794d2 100644
> --- a/amdgpu/amdgpu_cs.c
> +++ b/amdgpu/amdgpu_cs.c
> @@ -40,9 +40,6 @@
> #include "amdgpu_drm.h"
> #include "amdgpu_internal.h"
> 
> -static int amdgpu_cs_unreference_sem(amdgpu_semaphore_handle sem);
> -static int amdgpu_cs_reset_sem(amdgpu_semaphore_handle sem);
> -
> /**
>  * Create command submission context
>  *
> @@ -56,7 +53,6 @@ int amdgpu_cs_ctx_create(amdgpu_device_handle dev,
> {
>   struct amdgpu_context *gpu_context;
>   union drm_amdgpu_ctx args;
> - int i, j, k;
>   int r;
> 
>   if (!dev || !context)
> @@ -68,10 +64,6 @@ int amdgpu_cs_ctx_create(amdgpu_device_handle dev,
> 
>   gpu_context->dev = dev;
> 
> - r = pthread_mutex_init(&gpu_context->sequence_mutex, NULL);
> - if (r)
> - goto error;
> -
>   /* Create the context */
>   memset(&args, 0, sizeof(args));
>   args.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
> @@ -80,16 +72,11 @@ int amdgpu_cs_ctx_create(amdgpu_device_handle dev,
>   goto error;
> 
>   gpu_context->id = args.out.alloc.ctx_id;
> - for (i = 0; i < AMDGPU_HW_IP_NUM; i++)
> - for (j 

Re: [PATCH libdrm] amdgpu: revert semaphore support

2017-07-11 Thread Mao, David
This is not true.
Can we keep this for a while until we have sync object in place?

Thanks.
Best Regards,
David
> On 11 Jul 2017, at 5:28 PM, Christian König <deathsim...@vodafone.de> wrote:
> 
> I hoped that Dave Airlied will land it together with this patch.
> 
> As far as I know the closed source driver already doesn't use that any more 
> either.
> 
> Regards,
> Christian.
> 
> Am 11.07.2017 um 11:20 schrieb Mao, David:
>> Hi Christian,
>> When will sync object support landed in upstream kernel, which version in 
>> specific?
>> We still rely on legacy semaphore implementation and we have to use it if 
>> sync object still takes time.
>> Thanks.
>> Best Regards,
>> David
>>> On 11 Jul 2017, at 5:15 PM, Christian König <deathsim...@vodafone.de> wrote:
>>> 
>>> From: Christian König <christian.koe...@amd.com>
>>> 
>>> This reverts commit 6b79c66b841dded6ffa6b56f14e4eb10a90a7c07
>>> and commit 6afadeaf13279fcdbc48999f522e1dc90a9dfdaf.
>>> 
>>> Semaphore support was never used by any open source project and
>>> not even widely by any closed source driver.
>>> 
>>> This should be replaced by sync object support.
>>> 
>>> Signed-off-by: Christian König <christian.koe...@amd.com>
>>> ---
>>> amdgpu/amdgpu.h  |  65 -
>>> amdgpu/amdgpu_cs.c   | 237 
>>> +--
>>> amdgpu/amdgpu_internal.h |  15 ---
>>> 3 files changed, 5 insertions(+), 312 deletions(-)
>>> 
>>> diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
>>> index 1901fa8..8bf57a4 100644
>>> --- a/amdgpu/amdgpu.h
>>> +++ b/amdgpu/amdgpu.h
>>> @@ -128,11 +128,6 @@ typedef struct amdgpu_bo_list *amdgpu_bo_list_handle;
>>>  */
>>> typedef struct amdgpu_va *amdgpu_va_handle;
>>> 
>>> -/**
>>> - * Define handle for semaphore
>>> - */
>>> -typedef struct amdgpu_semaphore *amdgpu_semaphore_handle;
>>> -
>>> /*--*/
>>> /* -- Structures -- 
>>> */
>>> /*--*/
>>> @@ -1259,66 +1254,6 @@ int amdgpu_bo_va_op_raw(amdgpu_device_handle dev,
>>> uint32_t ops);
>>> 
>>> /**
>>> - *  create semaphore
>>> - *
>>> - * \param   sem   - \c [out] semaphore handle
>>> - *
>>> - * \return   0 on success\n
>>> - *  <0 - Negative POSIX Error code
>>> - *
>>> -*/
>>> -int amdgpu_cs_create_semaphore(amdgpu_semaphore_handle *sem);
>>> -
>>> -/**
>>> - *  signal semaphore
>>> - *
>>> - * \param   context- \c [in] GPU Context
>>> - * \param   ip_type- \c [in] Hardware IP block type = 
>>> AMDGPU_HW_IP_*
>>> - * \param   ip_instance- \c [in] Index of the IP block of the same type
>>> - * \param   ring   - \c [in] Specify ring index of the IP
>>> - * \param   sem   - \c [in] semaphore handle
>>> - *
>>> - * \return   0 on success\n
>>> - *  <0 - Negative POSIX Error code
>>> - *
>>> -*/
>>> -int amdgpu_cs_signal_semaphore(amdgpu_context_handle ctx,
>>> -  uint32_t ip_type,
>>> -  uint32_t ip_instance,
>>> -  uint32_t ring,
>>> -  amdgpu_semaphore_handle sem);
>>> -
>>> -/**
>>> - *  wait semaphore
>>> - *
>>> - * \param   context- \c [in] GPU Context
>>> - * \param   ip_type- \c [in] Hardware IP block type = 
>>> AMDGPU_HW_IP_*
>>> - * \param   ip_instance- \c [in] Index of the IP block of the same type
>>> - * \param   ring   - \c [in] Specify ring index of the IP
>>> - * \param   sem   - \c [in] semaphore handle
>>> - *
>>> - * \return   0 on success\n
>>> - *  <0 - Negative POSIX Error code
>>> - *
>>> -*/
>>> -int amdgpu_cs_wait_semaphore(amdgpu_context_handle ctx,
>>> -uint32_t ip_type,
>>> -uint32_t ip_instance,
>>> -uint32_t ring,
>>> -amdgpu_semaphore_handle sem);
>>> -
>>> -/*

Re: Deprecation of AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED

2017-06-29 Thread Mao, David
Sounds good!
One thing to confirm: if the original location is already in invisible VRAM,
will the notifier callback move the BO from invisible to visible? If so, and the
logic is already available in the kernel, can we use the NO_CPU_ACCESS flag by
default to accomplish a similar purpose for now?
It also reminds me of another related topic: can we always prioritize the
visible heap over the remote heap in this case?
So far, the kernel doesn't have heap priorities.
IIRC, if an LFB BO is moved to GTT, it will never be moved back, since GTT is
also in its preferred heaps. (The kernel seems to add GTT even if the UMD only
asks for LFB.)

Thanks.
Best Regards,
David
On 30 Jun 2017, at 11:36 AM, Michel Dänzer 
<mic...@daenzer.net<mailto:mic...@daenzer.net>> wrote:

On 30/06/17 10:55 AM, Mao, David wrote:
Vulkan allows the application to decide whether it wants an allocation
to be host visible and device local.
If we drop the flag, what will happen if we do not set the
NO_CPU_ACCESS flag?
Will the map fail if the allocation was placed in the invisible
heap?

No, it'll work just as well. On attempted CPU access,
amdgpu_bo_fault_reserve_notify will ensure that it's CPU accessible.

The difference is that it'll allow BOs which aren't being actively
accessed by the CPU to be in CPU invisible VRAM, reducing pressure on
CPU visible VRAM.


--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Deprecation of AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED

2017-06-29 Thread Mao, David
Vulkan allows the application to decide whether it wants an allocation to be
host visible and device local.
If we drop the flag, what will happen if we do not set the NO_CPU_ACCESS flag?
Will the map fail if the allocation was placed in the invisible heap?

Thanks.
Best Regards,
David
On 29 Jun 2017, at 11:03 PM, Marek Olšák 
> wrote:

Do you have any concern if we also stop using the CPU_ACCESS flag on radeon?

Thanks,
Marek

On Thu, Jun 29, 2017 at 4:51 PM, Christian König
> wrote:
Yeah, I was thinking something similar.

See the intention behind CPU_ACCESS_REQUIRED is to always guarantee that CPU
access is immediately possible.

If you ask me that is not really useful for the UMD and was never meant to
be used by Mesa (only the closed source UMD and some kernel internal use
cases).

I would like to keep the behavior in the kernel driver as it is, but we
should really stop using this as a hint in Mesa.

Regards,
Christian.


Am 29.06.2017 um 16:41 schrieb Marek Olšák:

Hi,

Given how our memory manager works and the guesswork that UMDs have to
do to determine whether to set the flag, I think the flag isn't
useful.

I'm proposing that CPU_ACCESS_REQUIRED:
- will be deprecated.
- It will remain to be accepted by the kernel driver, but it will
either not have any effect, or it will serve as a hint that might or
might not be followed.
- The only flag that UMDs are expected to set with regard to CPU
access is NO_CPU_ACCESS.

The main motivation is the reduction of "virtual" heaps for UMD buffer
suballocators and reusable buffer pools. A higher number of heaps
means that more memory can be wasted by UMDs.

Opinions?

Thanks,
Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Exclusive gpu access for SteamVR usecases

2017-05-26 Thread Mao, David
Hi Andres,
Why is the fd needed for this interface?
Why not just use dev->fd instead of fd?
IIRC, if more than one fd is opened on the same device within a process, they
will share the same amdgpu_device_handle, which is guaranteed by
amdgpu_device_initialize.
In other words, we should not run into the case where the user creates more
contexts with a newly opened fd after tuning the priority of an existing context
in the same process, unless the previous fd is closed.
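
To illustrate what I mean (assuming libdrm_amdgpu's handle de-duplication works
as I remember; the render node path is just an example and error handling is
omitted):

#include <fcntl.h>
#include <stdint.h>
#include <amdgpu.h>

static void same_device_handle_demo(void)
{
	int fd1 = open("/dev/dri/renderD128", O_RDWR);
	int fd2 = open("/dev/dri/renderD128", O_RDWR);
	uint32_t major, minor;
	amdgpu_device_handle dev1, dev2;

	amdgpu_device_initialize(fd1, &major, &minor, &dev1);
	amdgpu_device_initialize(fd2, &major, &minor, &dev2);

	/* Expectation: dev1 == dev2, because amdgpu_device_initialize
	 * de-duplicates handles for fds that refer to the same device. */
}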

Thanks.
Best Regards,
David

On 25 May 2017, at 8:00 AM, Andres Rodriguez 
> wrote:

When multiple environments are running simultaneously on a system, e.g.
an X desktop + a SteamVR game session, it may be useful to sacrifice
performance in one environment in order to boost it on the other.

This series provides a mechanism for a DRM_MASTER to provide exclusive
gpu access to a group of processes.

Note: This series is built on the assumption that the drm lease patch series
will extend DRM_MASTER status to lessees.

The libdrm API we intend to provide is as follows:

/**
* Set the priority of all contexts in a process
*
* This function will change the priority of all contexts owned by
* the process identified by fd.
*
* \param dev - \c [in] device handle
* \param fd  - \c [in] fd from target process
* \param priority- \c [in] target priority AMDGPU_CTX_PRIORITY_*
*
* \return  0 on success\n
* <0 - Negative POSIX error code
*
* \notes @fd can be *any* file descriptor from the target process.
* \notes this function requires DRM_MASTER
*/
int amdgpu_sched_process_priority_set(amdgpu_device_handle dev,
 int fd, int32_t priority);

/**
* Request to raise the minimum required priority to schedule a gpu job
*
* Submit a request to increase the minimum required priority to schedule
* a gpu job. Once this function returns, the gpu scheduler will no longer
* consider jobs from contexts with priority lower than @priority.
*
* The minimum priority considered by the scheduler will be the highest from
* all currently active requests.
*
* Requests are refcounted, and must be balanced using
* amdgpu_sched_min_priority_put()
*
* \param dev - \c [in] device handle
* \param priority- \c [in] target priority AMDGPU_CTX_PRIORITY_*
*
* \return  0 on success\n
* <0 - Negative POSIX error code
*
* \notes this function requires DRM_MASTER
*/
int amdgpu_sched_min_priority_get(amdgpu_device_handle dev,
 int32_t priority);

/**
* Drop a request to raise the minimum required scheduler priority
*
* This call balances amdgpu_sched_min_priority_get()
*
* If no other active requests exists for @priority, the minimum required
* priority will decay to a lower level until one is reached with an active
* request or the lowest priority is reached.
*
* \param dev - \c [in] device handle
* \param priority- \c [in] target priority AMDGPU_CTX_PRIORITY_*
*
* \return  0 on success\n
* <0 - Negative POSIX error code
*
* \notes this function requires DRM_MASTER
*/
int amdgpu_sched_min_priority_put(amdgpu_device_handle dev,
 int32_t priority);

Using this API, VRComposer can raise the priority of the VRapp and itself. Then
it can restrict the minimum scheduler priority in order to become an exclusive gpu
client.

One of the areas I'd like feedback is the following scenario. If a VRapp opens
a new fd and creates a new context after a call to set_priority, this specific
context will be lower priority than the rest. If the minimum required priority
is then raised, it is possible that this new context will be starved and
deadlock the VRapp.

One solution I had in mind to address this situation, is to make set_priority
also raise the priority of future contexts created by the VRapp. However, that
would require keeping track of the requested priority on a per-process data
structure. The current design appears to steer clean of keeping any process
specific data, and everything instead of stored on a per-file basis. Which is
why I did not pursue this approach. But if this is something you'd like me to
implement let me know.

One could also argue that preventing an application deadlock should be handled
between the VRComposer and the VRApp. It is not the kernel's responsibility to
babysit userspace applications and prevent themselves from shooting themselves
in the foot. The same could be achieved by improper usage of shared fences
between processes.

Thoughts/feedback/comments on this issue, or others, are appreciated.

Regards,
Andres


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



RE: [PATCH 8/8] amdgpu: use sync file for shared semaphores (v2.1)

2017-04-11 Thread Mao, David
Does it mean we have to submit a command to trigger the semaphore wait/signal?

Best Regards,
David

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Dave 
Airlie
Sent: Tuesday, April 11, 2017 11:22 AM
To: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
Subject: [PATCH 8/8] amdgpu: use sync file for shared semaphores (v2.1)

From: Dave Airlie 

This creates a new command submission chunk for amdgpu to add wait and signal 
sync objects around the submission.

Sync objects are managed via the drm syncobj ioctls.

The command submission interface is enhanced with two new chunks, one for 
semaphore waiting, one for semaphore signalling and just takes a list of 
handles for each.

This is based on work originally done by David Zhou at AMD, with input from 
Christian Konig on what things should look like.

NOTE: this interface addition needs a version bump to expose it to userspace.

v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 82 - 
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
 include/uapi/drm/amdgpu_drm.h   |  6 +++
 3 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index df25b32..77bfe80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
@@ -217,6 +218,8 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void 
*data)
break;
 
case AMDGPU_CHUNK_ID_DEPENDENCIES:
+   case AMDGPU_CHUNK_ID_SEM_WAIT:
+   case AMDGPU_CHUNK_ID_SEM_SIGNAL:
break;
 
default:
@@ -1008,6 +1011,41 @@ static int amdgpu_process_fence_dep(struct 
amdgpu_cs_parser *p,
return 0;
 }
 
+static int amdgpu_sem_lookup_and_sync(struct amdgpu_cs_parser *p,
+ uint32_t handle)
+{
+   int r;
+   struct dma_fence *old_fence;
+
+   r = drm_syncobj_swap_fences(p->filp, handle, NULL, &old_fence);
+   if (r)
+   return r;
+
+   r = amdgpu_sync_fence(p->adev, &p->job->sync, old_fence);
+   dma_fence_put(old_fence);
+
+   return r;
+}
+
+static int amdgpu_process_sem_wait_dep(struct amdgpu_cs_parser *p,
+  struct amdgpu_cs_chunk *chunk) {
+   unsigned num_deps;
+   int i, r;
+   struct drm_amdgpu_cs_chunk_sem *deps;
+
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_sem_lookup_and_sync(p, deps[i].handle);
+   if (r)
+   return r;
+   }
+   return 0;
+}
+
 static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
  struct amdgpu_cs_parser *p)
 {
@@ -1022,12 +1060,54 @@ static int amdgpu_cs_dependencies(struct amdgpu_device 
*adev,
r = amdgpu_process_fence_dep(p, chunk);
if (r)
return r;
+   } else if (chunk->chunk_id == AMDGPU_CHUNK_ID_SEM_WAIT) {
+   r = amdgpu_process_sem_wait_dep(p, chunk);
+   if (r)
+   return r;
}
}
 
return 0;
 }
 
+static int amdgpu_process_sem_signal_dep(struct amdgpu_cs_parser *p,
+struct amdgpu_cs_chunk *chunk)
+{
+   unsigned num_deps;
+   int i, r;
+   struct drm_amdgpu_cs_chunk_sem *deps;
+
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+   for (i = 0; i < num_deps; ++i) {
+   r = drm_syncobj_replace_fence(p->filp, deps[i].handle,
+ p->fence);
+   if (r)
+   return r;
+   }
+   return 0;
+}
+
+static int amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p) {
+   int i, r;
+
+   for (i = 0; i < p->nchunks; ++i) {
+   struct amdgpu_cs_chunk *chunk;
+
+   chunk = &p->chunks[i];
+
+   if (chunk->chunk_id == AMDGPU_CHUNK_ID_SEM_SIGNAL) {
+   r = amdgpu_process_sem_signal_dep(p, chunk);
+   if (r)
+   return r;
+   }
+   }
+   return 0;
+}
+
 static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
union 

RE: [PATCH 8/8] amdgpu: use sync file for shared semaphores (v2.1)

2017-04-11 Thread Mao, David
=> we'd really want to pass a semaphore between the X server and client to do
this perfectly.
Do you mean that you want X to signal the semaphore waited on by the client,
through a special version of XSync?
We use pretty complex tricks to build the synchronization logic upon the event
and SHM fence.
But it would be better if we could use a unified way to signal/wait both the
xsync fence and the semaphore object.
I can see the benefit of combining the semaphore wait/signal into the submit
routine, but how about extending the interface to allow a null IB submission?
In this case, it would always return the last seq_no for a null IB list, and the
semaphores in the signal list would be associated with that last fence as well.
IIRC, the semaphore wait is applied to the scheduler entity as a dependency,
which means it doesn't need to be associated with a scheduled job anyway.
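
To sketch what such a null-IB submission could look like with the chunks from
this series (fd, ctx_id and the syncobj handle are assumed inputs; whether the
kernel accepts a CS without any IB chunk is exactly the open question):

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>

static int submit_signal_only(int fd, uint32_t ctx_id, uint32_t syncobj_handle)
{
	struct drm_amdgpu_cs_chunk_sem sem = { .handle = syncobj_handle };
	struct drm_amdgpu_cs_chunk chunk = {
		.chunk_id   = AMDGPU_CHUNK_ID_SEM_SIGNAL,	/* from this series */
		.length_dw  = sizeof(sem) / 4,
		.chunk_data = (uintptr_t)&sem,
	};
	uint64_t chunk_ptr = (uintptr_t)&chunk;
	union drm_amdgpu_cs cs;

	memset(&cs, 0, sizeof(cs));
	cs.in.ctx_id     = ctx_id;		/* signal with this context's last fence */
	cs.in.num_chunks = 1;			/* no IB chunk at all */
	cs.in.chunks     = (uintptr_t)&chunk_ptr;

	return drmCommandWriteRead(fd, DRM_AMDGPU_CS, &cs, sizeof(cs));
}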

Thanks. 
Best Regards,
David
-Original Message-
From: Dave Airlie [mailto:airl...@gmail.com] 
Sent: Wednesday, April 12, 2017 11:58 AM
To: Mao, David <david@amd.com>
Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
Subject: Re: [PATCH 8/8] amdgpu: use sync file for shared semaphores (v2.1)

On 12 April 2017 at 13:34, Mao, David <david@amd.com> wrote:
> My point is that it is reasonable to split the semaphore signal/wait from the
> command submission.
> For the signal ioctl, we could just pick the last fence in the same schedule
> context, and we wouldn't need an explicit flush or a dummy-submission trick.
> The spec guarantees that the signal always comes before the wait, which means
> we could always get a valid fence for the kernel semaphore object.

I'm a bit vague on the schedule contexts stuff, but does anything guarantee that
the X server present operation will be in the same schedule context?

This might be something for Christian to chime in on, we could I suppose add 
ioctls to avoid the dummy CS submission, we could also make dummy CS submission 
simpler, if we submit no IBs then we could just have it deal with the 
semaphores for those cases and avoid any explicit flushes, which saves 
reproducing the logic to wait and sync.

But at least for the wait case, we need to send something to the scheduler to 
wait on, and that looks like the CS ioctl we have now pretty much, For the 
signal case there might be a better argument that an explicit signal with last 
fence on this ctx could be used, however at least with the way radv works now, 
we definitely know the X server is finished with the present buffer as it tells 
us via its own sync logic, at that point radv submits an empty CS with the 
signal semaphores, we'd really want to pass a semaphore between the X server 
and client to do this perfectly.

Dave.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 8/8] amdgpu: use sync file for shared semaphores (v2.1)

2017-04-11 Thread Mao, David
My point is that it is reasonable to split the semaphore signal/wait from the
command submission.
For the signal ioctl, we could just pick the last fence in the same schedule
context, and we wouldn't need an explicit flush or a dummy-submission trick.
The spec guarantees that the signal always comes before the wait, which means
we could always get a valid fence for the kernel semaphore object.

Thanks. 
Best Regards,
David

-Original Message-
From: Dave Airlie [mailto:airl...@gmail.com] 
Sent: Wednesday, April 12, 2017 11:18 AM
To: Mao, David <david@amd.com>
Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
Subject: Re: [PATCH 8/8] amdgpu: use sync file for shared semaphores (v2.1)

On 12 April 2017 at 12:49, Mao, David <david@amd.com> wrote:
> But how do we handle the semaphore wait in vkQueuePresentKHR?

The problem here is that really we'd want the presenting process to do the 
signal once it submits the work for actual presentations (be that the X server 
DDX or whatever).

However that is going to be a bit tricky, for radv I've just been submitting an 
empty command stream submit, once the X server lets us know we've presented.

I looked how the codebase before I started working on it worked, and I can't 
see if it dealt with this properly either, the impression I get is that it 
might submit the wait sems via the sem ioctl onto a ctx, but the X server might 
be using a different ctx, so would never execute the wait, and we'd execute the 
wait the next time we did a command submission.

I suppose we could just queue up the vkQueuePresentKHR wait sems in userspace 
instead of a NULL cs if this solution was acceptable.

Dave.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 8/8] amdgpu: use sync file for shared semaphores (v2.1)

2017-04-11 Thread Mao, David
But how do we handle the semaphore wait in vkQueuePresentKHR?

Thanks. 
Best Regards,
David

-Original Message-
From: Dave Airlie [mailto:airl...@gmail.com] 
Sent: Wednesday, April 12, 2017 10:44 AM
To: Mao, David <david@amd.com>
Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
Subject: Re: [PATCH 8/8] amdgpu: use sync file for shared semaphores (v2.1)

On 12 April 2017 at 12:36, Mao, David <david@amd.com> wrote:
> Does it mean we have to submit a command to trigger the semaphore wait/signal?

Yes, but I think that should be fine, we need to submit a job to the scheduler 
to get the waits to happen or to have a fence to fill into the signals.

Dave.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v3 1/1] drm/amdgpu: export gfx config double offchip LDS buffers (v3)

2017-02-20 Thread Mao, David
Thanks for the info, but it seems PACKAGE_VERSION is not the version I am
looking for.
Is there any way that libdrm.so/libdrm_amdgpu.so can indicate the ABI state at
runtime, besides the SO version?
I may want to compile the driver against the latest libdrm but still hope it can
work with a previous version of libdrm/libdrm_amdgpu.
Is that not supported?

Best Regards,
David
> On 20 Feb 2017, at 8:45 PM, Emil Velikov <emil.l.veli...@gmail.com> wrote:
> 
> On 20 February 2017 at 09:20, Mao, David <david@amd.com> wrote:
>> Hi Jerry & Christian,
>> Regarding the version control, what is the rule for bumping the
>> PACKAGE_VERSION of libdrm?
>> We may need something, like the size of the gpu_info structure or something
>> like the PACKAGE_VERSION of the running libdrm_amdgpu.so, to ensure that we
>> did not break the ABI.
>> 
> Please see the RELEASING (and include/drm/README) documents in libdrm.
> 
> Thanks
> Emil

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH v3 1/1] drm/amdgpu: export gfx config double offchip LDS buffers (v3)

2017-02-20 Thread Mao, David
Hi Jerry & Christian,
Regarding version control, what is the rule for bumping the
PACKAGE_VERSION of libdrm?
We may need something, like the size of the gpu_info structure or something
like the PACKAGE_VERSION of the running libdrm_amdgpu.so, to ensure that we do
not break the ABI.

Thanks. 
Best Regards,
David

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
Zhang, Jerry
Sent: Monday, February 20, 2017 5:04 PM
To: Zhang, Jerry ; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH v3 1/1] drm/amdgpu: export gfx config double offchip LDS 
buffers (v3)

Hi Christian,

Reading the mail again, maybe you'd like to add amdgpu_gfx_config to
amdgpu_gca_config, right?
{{{
+struct amdgpu_gfx_config {
+   uint32_t double_offchip_lds_buf;
+};
+
  @@ -856,6 +856,9 @@ struct amdgpu_gca_config {
uint32_t macrotile_mode_array[16];

struct amdgpu_rb_config
  rb_config[AMDGPU_GFX_MAX_SE][AMDGPU_GFX_MAX_SH_PER_SE];
+
+   /* gfx config */
+   struct amdgpu_gfx_config   gc;
  };
}}}

> In this case you might want to bump the driver minor number as well to 
> indicate validity of that field.
> 
> See we can extend the structures by adding new fields to the end, but 
> old kernels will not set those fields (so they stay zero cleared).
> 
> Not sure what the UMD is expecting here.

About the version bump, could we add a version field at the end of
amdgpu_gfx_config?
Vulkan could then learn the enabled features/configs from that version for a
specific release.
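
For what it's worth, here is a minimal sketch of the UMD-side pattern this
implies. It assumes the patch in this thread is applied (so dev_info has the
gc_double_offchip_lds_buf field), and the minor number it checks against is a
placeholder, not the real bump.

/*
 * Sketch: the kernel appends the new field at the end of
 * drm_amdgpu_info_device and bumps the driver minor; old kernels simply
 * leave the tail of the zero-initialized struct untouched.
 */
#include <string.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

static int query_double_offchip_lds(amdgpu_device_handle dev,
                                    int kmd_minor_version)
{
    struct drm_amdgpu_info_device dev_info;

    memset(&dev_info, 0, sizeof(dev_info));
    if (amdgpu_query_info(dev, AMDGPU_INFO_DEV_INFO,
                          sizeof(dev_info), &dev_info))
        return 0; /* be conservative on failure */

    /* Placeholder: gate on the minor that would accompany this patch. */
    if (kmd_minor_version < 99 /* hypothetical */)
        return 0; /* old kernel: field not filled, treat as disabled */

    return dev_info.gc_double_offchip_lds_buf != 0;
}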


Regards,
Jerry (Junwei Zhang)

Linux Base Graphics
SRDC Software Development
_


> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf 
> Of Junwei Zhang
> Sent: Monday, February 20, 2017 10:51
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Jerry
> Subject: [PATCH v3 1/1] drm/amdgpu: export gfx config double offchip 
> LDS buffers (v3)
> 
> v2: move the config struct to drm_amdgpu_info_device
> v3: move the config feature to amdgpu_gca_config
> 
> Signed-off-by: Junwei Zhang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h |  3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  2 ++
>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c   |  6 ++++++
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   |  6 ++++++
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 16 +++++++++++++++-
>  include/uapi/drm/amdgpu_drm.h   |  2 ++
>  6 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 7f1421f..9c552a9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -856,6 +856,9 @@ struct amdgpu_gca_config {
>   uint32_t macrotile_mode_array[16];
> 
>   struct amdgpu_rb_config
> rb_config[AMDGPU_GFX_MAX_SE][AMDGPU_GFX_MAX_SH_PER_SE];
> +
> + /* gfx configure feature */
> + uint32_t double_offchip_lds_buf;
>  };
> 
>  struct amdgpu_cu_info {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 6b9bf0e..bcc13907d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -545,6 +545,8 @@ static int amdgpu_info_ioctl(struct drm_device 
> *dev, void *data, struct drm_file
>   dev_info.vram_type = adev->mc.vram_type;
>   dev_info.vram_bit_width = adev->mc.vram_width;
>   dev_info.vce_harvest_config = adev->vce.harvest_config;
> + dev_info.gc_double_offchip_lds_buf =
> + adev->gfx.config.double_offchip_lds_buf;
> 
>   return copy_to_user(out, &dev_info,
>   min((size_t)size, sizeof(dev_info))) ? -EFAULT : 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> index 782190d..138e15a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> @@ -1579,6 +1579,11 @@ static void gfx_v6_0_setup_spi(struct 
> amdgpu_device
> *adev)
>   mutex_unlock(>grbm_idx_mutex);
>  }
> 
> +static void gfx_v6_0_config_init(struct amdgpu_device *adev) {
> + adev->gfx.config.double_offchip_lds_buf = 1; }
> +
>  static void gfx_v6_0_gpu_init(struct amdgpu_device *adev)  {
>   u32 gb_addr_config = 0;
> @@ -1736,6 +1741,7 @@ static void gfx_v6_0_gpu_init(struct 
> amdgpu_device
> *adev)
>   gfx_v6_0_setup_spi(adev);
> 
>   gfx_v6_0_get_cu_info(adev);
> + gfx_v6_0_config_init(adev);
> 
>   WREG32(mmCP_QUEUE_THRESHOLDS, ((0x16 <<
> CP_QUEUE_THRESHOLDS__ROQ_IB1_START__SHIFT) |
>  (0x2b <<
> CP_QUEUE_THRESHOLDS__ROQ_IB2_START__SHIFT)));
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> index 8e07a50..6e7b273 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c

RE: Random short freezes due to TTM buffer migrations

2016-08-17 Thread Mao, David
It has become common for applications to request a big chunk of memory and do
the sub-allocation themselves.
I agree the kernel should do better and provide finer-grained paging
granularity.
I don't know whether 1/2 of VRAM is the biggest single allocation an
application would want to make, but it would be better if we could remove this
limitation in the future as well.
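
As a rough illustration of that userspace sub-allocation pattern, here is a
plain bump allocator over one large buffer; the types and names are
hypothetical, not the actual Vulkan driver code.

/*
 * Sketch of sub-allocation: allocate one large buffer up front and hand out
 * aligned slices from it.
 */
#include <stdint.h>

struct sub_allocator {
    uint64_t base_gpu_va;  /* GPU VA of the one large allocation */
    uint64_t size;         /* total size of that allocation */
    uint64_t offset;       /* current bump pointer */
};

static inline uint64_t align_up(uint64_t v, uint64_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Returns the GPU VA of the slice, or 0 if the big buffer is exhausted. */
static uint64_t sub_alloc(struct sub_allocator *sa, uint64_t size, uint64_t align)
{
    uint64_t start = align_up(sa->offset, align);

    if (start + size > sa->size)
        return 0; /* caller would fall back to a new large allocation */

    sa->offset = start + size;
    return sa->base_gpu_va + start;
}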

Thanks. 
Best Regards,
David

-Original Message-
From: Zhou, David(ChunMing) 
Sent: Wednesday, August 17, 2016 9:58 AM
To: Zhou, David(ChunMing) <david1.z...@amd.com>; Kuehling, Felix 
<felix.kuehl...@amd.com>; Christian König <deathsim...@vodafone.de>; Marek 
Olšák <mar...@gmail.com>; amd-gfx@lists.freedesktop.org; Mao, David 
<david@amd.com>
Subject: RE: Random short freezes due to TTM buffer migrations

Add his email.

> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf 
> Of Zhou, David(ChunMing)
> Sent: Wednesday, August 17, 2016 9:57 AM
> To: Kuehling, Felix <felix.kuehl...@amd.com>; Christian König 
> <deathsim...@vodafone.de>; Marek Olšák <mar...@gmail.com>; amd- 
> g...@lists.freedesktop.org
> Subject: RE: Random short freezes due to TTM buffer migrations
> 
> +David Mao,
> 
> Well, our Vulkan stack also encountered this problem before; the
> performance is very low when migration happens often. At that time we
> wanted to add some algorithm for the eviction LRU, but failed to find an
> appropriately generic way, so in the end the UMD decreased its VRAM usage.
> Hope we can get a solution for full VRAM usage this time.
> 
> Regards,
> David Zhou
> 
> > -Original Message-
> > From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On 
> > Behalf Of Felix Kuehling
> > Sent: Wednesday, August 17, 2016 2:34 AM
> > To: Christian König <deathsim...@vodafone.de>; Marek Olšák 
> > <mar...@gmail.com>; amd-gfx@lists.freedesktop.org
> > Subject: Re: Random short freezes due to TTM buffer migrations
> >
> > Very nice. I'm looking forward to this for KFD as well.
> >
> > One question: Will it be possible to share these split BOs as dmabufs?
> >
> > Regards,
> >   Felix
> >
> >
> > On 16-08-16 11:27 AM, Christian König wrote:
> > > Hi Marek,
> > >
> > > I'm already working on this.
> > >
> > > My current approach is to use a custom BO manager for VRAM with 
> > > TTM and so split allocations into chunks of 4MB.
> > >
> > > Large BOs are still swapped out as one, but it makes it much more
> > > likely that you can allocate 1/2 of VRAM as one buffer.
> > >
> > > Give me till the end of the week to finish this and then we can 
> > > test if that's sufficient or if we need to do more.
> > >
> > > Regards,
> > > Christian.
> > >
> > > Am 16.08.2016 um 16:33 schrieb Marek Olšák:
> > >> Hi,
> > >>
> > >> I'm seeing random temporary freezes (up to 2 seconds) under 
> > >> memory pressure. Before I describe the exact circumstances, I'd 
> > >> like to say that this is a serious issue affecting playability of 
> > >> certain AAA Linux games.
> > >>
> > >> In order to reproduce this, an application should:
> > >> - allocate a few very large buffers (256-512 MB per buffer)
> > >> - allocate more memory than there is available VRAM. The issue 
> > >> also occurs (but at a lower frequency) if the app needs only 80% of VRAM.
> > >>
> > >> Example: ttm_bo_validate needs to migrate a 512 MB buffer. The 
> > >> total size of moved memory for that call can be as high as 1.5 GB.
> > >> This is always followed by a big temporary drop in VRAM usage.
> > >>
> > >> The game I'm testing needs 3.4 GB of VRAM.
> > >>
> > >> Setups:
> > >> Tonga - 2 GB: It's nearly unplayable, because freezes occur too often.
> > >> Fiji - 4 GB: There is one freeze at the beginning (which is 
> > >> annoying too), after that it's smooth.
> > >>
> > >> So even 4 GB is not enough.
> > >>
> > >> Workarounds:
> > >> - Split buffers into smaller pieces in the kernel. It's not 
> > >> necessary to manage memory at page granularity (64KB). Splitting 
> > >> buffers into 16 MB pieces might not be optimal, but it would
> > >> be a significant improvement.
> > >> - Or do the same in Mesa. This would prevent inter-process and 
> > >> inter-API b
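
As a rough sketch of the chunking idea discussed above (carving a large
request into fixed-size pieces so the allocator never needs one huge
contiguous block): this is only an illustration, with a hypothetical
allocate_piece() helper, not the actual TTM VRAM manager code.

/*
 * Illustration of the 4MB-chunk idea: a large request is split into
 * fixed-size pieces so placement works on small blocks, while the BO itself
 * is still tracked (and swapped out) as a single object.
 */
#include <stdint.h>

#define VRAM_CHUNK_SIZE (4ULL << 20) /* 4 MB */

struct vram_chunk {
    uint64_t start;  /* offset of this piece in VRAM */
    uint64_t size;   /* <= VRAM_CHUNK_SIZE */
};

/* Split a request into chunk-sized pieces; placement of each piece is left
 * to the hypothetical allocate_piece() callback. Returns the number of
 * pieces, or -1 if max_pieces was too small. */
static int split_into_chunks(uint64_t request_size,
                             uint64_t (*allocate_piece)(uint64_t size),
                             struct vram_chunk *out, int max_pieces)
{
    int n = 0;

    while (request_size && n < max_pieces) {
        uint64_t piece = request_size < VRAM_CHUNK_SIZE ?
                         request_size : VRAM_CHUNK_SIZE;

        out[n].start = allocate_piece(piece);
        out[n].size = piece;
        request_size -= piece;
        n++;
    }
    return request_size ? -1 : n;
}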