RE: [PATCH] drm/amd/powerplay: added vega20 overdrive support

2018-08-29 Thread Quan, Evan
Hi Alex,

OK, I got your concern. Then how about the following?

If the user wants to change Fmin and Fmax, they just need to pass in the new
clock values.
E.g., "s 0 350 10" will update Fmin to 350 MHz and increase its voltage by
10 mV.
"s 7 1000 -10" will update Fmax to 1000 MHz and decrease its voltage by 10 mV.

Same for MCLK, except that the voltage offset passed in has no effect. A small
user-space sketch of this flow follows the listing below.

OD_SCLK:
0:         0Mhz        0mV
4:         0Mhz        0mV
7:         0Mhz        0mV

OD_MCLK:
3:         0Mhz        0mV

OD_RANGE:
SCLK[0]:      0Mhz      0Mhz
VDDGFX[0]:     0mV       0mV
SCLK[4]:      0Mhz      0Mhz
VDDGFX[4]:     0mV       0mV
SCLK[7]:      0Mhz      0Mhz
VDDGFX[7]:     0mV       0mV

MCLK[3]:      0Mhz      0Mhz
VDDMEM[3]:     0mV       0mV
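
For illustration, here is a minimal user-space sketch of that flow (the card0
paths and the exact command strings are assumptions based on the sysfs
documentation quoted below, not part of the patch):

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  static void od_write(const char *path, const char *cmd)
  {
          int fd = open(path, O_WRONLY);

          if (fd < 0)
                  return;
          (void)write(fd, cmd, strlen(cmd));
          close(fd);
  }

  int main(void)
  {
          const char *perf =
                  "/sys/class/drm/card0/device/power_dpm_force_performance_level";
          const char *od = "/sys/class/drm/card0/device/pp_od_clk_voltage";

          od_write(perf, "manual");     /* select manual first, per the docs */
          od_write(od, "s 0 350 10");   /* Fmin -> 350 MHz, voltage +10 mV */
          od_write(od, "s 7 1000 -10"); /* Fmax -> 1000 MHz, voltage -10 mV */
          od_write(od, "c");            /* commit the changes */
          return 0;
  }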


Regards,
Evan

> -----Original Message-----
> From: Alex Deucher 
> Sent: August 30, 2018 13:06
> To: Quan, Evan 
> Cc: amd-gfx list ; Deucher, Alexander
> ; Zhu, Rex 
> Subject: Re: [PATCH] drm/amd/powerplay: added vega20 overdrive support
> 
> On Wed, Aug 29, 2018 at 11:54 PM Quan, Evan 
> wrote:
> >
> > Hi Alex,
> >
> >
> > >What about the min/max sclk? Do the curve values take that into
> > >account?  What about max mclk?
> >
> > Voltage curve does not take these into consideration. And the max sclk and
> > mclk can be set through the existing sysfs interfaces pp_sclk_od and pp_mclk_od.
> > For min sclk, as far as I know there is no interface to set it.
> >
> 
> I'd prefer to allow adjusting min and max clocks via this interface for
> consistency with vega10 and smu7.  I'd like to deprecate the pp_sclk_od and
> pp_mclk_od interfaces because they are percentage based and don't support
> underclocking.
> 
> Alex
> 
> >
> > > Are negative offsets supported?
> > Sure.
> >
> >
> > Regards,
> >
> > Evan
> >
> >
> > 
> > From: Alex Deucher 
> > Sent: Thursday, August 30, 2018 12:25:35 AM
> > To: Quan, Evan
> > Cc: amd-gfx list; Deucher, Alexander; Zhu, Rex
> > Subject: Re: [PATCH] drm/amd/powerplay: added vega20 overdrive
> support
> >
> > On Wed, Aug 29, 2018 at 5:13 AM Evan Quan 
> wrote:
> > >
> > > Vega20 supports only sclk voltage overdrive. And user can only tell
> > > three groups of <frequency, voltage>. SMU firmware will
> > > recalculate the frequency/voltage curve. Other intermediate levles
> > > will be stretched/shrunk accordingly.
> > >
> >
> > What about the min/max sclk? Do the curve values take that into
> > account?  What about max mclk?
> >
> > > Change-Id: I403cb38a95863db664cf06d030ac42a19bff6b33
> > > Signed-off-by: Evan Quan 
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c|  26 +++
> > >  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 165
> +-
> > >  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.h|   2 +
> > >  3 files changed, 192 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > > index e2577518b9c6..6e0c8f583521 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > > @@ -474,6 +474,8 @@ static ssize_t amdgpu_set_pp_table(struct device
> *dev,
> > >   * in each power level within a power state.  The pp_od_clk_voltage is
> used for
> > >   * this.
> > >   *
> > > + * < For Vega10 and previous ASICs >
> > > + *
> > >   * Reading the file will display:
> > >   *
> > >   * - a list of engine clock levels and voltages labeled OD_SCLK @@
> > > -491,6 +493,30 @@ static ssize_t amdgpu_set_pp_table(struct device
> *dev,
> > >   * "c" (commit) to the file to commit your changes.  If you want to reset
> to the
> > >   * default power levels, write "r" (reset) to the file to reset them.
> > >   *
> > > + *
> > > + * < For Vega20 >
> > > + *
> > > + * Reading the file will display:
> > > + *
> > > + * - three groups of engine clock and voltage offset labeled
> > > + OD_SCLK
> > > + *
> > > + * - a list of valid ranges for above three groups of sclk and voltage 
> > > offset
> > > + *   labeled OD_RANGE
> > > + *
> > > + * To manually adjust these settings:
> > > + *
> > > + * - First select manual using power_dpm_force_performance_level
> > > + *
> > > + * - Enter a new value for each group by writing a string that contains
> > > + *   "s group_index clock voltage_offset" to the file.  E.g., "s 0 500 
> > > 20"
> > > + *   will update group1 sclk to be 500 MHz with voltage increased 20mV
> > > + *
> >
> > Are negative offsets supported?
> >
> > Alex
> >
> > > + * - When you have edited all of the states as needed, write "c" (commit)
> > > + *   to the file to commit your changes
> > > + *
> > > + * - If you want to reset to the default power levels, write "r" (reset)
> > > + *   to the file to reset them
> > > + *
> > >   */
> > >
> > >  static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
> > > diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > > b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> 

RE: [PATCH] drm/amd/powerplay: added vega20 overdrive support

2018-08-29 Thread Quan, Evan
Thanks Paul,

Comment inline

> -----Original Message-----
> From: Paul Menzel 
> Sent: August 29, 2018 20:23
> To: Quan, Evan 
> Cc: amd-gfx@lists.freedesktop.org; Alex Deucher
> ; Zhu, Rex 
> Subject: Re: [PATCH] drm/amd/powerplay: added vega20 overdrive support
> 
> Dear Evan,
> 
> 
> On 08/29/18 11:12, Evan Quan wrote:
> > Vega20 supports only sclk voltage overdrive. And user can
> > only tell three groups of <frequency, voltage>. SMU
> 
> Do you mean: The user can only configure three groups of …?
> 
[Quan, Evan] Yes.
> > firmware will recalculate the frequency/voltage curve. Other
> > intermediate levles will be stretched/shrunk accordingly.
> 
> levels
> 
> (A spell checker should detect simple typos.)
> 
[Quan, Evan] Thanks.
> > Change-Id: I403cb38a95863db664cf06d030ac42a19bff6b33
> > Signed-off-by: Evan Quan 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c|  26 +++
> >  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 165
> +-
> >  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.h|   2 +
> >  3 files changed, 192 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > index e2577518b9c6..6e0c8f583521 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > @@ -474,6 +474,8 @@ static ssize_t amdgpu_set_pp_table(struct device
> *dev,
> >   * in each power level within a power state.  The pp_od_clk_voltage is
> used for
> >   * this.
> >   *
> > + * < For Vega10 and previous ASICs >
> > + *
> >   * Reading the file will display:
> >   *
> >   * - a list of engine clock levels and voltages labeled OD_SCLK
> > @@ -491,6 +493,30 @@ static ssize_t amdgpu_set_pp_table(struct device
> *dev,
> >   * "c" (commit) to the file to commit your changes.  If you want to reset 
> > to
> the
> >   * default power levels, write "r" (reset) to the file to reset them.
> >   *
> > + *
> > + * < For Vega20 >
> > + *
> > + * Reading the file will display:
> > + *
> > + * - three groups of engine clock and voltage offset labeled OD_SCLK
> > + *
> > + * - a list of valid ranges for above three groups of sclk and voltage 
> > offset
> > + *   labeled OD_RANGE
> > + *
> > + * To manually adjust these settings:
> > + *
> > + * - First select manual using power_dpm_force_performance_level
> > + *
> > + * - Enter a new value for each group by writing a string that contains
> > + *   "s group_index clock voltage_offset" to the file.  E.g., "s 0 500 20"
> > + *   will update group1 sclk to be 500 MHz with voltage increased 20mV
> 
> increased *by* 20mV? by or to?
> 
[Quan, Evan] It should be 'by' since here is an offset.
> > + *
> > + * - When you have edited all of the states as needed, write "c" (commit)
> > + *   to the file to commit your changes
> > + *
> > + * - If you want to reset to the default power levels, write "r" (reset)
> > + *   to the file to reset them
> > + *
> >   */
> >
> >  static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
> > diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > index ececa2f7fe5f..9f6e070a76e0 100644
> > --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > @@ -1096,6 +1096,13 @@ static int
> vega20_od8_initialize_default_settings(
> > }
> > }
> >
> > +   ret = vega20_copy_table_from_smc(hwmgr,
> > > +   (uint8_t *)(&data->od_table),
> > +   TABLE_OVERDRIVE);
> > +   PP_ASSERT_WITH_CODE(!ret,
> > +   "Failed to export over drive table!",
> 
> *over drive* or *overdrive*
[Quan, Evan] Should be 'overdrive'. Will fix it.
> 
> > +   return ret);
> > +
> > return 0;
> >  }
> >
> > @@ -2506,11 +2513,112 @@ static int
> vega20_set_watermarks_for_clocks_ranges(struct pp_hwmgr *hwmgr,
> > return 0;
> >  }
> >
> > +static int vega20_odn_edit_dpm_table(struct pp_hwmgr *hwmgr,
> > +   enum
> PP_OD_DPM_TABLE_COMMAND type,
> > +   long *input, uint32_t size)
> 
> What is size? Could it be `size_t` or just int?
[Quan, Evan] Number of elements in the input array.
> 
> > +{
> > +   struct vega20_hwmgr *data =
> > +   (struct vega20_hwmgr *)(hwmgr->backend);
> > +   struct vega20_od8_single_setting *od8_settings =
> > +   data->od8_settings.od8_settings_array;
> > +   OverDriveTable_t od_table;
> 
> Is CamelCase wanted?
> 
> > +   int32_t input_index, input_clk, input_vol, i;
> > +   int ret = 0;
> 
> Is the initialization needed?
> 
[Quan, Evan] Fine to have it uninitialized here since it will be initialized 
before use.
> > +
> > +   PP_ASSERT_WITH_CODE(input, "NULL user input for clock and
> voltage",
> > +   return -EINVAL);
> > +
> > +   switch (type) {
> > +   case PP_OD_EDIT_SCLK_VDDC_TABLE:
> > +   if 

Re: [PATCH] drm/amd/powerplay: added vega20 overdrive support

2018-08-29 Thread Alex Deucher
On Wed, Aug 29, 2018 at 11:54 PM Quan, Evan  wrote:
>
> Hi Alex,
>
>
> >What about the min/max sclk? Do the curve values take that into
> >account?  What about max mclk?
>
> Voltage curve does not take these into consideration. And the max sclk and
> mclk can be set through the existing sysfs interfaces pp_sclk_od and pp_mclk_od.
> For min sclk, as far as I know there is no interface to set it.
>

I'd prefer to allow adjusting min and max clocks via this interface
for consistency with vega10 and smu7.  I'd like to deprecate the
pp_sclk_od and pp_mclk_od interfaces because they are percentage based
and don't support underclocking.

Alex
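
As a hedged illustration of that limitation (values hypothetical; the existing
knobs accept only a non-negative percentage):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint32_t default_sclk = 1600;  /* MHz, example value */
          uint32_t od_pct = 5;           /* a pp_sclk_od-style setting */
          uint32_t new_sclk = default_sclk * (100 + od_pct) / 100;

          /* An unsigned percentage can only scale the clock upward, so the
           * interface cannot express underclocking. */
          printf("%u MHz -> %u MHz\n", default_sclk, new_sclk);
          return 0;
  }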

>
> > Are negative offsets supported?
> Sure.
>
>
> Regards,
>
> Evan
>
>
> 
> From: Alex Deucher 
> Sent: Thursday, August 30, 2018 12:25:35 AM
> To: Quan, Evan
> Cc: amd-gfx list; Deucher, Alexander; Zhu, Rex
> Subject: Re: [PATCH] drm/amd/powerplay: added vega20 overdrive support
>
> On Wed, Aug 29, 2018 at 5:13 AM Evan Quan  wrote:
> >
> > Vega20 supports only sclk voltage overdrive. And user can
> > only tell three groups of <frequency, voltage>. SMU
> > firmware will recalculate the frequency/voltage curve. Other
> > intermediate levles will be stretched/shrunk accordingly.
> >
>
> What about the min/max sclk? Do the curve values take that into
> account?  What about max mclk?
>
> > Change-Id: I403cb38a95863db664cf06d030ac42a19bff6b33
> > Signed-off-by: Evan Quan 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c|  26 +++
> >  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 165 +-
> >  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.h|   2 +
> >  3 files changed, 192 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > index e2577518b9c6..6e0c8f583521 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > @@ -474,6 +474,8 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
> >   * in each power level within a power state.  The pp_od_clk_voltage is 
> > used for
> >   * this.
> >   *
> > + * < For Vega10 and previous ASICs >
> > + *
> >   * Reading the file will display:
> >   *
> >   * - a list of engine clock levels and voltages labeled OD_SCLK
> > @@ -491,6 +493,30 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
> >   * "c" (commit) to the file to commit your changes.  If you want to reset 
> > to the
> >   * default power levels, write "r" (reset) to the file to reset them.
> >   *
> > + *
> > + * < For Vega20 >
> > + *
> > + * Reading the file will display:
> > + *
> > + * - three groups of engine clock and voltage offset labeled OD_SCLK
> > + *
> > + * - a list of valid ranges for above three groups of sclk and voltage 
> > offset
> > + *   labeled OD_RANGE
> > + *
> > + * To manually adjust these settings:
> > + *
> > + * - First select manual using power_dpm_force_performance_level
> > + *
> > + * - Enter a new value for each group by writing a string that contains
> > + *   "s group_index clock voltage_offset" to the file.  E.g., "s 0 500 20"
> > + *   will update group1 sclk to be 500 MHz with voltage increased 20mV
> > + *
>
> Are negative offsets supported?
>
> Alex
>
> > + * - When you have edited all of the states as needed, write "c" (commit)
> > + *   to the file to commit your changes
> > + *
> > + * - If you want to reset to the default power levels, write "r" (reset)
> > + *   to the file to reset them
> > + *
> >   */
> >
> >  static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
> > diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
> > b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > index ececa2f7fe5f..9f6e070a76e0 100644
> > --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > @@ -1096,6 +1096,13 @@ static int vega20_od8_initialize_default_settings(
> > }
> > }
> >
> > +   ret = vega20_copy_table_from_smc(hwmgr,
> > +   (uint8_t *)(&data->od_table),
> > +   TABLE_OVERDRIVE);
> > +   PP_ASSERT_WITH_CODE(!ret,
> > +   "Failed to export over drive table!",
> > +   return ret);
> > +
> > return 0;
> >  }
> >
> > @@ -2506,11 +2513,112 @@ static int 
> > vega20_set_watermarks_for_clocks_ranges(struct pp_hwmgr *hwmgr,
> > return 0;
> >  }
> >
> > +static int vega20_odn_edit_dpm_table(struct pp_hwmgr *hwmgr,
> > +   enum PP_OD_DPM_TABLE_COMMAND type,
> > +   long *input, uint32_t size)
> > +{
> > +   struct vega20_hwmgr *data =
> > +   (struct vega20_hwmgr *)(hwmgr->backend);
> > +   struct vega20_od8_single_setting *od8_settings =
> > +   data->od8_settings.od8_settings_array;
> > +   OverDriveTable_t od_table;
> > +   int32_t 

Re: [PATCH] drm/amd/powerplay: correct data type to support under voltage

2018-08-29 Thread Alex Deucher
On Thu, Aug 30, 2018 at 12:49 AM Evan Quan  wrote:
>
> For under voltage, negative value will be applied to voltage
> offset. Update the data type to cover this case.
>
> Change-Id: I955da13fd9777320b0605b6b620133d596b573be
> Signed-off-by: Evan Quan 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if.h 
> b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if.h
> index 0a39a4c564d2..59e621ef33ac 100644
> --- a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if.h
> +++ b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if.h
> @@ -569,11 +569,11 @@ typedef struct {
>uint16_t  GfxclkFmin;
>uint16_t  GfxclkFmax;
>uint16_t  GfxclkFreq1;
> -  uint16_t  GfxclkOffsetVolt1;
> +  int16_t  GfxclkOffsetVolt1;
>uint16_t  GfxclkFreq2;
> -  uint16_t  GfxclkOffsetVolt2;
> +  int16_t  GfxclkOffsetVolt2;
>uint16_t  GfxclkFreq3;
> -  uint16_t  GfxclkOffsetVolt3;
> +  int16_t  GfxclkOffsetVolt3;
>uint16_t  UclkFmax;
>int16_t   OverDrivePct;
>uint16_t  FanMaximumRpm;
> --
> 2.18.0
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: When to kmap PT BOs?

2018-08-29 Thread Zhang, Jerry (Junwei)

On 08/30/2018 06:30 AM, Felix Kuehling wrote:

Hi,

Currently PT BOs are kmapped in amdgpu_vm_update_directories. That
means, to avoid kernel oopses after page table evictions, I need to call
amdgpu_vm_update_directories before calling amdgpu_vm_bo_update.

But amdgpu_vm_bo_update can also move PTs on the vm->relocated list
during huge page handling. That means I also need to call
amdgpu_vm_update_directories after amdgpu_vm_bo_update.


Not very familiar with huge page handling.

But from the code, maybe we can kmap the PTE entry right here.
Then it will update the current non-huge-page PTE later in
amdgpu_vm_update_ptes().

Regards,
Jerry



I think a better solution is to move kmapping out of
amdgpu_vm_update_directories. But I'm not sure what's the right place
for it. Any suggestions? For a quick fix for kernel oopses after page
table evictions in the ROCm 1.9 release I'll call
amdgpu_vm_update_directories twice. If there are no new entries on the
vm->relocated lists, the second call won't add much overhead anyway.

Thanks,
   Felix


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/powerplay: correct data type to support under voltage

2018-08-29 Thread Evan Quan
For under voltage, negative value will be applied to voltage
offset. Update the data type to cover this case.

Change-Id: I955da13fd9777320b0605b6b620133d596b573be
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if.h 
b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if.h
index 0a39a4c564d2..59e621ef33ac 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if.h
@@ -569,11 +569,11 @@ typedef struct {
   uint16_t  GfxclkFmin;
   uint16_t  GfxclkFmax;
   uint16_t  GfxclkFreq1;
-  uint16_t  GfxclkOffsetVolt1;
+  int16_t  GfxclkOffsetVolt1;
   uint16_t  GfxclkFreq2;
-  uint16_t  GfxclkOffsetVolt2;
+  int16_t  GfxclkOffsetVolt2;
   uint16_t  GfxclkFreq3;
-  uint16_t  GfxclkOffsetVolt3;
+  int16_t  GfxclkOffsetVolt3;
   uint16_t  UclkFmax;
   int16_t   OverDrivePct;
   uint16_t  FanMaximumRpm;
-- 
2.18.0
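
As a standalone two's-complement demo of why the type change above matters
(not driver code; a -10 mV offset stored in a uint16_t silently becomes a
large positive value):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint16_t u = (uint16_t)-10;  /* wraps to 65526 */
          int16_t  s = -10;            /* stays -10, as under-volting needs */

          printf("uint16_t: %u, int16_t: %d\n", (unsigned)u, (int)s);
          return 0;
  }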

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 6/7] drm/amdgpu: enable AGP aperture for GMC9

2018-08-29 Thread Zhang, Jerry (Junwei)

On 08/29/2018 10:08 PM, Christian König wrote:

Enable the old AGP aperture to avoid GART mappings.

Signed-off-by: Christian König 

Reviewed-by: Junwei Zhang 


---
  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 10 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  1 +
  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 10 +-
  3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 3403ded39d13..ffd0ec9586d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -65,16 +65,16 @@ static void gfxhub_v1_0_init_system_aperture_regs(struct 
amdgpu_device *adev)
  {
uint64_t value;

-   /* Disable AGP. */
+   /* Program the AGP BAR */
WREG32_SOC15(GC, 0, mmMC_VM_AGP_BASE, 0);
-   WREG32_SOC15(GC, 0, mmMC_VM_AGP_TOP, 0);
-   WREG32_SOC15(GC, 0, mmMC_VM_AGP_BOT, 0x);
+   WREG32_SOC15(GC, 0, mmMC_VM_AGP_BOT, adev->gmc.agp_start >> 24);
+   WREG32_SOC15(GC, 0, mmMC_VM_AGP_TOP, adev->gmc.agp_end >> 24);

/* Program the system aperture low logical page number. */
WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR,
-adev->gmc.vram_start >> 18);
+min(adev->gmc.vram_start, adev->gmc.agp_start) >> 18);
WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_HIGH_ADDR,
-adev->gmc.vram_end >> 18);
+max(adev->gmc.vram_end, adev->gmc.agp_end) >> 18);

/* Set default page address. */
value = adev->vram_scratch.gpu_addr - adev->gmc.vram_start
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 04d50893a6f2..719f45cdaf6a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -751,6 +751,7 @@ static void gmc_v9_0_vram_gtt_location(struct amdgpu_device 
*adev,
base = mmhub_v1_0_get_fb_location(adev);
amdgpu_gmc_vram_location(adev, >gmc, base);
amdgpu_gmc_gart_location(adev, mc);
+   amdgpu_gmc_agp_location(adev, mc);
/* base offset of vram pages */
adev->vm_manager.vram_base_offset = gfxhub_v1_0_get_mc_fb_offset(adev);
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index 5f6a9c85488f..73d7c075dd33 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -76,16 +76,16 @@ static void mmhub_v1_0_init_system_aperture_regs(struct 
amdgpu_device *adev)
uint64_t value;
uint32_t tmp;

-   /* Disable AGP. */
+   /* Program the AGP BAR */
WREG32_SOC15(MMHUB, 0, mmMC_VM_AGP_BASE, 0);
-   WREG32_SOC15(MMHUB, 0, mmMC_VM_AGP_TOP, 0);
-   WREG32_SOC15(MMHUB, 0, mmMC_VM_AGP_BOT, 0x00FF);
+   WREG32_SOC15(MMHUB, 0, mmMC_VM_AGP_BOT, adev->gmc.agp_start >> 24);
+   WREG32_SOC15(MMHUB, 0, mmMC_VM_AGP_TOP, adev->gmc.agp_end >> 24);

/* Program the system aperture low logical page number. */
WREG32_SOC15(MMHUB, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR,
-adev->gmc.vram_start >> 18);
+min(adev->gmc.vram_start, adev->gmc.agp_start) >> 18);
WREG32_SOC15(MMHUB, 0, mmMC_VM_SYSTEM_APERTURE_HIGH_ADDR,
-adev->gmc.vram_end >> 18);
+max(adev->gmc.vram_end, adev->gmc.agp_end) >> 18);

/* Set default page address. */
value = adev->vram_scratch.gpu_addr - adev->gmc.vram_start +


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/powerplay: added vega20 overdrive support

2018-08-29 Thread Quan, Evan
Hi Alex,


>What about the min/max sclk? Do the curve values take that into
>account?  What about max mclk?

Voltage curve does not take these into consideration. And the max sclk and mclk
can be set through the existing sysfs interfaces pp_sclk_od and pp_mclk_od. For
min sclk, as far as I know there is no interface to set it.


> Are negative offsets supported?
Sure.


Regards,

Evan



From: Alex Deucher 
Sent: Thursday, August 30, 2018 12:25:35 AM
To: Quan, Evan
Cc: amd-gfx list; Deucher, Alexander; Zhu, Rex
Subject: Re: [PATCH] drm/amd/powerplay: added vega20 overdrive support

On Wed, Aug 29, 2018 at 5:13 AM Evan Quan  wrote:
>
> Vega20 supports only sclk voltage overdrive. And user can
> only tell three groups of <frequency, voltage>. SMU
> firmware will recalculate the frequency/voltage curve. Other
> intermediate levles will be stretched/shrunk accordingly.
>

What about the min/max sclk? Do the curve values take that into
account?  What about max mclk?

> Change-Id: I403cb38a95863db664cf06d030ac42a19bff6b33
> Signed-off-by: Evan Quan 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c|  26 +++
>  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 165 +-
>  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.h|   2 +
>  3 files changed, 192 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index e2577518b9c6..6e0c8f583521 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -474,6 +474,8 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
>   * in each power level within a power state.  The pp_od_clk_voltage is used 
> for
>   * this.
>   *
> + * < For Vega10 and previous ASICs >
> + *
>   * Reading the file will display:
>   *
>   * - a list of engine clock levels and voltages labeled OD_SCLK
> @@ -491,6 +493,30 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
>   * "c" (commit) to the file to commit your changes.  If you want to reset to 
> the
>   * default power levels, write "r" (reset) to the file to reset them.
>   *
> + *
> + * < For Vega20 >
> + *
> + * Reading the file will display:
> + *
> + * - three groups of engine clock and voltage offset labeled OD_SCLK
> + *
> + * - a list of valid ranges for above three groups of sclk and voltage offset
> + *   labeled OD_RANGE
> + *
> + * To manually adjust these settings:
> + *
> + * - First select manual using power_dpm_force_performance_level
> + *
> + * - Enter a new value for each group by writing a string that contains
> + *   "s group_index clock voltage_offset" to the file.  E.g., "s 0 500 20"
> + *   will update group1 sclk to be 500 MHz with voltage increased 20mV
> + *

Are negative offsets supported?

Alex

> + * - When you have edited all of the states as needed, write "c" (commit)
> + *   to the file to commit your changes
> + *
> + * - If you want to reset to the default power levels, write "r" (reset)
> + *   to the file to reset them
> + *
>   */
>
>  static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> index ececa2f7fe5f..9f6e070a76e0 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> @@ -1096,6 +1096,13 @@ static int vega20_od8_initialize_default_settings(
> }
> }
>
> +   ret = vega20_copy_table_from_smc(hwmgr,
> +   (uint8_t *)(&data->od_table),
> +   TABLE_OVERDRIVE);
> +   PP_ASSERT_WITH_CODE(!ret,
> +   "Failed to export over drive table!",
> +   return ret);
> +
> return 0;
>  }
>
> @@ -2506,11 +2513,112 @@ static int 
> vega20_set_watermarks_for_clocks_ranges(struct pp_hwmgr *hwmgr,
> return 0;
>  }
>
> +static int vega20_odn_edit_dpm_table(struct pp_hwmgr *hwmgr,
> +   enum PP_OD_DPM_TABLE_COMMAND type,
> +   long *input, uint32_t size)
> +{
> +   struct vega20_hwmgr *data =
> +   (struct vega20_hwmgr *)(hwmgr->backend);
> +   struct vega20_od8_single_setting *od8_settings =
> +   data->od8_settings.od8_settings_array;
> +   OverDriveTable_t od_table;
> +   int32_t input_index, input_clk, input_vol, i;
> +   int ret = 0;
> +
> +   PP_ASSERT_WITH_CODE(input, "NULL user input for clock and voltage",
> +   return -EINVAL);
> +
> +   switch (type) {
> +   case PP_OD_EDIT_SCLK_VDDC_TABLE:
> +   if (!(od8_settings[OD8_SETTING_GFXCLK_FREQ1].feature_id &&
> +   od8_settings[OD8_SETTING_GFXCLK_FREQ2].feature_id &&
> +   od8_settings[OD8_SETTING_GFXCLK_FREQ3].feature_id &&
> +   

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-29 Thread zhoucm1



On August 29, 2018 19:42, Christian König wrote:

Am 29.08.2018 um 12:44 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer 
payload

identifying a point in a timeline. Such timeline semaphores support the
following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, the earlier time point (PT) is always signaled
before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled, the timeline will increase to the
value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value. When
the timeline increases, each wait PT's value is compared with the new
timeline value; if the PT value is lower than the timeline value, the wait
PT is signaled, otherwise it is kept in the list. A semaphore wait operation
can wait on any point of the timeline, so an RB tree is needed to order
them. And a wait PT can be ahead of its signal PT, so we need a submission
fence to handle that.
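
As a hedged, standalone illustration of the wait-PT rule described above (a
plain array instead of the RB tree; all names are hypothetical, not the
patch's):

  #include <stdint.h>
  #include <stdio.h>

  struct wait_pt {
          uint64_t value;   /* timeline point this waiter needs */
          int signaled;
  };

  /* Advance the timeline and signal every wait PT whose value is reached. */
  static void timeline_advance(struct wait_pt *pts, int n,
                               uint64_t *timeline, uint64_t new_value)
  {
          int i;

          if (new_value > *timeline)
                  *timeline = new_value;
          for (i = 0; i < n; i++)
                  if (!pts[i].signaled && pts[i].value <= *timeline)
                          pts[i].signaled = 1;
  }

  int main(void)
  {
          struct wait_pt pts[] = { { 2, 0 }, { 5, 0 } };
          uint64_t timeline = 0;

          timeline_advance(pts, 2, &timeline, 3); /* signals 2, keeps 5 waiting */
          printf("pt2=%d pt5=%d\n", pts[0].signaled, pts[1].signaled);
          return 0;
  }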

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and
Christian)
 a. normal syncobj signal op will append a signal PT to the tail of the
signal PT list.
 b. normal syncobj wait op will create a wait PT with the last signal
point, and this wait PT is only signaled by the related signal point PT.

2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 593 -
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 +--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
  #include "drm_internal.h"
  #include 
  +/* merge normal syncobj to timeline syncobj, the point interval is 
1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence 
*fence)

+{
+    return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence 
*fence)

+{
+    return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+    kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .release = drm_syncobj_stub_fence_release,
+};


Do we really need to move that around? Could reduce the size of the 
patch quite a bit if we don't.


The stub fence is used widely in both normal and timeline syncobj. If you
think it increases the patch size, I can do a separate patch for that.





+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for?

They are used to record wait-op points and signal-op points.
For the timeline case, they are connected by the timeline value and work
like this:
    a. a signal PT increases the timeline value to the signal_pt value when
the signal_pt is signaled. A signal_pt depends on the previous PT fence and
its own signal fence from the signal op.

    b. the wait PT tree checks whether the timeline value has passed its
own value.

For the normal case, it works like this:
    a. signal pt increase 1 for 

RE: [PATCH 01/10] drm/amdgpu: use only the lower address space on GMC9

2018-08-29 Thread Xiao, Jack
Can you explain what benefits can be gained from enabling the AGP aperture?
Otherwise, it would increase our maintenance workload.

Regards,
Jack

-----Original Message-----
From: Christian König  
Sent: Wednesday, August 29, 2018 4:47 PM
To: Kuehling, Felix ; Koenig, Christian 
; Xiao, Jack ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 01/10] drm/amdgpu: use only the lower address space on GMC9

Completely agree with Felix here.

It makes system memory access slightly simpler, but I would say that
accidentally corrupting the GART table and accidentally corrupting the
AGP aperture are equally likely.

Regards,
Christian.

Am 28.08.2018 um 20:12 schrieb Felix Kuehling:
> The GPU always has access to the entire guest physical address space.
> You can just program arbitrary addresses into page table entries and 
> set the system bit. That's how GART and GPUVM mappings work. They will 
> not go through the AGP aperture. And there is no mechanism (other than 
> an
> IOMMU) to protect system memory from GPU access.
>
> Regards,
>    Felix
>
>
> On 2018-08-28 07:41 AM, Christian König wrote:
>> Well that is indeed true, but we still have IOMMU between the GPU and 
>> the system memory.
>>
>> So we should still catch issues when something goes terrible wrong.
>>
>> Additional to that only the system domain, e.g. kernel copies, page 
>> table updates etc are allowed to use it.
>>
>> Regards,
>> Christian.
>>
>> Am 28.08.2018 um 09:06 schrieb Xiao, Jack:
>>> I mean there is a risk in allowing the GPU to access most system
>>> memory without explicitly claiming it; that makes problems easier to mask.
>>>
>>> Regards,
>>> Jack
>>>
>>> -----Original Message-----
>>> From: Koenig, Christian
>>> Sent: Tuesday, August 28, 2018 2:46 PM
>>> To: Xiao, Jack ; Kuehling, Felix 
>>> ; Christian König 
>>> ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 01/10] drm/amdgpu: use only the lower address 
>>> space on GMC9
>>>
This patch series seems to allow the AGP aperture to access
any system memory (16GB), bypassing GPU VM protection.
>>> The system aperture should only be active in the system domain, or 
>>> otherwise applications would have access to local memory as well.
>>>
>>> There are some bits in the VM registers to enable/disable that, but 
>>> when we would have that setting incorrect we would see quite a bunch 
>>> of other problems.
>>>
>>> Might still be a good idea to double check if all the bits are setup 
>>> correctly.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 28.08.2018 um 07:31 schrieb Xiao, Jack:
This patch series seems to allow the AGP aperture to access
any system memory (16GB), bypassing GPU VM protection.
If some wrong logic requested an illegal address which
happened to be located inside the AGP aperture, then without any VM
protection the illegal address would finally be translated into a
system memory address; if the GPU read or wrote that system
memory address, which might belong to the kernel or to any
user application, the access would result in unpredictable
issues.
Most importantly, that kind of issue is very hard to
track down.
Is it worth doing this and exposing that risk?

 Regards,
 Jack

-----Original Message-----
 From: amd-gfx  On Behalf Of 
 Felix Kuehling
 Sent: Tuesday, August 28, 2018 3:03 AM
 To: Christian König ;
 amd-gfx@lists.freedesktop.org; Koenig, Christian 
 
 Subject: Re: [PATCH 01/10] drm/amdgpu: use only the lower address 
 space on GMC9

 The point of this series seems to be to allow access to small 
 system memory BOs (one page) without a GART mapping. I'm guessing 
 that reduces pressure on the GART and removes the need for HDP and 
 TLB flushes. Why does Patch 10 only enable that on GFXv9? Is there 
 less benefit on older chips?

 Is this related to your recent changes to allow page tables in 
 system memory?

 See my replies to patch 6 and 8. Other than that, the series is
 Acked-by: Felix Kuehling 

 Regards,
      Felix


 On 2018-08-27 12:53 PM, Christian König wrote:
> Only use the lower address space on GMC9 for the system domain.
> Otherwise we would need to sign extend GMC addresses.
>
> Signed-off-by: Christian König 
> ---
>     drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++
>     1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index e44b5191735d..d982956c8329 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -938,11 +938,10 @@ static int gmc_v9_0_sw_init(void *handle)
>     if (r)
>     return r;
>     -    /* Set the internal MC address mask
> - * This is the max address 

Re: [PATCH 5/7] drm/amdgpu: manually map the shadow BOs again

2018-08-29 Thread Zhang, Jerry (Junwei)

On 08/29/2018 10:08 PM, Christian König wrote:

Otherwise we won't be able to use the AGP aperture.


Do you mean we now use the AGP aperture only for the GTT shadow?

Jerry


Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 5 +
  2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 0cbf651a88a6..de990bdcdd6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -163,10 +163,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)

if (domain & AMDGPU_GEM_DOMAIN_GTT) {
places[c].fpfn = 0;
-   if (flags & AMDGPU_GEM_CREATE_SHADOW)
-   places[c].lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
-   else
-   places[c].lpfn = 0;
+   places[c].lpfn = 0;
places[c].flags = TTM_PL_FLAG_TT;
if (flags & AMDGPU_GEM_CREATE_CPU_GTT_USWC)
places[c].flags |= TTM_PL_FLAG_WC |
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index a3675c7b6190..abe1db4c63f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -346,6 +346,11 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
r = amdgpu_ttm_alloc_gart(&bo->tbo);
if (r)
break;
+   if (bo->shadow) {
+   r = amdgpu_ttm_alloc_gart(&bo->shadow->tbo);
+   if (r)
+   break;
+   }
list_move(&bo_base->vm_status, &vm->relocated);
}
}


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 4/7] drm/amdgpu: use the AGP aperture for system memory access v2

2018-08-29 Thread Zhang, Jerry (Junwei)

On 08/29/2018 10:08 PM, Christian König wrote:

Start to use the old AGP aperture for system memory access.

v2: Move that to amdgpu_ttm_alloc_gart

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 23 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 58 ++---
  3 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 1d201fd3f4af..65aee57b35fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -79,6 +79,29 @@ uint64_t amdgpu_gmc_pd_addr(struct amdgpu_bo *bo)
return pd_addr;
  }

+/**
+ * amdgpu_gmc_agp_addr - return the address in the AGP address space
+ *
+ * @tbo: TTM BO which needs the address, must be in GTT domain
+ *
+ * Tries to figure out how to access the BO through the AGP aperture. Returns
+ * AMDGPU_BO_INVALID_OFFSET if that is not possible.
+ */
+uint64_t amdgpu_gmc_agp_addr(struct ttm_buffer_object *bo)
+{
+   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
+   struct ttm_dma_tt *ttm;
+
+   if (bo->num_pages != 1 || bo->ttm->caching_state == tt_cached)
+   return AMDGPU_BO_INVALID_OFFSET;


If the GTT BO size is 1 page, will it also be accessed through the AGP
address space?

Jerry

+
+   ttm = container_of(bo->ttm, struct ttm_dma_tt, ttm);
+   if (ttm->dma_address[0] + PAGE_SIZE >= adev->gmc.agp_size)
+   return AMDGPU_BO_INVALID_OFFSET;
+
+   return adev->gmc.agp_start + ttm->dma_address[0];
+}
+
  /**
   * amdgpu_gmc_vram_location - try to find VRAM location
   *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index c9985e7dc9e5..265ca415c64c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -163,6 +163,7 @@ static inline uint64_t amdgpu_gmc_sign_extend(uint64_t addr)
  void amdgpu_gmc_get_pde_for_bo(struct amdgpu_bo *bo, int level,
   uint64_t *addr, uint64_t *flags);
  uint64_t amdgpu_gmc_pd_addr(struct amdgpu_bo *bo);
+uint64_t amdgpu_gmc_agp_addr(struct ttm_buffer_object *bo);
  void amdgpu_gmc_vram_location(struct amdgpu_device *adev, struct amdgpu_gmc 
*mc,
  u64 base);
  void amdgpu_gmc_gart_location(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d9f3201c9e5c..8a158ee922f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1081,41 +1081,49 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
struct ttm_mem_reg tmp;
struct ttm_placement placement;
struct ttm_place placements;
-   uint64_t flags;
+   uint64_t addr, flags;
int r;

if (bo->mem.start != AMDGPU_BO_INVALID_OFFSET)
return 0;

-   /* allocate GART space */
-   tmp = bo->mem;
-   tmp.mm_node = NULL;
-   placement.num_placement = 1;
-   placement.placement = &placements;
-   placement.num_busy_placement = 1;
-   placement.busy_placement = &placements;
-   placements.fpfn = 0;
-   placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
-   placements.flags = (bo->mem.placement & ~TTM_PL_MASK_MEM) |
-   TTM_PL_FLAG_TT;
+   addr = amdgpu_gmc_agp_addr(bo);
+   if (addr != AMDGPU_BO_INVALID_OFFSET) {
+   bo->mem.start = addr >> PAGE_SHIFT;
+   } else {

-   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
-   if (unlikely(r))
-   return r;
+   /* allocate GART space */
+   tmp = bo->mem;
+   tmp.mm_node = NULL;
+   placement.num_placement = 1;
+   placement.placement = &placements;
+   placement.num_busy_placement = 1;
+   placement.busy_placement = &placements;
+   placements.fpfn = 0;
+   placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
+   placements.flags = (bo->mem.placement & ~TTM_PL_MASK_MEM) |
+   TTM_PL_FLAG_TT;
+
+   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
+   if (unlikely(r))
+   return r;

-   /* compute PTE flags for this buffer object */
-   flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, );
+   /* compute PTE flags for this buffer object */
+   flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, );

-   /* Bind pages */
-   gtt->offset = ((u64)tmp.start << PAGE_SHIFT) - adev->gmc.gart_start;
-   r = amdgpu_ttm_gart_bind(adev, bo, flags);
-   if (unlikely(r)) {
-   ttm_bo_mem_put(bo, &tmp);
-   return r;
+   /* Bind pages */
+   gtt->offset = ((u64)tmp.start << PAGE_SHIFT) -
+   adev->gmc.gart_start;
+   r = amdgpu_ttm_gart_bind(adev, bo, flags);
+  
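
Coming back to Jerry's question above: read as a standalone predicate, the
quoted check admits only single-page, non-CPU-cached BOs whose DMA address
fits inside the AGP window. A hedged restatement (names hypothetical):

  #include <stdbool.h>
  #include <stdint.h>

  #define PAGE_SIZE 4096ULL

  /* Restates the test in amdgpu_gmc_agp_addr() quoted above. */
  static bool would_use_agp(unsigned long num_pages, bool cpu_cached,
                            uint64_t dma_addr0, uint64_t agp_size)
  {
          if (num_pages != 1 || cpu_cached)
                  return false;                    /* falls back to GART */
          return dma_addr0 + PAGE_SIZE < agp_size; /* must fit the window */
  }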

Re: [PATCH 3/7] drm/amdgpu: add amdgpu_gmc_agp_location v2

2018-08-29 Thread Zhang, Jerry (Junwei)

On 08/29/2018 10:08 PM, Christian König wrote:

Helper to figure out the location of the AGP BAR.

v2: fix a couple of bugs

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 43 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  5 +++
  2 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index c6bcc4715373..1d201fd3f4af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -143,3 +143,46 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
mc->gart_size >> 20, mc->gart_start, mc->gart_end);
  }
+
+/**
+ * amdgpu_gmc_agp_location - try to find AGP location
+ * @adev: amdgpu device structure holding all necessary informations
+ * @mc: memory controller structure holding memory informations
+ *
+ * Function will place try to find a place for the AGP BAR in the MC address
+ * space.
+ *
+ * AGP BAR will be assigned the largest available hole in the address space.
+ * Should be called after VRAM and GART locations are setup.
+ */
+void amdgpu_gmc_agp_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc)
+{
+   const uint64_t sixteen_gb = 1ULL << 34;
+   const uint64_t sixteen_gb_mask = ~(sixteen_gb - 1);
+   u64 size_af, size_bf;
+
+   if (mc->vram_start > mc->gart_start) {
+   size_bf = (mc->vram_start & sixteen_gb_mask) -
+   ALIGN(mc->gart_end + 1, sixteen_gb);
+   size_af = mc->mc_mask + 1 - ALIGN(mc->vram_end, sixteen_gb);
+   } else {
+   size_bf = mc->vram_start & sixteen_gb_mask;
+   size_af = (mc->gart_start & sixteen_gb_mask) -
+   ALIGN(mc->vram_end, sixteen_gb);


we may need ALIGN(mc->vram_end + 1, sixteen_gb) for size_af.


+   }
+
+   if (size_bf > size_af) {
+   mc->agp_start = mc->vram_start > mc->gart_start ?
+   mc->gart_end + 1 : 0;
+   mc->agp_size = size_bf;
+   } else {
+   mc->agp_start = (mc->vram_start > mc->gart_start ?
+   mc->vram_end : mc->gart_end) + 1,
+   mc->agp_size = size_af;
+   }
+
+   mc->agp_start = ALIGN(mc->agp_start, sixteen_gb);
+   mc->agp_end = mc->agp_start + mc->agp_size - 1;
+   dev_info(adev->dev, "AGP: %lluM 0x%016llX - 0x%016llX\n",
+   mc->agp_size >> 20, mc->agp_start, mc->agp_end);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 48715dd5808a..c9985e7dc9e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -94,6 +94,9 @@ struct amdgpu_gmc {
 * about vram size near mc fb location */
u64 mc_vram_size;
u64 visible_vram_size;
+   u64 agp_size;
+   u64 agp_start;
+   u64 agp_end;
u64 gart_size;
u64 gart_start;
u64 gart_end;
@@ -164,5 +167,7 @@ void amdgpu_gmc_vram_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc,
  u64 base);
  void amdgpu_gmc_gart_location(struct amdgpu_device *adev,
  struct amdgpu_gmc *mc);
+void amdgpu_gmc_agp_location(struct amdgpu_device *adev,
+struct amdgpu_gmc *mc);

  #endif
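
On the ALIGN comment above, here is a standalone demo of the off-by-one when
the inclusive vram_end lands exactly on a 16GB boundary (value hypothetical):

  #include <stdint.h>
  #include <stdio.h>

  #define ALIGN(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

  int main(void)
  {
          const uint64_t sixteen_gb = 1ULL << 34;
          uint64_t vram_end = sixteen_gb; /* inclusive last byte on a boundary */

          /* Without the +1 the hole is taken to start at vram_end itself,
           * overlapping the last VRAM byte; with +1 it starts one 16GB
           * step later, as Jerry suggests. */
          printf("ALIGN(vram_end)   = 0x%llx\n",
                 (unsigned long long)ALIGN(vram_end, sixteen_gb));
          printf("ALIGN(vram_end+1) = 0x%llx\n",
                 (unsigned long long)ALIGN(vram_end + 1, sixteen_gb));
          return 0;
  }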


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH xf86-video-amdgpu 2/2] glamor: Handle ihandle == -1 in amdgpu_glamor_set_shared_pixmap_backing

2018-08-29 Thread Alex Deucher
On Wed, Aug 29, 2018 at 11:39 AM Michel Dänzer  wrote:
>
> From: Michel Dänzer 
>
> (Ported from radeon commit de88ea2755611bdcb18d91d8234d2ab5be8ff2e9)
>
> Signed-off-by: Michel Dänzer 

Series is:
Acked-by: Alex Deucher 

> ---
>  src/amdgpu_glamor.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/amdgpu_glamor.c b/src/amdgpu_glamor.c
> index 13d68fe36..699861f73 100644
> --- a/src/amdgpu_glamor.c
> +++ b/src/amdgpu_glamor.c
> @@ -384,6 +384,7 @@ amdgpu_glamor_set_shared_pixmap_backing(PixmapPtr pixmap, 
> void *handle)
>  {
> ScreenPtr screen = pixmap->drawable.pScreen;
> ScrnInfoPtr scrn = xf86ScreenToScrn(screen);
> +   int ihandle = (int)(long)handle;
> struct amdgpu_pixmap *priv;
>
> if (!amdgpu_set_shared_pixmap_backing(pixmap, handle))
> @@ -391,7 +392,8 @@ amdgpu_glamor_set_shared_pixmap_backing(PixmapPtr pixmap, 
> void *handle)
>
> priv = amdgpu_get_pixmap_private(pixmap);
>
> -   if (!amdgpu_glamor_create_textured_pixmap(pixmap, priv->bo)) {
> +   if (ihandle != -1 &&
> +   !amdgpu_glamor_create_textured_pixmap(pixmap, priv->bo)) {
> xf86DrvMsg(scrn->scrnIndex, X_ERROR,
>"Failed to get PRIME drawable for glamor 
> pixmap.\n");
> return FALSE;
> --
> 2.18.0
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/7] drm/amdgpu: put GART away from VRAM v2

2018-08-29 Thread Zhang, Jerry (Junwei)

On 08/29/2018 10:08 PM, Christian König wrote:

Always try to put the GART away from where VRAM is.

v2: correctly handle the 4GB limitation

Signed-off-by: Christian König 


Fix my concern :)

Reviewed-by: Junwei Zhang 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 265ec6807130..c6bcc4715373 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -116,6 +116,7 @@ void amdgpu_gmc_vram_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc,
   */
  void amdgpu_gmc_gart_location(struct amdgpu_device *adev, struct amdgpu_gmc 
*mc)
  {
+   const uint64_t four_gb = 0x1ULL;
u64 size_af, size_bf;

mc->gart_size += adev->pm.smu_prv_buffer_size;
@@ -124,8 +125,7 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
 * the GART base on a 4GB boundary as well.
 */
size_bf = mc->vram_start;
-   size_af = adev->gmc.mc_mask + 1 -
-   ALIGN(mc->vram_end + 1, 0x1ULL);
+   size_af = adev->gmc.mc_mask + 1 - ALIGN(mc->vram_end + 1, four_gb);

if (mc->gart_size > max(size_bf, size_af)) {
dev_warn(adev->dev, "limiting GART\n");
@@ -136,7 +136,9 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
(size_af < mc->gart_size))
mc->gart_start = 0;
else
-   mc->gart_start = ALIGN(mc->vram_end + 1, 0x1ULL);
+   mc->gart_start = mc->mc_mask - mc->gart_size + 1;
+
+   mc->gart_start &= four_gb - 1;
mc->gart_end = mc->gart_start + mc->gart_size - 1;
dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
mc->gart_size >> 20, mc->gart_start, mc->gart_end);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: remove redundant memset

2018-08-29 Thread Zhang, Jerry (Junwei)

On 08/29/2018 11:17 PM, Philip Yang wrote:

kvmalloc_array with the __GFP_ZERO flag ensures that the returned memory
is already zeroed; memsetting it to zero again afterwards is unnecessary,
and in this case buggy, because we only clear the first entry.

Change-Id: If94a59d3cbf2690dd2a1e2add71bc393df6a9686
Signed-off-by: Philip Yang 


Good catch.

Reviewed-by: Junwei Zhang 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 153c9be..33d9ce2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -540,7 +540,6 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
   GFP_KERNEL | __GFP_ZERO);
if (!parent->entries)
return -ENOMEM;
-   memset(parent->entries, 0 , sizeof(struct amdgpu_vm_pt));
}

from = saddr >> shift;
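
As a standalone demo of the bug described above (stand-in struct; not driver
code): memset with sizeof(struct ...) clears exactly one array element.

  #include <stdio.h>
  #include <string.h>

  struct pt_demo {
          unsigned filled;
  };

  int main(void)
  {
          struct pt_demo entries[4] = { {1}, {2}, {3}, {4} };

          /* The removed call zeroed sizeof(struct), i.e. the FIRST element
           * only, leaving entries[1..3] untouched. */
          memset(entries, 0, sizeof(struct pt_demo));

          printf("entries[0]=%u entries[1]=%u\n",
                 entries[0].filled, entries[1].filled); /* prints 0 and 2 */
          return 0;
  }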


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


When to kmap PT BOs?

2018-08-29 Thread Felix Kuehling
Hi,

Currently PT BOs are kmapped in amdgpu_vm_update_directories. That
means, to avoid kernel oopses after page table evictions, I need to call
amdgpu_vm_update_directories before calling amdgpu_vm_bo_update.

But amdgpu_vm_bo_update can also move PTs on the vm->relocated list
during huge page handling. That means I also need to call
amdgpu_vm_update_directories after amdgpu_vm_bo_update.

I think a better solution is to move kmapping out of
amdgpu_vm_update_directories. But I'm not sure what's the right place
for it. Any suggestions? For a quick fix for kernel oopses after page
table evictions in the ROCm 1.9 release I'll call
amdgpu_vm_update_directories twice. If there are no new entries on the
vm->relocated lists, the second call won't add much overhead anyway.

Thanks,
  Felix

-- 
F e l i x   K u e h l i n g
PMTS Software Development Engineer | Linux Compute Kernel
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
   _ _   _   _   _
  / \   | \ / | |  _  \  \ _  |
 / A \  | \M/ | | |D) )  /|_| |
/_/ \_\ |_| |_| |_/ |__/ \|   facebook.com/AMD | amd.com
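
For reference, here is a hedged sketch of the double-update quick fix
described in the message above, using the entry points it names (call order
only; assumes the 4.18-era signatures, not actual driver code):

  /* Sketch: update PDs, update the BO, then update PDs again to pick up
   * PTs that huge-page handling moved onto vm->relocated. */
  static int update_vm_with_workaround(struct amdgpu_device *adev,
                                       struct amdgpu_vm *vm,
                                       struct amdgpu_bo_va *bo_va)
  {
          int r;

          r = amdgpu_vm_update_directories(adev, vm); /* kmaps + updates PDs */
          if (r)
                  return r;

          r = amdgpu_vm_bo_update(adev, bo_va, false); /* may relocate PTs */
          if (r)
                  return r;

          /* Second pass is cheap when vm->relocated is empty. */
          return amdgpu_vm_update_directories(adev, vm);
  }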

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


PS4 Linux AMD Drivers Request.

2018-08-29 Thread Ugo Borde
Hello all Team,

Our group (PS4Linux) would like to request AMD drivers for the PS4.
The FailOverFlow team wrote some drivers, and one of them, Marcan, can run Vulkan!
Is it possible for you to help us?
Here is the Telegram link of our group:

@ps4linux4homebrews
https://t.me/ps4linux4homebrews

Or you can send me a mail.

astroma...@gmail.com

Thank you,

Astromatik.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[pull] amdgpu drm-fixes-4.19

2018-08-29 Thread Alex Deucher
Hi Dave,

Fixes for 4.19:
- SR-IOV fixes
- Kasan and page fault fix on device removal
- S3 stability fix for CZ/ST
- VCE regression fixes for CIK parts
- Avoid holding the mn_lock when allocating memory
- DC memory leak fix
- BO eviction fix

The following changes since commit 9d1d02ff36783f954a206dfbf7943b7f2057f58b:

  drm/amd/display: Don't build DCN1 when kcov is enabled (2018-08-21 14:33:59 
-0500)

are available in the git repository at:

  git://people.freedesktop.org/~agd5f/linux drm-fixes-4.19

for you to fetch changes up to 6ddd9769db4fc11a98bd7e58be1764e47fdb8384:

  drm/amdgpu: Need to set moved to true when evict bo (2018-08-28 12:42:48 
-0500)


Andrey Grodzovsky (1):
  drm/amdgpu: Fix page fault and kasan warning on pci device remove.

Christian König (3):
  drm/amdgpu: fix VM clearing for the root PD
  drm/amdgpu: fix preamble handling
  drm/amdgpu: fix holding mn_lock while allocating memory

Emily Deng (2):
  amdgpu: fix multi-process hang issue
  drm/amdgpu: Need to set moved to true when evict bo

Felix Kuehling (1):
  drm/amdgpu: Adjust the VM size based on system memory size v2

Rex Zhu (6):
  drm/amd/display: Fix bug use wrong pp interface
  drm/amdgpu: Enable/disable gfx PG feature in rlc safe mode
  drm/amdgpu: Fix vce initialize failed on Kaveri/Mullins
  drm/amdgpu: Update power state at the end of smu hw_init.
  drm/amdgpu: Power on uvd block when hw_fini
  drm/amdgpu: Remove duplicated power source update

SivapiriyanKumarasamy (1):
  drm/amd/display: Fix memory leak caused by missed dc_sink_release

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 47 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c |  8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 35 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 11 -
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c  |  9 +---
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  | 16 +--
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  | 16 +--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 16 +--
 drivers/gpu/drm/amd/amdgpu/kv_dpm.c| 49 +-
 drivers/gpu/drm/amd/amdgpu/si_dpm.c|  3 +-
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c   | 12 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c  |  6 ++-
 14 files changed, 124 insertions(+), 109 deletions(-)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: How to gracefully handle pci remove

2018-08-29 Thread Deucher, Alexander
Take a look at what the udl drm driver does.  It's a usb display chip.


Alex


From: amd-gfx  on behalf of Andrey 
Grodzovsky 
Sent: Wednesday, August 29, 2018 2:28:42 PM
To: Daniel Vetter; Noralf Trønnes
Cc: Dave Airlie; amd-gfx@lists.freedesktop.org; ML dri-devel; Koenig, Christian
Subject: Re: How to gracefully handle pci remove

Actually, I've just spotted drm_dev_unplug. Does it make sense to
use it in our pci_driver.remove hook

instead of explicitly doing drm_dev_unregister and drm_dev_put(dev)?

This way at least any following IOCTL will fail with ENODEV.

Andrey


On 08/29/2018 11:07 AM, Daniel Vetter wrote:
> On Wed, Aug 29, 2018 at 4:43 PM, Andrey Grodzovsky
>  wrote:
>> Just another ping...
>>
>> Daniel, Dave - maybe you could give some advise on that ?
>>
>> P.S I tried with Intel card (i915) driver on 4.18.1 kernel to do the same to
>> get some reference point, but it just hanged.
> drm_device hot-unplug is de facto unsolved. We've only just started to
> fix the most obvious races around the refcounting of drm_device
> itself, see the work from Noralf Tronnes around drm_dev_get/put.
>
> No one has even started to think about what it would take to correctly
> refcount a full-blown memory manager to handle hotunplug. I'd expect
> lots of nightmares. The real horror is that it's not just the
> drm_device, but also lots of things we're exporting: dma_buf,
> dma_fence, ... All of that must be handled one way or the other.
>
> So expect your kernel to Oops when you unplug a device.
>
> Wrt userspace handling this: Probably an even bigger question. No
> idea, and will depend upon what userspace you're running.
> -Daniel
>
>> Andrey
>>
>>
>>
>>
>> On 08/27/2018 12:04 PM, Andrey Grodzovsky wrote:
>>> Hi everybody , I am trying to resolve various problems I observe when
>>> logically removing AMDGPU device from pci - echo 1 >
>>> /sys/class/drm/card0/device/remove
>>>
>>> One of the problems I encountered was hitting WARNs  in
>>> amdgpu_gem_force_release. It complains about still-open client FDs and BO
>>> allocations which is obvious since
>>>
>>> we didn't let user space clients know about the device removal and hence
>>> they won't release allocations and won't close their FDs.
>>>
>>> Question - how do other drivers handle this use case, especially eGPUs,
>>> since they may indeed be removed at any moment? Is there any way to notify
>>> Xorg and other clients about this so they may
>>>
>>> have a chance to release all their allocations and probably terminate?
>>> Maybe some kind of uevent?
>>>
>>> Andrey
>>>
>
>

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: How to gracefully handle pci remove

2018-08-29 Thread Andrey Grodzovsky
Actually, I've just spotted drm_dev_unplug. Does it make sense to
use it in our pci_driver.remove hook


instead of explicitly doing drm_dev_unregister and drm_dev_put(dev)?

This way at least any following IOCTL will fail with ENODEV.

Andrey
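
For reference, a minimal sketch of that approach (hypothetical skeleton,
not the actual amdgpu teardown, which needs more ordering care):

#include <drm/drm_drv.h>
#include <linux/pci.h>

/* Hypothetical PCI .remove hook built around drm_dev_unplug();
 * example_hw_fini() is an assumed driver-specific teardown helper,
 * not a real amdgpu function. */
static void example_pci_remove(struct pci_dev *pdev)
{
	struct drm_device *dev = pci_get_drvdata(pdev);

	/* Unregisters the device and marks it unplugged: already open FDs
	 * stay around, but any following IOCTL fails with -ENODEV.
	 * Reference handling around drm_dev_unplug() has changed between
	 * kernel versions, so check drm_drv.c for the one you target. */
	drm_dev_unplug(dev);

	example_hw_fini(dev);	/* assumed hardware/software state teardown */
}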


On 08/29/2018 11:07 AM, Daniel Vetter wrote:

On Wed, Aug 29, 2018 at 4:43 PM, Andrey Grodzovsky
 wrote:

Just another ping...

Daniel, Dave - maybe you could give some advice on that?

P.S. I tried with an Intel card (i915 driver) on a 4.18.1 kernel to do the same to
get some reference point, but it just hung.

drm_device hot-unplug is de facto unsolved. We've only just started to
fix the most obvious races around the refcounting of drm_device
itself; see the work from Noralf Tronnes around drm_dev_get/put.

No one has even started to think about what it would take to correctly
refcount a full-blown memory manager to handle hotunplug. I'd expect
lots of nightmares. The real horror is that it's not just the
drm_device, but also lots of things we're exporting: dma_buf,
dma_fence, ... All of that must be handled one way or the other.

So expect your kernel to Oops when you unplug a device.

Wrt userspace handling this: Probably an even bigger question. No
idea, and will depend upon what userspace you're running.
-Daniel


Andrey




On 08/27/2018 12:04 PM, Andrey Grodzovsky wrote:

Hi everybody, I am trying to resolve various problems I observe when
logically removing the AMDGPU device from PCI - echo 1 >
/sys/class/drm/card0/device/remove

One of the problems I encountered was hitting WARNs in
amdgpu_gem_force_release. It complains about still-open client FDs and BO
allocations, which is obvious since

we didn't let user space clients know about the device removal and hence
they won't release allocations and won't close their FDs.

Question - how do other drivers handle this use case, especially eGPUs,
since they may indeed be removed at any moment? Is there any way to notify
Xorg and other clients about this so they may

have a chance to release all their allocations and probably terminate?
Maybe some kind of uevent?

Andrey






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH xf86-video-amdgpu 2/3] Don't use xorg_list_for_each_entry_safe for signalled flips

2018-08-29 Thread Michel Dänzer
From: Michel Dänzer 

drm_wait_pending_flip can get called from drm_handle_event, in which
case xorg_list_for_each_entry_safe can end up processing the same entry
in both. To avoid this, just process the first list entry until the list
is empty.

Signed-off-by: Michel Dänzer 
---
 src/amdgpu_drm_queue.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/amdgpu_drm_queue.c b/src/amdgpu_drm_queue.c
index f9db81c08..ba841d1ef 100644
--- a/src/amdgpu_drm_queue.c
+++ b/src/amdgpu_drm_queue.c
@@ -257,8 +257,11 @@ amdgpu_drm_handle_event(int fd, drmEventContext 
*event_context)
 
r = drmHandleEvent(fd, event_context);
 
-   xorg_list_for_each_entry_safe(e, tmp, &amdgpu_drm_flip_signalled, list)
+   while (!xorg_list_is_empty(&amdgpu_drm_flip_signalled)) {
+   e = xorg_list_first_entry(&amdgpu_drm_flip_signalled,
+ struct amdgpu_drm_queue_entry, list);
amdgpu_drm_queue_handle_one(e);
+   }
 
xorg_list_for_each_entry_safe(e, tmp, &amdgpu_drm_vblank_signalled, 
list) {
drmmode_crtc_private_ptr drmmode_crtc = e->crtc->driver_private;
@@ -277,12 +280,15 @@ void amdgpu_drm_wait_pending_flip(xf86CrtcPtr crtc)
 {
drmmode_crtc_private_ptr drmmode_crtc = crtc->driver_private;
AMDGPUEntPtr pAMDGPUEnt = AMDGPUEntPriv(crtc->scrn);
-   struct amdgpu_drm_queue_entry *e, *tmp;
+   struct amdgpu_drm_queue_entry *e;
 
drmmode_crtc->wait_flip_nesting_level++;
 
-   xorg_list_for_each_entry_safe(e, tmp, &amdgpu_drm_flip_signalled, list)
+   while (!xorg_list_is_empty(&amdgpu_drm_flip_signalled)) {
+   e = xorg_list_first_entry(&amdgpu_drm_flip_signalled,
+ struct amdgpu_drm_queue_entry, list);
amdgpu_drm_queue_handle_one(e);
+   }
 
while (drmmode_crtc->flip_pending
   && amdgpu_drm_handle_event(pAMDGPUEnt->fd,
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH xf86-video-amdgpu 1/3] Always delete entry from list in drm_queue_handler

2018-08-29 Thread Michel Dänzer
From: Michel Dänzer 

We left entries without a handler hook in the list, so the list could
keep growing, taking ever longer to process and using up more memory.

Signed-off-by: Michel Dänzer 
---
 src/amdgpu_drm_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amdgpu_drm_queue.c b/src/amdgpu_drm_queue.c
index b13d28014..f9db81c08 100644
--- a/src/amdgpu_drm_queue.c
+++ b/src/amdgpu_drm_queue.c
@@ -82,7 +82,7 @@ amdgpu_drm_queue_handler(struct xorg_list *signalled, 
unsigned int frame,
xorg_list_for_each_entry_safe(e, tmp, &amdgpu_drm_queue, list) {
if (e->seq == seq) {
if (!e->handler) {
-   e->abort(e->crtc, e->data);
+   amdgpu_drm_queue_handle_one(e);
break;
}
 
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH xf86-video-amdgpu 3/3] Bail early from drm_wait_pending_flip if there's no pending flip

2018-08-29 Thread Michel Dänzer
From: Michel Dänzer 

No need to process any events in that case.

Signed-off-by: Michel Dänzer 
---
 src/amdgpu_drm_queue.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/amdgpu_drm_queue.c b/src/amdgpu_drm_queue.c
index ba841d1ef..61732b11c 100644
--- a/src/amdgpu_drm_queue.c
+++ b/src/amdgpu_drm_queue.c
@@ -284,6 +284,9 @@ void amdgpu_drm_wait_pending_flip(xf86CrtcPtr crtc)
 
drmmode_crtc->wait_flip_nesting_level++;
 
+   if (!drmmode_crtc->flip_pending)
+   return;
+
while (!xorg_list_is_empty(&amdgpu_drm_flip_signalled)) {
e = xorg_list_first_entry(&amdgpu_drm_flip_signalled,
  struct amdgpu_drm_queue_entry, list);
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/powerplay: added vega20 overdrive support

2018-08-29 Thread Alex Deucher
On Wed, Aug 29, 2018 at 5:13 AM Evan Quan  wrote:
>
> Vega20 supports only sclk voltage overdrive. And user can
> only tell three groups of . SMU
> firmware will recalculate the frequency/voltage curve. Other
> intermediate levles will be stretched/shrunk accordingly.
>

What about the min/max sclk? Do the curve values take that into
account?  What about max mclk?

> Change-Id: I403cb38a95863db664cf06d030ac42a19bff6b33
> Signed-off-by: Evan Quan 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c|  26 +++
>  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 165 +-
>  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.h|   2 +
>  3 files changed, 192 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index e2577518b9c6..6e0c8f583521 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -474,6 +474,8 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
>   * in each power level within a power state.  The pp_od_clk_voltage is used 
> for
>   * this.
>   *
> + * < For Vega10 and previous ASICs >
> + *
>   * Reading the file will display:
>   *
>   * - a list of engine clock levels and voltages labeled OD_SCLK
> @@ -491,6 +493,30 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
>   * "c" (commit) to the file to commit your changes.  If you want to reset to 
> the
>   * default power levels, write "r" (reset) to the file to reset them.
>   *
> + *
> + * < For Vega20 >
> + *
> + * Reading the file will display:
> + *
> + * - three groups of engine clock and voltage offset labeled OD_SCLK
> + *
> + * - a list of valid ranges for above three groups of sclk and voltage offset
> + *   labeled OD_RANGE
> + *
> + * To manually adjust these settings:
> + *
> + * - First select manual using power_dpm_force_performance_level
> + *
> + * - Enter a new value for each group by writing a string that contains
> + *   "s group_index clock voltage_offset" to the file.  E.g., "s 0 500 20"
> + *   will update group1 sclk to be 500 MHz with voltage increased 20mV
> + *

Are negative offsets supported?

Alex

> + * - When you have edited all of the states as needed, write "c" (commit)
> + *   to the file to commit your changes
> + *
> + * - If you want to reset to the default power levels, write "r" (reset)
> + *   to the file to reset them
> + *
>   */
>
>  static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> index ececa2f7fe5f..9f6e070a76e0 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> @@ -1096,6 +1096,13 @@ static int vega20_od8_initialize_default_settings(
> }
> }
>
> +   ret = vega20_copy_table_from_smc(hwmgr,
> +   (uint8_t *)(&data->od_table),
> +   TABLE_OVERDRIVE);
> +   PP_ASSERT_WITH_CODE(!ret,
> +   "Failed to export over drive table!",
> +   return ret);
> +
> return 0;
>  }
>
> @@ -2506,11 +2513,112 @@ static int 
> vega20_set_watermarks_for_clocks_ranges(struct pp_hwmgr *hwmgr,
> return 0;
>  }
>
> +static int vega20_odn_edit_dpm_table(struct pp_hwmgr *hwmgr,
> +   enum PP_OD_DPM_TABLE_COMMAND type,
> +   long *input, uint32_t size)
> +{
> +   struct vega20_hwmgr *data =
> +   (struct vega20_hwmgr *)(hwmgr->backend);
> +   struct vega20_od8_single_setting *od8_settings =
> +   data->od8_settings.od8_settings_array;
> +   OverDriveTable_t od_table;
> +   int32_t input_index, input_clk, input_vol, i;
> +   int ret = 0;
> +
> +   PP_ASSERT_WITH_CODE(input, "NULL user input for clock and voltage",
> +   return -EINVAL);
> +
> +   switch (type) {
> +   case PP_OD_EDIT_SCLK_VDDC_TABLE:
> +   if (!(od8_settings[OD8_SETTING_GFXCLK_FREQ1].feature_id &&
> +   od8_settings[OD8_SETTING_GFXCLK_FREQ2].feature_id &&
> +   od8_settings[OD8_SETTING_GFXCLK_FREQ3].feature_id &&
> +   od8_settings[OD8_SETTING_GFXCLK_VOLTAGE1].feature_id &&
> +   od8_settings[OD8_SETTING_GFXCLK_VOLTAGE2].feature_id &&
> +   od8_settings[OD8_SETTING_GFXCLK_VOLTAGE3].feature_id)) {
> +   pr_info("Sclk voltage overdrive not supported\n");
> +   return 0;
> +   }
> +
> +   for (i = 0; i < size; i += 3) {
> +   if (i + 3 > size) {
> +   pr_info("invalid clock voltage input\n");
> +   return 0;
> +   }
> +
> +   input_index = input[i];
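
As a usage illustration of the interface documented in the hunk above, a
hedged userspace sketch (the sysfs path and values are assumptions for
illustration, not part of the patch):

#include <stdio.h>

/* Hypothetical example of driving the Vega20 overdrive interface from
 * userspace. Assumes "manual" was already selected via
 * power_dpm_force_performance_level. */
int main(void)
{
	const char *path = "/sys/class/drm/card0/device/pp_od_clk_voltage";
	FILE *f = fopen(path, "w");

	if (!f)
		return 1;
	fprintf(f, "s 0 500 20\n");	/* group 0: 500 MHz, +20 mV */
	fflush(f);			/* issue each command as its own write */
	fprintf(f, "c\n");		/* commit: SMU recalculates the curve */
	fclose(f);
	return 0;
}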
> 

Re: [PATCH] drm/amdgpu: Relocate some definitions

2018-08-29 Thread Felix Kuehling

On 2018-08-29 10:42 AM, Amber Lin wrote:
> Move some KFD-related (but used in amdgpu_drv.c) definitions from
> kfd_priv.h to kgd_kfd_interface.h so we don't need to include kfd_priv.h
> in amdgpu_drv.c. This fixes a build failure when AMDGPU is enabled but
> MMU_NOTIFIER is not.
> This patch also disables KFD-related module options when HSA_AMD is not
> enabled.
>
> Signed-off-by: Amber Lin 

Reviewed-by: Felix Kuehling 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 20 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h   | 28 
> -
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 28 
> +
>  3 files changed, 38 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index dd6d8b1..e4d0d72 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -39,7 +39,6 @@
>  #include "amdgpu_gem.h"
>  
>  #include "amdgpu_amdkfd.h"
> -#include "kfd_priv.h"
>  
>  /*
>   * KMS wrapper.
> @@ -128,15 +127,6 @@ int amdgpu_compute_multipipe = -1;
>  int amdgpu_gpu_recovery = -1; /* auto */
>  int amdgpu_emu_mode = 0;
>  uint amdgpu_smu_memory_pool_size = 0;
> -/* KFD parameters */
> -int sched_policy = KFD_SCHED_POLICY_HWS;
> -int hws_max_conc_proc = 8;
> -int cwsr_enable = 1;
> -int max_num_of_queues_per_device = KFD_MAX_NUM_OF_QUEUES_PER_DEVICE_DEFAULT;
> -int send_sigterm;
> -int debug_largebar;
> -int ignore_crat;
> -int vega10_noretry;
>  
>  /**
>   * DOC: vramlimit (int)
> @@ -542,12 +532,14 @@ MODULE_PARM_DESC(smu_memory_pool_size,
>   "0x1 = 256Mbyte, 0x2 = 512Mbyte, 0x4 = 1 Gbyte, 0x8 = 2GByte");
>  module_param_named(smu_memory_pool_size, amdgpu_smu_memory_pool_size, uint, 
> 0444);
>  
> +#ifdef CONFIG_HSA_AMD
>  /**
>   * DOC: sched_policy (int)
>   * Set scheduling policy. Default is HWS(hardware scheduling) with 
> over-subscription.
>   * Setting 1 disables over-subscription. Setting 2 disables HWS and 
> statically
>   * assigns queues to HQDs.
>   */
> +int sched_policy = KFD_SCHED_POLICY_HWS;
>  module_param(sched_policy, int, 0444);
>  MODULE_PARM_DESC(sched_policy,
>   "Scheduling policy (0 = HWS (Default), 1 = HWS without 
> over-subscription, 2 = Non-HWS (Used for debugging only)");
> @@ -557,6 +549,7 @@ MODULE_PARM_DESC(sched_policy,
>   * Maximum number of processes that HWS can schedule concurrently. The 
> maximum is the
>   * number of VMIDs assigned to the HWS, which is also the default.
>   */
> +int hws_max_conc_proc = 8;
>  module_param(hws_max_conc_proc, int, 0444);
>  MODULE_PARM_DESC(hws_max_conc_proc,
>   "Max # processes HWS can execute concurrently when sched_policy=0 (0 = 
> no concurrency, #VMIDs for KFD = Maximum(default))");
> @@ -567,6 +560,7 @@ MODULE_PARM_DESC(hws_max_conc_proc,
>   * the middle of a compute wave. Default is 1 to enable this feature. 
> Setting 0
>   * disables it.
>   */
> +int cwsr_enable = 1;
>  module_param(cwsr_enable, int, 0444);
>  MODULE_PARM_DESC(cwsr_enable, "CWSR enable (0 = Off, 1 = On (Default))");
>  
> @@ -575,6 +569,7 @@ MODULE_PARM_DESC(cwsr_enable, "CWSR enable (0 = Off, 1 = 
> On (Default))");
>   * Maximum number of queues per device. Valid setting is between 1 and 4096. 
> Default
>   * is 4096.
>   */
> +int max_num_of_queues_per_device = KFD_MAX_NUM_OF_QUEUES_PER_DEVICE_DEFAULT;
>  module_param(max_num_of_queues_per_device, int, 0444);
>  MODULE_PARM_DESC(max_num_of_queues_per_device,
>   "Maximum number of supported queues per device (1 = Minimum, 4096 = 
> default)");
> @@ -584,6 +579,7 @@ MODULE_PARM_DESC(max_num_of_queues_per_device,
>   * Send sigterm to HSA process on unhandled exceptions. Default is not to 
> send sigterm
>   * but just print errors on dmesg. Setting 1 enables sending sigterm.
>   */
> +int send_sigterm;
>  module_param(send_sigterm, int, 0444);
>  MODULE_PARM_DESC(send_sigterm,
>   "Send sigterm to HSA process on unhandled exception (0 = disable, 1 = 
> enable)");
> @@ -595,6 +591,7 @@ MODULE_PARM_DESC(send_sigterm,
>   * size, usually 256MB.
>   * Default value is 0, disabled.
>   */
> +int debug_largebar;
>  module_param(debug_largebar, int, 0444);
>  MODULE_PARM_DESC(debug_largebar,
>   "Debug large-bar flag used to simulate large-bar capability on 
> non-large bar machine (0 = disable, 1 = enable)");
> @@ -605,6 +602,7 @@ MODULE_PARM_DESC(debug_largebar,
>   * table to get information about AMD APUs. This option can serve as a 
> workaround on
>   * systems with a broken CRAT table.
>   */
> +int ignore_crat;
>  module_param(ignore_crat, int, 0444);
>  MODULE_PARM_DESC(ignore_crat,
>   "Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 
> = ignore CRAT)");
> @@ -615,9 +613,11 @@ MODULE_PARM_DESC(ignore_crat,
>   * Setting 1 disables retry.
>   * Retry is needed for recoverable page faults.
>   */
> +int vega10_noretry;
>  

Re: [PATCH] drm/amdgpu: remove redundant memset

2018-08-29 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 


From: amd-gfx  on behalf of Philip Yang 

Sent: Wednesday, August 29, 2018 11:17:44 AM
To: amd-gfx@lists.freedesktop.org
Cc: Yang, Philip
Subject: [PATCH] drm/amdgpu: remove redundant memset

kvmalloc_array with the __GFP_ZERO flag already returns zeroed memory;
memsetting it to zero again afterwards is unnecessary, and in this case
buggy because we only cleared the first entry.

Change-Id: If94a59d3cbf2690dd2a1e2add71bc393df6a9686
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 153c9be..33d9ce2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -540,7 +540,6 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
GFP_KERNEL | __GFP_ZERO);
 if (!parent->entries)
 return -ENOMEM;
-   memset(parent->entries, 0 , sizeof(struct amdgpu_vm_pt));
 }

 from = saddr >> shift;
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH xf86-video-amdgpu 1/2] Handle ihandle == -1 in amdgpu_set_shared_pixmap_backing

2018-08-29 Thread Michel Dänzer
From: Michel Dänzer 

It means to stop using the shared pixmap backing.

(Ported from radeon commit 1799680f7bd84e0618f34f4c7486799521ddaf83)

Signed-off-by: Michel Dänzer 
---
 src/amdgpu_bo_helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/amdgpu_bo_helper.c b/src/amdgpu_bo_helper.c
index 98eb04a64..6fd68846f 100644
--- a/src/amdgpu_bo_helper.c
+++ b/src/amdgpu_bo_helper.c
@@ -400,6 +400,9 @@ Bool amdgpu_set_shared_pixmap_backing(PixmapPtr ppix, void 
*fd_handle)
uint32_t size = ppix->devKind * ppix->drawable.height;
Bool ret;
 
+   if (ihandle == -1)
+   return amdgpu_set_pixmap_bo(ppix, NULL);
+
if (info->gbm) {
struct amdgpu_buffer *bo;
struct gbm_import_fd_data data;
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH xf86-video-amdgpu 2/2] glamor: Handle ihandle == -1 in amdgpu_glamor_set_shared_pixmap_backing

2018-08-29 Thread Michel Dänzer
From: Michel Dänzer 

(Ported from radeon commit de88ea2755611bdcb18d91d8234d2ab5be8ff2e9)

Signed-off-by: Michel Dänzer 
---
 src/amdgpu_glamor.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/amdgpu_glamor.c b/src/amdgpu_glamor.c
index 13d68fe36..699861f73 100644
--- a/src/amdgpu_glamor.c
+++ b/src/amdgpu_glamor.c
@@ -384,6 +384,7 @@ amdgpu_glamor_set_shared_pixmap_backing(PixmapPtr pixmap, 
void *handle)
 {
ScreenPtr screen = pixmap->drawable.pScreen;
ScrnInfoPtr scrn = xf86ScreenToScrn(screen);
+   int ihandle = (int)(long)handle;
struct amdgpu_pixmap *priv;
 
if (!amdgpu_set_shared_pixmap_backing(pixmap, handle))
@@ -391,7 +392,8 @@ amdgpu_glamor_set_shared_pixmap_backing(PixmapPtr pixmap, 
void *handle)
 
priv = amdgpu_get_pixmap_private(pixmap);
 
-   if (!amdgpu_glamor_create_textured_pixmap(pixmap, priv->bo)) {
+   if (ihandle != -1 &&
+   !amdgpu_glamor_create_textured_pixmap(pixmap, priv->bo)) {
xf86DrvMsg(scrn->scrnIndex, X_ERROR,
   "Failed to get PRIME drawable for glamor pixmap.\n");
return FALSE;
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: How to gracefully handle pci remove

2018-08-29 Thread Andrey Grodzovsky

Thanks.

Andrey


On 08/29/2018 11:07 AM, Daniel Vetter wrote:

On Wed, Aug 29, 2018 at 4:43 PM, Andrey Grodzovsky
 wrote:

Just another ping...

Daniel, Dave - maybe you could give some advice on that?

P.S. I tried with an Intel card (i915 driver) on a 4.18.1 kernel to do the same to
get some reference point, but it just hung.

drm_device hot-unplug is de facto unsolved. We've only just started to
fix the most obvious races around the refcounting of drm_device
itself; see the work from Noralf Tronnes around drm_dev_get/put.

No one has even started to think about what it would take to correctly
refcount a full-blown memory manager to handle hotunplug. I'd expect
lots of nightmares. The real horror is that it's not just the
drm_device, but also lots of things we're exporting: dma_buf,
dma_fence, ... All of that must be handled one way or the other.

So expect your kernel to Oops when you unplug a device.

Wrt userspace handling this: Probably an even bigger question. No
idea, and will depend upon what userspace you're running.
-Daniel


Andrey




On 08/27/2018 12:04 PM, Andrey Grodzovsky wrote:

Hi everybody, I am trying to resolve various problems I observe when
logically removing the AMDGPU device from PCI - echo 1 >
/sys/class/drm/card0/device/remove

One of the problems I encountered was hitting WARNs in
amdgpu_gem_force_release. It complains about still-open client FDs and BO
allocations, which is obvious since

we didn't let user space clients know about the device removal and hence
they won't release allocations and won't close their FDs.

Question - how do other drivers handle this use case, especially eGPUs,
since they may indeed be removed at any moment? Is there any way to notify
Xorg and other clients about this so they may

have a chance to release all their allocations and probably terminate?
Maybe some kind of uevent?

Andrey






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: remove redundant memset

2018-08-29 Thread Philip Yang
kvmalloc_array with the __GFP_ZERO flag already returns zeroed memory;
memsetting it to zero again afterwards is unnecessary, and in this case
buggy because we only cleared the first entry.

Change-Id: If94a59d3cbf2690dd2a1e2add71bc393df6a9686
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 153c9be..33d9ce2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -540,7 +540,6 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
   GFP_KERNEL | __GFP_ZERO);
if (!parent->entries)
return -ENOMEM;
-   memset(parent->entries, 0 , sizeof(struct amdgpu_vm_pt));
}
 
from = saddr >> shift;
-- 
2.7.4
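
To spell out why the removed memset was both redundant and wrong, a reduced
sketch (names simplified from the amdgpu code, illustrative only):

#include <linux/mm.h>
#include <linux/slab.h>

struct example_pt {
	void *bo;
	u64 addr;
};

static struct example_pt *example_alloc_entries(unsigned int num)
{
	/* __GFP_ZERO already hands back a fully zeroed array ... */
	struct example_pt *entries =
		kvmalloc_array(num, sizeof(struct example_pt),
			       GFP_KERNEL | __GFP_ZERO);

	if (!entries)
		return NULL;

	/* ... so a memset here is redundant, and with
	 * sizeof(struct example_pt) as its length it would only have
	 * cleared the first of the num entries anyway:
	 *
	 * memset(entries, 0, sizeof(struct example_pt));
	 */
	return entries;
}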

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: remove redundant memset

2018-08-29 Thread Christian König

On 29.08.2018 at 17:13, Philip Yang wrote:

kvmalloc_array with the __GFP_ZERO flag already returns zeroed memory;
memsetting it to zero again afterwards is unnecessary


and in this case buggy because we only cleared the first entry.



Change-Id: If94a59d3cbf2690dd2a1e2add71bc393df6a9686
Signed-off-by: Philip Yang 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 153c9be..33d9ce2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -540,7 +540,6 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
   GFP_KERNEL | __GFP_ZERO);
if (!parent->entries)
return -ENOMEM;
-   memset(parent->entries, 0 , sizeof(struct amdgpu_vm_pt));
}
  
  	from = saddr >> shift;


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: remove redundant memset

2018-08-29 Thread Philip Yang
kvmalloc_array with the __GFP_ZERO flag already returns zeroed memory;
memsetting it to zero again afterwards is unnecessary

Change-Id: If94a59d3cbf2690dd2a1e2add71bc393df6a9686
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 153c9be..33d9ce2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -540,7 +540,6 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
   GFP_KERNEL | __GFP_ZERO);
if (!parent->entries)
return -ENOMEM;
-   memset(parent->entries, 0 , sizeof(struct amdgpu_vm_pt));
}
 
from = saddr >> shift;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: How to gracefully handle pci remove

2018-08-29 Thread Daniel Vetter
On Wed, Aug 29, 2018 at 4:43 PM, Andrey Grodzovsky
 wrote:
> Just another ping...
>
> Daniel, Dave - maybe you could give some advice on that?
>
> P.S. I tried with an Intel card (i915 driver) on a 4.18.1 kernel to do the same
> to get some reference point, but it just hung.

drm_device hot-unplug is de facto unsolved. We've only just started to
fix the most obvious races around the refcounting of drm_device
itself; see the work from Noralf Tronnes around drm_dev_get/put.

No one has even started to think about what it would take to correctly
refcount a full-blown memory manager to handle hotunplug. I'd expect
lots of nightmares. The real horror is that it's not just the
drm_device, but also lots of things we're exporting: dma_buf,
dma_fence, ... All of that must be handled one way or the other.

So expect your kernel to Oops when you unplug a device.

Wrt userspace handling this: Probably an even bigger question. No
idea, and will depend upon what userspace you're running.
-Daniel

>
> Andrey
>
>
>
>
> On 08/27/2018 12:04 PM, Andrey Grodzovsky wrote:
>>
>> Hi everybody, I am trying to resolve various problems I observe when
>> logically removing the AMDGPU device from PCI - echo 1 >
>> /sys/class/drm/card0/device/remove
>>
>> One of the problems I encountered was hitting WARNs in
>> amdgpu_gem_force_release. It complains about still-open client FDs and BO
>> allocations, which is obvious since
>>
>> we didn't let user space clients know about the device removal and hence
>> they won't release allocations and won't close their FDs.
>>
>> Question - how do other drivers handle this use case, especially eGPUs,
>> since they may indeed be removed at any moment? Is there any way to notify
>> Xorg and other clients about this so they may
>>
>> have a chance to release all their allocations and probably terminate?
>> Maybe some kind of uevent?
>>
>> Andrey
>>
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: remove redundant memset

2018-08-29 Thread Christian König

On 29.08.2018 at 17:01, Philip Yang wrote:

Change-Id: If94a59d3cbf2690dd2a1e2add71bc393df6a9686
Signed-off-by: Philip Yang 


You need to add a commit message, explaining why that can be removed.

With that done the patch is Reviewed-by: Christian König 
.


Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 153c9be..33d9ce2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -540,7 +540,6 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
   GFP_KERNEL | __GFP_ZERO);
if (!parent->entries)
return -ENOMEM;
-   memset(parent->entries, 0 , sizeof(struct amdgpu_vm_pt));
}
  
  	from = saddr >> shift;


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: remove redundant memset

2018-08-29 Thread Philip Yang
Change-Id: If94a59d3cbf2690dd2a1e2add71bc393df6a9686
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 153c9be..33d9ce2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -540,7 +540,6 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
   GFP_KERNEL | __GFP_ZERO);
if (!parent->entries)
return -ENOMEM;
-   memset(parent->entries, 0 , sizeof(struct amdgpu_vm_pt));
}
 
from = saddr >> shift;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v5 5/5] drm/amdgpu: move PD/PT bos on LRU again

2018-08-29 Thread Christian König

On 29.08.2018 at 16:51, Michel Dänzer wrote:

On 2018-08-29 10:57 a.m., Christian König wrote:

Am 29.08.2018 um 09:52 schrieb Michel Dänzer:

On 2018-08-28 7:03 p.m., Michel Dänzer wrote:

On 2018-08-28 11:14 a.m., Michel Dänzer wrote:

On 2018-08-22 9:52 a.m., Huang Rui wrote:

The new bulk moving functionality is ready and the overhead of moving
PD/PT BOs to the LRU is fixed, so move them onto the LRU again.

Signed-off-by: Huang Rui 
Tested-by: Mike Lothian 
Tested-by: Dieter Nützel 
Acked-by: Chunming Zhou 
Reviewed-by: Junwei Zhang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index db1f28a..d195a3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1107,7 +1107,7 @@ int amdgpu_vm_update_directories(struct
amdgpu_device *adev,
  struct amdgpu_vm_bo_base,
  vm_status);
   bo_base->moved = false;
-    list_del_init(&bo_base->vm_status);
+    list_move(&bo_base->vm_status, &vm->idle);
     bo = bo_base->bo->parent;
   if (!bo)


Since this change, I'm getting various badness when running piglit
using
radeonsi on Bonaire, see the attached dmesg excerpt.

Reverting just this change on top of current amd-staging-drm-next
avoids
the problem.

Looks like some list manipulation isn't sufficiently protected against
concurrent execution?

KASAN pointed me to one issue:
https://patchwork.freedesktop.org/patch/246212/

However, this doesn't fully fix the problem.

Ray, any ideas yet for solving this? If not, let's revert this change
for now.

I've gone over this multiple times now as well, but can't find anything
obvious wrong either.

After looking at the code, one question: Why does vm->moved need a
spinlock, but not vm->idle? What is protecting against concurrent access
to the latter?


The moved state is occupied by both normal and per VM BOs, e.g. BOs with 
different reservation objects.


All other states are only used by per VM BOs or PDs/PTs, so we only put 
the BOs on those when the reservation object of the root BO is locked.


We could probably split the moved state into two separate ones to 
further avoid that lock.


Christian.
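
To restate the locking rule being described, a reduced sketch (mock
structures for illustration, not the real amdgpu ones):

#include <linux/list.h>
#include <linux/spinlock.h>

struct example_vm {
	spinlock_t moved_lock;
	struct list_head moved;	/* shared by BOs with different resv objects */
	struct list_head idle;	/* only touched with the root PD reserved */
};

struct example_vm_bo_base {
	struct list_head vm_status;
};

static void example_mark_moved(struct example_vm *vm,
			       struct example_vm_bo_base *base)
{
	/* Normal and per-VM BOs can land here concurrently, hence the
	 * dedicated spinlock. */
	spin_lock(&vm->moved_lock);
	list_move(&base->vm_status, &vm->moved);
	spin_unlock(&vm->moved_lock);
}

static void example_mark_idle(struct example_vm *vm,
			      struct example_vm_bo_base *base)
{
	/* By convention only called while the root PD's reservation
	 * object is held, so no extra lock -- the badness above suggests
	 * some path violates exactly that assumption. */
	list_move(&base->vm_status, &vm->idle);
}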
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v5 5/5] drm/amdgpu: move PD/PT bos on LRU again

2018-08-29 Thread Michel Dänzer
On 2018-08-29 10:57 a.m., Christian König wrote:
> Am 29.08.2018 um 09:52 schrieb Michel Dänzer:
>> On 2018-08-28 7:03 p.m., Michel Dänzer wrote:
>>> On 2018-08-28 11:14 a.m., Michel Dänzer wrote:
 On 2018-08-22 9:52 a.m., Huang Rui wrote:
> The new bulk moving functionality is ready and the overhead of moving
> PD/PT BOs to the LRU is fixed, so move them onto the LRU again.
>
> Signed-off-by: Huang Rui 
> Tested-by: Mike Lothian 
> Tested-by: Dieter Nützel 
> Acked-by: Chunming Zhou 
> Reviewed-by: Junwei Zhang 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index db1f28a..d195a3d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1107,7 +1107,7 @@ int amdgpu_vm_update_directories(struct
> amdgpu_device *adev,
>  struct amdgpu_vm_bo_base,
>  vm_status);
>   bo_base->moved = false;
> -    list_del_init(&bo_base->vm_status);
> +    list_move(&bo_base->vm_status, &vm->idle);
>     bo = bo_base->bo->parent;
>   if (!bo)
>
 Since this change, I'm getting various badness when running piglit
 using
 radeonsi on Bonaire, see the attached dmesg excerpt.

 Reverting just this change on top of current amd-staging-drm-next
 avoids
 the problem.

 Looks like some list manipulation isn't sufficiently protected against
 concurrent execution?
>>> KASAN pointed me to one issue:
>>> https://patchwork.freedesktop.org/patch/246212/
>>>
>>> However, this doesn't fully fix the problem.
>> Ray, any ideas yet for solving this? If not, let's revert this change
>> for now.
> 
> I've gone over this multiple times now as well, but can't find anything
> obvious wrong either.

After looking at the code, one question: Why does vm->moved need a
spinlock, but not vm->idle? What is protecting against concurrent access
to the latter?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: How to gracefully handle pci remove

2018-08-29 Thread Andrey Grodzovsky

Just another ping...

Daniel, Dave - maybe you could give some advice on that?

P.S. I tried with an Intel card (i915 driver) on a 4.18.1 kernel to do the 
same to get some reference point, but it just hung.


Andrey



On 08/27/2018 12:04 PM, Andrey Grodzovsky wrote:
Hi everybody, I am trying to resolve various problems I observe when
logically removing the AMDGPU device from PCI - echo 1 >
/sys/class/drm/card0/device/remove


One of the problems I encountered was hitting WARNs in
amdgpu_gem_force_release. It complains about still-open client FDs
and BO allocations, which is obvious since


we didn't let user space clients know about the device removal and 
hence they won't release allocations and won't close their FDs.


Question - how do other drivers handle this use case, especially eGPUs,
since they may indeed be removed at any moment? Is there any way to
notify Xorg and other clients about this so they may


have a chance to release all their allocations and probably terminate?
Maybe some kind of uevent?


Andrey



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: Relocate some definitions

2018-08-29 Thread Amber Lin
Move some KFD-related (but used in amdgpu_drv.c) definitions from
kfd_priv.h to kgd_kfd_interface.h so we don't need to include kfd_priv.h
in amdgpu_drv.c. This fixes a build failure when AMDGPU is enabled but
MMU_NOTIFIER is not.
This patch also disables KFD-related module options when HSA_AMD is not
enabled.

Signed-off-by: Amber Lin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 20 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   | 28 -
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 28 +
 3 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index dd6d8b1..e4d0d72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -39,7 +39,6 @@
 #include "amdgpu_gem.h"
 
 #include "amdgpu_amdkfd.h"
-#include "kfd_priv.h"
 
 /*
  * KMS wrapper.
@@ -128,15 +127,6 @@ int amdgpu_compute_multipipe = -1;
 int amdgpu_gpu_recovery = -1; /* auto */
 int amdgpu_emu_mode = 0;
 uint amdgpu_smu_memory_pool_size = 0;
-/* KFD parameters */
-int sched_policy = KFD_SCHED_POLICY_HWS;
-int hws_max_conc_proc = 8;
-int cwsr_enable = 1;
-int max_num_of_queues_per_device = KFD_MAX_NUM_OF_QUEUES_PER_DEVICE_DEFAULT;
-int send_sigterm;
-int debug_largebar;
-int ignore_crat;
-int vega10_noretry;
 
 /**
  * DOC: vramlimit (int)
@@ -542,12 +532,14 @@ MODULE_PARM_DESC(smu_memory_pool_size,
"0x1 = 256Mbyte, 0x2 = 512Mbyte, 0x4 = 1 Gbyte, 0x8 = 2GByte");
 module_param_named(smu_memory_pool_size, amdgpu_smu_memory_pool_size, uint, 
0444);
 
+#ifdef CONFIG_HSA_AMD
 /**
  * DOC: sched_policy (int)
  * Set scheduling policy. Default is HWS(hardware scheduling) with 
over-subscription.
  * Setting 1 disables over-subscription. Setting 2 disables HWS and statically
  * assigns queues to HQDs.
  */
+int sched_policy = KFD_SCHED_POLICY_HWS;
 module_param(sched_policy, int, 0444);
 MODULE_PARM_DESC(sched_policy,
"Scheduling policy (0 = HWS (Default), 1 = HWS without 
over-subscription, 2 = Non-HWS (Used for debugging only)");
@@ -557,6 +549,7 @@ MODULE_PARM_DESC(sched_policy,
  * Maximum number of processes that HWS can schedule concurrently. The maximum 
is the
  * number of VMIDs assigned to the HWS, which is also the default.
  */
+int hws_max_conc_proc = 8;
 module_param(hws_max_conc_proc, int, 0444);
 MODULE_PARM_DESC(hws_max_conc_proc,
"Max # processes HWS can execute concurrently when sched_policy=0 (0 = 
no concurrency, #VMIDs for KFD = Maximum(default))");
@@ -567,6 +560,7 @@ MODULE_PARM_DESC(hws_max_conc_proc,
  * the middle of a compute wave. Default is 1 to enable this feature. Setting 0
  * disables it.
  */
+int cwsr_enable = 1;
 module_param(cwsr_enable, int, 0444);
 MODULE_PARM_DESC(cwsr_enable, "CWSR enable (0 = Off, 1 = On (Default))");
 
@@ -575,6 +569,7 @@ MODULE_PARM_DESC(cwsr_enable, "CWSR enable (0 = Off, 1 = On 
(Default))");
  * Maximum number of queues per device. Valid setting is between 1 and 4096. 
Default
  * is 4096.
  */
+int max_num_of_queues_per_device = KFD_MAX_NUM_OF_QUEUES_PER_DEVICE_DEFAULT;
 module_param(max_num_of_queues_per_device, int, 0444);
 MODULE_PARM_DESC(max_num_of_queues_per_device,
"Maximum number of supported queues per device (1 = Minimum, 4096 = 
default)");
@@ -584,6 +579,7 @@ MODULE_PARM_DESC(max_num_of_queues_per_device,
  * Send sigterm to HSA process on unhandled exceptions. Default is not to send 
sigterm
  * but just print errors on dmesg. Setting 1 enables sending sigterm.
  */
+int send_sigterm;
 module_param(send_sigterm, int, 0444);
 MODULE_PARM_DESC(send_sigterm,
"Send sigterm to HSA process on unhandled exception (0 = disable, 1 = 
enable)");
@@ -595,6 +591,7 @@ MODULE_PARM_DESC(send_sigterm,
  * size, usually 256MB.
 * Default value is 0, disabled.
  */
+int debug_largebar;
 module_param(debug_largebar, int, 0444);
 MODULE_PARM_DESC(debug_largebar,
"Debug large-bar flag used to simulate large-bar capability on 
non-large bar machine (0 = disable, 1 = enable)");
@@ -605,6 +602,7 @@ MODULE_PARM_DESC(debug_largebar,
  * table to get information about AMD APUs. This option can serve as a 
workaround on
  * systems with a broken CRAT table.
  */
+int ignore_crat;
 module_param(ignore_crat, int, 0444);
 MODULE_PARM_DESC(ignore_crat,
"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 
= ignore CRAT)");
@@ -615,9 +613,11 @@ MODULE_PARM_DESC(ignore_crat,
  * Setting 1 disables retry.
  * Retry is needed for recoverable page faults.
  */
+int vega10_noretry;
 module_param_named(noretry, vega10_noretry, int, 0644);
 MODULE_PARM_DESC(noretry,
"Set sh_mem_config.retry_disable on Vega10 (0 = retry enabled 
(default), 1 = retry disabled)");
+#endif
 
 static const struct pci_device_id pciidlist[] = {
 #ifdef  CONFIG_DRM_AMDGPU_SI
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 

[PATCH 6/7] drm/amdgpu: enable AGP aperture for GMC9

2018-08-29 Thread Christian König
Enable the old AGP aperture to avoid GART mappings.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 10 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  1 +
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 10 +-
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 3403ded39d13..ffd0ec9586d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -65,16 +65,16 @@ static void gfxhub_v1_0_init_system_aperture_regs(struct 
amdgpu_device *adev)
 {
uint64_t value;
 
-   /* Disable AGP. */
+   /* Program the AGP BAR */
WREG32_SOC15(GC, 0, mmMC_VM_AGP_BASE, 0);
-   WREG32_SOC15(GC, 0, mmMC_VM_AGP_TOP, 0);
-   WREG32_SOC15(GC, 0, mmMC_VM_AGP_BOT, 0xFFFFFFFF);
+   WREG32_SOC15(GC, 0, mmMC_VM_AGP_BOT, adev->gmc.agp_start >> 24);
+   WREG32_SOC15(GC, 0, mmMC_VM_AGP_TOP, adev->gmc.agp_end >> 24);
 
/* Program the system aperture low logical page number. */
WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR,
-adev->gmc.vram_start >> 18);
+min(adev->gmc.vram_start, adev->gmc.agp_start) >> 18);
WREG32_SOC15(GC, 0, mmMC_VM_SYSTEM_APERTURE_HIGH_ADDR,
-adev->gmc.vram_end >> 18);
+max(adev->gmc.vram_end, adev->gmc.agp_end) >> 18);
 
/* Set default page address. */
value = adev->vram_scratch.gpu_addr - adev->gmc.vram_start
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 04d50893a6f2..719f45cdaf6a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -751,6 +751,7 @@ static void gmc_v9_0_vram_gtt_location(struct amdgpu_device 
*adev,
base = mmhub_v1_0_get_fb_location(adev);
amdgpu_gmc_vram_location(adev, >gmc, base);
amdgpu_gmc_gart_location(adev, mc);
+   amdgpu_gmc_agp_location(adev, mc);
/* base offset of vram pages */
adev->vm_manager.vram_base_offset = gfxhub_v1_0_get_mc_fb_offset(adev);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index 5f6a9c85488f..73d7c075dd33 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -76,16 +76,16 @@ static void mmhub_v1_0_init_system_aperture_regs(struct 
amdgpu_device *adev)
uint64_t value;
uint32_t tmp;
 
-   /* Disable AGP. */
+   /* Program the AGP BAR */
WREG32_SOC15(MMHUB, 0, mmMC_VM_AGP_BASE, 0);
-   WREG32_SOC15(MMHUB, 0, mmMC_VM_AGP_TOP, 0);
-   WREG32_SOC15(MMHUB, 0, mmMC_VM_AGP_BOT, 0x00FFFFFF);
+   WREG32_SOC15(MMHUB, 0, mmMC_VM_AGP_BOT, adev->gmc.agp_start >> 24);
+   WREG32_SOC15(MMHUB, 0, mmMC_VM_AGP_TOP, adev->gmc.agp_end >> 24);
 
/* Program the system aperture low logical page number. */
WREG32_SOC15(MMHUB, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR,
-adev->gmc.vram_start >> 18);
+min(adev->gmc.vram_start, adev->gmc.agp_start) >> 18);
WREG32_SOC15(MMHUB, 0, mmMC_VM_SYSTEM_APERTURE_HIGH_ADDR,
-adev->gmc.vram_end >> 18);
+max(adev->gmc.vram_end, adev->gmc.agp_end) >> 18);
 
/* Set default page address. */
value = adev->vram_scratch.gpu_addr - adev->gmc.vram_start +
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/7] drm/amdgpu: add amdgpu_gmc_agp_location v2

2018-08-29 Thread Christian König
Helper to figure out the location of the AGP BAR.

v2: fix a couple of bugs

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 43 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  5 +++
 2 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index c6bcc4715373..1d201fd3f4af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -143,3 +143,46 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
mc->gart_size >> 20, mc->gart_start, mc->gart_end);
 }
+
+/**
+ * amdgpu_gmc_agp_location - try to find AGP location
+ * @adev: amdgpu device structure holding all necessary informations
+ * @mc: memory controller structure holding memory informations
+ *
+ * Function will try to find a place for the AGP BAR in the MC address
+ * space.
+ *
+ * AGP BAR will be assigned the largest available hole in the address space.
+ * Should be called after VRAM and GART locations are setup.
+ */
+void amdgpu_gmc_agp_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc)
+{
+   const uint64_t sixteen_gb = 1ULL << 34;
+   const uint64_t sixteen_gb_mask = ~(sixteen_gb - 1);
+   u64 size_af, size_bf;
+
+   if (mc->vram_start > mc->gart_start) {
+   size_bf = (mc->vram_start & sixteen_gb_mask) -
+   ALIGN(mc->gart_end + 1, sixteen_gb);
+   size_af = mc->mc_mask + 1 - ALIGN(mc->vram_end, sixteen_gb);
+   } else {
+   size_bf = mc->vram_start & sixteen_gb_mask;
+   size_af = (mc->gart_start & sixteen_gb_mask) -
+   ALIGN(mc->vram_end, sixteen_gb);
+   }
+
+   if (size_bf > size_af) {
+   mc->agp_start = mc->vram_start > mc->gart_start ?
+   mc->gart_end + 1 : 0;
+   mc->agp_size = size_bf;
+   } else {
+   mc->agp_start = (mc->vram_start > mc->gart_start ?
+   mc->vram_end : mc->gart_end) + 1,
+   mc->agp_size = size_af;
+   }
+
+   mc->agp_start = ALIGN(mc->agp_start, sixteen_gb);
+   mc->agp_end = mc->agp_start + mc->agp_size - 1;
+   dev_info(adev->dev, "AGP: %lluM 0x%016llX - 0x%016llX\n",
+   mc->agp_size >> 20, mc->agp_start, mc->agp_end);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 48715dd5808a..c9985e7dc9e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -94,6 +94,9 @@ struct amdgpu_gmc {
 * about vram size near mc fb location */
u64 mc_vram_size;
u64 visible_vram_size;
+   u64 agp_size;
+   u64 agp_start;
+   u64 agp_end;
u64 gart_size;
u64 gart_start;
u64 gart_end;
@@ -164,5 +167,7 @@ void amdgpu_gmc_vram_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc,
  u64 base);
 void amdgpu_gmc_gart_location(struct amdgpu_device *adev,
  struct amdgpu_gmc *mc);
+void amdgpu_gmc_agp_location(struct amdgpu_device *adev,
+struct amdgpu_gmc *mc);
 
 #endif
-- 
2.17.1
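
As a worked example of the hole selection above, a standalone restatement
of the same arithmetic with made-up sizes (8 GB of VRAM at address 0,
512 MB of GART at the top of a 48-bit MC space; illustrative only):

#include <stdint.h>
#include <stdio.h>

#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	const uint64_t sixteen_gb = 1ULL << 34;
	const uint64_t mask = ~(sixteen_gb - 1);
	uint64_t mc_mask = (1ULL << 48) - 1;
	uint64_t vram_start = 0, vram_end = (8ULL << 30) - 1;
	uint64_t gart_size = 512ULL << 20;
	uint64_t gart_start = mc_mask + 1 - gart_size;
	uint64_t size_bf, size_af, agp_start, agp_size;

	/* vram_start < gart_start, so this mirrors the "else" branch. */
	size_bf = vram_start & mask;	/* hole below VRAM: empty here */
	size_af = (gart_start & mask) - ALIGN_UP(vram_end, sixteen_gb);

	agp_size = size_bf > size_af ? size_bf : size_af;
	agp_start = ALIGN_UP(size_bf > size_af ? 0 : vram_end + 1,
			     sixteen_gb);

	/* The AGP window lands at the 16 GB mark, just above VRAM, and
	 * spans nearly all of the remaining address space. */
	printf("AGP: %lluM at 0x%016llx\n",
	       (unsigned long long)(agp_size >> 20),
	       (unsigned long long)agp_start);
	return 0;
}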

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 5/7] drm/amdgpu: manually map the shadow BOs again

2018-08-29 Thread Christian König
Otherwise we won't be able to use the AGP aperture.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 5 +
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 0cbf651a88a6..de990bdcdd6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -163,10 +163,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo 
*abo, u32 domain)
 
if (domain & AMDGPU_GEM_DOMAIN_GTT) {
places[c].fpfn = 0;
-   if (flags & AMDGPU_GEM_CREATE_SHADOW)
-   places[c].lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
-   else
-   places[c].lpfn = 0;
+   places[c].lpfn = 0;
places[c].flags = TTM_PL_FLAG_TT;
if (flags & AMDGPU_GEM_CREATE_CPU_GTT_USWC)
places[c].flags |= TTM_PL_FLAG_WC |
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index a3675c7b6190..abe1db4c63f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -346,6 +346,11 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
r = amdgpu_ttm_alloc_gart(&bo->tbo);
if (r)
break;
+   if (bo->shadow) {
+   r = amdgpu_ttm_alloc_gart(&bo->shadow->tbo);
+   if (r)
+   break;
+   }
list_move(&bo_base->vm_status, &vm->relocated);
}
}
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/7] drm/amdgpu: correctly sign extend 48bit addresses v3

2018-08-29 Thread Christian König
Correctly sign extend the GMC addresses to 48 bits.

v2: sign extending turned out easier than thought.
v3: clean up the defines and move them into amdgpu_gmc.h as well

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 10 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h| 26 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  8 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   |  6 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  7 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 13 ---
 9 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 8c652ecc4f9a..bc5ccfca68c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -135,7 +135,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
.num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
.gpuvm_size = min(adev->vm_manager.max_pfn
  << AMDGPU_GPU_PAGE_SHIFT,
- AMDGPU_VA_HOLE_START),
+ AMDGPU_GMC_HOLE_START),
.drm_render_minor = adev->ddev->render->index
};
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index dd734970e167..ef2bfc04b41c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -835,7 +835,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
if (chunk->chunk_id != AMDGPU_CHUNK_ID_IB)
continue;
 
-   va_start = chunk_ib->va_start & AMDGPU_VA_HOLE_MASK;
+   va_start = chunk_ib->va_start & AMDGPU_GMC_HOLE_MASK;
r = amdgpu_cs_find_mapping(p, va_start, &aobj, &m);
if (r) {
DRM_ERROR("IB va_start is invalid\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 71792d820ae0..d30a0838851b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -572,16 +572,16 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void 
*data,
return -EINVAL;
}
 
-   if (args->va_address >= AMDGPU_VA_HOLE_START &&
-   args->va_address < AMDGPU_VA_HOLE_END) {
+   if (args->va_address >= AMDGPU_GMC_HOLE_START &&
+   args->va_address < AMDGPU_GMC_HOLE_END) {
dev_dbg(&dev->pdev->dev,
"va_address 0x%LX is in VA hole 0x%LX-0x%LX\n",
-   args->va_address, AMDGPU_VA_HOLE_START,
-   AMDGPU_VA_HOLE_END);
+   args->va_address, AMDGPU_GMC_HOLE_START,
+   AMDGPU_GMC_HOLE_END);
return -EINVAL;
}
 
-   args->va_address &= AMDGPU_VA_HOLE_MASK;
+   args->va_address &= AMDGPU_GMC_HOLE_MASK;
 
if ((args->flags & ~valid_flags) && (args->flags & ~prt_flags)) {
dev_dbg(>pdev->dev, "invalid flags combination 0x%08X\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 72fcc9338f5e..48715dd5808a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -30,6 +30,19 @@
 
 #include "amdgpu_irq.h"
 
+/* VA hole for 48bit addresses on Vega10 */
+#define AMDGPU_GMC_HOLE_START  0x0000800000000000ULL
+#define AMDGPU_GMC_HOLE_END    0xffff800000000000ULL
+
+/*
+ * Hardware is programmed as if the hole doesn't exists with start and end
+ * address values.
+ *
+ * This mask is used to remove the upper 16bits of the VA and so come up with
+ * the linear addr value.
+ */
+#define AMDGPU_GMC_HOLE_MASK   0x0000ffffffffffffULL
+
 struct firmware;
 
 /*
@@ -131,6 +144,19 @@ static inline bool amdgpu_gmc_vram_full_visible(struct 
amdgpu_gmc *gmc)
return (gmc->real_vram_size == gmc->visible_vram_size);
 }
 
+/**
+ * amdgpu_gmc_sign_extend - sign extend the given gmc address
+ *
+ * @addr: address to extend
+ */
+static inline uint64_t amdgpu_gmc_sign_extend(uint64_t addr)
+{
+   if (addr >= AMDGPU_GMC_HOLE_START)
+   addr |= AMDGPU_GMC_HOLE_END;
+
+   return addr;
+}
+
 void amdgpu_gmc_get_pde_for_bo(struct amdgpu_bo *bo, int level,
   uint64_t *addr, uint64_t *flags);
 uint64_t amdgpu_gmc_pd_addr(struct amdgpu_bo *bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 9c4e45936ade..29ac3873eeb0 100644
--- 
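
A quick self-contained illustration of what the new helper does with the
hole constants defined above (userspace toy, values as in the patch):

#include <stdint.h>
#include <stdio.h>

#define AMDGPU_GMC_HOLE_START	0x0000800000000000ULL
#define AMDGPU_GMC_HOLE_END	0xffff800000000000ULL

/* Mirrors amdgpu_gmc_sign_extend(): addresses at or beyond the start of
 * the 48-bit hole get their upper 16 bits set, yielding the canonical
 * sign-extended form. */
static uint64_t sign_extend(uint64_t addr)
{
	if (addr >= AMDGPU_GMC_HOLE_START)
		addr |= AMDGPU_GMC_HOLE_END;
	return addr;
}

int main(void)
{
	/* below the hole: unchanged */
	printf("0x%016llx\n",
	       (unsigned long long)sign_extend(0x0000400000000000ULL));
	/* top half: becomes 0xffff800000000000 */
	printf("0x%016llx\n",
	       (unsigned long long)sign_extend(0x0000800000000000ULL));
	return 0;
}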

[PATCH 7/7] drm/amdgpu: try to make kernel allocations USWC

2018-08-29 Thread Christian König
Not 100% sure if that is a good idea or not. In theory only the writeback BO
should be read most of the time, but who knows?

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index de990bdcdd6c..794c874309d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -255,7 +255,8 @@ int amdgpu_bo_create_reserved(struct amdgpu_device *adev,
bp.byte_align = align;
bp.domain = domain;
bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
-   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
+   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
+   AMDGPU_GEM_CREATE_CPU_GTT_USWC;
bp.type = ttm_bo_type_kernel;
bp.resv = NULL;
 
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/7] drm/amdgpu: put GART away from VRAM v2

2018-08-29 Thread Christian König
Always try to put the GART away from where VRAM is.

v2: correctly handle the 4GB limitation

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 265ec6807130..c6bcc4715373 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -116,6 +116,7 @@ void amdgpu_gmc_vram_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc,
  */
 void amdgpu_gmc_gart_location(struct amdgpu_device *adev, struct amdgpu_gmc 
*mc)
 {
+   const uint64_t four_gb = 0x100000000ULL;
u64 size_af, size_bf;
 
mc->gart_size += adev->pm.smu_prv_buffer_size;
@@ -124,8 +125,7 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
 * the GART base on a 4GB boundary as well.
 */
size_bf = mc->vram_start;
-   size_af = adev->gmc.mc_mask + 1 -
-   ALIGN(mc->vram_end + 1, 0x100000000ULL);
+   size_af = adev->gmc.mc_mask + 1 - ALIGN(mc->vram_end + 1, four_gb);
 
if (mc->gart_size > max(size_bf, size_af)) {
dev_warn(adev->dev, "limiting GART\n");
@@ -136,7 +136,9 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
(size_af < mc->gart_size))
mc->gart_start = 0;
else
-   mc->gart_start = ALIGN(mc->vram_end + 1, 0x100000000ULL);
+   mc->gart_start = mc->mc_mask - mc->gart_size + 1;
+
+   mc->gart_start &= ~(four_gb - 1);
mc->gart_end = mc->gart_start + mc->gart_size - 1;
dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
mc->gart_size >> 20, mc->gart_start, mc->gart_end);
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 4/7] drm/amdgpu: use the AGP aperture for system memory access v2

2018-08-29 Thread Christian König
Start to use the old AGP aperture for system memory access.

v2: Move that to amdgpu_ttm_alloc_gart

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 23 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 58 ++---
 3 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 1d201fd3f4af..65aee57b35fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -79,6 +79,29 @@ uint64_t amdgpu_gmc_pd_addr(struct amdgpu_bo *bo)
return pd_addr;
 }
 
+/**
+ * amdgpu_gmc_agp_addr - return the address in the AGP address space
+ *
+ * @tbo: TTM BO which needs the address, must be in GTT domain
+ *
+ * Tries to figure out how to access the BO through the AGP aperture. Returns
+ * AMDGPU_BO_INVALID_OFFSET if that is not possible.
+ */
+uint64_t amdgpu_gmc_agp_addr(struct ttm_buffer_object *bo)
+{
+   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
+   struct ttm_dma_tt *ttm;
+
+   if (bo->num_pages != 1 || bo->ttm->caching_state == tt_cached)
+   return AMDGPU_BO_INVALID_OFFSET;
+
+   ttm = container_of(bo->ttm, struct ttm_dma_tt, ttm);
+   if (ttm->dma_address[0] + PAGE_SIZE >= adev->gmc.agp_size)
+   return AMDGPU_BO_INVALID_OFFSET;
+
+   return adev->gmc.agp_start + ttm->dma_address[0];
+}
+
 /**
  * amdgpu_gmc_vram_location - try to find VRAM location
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index c9985e7dc9e5..265ca415c64c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -163,6 +163,7 @@ static inline uint64_t amdgpu_gmc_sign_extend(uint64_t addr)
 void amdgpu_gmc_get_pde_for_bo(struct amdgpu_bo *bo, int level,
   uint64_t *addr, uint64_t *flags);
 uint64_t amdgpu_gmc_pd_addr(struct amdgpu_bo *bo);
+uint64_t amdgpu_gmc_agp_addr(struct ttm_buffer_object *bo);
 void amdgpu_gmc_vram_location(struct amdgpu_device *adev, struct amdgpu_gmc 
*mc,
  u64 base);
 void amdgpu_gmc_gart_location(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d9f3201c9e5c..8a158ee922f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1081,41 +1081,49 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
struct ttm_mem_reg tmp;
struct ttm_placement placement;
struct ttm_place placements;
-   uint64_t flags;
+   uint64_t addr, flags;
int r;
 
if (bo->mem.start != AMDGPU_BO_INVALID_OFFSET)
return 0;
 
-   /* allocate GART space */
-   tmp = bo->mem;
-   tmp.mm_node = NULL;
-   placement.num_placement = 1;
-   placement.placement = &placements;
-   placement.num_busy_placement = 1;
-   placement.busy_placement = &placements;
-   placements.fpfn = 0;
-   placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
-   placements.flags = (bo->mem.placement & ~TTM_PL_MASK_MEM) |
-   TTM_PL_FLAG_TT;
+   addr = amdgpu_gmc_agp_addr(bo);
+   if (addr != AMDGPU_BO_INVALID_OFFSET) {
+   bo->mem.start = addr >> PAGE_SHIFT;
+   } else {
 
-   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
-   if (unlikely(r))
-   return r;
+   /* allocate GART space */
+   tmp = bo->mem;
+   tmp.mm_node = NULL;
+   placement.num_placement = 1;
+   placement.placement = &placements;
+   placement.num_busy_placement = 1;
+   placement.busy_placement = &placements;
+   placements.fpfn = 0;
+   placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
+   placements.flags = (bo->mem.placement & ~TTM_PL_MASK_MEM) |
+   TTM_PL_FLAG_TT;
+
+   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
+   if (unlikely(r))
+   return r;
 
-   /* compute PTE flags for this buffer object */
-   flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, &tmp);
+   /* compute PTE flags for this buffer object */
+   flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, &tmp);
 
-   /* Bind pages */
-   gtt->offset = ((u64)tmp.start << PAGE_SHIFT) - adev->gmc.gart_start;
-   r = amdgpu_ttm_gart_bind(adev, bo, flags);
-   if (unlikely(r)) {
-   ttm_bo_mem_put(bo, &tmp);
-   return r;
+   /* Bind pages */
+   gtt->offset = ((u64)tmp.start << PAGE_SHIFT) -
+   adev->gmc.gart_start;
+   r = amdgpu_ttm_gart_bind(adev, bo, flags);
+   if (unlikely(r)) {
+   ttm_bo_mem_put(bo, &tmp);
+   return r;
+   }
+
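
(The hunk above is truncated in the archive.) For illustration only, the
address math of the new AGP fast path boils down to the following standalone
sketch; PAGE_SIZE and AMDGPU_BO_INVALID_OFFSET are the usual amdgpu/kernel
definitions, while the helper itself is hypothetical:

/* Sketch: GPU address of a single-page system BO through the AGP window,
 * which maps system physical [0, agp_size) linearly at agp_start. */
static uint64_t agp_gpu_addr(uint64_t agp_start, uint64_t agp_size,
			     uint64_t dma_addr)
{
	if (dma_addr + PAGE_SIZE >= agp_size)  /* page must fit in the window */
		return AMDGPU_BO_INVALID_OFFSET;
	return agp_start + dma_addr;           /* no GART entry needed */
}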

Re: [PATCH] drm/amd/powerplay: added vega20 overdrive support

2018-08-29 Thread Paul Menzel
Dear Evan,


On 08/29/18 11:12, Evan Quan wrote:
> Vega20 supports only sclk voltage overdrive. And user can
> only tell three groups of . SMU

Do you mean: The user can only configure three groups of …?

> firmware will recalculate the frequency/voltage curve. Other
> intermediate levles will be stretched/shrunk accordingly.

levels

(A spell checker should detect simple typos.)

> Change-Id: I403cb38a95863db664cf06d030ac42a19bff6b33
> Signed-off-by: Evan Quan 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c|  26 +++
>  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 165 +-
>  .../drm/amd/powerplay/hwmgr/vega20_hwmgr.h|   2 +
>  3 files changed, 192 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index e2577518b9c6..6e0c8f583521 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -474,6 +474,8 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
>   * in each power level within a power state.  The pp_od_clk_voltage is used 
> for
>   * this.
>   *
> + * < For Vega10 and previous ASICs >
> + *
>   * Reading the file will display:
>   *
>   * - a list of engine clock levels and voltages labeled OD_SCLK
> @@ -491,6 +493,30 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
>   * "c" (commit) to the file to commit your changes.  If you want to reset to 
> the
>   * default power levels, write "r" (reset) to the file to reset them.
>   *
> + *
> + * < For Vega20 >
> + *
> + * Reading the file will display:
> + *
> + * - three groups of engine clock and voltage offset labeled OD_SCLK
> + *
> + * - a list of valid ranges for above three groups of sclk and voltage offset
> + *   labeled OD_RANGE
> + *
> + * To manually adjust these settings:
> + *
> + * - First select manual using power_dpm_force_performance_level
> + *
> + * - Enter a new value for each group by writing a string that contains
> + *   "s group_index clock voltage_offset" to the file.  E.g., "s 0 500 20"
> + *   will update group1 sclk to be 500 MHz with voltage increased 20mV

increased *by* 20mV? by or to?

> + *
> + * - When you have edited all of the states as needed, write "c" (commit)
> + *   to the file to commit your changes
> + *
> + * - If you want to reset to the default power levels, write "r" (reset)
> + *   to the file to reset them
> + *
>   */
>  
>  static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> index ececa2f7fe5f..9f6e070a76e0 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> @@ -1096,6 +1096,13 @@ static int vega20_od8_initialize_default_settings(
>   }
>   }
>  
> + ret = vega20_copy_table_from_smc(hwmgr,
> + (uint8_t *)(&data->od_table),
> + TABLE_OVERDRIVE);
> + PP_ASSERT_WITH_CODE(!ret,
> + "Failed to export over drive table!",

*over drive* or *overdrive*

> + return ret);
> +
>   return 0;
>  }
>  
> @@ -2506,11 +2513,112 @@ static int 
> vega20_set_watermarks_for_clocks_ranges(struct pp_hwmgr *hwmgr,
>   return 0;
>  }
>  
> +static int vega20_odn_edit_dpm_table(struct pp_hwmgr *hwmgr,
> + enum PP_OD_DPM_TABLE_COMMAND type,
> + long *input, uint32_t size)

What is size? Could it be `size_t` or just int?

> +{
> + struct vega20_hwmgr *data =
> + (struct vega20_hwmgr *)(hwmgr->backend);
> + struct vega20_od8_single_setting *od8_settings =
> + data->od8_settings.od8_settings_array;
> + OverDriveTable_t od_table;

Is CamelCase wanted?

> + int32_t input_index, input_clk, input_vol, i;
> + int ret = 0;

Is the initialization needed?

> +
> + PP_ASSERT_WITH_CODE(input, "NULL user input for clock and voltage",
> + return -EINVAL);
> +
> + switch (type) {
> + case PP_OD_EDIT_SCLK_VDDC_TABLE:
> + if (!(od8_settings[OD8_SETTING_GFXCLK_FREQ1].feature_id &&
> + od8_settings[OD8_SETTING_GFXCLK_FREQ2].feature_id &&
> + od8_settings[OD8_SETTING_GFXCLK_FREQ3].feature_id &&
> + od8_settings[OD8_SETTING_GFXCLK_VOLTAGE1].feature_id &&
> + od8_settings[OD8_SETTING_GFXCLK_VOLTAGE2].feature_id &&
> + od8_settings[OD8_SETTING_GFXCLK_VOLTAGE3].feature_id)) {
> + pr_info("Sclk voltage overdrive not supported\n");
> + return 0;
> + }
> +
> + for (i = 0; i < size; i += 3) {
> + if (i + 3 > size) {
> + pr_info("invalid clock voltage input\n");

As this is log level *Info* I’d say it 

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-29 Thread Christian König

Am 29.08.2018 um 12:44 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
* CPU query - A host operation that allows querying the payload of the
  timeline semaphore.
* CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
* Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
* Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, an earlier time point (PT) is always signaled
before a later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled, the timeline increases to the
value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value.
When the timeline increases, the wait PTs' values are compared with the
new timeline value; any PT whose value is lower than the timeline value
is signaled, otherwise it stays in the list. A semaphore wait operation
can wait on any point of the timeline, so an RB tree is needed to order
them. And since a wait PT can be created ahead of its signal PT, a
submission fence is needed to handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with the timeline implementation. (Vetter and Christian)
 a. a normal syncobj signal op appends a signal PT to the tail of the signal
PT list.
 b. a normal syncobj wait op creates a wait PT with the last signal point,
and this wait PT is only signaled by the related signal point PT.
2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 593 -
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 +--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
  #include "drm_internal.h"
  #include 
  
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+   struct dma_fence base;
+   spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+   kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+   .get_driver_name = drm_syncobj_stub_fence_get_name,
+   .get_timeline_name = drm_syncobj_stub_fence_get_name,
+   .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+   .release = drm_syncobj_stub_fence_release,
+};


Do we really need to move that around? Could reduce the size of the 
patch quite a bit if we don't.



+
+struct drm_syncobj_wait_pt {
+   struct drm_syncobj_stub_fence base;
+   u64value;
+   struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+   struct drm_syncobj_stub_fence base;
+   struct dma_fence *signal_fence;
+   struct dma_fence *pre_pt_base;
+   struct dma_fence_cb signal_cb;
+   struct dma_fence_cb pre_pt_cb;
+   struct drm_syncobj *syncobj;
+   u64value;
+   struct list_head list;
+};


What are those two structures good for and why is the stub fence their base?


+
  /**
   * drm_syncobj_find - lookup and reference a sync object.
   * @file_private: drm file private pointer
@@ -82,59 +126,247 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file 
*file_private,
  }
  EXPORT_SYMBOL(drm_syncobj_find);
  
-static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,

-   struct drm_syncobj_cb *cb,
-   drm_syncobj_func_t func)
+static void drm_syncobj_timeline_init(struct drm_syncobj_timeline
+  

[PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-29 Thread Chunming Zhou
VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
   * CPU query - A host operation that allows querying the payload of the
 timeline semaphore.
   * CPU wait - A host operation that allows a blocking wait for a
 timeline semaphore to reach a specified value.
   * Device wait - A device operation that allows waiting for a
 timeline semaphore to reach a specified value.
   * Device signal - A device operation that allows advancing the
 timeline semaphore to a specified value.
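
As a userspace illustration of the CPU wait operation above, a minimal
Vulkan sketch (assuming a VkDevice `device` and a timeline semaphore `sem`
created with this extension; error handling omitted):

	/* Block until the timeline semaphore reaches value 42. */
	uint64_t wait_value = 42;
	VkSemaphoreWaitInfoKHR wait_info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO_KHR,
		.semaphoreCount = 1,
		.pSemaphores = &sem,
		.pValues = &wait_value,
	};
	VkResult res = vkWaitSemaphoresKHR(device, &wait_info, UINT64_MAX);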

Since it's a timeline, an earlier time point (PT) is always signaled
before a later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled, the timeline increases to the
value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value.
When the timeline increases, the wait PTs' values are compared with the
new timeline value; any PT whose value is lower than the timeline value
is signaled, otherwise it stays in the list. A semaphore wait operation
can wait on any point of the timeline, so an RB tree is needed to order
them. And since a wait PT can be created ahead of its signal PT, a
submission fence is needed to handle that.
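
To illustrate the wait PT rule, advancing the timeline could look roughly
like the following sketch (illustrative only; it reuses the wait PT struct
from this patch, assumes an rb_root ordered by point value, and omits
locking):

/* Signal every wait point whose value the timeline has now reached. */
static void timeline_advance_sketch(struct rb_root *wait_pt_tree,
				    u64 new_value)
{
	struct rb_node *node = rb_first(wait_pt_tree);

	while (node) {
		struct drm_syncobj_wait_pt *pt =
			rb_entry(node, struct drm_syncobj_wait_pt, node);

		if (pt->value > new_value)  /* tree is ordered by value */
			break;
		node = rb_next(node);
		rb_erase(&pt->node, wait_pt_tree);
		dma_fence_signal(&pt->base.base);
	}
}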

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with the timeline implementation. (Vetter and Christian)
a. a normal syncobj signal op appends a signal PT to the tail of the signal
PT list.
b. a normal syncobj wait op creates a wait PT with the last signal point, and
this wait PT is only signaled by the related signal point PT.
2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/drm_syncobj.c  | 593 -
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
 include/drm/drm_syncobj.h  |  78 +--
 include/uapi/drm/drm.h |   1 +
 4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
 #include "drm_internal.h"
 #include 
 
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+   struct dma_fence base;
+   spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+   kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+   .get_driver_name = drm_syncobj_stub_fence_get_name,
+   .get_timeline_name = drm_syncobj_stub_fence_get_name,
+   .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+   .release = drm_syncobj_stub_fence_release,
+};
+
+struct drm_syncobj_wait_pt {
+   struct drm_syncobj_stub_fence base;
+   u64value;
+   struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+   struct drm_syncobj_stub_fence base;
+   struct dma_fence *signal_fence;
+   struct dma_fence *pre_pt_base;
+   struct dma_fence_cb signal_cb;
+   struct dma_fence_cb pre_pt_cb;
+   struct drm_syncobj *syncobj;
+   u64value;
+   struct list_head list;
+};
+
 /**
  * drm_syncobj_find - lookup and reference a sync object.
  * @file_private: drm file private pointer
@@ -82,59 +126,247 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file 
*file_private,
 }
 EXPORT_SYMBOL(drm_syncobj_find);
 
-static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
-   struct drm_syncobj_cb *cb,
-   drm_syncobj_func_t func)
+static void drm_syncobj_timeline_init(struct drm_syncobj_timeline
+ *syncobj_timeline)
 {
-   cb->func = func;
-   list_add_tail(&cb->node, &syncobj->cb_list);
+   syncobj_timeline->timeline_context = dma_fence_context_alloc(1);
+   syncobj_timeline->timeline = 0;
+   syncobj_timeline->signal_point = 

[PATCH 4/5] drm: expand replace_fence to support timeline point v2

2018-08-29 Thread Chunming Zhou
We can place a fence at a timeline point after this expansion.
v2: change func parameter order

Signed-off-by: Chunming Zhou 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  2 +-
 drivers/gpu/drm/drm_syncobj.c  | 14 --
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/v3d/v3d_gem.c  |  2 +-
 drivers/gpu/drm/vc4/vc4_gem.c  |  2 +-
 include/drm/drm_syncobj.h  |  2 +-
 6 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 0e8cf088175f..1dba9223927a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1151,7 +1151,7 @@ static void amdgpu_cs_post_dependencies(struct 
amdgpu_cs_parser *p)
int i;
 
for (i = 0; i < p->num_post_dep_syncobjs; ++i)
-   drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
+   drm_syncobj_replace_fence(p->post_dep_syncobjs[i], 0, p->fence);
 }
 
 static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 2dcb60f4c0f7..ab43559398d0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -140,11 +140,13 @@ void drm_syncobj_remove_callback(struct drm_syncobj 
*syncobj,
 /**
  * drm_syncobj_replace_fence - replace fence in a sync object.
  * @syncobj: Sync object to replace fence in
+ * @point: timeline point
  * @fence: fence to install in sync file.
  *
- * This replaces the fence on a sync object.
+ * This replaces the fence on a sync object, or a timeline point fence.
  */
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
+  u64 point,
   struct dma_fence *fence)
 {
struct dma_fence *old_fence;
@@ -206,7 +208,7 @@ static int drm_syncobj_assign_null_handle(struct 
drm_syncobj *syncobj)
   &fence->lock, 0, 0);
 dma_fence_signal(&fence->base);
 
-   drm_syncobj_replace_fence(syncobj, &fence->base);
+   drm_syncobj_replace_fence(syncobj, 0, &fence->base);
 
 dma_fence_put(&fence->base);
 
@@ -257,7 +259,7 @@ void drm_syncobj_free(struct kref *kref)
struct drm_syncobj *syncobj = container_of(kref,
   struct drm_syncobj,
   refcount);
-   drm_syncobj_replace_fence(syncobj, NULL);
+   drm_syncobj_replace_fence(syncobj, 0, NULL);
kfree(syncobj);
 }
 EXPORT_SYMBOL(drm_syncobj_free);
@@ -297,7 +299,7 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, 
uint32_t flags,
}
 
if (fence)
-   drm_syncobj_replace_fence(syncobj, fence);
+   drm_syncobj_replace_fence(syncobj, 0, fence);
 
*out_syncobj = syncobj;
return 0;
@@ -482,7 +484,7 @@ static int drm_syncobj_import_sync_file_fence(struct 
drm_file *file_private,
return -ENOENT;
}
 
-   drm_syncobj_replace_fence(syncobj, fence);
+   drm_syncobj_replace_fence(syncobj, 0, fence);
dma_fence_put(fence);
drm_syncobj_put(syncobj);
return 0;
@@ -964,7 +966,7 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
return ret;
 
for (i = 0; i < args->count_handles; i++)
-   drm_syncobj_replace_fence(syncobjs[i], NULL);
+   drm_syncobj_replace_fence(syncobjs[i], 0, NULL);
 
drm_syncobj_array_free(syncobjs, args->count_handles);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 60dc2a865f5f..7209dd832d39 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2211,7 +2211,7 @@ signal_fence_array(struct i915_execbuffer *eb,
if (!(flags & I915_EXEC_FENCE_SIGNAL))
continue;
 
-   drm_syncobj_replace_fence(syncobj, fence);
+   drm_syncobj_replace_fence(syncobj, 0, fence);
}
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index d25c35c45c33..edb4b3651e1d 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -586,7 +586,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
/* Update the return sync object for the */
sync_out = drm_syncobj_find(file_priv, args->out_sync);
if (sync_out) {
-   drm_syncobj_replace_fence(sync_out,
+   drm_syncobj_replace_fence(sync_out, 0,
  &exec->render.base.s_fence->finished);
drm_syncobj_put(sync_out);
}
diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
index 928718b467bd..5b22e996af6c 100644
--- a/drivers/gpu/drm/vc4/vc4_gem.c
+++ b/drivers/gpu/drm/vc4/vc4_gem.c
@@ -681,7 

[PATCH 3/5] drm: expand drm_syncobj_find_fence to support timeline point v2

2018-08-29 Thread Chunming Zhou
We can fetch a timeline point fence after this expansion.
v2: The parameter fence is the result of the function and should come last.

Signed-off-by: Chunming Zhou 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 drivers/gpu/drm/drm_syncobj.c  | 5 +++--
 drivers/gpu/drm/v3d/v3d_gem.c  | 4 ++--
 drivers/gpu/drm/vc4/vc4_gem.c  | 2 +-
 include/drm/drm_syncobj.h  | 2 +-
 5 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 7a625f3989a0..0e8cf088175f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1062,7 +1062,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,
 {
int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, &fence);
+   r = drm_syncobj_find_fence(p->filp, handle, 0, &fence);
if (r)
return r;
 
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index d4b48fb410a1..2dcb60f4c0f7 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -217,6 +217,7 @@ static int drm_syncobj_assign_null_handle(struct 
drm_syncobj *syncobj)
  * drm_syncobj_find_fence - lookup and reference the fence in a sync object
  * @file_private: drm file private pointer
  * @handle: sync object handle to lookup.
+ * @point: timeline point
  * @fence: out parameter for the fence
  *
  * This is just a convenience function that combines drm_syncobj_find() and
@@ -227,7 +228,7 @@ static int drm_syncobj_assign_null_handle(struct 
drm_syncobj *syncobj)
  * dma_fence_put().
  */
 int drm_syncobj_find_fence(struct drm_file *file_private,
-  u32 handle,
+  u32 handle, u64 point,
   struct dma_fence **fence)
 {
struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
@@ -498,7 +499,7 @@ static int drm_syncobj_export_sync_file(struct drm_file 
*file_private,
if (fd < 0)
return fd;
 
-   ret = drm_syncobj_find_fence(file_private, handle, &fence);
+   ret = drm_syncobj_find_fence(file_private, handle, 0, &fence);
if (ret)
goto err_put_fd;
 
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index e1fcbb4cd0ae..d25c35c45c33 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -521,12 +521,12 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
kref_init(&exec->refcount);
 
ret = drm_syncobj_find_fence(file_priv, args->in_sync_bcl,
-        &exec->bin.in_fence);
+        0, &exec->bin.in_fence);
if (ret == -EINVAL)
goto fail;
 
ret = drm_syncobj_find_fence(file_priv, args->in_sync_rcl,
-        &exec->render.in_fence);
+        0, &exec->render.in_fence);
if (ret == -EINVAL)
goto fail;
 
diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
index 7910b9acedd6..928718b467bd 100644
--- a/drivers/gpu/drm/vc4/vc4_gem.c
+++ b/drivers/gpu/drm/vc4/vc4_gem.c
@@ -1173,7 +1173,7 @@ vc4_submit_cl_ioctl(struct drm_device *dev, void *data,
 
if (args->in_sync) {
ret = drm_syncobj_find_fence(file_priv, args->in_sync,
-        &in_fence);
+        0, &in_fence);
if (ret)
goto fail;
 
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index e419c79ba94d..ab9055f943c7 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -134,7 +134,7 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file 
*file_private,
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
   struct dma_fence *fence);
 int drm_syncobj_find_fence(struct drm_file *file_private,
-  u32 handle,
+  u32 handle, u64 point,
   struct dma_fence **fence);
 void drm_syncobj_free(struct kref *kref);
 int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
-- 
2.17.1



[PATCH 2/5] drm: rename null fence to stub fence in syncobj

2018-08-29 Thread Chunming Zhou
The stub fence will be used by the timeline syncobj as well.

Signed-off-by: Chunming Zhou 
Cc: Jason Ekstrand 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/drm_syncobj.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index d17ed75ac7e2..d4b48fb410a1 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -172,37 +172,37 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
 }
 EXPORT_SYMBOL(drm_syncobj_replace_fence);
 
-struct drm_syncobj_null_fence {
+struct drm_syncobj_stub_fence {
struct dma_fence base;
spinlock_t lock;
 };
 
-static const char *drm_syncobj_null_fence_get_name(struct dma_fence *fence)
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
 {
-return "syncobjnull";
+return "syncobjstub";
 }
 
-static bool drm_syncobj_null_fence_enable_signaling(struct dma_fence *fence)
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
 {
 return !dma_fence_is_signaled(fence);
 }
 
-static const struct dma_fence_ops drm_syncobj_null_fence_ops = {
-   .get_driver_name = drm_syncobj_null_fence_get_name,
-   .get_timeline_name = drm_syncobj_null_fence_get_name,
-   .enable_signaling = drm_syncobj_null_fence_enable_signaling,
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+   .get_driver_name = drm_syncobj_stub_fence_get_name,
+   .get_timeline_name = drm_syncobj_stub_fence_get_name,
+   .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
.release = NULL,
 };
 
 static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
 {
-   struct drm_syncobj_null_fence *fence;
+   struct drm_syncobj_stub_fence *fence;
fence = kzalloc(sizeof(*fence), GFP_KERNEL);
if (fence == NULL)
return -ENOMEM;
 
 spin_lock_init(&fence->lock);
-   dma_fence_init(&fence->base, &drm_syncobj_null_fence_ops,
+   dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops,
   &fence->lock, 0, 0);
 dma_fence_signal(&fence->base);
 
-- 
2.17.1



[PATCH 1/5] drm: fix syncobj null_fence_enable_signaling

2018-08-29 Thread Chunming Zhou
That is certainly nonsense: dma_fence_enable_sw_signaling()
is the function that calls this callback.

Signed-off-by: Chunming Zhou 
Cc: Jason Ekstrand 
Reviewed-by: Christian König 
Acked-by: Daniel Vetter 
---
 drivers/gpu/drm/drm_syncobj.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 3a8837c49639..d17ed75ac7e2 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -184,7 +184,6 @@ static const char *drm_syncobj_null_fence_get_name(struct 
dma_fence *fence)
 
 static bool drm_syncobj_null_fence_enable_signaling(struct dma_fence *fence)
 {
-dma_fence_enable_sw_signaling(fence);
 return !dma_fence_is_signaled(fence);
 }
 
-- 
2.17.1



Re: [PATCH 03/10] drm/amdgpu: fix amdgpu_gmc_gart_location a little bit

2018-08-29 Thread Christian König

Am 27.08.2018 um 21:03 schrieb Alex Deucher:

On Mon, Aug 27, 2018 at 12:55 PM Christian König
 wrote:

Improve the VCE limitation handling.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 28 -
  1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 72dffa3fd194..8269197df8e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -120,24 +120,22 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)

 mc->gart_size += adev->pm.smu_prv_buffer_size;

-   size_af = adev->gmc.mc_mask - mc->vram_end;
+   /* VCE doesn't like it when BOs cross a 4GB segment, so align
+* the GART base on a 4GB boundary as well.
+*/
 size_bf = mc->vram_start;
-   if (size_bf > size_af) {
-   if (mc->gart_size > size_bf) {
-   dev_warn(adev->dev, "limiting GART\n");
-   mc->gart_size = size_bf;
-   }
+   size_af = adev->gmc.mc_mask + 1 -
+   ALIGN(mc->vram_end + 1, 0x100000000ULL);

Is it worth limiting this to asics with VCE support?


Actually I think that a couple of blocks might have a problem with that. 
E.g. DCE/DCN seem to use a similar approach as VCE/VCN. Not 100% sure 
about UVD.


Only GFX/Compute/SDMA seems to have a true 48bit interface to the MC.

Regards,
Christian.


   Probably not a
big deal either way.
Reviewed-by: Alex Deucher 


+
+   if (mc->gart_size > max(size_bf, size_af)) {
+   dev_warn(adev->dev, "limiting GART\n");
+   mc->gart_size = max(size_bf, size_af);
+   }
+
+   if (size_bf > size_af)
 mc->gart_start = 0;
-   } else {
-   if (mc->gart_size > size_af) {
-   dev_warn(adev->dev, "limiting GART\n");
-   mc->gart_size = size_af;
-   }
-   /* VCE doesn't like it when BOs cross a 4GB segment, so align
-* the GART base on a 4GB boundary as well.
-*/
+   else
 mc->gart_start = ALIGN(mc->vram_end + 1, 0x100000000ULL);
-   }
 mc->gart_end = mc->gart_start + mc->gart_size - 1;
 dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
 mc->gart_size >> 20, mc->gart_start, mc->gart_end);
--
2.17.1



Re: [PATCH] drm/amdgpu: correctly sign extend 48bit addresses v3

2018-08-29 Thread Christian König

Am 29.08.2018 um 04:03 schrieb Zhang, Jerry (Junwei):

On 08/28/2018 08:17 PM, Christian König wrote:

Correctly sign extend the GMC addresses to 48 bits.


Could you explain a bit more why the sign needs to be extended?


The hardware works like this; in other words, when bit 47 is set we must
extend it into bits 48-63.
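
For readers following along, a minimal standalone illustration of that rule
(equivalent in effect to the amdgpu_gmc_sign_extend() helper the patch adds):

static inline uint64_t sign_extend_48(uint64_t addr)
{
	if (addr & (1ULL << 47))		/* bit 47 set? */
		addr |= 0xFFFF000000000000ULL;	/* replicate into bits 48-63 */
	return addr;
}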



The address is uint64_t. Does it fail in some cases?


What do you mean?



> -/* VA hole for 48bit addresses on Vega10 */
> -#define AMDGPU_VA_HOLE_START 0x0000800000000000ULL
> -#define AMDGPU_VA_HOLE_END    0xffff800000000000ULL

BTW, the hole for 48bit actually leaves only 47 bits below it; any background
for that?


Well bits start counting at zero. So the 48bit addresses have bits 0-47.
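
Spelled out with the constants from the patch (illustrative comment only):

/* 48-bit VA space with a canonical-address style hole:
 * valid low half:   0x0000000000000000 - 0x00007FFFFFFFFFFF
 * hole:             0x0000800000000000 - 0xFFFF7FFFFFFFFFFF
 * valid high half:  0xFFFF800000000000 - 0xFFFFFFFFFFFFFFFF
 * i.e. bit 47 is replicated into bits 48-63, as on x86-64. */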

Regards,
Christian.



Regards,
Jerry



v2: sign extending turned out easier than thought.
v3: clean up the defines and move them into amdgpu_gmc.h as well

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 10 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h    | 26 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  8 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   |  6 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  7 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 13 ---
  9 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

index 8c652ecc4f9a..bc5ccfca68c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -135,7 +135,7 @@ void amdgpu_amdkfd_device_init(struct 
amdgpu_device *adev)

  .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
  .gpuvm_size = min(adev->vm_manager.max_pfn
    << AMDGPU_GPU_PAGE_SHIFT,
-  AMDGPU_VA_HOLE_START),
+  AMDGPU_GMC_HOLE_START),
  .drm_render_minor = adev->ddev->render->index
  };

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index dd734970e167..ef2bfc04b41c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -835,7 +835,7 @@ static int amdgpu_cs_vm_handling(struct 
amdgpu_cs_parser *p)

  if (chunk->chunk_id != AMDGPU_CHUNK_ID_IB)
  continue;

-    va_start = chunk_ib->va_start & AMDGPU_VA_HOLE_MASK;
+    va_start = chunk_ib->va_start & AMDGPU_GMC_HOLE_MASK;
  r = amdgpu_cs_find_mapping(p, va_start, &aobj, &m);
  if (r) {
  DRM_ERROR("IB va_start is invalid\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c

index 71792d820ae0..d30a0838851b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -572,16 +572,16 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, 
void *data,

  return -EINVAL;
  }

-    if (args->va_address >= AMDGPU_VA_HOLE_START &&
-    args->va_address < AMDGPU_VA_HOLE_END) {
+    if (args->va_address >= AMDGPU_GMC_HOLE_START &&
+    args->va_address < AMDGPU_GMC_HOLE_END) {
  dev_dbg(&dev->pdev->dev,
  "va_address 0x%LX is in VA hole 0x%LX-0x%LX\n",
-    args->va_address, AMDGPU_VA_HOLE_START,
-    AMDGPU_VA_HOLE_END);
+    args->va_address, AMDGPU_GMC_HOLE_START,
+    AMDGPU_GMC_HOLE_END);
  return -EINVAL;
  }

-    args->va_address &= AMDGPU_VA_HOLE_MASK;
+    args->va_address &= AMDGPU_GMC_HOLE_MASK;

  if ((args->flags & ~valid_flags) && (args->flags & ~prt_flags)) {
  dev_dbg(>pdev->dev, "invalid flags combination 0x%08X\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h

index 0d2c9f65ca13..9d9c7a9f54e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -30,6 +30,19 @@

  #include "amdgpu_irq.h"

+/* VA hole for 48bit addresses on Vega10 */
+#define AMDGPU_GMC_HOLE_START    0x0000800000000000ULL
+#define AMDGPU_GMC_HOLE_END    0xffff800000000000ULL
+
+/*
+ * Hardware is programmed as if the hole doesn't exist with start and end
+ * address values.
+ *
+ * This mask is used to remove the upper 16 bits of the VA and so come up with
+ * the linear addr value.
+ */
+#define AMDGPU_GMC_HOLE_MASK    0x0000ffffffffffffULL
+
  struct firmware;

  /*
@@ -131,6 +144,19 @@ static inline bool 
amdgpu_gmc_vram_full_visible(struct amdgpu_gmc *gmc)

  return (gmc->real_vram_size == gmc->visible_vram_size);
  }

+/**
+ * amdgpu_gmc_sign_extend - sign extend the given gmc address
+ *
+ * @addr: address to extend
+ */
+static inline uint64_t amdgpu_gmc_sign_extend(uint64_t addr)
+{
+    if (addr >= 

Re: [PATCH] drm/amd/powerplay: expose vega20 OD features

2018-08-29 Thread Quan, Evan
I just sent out a new patch which reuses existing APIs. Let's move future 
discussion there.

Regards,
Evan

From: Quan, Evan
Sent: Wednesday, August 29, 2018 9:12 AM
To: Alex Deucher; Zhu, Rex
Cc: Deucher, Alexander; amd-gfx list; Xu, Feifei; Kuehling, Felix; Zhang, 
Hawking
Subject: Re: [PATCH] drm/amd/powerplay: expose vega20 OD features


Approach 3 seems not workable since we do not know the default value.

I would prefer approach 1 if we have to fit into the original APIs. Three
pairs of <frequency, voltage> are needed to recalculate the voltage curve.
Instead of silently calculating dpm_mid_level's offset in our driver, I would
prefer to have it specified by the user.


Also, we may need to update the description of the OD APIs to tell users about
the different behaviors on vega20 and previous ASICs.


/**
 * DOC: pp_od_clk_voltage
 *
 * The amdgpu driver provides a sysfs API for adjusting the clocks and voltages
 * in each power level within a power state.  The pp_od_clk_voltage is used for
 * this.
 *
 * Reading the file will display:
 *
 * - a list of engine clock levels and voltages labeled OD_SCLK
 *
 * - a list of memory clock levels and voltages labeled OD_MCLK
 *
 * - a list of valid ranges for sclk, mclk, and voltage labeled OD_RANGE
 *
 * To manually adjust these settings, first select manual using
 * power_dpm_force_performance_level. Enter a new value for each
 * level by writing a string that contains "s/m level clock voltage" to
 * the file.  E.g., "s 1 500 820" will update sclk level 1 to be 500 MHz
 * at 820 mV; "m 0 350 810" will update mclk level 0 to be 350 MHz at
 * 810 mV.  When you have edited all of the states as needed, write
 * "c" (commit) to the file to commit your changes.  If you want to reset to the
 * default power levels, write "r" (reset) to the file to reset them.
 *
 */
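
For reference, a minimal userspace sketch of driving this interface (the
card0 path and the values are examples; each command must be its own write):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Sketch: write one OD command, e.g. "s 1 500 820\n" or "c\n". */
static int od_write(const char *cmd)
{
	int fd = open("/sys/class/drm/card0/device/pp_od_clk_voltage",
		      O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, cmd, strlen(cmd));
	close(fd);
	return n < 0 ? -1 : 0;
}

/* usage: od_write("s 1 500 820\n"); od_write("c\n"); */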

Regards,

Evan


From: Alex Deucher 
Sent: Wednesday, August 29, 2018 1:26:42 AM
To: Zhu, Rex
Cc: Deucher, Alexander; Quan, Evan; amd-gfx list; Xu, Feifei; Kuehling, Felix; 
Zhang, Hawking
Subject: Re: [PATCH] drm/amd/powerplay: expose vega20 OD features

On Tue, Aug 28, 2018 at 12:35 PM Zhu, Rex  wrote:
>
> Got it. Thanks Alex.
>
>
>
> I think we can simplify the od_clk_voltage on vega20.
>
>
>
> When user cat pp_od_clk_voltage
>
> The output can be simplified as:
>
>
>
> Sclk:
>
> 0  clock_value   voltage_offset
>
> 7 clock_valuevoltage_offset
>
> Mclk
>
> 3  mclk_value
>
> Range:
>
> sclk {min, max}
>
> mclk {min, max}
>
> voltage{min, max}
>
>
>
> for sclk,
>
> user can only set the sclk in dpm level0 and level7, the clock in other 
> levels will be recalculated by smu automatically based on user’s setting.
>
>
>
> For voltage, the avfs is enabled, so the voltage in each level is not a 
> unique value.
>
> It is reasonable that export voltage offset to user.
>
> The voltage offset is 0 by default.
>
>
>
> The voltage offset in dpm_0, dpm_mid_level, dpm7_level can be configured.
>
> But in order to keep in consistent with sclk’s od,
>
> We only let the user change the offset in dpm0 and dpm7; for dpm_mid_level's
> offset, we can calculate it in the driver as (offset0 + offset7)/2.
>

Seems pretty good.  How about something like the following:

OD_SCLK:
min: clock_value
max: clock_value
OD_MCLK:
max: clock_value
OD_VDDC:
0: clock_value voltage_offset
1: clock_value voltage_offset
2: clock_value voltage_offset
OD_RANGE:
SCLK: clock_value   clock_value
MCLK: clock_value   clock_value
VDDC: voltage_valuevoltage_value

OD_SCLK:
min: 300Mhz
max: 1200Mhz
OD_MCLK:
max: 1500Mhz
OD_VDDC:
0: 300Mhz 0mV
1: 600Mhz 0mV
2: 1200Mhz 0mV
OD_RANGE:
SCLK: 300MHz   1200MHz
MCLK: 300MHz   1500MHz
VDDC: 700mV1200mV

That would give you the fullest control.  Alternatively, we could
simplify this a bit by deriving the mid point in the voltage curve.
E.g.,

OD_SCLK:
min: clock_value
max: clock_value
OD_MCLK:
max: clock_value
OD_VDDC:
min: clock_value voltage_offset
max: clock_value voltage_offset
OD_RANGE:
SCLK: clock_value   clock_value
MCLK: clock_value   clock_value
VDDC: voltage_valuevoltage_value

OD_SCLK:
min: 300Mhz
max: 1200Mhz
OD_MCLK:
max 1500Mhz
OD_VDDC:
min: 300Mhz 0mV
max: 1200Mhz 0mV
OD_RANGE:
SCLK: 300MHz   1200MHz
MCLK: 300MHz   1500MHz
VDDC: 700mV1200mV

I think we could also make the interface backwards compatible.  E.g.,

OD_SCLK:
0 300Mhz 900mV
7 1200Mhz 1000mV
OD_MCLK:
3 1500Mhz 900mV
OD_RANGE:
SCLK: 300MHz   1200MHz
MCLK: 300MHz   1500MHz
VDDC: 700mV1200mV

We can ignore the voltage for OD_MCLK and then calculate the offsets
based on the absolute value specified in OD_SCLK (e.g., default value
- specified value = offset), then calculate the midpoint based on the
end points.  It would be backwards compatible, but somewhat more
confusing for users.  Probably one of the first two would be better.
Thoughts?
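
For concreteness, the derivation in that last variant could look like this
(a standalone sketch; the struct and helper names are made up):

struct od_point { long clk_mhz, volt_mv; };

/* "default value - specified value = offset"; midpoint from the end points */
static void derive_offsets(struct od_point def_min, struct od_point def_max,
			   struct od_point usr_min, struct od_point usr_max,
			   long *off_min, long *off_mid, long *off_max)
{
	*off_min = def_min.volt_mv - usr_min.volt_mv;
	*off_max = def_max.volt_mv - usr_max.volt_mv;
	*off_mid = (*off_min + *off_max) / 2;
}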

Alex

>
>
> For mclk,
>
> Smu 

[PATCH] drm/amd/powerplay: added vega20 overdrive support

2018-08-29 Thread Evan Quan
Vega20 supports only sclk voltage overdrive. And user can
only tell three groups of <frequency, voltage>. SMU
firmware will recalculate the frequency/voltage curve. Other
intermediate levles will be stretched/shrunk accordingly.

Change-Id: I403cb38a95863db664cf06d030ac42a19bff6b33
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c|  26 +++
 .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 165 +-
 .../drm/amd/powerplay/hwmgr/vega20_hwmgr.h|   2 +
 3 files changed, 192 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index e2577518b9c6..6e0c8f583521 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -474,6 +474,8 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
  * in each power level within a power state.  The pp_od_clk_voltage is used for
  * this.
  *
+ * < For Vega10 and previous ASICs >
+ *
  * Reading the file will display:
  *
  * - a list of engine clock levels and voltages labeled OD_SCLK
@@ -491,6 +493,30 @@ static ssize_t amdgpu_set_pp_table(struct device *dev,
  * "c" (commit) to the file to commit your changes.  If you want to reset to 
the
  * default power levels, write "r" (reset) to the file to reset them.
  *
+ *
+ * < For Vega20 >
+ *
+ * Reading the file will display:
+ *
+ * - three groups of engine clock and voltage offset labeled OD_SCLK
+ *
+ * - a list of valid ranges for above three groups of sclk and voltage offset
+ *   labeled OD_RANGE
+ *
+ * To manually adjust these settings:
+ *
+ * - First select manual using power_dpm_force_performance_level
+ *
+ * - Enter a new value for each group by writing a string that contains
+ *   "s group_index clock voltage_offset" to the file.  E.g., "s 0 500 20"
+ *   will update group1 sclk to be 500 MHz with voltage increased 20mV
+ *
+ * - When you have edited all of the states as needed, write "c" (commit)
+ *   to the file to commit your changes
+ *
+ * - If you want to reset to the default power levels, write "r" (reset)
+ *   to the file to reset them
+ *
  */
 
 static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
index ececa2f7fe5f..9f6e070a76e0 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
@@ -1096,6 +1096,13 @@ static int vega20_od8_initialize_default_settings(
}
}
 
+   ret = vega20_copy_table_from_smc(hwmgr,
+   (uint8_t *)(&data->od_table),
+   TABLE_OVERDRIVE);
+   PP_ASSERT_WITH_CODE(!ret,
+   "Failed to export over drive table!",
+   return ret);
+
return 0;
 }
 
@@ -2506,11 +2513,112 @@ static int 
vega20_set_watermarks_for_clocks_ranges(struct pp_hwmgr *hwmgr,
return 0;
 }
 
+static int vega20_odn_edit_dpm_table(struct pp_hwmgr *hwmgr,
+   enum PP_OD_DPM_TABLE_COMMAND type,
+   long *input, uint32_t size)
+{
+   struct vega20_hwmgr *data =
+   (struct vega20_hwmgr *)(hwmgr->backend);
+   struct vega20_od8_single_setting *od8_settings =
+   data->od8_settings.od8_settings_array;
+   OverDriveTable_t od_table;
+   int32_t input_index, input_clk, input_vol, i;
+   int ret = 0;
+
+   PP_ASSERT_WITH_CODE(input, "NULL user input for clock and voltage",
+   return -EINVAL);
+
+   switch (type) {
+   case PP_OD_EDIT_SCLK_VDDC_TABLE:
+   if (!(od8_settings[OD8_SETTING_GFXCLK_FREQ1].feature_id &&
+   od8_settings[OD8_SETTING_GFXCLK_FREQ2].feature_id &&
+   od8_settings[OD8_SETTING_GFXCLK_FREQ3].feature_id &&
+   od8_settings[OD8_SETTING_GFXCLK_VOLTAGE1].feature_id &&
+   od8_settings[OD8_SETTING_GFXCLK_VOLTAGE2].feature_id &&
+   od8_settings[OD8_SETTING_GFXCLK_VOLTAGE3].feature_id)) {
+   pr_info("Sclk voltage overdrive not supported\n");
+   return 0;
+   }
+
+   for (i = 0; i < size; i += 3) {
+   if (i + 3 > size) {
+   pr_info("invalid clock voltage input\n");
+   return 0;
+   }
+
+   input_index = input[i];
+   input_clk = input[i + 1];
+   input_vol = input[i + 2];
+   if (input_index > 2 ||
+   input_clk < od8_settings[OD8_SETTING_GFXCLK_FREQ1 + 
input_index * 2].min_value ||
+   input_clk > od8_settings[OD8_SETTING_GFXCLK_FREQ1 + 
input_index * 2].max_value ||
+   input_vol < 

Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo

2018-08-29 Thread Zhang, Jerry (Junwei)

On 08/29/2018 04:53 PM, Christian König wrote:

Am 29.08.2018 um 04:52 schrieb Zhang, Jerry (Junwei):

On 08/28/2018 08:40 PM, Emily Deng wrote:

Fix the VMC page fault when the running sequence is as below:
1.amdgpu_gem_create_ioctl
2.ttm_bo_swapout->amdgpu_vm_bo_invalidate: since amdgpu_vm_bo_base_init has
not been called yet, list_add_tail(&base->bo_list, &bo->va) hasn't been
called either. So even though the bo was evicted, bo_base->moved won't be
set.


IMO, the evicted bo should be created previously.
On BO creation we will add it to the bo->va as below:

amdgpu_gem_create_ioctl
  drm_gem_handle_create
amdgpu_gem_object_open


And here is the problem. Between creating the BO and opening it in the client 
the BO can be evicted.


Thanks for explaining that.

It's the key point, and it falls under Murphy's law as well.

Jerry



That's what Emily's patch is handling here.

Christian.


amdgpu_vm_bo_add
amdgpu_vm_bo_base_init
  list_add_tail(&base->bo_list, &bo->va)

Then it could be set moved in bo invalidate when evicting.

Could you provide a bit more background about the issue?
It looks like a per-VM BO is evicted and a new identical BO is created.

Jerry


3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init: here only
list_move_tail(&base->vm_status, &vm->evicted) is called, but
bo_base->moved is not set.
4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map: as bo_base->moved is not set
to true, amdgpu_vm_bo_insert_map calls
list_move(&bo_va->base.vm_status, &vm->moved).
5.amdgpu_cs_ioctl won't validate the swapped-out bo, as it is only on the
moved list, not on the evicted list. So a VMC page fault occurs.

Signed-off-by: Emily Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 1f4b8df..015e20e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base 
*base,
   * is validated on next vm use to avoid fault.
   * */
  list_move_tail(&base->vm_status, &vm->evicted);
+base->moved = true;
  }

  /**




Re: [PATCH 2/2] drm/amdkfd: Release an acquired process vm

2018-08-29 Thread Christian König

Am 28.08.2018 um 19:34 schrieb Oak Zeng:

For compute vm acquired from amdgpu, vm.pasid is managed
by kfd. Decouple pasid from such vm on process destroy
to avoid duplicate pasid release.

Signed-off-by: Oak Zeng 


One minor nit pick below, apart from that Acked-by: Christian König 




---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 19 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 20 
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  1 +
  drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  4 +++-
  drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |  1 +
  9 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index e1f6454..b414889 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -165,6 +165,7 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev 
*kgd,
  void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
struct amdgpu_vm *vm);
  void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm);
+void amdgpu_amdkfd_gpuvm_release_process_vm(struct kgd_dev *kgd, void *vm);
  uint32_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *vm);
  int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
struct kgd_dev *kgd, uint64_t va, uint64_t size,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index ea79908..03a83d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -204,6 +204,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
.create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
.acquire_process_vm = amdgpu_amdkfd_gpuvm_acquire_process_vm,
.destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
+   .release_process_vm = amdgpu_amdkfd_gpuvm_release_process_vm,
.get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
.set_vm_context_page_table_base = set_vm_context_page_table_base,
.alloc_memory_of_gpu = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 19dd665..b2b9c5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -164,6 +164,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
.create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
.acquire_process_vm = amdgpu_amdkfd_gpuvm_acquire_process_vm,
.destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
+   .release_process_vm = amdgpu_amdkfd_gpuvm_release_process_vm,
.get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
.set_vm_context_page_table_base = set_vm_context_page_table_base,
.alloc_memory_of_gpu = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 1db60aa..3722bbd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -201,6 +201,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
.create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
.acquire_process_vm = amdgpu_amdkfd_gpuvm_acquire_process_vm,
.destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
+   .release_process_vm = amdgpu_amdkfd_gpuvm_release_process_vm,
.get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
.set_vm_context_page_table_base = set_vm_context_page_table_base,
.alloc_memory_of_gpu = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0980a1f..bc4413f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1117,6 +1117,25 @@ void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct 
kgd_dev *kgd, void *vm)
kfree(vm);
  }
  
+void amdgpu_amdkfd_gpuvm_release_process_vm(struct kgd_dev *kgd, void *vm)

+{
+   struct amdgpu_device *adev = get_amdgpu_device(kgd);
+struct amdgpu_vm *avm = (struct amdgpu_vm *)vm;
+
+   if (WARN_ON(!kgd || !vm))
+return;
+
+pr_debug("Releasing process vm %p\n", vm);
+
+/* The original pasid of amdgpu vm has already been
+ * released during making a amdgpu vm to a compute vm
+ * The current pasid 

Re: [PATCH v5 5/5] drm/amdgpu: move PD/PT bos on LRU again

2018-08-29 Thread Michel Dänzer
On 2018-08-29 10:57 a.m., Christian König wrote:
> Am 29.08.2018 um 09:52 schrieb Michel Dänzer:
>> On 2018-08-28 7:03 p.m., Michel Dänzer wrote:
>>> On 2018-08-28 11:14 a.m., Michel Dänzer wrote:
 On 2018-08-22 9:52 a.m., Huang Rui wrote:
> The new bulk moving functionality is ready, the overhead of moving
> PD/PT bos to
> LRU is fixed. So move them on LRU again.
>
> Signed-off-by: Huang Rui 
> Tested-by: Mike Lothian 
> Tested-by: Dieter Nützel 
> Acked-by: Chunming Zhou 
> Reviewed-by: Junwei Zhang 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index db1f28a..d195a3d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1107,7 +1107,7 @@ int amdgpu_vm_update_directories(struct
> amdgpu_device *adev,
>  struct amdgpu_vm_bo_base,
>  vm_status);
>   bo_base->moved = false;
> -    list_del_init(&bo_base->vm_status);
> +    list_move(&bo_base->vm_status, &vm->idle);
>     bo = bo_base->bo->parent;
>   if (!bo)
>
 Since this change, I'm getting various badness when running piglit
 using
 radeonsi on Bonaire, see the attached dmesg excerpt.

 Reverting just this change on top of current amd-staging-drm-next
 avoids
 the problem.

 Looks like some list manipulation isn't sufficiently protected against
 concurrent execution?
>>> KASAN pointed me to one issue:
>>> https://patchwork.freedesktop.org/patch/246212/
>>>
>>> However, this doesn't fully fix the problem.
>> Ray, any ideas yet for solving this? If not, let's revert this change
>> for now.
> 
> I've gone over this multiple times now as well, but can't find anything
> obvious wrong either.

Thanks for looking into it.


> If we don't have any more ideas I would say revert it for now and try to
> debug it further.

Yep.


> BTW: Any idea how to force the issue?

Not specifically. It happens reliably and pretty quickly for me when
running the piglit gpu profile.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH v5 5/5] drm/amdgpu: move PD/PT bos on LRU again

2018-08-29 Thread Christian König

Am 29.08.2018 um 09:52 schrieb Michel Dänzer:

On 2018-08-28 7:03 p.m., Michel Dänzer wrote:

On 2018-08-28 11:14 a.m., Michel Dänzer wrote:

On 2018-08-22 9:52 a.m., Huang Rui wrote:

The new bulk moving functionality is ready, the overhead of moving PD/PT bos to
LRU is fixed. So move them on LRU again.

Signed-off-by: Huang Rui 
Tested-by: Mike Lothian 
Tested-by: Dieter Nützel 
Acked-by: Chunming Zhou 
Reviewed-by: Junwei Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index db1f28a..d195a3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1107,7 +1107,7 @@ int amdgpu_vm_update_directories(struct amdgpu_device 
*adev,
   struct amdgpu_vm_bo_base,
   vm_status);
bo_base->moved = false;
-   list_del_init(&bo_base->vm_status);
+   list_move(&bo_base->vm_status, &vm->idle);
  
  		bo = bo_base->bo->parent;

if (!bo)


Since this change, I'm getting various badness when running piglit using
radeonsi on Bonaire, see the attached dmesg excerpt.

Reverting just this change on top of current amd-staging-drm-next avoids
the problem.

Looks like some list manipulation isn't sufficiently protected against
concurrent execution?

KASAN pointed me to one issue:
https://patchwork.freedesktop.org/patch/246212/

However, this doesn't fully fix the problem.

Ray, any ideas yet for solving this? If not, let's revert this change
for now.


I've gone over this multiple times now as well, but can't find anything 
obvious wrong either.


If we don't have any more ideas I would say revert it for now and try to 
debug it further.


BTW: Any idea how to force the issue?

Christian.


Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo

2018-08-29 Thread Christian König

Am 29.08.2018 um 04:52 schrieb Zhang, Jerry (Junwei):

On 08/28/2018 08:40 PM, Emily Deng wrote:

Fix the VMC page fault when the running sequence is as below:
1.amdgpu_gem_create_ioctl
2.ttm_bo_swapout->amdgpu_vm_bo_invalidate: since amdgpu_vm_bo_base_init has
not been called yet, list_add_tail(&base->bo_list, &bo->va) hasn't been
called either. So even though the bo was evicted, bo_base->moved won't be
set.


IMO, the evicted bo should be created previously.
On BO creation we will add it to the bo->va as below:

amdgpu_gem_create_ioctl
  drm_gem_handle_create
    amdgpu_gem_object_open


And here is the problem. Between creating the BO and opening it in the 
client the BO can be evicted.


That's what Emily's patch is handling here.

Christian.


amdgpu_vm_bo_add
    amdgpu_vm_bo_base_init
  list_add_tail(&base->bo_list, &bo->va)

Then it can have moved set in bo invalidate when it is evicted.

Could you provide a bit more background about the issue?
It looks like a per-VM bo is evicted and then the same bo is created anew.

Jerry


3. drm_gem_open_ioctl->amdgpu_vm_bo_base_init: here only
list_move_tail(&base->vm_status, &vm->evicted) is called, but
bo_base->moved is not set.
4. amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map: as bo_base->moved is not set
to true, amdgpu_vm_bo_insert_map calls
list_move(&bo_va->base.vm_status, &vm->moved).
5. amdgpu_cs_ioctl does not validate the swapped-out bo, as it is only on the
moved list, not on the evicted list. So a VMC page fault occurs.

Signed-off-by: Emily Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 1f4b8df..015e20e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,
 	 * is validated on next vm use to avoid fault.
 	 * */
 	list_move_tail(&base->vm_status, &vm->evicted);
+    base->moved = true;
  }

  /**
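
For context, the interaction described in steps 3-5 above boils down to a
check like the following in amdgpu_vm_bo_insert_map (a simplified sketch
reconstructed from the sequence described here, not a verbatim copy of the
kernel source):

	/* Simplified sketch: with base->moved left false, a per-VM BO that
	 * is still on the evicted list gets moved to the moved list here,
	 * so the CS path never validates it again and the GPU faults.
	 * Setting base->moved = true in amdgpu_vm_bo_base_init() makes
	 * this move a no-op for freshly evicted BOs. */
	if (bo && bo->tbo.resv == vm->root.base.bo->tbo.resv &&
	    !bo_va->base.moved)
		list_move(&bo_va->base.vm_status, &vm->moved);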




Re: [PATCH 01/10] drm/amdgpu: use only the lower address space on GMC9

2018-08-29 Thread Christian König

Completely agree with Felix here.

It makes system memory access slightly simpler, but I would say that
accidentally corrupting the GART table and accidentally corrupting the
AGP aperture are equally likely.


Regards,
Christian.

On 28.08.2018 20:12, Felix Kuehling wrote:

The GPU always has access to the entire guest physical address space.
You can just program arbitrary addresses into page table entries and set
the system bit. That's how GART and GPUVM mappings work. They will not
go through the AGP aperture. And there is no mechanism (other than an
IOMMU) to protect system memory from GPU access.
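
As a rough illustration of that point, a GPUVM/GART PTE for system memory is
just a DMA address plus flag bits. A sketch using the amdgpu PTE flag names;
the function name and address mask below are assumptions:

	/* Sketch: any page the CPU provided a DMA address for can be made
	 * GPU-visible by setting the system bit; nothing but an IOMMU
	 * checks whether the GPU should be touching that address. */
	static uint64_t make_system_pte(uint64_t dma_addr)
	{
		uint64_t pte = dma_addr & 0x0000fffffffff000ULL; /* assumed mask */

		return pte | AMDGPU_PTE_VALID | AMDGPU_PTE_SYSTEM |
		       AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE;
	}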

Regards,
   Felix


On 2018-08-28 07:41 AM, Christian König wrote:

Well that is indeed true, but we still have an IOMMU between the GPU and
the system memory.

So we should still catch issues when something goes terribly wrong.

In addition to that, only the system domain, e.g. kernel copies, page
table updates etc., is allowed to use it.

Regards,
Christian.

On 28.08.2018 09:06, Xiao, Jack wrote:

I mean it is risky to allow the GPU to access most system memory
without explicitly claiming it; that makes problems easier to mask.

Regards,
Jack

-Original Message-
From: Koenig, Christian
Sent: Tuesday, August 28, 2018 2:46 PM
To: Xiao, Jack ; Kuehling, Felix
; Christian König
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 01/10] drm/amdgpu: use only the lower address
space on GMC9


This patch series seems to allow the AGP aperture to access any
system memory (16GB), bypassing GPU VM protection.

The system aperture should only be active in the system domain,
otherwise applications would have access to local memory as well.

There are some bits in the VM registers to enable/disable that, but
if that setting were incorrect we would see quite a bunch of other
problems.

Might still be a good idea to double check if all the bits are set up
correctly.

Regards,
Christian.

On 28.08.2018 07:31, Xiao, Jack wrote:

This patch series seems to allow the AGP aperture to access any
system memory (16GB), bypassing GPU VM protection.
If some wrong logic requested an illegal address which
happened to be located inside the AGP aperture, then without any VM
protection the illegal address would finally be translated into a
system memory address; if the GPU read/wrote that system memory address,
which might belong to the kernel or any user
application, the r/w operation would result in unpredictable issues.
Most importantly, such issues are very hard to track down.
Is it worth doing this while exposing that risk?

Regards,
Jack

-Original Message-
From: amd-gfx  On Behalf Of
Felix Kuehling
Sent: Tuesday, August 28, 2018 3:03 AM
To: Christian König ;
amd-gfx@lists.freedesktop.org; Koenig, Christian

Subject: Re: [PATCH 01/10] drm/amdgpu: use only the lower address
space on GMC9

The point of this series seems to be to allow access to small system
memory BOs (one page) without a GART mapping. I'm guessing that
reduces pressure on the GART and removes the need for HDP and TLB
flushes. Why does Patch 10 only enable that on GFXv9? Is there less
benefit on older chips?

Is this related to your recent changes to allow page tables in
system memory?

See my replies to patch 6 and 8. Other than that, the series is
Acked-by: Felix Kuehling 

Regards,
     Felix


On 2018-08-27 12:53 PM, Christian König wrote:

Only use the lower address space on GMC9 for the system domain.
Otherwise we would need to sign extend GMC addresses.

Signed-off-by: Christian König 
---
    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++
    1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index e44b5191735d..d982956c8329 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -938,11 +938,10 @@ static int gmc_v9_0_sw_init(void *handle)
    if (r)
    return r;
-	/* Set the internal MC address mask
-	 * This is the max address of the GPU's
-	 * internal address space.
+	/* Use only the lower range for the internal MC address mask. This is
+	 * the max address of the GPU's internal address space.
 	 */
-	adev->gmc.mc_mask = 0xffffffffffffULL; /* 48 bit MC */
+	adev->gmc.mc_mask = 0x7fffffffffffULL;
       /* set DMA mask + need_dma32 flags.
     * PCIE - can handle 44-bits.



Re: [PATCH v5 5/5] drm/amdgpu: move PD/PT bos on LRU again

2018-08-29 Thread Michel Dänzer
On 2018-08-28 7:03 p.m., Michel Dänzer wrote:
> On 2018-08-28 11:14 a.m., Michel Dänzer wrote:
>> On 2018-08-22 9:52 a.m., Huang Rui wrote:
>>> The new bulk moving functionality is ready, and the overhead of moving
>>> PD/PT bos to the LRU is fixed. So move them on the LRU again.
>>>
>>> Signed-off-by: Huang Rui 
>>> Tested-by: Mike Lothian 
>>> Tested-by: Dieter Nützel 
>>> Acked-by: Chunming Zhou 
>>> Reviewed-by: Junwei Zhang 
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index db1f28a..d195a3d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -1107,7 +1107,7 @@ int amdgpu_vm_update_directories(struct amdgpu_device 
>>> *adev,
>>>struct amdgpu_vm_bo_base,
>>>vm_status);
>>> bo_base->moved = false;
>>> -	list_del_init(&bo_base->vm_status);
>>> +	list_move(&bo_base->vm_status, &vm->idle);
>>>  
>>> bo = bo_base->bo->parent;
>>> if (!bo)
>>>
>>
>> Since this change, I'm getting various badness when running piglit using
>> radeonsi on Bonaire, see the attached dmesg excerpt.
>>
>> Reverting just this change on top of current amd-staging-drm-next avoids
>> the problem.
>>
>> Looks like some list manipulation isn't sufficiently protected against
>> concurrent execution?
> 
> KASAN pointed me to one issue:
> https://patchwork.freedesktop.org/patch/246212/
> 
> However, this doesn't fully fix the problem.

Ray, any ideas yet for solving this? If not, let's revert this change
for now.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH] drm/amdgpu: add missing CHIP_HAINAN in amdgpu_ucode_get_load_type

2018-08-29 Thread Michel Dänzer
On 2018-08-28 9:19 p.m., Alex Deucher wrote:
> This caused a confusing error message, but there is functionally
> no problem since the default method is DIRECT.
> 
> Signed-off-by: Alex Deucher 
> Cc: sta...@vger.kernel.org
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> index b419d6e33b3a..a942fd28dae8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> @@ -277,6 +277,7 @@ amdgpu_ucode_get_load_type(struct amdgpu_device *adev, 
> int load_type)
>   case CHIP_PITCAIRN:
>   case CHIP_VERDE:
>   case CHIP_OLAND:
> + case CHIP_HAINAN:
>   return AMDGPU_FW_LOAD_DIRECT;
>  #endif
>  #ifdef CONFIG_DRM_AMDGPU_CIK
> 

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


RE: [PATCH 01/10] drm/amdgpu: use only the lower address space on GMC9

2018-08-29 Thread Xiao, Jack
> Well that is indeed true, but we still have an IOMMU between the GPU and 
> the system memory.
Some platforms have no IOMMU, or the IOMMU isn't enabled by default.
If such an issue were reported by a customer, it would be very hard to narrow down.

> In addition to that, only the system domain, e.g. kernel copies, page 
> table updates etc., is allowed to use it.
What is the motivation for this patch series?
I guess it is performance improvement, right? But AGP aperture memory is 
translated as mtype=UC, the same as GART (mtype=UC).
The extra overhead of GART is the GPUVM page table walker, but I guess that 
overhead is small.
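
For reference, the trade-off being discussed: an MC address that falls inside
the AGP aperture is forwarded to system memory without a page table walk,
while a GART access goes through the GPUVM walker first. A sketch of the
aperture test; the field names follow struct amdgpu_gmc but should be treated
as assumptions here:

	/* Sketch: addresses inside [agp_start, agp_end] bypass the GPUVM
	 * page tables entirely; everything else in the system domain still
	 * takes the GART translation path. */
	static bool mc_addr_uses_agp(const struct amdgpu_gmc *gmc,
				     uint64_t mc_addr)
	{
		return mc_addr >= gmc->agp_start && mc_addr <= gmc->agp_end;
	}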

Regards,
Jack

-Original Message-
From: Christian König  
Sent: Tuesday, August 28, 2018 7:42 PM
To: Xiao, Jack ; Koenig, Christian 
; Kuehling, Felix ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 01/10] drm/amdgpu: use only the lower address space on GMC9

Well that is indeed true, but we still have an IOMMU between the GPU and the 
system memory.

So we should still catch issues when something goes terribly wrong.

In addition to that, only the system domain, e.g. kernel copies, page table 
updates etc., is allowed to use it.

Regards,
Christian.

On 28.08.2018 09:06, Xiao, Jack wrote:
> I mean it is risky to allow the GPU to access most system memory 
> without explicitly claiming it; that makes problems easier to mask.
>
> Regards,
> Jack
>
> -Original Message-
> From: Koenig, Christian
> Sent: Tuesday, August 28, 2018 2:46 PM
> To: Xiao, Jack ; Kuehling, Felix 
> ; Christian König 
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 01/10] drm/amdgpu: use only the lower address 
> space on GMC9
>
>> This patch series seems to allow the AGP aperture to access any system 
>> memory (16GB), bypassing GPU VM protection.
> The system aperture should only be active in the system domain, otherwise 
> applications would have access to local memory as well.
>
> There are some bits in the VM registers to enable/disable that, but if 
> that setting were incorrect we would see quite a bunch of other 
> problems.
>
> Might still be a good idea to double check if all the bits are set up 
> correctly.
>
> Regards,
> Christian.
>
On 28.08.2018 07:31, Xiao, Jack wrote:
>> This patch series seems to allow the AGP aperture to access any system 
>> memory (16GB), bypassing GPU VM protection.
>> If some wrong logic requested an illegal address which 
>> happened to be located inside the AGP aperture, then without any VM 
>> protection the illegal address would finally be translated into a 
>> system memory address; if the GPU read/wrote that system memory address, 
>> which might belong to the kernel or any user application, the 
>> r/w operation would result in unpredictable issues.
>> Most importantly, such issues are very hard to track down.
>> Is it worth doing this while exposing that risk?
>>
>> Regards,
>> Jack
>>
>> -Original Message-
>> From: amd-gfx  On Behalf Of 
>> Felix Kuehling
>> Sent: Tuesday, August 28, 2018 3:03 AM
>> To: Christian König ;
>> amd-gfx@lists.freedesktop.org; Koenig, Christian 
>> 
>> Subject: Re: [PATCH 01/10] drm/amdgpu: use only the lower address 
>> space on GMC9
>>
>> The point of this series seems to be to allow access to small system memory 
>> BOs (one page) without a GART mapping. I'm guessing that reduces pressure on 
>> the GART and removes the need for HDP and TLB flushes. Why does Patch 10 
>> only enable that on GFXv9? Is there less benefit on older chips?
>>
>> Is this related to your recent changes to allow page tables in system memory?
>>
>> See my replies to patch 6 and 8. Other than that, the series is
>> Acked-by: Felix Kuehling 
>>
>> Regards,
>>     Felix
>>
>>
>> On 2018-08-27 12:53 PM, Christian König wrote:
>>> Only use the lower address space on GMC9 for the system domain.
>>> Otherwise we would need to sign extend GMC addresses.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>>drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++
>>>1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index e44b5191735d..d982956c8329 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -938,11 +938,10 @@ static int gmc_v9_0_sw_init(void *handle)
>>> if (r)
>>> return r;
>>>
>>> -	/* Set the internal MC address mask
>>> -	 * This is the max address of the GPU's
>>> -	 * internal address space.
>>> +	/* Use only the lower range for the internal MC address mask. This is
>>> +	 * the max address of the GPU's internal address space.
>>>  	 */
>>> -	adev->gmc.mc_mask = 0xffffffffffffULL; /* 48 bit MC */
>>> +	adev->gmc.mc_mask = 0x7fffffffffffULL;
>>>
>>> /* set DMA mask + need_dma32 flags.
>>>  * PCIE - can handle 44-bits.