from:"Deucher, Alexander"

RE: [PATCH v2] ASoC: amd: add support for rt5682 codec in machine driver

2021-03-15 Thread Deucher, Alexander

[AMD Public Use]



> -----Original Message-----
> From: Vijendar Mukunda 
> Sent: Saturday, March 13, 2021 12:16 AM
> To: broo...@kernel.org; alsa-de...@alsa-project.org
> Cc: Liang, Prike ; Deucher, Alexander
> ; Vemuri, Murali-krishna  krishna.vem...@amd.com>; Arya, Virendra-Pratap  pratap.a...@amd.com>; Mukunda, Vijendar
> ; Liam Girdwood ;
> Jaroslav Kysela ; Takashi Iwai ; Pierre-
> Louis Bossart ; Arnd Bergmann
> ; RAVULAPATI, VISHNU VARDHAN RAO
> ; Kuninori Morimoto
> ; Chuhong Yuan
> ; open list 
> Subject: [PATCH v2] ASoC: amd: add support for rt5682 codec in machine
> driver
> 
> Add support for RT5682 codec in machine driver.
> 
> Signed-off-by: Vijendar Mukunda 
> ---
> v1->v2 : updated kconfig, spdx license, removed unnecessary initialization and
> updated comment
>  sound/soc/amd/Kconfig                |   5 +-
>  sound/soc/amd/acp-da7219-max98357a.c | 380 +++
>  2 files changed, 343 insertions(+), 42 deletions(-)
> 
> diff --git a/sound/soc/amd/Kconfig b/sound/soc/amd/Kconfig
> index a6ce000..43f5d29 100644
> --- a/sound/soc/amd/Kconfig
> +++ b/sound/soc/amd/Kconfig
> @@ -5,14 +5,15 @@ config SND_SOC_AMD_ACP
>This option enables ACP DMA support on AMD platform.
> 
>  config SND_SOC_AMD_CZ_DA7219MX98357_MACH
> - tristate "AMD CZ support for DA7219 and MAX9835"
> + tristate "AMD CZ support for DA7219, RT5682 and MAX9835"
>   select SND_SOC_DA7219
> + select SND_SOC_RT5682_I2C
>   select SND_SOC_MAX98357A
>   select SND_SOC_ADAU7002
>   select REGULATOR
>   depends on SND_SOC_AMD_ACP && I2C && GPIOLIB
>   help
> -  This option enables machine driver for DA7219 and MAX9835.
> +  This option enables machine driver for DA7219, RT5682 and MAX9835.
> 
>  config SND_SOC_AMD_CZ_RT5645_MACH
>   tristate "AMD CZ support for RT5645"
> diff --git a/sound/soc/amd/acp-da7219-max98357a.c b/sound/soc/amd/acp-da7219-max98357a.c
> index 849288d..9b3520e 100644
> --- a/sound/soc/amd/acp-da7219-max98357a.c
> +++ b/sound/soc/amd/acp-da7219-max98357a.c
> @@ -1,27 +1,8 @@
> -/*
> - * Machine driver for AMD ACP Audio engine using DA7219 & MAX98357 codec
> - *
> - * Copyright 2017 Advanced Micro Devices, Inc.
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a
> - * copy of this software and associated documentation files (the "Software"),
> - * to deal in the Software without restriction, including without limitation
> - * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> - * and/or sell copies of the Software, and to permit persons to whom the
> - * Software is furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice shall be included in
> - * all copies or substantial portions of the Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> - * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> - * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> - * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> - * OTHER DEALINGS IN THE SOFTWARE.
> - *
> - */
> +// SPDX-License-Identifier: GPL-2.0+

You are changing the license here.  The original license was MIT.  This should be:
SPDX-License-Identifier: MIT

Alex
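The point of the comment above is that the SPDX tag has to name the license the file already carries (MIT), not a different one. Purely as an illustration, here is a small user-space C sketch of a check a reviewer script might run on the first line of the rewritten file; `spdx_id()` and `spdx_matches()` are hypothetical helpers, not an existing kernel tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical review helper (not part of the kernel tree): given the first
 * line of a source file, without its trailing newline, return a pointer to
 * the license expression that follows "SPDX-License-Identifier:", or NULL if
 * the line carries no SPDX tag.
 */
static const char *spdx_id(const char *line)
{
	static const char tag[] = "SPDX-License-Identifier:";
	const char *p;

	assert(line != NULL);
	p = strstr(line, tag);
	if (!p)
		return NULL;
	p += sizeof(tag) - 1;		/* step past the tag text */
	while (*p == ' ' || *p == '\t')
		p++;			/* skip blanks before the identifier */
	return *p ? p : NULL;
}

/* Returns 1 if the tag on @line names exactly @expected (e.g. "MIT"). */
static int spdx_matches(const char *line, const char *expected)
{
	const char *id = spdx_id(line);

	return id != NULL && strcmp(id, expected) == 0;
}
```

With these helpers, the v2 header `// SPDX-License-Identifier: GPL-2.0+` fails `spdx_matches(..., "MIT")`, while `// SPDX-License-Identifier: MIT` passes.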

> +//
> +// Machine driver for AMD ACP Audio engine using DA7219, RT5682 & MAX98357 codec
> +//
> +//Copyright 2017-2021 Advanced Micro Devices, Inc.
> 
>  #include 
>  #include 
> @@ -41,14 +22,19 @@
>  #include "acp.h"
>  #include "../codecs/da7219.h"
>  #include "../codecs/da7219-aad.h"
> +#include "../codecs/rt5682.h"
> 
>  #define CZ_PLAT_CLK 4800
>  #define DUAL_CHANNEL 2
> +#define RT5682_PLL_FREQ (48000 * 512)
> 
>  static struct snd_soc_jack cz_jack;
>  static struct clk *da7219_dai_wclk;
>  static struct clk *da7219_dai_bclk;
> -extern bool bt_uart_enable;
> +static struct clk *rt5682_dai_wclk;
> +static struct clk *rt5682_dai_bclk;
> +extern int bt_uart_enable;
> +void *soc_is_rltk_max(struct device *dev);
> 
>  static int cz_da7219_init(struct snd_soc_pcm_runtime *rtd)
>  {
> @@ -128,6 +114,88 @@ static void da7219_clk_disable(void)
>   clk_disable_unprepare(da7219_dai_bclk);
>  }
> 
> +static in
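The new `RT5682_PLL_FREQ` macro is 512 times a 48 kHz sample rate. A quick arithmetic check in C follows; it assumes 48 kHz stereo with 16-bit slots purely for illustration (the patch does not state a slot width), and `bclk_hz()` is a hypothetical helper. It shows the PLL output is 24.576 MHz and an exact integer multiple of both the frame rate and the resulting bit clock.

```c
#include <assert.h>

/* Constants copied from the quoted patch. */
#define DUAL_CHANNEL	2
#define RT5682_PLL_FREQ	(48000 * 512)	/* 24,576,000 Hz = 24.576 MHz */

/*
 * Hypothetical helper: I2S bit clock in Hz for a given sample rate, channel
 * count and slot width. The 16-bit slot width used in the example is an
 * assumption for illustration only.
 */
static long bclk_hz(long rate, int channels, int slot_bits)
{
	assert(rate > 0 && channels > 0 && slot_bits > 0);
	return rate * channels * slot_bits;
}
```

Under those assumptions the bit clock is 48000 * 2 * 16 = 1,536,000 Hz, so the PLL runs at 16 times BCLK and 512 times the frame rate.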

RE: amdgpu, WARNING: CPU: 12 PID: 389 at arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xd5/0x100

2021-03-12 Thread Deucher, Alexander

[AMD Public Use]

> -----Original Message-----
> From: Borislav Petkov 
> Sent: Friday, March 12, 2021 1:15 PM
> To: amd-...@lists.freedesktop.org
> Cc: Wentland, Harry ; Li, Sun peng (Leo)
> ; Deucher, Alexander
> ; Koenig, Christian
> ; lkml ; x86-ml
> 
> Subject: amdgpu, WARNING: CPU: 12 PID: 389 at
> arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xd5/0x100
> 
> Hi folks,
> 
> I get the below on -rc2+tip/master. I added printks to your FPU macros:
> 
> ---
> diff --git a/drivers/gpu/drm/amd/display/dc/os_types.h
> b/drivers/gpu/drm/amd/display/dc/os_types.h
> index 126c2f3a4dd3..49629dc03f99 100644
> --- a/drivers/gpu/drm/amd/display/dc/os_types.h
> +++ b/drivers/gpu/drm/amd/display/dc/os_types.h
> @@ -53,8 +53,18 @@
>  #if defined(CONFIG_DRM_AMD_DC_DCN)
>  #if defined(CONFIG_X86)
>  #include 
> -#define DC_FP_START() kernel_fpu_begin()
> -#define DC_FP_END() kernel_fpu_end()
> +#define DC_FP_START()\
> +({   \
> + pr_emerg("%s: DC_FP_START\n", __func__);\
> + kernel_fpu_begin(); \
> +})
> +
> +#define DC_FP_END()  \
> +({   \
> + pr_emerg("%s: DC_FP_END\n", __func__);  \
> + kernel_fpu_end();   \
> +})
> +
>  #elif defined(CONFIG_PPC64)
>  #include 
>  #include 
> 
> and I get wrong nesting of FPU usage with amdgpu:
> 
> ...
> [2.480080] [drm] reserve 0x40 from 0xf41f80 for PSP TMR
> [2.577011] amdgpu :06:00.0: amdgpu: RAS: optional ras ta ucode is not available
> [2.585556] amdgpu :06:00.0: amdgpu: RAP: optional rap ta ucode is not available
> [2.585567] amdgpu :06:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
> [2.586024] amdgpu :06:00.0: amdgpu: SMU is initialized successfully!
> [2.587396] [drm] kiq ring mec 2 pipe 1 q 0
> [2.588930] [drm] Display Core initialized with v3.2.122!
> [2.601665] [drm] DMUB hardware initialized: version=0x0100
> [2.620813] snd_hda_intel :06:00.1: bound :06:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
> [2.698383] input: TPPS/2 Elan TrackPoint as /devices/platform/i8042/serio1/serio2/input/input15
> [2.713147] [drm] VCN decode and encode initialized successfully(under DPG Mode).
> [2.713180] [drm] JPEG decode initialized successfully.
> [2.715003] kfd kfd: Allocated 3969056 bytes on gart
> [2.715251] Virtual CRAT table created for GPU
> [2.715412] amdgpu: Topology: Add dGPU node [0x1636:0x1002]
> [2.715421] kfd kfd: added device 1002:1636
> [2.715428] amdgpu :06:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 18, active_cu_number 27
> [2.716496] [drm] fb mappable at 0x410CE
> [2.716510] [drm] vram apper at 0x41000
> [2.716515] [drm] size 8294400
> [2.716518] [drm] fb depth is 24
> [2.716522] [drm]pitch is 7680
> [2.716710] fbcon: amdgpudrmfb (fb0) is primary device
> [2.716922] dcn21_validate_bandwidth: DC_FP_START
> [2.716969] patch_bounding_box: DC_FP_START
> 
> This should not happen. You need DC_FP_END before the next
> DC_FP_START because FPU usage cannot nest.
> 
> But who knows, maybe this is fixed already...

Should be fixed with these patches:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=15e8b95d5f7509e0b09289be8c422c459c9f0412
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=680174cfd1e1cea70a8f30ccb44d8fbdf996018e

Alex
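The printk trace shows `dcn21_validate_bandwidth` entering DC_FP_START and then `patch_bounding_box` entering it again before any DC_FP_END. The user-space C model below is an illustration only: a depth counter stands in for the kernel_fpu_begin()/kernel_fpu_end() state, and `model_fp_start()`, `model_fp_end()` and the caller/callee names are hypothetical. It shows why that call pattern trips the check and one way to restructure it (the callee relies on the caller's FPU section instead of opening its own). This is not necessarily what the linked commits do.

```c
#include <assert.h>
#include <stdio.h>

/*
 * User-space model only. In the kernel, kernel_fpu_begin() must not be
 * called again before kernel_fpu_end(); here a depth counter stands in for
 * that state so the bad nesting can be demonstrated outside the kernel.
 */
static int fpu_depth;
static int nesting_errors;

static void model_fp_start(const char *func)
{
	if (fpu_depth > 0) {
		/* Corresponds to the WARNING from kernel_fpu_begin_mask(). */
		fprintf(stderr, "%s: FP_START while FPU already in use\n", func);
		nesting_errors++;
	}
	fpu_depth++;
}

static void model_fp_end(void)
{
	assert(fpu_depth > 0);		/* every END needs a matching START */
	fpu_depth--;
}

/* Callee that opens its own FPU section, like the trace above. */
static void bounding_box_with_fp(void)
{
	model_fp_start(__func__);
	/* ... floating-point work ... */
	model_fp_end();
}

/* Caller that invokes it while its own section is still open: nests. */
static void validate_bandwidth_nested(void)
{
	model_fp_start(__func__);
	bounding_box_with_fp();
	model_fp_end();
}

/* Callee that requires the caller to hold the FPU section instead. */
static void bounding_box_caller_holds_fp(void)
{
	assert(fpu_depth > 0);		/* precondition: inside an FPU section */
	/* ... floating-point work ... */
}

/* Caller with exactly one START/END pair around all FPU work. */
static void validate_bandwidth_flat(void)
{
	model_fp_start(__func__);
	bounding_box_caller_holds_fp();
	model_fp_end();
}
```

Running `validate_bandwidth_nested()` records one nesting error, while `validate_bandwidth_flat()` records none; both leave the depth balanced at zero.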

> 
> [2.716973] ------------[ cut here ]------------
> [2.716974] WARNING: CPU: 12 PID: 389 at arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xd5/0x100
> [2.716986] Modules linked in: joydev edac_mce_amd edac_core iwlmvm
> kvm_amd mac80211 libarc4 kvm irqbypass crct10dif_pclmul crc32_pclmul
> iwlwifi crc32c_intel snd_hda_codec_realtek snd_hda_codec_generic
> amdgpu(+) ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_intel
> snd_intel_dspcfg snd_hda_codec rtsx_pci_sdmmc snd_hwdep mmc_core
> snd_hda_core aesni_intel libaes crypto_simd wmi_bmof thinkpad_acpi
> sp5100_tco snd_pcm cryptd nvram ucsi_acpi(+) ledtrig_audio watchdog rapl
> rtsx_pci platform_profile snd_timer pcspkr cfg80211 efi_pstore typec_ucsi
> k10temp ccp i2c_piix4 gpu_sched mfd_core r8169 roles snd typec wmi
> soundcore ac battery video i2c_scmi acpi_cpufreq button psmouse
> serio_raw nvme nvme_core
> [2.717057] CPU: 12 PID: 389 Comm: systemd-udevd Not tainted 5.12.0-rc2+ #1
> [2.717062] Hardware name: LENOVO 20Y2CC/

RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-12 Thread Deucher, Alexander

[AMD Public Use]

> -----Original Message-----
> From: David Hildenbrand 
> Sent: Friday, March 12, 2021 10:48 AM
> To: Deucher, Alexander ; linux-
> ker...@vger.kernel.org; amd-gfx list ;
> Andrew Morton ; Liang, Liang (Leo)
> 
> Cc: Huang, Ray ; Koenig, Christian
> ; Mike Rapoport ;
> Rafael J. Wysocki ; George Kennedy
> 
> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to
> tail in __free_pages_core()")
> 
> > 8G (with some carve out for the integrated GPU).
> > [0.044181] Memory: 6858688K/7200304K available (14345K kernel code, 9659K rwdata, 4980K rodata, 2484K init, 12292K bss, 341360K reserved, 0K cma-reserved)
> >
> > Nothing particularly special about these systems that I am aware of.  I'll see
> > if we can repro this issue on any other platforms, but so far, not one has
> > noticed any problems.
> >
> >>
> >> Increasing the boot time from a few seconds to 2-3 minutes does not
> >> smell like some corner case cache effects we might be hitting in this
> >> particular instance - there have been minor reports that it either
> >> slightly increased or slightly decreases initial system performance, but
> >> that was about it.
> >>
> >> Either, yet another latent BUG (but why? why should memory access
> >> suddenly be that slow? I could only guess that we are now making
> >> sooner use of very slow memory), or there is really something else weird
> >> going on.
> >
> > Looks like pretty much everything is slower based on the timestamps in the
> > dmesg output.  There is a big jump here:
> 
> If we're really dealing with some specific slow memory regions and that
> memory gets allocated for something that gets used regularly, then we might
> get a general slowdown. Hard to identify, though :)
> 
> >
> >> [3.758596] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
> >> [3.759372] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
> >> [   16.177983] ACPI: 13 ACPI AML tables successfully acquired and loaded
> >> [   17.099316] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
> >> [   18.969959] ACPI: EC: EC started
> >
> > And here:
> >
> >> [   36.566608] PCI: CLS 64 bytes, default 64
> >> [   36.575383] Trying to unpack rootfs image as initramfs...
> >> [   44.594348] Initramfs unpacking failed: Decoding failed
> >> [   44.765141] Freeing initrd memory: 46348K
> >
> > Also seeing soft lockups:
> >> [  124.588634] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/1:0]
> 
> Yes, I noticed that -- there is a heavy slowdown somewhere.
> 
> As that patch is v5.10 already (and we're close to v5.12) I assume something
> is particularly weird about the platform you are running on - because this is
> the first time I see a report like that.

Well, this platform is not yet widely available outside of AMD, so it's unlikely 
that anyone else has run into this yet.  Beyond that, there is nothing special 
about it compared to other AMD platforms that I am aware of.

Alex

RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-12 Thread Deucher, Alexander

[AMD Public Use]

> -----Original Message-----
> From: David Hildenbrand 
> Sent: Friday, March 12, 2021 9:12 AM
> To: Deucher, Alexander ; linux-
> ker...@vger.kernel.org; amd-gfx list ;
> Andrew Morton 
> Cc: Huang, Ray ; Koenig, Christian
> ; Liang, Liang (Leo) ;
> Mike Rapoport ; Rafael J. Wysocki
> ; George Kennedy 
> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to
> tail in __free_pages_core()")
> 
> On 12.03.21 15:06, Deucher, Alexander wrote:
> > [AMD Public Use]
> >
> >> -----Original Message-----
> >> From: David Hildenbrand 
> >> Sent: Thursday, March 11, 2021 10:03 AM
> >> To: Deucher, Alexander ; linux-
> >> ker...@vger.kernel.org; amd-gfx list ;
> >> Andrew Morton 
> >> Cc: Huang, Ray ; Koenig, Christian
> >> ; Liang, Liang (Leo)
> ;
> >> Mike Rapoport ; Rafael J. Wysocki
> >> ; George Kennedy 
> >> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages
> >> to tail in __free_pages_core()")
> >>
> >> On 11.03.21 15:41, Deucher, Alexander wrote:
> >>> [AMD Public Use]
> >>>
> >>> Booting kernels on certain AMD platforms takes 2-3 minutes with the
> >>> patch in the subject.  Reverting it restores quick boot times (few
> >>> seconds).  Any ideas?
> >>>
> >>
> >> Hi,
> >>
> >> We just discovered latent BUGs in ACPI code whereby ACPI tables are
> >> exposed to the page allocator as ordinary, free system RAM. With the
> >> patch you mention, the order in which pages get allocated from the
> >> page allocator are changed - which makes the BUG trigger more easily.
> >>
> >> I could imagine that someone allocates and uses that memory on your
> >> platform, and I could imagine that such accesses are very slow.
> >>
> >> I cannot tell if that is the root cause, but at least it would make sense.
> >>
> >> See
> >> https://lore.kernel.org/patchwork/patch/1389314/
> >>
> >> You might want to give that patch a try (not sure if it's the latest
> >> version). CCing George
> >
> > Thanks for the patch.  Unfortunately it didn't help.  Any other ideas?  Is
> > there a newer version of that patch?
> >
> 
> @George?
> 
> It's interesting that this only applies to these special AMD systems so far. Is
> there anything particular about these systems? How much memory do these
> systems have?

8G (with some carve out for the integrated GPU).
[0.044181] Memory: 6858688K/7200304K available (14345K kernel code, 9659K rwdata, 4980K rodata, 2484K init, 12292K bss, 341360K reserved, 0K cma-reserved)

Nothing particularly special about these systems that I am aware of.  I'll see 
if we can repro this issue on any other platforms, but so far, not one has 
noticed any problems.

> 
> Increasing the boot time from a few seconds to 2-3 minutes does not smell
> like some corner case cache effects we might be hitting in this particular
> instance - there have been minor reports that it either slightly increased or
> slightly decreases initial system performance, but that was about it.
> 
> Either, yet another latent BUG (but why? why should memory access
> suddenly be that slow? I could only guess that we are now making sooner
> use of very slow memory), or there is really something else weird going on.

Looks like pretty much everything is slower based on the timestamps in the 
dmesg output.  There is a big jump here:

> [3.758596] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
> [3.759372] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
> [   16.177983] ACPI: 13 ACPI AML tables successfully acquired and loaded
> [   17.099316] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
> [   18.969959] ACPI: EC: EC started

And here:

> [   36.566608] PCI: CLS 64 bytes, default 64
> [   36.575383] Trying to unpack rootfs image as initramfs...
> [   44.594348] Initramfs unpacking failed: Decoding failed
> [   44.765141] Freeing initrd memory: 46348K

Also seeing soft lockups:
> [  124.588634] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/1:0]

@Liang, Liang (Leo) can you attach the dmesg outputs with 7fef431be9c9 reverted 
and without?

Alex
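Spotting the jumps by eye works for short excerpts, but a small helper makes them easier to find across full logs. This user-space C sketch (`dmesg_ts()` and `largest_gap()` are hypothetical helpers, not existing tools) reports the biggest gap between consecutive timestamped lines; on the ACPI excerpt above it points at the roughly 12.4 s jump between 3.759372 and 16.177983.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the timestamp of a dmesg line such as "[   16.177983] ACPI: ...".
 * Returns 1 and stores the seconds in *ts on success, 0 otherwise.
 */
static int dmesg_ts(const char *line, double *ts)
{
	assert(line != NULL && ts != NULL);
	/* " [" skips leading blanks and matches '['; " %lf" skips padding. */
	return sscanf(line, " [ %lf]", ts) == 1;
}

/*
 * Find the largest jump between consecutive timestamped lines. Returns the
 * index of the line after the jump (or -1 if there is none) and stores the
 * jump in seconds in *gap. Lines without a timestamp are ignored.
 */
static int largest_gap(const char *lines[], int n, double *gap)
{
	double prev = 0.0, ts;
	int have_prev = 0, best = -1;

	*gap = 0.0;
	for (int i = 0; i < n; i++) {
		if (!dmesg_ts(lines[i], &ts))
			continue;
		if (have_prev && ts - prev > *gap) {
			*gap = ts - prev;
			best = i;
		}
		prev = ts;
		have_prev = 1;
	}
	return best;
}
```

On a live system, util-linux `dmesg -d` (`--show-delta`) prints the per-line deltas directly.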

> 
> Cheers!
> 
> > Alex
> 
> 
> --
> Thanks,
> 
> David / dhildenb

RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-12 Thread Deucher, Alexander

[AMD Public Use]

> -----Original Message-----
> From: David Hildenbrand 
> Sent: Thursday, March 11, 2021 10:03 AM
> To: Deucher, Alexander ; linux-
> ker...@vger.kernel.org; amd-gfx list ;
> Andrew Morton 
> Cc: Huang, Ray ; Koenig, Christian
> ; Liang, Liang (Leo) ;
> Mike Rapoport ; Rafael J. Wysocki
> ; George Kennedy 
> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to
> tail in __free_pages_core()")
> 
> On 11.03.21 15:41, Deucher, Alexander wrote:
> > [AMD Public Use]
> >
> > Booting kernels on certain AMD platforms takes 2-3 minutes with the patch
> > in the subject.  Reverting it restores quick boot times (few seconds).  Any
> > ideas?
> >
> 
> Hi,
> 
> We just discovered latent BUGs in ACPI code whereby ACPI tables are
> exposed to the page allocator as ordinary, free system RAM. With the
> patch you mention, the order in which pages get allocated from the page
> allocator are changed - which makes the BUG trigger more easily.
> 
> I could imagine that someone allocates and uses that memory on your
> platform, and I could imagine that such accesses are very slow.
> 
> I cannot tell if that is the root cause, but at least it would make sense.
> 
> See
> https://lore.kernel.org/patchwork/patch/1389314/
> 
> You might want to give that patch a try (not sure if it's the latest
> version). CCing George

Thanks for the patch.  Unfortunately it didn't help.  Any other ideas?  Is 
there a newer version of that patch?

Alex
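David's explanation hinges on allocation order: if freed pages are added to the tail rather than the head of a free list, a different page is handed out first. The toy C model below is not the kernel's buddy allocator; `struct freelist` and its helpers are hypothetical, and freeing pages in ascending order is an assumption made for illustration. With head insertion the first allocation returns the last page freed; with tail insertion it returns the first one. In this model, a range that should have stayed reserved (for example, ACPI tables) but was freed early would be reused much sooner under tail placement.

```c
#include <assert.h>

#define NPAGES 8

/*
 * Toy free list: a fixed array used as a deque. Valid entries are
 * pfn[head..tail-1]; allocation always takes the entry at the head.
 */
struct freelist {
	int pfn[2 * NPAGES];
	int head, tail;
};

static void fl_init(struct freelist *fl)
{
	fl->head = fl->tail = NPAGES;	/* start in the middle so both ends can grow */
}

static void free_to_head(struct freelist *fl, int pfn)
{
	assert(fl->head > 0);
	fl->pfn[--fl->head] = pfn;
}

static void free_to_tail(struct freelist *fl, int pfn)
{
	assert(fl->tail < 2 * NPAGES);
	fl->pfn[fl->tail++] = pfn;
}

static int alloc_first_free(struct freelist *fl)
{
	assert(fl->head < fl->tail);	/* list must not be empty */
	return fl->pfn[fl->head++];
}

/*
 * Free pages 0..NPAGES-1 in ascending order, then report which page the
 * first allocation returns under head (0) or tail (1) placement.
 */
static int first_alloc(int to_tail)
{
	struct freelist fl;

	fl_init(&fl);
	for (int pfn = 0; pfn < NPAGES; pfn++) {
		if (to_tail)
			free_to_tail(&fl, pfn);
		else
			free_to_head(&fl, pfn);
	}
	return alloc_first_free(&fl);
}
```

Here `first_alloc(0)` returns page 7 and `first_alloc(1)` returns page 0.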

> 
> Thanks
> 
> > Thanks,
> >
> > Alex
> >
> > [0.00] Linux version 5.11.0-7490c004ae7e (jenkins@24dbd4b4380b) (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU ld (GNU Binutils for Ubuntu) 2.30) #20210308 SMP Sun Mar 7 20:04:05 UTC 2021
> > [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.0-7490c004ae7e root=UUID=459758f3-5106-4173-b9bc-cf9d528828ec ro resume=UUID=23390f67-bbaf-42c1-b31d-64ef7288e39e amd_iommu=off nokaslr
> > [0.00] KERNEL supported cpus:
> > [0.00]   Intel GenuineIntel
> > [0.00]   AMD AuthenticAMD
> > [0.00]   Hygon HygonGenuine
> > [0.00]   Centaur CentaurHauls
> > [0.00]   zhaoxin   Shanghai
> > [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
> > [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> > [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> > [0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
> > [0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
> > [0.00] BIOS-provided physical RAM map:
> > [0.00] BIOS-e820: [mem 0x-0x0009efff] usable
> > [0.00] BIOS-e820: [mem 0x0009f000-0x000b] reserved
> > [0.00] BIOS-e820: [mem 0x0010-0x09af] usable
> > [0.00] BIOS-e820: [mem 0x09b0-0x09df] reserved
> > [0.00] BIOS-e820: [mem 0x09e0-0x09ef] usable
> > [0.00] BIOS-e820: [mem 0x09f0-0x09f10fff] ACPI NVS
> > [0.00] BIOS-e820: [mem 0x09f11000-0x6c56efff] usable
> > [0.00] BIOS-e820: [mem 0x6c56f000-0x6c56] reserved
> > [0.00] BIOS-e820: [mem 0x6c57-0x7877efff] usable
> > [0.00] BIOS-e820: [mem 0x7877f000-0x7af7efff] reserved
> > [0.00] BIOS-e820: [mem 0x7af7f000-0x7cf7efff] ACPI NVS
> > [0.00] BIOS-e820: [mem 0x7cf7f000-0x7cffefff] ACPI data
> > [0.00] BIOS-e820: [mem 0x7cfff000-0x7cff] usable
> > [0.00] BIOS-e820: [mem 0x7d00-0x7dff] reserved
> > [0.00] BIOS-e820: [mem 0x7f00-0x7fff] reserved
> > [0.00] BIOS-e820: [mem 0xa000-0xa00f] reserved
> > [0.00] BIOS-e820: [mem 0xf000-0xf7ff] reserved
> > [0.00] BIOS-e820: [mem 0xfec

slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-11 Thread Deucher, Alexander

[AMD Public Use]

Booting kernels on certain AMD platforms takes 2-3 minutes with the patch in 
the subject.  Reverting it restores quick boot times (few seconds).  Any ideas?

Thanks,

Alex

[0.00] Linux version 5.11.0-7490c004ae7e (jenkins@24dbd4b4380b) (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU ld (GNU Binutils for Ubuntu) 2.30) #20210308 SMP Sun Mar 7 20:04:05 UTC 2021
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.0-7490c004ae7e root=UUID=459758f3-5106-4173-b9bc-cf9d528828ec ro resume=UUID=23390f67-bbaf-42c1-b31d-64ef7288e39e amd_iommu=off nokaslr
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Hygon HygonGenuine
[0.00]   Centaur CentaurHauls
[0.00]   zhaoxin   Shanghai  
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009efff] usable
[0.00] BIOS-e820: [mem 0x0009f000-0x000b] reserved
[0.00] BIOS-e820: [mem 0x0010-0x09af] usable
[0.00] BIOS-e820: [mem 0x09b0-0x09df] reserved
[0.00] BIOS-e820: [mem 0x09e0-0x09ef] usable
[0.00] BIOS-e820: [mem 0x09f0-0x09f10fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x09f11000-0x6c56efff] usable
[0.00] BIOS-e820: [mem 0x6c56f000-0x6c56] reserved
[0.00] BIOS-e820: [mem 0x6c57-0x7877efff] usable
[0.00] BIOS-e820: [mem 0x7877f000-0x7af7efff] reserved
[0.00] BIOS-e820: [mem 0x7af7f000-0x7cf7efff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7cf7f000-0x7cffefff] ACPI data
[0.00] BIOS-e820: [mem 0x7cfff000-0x7cff] usable
[0.00] BIOS-e820: [mem 0x7d00-0x7dff] reserved
[0.00] BIOS-e820: [mem 0x7f00-0x7fff] reserved
[0.00] BIOS-e820: [mem 0xa000-0xa00f] reserved
[0.00] BIOS-e820: [mem 0xf000-0xf7ff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec01fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfec2-0xfec20fff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfed81fff] reserved
[0.00] BIOS-e820: [mem 0xfedc-0xfedd] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff08-0xffdd] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00023f37] usable
[0.00] BIOS-e820: [mem 0x00023f38-0x00027fff] reserved
[0.00] NX (Execute Disable) protection: active
[0.00] e820: update [mem 0x6a275018-0x6a283857] usable ==> usable
[0.00] e820: update [mem 0x6a275018-0x6a283857] usable ==> usable
[0.00] e820: update [mem 0x6c572018-0x6c57c657] usable ==> usable
[0.00] e820: update [mem 0x6c572018-0x6c57c657] usable ==> usable
[0.00] extended physical RAM map:
[0.00] reserve setup_data: [mem 0x-0x0009efff] usable
[0.00] reserve setup_data: [mem 0x0009f000-0x000b] reserved
[0.00] reserve setup_data: [mem 0x0010-0x09af] usable
[0.00] reserve setup_data: [mem 0x09b0-0x09df] reserved
[0.00] reserve setup_data: [mem 0x09e0-0x09ef] usable
[0.00] reserve setup_data: [mem 0x09f0-0x09f10fff] ACPI NVS
[0.00] reserve setup_data: [mem 0x09f11000-0x6a275017] usable
[0.00] reserve setup_data: [mem 0x6a275018-0x6a283857] usable
[0.00] reserve setup_data: [mem 0x6a283858-0x6c56efff] usable
[0.00] reserve setup_data: [mem 0x6c56f000-0x6c56] reserved
[0.00] reserve setup_data: [mem 0x6c57-0x6c572017] usable
[0.00] reserve setup_data: [mem 0x6c572018-0x6c57c657] usable
[0.00] reserve setup_data: [mem 0x6c57c658-0x7877efff] usable
[0.00] reserve setup_data: [mem 0x7877f000-0x7af7efff] reserved
[0.00] reserve setup_data: [mem

RE: [PATCH 5.11 079/104] drm/amdgpu: enable only one high prio compute queue

2021-03-05 Thread Deucher, Alexander

[AMD Public Use]

> -----Original Message-----
> From: Koenig, Christian 
> Sent: Friday, March 5, 2021 10:35 AM
> To: Sasha Levin ; Deucher, Alexander
> 
> Cc: Greg Kroah-Hartman ; linux-
> ker...@vger.kernel.org; sta...@vger.kernel.org; Das, Nirmoy
> 
> Subject: Re: [PATCH 5.11 079/104] drm/amdgpu: enable only one high prio
> compute queue
> 
> Am 05.03.21 um 16:31 schrieb Sasha Levin:
> > On Fri, Mar 05, 2021 at 03:27:00PM +, Deucher, Alexander wrote:
> >> Not sure if Sasha picked that up or not. Would need to check that. If
> >> it's not, this patch should be dropped.
> >
> > Yes, it went in via autosel. I can drop it if it's not needed.
> >
> 
> IIRC this patch was created *before* the feature which needs it was merged.
> So it isn't a bug fix, but rather just a prerequisite for a new feature.
> 
> Because of this it should only be merged into an older kernel if the new
> features is back ported as well.
> 
> Alex do you agree that we can drop it?

I think so, but I don't remember the exact sequence.  @Das, Nirmoy?

Alex

RE: [PATCH 5.11 079/104] drm/amdgpu: enable only one high prio compute queue

2021-03-05 Thread Deucher, Alexander

[AMD Public Use]

> -----Original Message-----
> From: Koenig, Christian 
> Sent: Friday, March 5, 2021 10:18 AM
> To: Deucher, Alexander ; Greg Kroah-
> Hartman ; linux-kernel@vger.kernel.org
> Cc: sta...@vger.kernel.org; Das, Nirmoy ; Sasha
> Levin 
> Subject: Re: [PATCH 5.11 079/104] drm/amdgpu: enable only one high prio
> compute queue
> 
> Am 05.03.21 um 15:48 schrieb Deucher, Alexander:
> > [AMD Public Use]
> >
> >> -----Original Message-----
> >> From: Koenig, Christian 
> >> Sent: Friday, March 5, 2021 8:03 AM
> >> To: Greg Kroah-Hartman ; linux-
> >> ker...@vger.kernel.org
> >> Cc: sta...@vger.kernel.org; Das, Nirmoy ;
> >> Deucher, Alexander ; Sasha Levin
> >> 
> >> Subject: Re: [PATCH 5.11 079/104] drm/amdgpu: enable only one high
> >> prio compute queue
> >>
> >> Mhm, I'm not sure this one needs to be backported.
> >>
> >> Why did you pick it up Greg?
> > It was picked up by Sasha's fixes checker.
> 
> Well the change who needs this isn't in any earlier kernel, isn't it?

Not sure if Sasha picked that up or not.  Would need to check that.  If it's 
not, this patch should be dropped.

Alex

> 
> Christian.
> 
> >
> > Alex
> >
> >
> >> Thanks,
> >> Christian.
> >>
> >> Am 05.03.21 um 13:21 schrieb Greg Kroah-Hartman:
> >>> From: Nirmoy Das 
> >>>
> >>> [ Upstream commit 8c0225d79273968a65e73a4204fba023ae02714d ]
> >>>
> >>> For high priority compute to work properly we need to enable wave
> >>> limiting on gfx pipe. Wave limiting is done through writing into
> >>> mmSPI_WCL_PIPE_PERCENT_GFX register. Enable only one high priority
> >>> compute queue to avoid race condition between multiple high priority
> >>> compute queues writing that register simultaneously.
> >>>
> >>> Signed-off-by: Nirmoy Das 
> >>> Acked-by: Christian König 
> >>> Reviewed-by: Alex Deucher 
> >>> Signed-off-by: Alex Deucher 
> >>> Signed-off-by: Sasha Levin 
> >>> ---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 15 ---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 +-
> >>>drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  |  6 ++
> >>>drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   |  6 ++
> >>>drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   |  7 ++-
> >>>5 files changed, 15 insertions(+), 21 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> index cd2c676a2797..8e0a6c62322e 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> @@ -193,15 +193,16 @@ static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)
> >>>}
> >>>
> >>>bool amdgpu_gfx_is_high_priority_compute_queue(struct amdgpu_device *adev,
> >>> -int pipe, int queue)
> >>> +struct amdgpu_ring *ring)
> >>>{
> >>> - bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
> >>> - int cond;
> >>> - /* Policy: alternate between normal and high priority */
> >>> - cond = multipipe_policy ? pipe : queue;
> >>> -
> >>> - return ((cond % 2) != 0);
> >>> + /* Policy: use 1st queue as high priority compute queue if we
> >>> +  * have more than one compute queue.
> >>> +  */
> >>> + if (adev->gfx.num_compute_rings > 1 &&
> >>> + ring == &adev->gfx.compute_ring[0])
> >>> + return true;
> >>>
> >>> + return false;
> >>>}
> >>>
> >>>void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> >>> index 6b5a8f4642cc..72dbcd2bc6a6 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> >>> @@ -380,7 +380,7 @@ void amdgpu_queue_mask_bit_to_mec_queue(struct amdgpu_device *adev, int bit,
> >>>bool amdgpu_gfx_is_mec_queue_enabled(struct amdgpu_device *adev,
> >&g

RE: [PATCH 5.11 079/104] drm/amdgpu: enable only one high prio compute queue

2021-03-05 Thread Deucher, Alexander

[AMD Public Use]

> -----Original Message-----
> From: Koenig, Christian 
> Sent: Friday, March 5, 2021 8:03 AM
> To: Greg Kroah-Hartman ; linux-
> ker...@vger.kernel.org
> Cc: sta...@vger.kernel.org; Das, Nirmoy ; Deucher,
> Alexander ; Sasha Levin
> 
> Subject: Re: [PATCH 5.11 079/104] drm/amdgpu: enable only one high prio
> compute queue
> 
> Mhm, I'm not sure this one needs to be backported.
> 
> Why did you pick it up Greg?

It was picked up by Sasha's fixes checker.  

Alex
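The quoted commit changes which compute rings are marked high priority. A simplified C model of both policies makes the difference concrete; `struct ring`, `struct gfx` and the helper names are stand-ins, not the real amdgpu definitions. With four rings on pipes 0 to 3 and the multipipe policy, the old parity rule marks two rings high priority, while the new rule marks only `compute_ring[0]`, so only one queue would program mmSPI_WCL_PIPE_PERCENT_GFX. With a single compute ring, the new rule marks none.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COMPUTE_RINGS 8

/* Simplified stand-ins for the amdgpu structures (not the real ones). */
struct ring {
	int pipe, queue;
};

struct gfx {
	int num_compute_rings;
	struct ring compute_ring[MAX_COMPUTE_RINGS];
};

/* Old policy: alternate normal/high priority by pipe (multipipe) or queue. */
static bool old_is_high_prio(bool multipipe, const struct ring *r)
{
	int cond = multipipe ? r->pipe : r->queue;

	return (cond % 2) != 0;
}

/* New policy: only the first compute ring, and only if there are several. */
static bool new_is_high_prio(const struct gfx *gfx, const struct ring *r)
{
	return gfx->num_compute_rings > 1 && r == &gfx->compute_ring[0];
}

/* Count how many rings a policy marks as high priority. */
static int count_high_prio(const struct gfx *gfx, bool use_new, bool multipipe)
{
	int n = 0;

	assert(gfx->num_compute_rings <= MAX_COMPUTE_RINGS);
	for (int i = 0; i < gfx->num_compute_rings; i++) {
		const struct ring *r = &gfx->compute_ring[i];

		if (use_new ? new_is_high_prio(gfx, r) : old_is_high_prio(multipipe, r))
			n++;
	}
	return n;
}
```

The design choice in the commit trades flexibility for safety: a single high-priority queue cannot race with another one over the shared wave-limit register.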


> 
> Thanks,
> Christian.
> 
> Am 05.03.21 um 13:21 schrieb Greg Kroah-Hartman:
> > From: Nirmoy Das 
> >
> > [ Upstream commit 8c0225d79273968a65e73a4204fba023ae02714d ]
> >
> > For high priority compute to work properly we need to enable wave
> > limiting on gfx pipe. Wave limiting is done through writing into
> > mmSPI_WCL_PIPE_PERCENT_GFX register. Enable only one high priority
> > compute queue to avoid race condition between multiple high priority
> > compute queues writing that register simultaneously.
> >
> > Signed-off-by: Nirmoy Das 
> > Acked-by: Christian König 
> > Reviewed-by: Alex Deucher 
> > Signed-off-by: Alex Deucher 
> > Signed-off-by: Sasha Levin 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 15 ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 +-
> >   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  |  6 ++
> >   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   |  6 ++
> >   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   |  7 ++-
> >   5 files changed, 15 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > index cd2c676a2797..8e0a6c62322e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > @@ -193,15 +193,16 @@ static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)
> >   }
> >
> >   bool amdgpu_gfx_is_high_priority_compute_queue(struct amdgpu_device *adev,
> > -  int pipe, int queue)
> > +  struct amdgpu_ring *ring)
> >   {
> > -   bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
> > -   int cond;
> > -   /* Policy: alternate between normal and high priority */
> > -   cond = multipipe_policy ? pipe : queue;
> > -
> > -   return ((cond % 2) != 0);
> > +   /* Policy: use 1st queue as high priority compute queue if we
> > +* have more than one compute queue.
> > +*/
> > +   if (adev->gfx.num_compute_rings > 1 &&
> > +   ring == &adev->gfx.compute_ring[0])
> > +   return true;
> >
> > +   return false;
> >   }
> >
> >   void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device
> *adev)
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index 6b5a8f4642cc..72dbcd2bc6a6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -380,7 +380,7 @@ void amdgpu_queue_mask_bit_to_mec_queue(struct amdgpu_device *adev, int bit,
> >   bool amdgpu_gfx_is_mec_queue_enabled(struct amdgpu_device *adev, int mec,
> >  int pipe, int queue);
> >   bool amdgpu_gfx_is_high_priority_compute_queue(struct amdgpu_device *adev,
> > -  int pipe, int queue);
> > +  struct amdgpu_ring *ring);
> >   int amdgpu_gfx_me_queue_to_bit(struct amdgpu_device *adev, int me,
> >int pipe, int queue);
> >   void amdgpu_gfx_bit_to_me_queue(struct amdgpu_device *adev, int bit,
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > index e7d6da05011f..3a291befcddc 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > @@ -4495,8 +4495,7 @@ static int gfx_v10_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
> > irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
> > + ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
> > + ring->pipe;
> > -   hw_prio = amdgpu_gfx_is_high_priority_compute_queue(adev, ring->pipe,
> > -   ring->queue) ?
> > +   hw_prio = amdgpu_gfx_is_high_priority_compute_queue(adev,
&
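
The policy change in the quoted patch can be modelled outside the driver as two predicates. This is a minimal userspace sketch, not amdgpu code: `struct model_gfx`, `old_is_high_prio()` and `new_is_high_prio()` are hypothetical stand-ins, and a ring is identified by its index instead of by comparing its pointer against `&adev->gfx.compute_ring[0]`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the adev->gfx fields the policy reads. */
struct model_gfx {
	int num_compute_rings;
	bool multipipe_capable;
};

/* Old policy: alternate normal/high priority by the parity of the
 * pipe (multipipe parts) or of the queue index (otherwise). */
static bool old_is_high_prio(const struct model_gfx *gfx, int pipe, int queue)
{
	int cond = gfx->multipipe_capable ? pipe : queue;

	return (cond % 2) != 0;
}

/* New policy: only compute ring 0 is high priority, and only when more
 * than one compute ring exists, so a single queue ends up writing the
 * mmSPI_WCL_PIPE_PERCENT_GFX wave-limit register. */
static bool new_is_high_prio(const struct model_gfx *gfx, int ring_index)
{
	return gfx->num_compute_rings > 1 && ring_index == 0;
}
```

For example, with `num_compute_rings = 8` on a multipipe part the old predicate is true for every odd pipe, while the new one is true for ring 0 only; with a single compute ring the new predicate never selects a high priority queue.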

RE: [PATCH 5.10 637/717] drm/amd/display: Fix memory leaks in S3 resume

2021-01-05 Thread Deucher, Alexander

[AMD Public Use]

> -Original Message-
> From: Andre Tomt 
> Sent: Tuesday, January 5, 2021 11:32 AM
> To: Greg Kroah-Hartman 
> Cc: linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Wentland, Harry
> ; Deucher, Alexander
> ; Kazlauskas, Nicholas
> ; Wang, Chao-kai (Stylon)
> 
> Subject: Re: [PATCH 5.10 637/717] drm/amd/display: Fix memory leaks in S3
> resume
> 
> On 05.01.2021 07:54, Greg Kroah-Hartman wrote:
> > On Mon, Jan 04, 2021 at 08:04:08PM +0100, Andre Tomt wrote:
> >> On 28.12.2020 13:50, Greg Kroah-Hartman wrote:
> >>> From: Stylon Wang 
> >>>
> >>> commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362 upstream.
> >>>
> >>> EDID parsing in S3 resume pushes new display modes to probed_modes
> >>> list but doesn't consolidate to actual mode list. This creates a
> >>> race condition when
> >>> amdgpu_dm_connector_ddc_get_modes() re-initializes the list head
> >>> without walking the list, which results in a memory leak.
> >>
> >> This commit is causing me problems on 5.10.4: when I turn off the
> >> display (an LG TV in this case), and turn it back on again later there
> >> is no video output and I get the following in the kernel log:
> >>
> >> [ 8245.259628] [drm:dm_restore_drm_connector_state [amdgpu]]
> *ERROR*
> >> Restoring old state failed with -12
> >>
> >> I've found another report on this commit as well:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=211033
> >>
> >> And I suspect this is the same:
> >>
> >> https://bugs.archlinux.org/task/69202
> >>
> >> Reverting it from 5.10.4 makes things behave again.
> >>
> >> Have not tested 5.4.86 or 5.11-rc.
> >>
> >> I'm using a RX570 Polaris based card.
> >
> > Can you test 5.11-rc to see if this issue is there as well?
> 
> Just did, and have the same issue on 5.11-rc2. Reverting it also solves the
> problem on 5.11-rc2, as it does on 5.10.4
> 
> FWIW one easy way to reproduce seems to be unplugging and re-plugging
> the HDMI.

We are looking into the root cause, but I'll send out the revert for now.

Thanks,

Alex
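
The leak described in the quoted commit message, re-initializing a list head without walking the list, can be illustrated in userspace. In this sketch `struct mode`, `mode_push()` and `modes_free_all()` are hypothetical; the real code works on the kernel's `list_head` lists of DRM display modes, which this does not reproduce.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a probed display mode. */
struct mode {
	int clock_khz;
	struct mode *next;
};

/* Prepend a mode; returns 0 on success, -1 if allocation fails. */
static int mode_push(struct mode **head, int clock_khz)
{
	struct mode *m = malloc(sizeof(*m));

	if (!m)
		return -1;
	m->clock_khz = clock_khz;
	m->next = *head;
	*head = m;
	return 0;
}

/*
 * Writing "*head = NULL;" on its own is the leaky pattern: every node
 * still linked from the old head becomes unreachable. Walking the list
 * and freeing each node first avoids that. Returns the number freed.
 */
static int modes_free_all(struct mode **head)
{
	struct mode *m = *head;
	int freed = 0;

	while (m) {
		struct mode *next = m->next; /* read before free() */

		free(m);
		m = next;
		freed++;
	}
	*head = NULL;
	return freed;
}
```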

RE: amd-pmc s2idle driver issues

2020-12-22 Thread Deucher, Alexander

[AMD Public Use]

> -Original Message-
> From: Alexander Monakov 
> Sent: Tuesday, December 22, 2020 10:57 AM
> To: Deucher, Alexander 
> Cc: platform-driver-...@vger.kernel.org; S-k, Shyam-sundar  sundar@amd.com>; Hans de Goede ; linux-
> ker...@vger.kernel.org
> Subject: RE: amd-pmc s2idle driver issues
> 
> On Tue, 22 Dec 2020, Deucher, Alexander wrote:
> 
> > > Yes. Out-of-the-box it's a "modern standby" laptop. There's a "hidden"
> > > bios menu with extra settings that apparently allows to select legacy S3.
> > > I did not change it, so I'm testing the "modern" mode.
> > >
> > > Note that this driver fetches SMU version from MMIO, which looks odd to me:
> > > elsewhere (i.e. in the amdgpu driver) SMU version is retrieved by
> > > issuing the corresponding SMU command, as far as I can tell.
> >
> > There are multiple interfaces to the SMU. It's shared by the entire
> > SoC on APUs.
> 
> Just pointing that out because evidently this interface does not work on this
> laptop, producing all-ones instead of something resembling a version
> number.
> 
> Which APU generations does this driver support? If it does not support Renoir
> (yet?) it should be documented in the Kconfig text. Is Renoir support related
> to missing AMD0005 ACPI id binding, and borked version number info?

The current code supports both Raven/Picasso and Renoir parts; at least some 
Renoir parts are supported, since that is what we are mainly testing now.  I'm not 
sure why some boards have AMDI0005 vs. AMD0005. We'll have to check with the 
sbios or Windows teams.

Alex
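
The binding problem reported in this thread is a table lookup: the posted ID table lists "AMDI0005" and "AMD0004", the laptop exposes "AMD0005", so nothing matches until that string is added. The sketch below models a sentinel-terminated table in userspace; `struct model_acpi_id` and `model_match()` are hypothetical and only mirror the shape of `struct acpi_device_id`, not the kernel's actual ACPI matching code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of an ID table entry: HID string plus driver data. */
struct model_acpi_id {
	const char *id;
	unsigned long driver_data;
};

/* Table as posted in the patch; the NULL entry terminates it. */
static const struct model_acpi_id posted_ids[] = {
	{ "AMDI0005", 0 },
	{ "AMD0004", 0 },
	{ NULL, 0 },
};

/* Same table with the ID reported on the Acer Swift SF314-42 added. */
static const struct model_acpi_id extended_ids[] = {
	{ "AMDI0005", 0 },
	{ "AMD0004", 0 },
	{ "AMD0005", 0 },
	{ NULL, 0 },
};

/* Return the matching entry, or NULL if hid is not listed. */
static const struct model_acpi_id *model_match(const struct model_acpi_id *tbl,
					       const char *hid)
{
	for (; tbl->id; tbl++)
		if (strcmp(tbl->id, hid) == 0)
			return tbl;
	return NULL;
}
```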

RE: amd-pmc s2idle driver issues

2020-12-22 Thread Deucher, Alexander

[AMD Public Use]

> -Original Message-
> From: Alexander Monakov 
> Sent: Tuesday, December 22, 2020 9:45 AM
> To: Deucher, Alexander 
> Cc: platform-driver-...@vger.kernel.org; S-k, Shyam-sundar  sundar@amd.com>; Hans de Goede ; linux-
> ker...@vger.kernel.org
> Subject: RE: amd-pmc s2idle driver issues
> 
> On Tue, 22 Dec 2020, Deucher, Alexander wrote:
> 
> > > I've tried the "platform/x86: amd-pmc: Add AMD platform support for S2Idle"
> > > patch on my Acer Swift SF314-42 laptop (Renoir SoC, Ryzen 4500U CPU)
> > > and hit the following issues:
> > >
> > > 1. The driver doesn't bind to any device. It has the following binding table:
> > >
> > > +static const struct acpi_device_id amd_pmc_acpi_ids[] = {
> > > + {"AMDI0005", 0},
> > > + {"AMD0004", 0},
> > > + { }
> > > +};
> > >
> > > This laptop has "AMD0005" instead. Adding it to the list allows the
> > > driver to successfully probe.
> > >
> > > 2. The debugfs interface does not seem to be very helpful. It shows
> > >
> > > SMU FW Info: 
> > >
> > > It's not very informative. The code seems to be fetching SMU version
> > > from mmio, so I guess the file should be saying "FW version" rather
> > > than "FW Info", and then, I think version number is not supposed to be "-1".
> > >
> >
> > Does your platform support modern standby?  You may have to select
> > between legacy S3 and modern standby in the sbios.
> 
> Yes. Out-of-the-box it's a "modern standby" laptop. There's a "hidden"
> bios menu with extra settings that apparently allows to select legacy S3.
> I did not change it, so I'm testing the "modern" mode.
> 
> Note that this driver fetches SMU version from MMIO, which looks odd to me:
> elsewhere (i.e. in the amdgpu driver) SMU version is retrieved by issuing the
> corresponding SMU command, as far as I can tell.

There are multiple interfaces to the SMU. It's shared by the entire SoC on APUs.

Alex
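
An all-ones MMIO readback (0xFFFFFFFF) is what the reporter describes, and it can be flagged instead of being shown as a version. This is a hedged sketch of a debugfs-style formatter; `smu_version_is_valid()` and `format_smu_version()` are hypothetical, and the value is printed as raw hex rather than split into fields, because the SMU version layout is not given in this thread.

```c
#include <stdint.h>
#include <stdio.h>

/* An all-ones MMIO read often means the register did not respond. */
#define MODEL_ALL_ONES 0xFFFFFFFFu

/* Hypothetical validity check for a raw SMU version readback. */
static int smu_version_is_valid(uint32_t raw)
{
	return raw != MODEL_ALL_ONES;
}

/*
 * Hypothetical formatter: writes one line into buf and returns the
 * snprintf() result. Invalid readbacks are labelled, not printed as a
 * version number.
 */
static int format_smu_version(char *buf, size_t len, uint32_t raw)
{
	if (!smu_version_is_valid(raw))
		return snprintf(buf, len,
				"SMU FW version: unavailable (read 0x%08x)\n",
				(unsigned int)raw);
	return snprintf(buf, len, "SMU FW version: 0x%08x\n",
			(unsigned int)raw);
}
```

Converting 0xFFFFFFFF to `int` and printing it with `%d` gives -1 with common compilers, which may be where a "-1" version comes from.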

RE: amd-pmc s2idle driver issues

2020-12-22 Thread Deucher, Alexander

[AMD Public Use]

> -Original Message-
> From: Alexander Monakov 
> Sent: Sunday, December 20, 2020 4:12 PM
> To: platform-driver-...@vger.kernel.org
> Cc: S-k, Shyam-sundar ; Hans de Goede
> ; Deucher, Alexander
> ; linux-kernel@vger.kernel.org
> Subject: amd-pmc s2idle driver issues
> 
> Hi folks,
> 
> I've tried the "platform/x86: amd-pmc: Add AMD platform support for S2Idle"
> patch on my Acer Swift SF314-42 laptop (Renoir SoC, Ryzen 4500U CPU) and
> hit the following issues:
> 
> 1. The driver doesn't bind to any device. It has the following binding table:
> 
> +static const struct acpi_device_id amd_pmc_acpi_ids[] = {
> + {"AMDI0005", 0},
> + {"AMD0004", 0},
> + { }
> +};
> 
> This laptop has "AMD0005" instead. Adding it to the list allows the driver to
> successfully probe.
> 
> 2. The debugfs interface does not seem to be very helpful. It shows
> 
> SMU FW Info: 
> 
> It's not very informative. The code seems to be fetching SMU version from
> mmio, so I guess the file should be saying "FW version" rather than "FW
> Info", and then, I think version number is not supposed to be "-1".
> 

Does your platform support modern standby?  You may have to select between 
legacy S3 and modern standby in the sbios.

> 
> (and I'm afraid I cannot use the driver, as there seems to be an issue with
> GPU resume: sometimes the screen is frozen or black after resume, so I
> need to reboot the laptop :( )

We are still working through various platform-specific sbios issues on some 
Renoir platforms.  We'll be sending out the appropriate quirks to handle them 
once we've sorted them all out.

Alex

RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-12-10 Thread Deucher, Alexander

[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Sent: Thursday, December 10, 2020 5:48 AM
> To: Deucher, Alexander ; Huang, Ray 
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-kernel@vger.kernel.org; 
> linux- p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> Alright. Done that.
> This should be it finally I believe.
> Which will be the initial kernel-version that incorporates that?

Looks good to me.  Bjorn, can you pick this up for PCI?

Alex

> 
> -----Original Message-
> From: Deucher, Alexander 
> Sent: Mittwoch, 9. Dezember 2020 15:24
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] ; 
> Huang, Ray ; Kuehling, Felix 
> 
> Cc: Will Deacon ; linux-kernel@vger.kernel.org; 
> linux- p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> [AMD Public Use]
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> > Sent: Wednesday, December 9, 2020 2:59 AM
> > To: Deucher, Alexander ; Huang, Ray 
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-kernel@vger.kernel.org;
> > linux- p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > Alex,
> >
> > I had to revise the patch. Please see attachment. It is actually two 
> > more SSIDs affected to that.
> 
> Other than some minor whitespace issues, the patch looks fine to me.
> Please align the subsystem_device lines and put the closing 
> parenthesis on the same line as the last check.
> 
> Thanks!
> 
> Alex
> 
> >
> > Best regards,
> > Edgar
> >
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > Sent: Dienstag, 8. Dezember 2020 09:23
> > To: 'Deucher, Alexander' ; 'Huang, Ray'
> > ; 'Kuehling, Felix' 
> > Cc: 'Will Deacon' ; 'linux-kernel@vger.kernel.org'
> > ; 'linux-...@vger.kernel.org'  > p...@vger.kernel.org>; 'io...@lists.linux-foundation.org'
> > ; 'Bjorn Helgaas'
> > ; 'Joerg Roedel' ; 'Zhu, 
> > Changfeng' 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > Applied the patch as in attachment. Verified that ATS for GPU-Device 
> > had been disabled. See attachment "dmesg_ATS.log".
> >
> > Was running that build overnight successfully.
> >
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > Sent: Montag, 7. Dezember 2020 05:53
> > To: Deucher, Alexander ; Huang, Ray 
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-kernel@vger.kernel.org;
> > linux- p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > Hi Alex,
> >
> > I believe in the patch file, this
> > +   (pdev->subsystem_device == 0x0c19 ||
> > +    pdev->subsystem_device == 0x0c10))
> >
> > Has to be changed to:
> > +   (pdev->subsystem_device == 0xce19 ||
> > +    pdev->subsystem_device == 0xcc10))
> >
> > Because our SSIDs are "ea50:ce19" and "ea50:cc10" respectively and 
> > another one would be "ea50:cc08".
> >
> > I will apply that patch and feedback the results soon plus the patch 
> > file that I actually had applied.
> >
> >
> > -Original Message-----
> > From: Deucher, Alexander 
> > Sent: Montag, 30. November 2020 19:36
> > To: Merger, Edgar [AUTOSOL/MAS/AUGS]
> ;
> > Huang, Ray ; Kuehling, Felix 
> > 
> > Cc: Will Deacon ; linux-kernel@vger.kernel.org;
> > linux- p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > [AMD Public Use]
> >
> > > -Original Message-
> > > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > 
> > > Sent: Thursday, November 26, 2020 4:24 AM
> > > To: Deucher, Alexander ; Huang, Ray 
> > > ; Kuehling, Felix 
> > >

RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-12-09 Thread Deucher, Alexander

[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Sent: Wednesday, December 9, 2020 2:59 AM
> To: Deucher, Alexander ; Huang, Ray 
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-kernel@vger.kernel.org; 
> linux- p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> Alex,
> 
> I had to revise the patch. Please see attachment. It is actually two 
> more SSIDs affected to that.

Other than some minor whitespace issues, the patch looks fine to me.  Please 
align the subsystem_device lines and put the closing parenthesis on the same 
line as the last check.

Thanks!

Alex
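
The review asks for the ATS quirk to apply only to the affected boards rather than to every part with this device ID. A hedged sketch of that matching logic as a standalone predicate: it requires the iGPU device ID 0x15d8 from Joerg's diff plus subsystem vendor 0xea50 and one of the subsystem device IDs listed in this thread (0xce19, 0xcc10, 0xcc08). `struct model_pci_dev` and `model_needs_no_ats()` are hypothetical; the real quirk operates on `struct pci_dev` in drivers/pci/quirks.c, and the revised patch itself was an attachment not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VENDOR_ATI	0x1002	/* PCI_VENDOR_ID_ATI */
#define MODEL_DEV_IGPU		0x15d8	/* device ID from Joerg's diff */

/* Hypothetical subset of the config-space IDs a quirk inspects. */
struct model_pci_dev {
	uint16_t vendor;
	uint16_t device;
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
};

/* True only for the boards reported in this thread. The subsystem_device
 * checks are aligned and the closing parenthesis sits on the last check,
 * as requested in the review. */
static bool model_needs_no_ats(const struct model_pci_dev *pdev)
{
	return pdev->vendor == MODEL_VENDOR_ATI &&
	       pdev->device == MODEL_DEV_IGPU &&
	       pdev->subsystem_vendor == 0xea50 &&
	       (pdev->subsystem_device == 0xce19 ||
		pdev->subsystem_device == 0xcc10 ||
		pdev->subsystem_device == 0xcc08);
}
```

Keying on the subsystem IDs keeps ATS, and therefore GPU compute, enabled on every other board that uses the same iGPU.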

> 
> Best regards,
> Edgar
> 
> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> Sent: Dienstag, 8. Dezember 2020 09:23
> To: 'Deucher, Alexander' ; 'Huang, Ray'
> ; 'Kuehling, Felix' 
> Cc: 'Will Deacon' ; 'linux-kernel@vger.kernel.org' 
> ; 'linux-...@vger.kernel.org'  p...@vger.kernel.org>; 'io...@lists.linux-foundation.org'
> ; 'Bjorn Helgaas'
> ; 'Joerg Roedel' ; 'Zhu, 
> Changfeng' 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> Applied the patch as in attachment. Verified that ATS for GPU-Device 
> had been disabled. See attachment "dmesg_ATS.log".
> 
> Was running that build overnight successfully.
> 
> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> Sent: Montag, 7. Dezember 2020 05:53
> To: Deucher, Alexander ; Huang, Ray 
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-kernel@vger.kernel.org; 
> linux- p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> Hi Alex,
> 
> I believe in the patch file, this
> + (pdev->subsystem_device == 0x0c19 ||
> +  pdev->subsystem_device == 0x0c10))
> 
> Has to be changed to:
> + (pdev->subsystem_device == 0xce19 ||
> +  pdev->subsystem_device == 0xcc10))
> 
> Because our SSIDs are "ea50:ce19" and "ea50:cc10" respectively and 
> another one would be "ea50:cc08".
> 
> I will apply that patch and feedback the results soon plus the patch 
> file that I actually had applied.
> 
> 
> -Original Message-
> From: Deucher, Alexander 
> Sent: Montag, 30. November 2020 19:36
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] ; 
> Huang, Ray ; Kuehling, Felix 
> 
> Cc: Will Deacon ; linux-kernel@vger.kernel.org; 
> linux- p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> [AMD Public Use]
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> > Sent: Thursday, November 26, 2020 4:24 AM
> > To: Deucher, Alexander ; Huang, Ray 
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-kernel@vger.kernel.org;
> > linux- p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > Alex,
> >
> > This is pretty much the same patch as what I have received from 
> > Joerg previously, except that it is tied to the particular Emerson 
> > platform and its derivatives (listed with Subsystem IDs).
> 
> Right.  As per my original point, I don't want to disable ATS on all 
> Picasso chips because doing so would break GPU compute on them, so I'd 
> like to apply this quirk as narrowly as possible.
> 
> >
> > Below patch was what Joerg provided me and I successfully tested.
> >
> > This diff to the kernel should do that:
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index f70692ac79c5..3911b0ec57ba 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
> >  /* AMD Navi14 dGPU */
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
> > +/* AMD Raven platform iGPU */
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
> >  #endif /* CONFIG_PC

RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-11-30 Thread Deucher, Alexander

[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> Sent: Thursday, November 26, 2020 4:24 AM
> To: Deucher, Alexander ; Huang, Ray
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-kernel@vger.kernel.org; linux-
> p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn Helgaas
> ; Joerg Roedel ; Zhu, Changfeng
> 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> Alex,
> 
> This is pretty much the same patch as what I have received from Joerg
> previously, except that it is tied to the particular Emerson platform and its
> derivatives (listed with Subsystem IDs).

Right.  As per my original point, I don't want to disable ATS on all Picasso 
chips because doing so would break GPU compute on them, so I'd like to apply 
this quirk as narrowly as possible.

> 
> Below patch was what Joerg provided me and I successfully tested.
> 
> This diff to the kernel should do that:
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f70692ac79c5..3911b0ec57ba 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats);
>  /* AMD Navi14 dGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats);
> +/* AMD Raven platform iGPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats);
>  #endif /* CONFIG_PCI_ATS */
> 
>  /* Freescale PCIe doesn't support MSI in RC mode */
> 
> So far I have seen this issue on two instances of this chip, but I must admit
> that I did test only two of them to this extent, so I guess it is not a bad 
> chip in
> particular, but the chips we use are from the same production lot, so it might
> be a systematic problem of that production lot?
> 
> UEFI-Setup shows:
> Processor Family: 17h
> Processor Model: 20h - 2Fh
> CPUID: 00820F01
> Microcode Patch Level: 8200103
> 
> Looking at the chip-die I found that this is a fully qualified IP Silicon 
> (according
> to Ryzen Embedded R1000 SOC Interlock).
> YE1305C9T20FG
> BI2015SUY
> 9JB6496P00123
> 2016 AMD
> DIFFUSED IN USA
> MADE IN CHINA
> 
> Currently used SBIOS is a branch from "EmbeddedPI-FP5 1.2.0.3RC3".
> 
> In the future our SBIOS might merge with EmbeddedPI-FP5_1.2.0.5RC3.
> 

I think it's more likely an sbios issue, so hopefully the new release fixes it.

Alex
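
The diff quoted above adds 1002:15d8 to a list of final fixups keyed only on vendor and device ID. A userspace model of that kind of dispatch shows why such an entry would hit every device reporting that vendor:device pair, whatever board it sits on, which is the concern about disabling ATS too broadly. `struct model_dev`, `struct model_fixup` and `model_apply_fixups()` are hypothetical and do not reproduce how DECLARE_PCI_FIXUP_FINAL is implemented internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device record: the IDs a fixup matches on, plus a flag. */
struct model_dev {
	uint16_t vendor;
	uint16_t device;
	int ats_disabled;
};

/* Hypothetical fixup entry: vendor/device key and a hook to run. */
struct model_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct model_dev *dev);
};

static void model_no_ats(struct model_dev *dev)
{
	dev->ats_disabled = 1;
}

/* Entries mirroring the diff context plus the proposed 0x15d8 line. */
static const struct model_fixup model_fixups[] = {
	{ 0x1002, 0x6900, model_no_ats },
	{ 0x1002, 0x7312, model_no_ats },
	{ 0x1002, 0x7340, model_no_ats },
	{ 0x1002, 0x15d8, model_no_ats },
};

/* Run every hook whose key matches; board identity plays no part. */
static void model_apply_fixups(struct model_dev *dev)
{
	size_t i;

	for (i = 0; i < sizeof(model_fixups) / sizeof(model_fixups[0]); i++)
		if (model_fixups[i].vendor == dev->vendor &&
		    model_fixups[i].device == dev->device)
			model_fixups[i].hook(dev);
}
```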

> 
> 
> 
> -Original Message-
> From: Deucher, Alexander 
> Sent: Mittwoch, 25. November 2020 17:08
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] ;
> Huang, Ray ; Kuehling, Felix
> 
> Cc: Will Deacon ; linux-kernel@vger.kernel.org; linux-
> p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn Helgaas
> ; Joerg Roedel ; Zhu, Changfeng
> 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> [AMD Public Use]
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> > Sent: Wednesday, November 25, 2020 5:04 AM
> > To: Deucher, Alexander ; Huang, Ray
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-kernel@vger.kernel.org;
> > linux- p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn
> > Helgaas ; Joerg Roedel ; Zhu,
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> > broken
> >
> > I do have also other problems with this unit, when IOMMU is enabled
> > and pci=noats is not set as kernel parameter.
> >
> > [ 2004.265906] amdgpu :0b:00.0: [drm:amdgpu_ib_ring_tests
> > [amdgpu]]
> > *ERROR* IB test failed on gfx (-110).
> > [ 2004.266024] [drm:amdgpu_device_delayed_init_work_handler
> [amdgpu]]
> > *ERROR* ib ring test failed (-110).
> >
> 
> Is this seen on all instances of this chip or only specific silicon?  I.e., 
> could this
> be a bad chip?  Would it be possible to test a newer sbios?  I think the
> attached patch should work if we can't get it fixed on the platform side.  It
> should only enable the quirk on your particular platform.
> 
> Alex
> 
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > Sent: Mittwoch, 25. November 2020 10:16
> > To: 'Deucher, Alexander' ; 'Huang, Ray'
> > ; 'Kuehling, Felix' 
> > Cc: 'Will Deacon' ; 'linux-kernel@vger.kernel.org'
> > ; 'linux-...@vger.kernel.org'  > p...@vger.kernel.org>; 'io...@lists.linux-foundation.org'
> > ; 'Bjorn Helgaas'
> > ; 'Joerg Roedel' ; 'Zhu,
> > Chan

RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-11-25 Thread Deucher, Alexander

[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> Sent: Wednesday, November 25, 2020 5:04 AM
> To: Deucher, Alexander ; Huang, Ray
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-kernel@vger.kernel.org; linux-
> p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn Helgaas
> ; Joerg Roedel ; Zhu, Changfeng
> 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> I do have also other problems with this unit, when IOMMU is enabled and
> pci=noats is not set as kernel parameter.
> 
> [ 2004.265906] amdgpu :0b:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]]
> *ERROR* IB test failed on gfx (-110).
> [ 2004.266024] [drm:amdgpu_device_delayed_init_work_handler [amdgpu]]
> *ERROR* ib ring test failed (-110).
> 

Is this seen on all instances of this chip or only specific silicon?  I.e., 
could this be a bad chip?  Would it be possible to test a newer sbios?  I think 
the attached patch should work if we can't get it fixed on the platform side.  
It should only enable the quirk on your particular platform.

Alex


> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> Sent: Mittwoch, 25. November 2020 10:16
> To: 'Deucher, Alexander' ; 'Huang, Ray'
> ; 'Kuehling, Felix' 
> Cc: 'Will Deacon' ; 'linux-kernel@vger.kernel.org'  ker...@vger.kernel.org>; 'linux-...@vger.kernel.org'  p...@vger.kernel.org>; 'io...@lists.linux-foundation.org'
> ; 'Bjorn Helgaas'
> ; 'Joerg Roedel' ; 'Zhu,
> Changfeng' 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> Remark:
> 
> Systems with R1305G APU (which show the issue) have the following VGA-
> Controller:
> 0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Picasso (rev cf)
> 
> Systems with V1404I APU (which do not show the issue) have the following
> VGA-Controller:
> 0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev 83)
> 
> "rev cf" vs. "rev 83" is probably what you were referring to with PCI Revision
> ID.
> 
> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> Sent: Mittwoch, 25. November 2020 07:05
> To: 'Deucher, Alexander' ; Huang, Ray
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-kernel@vger.kernel.org; linux-
> p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn Helgaas
> ; Joerg Roedel ; Zhu, Changfeng
> 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> I see that problem only on systems that use a R1305G APU
> 
> sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
> 
> shows
> 
> VCE feature version: 0, firmware version: 0x
> UVD feature version: 0, firmware version: 0x
> MC feature version: 0, firmware version: 0x
> ME feature version: 50, firmware version: 0x00a3
> PFP feature version: 50, firmware version: 0x00bb
> CE feature version: 50, firmware version: 0x004f
> RLC feature version: 1, firmware version: 0x0049
> RLC SRLC feature version: 1, firmware version: 0x0001
> RLC SRLG feature version: 1, firmware version: 0x0001
> RLC SRLS feature version: 1, firmware version: 0x0001
> MEC feature version: 50, firmware version: 0x01b5
> MEC2 feature version: 50, firmware version: 0x01b5
> SOS feature version: 0, firmware version: 0x
> ASD feature version: 0, firmware version: 0x2130
> TA XGMI feature version: 0, firmware version: 0x
> TA RAS feature version: 0, firmware version: 0x
> SMC feature version: 0, firmware version: 0x2527
> SDMA0 feature version: 41, firmware version: 0x00a9
> VCN feature version: 0, firmware version: 0x0110901c
> DMCU feature version: 0, firmware version: 0x0001
> VBIOS version: 113-RAVEN2-117
> 
> We are also using V1404I APU on the same boards and I haven't seen the
> issue on those boards
> 
> These boards give me slightly different info: sudo cat
> /sys/kernel/debug/dri/0/amdgpu_firmware_info
> 
> VCE feature version: 0, firmware version: 0x
> UVD feature version: 0, firmware version: 0x
> MC feature version: 0, firmware version: 0x
> ME feature version: 47, firmware version: 0x00a2
> PFP feature version: 47, firmware version: 0x00b9
> CE feature version: 47, firmware version: 0x004e
> RLC feature version: 1, firmware version: 0x0213
> RLC SRLC feature version: 1, firmware version: 0x0001
> RLC SRLG feature version: 1, firmware version: 0x0001
> RLC SRLS feature version: 1, firmware version: 0x0001
> MEC feature version: 47, firmware version: 0x01ab
> MEC2 feature
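
Comparing the two `amdgpu_firmware_info` dumps is easier entry by entry. A small parser sketch, assuming each entry has the form `NAME feature version: N, firmware version: 0xHEX` as in the quoted output; `struct fw_entry` and `parse_fw_line()` are hypothetical helpers, and lines in a different format, such as the VBIOS entry, are rejected.

```c
#include <stdio.h>
#include <string.h>

/* One parsed entry, e.g. name "ME", feature 50, version 0xa3. */
struct fw_entry {
	char name[32];
	unsigned int feature;
	unsigned int version;
};

/* Returns 1 on success, 0 if the line does not match the assumed format. */
static int parse_fw_line(const char *line, struct fw_entry *out)
{
	const char *sep = strstr(line, " feature version: ");
	size_t len;

	if (!sep)
		return 0;
	len = (size_t)(sep - line);
	if (len == 0 || len >= sizeof(out->name))
		return 0;
	memcpy(out->name, line, len);	/* component name, e.g. "RLC SRLC" */
	out->name[len] = '\0';

	/* %x accepts an optional 0x prefix on the firmware version. */
	if (sscanf(sep, " feature version: %u, firmware version: %x",
		   &out->feature, &out->version) != 2)
		return 0;
	return 1;
}
```

Parsing "ME feature version: 50, firmware version: 0x00a3" from the R1305G board and "ME feature version: 47, firmware version: 0x00a2" from the V1404I board gives name "ME" with feature 50 vs. 47 and version 0xa3 vs. 0xa2.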

RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-11-24 Thread Deucher, Alexander

[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> Sent: Tuesday, November 24, 2020 2:29 AM
> To: Huang, Ray ; Kuehling, Felix
> 
> Cc: Will Deacon ; Deucher, Alexander
> ; linux-kernel@vger.kernel.org; linux-
> p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn Helgaas
> ; Joerg Roedel ; Zhu, Changfeng
> 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as
> broken
> 
> Module Version : PiccasoCpu 10
> AGESA Version   : PiccasoPI 100A
> 
> I did not try to enter the system in any other way (like via ssh) than via
> Desktop.

You can get this information from the amdgpu driver.  E.g., sudo cat 
/sys/kernel/debug/dri/0/amdgpu_firmware_info .  Also what is the PCI revision 
id of your chip (from lspci)?  Also are you just seeing this on specific 
versions of the sbios?

Thanks,

Alex


> 
> -Original Message-
> From: Huang Rui 
> Sent: Dienstag, 24. November 2020 07:43
> To: Kuehling, Felix 
> Cc: Will Deacon ; Deucher, Alexander
> ; linux-kernel@vger.kernel.org; linux-
> p...@vger.kernel.org; io...@lists.linux-foundation.org; Bjorn Helgaas
> ; Merger, Edgar [AUTOSOL/MAS/AUGS]
> ; Joerg Roedel ;
> Changfeng Zhu 
> Subject: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken
> 
> On Tue, Nov 24, 2020 at 06:51:11AM +0800, Kuehling, Felix wrote:
> > On 2020-11-23 5:33 p.m., Will Deacon wrote:
> > > On Mon, Nov 23, 2020 at 09:04:14PM +, Deucher, Alexander wrote:
> > >> [AMD Public Use]
> > >>
> > >>> -Original Message-
> > >>> From: Will Deacon 
> > >>> Sent: Monday, November 23, 2020 8:44 AM
> > >>> To: linux-kernel@vger.kernel.org
> > >>> Cc: linux-...@vger.kernel.org; io...@lists.linux-foundation.org;
> > >>> Will Deacon ; Bjorn Helgaas
> > >>> ; Deucher, Alexander
> > >>> ; Edgar Merger
> > >>> ; Joerg Roedel 
> > >>> Subject: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken
> > >>>
> > >>> Edgar Merger reports that the AMD Raven GPU does not work reliably
> > >>> on his system when the IOMMU is enabled:
> > >>>
> > >>>| [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
> > >>> signaled seq=1, emitted seq=3
> > >>>| [...]
> > >>>| amdgpu :0b:00.0: GPU reset begin!
> > >>>| AMD-Vi: Completion-Wait loop timed out
> > >>>| iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT
> > >>> device=0b:00.0 address=0x38edc0970]
> > >>>
> > >>> This is indicative of a hardware/platform configuration issue so,
> > >>> since disabling ATS has been shown to resolve the problem, add a
> > >>> quirk to match this particular device while Edgar follows-up with AMD
> for more information.
> > >>>
> > >>> Cc: Bjorn Helgaas 
> > >>> Cc: Alex Deucher 
> > >>> Reported-by: Edgar Merger 
> > >>> Suggested-by: Joerg Roedel 
> > >>> Link:
> > >>> https://lore.kernel.org/linux-iommu/MWHPR10MB1310F042A30661D4158520B589FC0@MWHPR10MB1310.namprd10.prod.outlook.com
> > >>> Signed-off-by: Will Deacon 
> > >>> ---
> > >>>
> > >>> Hi all,
> > >>>
> > >>> Since Joerg is away at the moment, I'm posting this to try to mak

RE: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-11-23 Thread Deucher, Alexander

[AMD Public Use]

> -Original Message-
> From: Will Deacon 
> Sent: Monday, November 23, 2020 8:44 AM
> To: linux-kernel@vger.kernel.org
> Cc: linux-...@vger.kernel.org; io...@lists.linux-foundation.org; Will
> Deacon ; Bjorn Helgaas ;
> Deucher, Alexander ; Edgar Merger
> ; Joerg Roedel 
> Subject: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken
> 
> Edgar Merger reports that the AMD Raven GPU does not work reliably on his
> system when the IOMMU is enabled:
> 
>   | [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
> signaled seq=1, emitted seq=3
>   | [...]
>   | amdgpu :0b:00.0: GPU reset begin!
>   | AMD-Vi: Completion-Wait loop timed out
>   | iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT
> device=0b:00.0 address=0x38edc0970]
> 
> This is indicative of a hardware/platform configuration issue so, since
> disabling ATS has been shown to resolve the problem, add a quirk to match
> this particular device while Edgar follows-up with AMD for more information.
> 
> Cc: Bjorn Helgaas 
> Cc: Alex Deucher 
> Reported-by: Edgar Merger 
> Suggested-by: Joerg Roedel 
> Link: https://lore.kernel.org/linux-iommu/MWHPR10MB1310F042A30661D4158520B589FC0@MWHPR10MB1310.namprd10.prod.outlook.com
> Signed-off-by: Will Deacon 
> ---
> 
> Hi all,
> 
> Since Joerg is away at the moment, I'm posting this to try to make some
> progress with the thread in the Link: tag.

+ Felix

What system is this?  Can you provide more details?  Does an SBIOS update fix
this?  Disabling ATS for all Ravens will break GPU compute for a lot of people.
I'd prefer to just blacklist this particular system (e.g., by SSIDs or
revision) if possible.

Alex
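For illustration only: a userspace sketch of the narrower match suggested above, keying the ATS quirk on subsystem IDs and/or revision instead of on the device ID alone. The struct, the ANY_ID wildcard and the SSID values in the table are hypothetical placeholders, not a real affected system; in an actual quirk the same fields would come from pdev->subsystem_vendor, pdev->subsystem_device and pdev->revision.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wildcard, similar in spirit to the kernel's PCI_ANY_ID. */
#define ANY_ID 0xffffffffu

struct pci_ids {
	uint32_t vendor, device, subsystem_vendor, subsystem_device;
	uint32_t revision;
};

/* Hypothetical entry: 0x1234/0x5678 are placeholder SSIDs, not a real system. */
static const struct pci_ids ats_blacklist[] = {
	{ 0x1002, 0x15d8, 0x1234, 0x5678, ANY_ID },
};

static bool field_matches(uint32_t want, uint32_t have)
{
	return want == ANY_ID || want == have;
}

/* True only if every field of some table entry matches the device. */
bool ats_is_blacklisted(const struct pci_ids *dev)
{
	for (size_t i = 0; i < sizeof(ats_blacklist) / sizeof(ats_blacklist[0]); i++) {
		const struct pci_ids *e = &ats_blacklist[i];

		if (field_matches(e->vendor, dev->vendor) &&
		    field_matches(e->device, dev->device) &&
		    field_matches(e->subsystem_vendor, dev->subsystem_vendor) &&
		    field_matches(e->subsystem_device, dev->subsystem_device) &&
		    field_matches(e->revision, dev->revision))
			return true;
	}
	return false;
}
```

With this shape, other Raven systems sharing device ID 0x15d8 but carrying different SSIDs keep ATS enabled.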

> 
> Cheers,
> 
> Will
> 
>  drivers/pci/quirks.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index
> f70692ac79c5..3911b0ec57ba 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI,
> 0x6900, quirk_amd_harvest_no_ats);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312,
> quirk_amd_harvest_no_ats);
>  /* AMD Navi14 dGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340,
> quirk_amd_harvest_no_ats);
> +/* AMD Raven platform iGPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8,
> +quirk_amd_harvest_no_ats);
>  #endif /* CONFIG_PCI_ATS */
> 
>  /* Freescale PCIe doesn't support MSI in RC mode */
> --
> 2.29.2.454.gaff20da3a2-goog

RE: On disabling AGP without working alternative (PCI fallback is broken for years)

2020-11-09 Thread Deucher, Alexander

> -Original Message-
> From: Thomas “illwieckz“ Debesse 
> Sent: Monday, November 9, 2020 6:41 AM
> To: LKML 
> Cc: Koenig, Christian ; Deucher, Alexander 
> 
> Subject: On disabling AGP without working alternative (PCI fallback is 
> broken for years)
> 
> Hi, on May 12 2020, a commit (ba806f9) was merged disabling AGP in 
> default build.
> 
> It was signed-off by Christian König and Reviewed by Alex Deucher.
> Distributions started to backport this commit, and it seems to have 
> happened with 5.4.0-48-generic on Ubuntu 20.04 LTS side, which was 
> built on Sep 10 2020.
> 
> Around that time I noticed AGP computers experiencing lock-ups and
> other problems that made them unusable after the upgrade. After
> bisecting Linux versions to investigate, I reverted the commit and
> those computers were working again.
> 
> Commit message was:
> 
> > This means a performance regression for some GPUs, but also a bug 
> > fix for some others.
> 
> Unfortunately, this commit does not only introduce a performance 
> regression but makes some computers unusable, maybe all computers with 
> AMD CPUs.
> 
> One of the root causes may be that PCI GPUs have been broken for years
> on AMD platforms; this was tested and verified on:
> 
> - K8-based computer with AGP
> - K8-based computer with PCI Express
> - K10-based computer with AGP
> - Piledriver-based computer with PCI Express
> 
> The breakage was tested and reproduced from Linux 4.4 to Linux
> 5.10-rc2 (I have not tried older than 4.4).
> 
> PCI GPUs may be broken on some other platforms, but I have found that 
> testing on an Intel PC (with PCI Express) does not reproduce the issue 
> when the PCI GPU hardware is plugged in.
> 
> There are two patches I'm requesting comments for:
> 
> ## drm/radeon: make all PCI GPUs use 32 bits DMA bit mask
> 
> https://lkml.org/lkml/2020/11/5/307
> 
> This one is not enough to fix PCI GPUs, but it is enough to prevent
> r600_ring_test from failing on ATI PCI devices. Note that Nvidia PCI GPUs
> can't be fixed by this, and it uncovers another bug with AGP GPUs when
> AGP is disabled at build time. Also, this patch may make PCI GPUs work
> in a non-optimal way on platforms that accept them with a 40-bit DMA
> bit mask (like Intel-based computers, which already work without any
> patch).
> 
> This patch is inspired by the patch made in 2012 on kernel 3.5 to
> solve that issue: https://bugzilla.redhat.com/show_bug.cgi?id=785375
> 
> At the time, such a change may have been enough to fix the issue, but
> that is no longer true. More breakage may have been introduced since.
> 
> Also, maybe this patch becomes useless when other PCI bugs are fixed, 
> who knows? At least, this is an entry-point for investigations.

I think you may be seeing fallout from this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=33b3ad3788aba846fc8b9a065fe2685a0b64f713
That patch led to screen corruption and other issues on older Radeons.  It
seemed to be related to AGP and/or HIGHMEM.  Disabling either of those fixes the
issues.
I proposed reverting the change, but there was push back to find the root cause:
https://www.spinics.net/lists/stable/msg413960.html
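As a rough model of the first patch quoted above ("drm/radeon: make all PCI GPUs use 32 bits DMA bit mask"), the sketch below picks a DMA address width from the bus type and checks whether a buffer is reachable under that mask. DMA_BIT_MASK() is written the same way as the kernel macro; the gpu_info struct and its flags are hypothetical stand-ins for the radeon bus flags, and 40 bits is assumed as the unrestricted width mentioned in the quoted text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical flags standing in for the radeon bus-type flags. */
struct gpu_info {
	bool is_agp;
	bool is_pci;   /* conventional PCI, not PCI Express */
};

/* Proposed policy: AGP and every conventional-PCI GPU use 32 bits, others 40. */
unsigned int pick_dma_bits(const struct gpu_info *g)
{
	return (g->is_agp || g->is_pci) ? 32 : 40;
}

/* Can a buffer of `len` bytes at bus address `addr` be reached with `mask`? */
bool dma_addressable(uint64_t addr, uint64_t len, uint64_t mask)
{
	return len != 0 && addr <= mask && len - 1 <= mask - addr;
}
```

The trade-off described in the quoted text follows directly: a 32-bit mask forces buffers below 4 GiB even on platforms that would have handled 40 bits.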


> 
> ## Revert "drm/radeon: disable AGP by default"
> 
> https://lkml.org/lkml/2020/11/5/308
> 
> This is the simple fix, but currently the only solution for AMD hosts
> with an AGP port to get a display again: without this revert, those
> computers have no alternative way to run a display (not even PCI GPUs).
> 
> I'm asking for comments on those patches. I may have reached my own 
> skill cap on kernel development anyway. I can repurpose hardware to 
> test any other patch and can contribute time for such testing. Unlike 
> AGP GPUs, PCI GPUs are hard to find, so you may appreciate the time 
> and availability offered.
> 
> The PCI GPU on AMD CPU issue was verified with both Nvidia (GS 8400GS
> rev.2) and ATI (Radeon HD 4350) PCI GPUs. These GPU samples are not
> old cards from the previous millennium but capable ones: TeraScale
> RV710 architecture on the ATI side and Tesla 1.0 NV98 on the Nvidia
> side. Both can do OpenGL 3.3 and both have 512M of VRAM. The ATI one
> has an HDMI port, and some variants of the Nvidia one (not the one I
> own, but with the same specification) are known to have an HDMI port too.
> 
> Also, fixing PCI GPUs may not be enough to fix AGP GPUs running as PCI 
> ones, since fixing some issues (not all) on PCI side raises new issues 
> with AGP GPUs running as PCI ones but not on native PCI GPUs (see below).
> 
> Bugs aside, one thing that is important to consider against the AGP 
> disablement is that there i

RE: [PATCH] drm/amdgpu: do not initialise global variables to 0 or NULL

2020-11-03 Thread Deucher, Alexander

> -Original Message-
> From: Greg KH 
> Sent: Tuesday, November 3, 2020 1:53 AM
> To: Koenig, Christian 
> Cc: Alex Deucher ; Deepak R Varma
> ; David Airlie ; LKML  ker...@vger.kernel.org>; Maling list - DRI developers  de...@lists.freedesktop.org>; Melissa Wen ;
> amd-gfx list ; Daniel Vetter
> ; Daniel Vetter ; Deucher,
> Alexander 
> Subject: Re: [PATCH] drm/amdgpu: do not initialise global variables to 0 or
> NULL
> 
> On Mon, Nov 02, 2020 at 09:06:21PM +0100, Christian König wrote:
> > Am 02.11.20 um 20:43 schrieb Alex Deucher:
> > > On Mon, Nov 2, 2020 at 1:42 PM Deepak R Varma
>  wrote:
> > > > Initializing global variable to 0 or NULL is not necessary and
> > > > should be avoided. Issue reported by checkpatch script as:
> > > > ERROR: do not initialise globals to 0 (or NULL).
> > > I agree that this is technically correct, but a lot of people don't
> > > seem to know that so we get a lot of comments about this code for
> > > the variables that are not explicitly set.  Seems less confusing to
> > > initialize them even if it is not necessary.  I don't have a
> > > particularly strong opinion on it however.
> >
> > Agree with Alex.
> >
> > Especially for the module parameters we should have an explicit init
> > value for documentation purposes, even when it is 0.
> 
> Why is this one tiny driver somehow special compared to the entire rest of
> the kernel?  (hint, it isn't...)
> 
> Please follow the normal coding style rules, there's no reason to ignore them
> unless you like to constantly reject patches like this that get sent to you.
> 

I'll apply the patch, but as a data point, this is the first time I've gotten a
patch to make this change.  I get several bug reports or patches every year to
explicitly set values for global variables because users assume they are not
initialized.  So it will always be a trade-off as to which patches you want to
NACK.

Alex
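A small C illustration of the point being debated: objects with static storage duration and no initializer are zero-initialized by the language itself (pointers become NULL), so the explicit `= 0` that checkpatch flags changes nothing at runtime. The variable and function names are hypothetical, not actual amdgpu module parameters.

```c
#include <stddef.h>

/* No initializer: zero-initialized because of static storage duration. */
static int implicit_param;
/* Explicit zero: the style checkpatch reports as "do not initialise globals to 0". */
static int explicit_param = 0;
/* Pointers with static storage duration start out as NULL. */
static const char *implicit_name;

int get_implicit_param(void) { return implicit_param; }
int get_explicit_param(void) { return explicit_param; }
const char *get_implicit_name(void) { return implicit_name; }
```

Both forms behave identically; the disagreement in the thread is purely about whether the explicit value helps readers.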

RE: amdgpu crashes on OOM

2020-10-26 Thread Deucher, Alexander

> -Original Message-
> From: Michel Dänzer 
> Sent: Monday, October 26, 2020 7:04 AM
> To: Alex Xu (Hello71) ; Kazlauskas, Nicholas
> ; Deucher, Alexander
> ; Wentland, Harry
> ; Li, Sun peng (Leo) ;
> amd-...@lists.freedesktop.org
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: amdgpu crashes on OOM
> 
> On 2020-10-26 5:29 a.m., Alex Xu (Hello71) wrote:
> > Hi,
> >
> > I frequently encounter OOM on my system, mostly due to my own fault.
> > Recently, I noticed that not only does a swap storm happen and OOM
> > killer gets invoked, but the graphics output freezes permanently.
> > Checking the kernel messages, I see:
> >
> > kworker/u24:4: page allocation failure: order:5,
> mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
> nodemask=(null)
> > CPU: 6 PID: 279469 Comm: kworker/u24:4 Tainted: GW 
> > 5.9.0-14732-
> g20b1adb60cf6 #2
> > Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450
> > Pro4, BIOS P4.20 06/18/2020
> > Workqueue: events_unbound commit_work
> > Call Trace:
> >   ? dump_stack+0x57/0x6a
> >   ? warn_alloc.cold+0x69/0xcd
> >   ? __alloc_pages_direct_compact+0xfb/0x116
> >   ? __alloc_pages_slowpath.constprop.0+0x9c2/0xc14
> >   ? __alloc_pages_nodemask+0x143/0x167
> >   ? kmalloc_order+0x24/0x64
> >   ? dc_create_state+0x1a/0x4d
> >   ? amdgpu_dm_atomic_commit_tail+0x1b19/0x227d
> 
> Looks like dc_create_state should use kvzalloc instead of kzalloc
> (dc_state_free already uses kvfree).
> 
> order:5 means it's trying to allocate 32 physically contiguous pages, which
> can be hard to fulfill even with lower memory pressure.
> 

It was using kvzalloc, but that was accidentally dropped when the code was
refactored.  I just sent a patch to fix it.

Alex
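To make the "order:5" arithmetic in the quoted message concrete: an order-n allocation is 2^n physically contiguous pages. The sketch below assumes 4 KiB pages (typical on x86-64); bytes_to_order() is a hypothetical helper in the spirit of the kernel's get_order(), not that function itself. kvzalloc() sidesteps the contiguity requirement by falling back to vmalloc() when the contiguous attempt fails.

```c
/* Assumed page size: 4 KiB, the usual PAGE_SIZE on x86-64. */
#define EXAMPLE_PAGE_SIZE 4096UL

/* An order-n allocation covers 2^n physically contiguous pages. */
unsigned long order_to_pages(unsigned int order)
{
	return 1UL << order;
}

unsigned long order_to_bytes(unsigned int order)
{
	return order_to_pages(order) * EXAMPLE_PAGE_SIZE;
}

/* Smallest order whose block holds `size` bytes (hypothetical helper). */
unsigned int bytes_to_order(unsigned long size)
{
	unsigned int order = 0;

	while (order_to_bytes(order) < size)
		order++;
	return order;
}
```

Under these assumptions, order 5 is 32 pages or 128 KiB, and any request from just over 64 KiB up to 128 KiB needs an order-5 block.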

> 
> --
> Earthling Michel Dänzer   |               https://redhat.com/
> Libre software enthusiast | Mesa and X developer

RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-09-06 Thread Deucher, Alexander

[AMD Official Use Only - Internal Distribution Only]

> -Original Message-
> From: Joerg Roedel 
> Sent: Friday, September 4, 2020 6:06 AM
> To: Deucher, Alexander 
> Cc: jroe...@suse.de; Kuehling, Felix ;
> io...@lists.linux-foundation.org; Huang, Ray ;
> Koenig, Christian ; Lendacky, Thomas
> ; Suthikulpanit, Suravee
> ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
>
> On Fri, Aug 28, 2020 at 03:47:07PM +, Deucher, Alexander wrote:
> > Ah, right.  So CZ and ST are not an issue.  Raven is paired with Zen-based
> CPUs.
>
> Okay, so for the Raven case, can you add code to the amdgpu driver which
> makes it fail to initialize on Raven when SME is active? There is a global
> checking function for that, so that shouldn't be hard to do.
>

Sure.  How about the attached patch?

Alex

From f479b9da353c2547c26ebac8930a5dcd9a134eb7 Mon Sep 17 00:00:00 2001
From: Alex Deucher 
Date: Sun, 6 Sep 2020 12:05:12 -0400
Subject: [PATCH] drm/amdgpu: Fail to load on RAVEN if SME is active

Due to hardware bugs, scatter/gather display on Raven requires
a 1:1 IOMMU mapping; however, SME (Secure Memory Encryption)
requires an indirect IOMMU mapping because the encryption bit
is beyond the DMA mask of the chip.  As such, the two are
incompatible.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 12e16445df7c..d87d37c25329 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1102,6 +1102,16 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 		return -ENODEV;
 	}
 
+	/* Due to hardware bugs, S/G Display on raven requires a 1:1 IOMMU mapping,
+	 * however, SME requires an indirect IOMMU mapping because the encryption
+	 * bit is beyond the DMA mask of the chip.
+	 */
+	if (mem_encrypt_active() && ((flags & AMD_ASIC_MASK) == CHIP_RAVEN)) {
+		dev_info(&pdev->dev,
+			 "SME is not compatible with RAVEN\n");
+		return -ENOTSUPP;
+	}
+
 #ifdef CONFIG_DRM_AMDGPU_SI
 	if (!amdgpu_si_support) {
 		switch (flags & AMD_ASIC_MASK) {
-- 
2.25.4
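A userspace sketch of the incompatibility described in the commit message above, under stated assumptions: the encryption (C) bit is taken to be bit 47 (on real hardware its position is reported by CPUID leaf 0x8000001F), and the GPU's DMA mask is assumed to be 44 bits. With a 1:1 mapping the device would have to emit the physical address with the C-bit set, which falls outside its mask; an IOMMU remapping avoids that by handing the device an address it can reach.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Assumptions for illustration only. */
#define EXAMPLE_C_BIT     47  /* encryption bit position; real value comes from CPUID */
#define EXAMPLE_DMA_BITS  44  /* assumed device DMA mask width */

/* Address a device would have to use to reach an encrypted page directly. */
uint64_t encrypted_bus_addr(uint64_t paddr)
{
	return paddr | (1ULL << EXAMPLE_C_BIT);
}

/* A 1:1 mapping only works if that address fits inside the device's mask. */
bool identity_map_works(uint64_t paddr, bool sme_active)
{
	uint64_t addr = sme_active ? encrypted_bus_addr(paddr) : paddr;

	return (addr & ~DMA_BIT_MASK(EXAMPLE_DMA_BITS)) == 0;
}
```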

RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-28 Thread Deucher, Alexander

> -Original Message-
> From: jroe...@suse.de 
> Sent: Friday, August 28, 2020 11:30 AM
> To: Deucher, Alexander 
> Cc: Kuehling, Felix ; Joerg Roedel
> ; io...@lists.linux-foundation.org; Huang, Ray
> ; Koenig, Christian ;
> Lendacky, Thomas ; Suthikulpanit, Suravee
> ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
> 
> On Fri, Aug 28, 2020 at 03:11:32PM +, Deucher, Alexander wrote:
> > There are hw bugs on Raven and probably Carrizo/Stoney where they need
> > 1:1 mapping to avoid bugs in some corner cases with the displays.
> > Other GPUs should be fine.  The VID is 0x1002 and the DIDs are 0x15dd
> > and 0x15d8 for raven variants and 0x9870, 0x9874, 0x9875, 0x9876,
> > 0x9877 and 0x98e4 for carrizo and stoney.  As long as we preserve the
> > 1:1 mapping for those asics, we should be fine.
> 
> Okay, Stoney at least has no Zen-based CPU, so no support for memory
> encryption anyway. How about Raven, is it paired with a Zen CPU?

Ah, right.  So CZ and ST are not an issue.  Raven is paired with Zen-based CPUs.

Thanks,

Alex

RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-28 Thread Deucher, Alexander

> -Original Message-
> From: Kuehling, Felix 
> Sent: Friday, August 28, 2020 9:55 AM
> To: jroe...@suse.de; Deucher, Alexander 
> Cc: Joerg Roedel ; io...@lists.linux-foundation.org;
> Huang, Ray ; Koenig, Christian
> ; Lendacky, Thomas
> ; Suthikulpanit, Suravee
> ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
> 
> Am 2020-08-28 um 9:46 a.m. schrieb jroe...@suse.de:
> > On Wed, Aug 26, 2020 at 03:25:58PM +, Deucher, Alexander wrote:
> >>> Alex, do you know if anyone has tested amdgpu on an APU with SME
> >>> enabled? Is this considered something we support?
> >> It's not something we've tested.  I'm not even sure the GPU portion
> >> of APUs will work properly without an identity mapping.  SME should
> >> work properly with dGPUs however, so this is a proper fix for them.
> >> We don't use the IOMMUv2 path on dGPUs at all.
> > Is it possible to make the IOMMUv2 paths optional on iGPUs as well
> > when SME is active (or better, when the GPU is not identity mapped)?
> 
> Yes, we're working on this. IOMMUv2 is only needed for KFD. It's not needed
> for graphics. And we're making it optional for KFD as well.
> 
> The question Alex and I raised here is more general. We may have some
> assumptions in the amdgpu driver that are broken when the framebuffer is
> not identity mapped. This would break the iGPU in a more general sense,
> regardless of KFD and IOMMUv2. In that case, we don't really need to worry
> about breaking KFD because we have a much bigger problem.

There are hw bugs on Raven and probably Carrizo/Stoney where they need a 1:1
mapping to avoid bugs in some corner cases with the displays.  Other GPUs
should be fine.  The VID is 0x1002 and the DIDs are 0x15dd and 0x15d8 for
Raven variants and 0x9870, 0x9874, 0x9875, 0x9876, 0x9877 and 0x98e4 for
Carrizo and Stoney.  As long as we preserve the 1:1 mapping for those ASICs, we
should be fine.

Alex
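The device list above can be expressed as a simple lookup table. This is a sketch only: the function and table names are hypothetical, and the IDs are exactly the ones listed in the message (VID 0x1002; Raven 0x15dd and 0x15d8; Carrizo/Stoney 0x9870, 0x9874 to 0x9877, and 0x98e4).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AMD_ATI_VID 0x1002

/* DIDs listed above as needing a 1:1 IOMMU mapping for display. */
static const uint16_t identity_map_dids[] = {
	0x15dd, 0x15d8,                                  /* Raven variants */
	0x9870, 0x9874, 0x9875, 0x9876, 0x9877, 0x98e4,  /* Carrizo and Stoney */
};

bool needs_identity_map(uint16_t vid, uint16_t did)
{
	if (vid != AMD_ATI_VID)
		return false;
	for (size_t i = 0; i < sizeof(identity_map_dids) / sizeof(identity_map_dids[0]); i++)
		if (identity_map_dids[i] == did)
			return true;
	return false;
}
```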

> 
> Regards,
>   Felix
> 
> 
> >
> > Regards,
> >
> > Joerg

RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

2020-08-26 Thread Deucher, Alexander

 + Christian

> -Original Message-
> From: Kuehling, Felix 
> Sent: Wednesday, August 26, 2020 11:22 AM
> To: Deucher, Alexander ; Joerg Roedel
> ; io...@lists.linux-foundation.org; Huang, Ray
> 
> Cc: jroe...@suse.de; Lendacky, Thomas ;
> Suthikulpanit, Suravee ; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
> 
> [+Ray]
> 
> 
> Thanks for the heads up. Currently KFD won't work on APUs when IOMMUv2
> is disabled. But Ray is working on fallbacks that will allow KFD to work on
> APUs even without IOMMUv2, similar to our dGPUs. Along with changes in
> ROCm user mode, those fallbacks are necessary for making ROCm on APUs
> generally useful.
> 
> 
> How common is SME on typical PCs or laptops that would use AMD APUs?

I think the hw supports it, but as far as I know it's not formally
productized on client parts.

> 
> 
> Alex, do you know if anyone has tested amdgpu on an APU with SME
> enabled? Is this considered something we support?

It's not something we've tested.  I'm not even sure the GPU portion of APUs 
will work properly without an identity mapping.  SME should work properly with 
dGPUs however, so this is a proper fix for them.  We don't use the IOMMUv2 path 
on dGPUs at all.

Alex

> 
> 
> Thanks,
>   Felix
> 
> 
> Am 2020-08-26 um 10:14 a.m. schrieb Deucher, Alexander:
> >
> > [AMD Official Use Only - Internal Distribution Only]
> >
> >
> > + Felix
> > *From:* Joerg Roedel 
> > *Sent:* Monday, August 24, 2020 6:54 AM
> > *To:* io...@lists.linux-foundation.org
> > 
> > *Cc:* Joerg Roedel ; jroe...@suse.de
> > ; Lendacky, Thomas ;
> > Suthikulpanit, Suravee ; Deucher,
> > Alexander ; linux-kernel@vger.kernel.org
> > 
> > *Subject:* [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> > active
> >
> > From: Joerg Roedel 
> >
> > Hi,
> >
> > Some IOMMUv2 capable devices do not work correctly when SME is active,
> > because their DMA mask does not include the encryption bit, so that
> > they can not DMA to encrypted memory directly.
> >
> > The IOMMU can jump in here, but the AMD IOMMU driver puts IOMMUv2
> > capable devices into an identity mapped domain. Fix that by not
> > forcing an identity mapped domain on devices when SME is active and
> > forbid using their IOMMUv2 functionality.
> >
> > Please review.
> >
> > Thanks,
> >
> >     Joerg
> >
> > Joerg Roedel (2):
> >   iommu/amd: Do not force direct mapping when SME is active
> >   iommu/amd: Do not use IOMMUv2 functionality when SME is active
> >
> >  drivers/iommu/amd/iommu.c    | 7 ++-
> >  drivers/iommu/amd/iommu_v2.c | 7 +++
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> >
> > --
> > 2.28.0
> >

RE: [PATCH 1/2] iommu/amd: Do not force direct mapping when SME is active

2020-08-26 Thread Deucher, Alexander

+ Felix, Christian

> -Original Message-
> From: Joerg Roedel 
> Sent: Monday, August 24, 2020 6:54 AM
> To: io...@lists.linux-foundation.org
> Cc: Joerg Roedel ; jroe...@suse.de; Lendacky, Thomas
> ; Suthikulpanit, Suravee
> ; Deucher, Alexander
> ; linux-kernel@vger.kernel.org
> Subject: [PATCH 1/2] iommu/amd: Do not force direct mapping when SME is
> active
> 
> From: Joerg Roedel 
> 
> Do not force devices supporting IOMMUv2 to be direct mapped when
> memory encryption is active. This might cause them to be unusable because
> their DMA mask does not include the encryption bit.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/amd/iommu.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index ba9f3dbc5b94..77e4268e41cf 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2659,7 +2659,12 @@ static int amd_iommu_def_domain_type(struct
> device *dev)
>   if (!dev_data)
>   return 0;
> 
> - if (dev_data->iommu_v2)
> + /*
> +  * Do not identity map IOMMUv2 capable devices when memory encryption is
> +  * active, because some of those devices (AMD GPUs) don't have the
> +  * encryption bit in their DMA-mask and require remapping.
> +  */

I think on the integrated GPUs in APUs I'd prefer to have the identity mapping 
over SME, but I guess this is fine because you have to explicitly enable SME 
and if you do that you know what you are getting into.

Alex

> + if (!mem_encrypt_active() && dev_data->iommu_v2)
>   return IOMMU_DOMAIN_IDENTITY;
> 
>   return 0;
> --
> 2.28.0
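The decision patch 1/2 makes in amd_iommu_def_domain_type() can be modeled as a pure function. The enum values below are hypothetical stand-ins (the real code returns IOMMU_DOMAIN_IDENTITY or 0 for "no preference"); the point is that IOMMUv2-capable devices only get a forced identity domain while memory encryption is inactive.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: only "identity" vs "no preference" matters here. */
enum domain_choice { DOMAIN_DEFAULT = 0, DOMAIN_IDENTITY = 1 };

enum domain_choice def_domain_type(bool has_dev_data, bool iommu_v2,
				   bool mem_encrypt_active)
{
	if (!has_dev_data)
		return DOMAIN_DEFAULT;
	/* Force identity only for IOMMUv2 devices and only without SME. */
	if (!mem_encrypt_active && iommu_v2)
		return DOMAIN_IDENTITY;
	return DOMAIN_DEFAULT;
}
```

This is also why the Raven concern raised in the thread matters: with SME on, an APU that needs the 1:1 mapping no longer gets it.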

RE: [PATCH 2/2] iommu/amd: Do not use IOMMUv2 functionality when SME is active

2020-08-26 Thread Deucher, Alexander

+ Felix, Christian

> -Original Message-
> From: Joerg Roedel 
> Sent: Monday, August 24, 2020 6:54 AM
> To: io...@lists.linux-foundation.org
> Cc: Joerg Roedel ; jroe...@suse.de; Lendacky, Thomas
> ; Suthikulpanit, Suravee
> ; Deucher, Alexander
> ; linux-kernel@vger.kernel.org
> Subject: [PATCH 2/2] iommu/amd: Do not use IOMMUv2 functionality when
> SME is active
> 
> From: Joerg Roedel 
> 
> When memory encryption is active the device is likely not in a direct mapped
> domain. Forbid using IOMMUv2 functionality for now until finer grained
> checks for this have been implemented.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/amd/iommu_v2.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/iommu/amd/iommu_v2.c
> b/drivers/iommu/amd/iommu_v2.c index c259108ab6dd..0d175aed1d92
> 100644
> --- a/drivers/iommu/amd/iommu_v2.c
> +++ b/drivers/iommu/amd/iommu_v2.c
> @@ -737,6 +737,13 @@ int amd_iommu_init_device(struct pci_dev *pdev,
> int pasids)
> 
>   might_sleep();
> 
> + /*
> +  * When memory encryption is active the device is likely not in a
> +  * direct-mapped domain. Forbid using IOMMUv2 functionality for now.
> +  */
> + if (mem_encrypt_active())
> + return -ENODEV;
> +
>   if (!amd_iommu_v2_supported())
>   return -ENODEV;
> 
> --
> 2.28.0

RE: [PATCH] PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken

2020-07-28 Thread Deucher, Alexander

> -Original Message-
> From: Kai-Heng Feng 
> Sent: Tuesday, July 28, 2020 6:46 AM
> To: bhelg...@google.com
> Cc: Kai-Heng Feng ; Deucher, Alexander
> ; open list:PCI SUBSYSTEM  p...@vger.kernel.org>; open list 
> Subject: [PATCH] PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken
> 
> We are seeing AMD Radeon Pro W5700 doesn't work when IOMMU is
> enabled:
> [3.375841] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT
> device=63:00.0 address=0x42b5b01a0]
> [3.375845] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT
> device=63:00.0 address=0x42b5b01c0]
> 
> The error also makes graphics driver fail to probe the device.
> 
> It appears to be the same issue as commit 5e89cd303e3a ("PCI: Mark AMD
> Navi14 GPU rev 0xc5 ATS as broken") addresses, and indeed the same ATS
> quirk can workaround the issue.
> 
> Bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=208725
> Cc: Alex Deucher 
> Signed-off-by: Kai-Heng Feng 

This was fixed in the vbios, but apparently that didn't make it out to everyone.
Acked-by: Alex Deucher 

> ---
>  drivers/pci/quirks.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index
> 812bfc32ecb8..052efeb9f053 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5192,7 +5192,8 @@
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, 0x0422,
> quirk_no_ext_tags);
>   */
>  static void quirk_amd_harvest_no_ats(struct pci_dev *pdev)  {
> - if (pdev->device == 0x7340 && pdev->revision != 0xc5)
> + if ((pdev->device == 0x7312 && pdev->revision != 0x00) ||
> + (pdev->device == 0x7340 && pdev->revision != 0xc5))
>   return;
> 
>   pci_info(pdev, "disabling ATS\n");
> @@ -5203,6 +5204,8 @@ static void quirk_amd_harvest_no_ats(struct
> pci_dev *pdev)  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4,
> quirk_amd_harvest_no_ats);
>  /* AMD Iceland dGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900,
> quirk_amd_harvest_no_ats);
> +/* AMD Navi10 dGPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312,
> +quirk_amd_harvest_no_ats);
>  /* AMD Navi14 dGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340,
> quirk_amd_harvest_no_ats);  #endif /* CONFIG_PCI_ATS */
> --
> 2.17.1
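A userspace model of the patched early-return check in quirk_amd_harvest_no_ats(), as a sketch: it returns true when the quirk would go on to disable ATS. The quirk itself only runs for device IDs registered with DECLARE_PCI_FIXUP_FINAL (the hunk shows 0x98e4, 0x6900, 0x7312 and 0x7340), so a registered device without a revision filter, such as Iceland (0x6900), always has ATS disabled. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the patched condition: only the listed revisions keep the quirk. */
bool harvest_quirk_disables_ats(uint16_t device, uint8_t revision)
{
	if ((device == 0x7312 && revision != 0x00) ||
	    (device == 0x7340 && revision != 0xc5))
		return false;  /* unaffected revision: leave ATS enabled */
	return true;           /* "disabling ATS" path */
}
```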

RE: [BUG] "Pre-boot DMA Protection" makes AMDGPU stop working

2020-07-02 Thread Deucher, Alexander

> -Original Message-
> From: Kai-Heng Feng 
> Sent: Thursday, July 2, 2020 8:04 AM
> To: Joerg Roedel 
> Cc: Deucher, Alexander ;
> io...@lists.linux-foundation.org; open list 
> Subject: [BUG] "Pre-boot DMA Protection" makes AMDGPU stop working
> 
> Hi,
> 
> A more detailed bug report can be found at [1].
> 
> I have an AMD Renoir system that can't enter a graphical session because there
> are many IOMMU splats.
> 
> Alex suggested to disable "Pre-boot DMA Protection", I can confirm once it's
> disabled, AMDGPU starts working with IOMMU enabled.
> So raise the issue here because I have no knowledge on how to reset the
> IOMMU.

+ Suravee

This is part of MS's Secured-core initiative.  We are investigating how to
handle this properly on Linux.  Stay tuned.

Alex

> 
> [1]
> https://gitlab.freedesktop.org/drm/amd/-/issues/1204
> 
> Kai-Heng

RE: clean up kernel_{read,write} & friends v2

2020-05-28 Thread Deucher, Alexander

> -Original Message-
> From: Al Viro  On Behalf Of Al Viro
> Sent: Thursday, May 28, 2020 4:06 PM
> To: Matthew Wilcox 
> Cc: Joe Perches ; Linus Torvalds  foundation.org>; Christoph Hellwig ; Ian Kent
> ; David Howells ; Linux
> Kernel Mailing List ; linux-fsdevel  fsde...@vger.kernel.org>; LSM List  mod...@vger.kernel.org>; NetFilter ;
> Deucher, Alexander ; David Airlie
> 
> Subject: Re: clean up kernel_{read,write} & friends v2
> 
> On Thu, May 28, 2020 at 12:44:41PM -0700, Matthew Wilcox wrote:
> > On Thu, May 28, 2020 at 08:33:40PM +0100, Al Viro wrote:
> > > On Thu, May 28, 2020 at 12:22:08PM -0700, Joe Perches wrote:
> > >
> > > > Hard limits at 80 really don't work well, especially with some of
> > > > the 25+ character length identifiers used today.
> > >
> > > IMO any such identifier is a good reason for a warning.
> > >
> > > The litmus test is actually very simple: how unpleasant would it be
> > > to mention the identifiers while discussing the code over the phone?
> >
> > Here's a good example of a function which should be taken out and shot:
> >
> > int amdgpu_atombios_get_leakage_vddc_based_on_leakage_params(struct amdgpu_device *adev, ...
> > 	switch (frev) {
> > 	case 2:
> > 		switch (crev) {
> > 		...
> > 			if (profile->ucElbVDDC_Num > 0) {
> > 				for (i = 0; i < profile->ucElbVDDC_Num; i++) {
> > 					if (vddc_id_buf[i] == virtual_voltage_id) {
> > 						for (j = 0; j < profile->ucLeakageBinNum; j++) {
> > 							if (vbios_voltage_id <= leakage_bin[j]) {
> > 								*vddc = vddc_buf[j * profile->ucElbVDDC_Num + i];
> >
> > I mean, I get it that AMD want to show off just how studly a monitor
> > they support, but good grief ...
> 
> I must respectfully disagree.  It clearly needs to be hanged, drawn and
> quartered...

You found it necessary to add me to a random thread with no context to complain 
about the coding style?  How about sending a patch to clean it up if you find 
it that unsavory.

Alex

RE: [PATCH 1/7] ASoC: amd: No need PCI-MSI interrupts

2019-10-17 Thread Deucher, Alexander

> -Original Message-
> From: RAVULAPATI, VISHNU VARDHAN RAO
> 
> Sent: Thursday, October 17, 2019 5:33 AM
> To: Mark Brown ; Deucher, Alexander
> 
> Cc: RAVULAPATI, VISHNU VARDHAN RAO
> ; Liam Girdwood
> ; Jaroslav Kysela ; Takashi Iwai
> ; Mukunda, Vijendar ;
> Maruthi Srinivas Bayyavarapu ; Colin Ian
> King ; Dan Carpenter
> ; moderated list:SOUND - SOC LAYER /
> DYNAMIC AUDIO POWER MANAGEM... ; open
> list 
> Subject: Re: [PATCH 1/7] ASoC: amd: No need PCI-MSI interrupts
> 
> 
> 
> On 11/10/19 3:03 AM, vishnu wrote:
> > Hi,
> > Please find my inline comments.
> >
> > Thanks,
> > Vishnu
> >
> > On 01/10/19 10:59 PM, Mark Brown wrote:
> >> On Tue, Oct 01, 2019 at 05:23:43PM +, Deucher, Alexander wrote:
> >>
> >>>> ACP-PCI controller driver does not depends msi interrupts.
> >>>> So removed msi related pci functions which have no use and does not
> >>>> impact on existing functionality.
> >>
> >>> In general, however, aren't MSIs preferred to legacy interrupts?
> >>
> >> As I understand it.  Or at the very least I'm not aware of any
> >> situation where they're harmful.  It'd be good to have a clear
> >> explanation of why we're removing the support.
> >
> > Actually our device is audio device and it does not depends on MSI`s.
> > So we thought to remove it as it has no purpose or meaning to have
> > this code in our audio based ACP-PCI driver.
> >
> >>> Doesn't the driver have to opt into MSI support?  As such, won't
> >>> removing this code effectively disable MSI support?
> >>
> >> Yes.
> >
> >
> 
> Hi Mark,
> 
> Any updates on this patch.

You are removing functionality from the driver with no rationale as to why it's
necessary.  What's the point of this patch?  Does it fix a particular issue?
If not, I suggest just dropping it.  The hw supports MSIs, why not use them?

Alex

RE: [PATCH 2/7] ASoC: amd: Registering device endpoints using MFD framework

2019-10-02 Thread Deucher, Alexander

> -Original Message-
> From: Lee Jones 
> Sent: Wednesday, October 2, 2019 8:38 AM
> To: Deucher, Alexander 
> Cc: RAVULAPATI, VISHNU VARDHAN RAO
> ; Liam Girdwood
> ; Mark Brown ; Jaroslav
> Kysela ; Takashi Iwai ; Mukunda,
> Vijendar ; Maruthi Srinivas Bayyavarapu
> ; Mehta, Sanju
> ; Colin Ian King ; Dan
> Carpenter ; moderated list:SOUND - SOC LAYER
> / DYNAMIC AUDIO POWER MANAGEM... ;
> open list 
> Subject: Re: [PATCH 2/7] ASoC: amd: Registering device endpoints using MFD
> framework
> 
> On Tue, 01 Oct 2019, Deucher, Alexander wrote:
> 
> > > -Original Message-
> > > From: Lee Jones 
> > > Sent: Tuesday, October 1, 2019 8:00 AM
> > > To: RAVULAPATI, VISHNU VARDHAN RAO
> > > 
> > > Cc: RAVULAPATI, VISHNU VARDHAN RAO
> > > ; Deucher, Alexander
> > > ; Liam Girdwood
> ;
> > > Mark Brown ; Jaroslav Kysela ;
> > > Takashi Iwai ; Mukunda, Vijendar
> > > ; Maruthi Srinivas Bayyavarapu
> > > ; Mehta, Sanju
> ;
> > > Colin Ian King ; Dan Carpenter
> > > ; moderated list:SOUND - SOC LAYER /
> > > DYNAMIC AUDIO POWER MANAGEM... ;
> open
> > > list 
> > > Subject: Re: [PATCH 2/7] ASoC: amd: Registering device endpoints
> > > using MFD framework
> > >
> > > On Tue, 01 Oct 2019, vishnu wrote:
> > >
> > > > Hi Jones,
> > > >
> > > > I am very thankful for your review comments.
> > > >
> > > > Actually, the driver is not totally based on MFD. It just uses
> > > > mfd_add_hotplug_devices() and mfd_remove_devices() for adding the
> > > > devices automatically.
> > > >
> > > > The remaining code has nothing to do with the MFD framework.
> > > >
> > > > So I thought it would not break the coding style and moved ahead
> > > > by using the MFD API by adding its header file.
> > > >
> > > > If it is any violation of the coding standard, then I can move it to
> > > > drivers/mfd.
> > > >
> > > > This patch could be a showstopper for us. Please suggest how we
> > > > can move ahead ASAP.
> > >
> > > Either move the MFD parts to drivers/mfd, or stop using the MFD API.
> >
> > There are more drivers outside of drivers/mfd using this API than
> > drivers in drivers/mfd.
> 
> People do wrong things all the time.  It doesn't make them right.
> 
> > In a lot of cases it doesn't make sense to move the driver to drivers/mfd.
> 
> In those cases, the platform_device_*() API should be used.

Why do we have both?  It's not clear to me on when we should use one vs the 
other.  These are not platforms per se, they are PCI devices that happen to 
have other devices on them.  On previous projects, I was told to use mfd and no 
objections were raised at that time.

Alex

RE: [PATCH 2/7] ASoC: amd: Registering device endpoints using MFD framework

2019-10-01 Thread Deucher, Alexander

> -Original Message-
> From: Lee Jones 
> Sent: Tuesday, October 1, 2019 8:00 AM
> To: RAVULAPATI, VISHNU VARDHAN RAO
> 
> Cc: RAVULAPATI, VISHNU VARDHAN RAO
> ; Deucher, Alexander
> ; Liam Girdwood ;
> Mark Brown ; Jaroslav Kysela ;
> Takashi Iwai ; Mukunda, Vijendar
> ; Maruthi Srinivas Bayyavarapu
> ; Mehta, Sanju
> ; Colin Ian King ; Dan
> Carpenter ; moderated list:SOUND - SOC LAYER
> / DYNAMIC AUDIO POWER MANAGEM... ;
> open list 
> Subject: Re: [PATCH 2/7] ASoC: amd: Registering device endpoints using MFD
> framework
> 
> On Tue, 01 Oct 2019, vishnu wrote:
> 
> > Hi Jones,
> >
> > I am very thankful for your review comments.
> >
> > Actually, the driver is not totally based on MFD. It just uses
> > mfd_add_hotplug_devices() and mfd_remove_devices() for adding the
> > devices automatically.
> >
> > The remaining code has nothing to do with the MFD framework.
> >
> > So I thought it would not break the coding style and moved ahead by
> > using the MFD API by adding its header file.
> >
> > If it is any violation of the coding standard, then I can move it to
> > drivers/mfd.
> >
> > This patch could be a showstopper for us. Please suggest how we can
> > move ahead ASAP.
> 
> Either move the MFD parts to drivers/mfd, or stop using the MFD API.

There are more drivers outside of drivers/mfd using this API than drivers in 
drivers/mfd.  In a lot of cases it doesn't make sense to move the driver to 
drivers/mfd.

Alex

> 
> --
> Lee Jones [李琼斯]
> Linaro Services Technical Lead
> Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook |
> Twitter | Blog

RE: [PATCH 1/7] ASoC: amd: No need PCI-MSI interrupts

2019-10-01 Thread Deucher, Alexander

> -Original Message-
> From: Ravulapati Vishnu vardhan rao
> 
> Sent: Monday, September 30, 2019 8:58 PM
> Cc: Deucher, Alexander ; RAVULAPATI,
> VISHNU VARDHAN RAO ; Liam
> Girdwood ; Mark Brown ;
> Jaroslav Kysela ; Takashi Iwai ;
> Mukunda, Vijendar ; Maruthi Srinivas
> Bayyavarapu ; Deucher, Alexander
> ; Colin Ian King
> ; Dan Carpenter ;
> moderated list:SOUND - SOC LAYER / DYNAMIC AUDIO POWER MANAGEM...
> ; open list 
> Subject: [PATCH 1/7] ASoC: amd: No need PCI-MSI interrupts
> 
> The ACP-PCI controller driver does not depend on MSI interrupts, so the
> MSI-related PCI functions, which have no use, were removed; this does not
> impact existing functionality.

In general, however, aren't MSIs preferred to legacy interrupts?  Doesn't the 
driver have to opt into MSI support?  As such, won't removing this code 
effectively disable MSI support?

Alex

> 
> Signed-off-by: Ravulapati Vishnu vardhan rao
> 
> ---
>  sound/soc/amd/raven/pci-acp3x.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/sound/soc/amd/raven/pci-acp3x.c b/sound/soc/amd/raven/pci-acp3x.c
> index facec24..8f6bf00 100644
> --- a/sound/soc/amd/raven/pci-acp3x.c
> +++ b/sound/soc/amd/raven/pci-acp3x.c
> @@ -46,14 +46,7 @@ static int snd_acp3x_probe(struct pci_dev *pci,
>   goto release_regions;
>   }
> 
> - /* check for msi interrupt support */
> - ret = pci_enable_msi(pci);
> - if (ret)
> - /* msi is not enabled */
> - irqflags = IRQF_SHARED;
> - else
> - /* msi is enabled */
> - irqflags = 0;
> + irqflags = 0;
> 
>   addr = pci_resource_start(pci, 0);
>   adata->acp3x_base = ioremap(addr, pci_resource_len(pci, 0));
> @@ -112,7 +105,6 @@ static int snd_acp3x_probe(struct pci_dev *pci,
>   return 0;
> 
>  unmap_mmio:
> - pci_disable_msi(pci);
>   iounmap(adata->acp3x_base);
>  release_regions:
>   pci_release_regions(pci);
> @@ -129,7 +121,6 @@ static void snd_acp3x_remove(struct pci_dev *pci)
>   platform_device_unregister(adata->pdev);
>   iounmap(adata->acp3x_base);
> 
> - pci_disable_msi(pci);
>   pci_release_regions(pci);
>   pci_disable_device(pci);
>  }
> --
> 2.7.4
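The hunk removed above is the usual opt-in pattern: try to enable MSI, and fall back to a shared legacy INTx line if that fails. The userspace sketch below models only that decision, to make the review point concrete: without the `pci_enable_msi()` call the driver never opts in, so MSI stays off even on hardware that supports it. `fake_pci_enable_msi()` is a hypothetical stand-in for `pci_enable_msi()` (0 on success, negative errno on failure), and `SKETCH_IRQF_SHARED` is an assumed mirror of the kernel's `IRQF_SHARED` value; this is not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed mirror of the kernel's IRQF_SHARED value (include/linux/interrupt.h). */
#define SKETCH_IRQF_SHARED 0x00000080UL

/* Result of the interrupt-mode decision made at probe time. */
struct irq_choice {
	bool msi_enabled;       /* true if the device was switched to MSI */
	unsigned long irqflags; /* flags that would be passed to request_irq() */
};

/*
 * Hypothetical stand-in for pci_enable_msi(): returns 0 on success and a
 * negative errno-style value when MSI cannot be enabled.
 */
static int fake_pci_enable_msi(bool hw_supports_msi)
{
	return hw_supports_msi ? 0 : -22; /* -EINVAL */
}

/* Pre-patch logic: opt into MSI, fall back to a shared legacy line. */
static struct irq_choice choose_irq_mode(bool hw_supports_msi)
{
	struct irq_choice c;

	if (fake_pci_enable_msi(hw_supports_msi)) {
		/* MSI unavailable: legacy INTx lines may be shared. */
		c.msi_enabled = false;
		c.irqflags = SKETCH_IRQF_SHARED;
	} else {
		/* MSI vectors belong to this device alone. */
		c.msi_enabled = true;
		c.irqflags = 0;
	}
	return c;
}

/* Post-patch logic: no opt-in, so MSI is never enabled. */
static struct irq_choice choose_irq_mode_patched(bool hw_supports_msi)
{
	struct irq_choice c = { .msi_enabled = false, .irqflags = 0 };

	(void)hw_supports_msi; /* hardware capability is ignored */
	return c;
}
```

Note that the patched variant also stops passing the shared flag, even though it now always runs on the legacy interrupt line.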

RE: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD Radeon R7 GPUs

2019-04-10 Thread Deucher, Alexander

> -Original Message-
> From: Deucher, Alexander
> Sent: Wednesday, April 10, 2019 10:47 AM
> To: Bjorn Helgaas ; Nikolai Kostrigin
> ; Suthikulpanit, Suravee
> (suravee.suthikulpa...@amd.com) ;
> Lendacky, Thomas ; Kuehling, Felix
> (felix.kuehl...@amd.com) ; Koenig, Christian
> (christian.koe...@amd.com) 
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> jroe...@suse.de
> Subject: RE: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD Radeon
> R7 GPUs
> 
> > -Original Message-
> > From: Bjorn Helgaas 
> > Sent: Tuesday, April 9, 2019 5:59 PM
> > To: Nikolai Kostrigin 
> > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > jroe...@suse.de; Deucher, Alexander 
> > Subject: Re: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD
> > Radeon
> > R7 GPUs
> >
> > [+cc Alex]
> >
> > This claims to be a resend, but I don't see a previous posting.
> >
> > There *was* discussion when the quirk was added two years ago for a
> > different device.  As part of that, Alex thought only that device
> > would be affected and ATS was validated on other GPUs:
> >
> > https://lore.kernel.org/lkml/BN6PR12MB165278346BE8A76B1E4412AFF7EA0@BN6PR12MB1652.namprd12.prod.outlook.com/
> >
> > On Mon, Apr 08, 2019 at 01:37:25PM +0300, Nikolai Kostrigin wrote:
> > > ATS is broken on this hardware (at least for Stoney Ridge based
> > > laptop) and causes IOMMU stalls and system failure. Disable ATS on
> > > these devices to make them usable again with IOMMU enabled. Thanks to
> > > Joerg Roedel  for help.
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=194521
> > >
> 
> + a few AMD people
> 
> Seeing this bug makes it more clear.  I don't think this is a problem with the
> GPU.  I think it's a problem with either the sbios or iommu.  I think the
> original quirk added for stoney (0x98e4) is probably wrong as well.  I suspect we
> need a quirk for a particular laptop or sbios versions.  We validated ATS
> extensively with Carrizo based systems (the system in the bug report above
> is Carrizo based) since it is the basis of our ROCm support on APUs.  We have
> also been involved in tons of Linux OEM preloads with both Carrizo and
> Stoney based APUs in combination with TOPAZ dGPUs (0x6900) and haven't
> seen this issue in those programs.  We also have TOPAZ dGPUs used in OEM
> programs with Intel chipsets and haven't seen the issue.  I suspect that, since
> Windows does not use the IOMMU by default, the sbios settings may not be
> well validated on certain Windows-only SKUs.  I'd rather make these DMI
> matches or something like that for the platform, or at the very least match
> the SSIDs as well.

Reading through these bugs again, it seems to be an issue with Stoney APUs, not
the dGPU specifically.  I think it would be better to disable ATS in general if
a Stoney-based platform was detected, rather than adding ATS quirks for every
device that someone may put in a Stoney-based platform.  It also seems to be
related to runtime pm on the dGPU.  Disabling runtime pm also seems to fix the
issue.  On these systems runtime pm for the dGPU is controlled via ACPI (either
ATPX or _PR3 depending on the platform).  Maybe something doesn't get restored
properly on runtime resume, which causes the ATS issues?

Alex

> 
> Alex
> 
> > > Signed-off-by: Nikolai Kostrigin 
> >
> > Joerg, I'm happy to merge this if you would review or ack it.  I don't
> > know enough to conclude that this is the root cause.  It'd be nice to
> > have an actual AMD erratum.  Maybe it would even have a list of
> > affected devices so we could get them all at once so people wouldn't
> > have to trip over them one by one.
> >
> > > ---
> > >  drivers/pci/quirks.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 4700d24e5d55..abb2532e16bf 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -4876,6 +4876,7 @@ static void quirk_no_ats(struct pci_dev *pdev)
> > >
> > >  /* AMD Stoney platform GPU */
> > >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_no_ats);
> > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_no_ats);
> > >  #endif /* CONFIG_PCI_ATS */
> > >
> > >  /* Freescale PCIe doesn't support MSI in RC mode */
> > > --
> > > 2.21.0
> > >

RE: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD Radeon R7 GPUs

2019-04-10 Thread Deucher, Alexander

> -Original Message-
> From: Bjorn Helgaas 
> Sent: Tuesday, April 9, 2019 5:59 PM
> To: Nikolai Kostrigin 
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> jroe...@suse.de; Deucher, Alexander 
> Subject: Re: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD Radeon
> R7 GPUs
> 
> [+cc Alex]
> 
> This claims to be a resend, but I don't see a previous posting.
> 
> There *was* discussion when the quirk was added two years ago for a
> different device.  As part of that, Alex thought only that device would be
> affected and ATS was validated on other GPUs:
> 
> https://lore.kernel.org/lkml/BN6PR12MB165278346BE8A76B1E4412AFF7EA0@BN6PR12MB1652.namprd12.prod.outlook.com/
> 
> On Mon, Apr 08, 2019 at 01:37:25PM +0300, Nikolai Kostrigin wrote:
> > ATS is broken on this hardware (at least for Stoney Ridge based
> > laptop) and causes IOMMU stalls and system failure. Disable ATS on
> > these devices to make them usable again with IOMMU enabled. Thanks to
> > Joerg Roedel  for help.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=194521
> >

+ a few AMD people

Seeing this bug makes it more clear.  I don't think this is a problem with the 
GPU.  I think it's a problem with either the sbios or iommu.  I think the 
original quirk added for stoney (0x98e4) is probably wrong as well.  I suspect 
we need a quirk for a particular laptop or sbios versions.  We validated ATS 
extensively with Carrizo based systems (the system in the bug report above is 
Carrizo based) since it is the basis of our ROCm support on APUs.  We have also 
been involved in tons of Linux OEM preloads with both Carrizo and Stoney based 
APUs in combination with TOPAZ dGPUs (0x6900) and haven't seen this issue in 
those programs.  We also have TOPAZ dGPUs used in OEM programs with Intel 
chipsets and haven't seen the issue.  I suspect that, since Windows does not use
the IOMMU by default, the sbios settings may not be well validated on certain
Windows-only SKUs.  I'd rather make these DMI matches or something like that
for the platform, or at the very least match the SSIDs as well.

Alex
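The suggestion to key the quirk on the platform (SSIDs or DMI) rather than on the GPU's device ID alone can be illustrated with a small matching table. This is a userspace model, not kernel code: `struct ats_quirk`, `ats_quirk_applies()` and the subsystem IDs 0x1234/0x5678 in the table are hypothetical placeholders, and `SKETCH_ANY_ID` only plays the role of the kernel's `PCI_ANY_ID` wildcard. Only 0x6900 comes from the thread; 0x1002 is the ATI/AMD PCI vendor ID. In real kernel code the equivalent check would presumably live inside the fixup, comparing `pdev->subsystem_vendor` and `pdev->subsystem_device`, or use a DMI table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wildcard playing the role of the kernel's PCI_ANY_ID. */
#define SKETCH_ANY_ID 0xffffffffu

/* IDs as read from a device's PCI config space. */
struct pci_ids {
	uint32_t vendor, device, subvendor, subdevice;
};

/* One quirk entry; SKETCH_ANY_ID in a field matches any value. */
struct ats_quirk {
	uint32_t vendor, device, subvendor, subdevice;
};

/* Hypothetical table: only one specific board (by SSID) gets ATS disabled. */
static const struct ats_quirk ats_quirks[] = {
	/* 0x1002 = ATI/AMD vendor ID, 0x6900 = TOPAZ dGPU from the thread. */
	{ 0x1002, 0x6900, 0x1234, 0x5678 }, /* placeholder laptop SSID */
};

static bool field_matches(uint32_t want, uint32_t have)
{
	return want == SKETCH_ANY_ID || want == have;
}

/* Returns true if any table entry matches all four IDs of the device. */
static bool ats_quirk_applies(const struct pci_ids *dev)
{
	for (size_t i = 0; i < sizeof(ats_quirks) / sizeof(ats_quirks[0]); i++) {
		const struct ats_quirk *q = &ats_quirks[i];

		if (field_matches(q->vendor, dev->vendor) &&
		    field_matches(q->device, dev->device) &&
		    field_matches(q->subvendor, dev->subvendor) &&
		    field_matches(q->subdevice, dev->subdevice))
			return true;
	}
	return false;
}
```

With this shape, the same TOPAZ dGPU on a different board keeps ATS, which is the behaviour the reply argues for.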

> > Signed-off-by: Nikolai Kostrigin 
> 
> Joerg, I'm happy to merge this if you would review or ack it.  I don't know
> enough to conclude that this is the root cause.  It'd be nice to have an
> actual AMD erratum.  Maybe it would even have a list of affected devices so we
> could get them all at once so people wouldn't have to trip over them one by
> one.
> 
> > ---
> >  drivers/pci/quirks.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 4700d24e5d55..abb2532e16bf 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -4876,6 +4876,7 @@ static void quirk_no_ats(struct pci_dev *pdev)
> >
> >  /* AMD Stoney platform GPU */
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_no_ats);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_no_ats);
> >  #endif /* CONFIG_PCI_ATS */
> >
> >  /* Freescale PCIe doesn't support MSI in RC mode */
> > --
> > 2.21.0
> >

RE: [PATCH 5.0 072/246] drm/amd/display: Fix reference counting for struct dc_sink.

2019-04-05 Thread Deucher, Alexander

> -Original Message-
> From: Mathias Fröhlich 
> Sent: Friday, April 5, 2019 1:13 AM
> To: Greg Kroah-Hartman 
> Cc: linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Li, Sun peng (Leo)
> ; Deucher, Alexander
> ; Sasha Levin 
> Subject: Re: [PATCH 5.0 072/246] drm/amd/display: Fix reference counting
> for struct dc_sink.
> 
> Greg,
> 
> as I mentioned in the commit message, I saw more fixes to that area in Alex
> Deucher's queue when I fed that to Alex. There is one fix that I can think of
> that interacts with my fixes. That means we may get unwanted side effects of my
> patch without the fix mentioned below. With that below patch also selected,
> I think we should be ok for stable.
> Alex, AMD people, your opinion?

I'm not sure.  I haven't gone through to figure out what combinations of all of
these patches are required for each stable kernel.  The combinatorics are too
much, which is why I only cc stable on certain patches.  Harry or Leo may be
able to comment, however.

Alex

> 
> The one that I can spot not already in linux-5.0.y is:
> 
> commit 3f01f098a4e2ef30ef628497c43a3d568e720376
> Author: Jerry (Fangzhi) Zuo 
> Date:   Thu Jan 24 11:46:49 2019 -0500
> 
> drm/amd/display: Clear dc_sink after it gets released
> 
> [Why]
> The dc_sink was released but the pointer on the aconnector was
> not cleared.
> 
> [How]
> Clear it.
> 
> best
> 
> Mathias
> 
> 
> On Thursday, 4 April 2019 10:46:12 CEST Greg Kroah-Hartman wrote:
> > 5.0-stable review patch.  If anyone has any objections, please let me know.
> >
> > --
> >
> > [ Upstream commit dcd5fb82ffb484124203aa339733663ac0b059f3 ]
> >
> > Reference counting in amdgpu_dm_connector for
> > amdgpu_dm_connector::dc_sink and amdgpu_dm_connector::dc_em_sink as
> > well as in dc_link::local_sink seems to be out of shape. Thus make
> > reference counting consistent for these members and simply
> > increment the reference count when the variable gets assigned and
> > decrement when the pointer is set to zero or replaced.
> > Also simplify reference counting in selected function scopes to be sure
> > the reference is released in any case. In some cases add NULL pointer
> > checks before dereferencing.
> > In a handful of places a comment is placed to state that the
> > reference increment happened already somewhere else.
> >
> > This actually fixes the following kernel bug on my system when
> > enabling display core in amdgpu. There are some more similar bug
> > reports around, so it probably helps at more places.
> >
> >kernel BUG at mm/slub.c:294!
> >invalid opcode:  [#1] SMP PTI
> >CPU: 9 PID: 1180 Comm: Xorg Not tainted 5.0.0-rc1+ #2
> >Hardware name: Supermicro X10DAi/X10DAI, BIOS 3.0a 02/05/2018
> >RIP: 0010:__slab_free+0x1e2/0x3d0
> >Code: 8b 54 24 30 48 89 4c 24 28 e8 da fb ff ff 4c 8b 54 24 28 85 c0 0f 85 67 fe ff ff 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 49 3b 5c 24 28 75 ab 48 8b 44 24 30 49 89 4c 24 28 49 89 44
> >RSP: 0018:b0978589fa90 EFLAGS: 00010246
> >RAX: 92f12806c400 RBX: 80200019 RCX: 92f12806c400
> >RDX: 92f12806c400 RSI: dd6421a01a00 RDI: 92ed2f406e80
> >RBP: b0978589fb40 R08: 0001 R09: c0ee4748
> >R10: 92f12806c400 R11: 0001 R12: dd6421a01a00
> >R13: 92f12806c400 R14: 92ed2f406e80 R15: dd6421a01a20
> >FS:  7f4170be0ac0() GS:92ed2fb4() knlGS:
> >CS:  0010 DS:  ES:  CR0: 80050033
> >CR2: 562818aaa000 CR3: 00045745a002 CR4: 003606e0
> >DR0:  DR1:  DR2: 
> >DR3:  DR6: fffe0ff0 DR7: 0400
> >Call Trace:
> > ? drm_dbg+0x87/0x90 [drm]
> > dc_stream_release+0x28/0x50 [amdgpu]
> > amdgpu_dm_connector_mode_valid+0xb4/0x1f0 [amdgpu]
> > drm_helper_probe_single_connector_modes+0x492/0x6b0 [drm_kms_helper]
> > drm_mode_getconnector+0x457/0x490 [drm]
> > ? drm_connector_property_set_ioctl+0x60/0x60 [drm]
> > drm_ioctl_kernel+0xa9/0xf0 [drm]
> > drm_ioctl+0x201/0x3a0 [drm]
> > ? drm_connector_property_set_ioctl+0x60/0x60 [drm]
> > amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
> > do_vfs_ioctl+0xa4/0x630
> > ? __sys_recvmsg+0x83/0xa0
> > ksys_ioctl+0x60/0x90
> > __x64_sys_ioctl+0x16/0x20
> > do_syscall_64+0x5b/0x160
> > entry_SYSCAL
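The rule in the commit message above (take a reference when a pointer is assigned, drop it when the pointer is cleared or replaced), together with the follow-up fix "Clear dc_sink after it gets released", can be sketched as a single assignment helper. This is a userspace model under stated assumptions: `struct sketch_sink`, `sink_create()`, `sink_retain()`, `sink_release()` and `sink_assign()` are hypothetical names and do not reproduce the amdgpu display code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a display sink. */
struct sketch_sink {
	int refcount;
};

static struct sketch_sink *sink_create(void)
{
	struct sketch_sink *s = malloc(sizeof(*s));

	if (s)
		s->refcount = 1; /* the creator holds the first reference */
	return s;
}

static void sink_retain(struct sketch_sink *s)
{
	s->refcount++;
}

/* Drops one reference and frees the object when the last one goes away. */
static void sink_release(struct sketch_sink *s)
{
	if (--s->refcount == 0)
		free(s);
}

/*
 * Stores new_sink in *slot: the new value gains a reference and the previous
 * value (if any) loses one. Passing NULL clears the slot, so a released
 * pointer never lingers in the owning structure.
 */
static void sink_assign(struct sketch_sink **slot, struct sketch_sink *new_sink)
{
	struct sketch_sink *old = *slot;

	if (new_sink)
		sink_retain(new_sink); /* retain first: safe when new_sink == old */
	*slot = new_sink;
	if (old)
		sink_release(old);
}
```

Retaining the new value before releasing the old one keeps self-assignment from freeing the object, and routing every store through one helper is what makes the counts consistent.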

RE: [PATCH 00/10] HMM updates for 5.1

2019-03-19 Thread Deucher, Alexander

> -Original Message-
> From: Jerome Glisse 
> Sent: Tuesday, March 19, 2019 12:58 PM
> To: Andrew Morton 
> Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; Kuehling, Felix
> ; Koenig, Christian
> ; Ralph Campbell ;
> John Hubbard ; Jason Gunthorpe
> ; Dan Williams ; Deucher,
> Alexander 
> Subject: Re: [PATCH 00/10] HMM updates for 5.1
> 
> On Tue, Mar 19, 2019 at 09:40:07AM -0700, Andrew Morton wrote:
> > On Mon, 18 Mar 2019 13:04:04 -0400 Jerome Glisse 
> wrote:
> >
> > > On Wed, Mar 13, 2019 at 09:10:04AM -0700, Andrew Morton wrote:
> > > > On Tue, 12 Mar 2019 21:27:06 -0400 Jerome Glisse
>  wrote:
> > > >
> > > > > Andrew you will not be pushing this patchset in 5.1 ?
> > > >
> > > > I'd like to.  It sounds like we're converging on a plan.
> > > >
> > > > It would be good to hear more from the driver developers who will
> > > > be consuming these new features - links to patchsets, review
> > > > feedback, etc.  Which individuals should we be asking?  Felix,
> > > > Christian and Jason, perhaps?
> > > >
> > >
> > > So i am guessing you will not send this to Linus ?
> >
> > I was waiting to see how the discussion proceeds.  Was also expecting
> > various changelog updates (at least) - more acks from driver
> > developers, additional pointers to client driver patchsets,
> > description of their readiness, etc.
> 
> nouveau will benefit from this patchset and is already upstream in 5.1 so i am
> not sure what kind of pointer i can give for that, it is already there. amdgpu
> will also benefit from it and is queue up AFAICT. ODP RDMA is the third driver
> and i gave link to the patch that also use the 2 new functions that this
> patchset introduce. Do you want more ?
> 
> I guess i will repost with updated ack as Felix, Jason and few others told me
> they were fine with it.
> 
> >
> > Today I discover that Alex has cherrypicked "mm/hmm: use reference
> > counting for HMM struct" into a tree which is fed into linux-next
> > which rather messes things up from my end and makes it hard to feed a
> > (possibly modified version of) that into Linus.
> 
> :( i did not know the tree they pull that in was fed into next. I will
> discourage them from doing so going forward.
> 

I can drop it.  I included it because it fixes an issue with HMM as used by
amdgpu in our current -next tree.  So users testing my drm-next branch will run
into the issue without it.  I don't plan to include it in the actual -next PR.
What is the recommended way to deal with this?

Alex

> > So I think I'll throw up my hands, drop them all and shall await
> > developments :(
> 
> What more do you want to see ? I can repost with the ack already given and
> the improve commit wording on some of the patch. But from user point of
> view nouveau is already upstream, ODP RDMA depends on this patchset and
> is posted and i have given link to it. amdgpu is queue up. What more do i
> need ?
> 
> Cheers,
> Jérôme

RE: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC for smu7/8

2018-10-09 Thread Deucher, Alexander

> -Original Message-
> From: Greg Kroah-Hartman 
> Sent: Tuesday, October 9, 2018 8:45 AM
> To: Deucher, Alexander 
> Cc: linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Wentland, Harry
> ; Zhu, Rex ; Sasha Levin
> 
> Subject: Re: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC
> for smu7/8
> 
> On Mon, Oct 08, 2018 at 06:01:50PM +, Deucher, Alexander wrote:
> > > -Original Message-
> > > From: Greg Kroah-Hartman 
> > > Sent: Monday, October 8, 2018 1:11 PM
> > > To: Deucher, Alexander 
> > > Cc: linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Wentland,
> > > Harry ; Zhu, Rex ;
> Sasha
> > > Levin 
> > > Subject: Re: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values
> > > to DC for smu7/8
> > >
> > > On Mon, Oct 08, 2018 at 04:02:19PM +, Deucher, Alexander wrote:
> > > > > -Original Message-
> > > > > From: Greg Kroah-Hartman 
> > > > > Sent: Monday, October 8, 2018 10:44 AM
> > > > > To: Deucher, Alexander 
> > > > > Cc: linux-kernel@vger.kernel.org; sta...@vger.kernel.org;
> > > > > Wentland, Harry ; Zhu, Rex
> > > > > ;
> > > Sasha
> > > > > Levin 
> > > > > Subject: Re: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock
> > > > > values to DC for smu7/8
> > > > >
> > > > > On Mon, Oct 08, 2018 at 02:33:56PM +0000, Deucher, Alexander
> wrote:
> > > > > > > -Original Message-
> > > > > > > From: Greg Kroah-Hartman 
> > > > > > > Sent: Monday, September 24, 2018 7:53 AM
> > > > > > > To: linux-kernel@vger.kernel.org
> > > > > > > Cc: Greg Kroah-Hartman ;
> > > > > > > sta...@vger.kernel.org; Wentland, Harry
> > > > > > > ; Deucher, Alexander
> > > > > > > ; Zhu, Rex
> ;
> > > Sasha
> > > > > > > Levin 
> > > > > > > Subject: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock
> > > > > > > values to DC for
> > > > > > > smu7/8
> > > > > > >
> > > > > > > 4.18-stable review patch.  If anyone has any objections,
> > > > > > > please let me
> > > > > know.
> > > > > > >
> > > > > >
> > > > > > This regresses power usage on 4.18.  Please revert.
> > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=201275
> > > > >
> > > > > Is it reverted in "upstream" as well?  If not, please get the
> > > > > fix in there and then I will be glad to backport it here.
> > > > >
> > > >
> > > > There's no breakage in upstream.  This patch addressed breakages
> > > > in
> > > > 4.19 specifically due to some other refactoring we did in the driver.
> > > > I'll try and dig out the exact series of patches this addressed.
> > >
> > > So there is no problem in 4.19-rc7?  That contradicts the statement
> > > of looking in drm-next for the fixes.
> >
> > Sorry, what statement about drm-next?  This patch was for 4.19 and was
> > not intended for 4.18.  It was picked up by Sasha's auto select system for
> > stable.
> 
> I thought this thread said that.
> 
> Ok, so I should just revert it?

Yes, please.

Thanks,

Alex

RE: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC for smu7/8

2018-10-08 Thread Deucher, Alexander

> -Original Message-
> From: Greg Kroah-Hartman 
> Sent: Monday, October 8, 2018 1:11 PM
> To: Deucher, Alexander 
> Cc: linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Wentland, Harry
> ; Zhu, Rex ; Sasha Levin
> 
> Subject: Re: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC
> for smu7/8
> 
> On Mon, Oct 08, 2018 at 04:02:19PM +, Deucher, Alexander wrote:
> > > -Original Message-
> > > From: Greg Kroah-Hartman 
> > > Sent: Monday, October 8, 2018 10:44 AM
> > > To: Deucher, Alexander 
> > > Cc: linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Wentland,
> > > Harry ; Zhu, Rex ;
> Sasha
> > > Levin 
> > > Subject: Re: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values
> > > to DC for smu7/8
> > >
> > > On Mon, Oct 08, 2018 at 02:33:56PM +, Deucher, Alexander wrote:
> > > > > -Original Message-
> > > > > From: Greg Kroah-Hartman 
> > > > > Sent: Monday, September 24, 2018 7:53 AM
> > > > > To: linux-kernel@vger.kernel.org
> > > > > Cc: Greg Kroah-Hartman ;
> > > > > sta...@vger.kernel.org; Wentland, Harry
> > > > > ; Deucher, Alexander
> > > > > ; Zhu, Rex ;
> Sasha
> > > > > Levin 
> > > > > Subject: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values
> > > > > to DC for
> > > > > smu7/8
> > > > >
> > > > > 4.18-stable review patch.  If anyone has any objections, please
> > > > > let me
> > > know.
> > > > >
> > > >
> > > > This regresses power usage on 4.18.  Please revert.
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=201275
> > >
> > > Is it reverted in "upstream" as well?  If not, please get the fix in
> > > there and then I will be glad to backport it here.
> > >
> >
> > There's no breakage in upstream.  This patch addressed breakages in
> > 4.19 specifically due to some other refactoring we did in the driver.
> > I'll try and dig out the exact series of patches this addressed.
> 
> So there is no problem in 4.19-rc7?  That contradicts the statement of looking
> in drm-next for the fixes.

Sorry, what statement about drm-next?  This patch was for 4.19 and was not 
intended for 4.18.  It was picked up by Sasha's auto select system for stable.

Alex

RE: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC for smu7/8

2018-10-08 Thread Deucher, Alexander

> -Original Message-
> From: Sasha Levin 
> Sent: Monday, October 8, 2018 10:49 AM
> To: Deucher, Alexander 
> Cc: Greg Kroah-Hartman ; linux-
> ker...@vger.kernel.org; sta...@vger.kernel.org; Wentland, Harry
> ; Zhu, Rex ; Sasha Levin
> 
> Subject: Re: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC
> for smu7/8
> 
> On Mon, Oct 08, 2018 at 02:33:56PM +, Deucher, Alexander wrote:
> >> -Original Message-
> >> From: Greg Kroah-Hartman 
> >> Sent: Monday, September 24, 2018 7:53 AM
> >> To: linux-kernel@vger.kernel.org
> >> Cc: Greg Kroah-Hartman ;
> >> sta...@vger.kernel.org; Wentland, Harry ;
> >> Deucher, Alexander ; Zhu, Rex
> >> ; Sasha Levin 
> >> Subject: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC
> >> for
> >> smu7/8
> >>
> >> 4.18-stable review patch.  If anyone has any objections, please let me
> know.
> >>
> >
> >This regresses power usage on 4.18.  Please revert.
> >https://bugzilla.kernel.org/show_bug.cgi?id=201275
> 
> Hi Alex,
> 
> Thank you for the report.
> 
> I'm working on improving this process, I'd be very grateful if you could
> answer a few questions about this:
> 
> 1. Is the same breakage seen upstream? (if so, it should be reverted there as
> well and we can grab the revert into -stable).

No regression in 4.19 or -next.

> 2. Does the issue reported by this patch ("pipes seem to hang with a 4k DP
> and 1080p HDMI display") exist in the 4.18 stable tree?

I don't think so, but I'm not 100% sure.  Harry, Rex, do you know if this is a
general issue, or was it just fallout from the changes to the interface?

> 3. If not, could you briefly explain why?

We refactored the interface between the power and display components, and this
patch fixed up some of the resulting fallout caused by the different clock units
used in each component.
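To make the unit mismatch concrete: judging from the `* 10` the patch adds, the powerplay clock tables hold values in 10 kHz units while DC expects kHz. A minimal sketch of that conversion follows; `struct pp_clk_entry` and `pp_clocks_to_khz()` are hypothetical stand-ins, not the driver's real types or helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a powerplay clock dependency-table entry.
 * clk is assumed to be stored in 10 kHz units (e.g. 30000 == 300 MHz). */
struct pp_clk_entry {
	uint32_t clk;
};

/* Copy up to max entries into out_khz, converting 10 kHz units to kHz
 * (the "* 10" from the patch). Returns the number of entries copied. */
static size_t pp_clocks_to_khz(const struct pp_clk_entry *entries,
			       size_t count, uint32_t *out_khz, size_t max)
{
	size_t n = count < max ? count : max;

	for (size_t i = 0; i < n; i++)
		out_khz[i] = entries[i].clk * 10;
	return n;
}
```

Without the multiplication, a 300 MHz entry (30000) would reach DC as 30000 kHz, i.e. 30 MHz, a tenfold under-report.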

> 
> 
> The algorithm I use was very confident about this patch being stable material,
> and when I looked at it back then (and again now) I was very confident of the
> same. If I can understand where I was wrong I could improve my process.

The patch requires some other dependent patches that were not flagged in the
patch itself.  IIRC, they were a bit big for stable.

Alex

RE: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC for smu7/8

2018-10-08 Thread Deucher, Alexander

> -Original Message-
> From: Greg Kroah-Hartman 
> Sent: Monday, October 8, 2018 10:44 AM
> To: Deucher, Alexander 
> Cc: linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Wentland, Harry
> ; Zhu, Rex ; Sasha Levin
> 
> Subject: Re: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC
> for smu7/8
> 
> On Mon, Oct 08, 2018 at 02:33:56PM +, Deucher, Alexander wrote:
> > > -Original Message-
> > > From: Greg Kroah-Hartman 
> > > Sent: Monday, September 24, 2018 7:53 AM
> > > To: linux-kernel@vger.kernel.org
> > > Cc: Greg Kroah-Hartman ;
> > > sta...@vger.kernel.org; Wentland, Harry ;
> > > Deucher, Alexander ; Zhu, Rex
> > > ; Sasha Levin 
> > > Subject: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to
> > > DC for
> > > smu7/8
> > >
> > > 4.18-stable review patch.  If anyone has any objections, please let me
> know.
> > >
> >
> > This regresses power usage on 4.18.  Please revert.
> > https://bugzilla.kernel.org/show_bug.cgi?id=201275
> 
> Is it reverted in "upstream" as well?  If not, please get the fix in there and
> then I will be glad to backport it here.
> 

There's no breakage upstream.  This patch specifically addressed breakage in 4.19
caused by some other refactoring we did in the driver.  I'll try to dig out the
exact series of patches it addressed.

Alex

> thanks,
> 
> greg k-h

RE: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC for smu7/8

2018-10-08 Thread Deucher, Alexander

> -Original Message-
> From: Greg Kroah-Hartman 
> Sent: Monday, September 24, 2018 7:53 AM
> To: linux-kernel@vger.kernel.org
> Cc: Greg Kroah-Hartman ;
> sta...@vger.kernel.org; Wentland, Harry ;
> Deucher, Alexander ; Zhu, Rex
> ; Sasha Levin 
> Subject: [PATCH 4.18 222/235] drm/amd/pp: Send khz clock values to DC for
> smu7/8
> 
> 4.18-stable review patch.  If anyone has any objections, please let me know.
> 

This regresses power usage on 4.18.  Please revert.
https://bugzilla.kernel.org/show_bug.cgi?id=201275

Thanks,

Alex

> --
> 
> From: Harry Wentland 
> 
> [ Upstream commit c3cb424a086921f6bb0449b10d998352a756d6d5 ]
> 
> The previous change wasn't covering smu 7 and 8 and therefore DC was
> seeing wrong clock values.
> 
> This fixes an issue where the pipes seem to hang with a 4k DP and 1080p
> HDMI display.
> 
> Fixes: c3df50abc84b ("drm/amd/pp: Convert clock unit to KHz as defined")
> Signed-off-by: Harry Wentland 
> Acked-by: Alex Deucher 
> Cc:rex@amd.com
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> Signed-off-by: Greg Kroah-Hartman 
> ---
>  drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c |8 
>  drivers/gpu/drm/amd/powerplay/hwmgr/smu8_hwmgr.c |6 +++---
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> @@ -4555,12 +4555,12 @@ static int smu7_get_sclks(struct pp_hwmg
>   return -EINVAL;
>   dep_sclk_table = table_info->vdd_dep_on_sclk;
>   for (i = 0; i < dep_sclk_table->count; i++)
> - clocks->clock[i] = dep_sclk_table->entries[i].clk;
> + clocks->clock[i] = dep_sclk_table->entries[i].clk * 10;
>   clocks->count = dep_sclk_table->count;
>   } else if (hwmgr->pp_table_version == PP_TABLE_V0) {
>   sclk_table = hwmgr->dyn_state.vddc_dependency_on_sclk;
>   for (i = 0; i < sclk_table->count; i++)
> - clocks->clock[i] = sclk_table->entries[i].clk;
> + clocks->clock[i] = sclk_table->entries[i].clk * 10;
>   clocks->count = sclk_table->count;
>   }
> 
> @@ -4592,7 +4592,7 @@ static int smu7_get_mclks(struct pp_hwmg
>   return -EINVAL;
>   dep_mclk_table = table_info->vdd_dep_on_mclk;
>   for (i = 0; i < dep_mclk_table->count; i++) {
> - clocks->clock[i] = dep_mclk_table->entries[i].clk;
> + clocks->clock[i] = dep_mclk_table->entries[i].clk * 10;
>   clocks->latency[i] = smu7_get_mem_latency(hwmgr,
>   dep_mclk_table-
> >entries[i].clk);
>   }
> @@ -4600,7 +4600,7 @@ static int smu7_get_mclks(struct pp_hwmg
>   } else if (hwmgr->pp_table_version == PP_TABLE_V0) {
>   mclk_table = hwmgr-
> >dyn_state.vddc_dependency_on_mclk;
>   for (i = 0; i < mclk_table->count; i++)
> - clocks->clock[i] = mclk_table->entries[i].clk;
> + clocks->clock[i] = mclk_table->entries[i].clk * 10;
>   clocks->count = mclk_table->count;
>   }
>   return 0;
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu8_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu8_hwmgr.c
> @@ -1605,17 +1605,17 @@ static int smu8_get_clock_by_type(struct
>   switch (type) {
>   case amd_pp_disp_clock:
>   for (i = 0; i < clocks->count; i++)
> - clocks->clock[i] = data->sys_info.display_clock[i];
> + clocks->clock[i] = data->sys_info.display_clock[i] * 10;
>   break;
>   case amd_pp_sys_clock:
>   table = hwmgr->dyn_state.vddc_dependency_on_sclk;
>   for (i = 0; i < clocks->count; i++)
> - clocks->clock[i] = table->entries[i].clk;
> + clocks->clock[i] = table->entries[i].clk * 10;
>   break;
>   case amd_pp_mem_clock:
>   clocks->count = SMU8_NUM_NBPMEMORYCLOCK;
>   for (i = 0; i < clocks->count; i++)
> - clocks->clock[i] = data-
> >sys_info.nbp_memory_clock[clocks->count - 1 - i];
> + clocks->clock[i] = data-
> >sys_info.nbp_memory_clock[clocks->count - 1
> +- i] * 10;
>   break;
>   default:
>   return -1;
>

RE: [PATCH v4 1/2] clk: x86: Add ST oscout platform clock

2018-05-08 Thread Deucher, Alexander

> -Original Message-
> From: Agrawal, Akshu
> Sent: Tuesday, May 8, 2018 12:04 AM
> To: Deucher, Alexander <alexander.deuc...@amd.com>
> Cc: djku...@chromium.org; mturque...@baylibre.com; sb...@kernel.org;
> Koenig, Christian <christian.koe...@amd.com>; airl...@redhat.com; Liu,
> Shaoyun <shaoyun@amd.com>; linux-kernel@vger.kernel.org; linux-
> c...@vger.kernel.org; r...@rjwysocki.net; l...@kernel.org; linux-
> a...@vger.kernel.org
> Subject: Re: [PATCH v4 1/2] clk: x86: Add ST oscout platform clock
> 
> 
> 
> On 5/8/2018 3:14 AM, Deucher, Alexander wrote:
> >> -Original Message-
> >> From: Agrawal, Akshu
> >> Sent: Monday, May 7, 2018 6:14 AM
> >> Cc: djku...@chromium.org; Agrawal, Akshu <akshu.agra...@amd.com>;
> >> Deucher, Alexander <alexander.deuc...@amd.com>;
> >> mturque...@baylibre.com; sb...@kernel.org; Koenig, Christian
> >> <christian.koe...@amd.com>; airl...@redhat.com; Liu, Shaoyun
> >> <shaoyun@amd.com>; linux-kernel@vger.kernel.org; linux-
> >> c...@vger.kernel.org; r...@rjwysocki.net; l...@kernel.org; linux-
> >> a...@vger.kernel.org
> >> Subject: [PATCH v4 1/2] clk: x86: Add ST oscout platform clock
> >>
> >> Stoney SoC provides oscout clock. This clock can support 25Mhz and
> >> 48Mhz of frequency.
> >> The clock is available for general system use.
> >>
> >> Signed-off-by: Akshu Agrawal <akshu.agra...@amd.com>
> >> ---
> >> v2: config change, added SPDX tag and used clk_hw_register_.
> >> v3: Fix kbuild warning for checking of NULL pointer
> >> v4: unregister clk_hw in driver remove, add .suppress_bind_attrs
> >>  drivers/clk/x86/Makefile |  3 +-
> >>  drivers/clk/x86/clk-st.c | 85
> >> 
> >>  include/linux/platform_data/clk-st.h | 17 
> >>  3 files changed, 104 insertions(+), 1 deletion(-)  create mode
> >> 100644 drivers/clk/x86/clk-st.c  create mode 100644
> >> include/linux/platform_data/clk-st.h
> >>
> >> diff --git a/drivers/clk/x86/Makefile b/drivers/clk/x86/Makefile
> >> index 1367afb..00303bc 100644
> >> --- a/drivers/clk/x86/Makefile
> >> +++ b/drivers/clk/x86/Makefile
> >> @@ -1,3 +1,4 @@
> >> +obj-$(CONFIG_PMC_ATOM)+= clk-pmc-atom.o
> >> +obj-$(CONFIG_X86_AMD_PLATFORM_DEVICE) += clk-st.o
> >>  clk-x86-lpss-objs := clk-lpt.o
> >>  obj-$(CONFIG_X86_INTEL_LPSS)  += clk-x86-lpss.o
> >> -obj-$(CONFIG_PMC_ATOM)+= clk-pmc-atom.o
> >> diff --git a/drivers/clk/x86/clk-st.c b/drivers/clk/x86/clk-st.c new
> >> file mode
> >> 100644 index 000..8a7795c
> >> --- /dev/null
> >> +++ b/drivers/clk/x86/clk-st.c
> >> @@ -0,0 +1,85 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >
> > Should this be MIT?  The original license was MIT.
> >
> > Alex
> >
> 
> We are adding SPDX tag, while license remains same GPL-2.0
> 
> What I have read is this is "to provide license identifiers inside the source
> code that could be easily parsed by machines and would allow checking for
> license compliance of an open source project easier."

My point was just that the original license on the file that you first sent out
was MIT, so the SPDX tag should be MIT rather than GPL, e.g.:
SPDX-License-Identifier: MIT

Alex
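For reference, the kernel's license rules put the tag on the first line of the file, in a comment style that depends on the file type: `//` for C sources and `/* */` for headers. A small sketch of that rule, covering only `.c` and `.h` files; `spdx_tag_line()` is a hypothetical helper, not a kernel or SPDX tool API.

```c
#include <stdio.h>
#include <string.h>

/* Format the first-line SPDX tag for a kernel file: C sources (.c) use a
 * C++-style comment, headers (.h) use a C-style block comment. Other file
 * types (scripts, assembly, ...) are not handled by this sketch. */
static int spdx_tag_line(const char *path, const char *license,
			 char *buf, size_t len)
{
	size_t plen = strlen(path);
	int is_header = plen >= 2 && strcmp(path + plen - 2, ".h") == 0;

	if (is_header)
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */",
				license);
	return snprintf(buf, len, "// SPDX-License-Identifier: %s", license);
}
```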

RE: [PATCH v4 1/2] clk: x86: Add ST oscout platform clock

2018-05-07 Thread Deucher, Alexander

> -Original Message-
> From: Agrawal, Akshu
> Sent: Monday, May 7, 2018 6:14 AM
> Cc: djku...@chromium.org; Agrawal, Akshu <akshu.agra...@amd.com>;
> Deucher, Alexander <alexander.deuc...@amd.com>;
> mturque...@baylibre.com; sb...@kernel.org; Koenig, Christian
> <christian.koe...@amd.com>; airl...@redhat.com; Liu, Shaoyun
> <shaoyun@amd.com>; linux-kernel@vger.kernel.org; linux-
> c...@vger.kernel.org; r...@rjwysocki.net; l...@kernel.org; linux-
> a...@vger.kernel.org
> Subject: [PATCH v4 1/2] clk: x86: Add ST oscout platform clock
> 
> Stoney SoC provides oscout clock. This clock can support 25Mhz and 48Mhz
> of frequency.
> The clock is available for general system use.
> 
> Signed-off-by: Akshu Agrawal <akshu.agra...@amd.com>
> ---
> v2: config change, added SPDX tag and used clk_hw_register_.
> v3: Fix kbuild warning for checking of NULL pointer
> v4: unregister clk_hw in driver remove, add .suppress_bind_attrs
>  drivers/clk/x86/Makefile |  3 +-
>  drivers/clk/x86/clk-st.c | 85
> 
>  include/linux/platform_data/clk-st.h | 17 
>  3 files changed, 104 insertions(+), 1 deletion(-)  create mode 100644
> drivers/clk/x86/clk-st.c  create mode 100644
> include/linux/platform_data/clk-st.h
> 
> diff --git a/drivers/clk/x86/Makefile b/drivers/clk/x86/Makefile index
> 1367afb..00303bc 100644
> --- a/drivers/clk/x86/Makefile
> +++ b/drivers/clk/x86/Makefile
> @@ -1,3 +1,4 @@
> +obj-$(CONFIG_PMC_ATOM)   += clk-pmc-atom.o
> +obj-$(CONFIG_X86_AMD_PLATFORM_DEVICE)+= clk-st.o
>  clk-x86-lpss-objs:= clk-lpt.o
>  obj-$(CONFIG_X86_INTEL_LPSS) += clk-x86-lpss.o
> -obj-$(CONFIG_PMC_ATOM)   += clk-pmc-atom.o
> diff --git a/drivers/clk/x86/clk-st.c b/drivers/clk/x86/clk-st.c new file mode
> 100644 index 000..8a7795c
> --- /dev/null
> +++ b/drivers/clk/x86/clk-st.c
> @@ -0,0 +1,85 @@
> +// SPDX-License-Identifier: GPL-2.0

Should this be MIT?  The original license was MIT.

Alex

> +/*
> + * clock framework for AMD Stoney based clocks
> + *
> + * Copyright 2018 Advanced Micro Devices, Inc.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include  #include
> +
> +
> +/* Clock Driving Strength 2 register */
> +#define CLKDRVSTR2   0x28
> +/* Clock Control 1 register */
> +#define MISCCLKCNTL1 0x40
> +/* Auxiliary clock1 enable bit */
> +#define OSCCLKENB    2
> +/* 25Mhz auxiliary output clock freq bit */
> +#define OSCOUT1CLK25MHZ  16
> +
> +#define ST_CLK_48M   0
> +#define ST_CLK_25M   1
> +#define ST_CLK_MUX   2
> +#define ST_CLK_GATE  3
> +#define ST_MAX_CLKS  4
> +
> +static const char * const clk_oscout1_parents[] = { "clk48MHz",
> +"clk25MHz" };
> +
> +static int st_clk_probe(struct platform_device *pdev) {
> + struct st_clk_data *st_data;
> + struct clk_hw **hws;
> +
> + st_data = dev_get_platdata(&pdev->dev);
> + if (!st_data || !st_data->base)
> + return -EINVAL;
> +
> + hws = devm_kzalloc(&pdev->dev, sizeof(*hws) * ST_MAX_CLKS,
> GFP_KERNEL);
> + if (!hws)
> + return -ENOMEM;
> +
> + hws[ST_CLK_48M] = clk_hw_register_fixed_rate(NULL, "clk48MHz",
> NULL, 0,
> +  48000000);
> + hws[ST_CLK_25M] = clk_hw_register_fixed_rate(NULL, "clk25MHz",
> NULL, 0,
> +  25000000);
> +
> + hws[ST_CLK_MUX] = clk_hw_register_mux(NULL, "oscout1_mux",
> + clk_oscout1_parents, ARRAY_SIZE(clk_oscout1_parents),
> + 0, st_data->base + CLKDRVSTR2, OSCOUT1CLK25MHZ, 3, 0,
> NULL);
> +
> + clk_set_parent(hws[ST_CLK_MUX]->clk, hws[ST_CLK_25M]->clk);
> +
> + hws[ST_CLK_GATE] = clk_hw_register_gate(NULL, "oscout1",
> "oscout1_mux",
> + 0, st_data->base + MISCCLKCNTL1, OSCCLKENB,
> + CLK_GATE_SET_TO_DISABLE, NULL);
> +
> + clk_hw_register_clkdev(hws[ST_CLK_GATE], "oscout1", NULL);
> +
> + platform_set_drvdata(pdev, hws);
> + return 0;
> +}
> +
> +static int st_clk_remove(struct platform_device *pdev) {
> + struct clk_hw **hws;
> + int i;
> +
> + hws = platform_get_drvdata(pdev);
> +
> + for (i = 0; i < ST_MAX_CLKS; i++)
> + clk_hw_unregister(hws[i]);
> + return 0;
> +}
> +
> +static struct platform_driver st_clk_driver = {
> + .driver = {
> + .name = "clk-st",
> + .suppress_bind_attrs = true,
> + },
> + .probe = st_clk_probe,
> + .remove = st_clk_remove,
> +};
> +builtin_platform_driver(st_clk_driver);
> diff --git a/include/linux/platform_data/clk-st.h
> b/include/linux/platform_data/clk-st.h
> new file mode 100644
> index 000..188184d
> --- /dev/null
> +++ b/include/linux/platform_data/clk-st.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * clock framework for AMD Stoney based clock
> + *
> + * Copyright 2018 Advanced Micro Devices, Inc.
> + */
> +
> +#ifndef __CLK_ST_H
> +#define __CLK_ST_H
> +
> +#include 
> +
> +struct st_clk_data {
> + void __iomem *base;
> +};
> +
> +#endif /* __CLK_ST_H */
> --
> 1.9.1
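The mux and gate registered in the quoted probe function each reduce to a register bit field. A minimal userspace sketch, assuming the usual semantics of the common clock framework's basic mux (parent index written at `shift`, masked to `width` bits) and basic gate (with `CLK_GATE_SET_TO_DISABLE`, a set bit turns the clock off). The register constants come from the patch; `OSC_MUX_WIDTH` and the helper names are hypothetical.

```c
#include <stdint.h>

/* Constants from the quoted clk-st.c patch. */
#define OSCCLKENB	2	/* gate bit in MISCCLKCNTL1 */
#define OSCOUT1CLK25MHZ	16	/* mux field shift in CLKDRVSTR2 */
#define OSC_MUX_WIDTH	3	/* width passed to clk_hw_register_mux() */

/* Parent indices follow clk_oscout1_parents[]: 0 = clk48MHz, 1 = clk25MHz. */
static uint32_t oscout1_select_parent(uint32_t clkdrvstr2, uint32_t index)
{
	uint32_t mask = ((1u << OSC_MUX_WIDTH) - 1) << OSCOUT1CLK25MHZ;

	return (clkdrvstr2 & ~mask) | ((index << OSCOUT1CLK25MHZ) & mask);
}

/* CLK_GATE_SET_TO_DISABLE: clearing the bit enables the clock. */
static uint32_t oscout1_enable(uint32_t miscclkcntl1)
{
	return miscclkcntl1 & ~(1u << OSCCLKENB);
}

/* ...and setting the bit disables it. */
static uint32_t oscout1_disable(uint32_t miscclkcntl1)
{
	return miscclkcntl1 | (1u << OSCCLKENB);
}
```

So selecting clk25MHz sets bit 16 of CLKDRVSTR2, and ungating oscout1 clears bit 2 of MISCCLKCNTL1.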

RE: [PATCH 1/2] IB/hfi1: Try slot reset before secondary bus reset

2018-04-19 Thread Deucher, Alexander

> -Original Message-
> From: Bjorn Helgaas [mailto:helg...@kernel.org]
> Sent: Thursday, April 19, 2018 5:47 PM
> To: Jason Gunthorpe <j...@ziepe.ca>
> Cc: Sinan Kaya <ok...@codeaurora.org>; Bjorn Helgaas
> <bhelg...@google.com>; linux-...@vger.kernel.org;
> sulr...@codeaurora.org; ti...@codeaurora.org; linux-arm-
> m...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Mike
> Marciniszyn <mike.marcinis...@intel.com>; Dennis Dalessandro
> <dennis.dalessan...@intel.com>; Doug Ledford <dledf...@redhat.com>;
> open list:HFI1 DRIVER <linux-r...@vger.kernel.org>; open list  ker...@vger.kernel.org>; Deucher, Alexander
> <alexander.deuc...@amd.com>
> Subject: Re: [PATCH 1/2] IB/hfi1: Try slot reset before secondary bus reset
> 
> [+cc Alex, who might know why DRM drivers have their own PCIe Gen3
> code]
> 
> On Thu, Apr 19, 2018 at 02:26:32PM -0600, Jason Gunthorpe wrote:
> > On Thu, Apr 19, 2018 at 03:56:23PM -0400, Sinan Kaya wrote:
> > > The infiniband adapter might be connected to a PCI hotplug slot.
> > > Performing secondary bus reset on a hotplug slot causes PCI link
> up/down interrupts.
> > >
> > > Hotplug driver removes the device from system when a link down
> > > interrupt is observed and performs re-enumeration when link up
> interrupt is observed.
> > >
> > > This conflicts with what this code is trying to do. Try secondary
> > > bus reset only if pci_reset_slot() fails/unsupported.
> > >
> > > Signed-off-by: Sinan Kaya <ok...@codeaurora.org>
> > > drivers/infiniband/hw/hfi1/pcie.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/infiniband/hw/hfi1/pcie.c
> > > b/drivers/infiniband/hw/hfi1/pcie.c
> > > index 83d66e8..75f49e3 100644
> > > +++ b/drivers/infiniband/hw/hfi1/pcie.c
> > > @@ -908,7 +908,8 @@ static int trigger_sbr(struct hfi1_devdata *dd)
> >
> > The code above this hunk is:
> >
> > /*
> >  * Trigger a secondary bus reset (SBR) on ourselves using our parent.
> >  *
> >  * Based on pci_parent_bus_reset() which is not exported by the
> >  * kernel core.
> >  */
> > static int trigger_sbr(struct hfi1_devdata *dd) {
> >
> > [..]
> >
> > This really seems like something the PCI core should be helping with,
> > drivers shouldn't be doing stuff like this. I get the feeling this
> > should be a common need if drivers support various error recovery
> > schemes?
> >
> > Bjorn, would be appropriate to export pci_parent_bus_reset() or some
> > variation therin??
> 
> I agree it would be really nice if the PCI core could help out somehow so we
> could get some of this code out of individual drivers.
> 
> If fact, stepping back a few paces, this HFI reset path is part of a
> transition to PCIe gen3 signaling, and I'm not sure why *that* is in the
> driver either.
> 
> There's an ongoing discussion [1] about why this gen3 code is in the driver.
> Several DRM drivers include similar code (cik_pcie_gen3_enable(),
> si_pcie_gen3_enable()).
> 
> I *thought* the hardware was supposed to automatically negotiate to the
> highest rate supported by both sides without any help at all from software.
> But since several drivers have code to do it themselves, I wonder if I'm
> missing something, or maybe there's something the PCI core should be doing
> that it isn't, and the driver code is basically working around that PCI core
> deficiency.

My understanding was that some platforms only bring up the link in gen 1 mode
for compatibility reasons.  TBH, I'm not that familiar with how the links come
up on different platforms.

Alex
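
Under Bjorn's assumption that the hardware trains the link to the highest rate both ends advertise, a link can be said to have "come up below capability" (for example at gen 1, as described above) when its current speed is lower than the smaller of the two advertised maximums. A minimal sketch of that comparison: the masks mirror the Max Link Speed field of Link Capabilities and the Current Link Speed field of Link Status (bits 3:0 in both). The function names are hypothetical, and register values are passed in as plain integers rather than read from config space.

```c
#include <assert.h>
#include <stdint.h>

#define LNKCAP_MAX_SPEED_MASK 0x0000000fu /* Link Capabilities, bits 3:0 */
#define LNKSTA_CUR_SPEED_MASK 0x000fu     /* Link Status, bits 3:0 */

/* Encoded speed value -> transfer rate in MT/s; 0 for values this
 * sketch does not handle. */
static unsigned speed_to_mts(unsigned code)
{
	switch (code) {
	case 1: return 2500;   /* gen1: 2.5 GT/s */
	case 2: return 5000;   /* gen2: 5.0 GT/s */
	case 3: return 8000;   /* gen3: 8.0 GT/s */
	case 4: return 16000;  /* gen4: 16.0 GT/s */
	default: return 0;
	}
}

/* Highest rate advertised by both the device and its upstream port. */
static unsigned expected_mts(uint32_t dev_lnkcap, uint32_t port_lnkcap)
{
	unsigned a = speed_to_mts(dev_lnkcap & LNKCAP_MAX_SPEED_MASK);
	unsigned b = speed_to_mts(port_lnkcap & LNKCAP_MAX_SPEED_MASK);

	return a < b ? a : b;
}

/* Nonzero if the link is currently running below that common maximum,
 * i.e. the case in which a driver might try to retrain to a higher rate. */
static int link_below_capability(uint32_t dev_lnkcap, uint32_t port_lnkcap,
				 uint16_t lnksta)
{
	unsigned cur = speed_to_mts(lnksta & LNKSTA_CUR_SPEED_MASK);

	return cur < expected_mts(dev_lnkcap, port_lnkcap);
}
```

For example, a gen3-capable device behind a gen3-capable port whose status reports speed code 1 would make link_below_capability() return nonzero.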

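The ordering proposed in Sinan's patch (try a slot reset first, and fall back to a secondary bus reset only if the slot reset fails or is unsupported) can be sketched in plain C. The device structure, both reset helpers, and the use of -ENOTTY to mean "unsupported" are hypothetical stand-ins, not the hfi1 driver or PCI core code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; counters let the chosen path be observed. */
struct fake_dev {
	bool has_hotplug_slot; /* is a slot-level reset available? */
	int slot_resets;
	int bus_resets;
};

/* Stand-in for a slot reset; fails when the device has no slot. */
static int try_slot_reset(struct fake_dev *d)
{
	if (!d->has_hotplug_slot)
		return -ENOTTY; /* assumed "unsupported" code */
	d->slot_resets++;
	return 0;
}

/* Stand-in for a raw secondary bus reset via the parent bridge. */
static int try_secondary_bus_reset(struct fake_dev *d)
{
	d->bus_resets++;
	return 0;
}

/* Prefer the slot reset; use the secondary bus reset only as a fallback. */
static int reset_device(struct fake_dev *d)
{
	int ret = try_slot_reset(d);

	if (ret == 0)
		return 0;
	return try_secondary_bus_reset(d);
}
```

The point of the ordering, per the patch description, is that a secondary bus reset on a hotplug slot triggers link up/down handling that removes and re-enumerates the device, so it is kept as the last resort.
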
RE: radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)

2018-01-17 Thread Deucher, Alexander

> -Original Message-
> From: Borislav Petkov [mailto:b...@alien8.de]
> Sent: Wednesday, January 17, 2018 4:04 PM
> To: Deucher, Alexander <alexander.deuc...@amd.com>; Koenig, Christian
> <christian.koe...@amd.com>
> Cc: lkml <linux-kernel@vger.kernel.org>
> Subject: radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> 
> Hi guys,
> 
> seen this already?
> 
> I see it has happened during resume. Kernel is rc7+tip/master.
> 
> Box is stable otherwise while I'm working on it...

Being investigated here:
https://bugs.freedesktop.org/show_bug.cgi?id=104082

Alex

> 
> [66861.818432] usb 10-1: USB disconnect, device number 2
> [75380.827447] perf: interrupt took too long (2527 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
> [94022.728431] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> [94022.735717] swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
> [94022.743525] CPU: 2 PID: 3069 Comm: Xorg Not tainted 4.15.0-rc7+ #4
> [94022.749711] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
> [94022.749711] Call Trace:
> [94022.749717]  dump_stack+0x67/0x8f
> [94022.749720]  swiotlb_alloc_coherent+0x169/0x170
> [94022.749726]  ttm_dma_pool_get_pages+0x1ea/0x450 [ttm]
> [94022.749731]  ttm_dma_populate+0x248/0x330 [ttm]
> [94022.749734]  ttm_tt_bind+0x23/0x50 [ttm]
> [94022.749737]  ttm_bo_handle_move_mem+0x3a1/0x3e0 [ttm]
> [94022.749741]  ? ttm_bo_mem_space+0x3bc/0x4a0 [ttm]
> [94022.749744]  ttm_bo_validate+0x139/0x150 [ttm]
> [94022.749746]  ? _raw_write_unlock+0x12/0x30
> [94022.749748]  ? drm_vma_offset_add+0x6a/0x90
> [94022.749751]  ttm_bo_init_reserved+0x3a5/0x470 [ttm]
> [94022.749754]  ttm_bo_init+0x4d/0xb0 [ttm]
> [94022.749778]  ? radeon_update_memory_usage.isra.0+0x60/0x60 [radeon]
> [94022.749784]  ? drm_gem_object_init+0x31/0x50
> [94022.749796]  radeon_bo_create+0x1bf/0x290 [radeon]
> [94022.749809]  ? radeon_update_memory_usage.isra.0+0x60/0x60 [radeon]
> [94022.749822]  radeon_gem_object_create+0xa9/0x1b0 [radeon]
> [94022.749835]  ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
> [94022.749848]  radeon_gem_create_ioctl+0x6a/0xf0 [radeon]
> [94022.749862]  ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
> [94022.749863]  drm_ioctl_kernel+0x6e/0xd0
> [94022.749865]  ? unix_state_double_unlock+0x30/0x30
> [94022.749866]  drm_ioctl+0x33b/0x3f0
> [94022.749879]  ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
> [94022.749881]  ? preempt_count_sub+0xa8/0x100
> [94022.749882]  ? _raw_spin_unlock_irqrestore+0x25/0x50
> [94022.749883]  ? preempt_count_sub+0xa8/0x100
> [94022.749894]  radeon_drm_ioctl+0x5d/0xa0 [radeon]
> [94022.749896]  do_vfs_ioctl+0xa2/0x600
> [94022.749898]  ? __fget+0x67/0xb0
> [94022.749899]  SyS_ioctl+0x4c/0x90
> [94022.749901]  entry_SYSCALL_64_fastpath+0x22/0x8a
> [94022.749902] RIP: 0033:0x7f28b58145e7
> [94022.749903] RSP: 002b:7ffdac6c6948 EFLAGS: 0246
> [94022.770322] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> [94022.770323] swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
> 
> --
> Regards/Gruss,
> Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.
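
A rough worked calculation of why a single 2097152-byte (2 MiB) coherent request can fail once it has to go through the swiotlb bounce pool, assuming the default constants of that era (IO_TLB_SHIFT = 11, i.e. 2 KiB slots, and IO_TLB_SEGSIZE = 128 contiguous slots per mapping). This only illustrates the size limit; it is not a diagnosis of the bug tracked above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed defaults, mirroring include/linux/swiotlb.h of the 4.15 era. */
#define IO_TLB_SHIFT   11  /* each bounce-buffer slot is 2 KiB */
#define IO_TLB_SEGSIZE 128 /* max contiguous slots in one mapping */

/* Largest single contiguous mapping the pool can provide:
 * 128 slots * 2048 bytes = 262144 bytes (256 KiB). */
static size_t swiotlb_max_bytes(void)
{
	return (size_t)IO_TLB_SEGSIZE << IO_TLB_SHIFT;
}

/* Number of 2 KiB slots a request needs, rounded up. */
static size_t slots_needed(size_t bytes)
{
	size_t slot = (size_t)1 << IO_TLB_SHIFT;

	return (bytes + slot - 1) / slot;
}
```

Under these assumptions the 2097152-byte request from the log needs 1024 contiguous slots, eight times the 128-slot limit.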

RE: [PATCH] ASoC: amd: Add error checking to probe function

2017-11-21 Thread Deucher, Alexander

> -Original Message-
> From: Agrawal, Akshu
> Sent: Tuesday, November 21, 2017 1:15 AM
> To: Deucher, Alexander; 'Guenter Roeck'; Liam Girdwood; Mukunda,
> Vijendar
> Cc: Mark Brown; Jaroslav Kysela; Takashi Iwai; alsa-de...@alsa-project.org;
> linux-kernel@vger.kernel.org; Dominik Behr; Daniel Kurtz
> Subject: Re: [PATCH] ASoC: amd: Add error checking to probe function
> 
> 
> 
> On 11/21/2017 10:17 AM, Deucher, Alexander wrote:
> >> -Original Message-
> >> From: Guenter Roeck [mailto:groe...@gmail.com] On Behalf Of Guenter
> >> Roeck
> >> Sent: Monday, November 20, 2017 11:28 PM
> >> To: Liam Girdwood
> >> Cc: Mark Brown; Jaroslav Kysela; Takashi Iwai; alsa-devel@alsa-project.org;
> >> linux-kernel@vger.kernel.org; Guenter Roeck; Deucher, Alexander; Dominik
> >> Behr; Daniel Kurtz
> >> Subject: [PATCH] ASoC: amd: Add error checking to probe function
> >>
> >> The acp_audio_dma does not perform sufficient error checking in its probe
> >> function. This can result in crashes if a critical error path is
> >> encountered.
> >>
> >> Fixes: 7c31335a03b6a ("ASoC: AMD: add AMD ASoC ACP 2.x DMA driver")
> >> Cc: Alex Deucher <alexander.deuc...@amd.com>
> >> Cc: Dominik Behr <db...@chromium.org>
> >> Cc: Daniel Kurtz <djku...@chromium.org>
> >> Signed-off-by: Guenter Roeck <li...@roeck-us.net>
> >> ---
> >> I didn't add an error check to acp_init() since I was not sure if
> >> its return value is ignored on purpose.
> >
> > Vijendar, Akshu can you comment?
> 
> This is also the case of missing error check.
> acp_init will return error if either sw reset did not happen or clock
> did not get enabled. In both cases we should error out in probe.
> 

Can you send out a patch to enable that error checking?

Thanks,

Alex

> >
> > The patch looks good to me.
> > Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
> >
> >>
> >>   sound/soc/amd/acp-pcm-dma.c | 7 +++
> >>   1 file changed, 7 insertions(+)
> >>
> >> diff --git a/sound/soc/amd/acp-pcm-dma.c b/sound/soc/amd/acp-pcm-dma.c
> >> index 9f521a55d610..b5e41df6bb3a 100644
> >> --- a/sound/soc/amd/acp-pcm-dma.c
> >> +++ b/sound/soc/amd/acp-pcm-dma.c
> >> @@ -1051,6 +1051,11 @@ static int acp_audio_probe(struct platform_device *pdev)
> >>struct resource *res;
> >>const u32 *pdata = pdev->dev.platform_data;
> >>
> >> +  if (!pdata) {
> >> +  dev_err(&pdev->dev, "Missing platform data\n");
> >> +  return -ENODEV;
> >> +  }
> >> +
> >>audio_drv_data = devm_kzalloc(&pdev->dev, sizeof(struct audio_drv_data),
> >>GFP_KERNEL);
> >>if (audio_drv_data == NULL)
> >> @@ -1058,6 +1063,8 @@ static int acp_audio_probe(struct platform_device *pdev)
> >>
> >>res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> >>audio_drv_data->acp_mmio = devm_ioremap_resource(
> >>&pdev->dev, res);
> >> +  if (IS_ERR(audio_drv_data->acp_mmio))
> >> +  return PTR_ERR(audio_drv_data->acp_mmio);
> >>
> >>/* The following members gets populated in device 'open'
> >> * function. Till then interrupts are disabled in 'acp_init'
> >> --
> >> 2.7.4
> >
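
Akshu's point can be sketched as follows: if the init routine reports either failure mode (soft reset not completing, clock not enabled), probe returns that error instead of carrying on. The structure, function names, and the choice of -ETIMEDOUT are hypothetical; this is not the acp-pcm-dma.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hardware state for the sketch. */
struct acp_sim {
	bool soft_reset_ok; /* did the ACP soft reset complete? */
	bool clock_ok;      /* did the ACP clock get enabled? */
};

/* Init routine that reports both failure modes from the thread. */
static int acp_init_sketch(const struct acp_sim *hw)
{
	if (!hw->soft_reset_ok)
		return -ETIMEDOUT; /* assumed error code */
	if (!hw->clock_ok)
		return -ETIMEDOUT;
	return 0;
}

/* Probe propagates the init result rather than ignoring it. */
static int probe_sketch(const struct acp_sim *hw)
{
	int ret = acp_init_sketch(hw);

	if (ret)
		return ret;
	/* ... remaining probe steps would follow here ... */
	return 0;
}
```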

RE: [PATCH] ASoC: amd: Add error checking to probe function

2017-11-20 Thread Deucher, Alexander

> -Original Message-
> From: Guenter Roeck [mailto:groe...@gmail.com] On Behalf Of Guenter
> Roeck
> Sent: Monday, November 20, 2017 11:28 PM
> To: Liam Girdwood
> Cc: Mark Brown; Jaroslav Kysela; Takashi Iwai; alsa-de...@alsa-project.org;
> linux-kernel@vger.kernel.org; Guenter Roeck; Deucher, Alexander; Dominik
> Behr; Daniel Kurtz
> Subject: [PATCH] ASoC: amd: Add error checking to probe function
> 
> The acp_audio_dma does not perform sufficient error checking in its probe
> function. This can result in crashes if a critical error path is
> encountered.
> 
> Fixes: 7c31335a03b6a ("ASoC: AMD: add AMD ASoC ACP 2.x DMA driver")
> Cc: Alex Deucher <alexander.deuc...@amd.com>
> Cc: Dominik Behr <db...@chromium.org>
> Cc: Daniel Kurtz <djku...@chromium.org>
> Signed-off-by: Guenter Roeck <li...@roeck-us.net>
> ---
> I didn't add an error check to acp_init() since I was not sure if
> its return value is ignored on purpose.

Vijendar, Akshu can you comment?

The patch looks good to me.
Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>

> 
>  sound/soc/amd/acp-pcm-dma.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/sound/soc/amd/acp-pcm-dma.c b/sound/soc/amd/acp-pcm-dma.c
> index 9f521a55d610..b5e41df6bb3a 100644
> --- a/sound/soc/amd/acp-pcm-dma.c
> +++ b/sound/soc/amd/acp-pcm-dma.c
> @@ -1051,6 +1051,11 @@ static int acp_audio_probe(struct platform_device *pdev)
>   struct resource *res;
>   const u32 *pdata = pdev->dev.platform_data;
> 
> + if (!pdata) {
> + dev_err(&pdev->dev, "Missing platform data\n");
> + return -ENODEV;
> + }
> +
>   audio_drv_data = devm_kzalloc(&pdev->dev, sizeof(struct audio_drv_data),
>   GFP_KERNEL);
>   if (audio_drv_data == NULL)
> @@ -1058,6 +1063,8 @@ static int acp_audio_probe(struct platform_device *pdev)
> 
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>   audio_drv_data->acp_mmio = devm_ioremap_resource(
>   &pdev->dev, res);
> + if (IS_ERR(audio_drv_data->acp_mmio))
> + return PTR_ERR(audio_drv_data->acp_mmio);
> 
>   /* The following members gets populated in device 'open'
>* function. Till then interrupts are disabled in 'acp_init'
> --
> 2.7.4
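
The IS_ERR()/PTR_ERR() check added by the patch relies on the kernel's error-pointer convention: a function such as devm_ioremap_resource() returns either a usable pointer or a negative errno encoded in the top MAX_ERRNO (4095) addresses, which are never valid. A minimal userspace re-implementation of that convention, with a hypothetical fake_ioremap() standing in for devm_ioremap_resource():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer value. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Recover the errno from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True if the pointer lies in the reserved top MAX_ERRNO addresses. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical mapping helper: fails with -EINVAL when there is no
 * resource, otherwise returns the "mapped" pointer. */
static void *fake_ioremap(void *res)
{
	if (!res)
		return ERR_PTR(-EINVAL);
	return res;
}

/* Same shape as the patched probe: bail out with the encoded errno. */
static int fake_probe(void *res)
{
	void *mmio = fake_ioremap(res);

	if (IS_ERR(mmio))
		return PTR_ERR(mmio);
	return 0;
}
```

Without the check, the error value would be stored in acp_mmio and later dereferenced as if it were a register mapping, which is the kind of crash the patch description refers to.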

RE: linux-next: manual merge of the sound-asoc tree with the drm-misc tree

2017-10-18 Thread Deucher, Alexander

> -Original Message-
> From: Mark Brown [mailto:broo...@kernel.org]
> Sent: Wednesday, October 18, 2017 6:08 AM
> To: Deucher, Alexander; Mukunda, Vijendar; Zhu, Rex; Daniel Vetter; Intel
> Graphics; DRI; Liam Girdwood
> Cc: Linux-Next Mailing List; Linux Kernel Mailing List; alsa-devel@alsa-project.org
> Subject: Re: linux-next: manual merge of the sound-asoc tree with the drm-misc tree
> 
> On Wed, Oct 18, 2017 at 10:57:33AM +0100, Mark Brown wrote:
> 
> > I fixed it up (see below) and can carry the fix as necessary. This
> > is now fixed as far as linux-next is concerned, but any non trivial
> > conflicts should be mentioned to your upstream maintainer when your tree
> > is submitted for merging.  You may also want to consider cooperating
> > with the maintainer of the conflicting tree to minimise any particularly
> > complex conflicts.
> 
> Actually I'm just going to discard the AMD drivers from the ASoC tree
> because the build produces reams of errors like those below, the changes
> to move the chip type definitions around weren't fully baked.  Please
> resend both the pull request and the patches with this fixed.  Note also
> that if you're basing something on Linus' tree you should use a tagged
> release rather than just a random commit.

Your conflict change effectively reverted 1e4448648333a, which is what caused
the problem.  It looks like Dave did not yet pull the request I made.  I can
send another pull request, but you may run into the same issue if you resolve
the conflict the same way again.

Alex

> 
> In file included from /home/broonie/tmpfs/next/drivers/gpu/drm/amd/amdgpu/amdgpu.h:51:0,
>                  from /home/broonie/tmpfs/next/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:31:
> /home/broonie/tmpfs/next/drivers/gpu/drm/amd/amdgpu/../include/amd_shared.h:33:6: error: nested redefinition of 'enum amd_asic_type'
>  enum amd_asic_type {
>       ^
> /home/broonie/tmpfs/next/drivers/gpu/drm/amd/amdgpu/../include/amd_shared.h:33:6: error: redeclaration of 'enum amd_asic_type'
> In file included from /home/broonie/tmpfs/next/drivers/gpu/drm/amd/amdgpu/../include/amd_shared.h:26:0,
>                  from /home/broonie/tmpfs/next/drivers/gpu/drm/amd/amdgpu/amdgpu.h:51,
>                  from /home/broonie/tmpfs/next/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:31:
> /home/broonie/tmpfs/next/include/drm/amd_asic_type.h:28:6: note: originally defined here
>  enum amd_asic_type {
>       ^
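
Per the compiler output above, enum amd_asic_type ends up defined twice in one translation unit: once in include/drm/amd_asic_type.h (pulled in from amd_shared.h line 26) and again at amd_shared.h line 33. A single-file sketch of the intended layout, where one header owns the definition behind an include guard so repeated inclusion is harmless. The guard macro name is hypothetical. Note that a guard only protects repeated inclusion of the same header; a second copy of the enum in a different header, as in the log, still fails to compile.

```c
#include <assert.h>

/* First "inclusion" of the header that owns the enum. */
#ifndef AMD_ASIC_TYPE_SKETCH_H
#define AMD_ASIC_TYPE_SKETCH_H
enum amd_asic_type {
	CHIP_TAHITI = 0, CHIP_PITCAIRN, CHIP_VERDE, CHIP_OLAND, CHIP_HAINAN,
	CHIP_BONAIRE, CHIP_KAVERI, CHIP_KABINI, CHIP_HAWAII, CHIP_MULLINS,
	CHIP_TOPAZ, CHIP_TONGA, CHIP_FIJI, CHIP_CARRIZO, CHIP_STONEY,
	CHIP_POLARIS10, CHIP_POLARIS11, CHIP_POLARIS12, CHIP_VEGA10,
	CHIP_RAVEN, CHIP_LAST,
};
#endif

/* Second inclusion of the same header (e.g. via another header): the
 * guard is already defined, so the preprocessor skips this block. */
#ifndef AMD_ASIC_TYPE_SKETCH_H
#define AMD_ASIC_TYPE_SKETCH_H
enum amd_asic_type { CHIP_NEVER_COMPILED };
#endif

/* An unguarded second "enum amd_asic_type { ... };" placed here would
 * be rejected with "redeclaration of 'enum amd_asic_type'". */

static int asic_count(void)
{
	return CHIP_LAST; /* number of entries before the sentinel */
}
```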

RE: linux-next: manual merge of the sound-asoc tree with the drm-misc tree

2017-10-18 Thread Deucher, Alexander

> -Original Message-
> From: Mark Brown [mailto:broo...@kernel.org]
> Sent: Wednesday, October 18, 2017 5:58 AM
> To: Deucher, Alexander; Mukunda, Vijendar; Zhu, Rex; Daniel Vetter; Intel
> Graphics; DRI; Liam Girdwood
> Cc: Linux-Next Mailing List; Linux Kernel Mailing List
> Subject: linux-next: manual merge of the sound-asoc tree with the drm-misc tree
> 
> Hi all,
> 
> Today's linux-next merge of the sound-asoc tree got a conflict in:
> 
>   drivers/gpu/drm/amd/include/amd_shared.h
> 
> between commit:
> 
>   cfa289fd4986c ("drm/amdgpu: rename amdgpu_dpm_funcs to amd_pm_funcs")
> 
> from the drm-misc tree and commit:
> 
>   1e4448648333a ("drm/amdgpu Moving amdgpu asic types to a separate file")

The patch below effectively reverts 1e4448648333a.  If you drop the patch 
below, you should be fine.

Alex

> 
> from the sound-asoc tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> diff --cc drivers/gpu/drm/amd/include/amd_shared.h
> index de6fc2731b98,3a49fbd8baf8..
> --- a/drivers/gpu/drm/amd/include/amd_shared.h
> +++ b/drivers/gpu/drm/amd/include/amd_shared.h
> @@@ -23,37 -23,10 +23,39 @@@
>   #ifndef __AMD_SHARED_H__
>   #define __AMD_SHARED_H__
> 
> - #define AMD_MAX_USEC_TIMEOUT20  /* 200 ms */
> + #include 
> 
>  +struct seq_file;
>  +
>  +/*
>  + * Supported ASIC types
>  + */
>  +enum amd_asic_type {
>  +CHIP_TAHITI = 0,
>  +CHIP_PITCAIRN,
>  +CHIP_VERDE,
>  +CHIP_OLAND,
>  +CHIP_HAINAN,
>  +CHIP_BONAIRE,
>  +CHIP_KAVERI,
>  +CHIP_KABINI,
>  +CHIP_HAWAII,
>  +CHIP_MULLINS,
>  +CHIP_TOPAZ,
>  +CHIP_TONGA,
>  +CHIP_FIJI,
>  +CHIP_CARRIZO,
>  +CHIP_STONEY,
>  +CHIP_POLARIS10,
>  +CHIP_POLARIS11,
>  +CHIP_POLARIS12,
>  +CHIP_VEGA10,
>  +CHIP_RAVEN,
>  +CHIP_LAST,
>  +};
>  +
> + #define AMD_MAX_USEC_TIMEOUT20  /* 200 ms */
> +
>   /*
>* Chip flags
>*/

RE: [PATCH linux-firmware 2/3] WHENCE: Add new radeon firmware

2017-10-09 Thread Deucher, Alexander

> -Original Message-
> From: Ben Hutchings [mailto:b...@decadent.org.uk]
> Sent: Monday, October 09, 2017 1:18 PM
> To: linux-kernel@vger.kernel.org; linux-firmw...@kernel.org
> Cc: Deucher, Alexander
> Subject: [PATCH linux-firmware 2/3] WHENCE: Add new radeon firmware
> 
> Signed-off-by: Ben Hutchings <b...@decadent.org.uk>

Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>

> ---
>  WHENCE | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/WHENCE b/WHENCE
> index 3010ec56583e..2261a349f7cc 100644
> --- a/WHENCE
> +++ b/WHENCE
> @@ -1874,6 +1874,8 @@ File: radeon/mullins_sdma.bin
>  File: radeon/mullins_sdma1.bin
>  File: radeon/mullins_uvd.bin
>  File: radeon/mullins_vce.bin
> +File: radeon/banks_k_2_smc.bin
> +File: radeon/si58_mc.bin
> 
>  Licence: Redistributable. See LICENSE.radeon for details.
>

RE: [PATCH linux-firmware 2/3] WHENCE: Add new radeon firmware

2017-10-09 Thread Deucher, Alexander

> -Original Message-
> From: Ben Hutchings [mailto:b...@decadent.org.uk]
> Sent: Monday, October 09, 2017 1:18 PM
> To: linux-kernel@vger.kernel.org; linux-firmw...@kernel.org
> Cc: Deucher, Alexander
> Subject: [PATCH linux-firmware 2/3] WHENCE: Add new radeon firmware
> 
> Signed-off-by: Ben Hutchings 

Reviewed-by: Alex Deucher 

> ---
>  WHENCE | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/WHENCE b/WHENCE
> index 3010ec56583e..2261a349f7cc 100644
> --- a/WHENCE
> +++ b/WHENCE
> @@ -1874,6 +1874,8 @@ File: radeon/mullins_sdma.bin
>  File: radeon/mullins_sdma1.bin
>  File: radeon/mullins_uvd.bin
>  File: radeon/mullins_vce.bin
> +File: radeon/banks_k_2_smc.bin
> +File: radeon/si58_mc.bin
> 
>  Licence: Redistributable. See LICENSE.radeon for details.
>

RE: [PATCH 1/4] drm/amd/powerplay: Cocci spatch "alloc_cast"

2017-09-21 Thread Deucher, Alexander

> -Original Message-
> From: Thomas Meyer [mailto:tho...@m3y3r.de]
> Sent: Thursday, September 21, 2017 2:34 AM
> To: Deucher, Alexander; Koenig, Christian; airl...@linux.ie; amd-
> g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH 1/4] drm/amd/powerplay: Cocci spatch "alloc_cast"
> 
> Remove casting the values returned by memory allocation functions like
> kmalloc, kzalloc, kmem_cache_alloc, kmem_cache_zalloc etc."
> Found by coccinelle spatch "api/alloc/alloc_cast.cocci"
> 
> Signed-off-by: Thomas Meyer <tho...@m3y3r.de>
> ---
> 
> diff -u -p
> a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
> @@ -291,7 +291,7 @@ static int get_mm_clock_voltage_table(
>   table_size = sizeof(uint32_t) +
> 
>   sizeof(phm_ppt_v1_mm_clock_voltage_dependency_record) *
>   mm_dependency_table->ucNumEntries;
> - mm_table = (phm_ppt_v1_mm_clock_voltage_dependency_table *)
> + mm_table =
>   kzalloc(table_size, GFP_KERNEL);

Please fix up the whitespace here and below.

Alex

> 
>   if (!mm_table)
> @@ -519,7 +519,7 @@ static int get_socclk_voltage_dependency
> 
>   sizeof(phm_ppt_v1_clock_voltage_dependency_record) *
>   clk_dep_table->ucNumEntries;
> 
> - clk_table = (phm_ppt_v1_clock_voltage_dependency_table *)
> + clk_table =
>   kzalloc(table_size, GFP_KERNEL);
> 
>   if (!clk_table)
> @@ -554,7 +554,7 @@ static int get_mclk_voltage_dependency_t
> 
>   sizeof(phm_ppt_v1_clock_voltage_dependency_record) *
>   mclk_dep_table->ucNumEntries;
> 
> - mclk_table = (phm_ppt_v1_clock_voltage_dependency_table *)
> + mclk_table =
>   kzalloc(table_size, GFP_KERNEL);
> 
>   if (!mclk_table)
> @@ -596,7 +596,7 @@ static int get_gfxclk_voltage_dependency
> 
>   sizeof(phm_ppt_v1_clock_voltage_dependency_record) *
>   clk_dep_table->ucNumEntries;
> 
> - clk_table = (struct phm_ppt_v1_clock_voltage_dependency_table *)
> + clk_table =
>   kzalloc(table_size, GFP_KERNEL);
> 
>   if (!clk_table)
> @@ -663,7 +663,7 @@ static int get_pix_clk_voltage_dependenc
> 
>   sizeof(phm_ppt_v1_clock_voltage_dependency_record) *
>   clk_dep_table->ucNumEntries;
> 
> - clk_table = (struct phm_ppt_v1_clock_voltage_dependency_table *)
> + clk_table =
>   kzalloc(table_size, GFP_KERNEL);
> 
>   if (!clk_table)
> @@ -728,7 +728,7 @@ static int get_dcefclk_voltage_dependenc
> 
>   sizeof(phm_ppt_v1_clock_voltage_dependency_record) *
>   num_entries;
> 
> - clk_table = (struct phm_ppt_v1_clock_voltage_dependency_table *)
> + clk_table =
>   kzalloc(table_size, GFP_KERNEL);
> 
>   if (!clk_table)
> @@ -772,7 +772,7 @@ static int get_pcie_table(struct pp_hwmg
>   sizeof(struct phm_ppt_v1_pcie_record) *
>   atom_pcie_table->ucNumEntries;
> 
> - pcie_table = (struct phm_ppt_v1_pcie_table *)
> + pcie_table =
>   kzalloc(table_size, GFP_KERNEL);
> 
>   if (!pcie_table)
> @@ -1026,7 +1026,7 @@ static int get_vddc_lookup_table(
>   table_size = sizeof(uint32_t) +
>   sizeof(phm_ppt_v1_voltage_lookup_record) * max_levels;
> 
> - table = (phm_ppt_v1_voltage_lookup_table *)
> + table =
>   kzalloc(table_size, GFP_KERNEL);
> 
>   if (NULL == table)
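For readers following the cleanup: kzalloc() returns void *, and in C a void * converts implicitly to any object pointer type on assignment, so the casts removed above were redundant. A minimal user-space sketch of the same pattern follows, assuming calloc() as a stand-in for kzalloc(..., GFP_KERNEL); struct clk_table, struct clk_record and alloc_clk_table() are hypothetical simplifications, not the real phm_ppt_v1_* definitions. It also keeps the allocation on a single line once the cast is gone, which is the kind of whitespace fix-up requested above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the phm_ppt_v1_* record/table types. */
struct clk_record {
	uint32_t clk;
	uint32_t vddc;
};

struct clk_table {
	uint32_t count;
	struct clk_record entries[];	/* flexible array member */
};

/*
 * Allocate a zeroed table with room for n records. calloc() returns
 * void *, which converts implicitly to struct clk_table *, so no cast
 * is needed (the same holds for kzalloc() in the kernel).
 */
static struct clk_table *alloc_clk_table(uint32_t n)
{
	size_t table_size = sizeof(struct clk_table) +
			    sizeof(struct clk_record) * n;
	struct clk_table *table = calloc(1, table_size);

	if (!table)
		return NULL;
	table->count = n;
	return table;
}
```

The sketch sizes the header with sizeof(struct clk_table) rather than sizeof(uint32_t) so that any padding before the flexible array member is counted; with the hypothetical types used here the two values are equal.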

RE: [PATCH v3 00/28] DRM API Conversions

2017-08-11 Thread Deucher, Alexander

> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Cihangir Akturk
> Sent: Friday, August 11, 2017 8:33 AM
> Cc: de...@driverdev.osuosl.org; linux-arm-...@vger.kernel.org; intel-
> g...@lists.freedesktop.org; linux-kernel@vger.kernel.org; dri-
> de...@lists.freedesktop.org; etna...@lists.freedesktop.org; Cihangir Akturk;
> amd-...@lists.freedesktop.org; dan...@ffwll.ch;
> nouv...@lists.freedesktop.org; linux-te...@vger.kernel.org;
> virtualizat...@lists.linux-foundation.org; freedr...@lists.freedesktop.org
> Subject: [PATCH v3 00/28] DRM API Conversions
> 
> Changes since v2:
> 
> - Patch series is based on *drm-misc-next* as suggested by Sean Paul.
> 
> - Dropped patch 05 (drm/atmel-hlcdc) and patch 25 (drm/vc4) from v2,
>   since they were already pulled in the drm-misc-next
> 
> Changes since v1:
> 
> - This time patches were generated with coccinelle instead of my own
>   script, as suggested by Daniel Vetter.
> 
> - Fixed the typo in commit messages. s/adn/and
> 

FWIW, I already picked up v1 of these patches for radeon and amdgpu.

Alex

> Note: I've included r-b, a-b tags, as these patches are identical to v1
> except for the file: drivers/gpu/drm/i915/i915_gem_object.h
> 
> This patch set replaces the occurrences of drm_*_reference() and
> drm_*_unreference() with the new drm_*_get() and drm_*_put()
> functions.
> All patches in the series do the same thing, converting to the new APIs.
> I created patches per DRM driver as suggested by Daniel Vetter.
> 
> This patch set was generated by scripts/coccinelle/api/drm-get-put.cocci
> 
> Previous thread can be reached at:
> https://marc.info/?l=dri-devel=150178288816047
> 
> Background:
> 
> In the kernel, reference counting APIs use *_get(), *_put() style naming
> to reference-count the objects. But the DRM subsystem uses a different
> naming for them, such as *_reference() and *_unreference(), which is
> inconsistent with the other reference counting APIs in the kernel. To
> solve this consistency issue, Thierry Reding introduced a couple of
> functions and compatibility aliases in the following commits.
> 
> commit 020a218f95bd3ceff7dd1022ff7ebc0497bc7bf9
> Author: Thierry Reding 
> Date:   Tue Feb 28 15:46:38 2017 +0100
> 
> drm: Introduce drm_mode_object_{get,put}()
> 
> commit ad09360750afa18a0a0ce0253d6ea6033abc22e7
> Author: Thierry Reding 
> Date:   Tue Feb 28 15:46:39 2017 +0100
> 
> drm: Introduce drm_connector_{get,put}()
> 
> commit a4a69da06bc11a937a6e417938b1bb698ee1fa46
> Author: Thierry Reding 
> Date:   Tue Feb 28 15:46:40 2017 +0100
> 
> drm: Introduce drm_framebuffer_{get,put}()
> 
> commit e6b62714e87c8811d5564b6a0738dcde63a51774
> Author: Thierry Reding 
> Date:   Tue Feb 28 15:46:41 2017 +0100
> 
> drm: Introduce drm_gem_object_{get,put}()
> 
> commit 6472e5090be7c78749a3c279b4faae87ab835c40
> Author: Thierry Reding 
> Date:   Tue Feb 28 15:46:42 2017 +0100
> 
> drm: Introduce drm_property_blob_{get,put}()
> 
> Cihangir Akturk (28):
>   drm/amdgpu: switch to drm_*_get(), drm_*_put() helpers
>   drm: mali-dp: switch to drm_*_get(), drm_*_put() helpers
>   drm/armada: switch to drm_*_get(), drm_*_put() helpers
>   drm/ast: switch to drm_*_get(), drm_*_put() helpers
>   drm/bochs: switch to drm_*_get(), drm_*_put() helpers
>   drm/cirrus: switch to drm_*_get(), drm_*_put() helpers
>   drm/etnaviv: switch to drm_*_get(), drm_*_put() helpers
>   drm/exynos: switch to drm_*_get(), drm_*_put() helpers
>   drm/gma500: switch to drm_*_get(), drm_*_put() helpers
>   drm/hisilicon: switch to drm_*_get(), drm_*_put() helpers
>   drm/i915: switch to drm_*_get(), drm_*_put() helpers
>   drm/imx: switch to drm_*_get(), drm_*_put() helpers
>   drm/mediatek: switch to drm_*_get(), drm_*_put() helpers
>   drm/mgag200: switch to drm_*_get(), drm_*_put() helpers
>   drm/msm: switch to drm_*_get(), drm_*_put() helpers
>   drm/nouveau: switch to drm_*_get(), drm_*_put() helpers
>   drm/omapdrm: switch to drm_*_get(), drm_*_put() helpers
>   drm/qxl: switch to drm_*_get(), drm_*_put() helpers
>   drm/radeon: switch to drm_*_get(), drm_*_put() helpers
>   drm/rockchip: switch to drm_*_get(), drm_*_put() helpers
>   drm/tegra: switch to drm_*_get(), drm_*_put() helpers
>   drm/tilcdc: switch to drm_*_get(), drm_*_put() helpers
>   drm/udl: switch to drm_*_get(), drm_*_put() helpers
>   drm/vc4: switch to drm_*_get(), drm_*_put() helpers
>   drm/vgem: switch to drm_*_get(), drm_*_put() helpers
>   drm/virtio: switch to drm_*_get(), drm_*_put() helpers
>   drm/vmwgfx: switch to drm_*_get(), drm_*_put() helpers
>   drm: vboxvideo: switch to drm_*_get(), drm_*_put() helpers
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c   |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   |  6 ++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c|  4 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 22
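As the background in the cover letter describes, the conversion is essentially a rename: *_get() takes a reference, *_put() drops one and frees the object on the last put, and the old *_reference()/*_unreference() names can remain as thin aliases until every caller is converted. A minimal user-space sketch of that pattern follows; struct obj, obj_get(), obj_put() and the alias names are hypothetical and only mirror the naming scheme. The real DRM helpers are built on struct kref and are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct obj {
	unsigned int refcount;
};

static int objects_freed;	/* counts releases, for illustration only */

static struct obj *obj_create(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o)
		o->refcount = 1;	/* the creator holds the first reference */
	return o;
}

/* New-style naming: take a reference (NULL is tolerated). */
static void obj_get(struct obj *o)
{
	if (o)
		o->refcount++;
}

/* New-style naming: drop a reference, freeing on the last put. */
static void obj_put(struct obj *o)
{
	if (!o)
		return;
	assert(o->refcount > 0);
	if (--o->refcount == 0) {
		objects_freed++;
		free(o);
	}
}

/* Old-style names kept as compatibility aliases for unconverted callers. */
static inline void obj_reference(struct obj *o)   { obj_get(o); }
static inline void obj_unreference(struct obj *o) { obj_put(o); }
```

Because the aliases simply forward to the new helpers, a per-driver conversion like the one in this series can land incrementally without changing behaviour.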

RE: [PATCH] drm/amd/powerplay: rv: Use designated initializers

2017-07-28 Thread Deucher, Alexander

> -Original Message-
> From: keesc...@google.com [mailto:keesc...@google.com] On Behalf Of
> Kees Cook
> Sent: Friday, July 28, 2017 1:16 PM
> To: Alex Deucher
> Cc: LKML; David Airlie; amd-gfx list; Maling list - DRI developers; Deucher,
> Alexander; Zhu, Rex; Koenig, Christian; Zhang, Hawking
> Subject: Re: [PATCH] drm/amd/powerplay: rv: Use designated initializers
> 
> On Thu, Jul 27, 2017 at 6:43 PM, Alex Deucher <alexdeuc...@gmail.com> wrote:
> > On Tue, Jul 25, 2017 at 5:47 PM, Kees Cook <keesc...@chromium.org> wrote:
> >> As done for vega10 in commit 3ddd396f6b57 ("drm/amd/powerplay: Use
> >> designated initializers") mark other tableFunction entries with designated
> >> initializers. The randstruct plugin requires designated initializers for
> >> structures that are entirely function pointers.
> >>
> >> Cc: Rex Zhu <rex@amd.com>
> >> Cc: Hawking Zhang <hawking.zh...@amd.com>
> >> Cc: Alex Deucher <alexander.deuc...@amd.com>
> >> Signed-off-by: Kees Cook <keesc...@chromium.org>
> >> ---
> >> If I can get an Ack for this, I'll carry it in the gcc-plugins tree, unless
> >> you think this is worth landing for v4.13, in which case, please take it
> >> now. :)
> >>
> >
> > Acked-by: Alex Deucher <alexander.deuc...@amd.com>
> >
> > I'm happy to take this through my tree if that is ok with you.
> 
> Since the randstruct patch depends on this fix, it's likely best to go
> through my tree unless you can get this into v4.13. (Since then when
> the randstruct patch lands in v4.14, it'll already be there.) I'm fine
> either way.

Go ahead and take it through your tree.  Thanks!

Alex

> 
> Thanks!
> 
> -Kees
> 
> --
> Kees Cook
> Pixel Security
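To illustrate why randstruct needs designated initializers: a positional initializer binds each function pointer to whatever member occupies that slot, so once the plugin reorders the members of an all-function-pointer structure, the pointers land in the wrong fields. Designated initializers bind by member name and are unaffected by layout. A minimal sketch, with a hypothetical struct table_ops standing in for the powerplay table-function structure:

```c
#include <assert.h>

/* Hypothetical ops table: a struct made entirely of function pointers. */
struct table_ops {
	int (*init)(int);
	int (*fini)(int);
	int (*reset)(int);
};

static int demo_init(int x)  { return x + 1; }
static int demo_fini(int x)  { return x - 1; }
static int demo_reset(int x) { (void)x; return 0; }

/*
 * Positional form (fragile under layout randomization):
 *   static const struct table_ops ops = { demo_init, demo_fini, demo_reset };
 *
 * Designated form: each pointer is bound by member name, so the result
 * is the same whatever order the members end up in.
 */
static const struct table_ops demo_ops = {
	.init  = demo_init,
	.fini  = demo_fini,
	.reset = demo_reset,
};
```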

RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

2017-07-11 Thread Deucher, Alexander

> -Original Message-
> From: Joerg Roedel [mailto:jroe...@suse.de]
> Sent: Tuesday, July 11, 2017 7:50 AM
> To: Bjorn Helgaas
> Cc: Bjorn Helgaas; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> Daniel Drake; Deucher, Alexander; Samuel Sieb; David Woodhouse
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> Hi Bjorn,
> 
> On Mon, Jul 10, 2017 at 11:53:58AM -0500, Bjorn Helgaas wrote:
> > I'm still confused.  Per Samuel
> > (6dd9dbac-9b65-bc7c-bb08-413a05d09...@sieb.net):
> >
> > Samuel> The other patch seems to fix this issue without disabling ATS.
> > Samuel> Isn't that better?
> >
> > and Alex
> >
> > (BN6PR12MB1652DF4130FC792B71DD9974F7C00@BN6PR12MB1652.namprd12.prod.outlook.com):
> >
> > Alex> I talked to our validation team and ATS was validated on Stoney,
> > Alex> so this patch is just working around something else.  The other
> > Alex> patch fixes it and is a valid optimization ...
> >
> > I'm confused about what this "other patch" is and whether we want that
> > one, this one, or both.

Here's the other patch:
https://lists.freedesktop.org/archives/amd-gfx/2017-May/009421.html

> 
> The other patches floating around lowered the ATS flush-rate from the
> AMD IOMMU driver, which makes the issue disappear as well. But the issue
> only disappeared, it is not solved and could probably still be
> reproduced with a GPU usage pattern that increases the ATS flush-rate.
> 
> So blacklisting the device for ATS is still the safest thing we could do
> here.

I don't have any objection per se, but I'd hate to add a quirk to disable it
only to remove it again in the future if we need ATS-related functionality
later.  We are in the process of upstreaming KFD support for Carrizo (which is
a bigger version of Stoney), and that utilizes ATS-related functionality to
provide GPU access to pageable memory.  There are no immediate requirements for
Stoney, but that may change.

Alex

RE: [gpu-drm-radeon] question about potential dead code in vce_v2_0_enable_mgcg()

2017-06-28 Thread Deucher, Alexander

> -Original Message-
> From: Gustavo A. R. Silva [mailto:garsi...@embeddedor.com]
> Sent: Wednesday, June 28, 2017 10:22 AM
> To: Deucher, Alexander; Koenig, Christian; David Airlie
> Cc: amd-...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; linux-
> ker...@vger.kernel.org
> Subject: [gpu-drm-radeon] question about potential dead code in
> vce_v2_0_enable_mgcg()
> 
> 
> Hello everybody,
> 
> While looking into Coverity ID 1198635 I ran into the following piece
> of code at drivers/gpu/drm/radeon/vce_v2_0.c:107:
> 
> 107 void vce_v2_0_enable_mgcg(struct radeon_device *rdev, bool enable)
> 108 {
> 109         bool sw_cg = false;
> 110
> 111         if (enable && (rdev->cg_flags & RADEON_CG_SUPPORT_VCE_MGCG)) {
> 112                 if (sw_cg)
> 113                         vce_v2_0_set_sw_cg(rdev, true);
> 114                 else
> 115                         vce_v2_0_set_dyn_cg(rdev, true);
> 116         } else {
> 117                 vce_v2_0_disable_cg(rdev);
> 118
> 119                 if (sw_cg)
> 120                         vce_v2_0_set_sw_cg(rdev, false);
> 121                 else
> 122                         vce_v2_0_set_dyn_cg(rdev, false);
> 123         }
> 124 }
> 
> The issue here is that the local variable sw_cg is never updated after
> its initialization, which causes some code to be logically dead.
> 
> My question here is if such variable is there for testing purposes or
> if it is a sort of an old code leftover that should be removed?
> 
> In any case I can send a patch to add a comment or remove the dead code.
> 
> I'd really appreciate any comments on this.

I wanted to leave the code in for debugging if we ran into problems with 
dynamic clockgating.

Alex
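One way to keep a debug-only path like this without it reading as dead code is to drive it from a flag that can actually change at run time, rather than a local initialised to false. The sketch below is hypothetical: vce_force_sw_cg, set_cg() and enable_mgcg() are illustrative names, hardware programming is replaced by recording the chosen mode, and in a real driver such a flag would more likely be a module parameter. It mirrors the branch structure of the quoted vce_v2_0_enable_mgcg().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical debug knob replacing the local `bool sw_cg = false;`.
 * Because it can be changed from outside the function, the software
 * clock-gating branches stay reachable for debugging.
 */
static bool vce_force_sw_cg;

enum cg_mode { CG_SW, CG_DYN };

/* Records what the sketch "programmed" instead of touching hardware. */
static enum cg_mode last_mode;
static bool last_enable;

static void set_cg(enum cg_mode mode, bool enable)
{
	last_mode = mode;
	last_enable = enable;
}

static void enable_mgcg(bool enable, bool mgcg_supported)
{
	enum cg_mode mode = vce_force_sw_cg ? CG_SW : CG_DYN;

	if (enable && mgcg_supported) {
		set_cg(mode, true);
	} else {
		/* the quoted code also calls vce_v2_0_disable_cg() here first */
		set_cg(mode, false);
	}
}
```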

RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

2017-06-15 Thread Deucher, Alexander

> -Original Message-
> From: Samuel Sieb [mailto:sam...@sieb.net]
> Sent: Thursday, June 15, 2017 1:02 PM
> To: Joerg Roedel; Bjorn Helgaas
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake;
> Deucher, Alexander; David Woodhouse
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> On 06/15/2017 07:04 AM, Joerg Roedel wrote:
> > Hi Bjorn,
> >
> > On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> >> From: Joerg Roedel <jroe...@suse.de>
> >>
> >> ATS is broken on this hardware and causes IOMMU stalls and
> >> system failure. Disable ATS on these devices to make them
> >> usable again with IOMMU enabled.
> >>
> >> Note that the commit in the Fixes-tag is not buggy, it
> >> just uncovers the problem in the hardware by increasing
> >> the ATS-flush rate.
> >>
> >> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> >> Signed-off-by: Joerg Roedel <jroe...@suse.de>
> >> ---
> >>   drivers/pci/quirks.c | 19 +++
> >>   1 file changed, 19 insertions(+)
> >
> > Any more objections on this patch? Please let me know if you want to
> > have something changed.
> 
> The other patch seems to fix this issue without disabling ATS.  Isn't
> that better?

I talked to our validation team and ATS was validated on Stoney, so this patch
is just working around something else.  The other patch fixes it and is a valid
optimization (it should be applied eventually), but apparently the current
behavior is allowed even if it's not optimal.  I'm not really an ATS expert.

Alex
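A PCI quirk of this kind matches the device by vendor and device ID during enumeration and clears the state that would later allow ATS to be enabled. The user-space model below is only a sketch: struct fake_pci_dev, quirk_no_ats() and apply_quirks() are hypothetical, it assumes (as with the kernel's struct pci_dev) that a zero ATS capability offset means ATS will not be enabled, and 0x98e4 is used as the Stoney device ID on the assumption that it is one of the IDs the patch targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ID_ATI 0x1002	/* AMD/ATI GPU vendor ID */

/* Hypothetical, heavily simplified stand-in for struct pci_dev. */
struct fake_pci_dev {
	uint16_t vendor;
	uint16_t device;
	int ats_cap;	/* capability offset; 0 means "no usable ATS" */
};

struct id_quirk {
	uint16_t vendor;
	uint16_t device;
	void (*fixup)(struct fake_pci_dev *dev);
};

/* Hide the ATS capability so nothing tries to enable ATS later. */
static void quirk_no_ats(struct fake_pci_dev *dev)
{
	dev->ats_cap = 0;
}

static const struct id_quirk quirks[] = {
	{ VENDOR_ID_ATI, 0x98e4, quirk_no_ats },	/* assumed Stoney ID */
};

/* Run every fixup whose vendor/device ID matches, as enumeration would. */
static void apply_quirks(struct fake_pci_dev *dev)
{
	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		if (quirks[i].vendor == dev->vendor &&
		    quirks[i].device == dev->device)
			quirks[i].fixup(dev);
	}
}
```

Keying the quirk on exact IDs keeps the blast radius small: other devices from the same vendor keep their ATS capability, which matters given the concern above about needing ATS-related functionality on related parts later.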

RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

2017-05-26 Thread Deucher, Alexander

> -Original Message-
> From: David Woodhouse [mailto:dw...@infradead.org]
> Sent: Friday, May 26, 2017 8:55 AM
> To: Deucher, Alexander; 'Joerg Roedel'
> Cc: 'Joerg Roedel'; Bjorn Helgaas; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Daniel Drake; Samuel Sieb
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> On Fri, 2017-05-26 at 11:57 +, Deucher, Alexander wrote:
> >
> > FWIW, the GPU driver does not actually use ATS at the moment so I
> > don't think we should see any ATS transactions.
> 
> That's a confusing sentence. The "GPU driver", if you mean software
> running in the OS, wouldn't be expected to have anything to do with
> ATS.
> 
> ATS is something that the CPU itself (or its DMA engine) would do.
> Instead of just performing a DMA transaction to a given bus address,
> and letting the IOMMU do the translation, the hardware might choose to
> first perform an IOTLB lookup, and then later do the actual DMA
> transaction to the pre-translated, raw physical address. Which kind of
> makes a mockery of any kind of protection the IOMMU is supposed to give
> you, but does shave a cycle or two of latency off the DMA when it
> finally happens, since the translation can be done in advance.

+ John, Suravee

Full disclosure, I'm not by any means an expert with ATS.  I guess I'm thinking
of PRI support rather than ATS per se.  On the GPU side, the GPU's memory
controller has multiple paths to system memory: the non-ATS/PRI path and the
ATS/PRI path.  The GPU has its own integrated MMU to virtualize the GPU's
internal address space per GPU client.  The non-ATS/PRI path uses the GPU's MMU
and is just "regular" DMA to addresses potentially translated by the IOMMU, just
like any other device that may not have ATS support.  The system memory has to
be resident because if the GPU faults, it can't retry the transaction.  For the
ATS/PRI path, the GPU's MMU is bypassed and PASIDs need to be set up on the
IOMMU for each client, but once that is done, transactions that use that
interface support retries on GPU page faults (after the OS has paged the memory
in and the IOMMU tables have been updated) and other features.  I think only the
ATS/PRI case uses the ATC on the endpoint.  John, Suravee, correct me if I'm wrong.

Alex
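
To make the two-path distinction above concrete, here is a toy user-space model in C. It is not AMD hardware behavior or any kernel API: it assumes a hypothetical four-page table where -1 marks a non-resident page, uses a hypothetical `os_page_in()` to stand in for the OS servicing a PRI page request, and ignores PASIDs entirely. The untranslated path simply fails on a missing page, while the ATS/PRI path caches translations in a device-side ATC and retries once the page has been brought in.

```c
/* Toy model only: a hypothetical 4-page table; -1 marks a page that is not
 * resident in system memory. PASIDs and per-client setup are ignored. */
#define NPAGES 4

static int page_frame[NPAGES] = { 10, 11, -1, 13 }; /* IOMMU's view */
static int atc[NPAGES] = { -1, -1, -1, -1 };          /* device-side ATC */

/* Hypothetical stand-in for the OS servicing a PRI page request:
 * page the memory in and update the translation tables. */
static void os_page_in(int page)
{
    page_frame[page] = 100 + page;
}

/* Non-ATS/PRI path: "regular" DMA, translated by the IOMMU on the fly.
 * A non-resident page is a fault the device cannot retry (returns -1). */
static int dma_untranslated(int page)
{
    return page_frame[page];
}

/* ATS/PRI path: on an ATC miss the device requests a translation; if the
 * page is not resident it issues a page request, lets the OS page it in,
 * then retries and caches the resulting translation in its ATC. */
static int dma_ats_pri(int page)
{
    if (atc[page] < 0) {
        if (page_frame[page] < 0)
            os_page_in(page);
        atc[page] = page_frame[page];
    }
    return atc[page];
}

/* An ATS invalidation from the IOMMU drops the cached translation. */
static void atc_invalidate(int page)
{
    atc[page] = -1;
}
```

In this model, `dma_untranslated(2)` returns -1 before any page-in (the transaction cannot be retried), whereas `dma_ats_pri(2)` returns a valid frame after the simulated page request.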

RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

2017-05-26 Thread Deucher, Alexander

> -Original Message-
> From: Joerg Roedel [mailto:jroe...@suse.de]
> Sent: Wednesday, May 24, 2017 4:45 AM
> To: Deucher, Alexander
> Cc: 'David Woodhouse'; 'Joerg Roedel'; Bjorn Helgaas; linux-
> p...@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake; Samuel
> Sieb
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> Hi Alexander,
> 
> On Tue, May 23, 2017 at 07:54:12PM +, Deucher, Alexander wrote:
> > I finally got an answer from the hw team and we validated ATS on
> > stoney as well so in theory this patch shouldn’t actually be needed.
> > I think we may actually be papering over some other issue.  The
> > following patch seems to also fix this issue (and other issues):
> > https://www.spinics.net/lists/stable/msg172631.html
> 
> Yeah, but it still looks to me like that the hardware got into some
> weird state with the storm of ATS invalidations sent to it.
> 
> The Completion-Wait loop timeouts seen in the original bug report
> indicate that the IOMMU is waiting for a response that never comes. And
> this is probably the ATS flush completion response from the GPU, as
> disabling ATS on the GPU makes the issue disappear.

FWIW, the GPU driver does not actually use ATS at the moment so I don't think 
we should see any ATS transactions.

Alex

RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

2017-05-24 Thread Deucher, Alexander

> -Original Message-
> From: Joerg Roedel [mailto:jroe...@suse.de]
> Sent: Wednesday, May 24, 2017 4:45 AM
> To: Deucher, Alexander
> Cc: 'David Woodhouse'; 'Joerg Roedel'; Bjorn Helgaas; linux-
> p...@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake; Samuel
> Sieb
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> Hi Alexander,
> 
> On Tue, May 23, 2017 at 07:54:12PM +, Deucher, Alexander wrote:
> > I finally got an answer from the hw team and we validated ATS on
> > stoney as well so in theory this patch shouldn’t actually be needed.
> > I think we may actually be papering over some other issue.  The
> > following patch seems to also fix this issue (and other issues):
> > https://www.spinics.net/lists/stable/msg172631.html
> 
> Yeah, but it still looks to me like that the hardware got into some
> weird state with the storm of ATS invalidations sent to it.
> 
> The Completion-Wait loop timeouts seen in the original bug report
> indicate that the IOMMU is waiting for a response that never comes. And
> this is probably the ATS flush completion response from the GPU, as
> disabling ATS on the GPU makes the issue disappear.

Yeah, it's weird.  My ack on the patch still stands.  Just adding some 
additional data.

Alex

RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

2017-05-23 Thread Deucher, Alexander

> -Original Message-
> From: David Woodhouse [mailto:dw...@infradead.org]
> Sent: Thursday, May 04, 2017 6:22 AM
> To: Deucher, Alexander; 'Joerg Roedel'; Bjorn Helgaas
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake;
> Samuel Sieb; Joerg Roedel
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> On Fri, 2017-04-07 at 16:46 +, Deucher, Alexander wrote:
> > >
> > > -Original Message-
> > > From: Joerg Roedel [mailto:j...@8bytes.org]
> > > Sent: Friday, April 07, 2017 10:32 AM
> > > To: Bjorn Helgaas
> > > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel
> Drake;
> > > Deucher, Alexander; Samuel Sieb; David Woodhouse; Joerg Roedel
> > > Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> > >
> > > From: Joerg Roedel <jroe...@suse.de>
> > >
> > > ATS is broken on this hardware and causes IOMMU stalls and
> > > system failure. Disable ATS on these devices to make them
> > > usable again with IOMMU enabled.
> > >
> > > Note that the commit in the Fixes-tag is not buggy, it
> > > just uncovers the problem in the hardware by increasing
> > > the ATS-flush rate.
> > >
> > > Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> > > Signed-off-by: Joerg Roedel <jroe...@suse.de>
> > Acked-by: Alex Deucher <alexander.deuc...@amd.com>
> 
> Alex, are you able to confirm that it is *only* the device with PCI ID
> 0x98e4 which has this problem, or (more likely) come up with an
> exhaustive list? Thanks.
> 
> We'll want the same blacklist in Xen too, won't we?

I finally got an answer from the hw team: we validated ATS on Stoney as well,
so in theory this patch shouldn't actually be needed.  I think we may actually
be papering over some other issue.  The following patch also seems to fix this
issue (and other issues):
https://www.spinics.net/lists/stable/msg172631.html

Alex

> 
> > >
> > > ---
> > >  drivers/pci/quirks.c | 19 +++
> > >  1 file changed, 19 insertions(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 6736836..7cbe316 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev
> *pdev)
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031,
> > > quirk_no_aersid);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032,
> > > quirk_no_aersid);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033,
> > > quirk_no_aersid);
> > > +
> > > +#ifdef CONFIG_PCI_ATS
> > > +/*
> > > + * Some devices have a broken ATS implementation causing IOMMU
> stalls.
> > > + * Don't use ATS for those devices.
> > > + */
> > > +static void quirk_disable_ats(struct pci_dev *pdev)
> > > +{
> > > + /*
> > > +  * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> > > +  * early.
> > > +  */
> > > + dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> > > + pdev->ats_cap = 0;
> > > +}
> > > +
> > > +/* AMD Stoney platform GPU */
> > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4,
> quirk_disable_ats);
> > > +#endif /* CONFIG_PCI_ATS */
> > > --
> > > 1.9.1
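
The mechanism the quirk relies on (clearing `ats_cap` so that `pci_enable_ats()` bails out early) can be modeled in user space as below. This is a simplified sketch, not kernel code: `struct fake_pci_dev`, `run_final_fixups()` and `fake_enable_ats()` are hypothetical stand-ins for `struct pci_dev`, the PCI core's FINAL-fixup pass and `pci_enable_ats()`, and returning `-EINVAL` from the early bail-out is an assumption for illustration. The vendor ID 0x1002 is the value of `PCI_VENDOR_ID_ATI`; the device ID 0x98e4 comes from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define VENDOR_ATI 0x1002 /* value of PCI_VENDOR_ID_ATI */

/* Hypothetical, cut-down stand-in for struct pci_dev. */
struct fake_pci_dev {
    unsigned short vendor, device;
    int ats_cap;     /* config-space offset of the ATS capability, 0 = none */
    int ats_enabled;
};

/* Mirrors the quirk in the patch: hide the ATS capability. */
static void quirk_disable_ats(struct fake_pci_dev *pdev)
{
    printf("QUIRK: Disabling ATS\n");
    pdev->ats_cap = 0;
}

struct fixup {
    unsigned short vendor, device;
    void (*hook)(struct fake_pci_dev *);
};

static const struct fixup final_fixups[] = {
    { VENDOR_ATI, 0x98e4, quirk_disable_ats }, /* AMD Stoney platform GPU */
};

/* Simplified model of the PCI core running FINAL fixups for one device
 * (exact vendor/device match only; no PCI_ANY_ID handling). */
static void run_final_fixups(struct fake_pci_dev *pdev)
{
    for (size_t i = 0; i < sizeof(final_fixups) / sizeof(final_fixups[0]); i++)
        if (final_fixups[i].vendor == pdev->vendor &&
            final_fixups[i].device == pdev->device)
            final_fixups[i].hook(pdev);
}

/* Simplified model of pci_enable_ats(): refuse when ats_cap is 0. */
static int fake_enable_ats(struct fake_pci_dev *pdev)
{
    if (!pdev->ats_cap)
        return -EINVAL;
    pdev->ats_enabled = 1;
    return 0;
}
```

Because the quirk only clears a cached capability offset, the IOMMU driver's later attempt to enable ATS fails cleanly for the matched device, while other devices are unaffected.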

RE: [PATCH] drm/amd: include <linux/delay.h> instead of "linux/delay.h"

2017-05-23 Thread Deucher, Alexander

> -Original Message-
> From: Koenig, Christian
> Sent: Monday, May 22, 2017 4:12 AM
> To: Christian König; Masahiro Yamada; dri-de...@lists.freedesktop.org;
> Daniel Vetter; Deucher, Alexander; Daenzer, Michel; linux-
> ker...@vger.kernel.org; amd-...@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amd: include <linux/delay.h> instead of
> "linux/delay.h"
> 
> Am 22.05.2017 um 10:06 schrieb Daniel Vetter:
> > On Mon, May 22, 2017 at 09:55:46AM +0200, Christian König wrote:
> >> Am 22.05.2017 um 09:39 schrieb Daniel Vetter:
> >>> On Thu, May 18, 2017 at 08:47:34AM +0200, Christian König wrote:
> >>>> Am 18.05.2017 um 06:43 schrieb Masahiro Yamada:
> >>>>> Use <...> notation to include headers located in include/linux.
> >>>>> While we are here, tweak the includes order a bit to sort them
> >>>>> alphabetically.
> >>>>>
> >>>>> Signed-off-by: Masahiro Yamada
> <yamada.masah...@socionext.com>
> >>>> Reviewed-by: Christian König <christian.koe...@amd.com>
> >>> I think I'll leave this one for Alex, but I guess I can pick it up into
> >>> drm-misc too if that's simpler ... All the other include patches are in
> >>> there already.
> >> Please pick that up for drm-misc. Alex is on vacation this week and I
> >> already have all hands full replacing him.
> > Done. Aside: Switching to commit rights is a nice way to make maintainer
> > vacations real smooth :-) I wanted to chat with Alex about that anyway, I
> > guess I'll ping him when he's back.
> 
> Completely agree. One lesson learned from the past week is that Alex
> needs to stop using his personal repository on fdo.
> 
> We were asked a couple of times if I couldn't update a branch there from
> different directions, which we obviously can't do.

Regardless of what tree we use for -fixes and -next pulls or who happens to be 
on vacation, we can still pull patches like this into our internal tree for 
testing and eventual integration into -fixes or -next.  The same as any other 
patches we integrate.

Alex

> 
> Christian.
> 
> >
> > Cheers, Daniel
> >
> >> Christian.
> >>
> >>> -Daniel
> >>>>> ---
> >>>>>
> >>>>> drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c| 4 ++--
> >>>>> drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c  | 2 +-
> >>>>> drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c   | 4 ++--
> >>>>> drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 5
> +++--
> >>>>> drivers/gpu/drm/amd/powerplay/smumgr/cz_smumgr.c   | 8
> +---
> >>>>> drivers/gpu/drm/amd/powerplay/smumgr/smumgr.c  | 5 +++--
> >>>>> 6 files changed, 16 insertions(+), 12 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
> b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
> >>>>> index ff4ae3d..963a9e0 100644
> >>>>> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
> >>>>> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
> >>>>> @@ -22,10 +22,10 @@
> >>>>>  */
> >>>>> #include "pp_debug.h"
> >>>>> -#include "linux/delay.h"
> >>>>> -#include 
> >>>>> +#include 
> >>>>> #include 
> >>>>> #include 
> >>>>> +#include 
> >>>>> #include 
> >>>>> #include "cgs_common.h"
> >>>>> #include "power_state.h"
> >>>>> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c
> b/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c
> >>>>> index f5e8fda..f6b4dd9 100644
> >>>>> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c
> >>>>> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/pp_acpi.c
> >>>>> @@ -21,8 +21,8 @@
> >>>>>  *
> >>>>>  */
> >>>>> +#include 
> >>>>> #include 
> >>>>> -#include "linux/delay.h"
> >>>>> #include "hwmgr.h"
> >>>>> #include "amd_acpi.h"
> >>>>> #include "pp_acpi.h"
> >>>>> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> >>>>&

RE: [PATCH] gpu: drm: radeon: refactor code

2017-05-23 Thread Deucher, Alexander

> -Original Message-
> From: Koenig, Christian
> Sent: Wednesday, May 17, 2017 4:23 AM
> To: Gustavo A. R. Silva; Deucher, Alexander; David Airlie
> Cc: amd-...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH] gpu: drm: radeon: refactor code
> 
> Am 17.05.2017 um 04:20 schrieb Gustavo A. R. Silva:
> > Local variable _color_ is assigned to a constant value and it is
> > never updated again. Remove this variable and refactor the code it
> > affects.
> >
> > Addresses-Coverity-ID: 1226745
> > Signed-off-by: Gustavo A. R. Silva <garsi...@embeddedor.com>
> 
> Mhm, on the one hand it looks like a valid cleanup. On the other that is
> legacy code we haven't touched in a while.
> 
> Feel free to put my Reviewed-by: Christian König
> <christian.koe...@amd.com> on it, but I'm not sure if Alex will pick it up.

The code is written like that to show how to do color vs. mono load detection.
It's not something we supported, but others using the hw may be interested.

Alex

> 
> Regards,
> Christian.
> 
> > ---
> >   drivers/gpu/drm/radeon/radeon_legacy_encoders.c | 8 +---
> >   1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
> b/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
> > index 222a1fa..7235d0c 100644
> > --- a/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
> > +++ b/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
> > @@ -640,7 +640,6 @@ static enum drm_connector_status
> radeon_legacy_primary_dac_detect(struct drm_enc
> > uint32_t vclk_ecp_cntl, crtc_ext_cntl;
> > uint32_t dac_ext_cntl, dac_cntl, dac_macro_cntl, tmp;
> > enum drm_connector_status found =
> connector_status_disconnected;
> > -   bool color = true;
> >
> > /* just don't bother on RN50 those chip are often connected to
> remoting
> >  * console hw and often we get failure to load detect those. So to
> make
> > @@ -665,12 +664,7 @@ static enum drm_connector_status
> radeon_legacy_primary_dac_detect(struct drm_enc
> > WREG32(RADEON_CRTC_EXT_CNTL, tmp);
> >
> > tmp = RADEON_DAC_FORCE_BLANK_OFF_EN |
> > -   RADEON_DAC_FORCE_DATA_EN;
> > -
> > -   if (color)
> > -   tmp |= RADEON_DAC_FORCE_DATA_SEL_RGB;
> > -   else
> > -   tmp |= RADEON_DAC_FORCE_DATA_SEL_G;
> > +   RADEON_DAC_FORCE_DATA_EN |
> RADEON_DAC_FORCE_DATA_SEL_RGB;
> >
> > if (ASIC_IS_R300(rdev))
> > tmp |= (0x1b6 << RADEON_DAC_FORCE_DATA_SHIFT);
>
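
The point about keeping `color` can be illustrated by turning it into a parameter instead of deleting the branch. In this sketch the `HYP_DAC_*` names and their bit values are placeholders chosen for illustration, not the real `RADEON_DAC_FORCE_*` register definitions, and `dac_load_detect_flags()` is a hypothetical helper; it only shows how the forced-data channel selection differs between color and mono detection.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; the real RADEON_DAC_FORCE_*
 * definitions live in the radeon register headers and may differ. */
#define HYP_DAC_FORCE_BLANK_OFF_EN  (1u << 4)
#define HYP_DAC_FORCE_DATA_EN       (1u << 5)
#define HYP_DAC_FORCE_DATA_SEL_MASK (3u << 6)
#define HYP_DAC_FORCE_DATA_SEL_G    (1u << 6)
#define HYP_DAC_FORCE_DATA_SEL_RGB  (3u << 6)

/* Build the DAC force-data value for load detection. Color detection drives
 * all three channels; mono detection (assumed here to use a monitor wired to
 * the green channel only) drives just G. */
static uint32_t dac_load_detect_flags(bool color)
{
    uint32_t tmp = HYP_DAC_FORCE_BLANK_OFF_EN | HYP_DAC_FORCE_DATA_EN;

    tmp |= color ? HYP_DAC_FORCE_DATA_SEL_RGB : HYP_DAC_FORCE_DATA_SEL_G;
    return tmp;
}
```

Passing `true` reproduces what the current code always does; the `false` case is the mono variant that the constant `color = true` leaves documented but unused.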

RE: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV incapable platform

2017-05-23 Thread Deucher, Alexander

> -Original Message-
> From: Cheng, Collins
> Sent: Thursday, May 11, 2017 10:51 PM
> To: Bjorn Helgaas; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Deucher, Alexander; Zytaruk, Kelly
> Subject: [PATCH] PCI: Make SR-IOV capable GPU working on the SR-IOV
> incapable platform
> 
> Hi Helgaas,
> 
> Some AMD GPUs have hardware support for graphics SR-IOV.
> If the SR-IOV capable GPU is plugged into the SR-IOV incapable
> platform. It would cause a problem on PCI resource allocation in
> current Linux kernel.
> 
> Therefore in order to allow the PF (Physical Function) device of
> SR-IOV capable GPU to work on the SR-IOV incapable platform,
> it is required to verify conditions for initializing BAR resources
> on AMD SR-IOV capable GPUs.
> 
> If the device is an AMD graphics device and it supports
> SR-IOV it will require a large amount of resources.
> Before calling sriov_init() must ensure that the system
> BIOS also supports SR-IOV and that system BIOS has been
> able to allocate enough resources.
> If the VF BARs are zero then the system BIOS does not
> support SR-IOV or it could not allocate the resources
> and this platform will not support AMD graphics SR-IOV.
> Therefore do not call sriov_init().
> If the system BIOS does support SR-IOV then the VF BARs
> will be properly initialized to non-zero values.
> 
> Below is the patch against to Kernel 4.8 & 4.9. Please review.

For upstream, the patch should be against Linus' master or Bjorn's pci-next
tree.

Alex

> 
> I checked the drivers/pci/quirks.c, it looks the workarounds/fixes in
> quirks.c are for specific devices and one or more device ID are defined
> for the specific devices. However my patch is for all AMD SR-IOV
> capable GPUs, that includes all existing and future AMD server GPUs.
> So it doesn't seem like a good fit to put the fix in quirks.c.
> 
> 
> 
> Signed-off-by: Collins Cheng <collins.ch...@amd.com>
> ---
>  drivers/pci/iov.c | 63
> ---
>  1 file changed, 60 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index e30f05c..e4f1405 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -523,6 +523,45 @@ static void sriov_restore_state(struct pci_dev *dev)
>   msleep(100);
>  }
> 
> +/*
> + * pci_vf_bar_valid - check if VF BARs have resource allocated
> + * @dev: the PCI device
> + * @pos: register offset of SR-IOV capability in PCI config space
> + * Returns true if any VF BAR has resource allocated, false
> + * if all VF BARs are empty.
> + */
> +static bool pci_vf_bar_valid(struct pci_dev *dev, int pos)
> +{
> + int i;
> + u32 bar_value;
> + u32 bar_size_mask = ~(PCI_BASE_ADDRESS_SPACE |
> + PCI_BASE_ADDRESS_MEM_TYPE_64 |
> + PCI_BASE_ADDRESS_MEM_PREFETCH);
> +
> + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> + pci_read_config_dword(dev, pos + PCI_SRIOV_BAR + i * 4,
> &bar_value);
> + if (bar_value & bar_size_mask)
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/*
> + * is_amd_display_adapter - check if it is an AMD/ATI GPU device
> + * @dev: the PCI device
> + *
> + * Returns true if device is an AMD/ATI display adapter,
> + * otherwise return false.
> + */
> +
> +static bool is_amd_display_adapter(struct pci_dev *dev)
> +{
> + return (((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) &&
> + (dev->vendor == PCI_VENDOR_ID_ATI ||
> + dev->vendor == PCI_VENDOR_ID_AMD));
> +}
> +
>  /**
>   * pci_iov_init - initialize the IOV capability
>   * @dev: the PCI device
> @@ -537,9 +576,27 @@ int pci_iov_init(struct pci_dev *dev)
>   return -ENODEV;
> 
>   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> - if (pos)
> - return sriov_init(dev, pos);
> -
> + if (pos) {
> + /*
> +  * If the device is an AMD graphics device and it supports
> +  * SR-IOV it will require a large amount of resources.
> +  * Before calling sriov_init() must ensure that the system
> +  * BIOS also supports SR-IOV and that system BIOS has been
> +  * able to allocate enough resources.
> +  * If the VF BARs are zero then the system BIOS does not
> +  * support SR-IOV or it could not allocate the resources
> +  * and this platform will not support AMD graphics SR-IOV.
> +  * Therefore do not call sriov_init().
> +  * If the system BIOS does support SR-IOV then the VF BARs
> +  * will be properly initialized to non-zero values.
> +  */
> + if (is_amd_display_adapter(dev)) {
> + if (pci_vf_bar_valid(dev, pos))
> + return sriov_init(dev, pos);
> + } else {
> + return sriov_init(dev, pos);
> + }
> + }
>   return -ENODEV;
>  }
> 
> --
> 1.9.1
> 
> 
> 
> -Collins Cheng

RE: PROBLEM: Issue With Radeon graphics With Lenovo Laptop G50-80

2017-05-08 Thread Deucher, Alexander

> -----Original Message-----
> From: Anil Nair [mailto:anilcol...@gmail.com]
> Sent: Monday, May 08, 2017 11:42 AM
> To: Deucher, Alexander; lkml
> Subject: PROBLEM: Issue With Radeon graphics With Lenovo Laptop G50-80
> 
> Hi,
> 
> I recently decided to try out the latest Linux kernel release, v4.11. I
> compiled and installed it, but while booting the latest kernel the screen
> freezes. There is no response from the kernel; the only option I have
> is to turn it off and start it again.

Can you get the dmesg output after boot?  Defer loading the radeon driver 
(append modprobe.blacklist=radeon to the kernel command line in grub), then 
boot into a text console and attempt to manually load the driver (modprobe 
radeon).

> 
> I tried booting the kernel using the Debian recovery option. When I tried
> to load the radeon module, here is the error I got:
> 
> anilnair@anilnair-lenovo:~$ dmesg | grep radeon
> [1.255633] [drm] VGACON disable radeon kernel modesetting.
> [1.255698] [drm:radeon_init [radeon]] *ERROR* No UMS support in
> radeon module!
> [   22.261475] [drm] VGACON disable radeon kernel modesetting.
> [   22.261500] [drm:radeon_init [radeon]] *ERROR* No UMS support in
> radeon module!
> anilnair@anilnair-lenovo:~$ sudo modprobe radeon
> [  716.704231] [drm] VGACON disable radeon kernel modesetting.
> [  716.704267] [drm:radeon_init [radeon]] *ERROR* No UMS support in
> radeon module!
> 
> I am attaching the relevant details. Please let me know if any further
> information is needed.

If you have nomodeset or radeon.modeset=0 on the kernel command line, you are
effectively disabling the driver.  The driver refuses to load if either is set.
Please remove them from your kernel command line to load the driver.

Alex

RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

2017-05-04 Thread Deucher, Alexander

> -----Original Message-----
> From: David Woodhouse [mailto:dw...@infradead.org]
> Sent: Thursday, May 04, 2017 6:22 AM
> To: Deucher, Alexander; 'Joerg Roedel'; Bjorn Helgaas
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake;
> Samuel Sieb; Joerg Roedel
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> On Fri, 2017-04-07 at 16:46 +, Deucher, Alexander wrote:
> > >
> > > -----Original Message-----
> > > From: Joerg Roedel [mailto:j...@8bytes.org]
> > > Sent: Friday, April 07, 2017 10:32 AM
> > > To: Bjorn Helgaas
> > > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake;
> > > Deucher, Alexander; Samuel Sieb; David Woodhouse; Joerg Roedel
> > > Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> > >
> > > From: Joerg Roedel <jroe...@suse.de>
> > >
> > > ATS is broken on this hardware and causes IOMMU stalls and
> > > system failure. Disable ATS on these devices to make them
> > > usable again with IOMMU enabled.
> > >
> > > Note that the commit in the Fixes-tag is not buggy, it
> > > just uncovers the problem in the hardware by increasing
> > > the ATS-flush rate.
> > >
> > > Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> > > Signed-off-by: Joerg Roedel <jroe...@suse.de>
> > Acked-by: Alex Deucher <alexander.deuc...@amd.com>
> 
> Alex, are you able to confirm that it is *only* the device with PCI ID
> 0x98e4 which has this problem, or (more likely) come up with an
> exhaustive list? Thanks.

It's just Stoney; that is the only ID.  ATS is validated on the other GPU parts.

Alex

> 
> We'll want the same blacklist in Xen too, won't we?
> 
> > >
> > > ---
> > >  drivers/pci/quirks.c | 19 +++
> > >  1 file changed, 19 insertions(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 6736836..7cbe316 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
> > > +
> > > +#ifdef CONFIG_PCI_ATS
> > > +/*
> > > + * Some devices have a broken ATS implementation causing IOMMU stalls.
> > > + * Don't use ATS for those devices.
> > > + */
> > > +static void quirk_disable_ats(struct pci_dev *pdev)
> > > +{
> > > + /*
> > > +  * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> > > +  * early.
> > > +  */
> > > + dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> > > + pdev->ats_cap = 0;
> > > +}
> > > +
> > > +/* AMD Stoney platform GPU */
> > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
> > > +#endif /* CONFIG_PCI_ATS */
> > > --
> > > 1.9.1

RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

2017-04-07 Thread Deucher, Alexander

> -----Original Message-----
> From: Joerg Roedel [mailto:j...@8bytes.org]
> Sent: Friday, April 07, 2017 10:32 AM
> To: Bjorn Helgaas
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake;
> Deucher, Alexander; Samuel Sieb; David Woodhouse; Joerg Roedel
> Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> From: Joerg Roedel <jroe...@suse.de>
> 
> ATS is broken on this hardware and causes IOMMU stalls and
> system failure. Disable ATS on these devices to make them
> usable again with IOMMU enabled.
> 
> Note that the commit in the Fixes-tag is not buggy, it
> just uncovers the problem in the hardware by increasing
> the ATS-flush rate.
> 
> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> Signed-off-by: Joerg Roedel <jroe...@suse.de>

Acked-by: Alex Deucher <alexander.deuc...@amd.com>

> ---
>  drivers/pci/quirks.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..7cbe316 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
> +
> +#ifdef CONFIG_PCI_ATS
> +/*
> + * Some devices have a broken ATS implementation causing IOMMU stalls.
> + * Don't use ATS for those devices.
> + */
> +static void quirk_disable_ats(struct pci_dev *pdev)
> +{
> + /*
> +  * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> +  * early.
> +  */
> + dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> + pdev->ats_cap = 0;
> +}
> +
> +/* AMD Stoney platform GPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
> +#endif /* CONFIG_PCI_ATS */
> --
> 1.9.1

RE: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS

2017-03-29 Thread Deucher, Alexander

> -----Original Message-----
> From: 'Joerg Roedel' [mailto:jroe...@suse.de]
> Sent: Tuesday, March 28, 2017 6:26 PM
> To: Deucher, Alexander
> Cc: 'Joerg Roedel'; Bjorn Helgaas; linux-...@vger.kernel.org; linux-ker...@vger.kernel.org; Daniel Drake; Nath, Arindam
> Subject: Re: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS
> 
> On Tue, Mar 28, 2017 at 09:13:23PM +, Deucher, Alexander wrote:
> > If I understand Arindam's patch correctly, it only flushes TLB entries
> > for domains in the flush queue whereas the previous behavior was to
> > flush all domains.  If there was no TLB flush in the queue for that
> > domain, could flushing it cause a problem?
> 
> No, that can't cause a problem. An io/tlb flush for the device is just a
> message that the device should invalidate its own tlb. The device can't
> know and doesn't need to know whether the page-tables it used to fill
> the tlb really changed.
> 
> As it looks, the problem we are seeing here is that we are sending a
> large number of these requests to the GPU device and waiting for
> completion each time. This shouldn't be a problem for ATS devices, but
> the GPU here seems to fail at some point and stops answering the
> invalidation requests, causing the completion-wait loop timeouts.
> 
> Arindam's patch makes a high flush frequency less likely, but it can
> still happen, depending on how the GPU is used. So it's best to keep
> ATS disabled on the device: it doesn't work correctly, and if we leave it
> enabled and just make the trigger less likely, we risk running into the
> same problem again.

Thanks for clarifying.  The patch is:
Acked-by: Alex Deucher <alexander.deuc...@amd.com>
