linux-next: build failure after merge of the drm-misc tree
Hi all,

After merging the drm-misc tree, today's linux-next build (powerpc allyesconfig) failed like this:

drivers/gpu/drm/nouveau/nouveau_connector.c: In function 'nouveau_connector_of_detect':
drivers/gpu/drm/nouveau/nouveau_connector.c:463:59: error: 'struct drm_device' has no member named 'pdev'; did you mean 'dev'?
  463 |   struct device_node *cn, *dn = pci_device_to_OF_node(dev->pdev);
      |                                                            ^~~~
      |                                                            dev

Caused by commit b347e04452ff ("drm: Remove pdev field from struct drm_device").

I have reverted that commit for today.

-- 
Cheers,
Stephen Rothwell
[Bug 212957] [radeon] kernel NULL pointer dereference during system boot
https://bugzilla.kernel.org/show_bug.cgi?id=212957

--- Comment #4 from Dennis Foster (m...@dennisfoster.us) ---

Created attachment 296723
  --> https://bugzilla.kernel.org/attachment.cgi?id=296723&action=edit
journalctl - bad commit

Attached is a part of the system log after checking out the bisected commit.

-- 
You may reply to this email to add a comment.
You are receiving this mail because: You are watching the assignee of the bug.
[Bug 212957] [radeon] kernel NULL pointer dereference during system boot
https://bugzilla.kernel.org/show_bug.cgi?id=212957

--- Comment #3 from Dennis Foster (m...@dennisfoster.us) ---

(In reply to Alex Deucher from comment #1)
> Can you bisect?

0575ff3d33cd62123991d2a5d0d8459d72592388 is the first bad commit

commit 0575ff3d33cd62123991d2a5d0d8459d72592388
Author: Christian König
Date:   Thu Oct 8 13:01:35 2020 +0200

    drm/radeon: stop using pages with drm_prime_sg_to_page_addr_arrays v2

    This is deprecated.

    v2: also use ttm_sg_tt_init to avoid allocating the page array.

    Signed-off-by: Christian König
    Acked-by: Daniel Vetter
    Link: https://patchwork.freedesktop.org/patch/403832/

 drivers/gpu/drm/radeon/radeon_ttm.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

I wasn't able to revert this commit on v5.12, because there's another commit, c67e62790f5c156705fb162da840c6d89d0af6e0, where it seems that file was changed drastically; in particular, drm_prime_sg_to_page_addr_arrays() was replaced with drm_prime_sg_to_dma_addr_array().
linux-next: manual merge of the drm-intel tree with Linus' tree
Hi all,

Today's linux-next merge of the drm-intel tree got a conflict in:

  drivers/gpu/drm/i915/intel_pm.c

between commit:

  e7c6e405e171 ("Fix misc new gcc warnings")

from Linus' tree and commit:

  c6deb5e97ded ("drm/i915/pm: Make the wm parameter of print_wm_latency a pointer")

from the drm-intel tree.

I fixed it up (I just used the latter version) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non-trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell
linux-next: manual merge of the amdgpu tree with the drm-misc tree
Hi all,

Today's linux-next merge of the amdgpu tree got a conflict in:

  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c

between commit:

  c777dc9e7933 ("drm/ttm: move the page_alignment into the BO v2")

from the drm-misc tree and commit:

  dd03daec0ff1 ("drm/amdgpu: restructure amdgpu_vram_mgr_new")

from the amdgpu tree.

I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non-trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index f7235438535f,e2cbe19404c0..
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@@ -448,10 -391,10 +391,10 @@@ static int amdgpu_vram_mgr_new(struct t
  		pages_per_node = HPAGE_PMD_NR;
  #else
  		/* default to 2MB */
- 		pages_per_node = (2UL << (20UL - PAGE_SHIFT));
+ 		pages_per_node = 2UL << (20UL - PAGE_SHIFT);
  #endif
 -		pages_per_node = max((uint32_t)pages_per_node,
 -				     tbo->page_alignment);
+ 		pages_per_node = max_t(uint32_t, pages_per_node,
- 				       mem->page_alignment);
++				       tbo->page_alignment);
  		num_nodes = DIV_ROUND_UP(mem->num_pages, pages_per_node);
  	}
@@@ -469,38 -412,29 +412,29 @@@
  	mem->start = 0;
  	pages_left = mem->num_pages;

- 	spin_lock(&mgr->lock);
- 	for (i = 0; pages_left >= pages_per_node; ++i) {
- 		unsigned long pages = rounddown_pow_of_two(pages_left);
+ 	/* Limit maximum size to 2GB due to SG table limitations */
+ 	pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));

- 		/* Limit maximum size to 2GB due to SG table limitations */
- 		pages = min(pages, (2UL << (30 - PAGE_SHIFT)));
-
- 		r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
- 						pages_per_node, 0,
- 						place->fpfn, lpfn,
- 						mode);
- 		if (unlikely(r))
- 			break;
-
- 		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
- 		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
- 		pages_left -= pages;
- 	}
-
- 	for (; pages_left; ++i) {
- 		unsigned long pages = min(pages_left, pages_per_node);
+ 	i = 0;
+ 	spin_lock(&mgr->lock);
+ 	while (pages_left) {
 -		uint32_t alignment = mem->page_alignment;
 +		uint32_t alignment = tbo->page_alignment;

- 		if (pages == pages_per_node)
+ 		if (pages >= pages_per_node)
  			alignment = pages_per_node;

- 		r = drm_mm_insert_node_in_range(mm, &nodes[i],
- 						pages, alignment, 0,
- 						place->fpfn, lpfn,
- 						mode);
- 		if (unlikely(r))
+ 		r = drm_mm_insert_node_in_range(mm, &nodes[i], pages, alignment,
+ 						0, place->fpfn, lpfn, mode);
+ 		if (unlikely(r)) {
+ 			if (pages > pages_per_node) {
+ 				if (is_power_of_2(pages))
+ 					pages = pages / 2;
+ 				else
+ 					pages = rounddown_pow_of_two(pages);
+ 				continue;
+ 			}
  			goto error;
+ 		}

  		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
  		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
Re: [v3 1/2] dt-bindings: backlight: add DisplayPort aux backlight
Hi,

On Tue, May 11, 2021 at 11:12 AM wrote:
>
> On 01-05-2021 03:08, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Apr 30, 2021 at 8:10 AM wrote:
> >>
> >> On 30-04-2021 02:33, Doug Anderson wrote:
> >> > Hi,
> >> >
> >> > On Thu, Apr 29, 2021 at 11:04 AM Rob Herring wrote:
> >> >>
> >> >> On Mon, Apr 26, 2021 at 11:29:15AM +0530, Rajeev Nandan wrote:
> >> >> > Add bindings for DisplayPort aux backlight driver.
> >> >> >
> >> >> > Changes in v2:
> >> >> > - New
> >> >> >
> >> >> > Signed-off-by: Rajeev Nandan
> >> >> > ---
> >> >> >  .../bindings/leds/backlight/dp-aux-backlight.yaml | 49 ++
> >> >> >  1 file changed, 49 insertions(+)
> >> >> >  create mode 100644 Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> >
> >> >> > diff --git a/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml b/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> > new file mode 100644
> >> >> > index ..0fa8bf0
> >> >> > --- /dev/null
> >> >> > +++ b/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> > @@ -0,0 +1,49 @@
> >> >> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> >> >> > +%YAML 1.2
> >> >> > +---
> >> >> > +$id: http://devicetree.org/schemas/leds/backlight/dp-aux-backlight.yaml#
> >> >> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> >> >> > +
> >> >> > +title: DisplayPort aux backlight driver bindings
> >> >> > +
> >> >> > +maintainers:
> >> >> > +  - Rajeev Nandan
> >> >> > +
> >> >> > +description:
> >> >> > +  Backlight driver to control the brightness over DisplayPort aux channel.
> >> >> > +
> >> >> > +allOf:
> >> >> > +  - $ref: common.yaml#
> >> >> > +
> >> >> > +properties:
> >> >> > +  compatible:
> >> >> > +    const: dp-aux-backlight
> >> >> > +
> >> >> > +  ddc-i2c-bus:
> >> >> > +    $ref: /schemas/types.yaml#/definitions/phandle
> >> >> > +    description:
> >> >> > +      A phandle to the system I2C controller connected to the DDC bus used
> >> >> > +      for the DisplayPort AUX channel.
> >> >> > +
> >> >> > +  enable-gpios:
> >> >> > +    maxItems: 1
> >> >> > +    description: GPIO specifier for backlight enable pin.
> >> >> > +
> >> >> > +  max-brightness: true
> >> >> > +
> >> >> > +required:
> >> >> > +  - compatible
> >> >> > +  - ddc-i2c-bus
> >> >> > +
> >> >> > +additionalProperties: false
> >> >> > +
> >> >> > +examples:
> >> >> > +  - |
> >> >> > +    backlight {
> >> >> > +        compatible = "dp-aux-backlight";
> >> >> > +        ddc-i2c-bus = <_bridge>;
> >> >> > +        enable-gpios = < 12 GPIO_ACTIVE_HIGH>;
> >> >>
> >> >> So the DDC bus is connected to a backlight and also a panel? This
> >> >> binding is not reflecting the h/w, but rather what you want for some
> >> >> driver.
> >> >>
> >> >> There's only one thing here and that's an eDP panel which supports
> >> >> backlight control via DP aux channel. You can figure all that out from
> >> >> the panel's compatible and/or reading the EDID.
> >> >>
> >> >> You might also be interested in this thread:
> >> >>
> >> >> https://lore.kernel.org/lkml/yiksdtjcihgnv...@orome.fritz.box/
> >> >
> >> > I think Rajeev needs to rework everything anyway as per:
> >> >
> >> > https://lore.kernel.org/r/87zgxl5qar@intel.com
> >> >
> >> > ...but you're right that it makes sense not to model the backlight as
> >> > a separate node in the device tree. The panel driver can handle
> >> > setting up the backlight.
> >> >
> >> > -Doug
> >>
> >> It was not a good idea to create a separate backlight driver and use
> >> ddc-i2c-bus to get access to DP aux. I am working to move the code
> >> to the panel driver and to utilize the new DRM helper functions
> >> (drm_edp_backlight_*) Lyude has added [1].
> >>
> >> To use these helper functions, the panel driver should have access to the
> >> "struct drm_dp_aux *". The simple-panel has a "ddc-i2c-bus" property
> >> to give the panel access to the DDC bus and is currently being used to
> >> get the EDID from the panel. Can I use the same ddc bus i2c_adapter to get
> >> the "struct drm_dp_aux *"?
> >>
> >> As per the suggestion [2], I get the "struct drm_dp_aux *" from the
> >> i2c_adapter of the ddc bus (maybe I didn't understand the suggestion
> >> correctly), and, it turned out, the way I have implemented it is not the
> >> right way [3]. So, I am afraid to use the same method in the panel driver.
> >>
> >> [1] https://lore.kernel.org/dri-devel/871rb5bcf9@intel.com/
> >> [2] https://www.spinics.net/lists/dri-devel/msg295429.html
> >> [3] https://lore.kernel.org/dri-devel/2021042616.4lc3ekxjugjr3...@maple.lan/
> >
> > So it's definitely up to maintainers, not me. ...but I guess I would
> > have expected something like a new property called "ddc-aux-bus". Then
> > you'd have to create a new API call called something
Re: [v3 1/2] dt-bindings: backlight: add DisplayPort aux backlight
Hi Rajeevny,

On Tue, May 11, 2021 at 11:41:57PM +0530, rajee...@codeaurora.org wrote:
> On 01-05-2021 03:08, Doug Anderson wrote:

[snip]
Re: [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages
On 11.05.2021 17:16, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:13:34PM -0700, Matthew Brost wrote:
>> From: Michal Wajdeczko
>>
>> New GuC firmware will unify format of MMIO and CTB H2G messages.
>> Introduce their definitions now to allow gradual transition of
>> our code to match new changes.
>>
>> Signed-off-by: Michal Wajdeczko
>> Signed-off-by: Matthew Brost
>> Cc: Michał Winiarski
>> ---
>>  .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++
>>  1 file changed, 226 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
>> index 775e21f3058c..1c264819aa03 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
>> @@ -6,6 +6,232 @@
>>  #ifndef _ABI_GUC_MESSAGES_ABI_H
>>  #define _ABI_GUC_MESSAGES_ABI_H
>>
>> +/**
>> + * DOC: HXG Message
>
> These aren't useful if we don't pull them in somewhere in the
> Documentation/gpu hierarchy. General comment, and also please check that
> it all renders correctly still.

Patch that connects all these DOC sections into i915.rst is still on a
private branch, where I'm trying to verify all html rendering, and ...

> btw if you respin a patch not originally by you we generally add a (v1) to
> the original s-o-b line (or wherever the version split was) and explain in
> the usual changelog in the commit message what was changed.
>
> This holds for the entire series ofc.
> -Daniel

>> + *
>> + * All messages exchanged with GuC are defined using 32 bit dwords.
>> + * First dword is treated as a message header. Remaining dwords are optional.
>> + *
>> + * .. _HXG Message:

... where such workarounds from early documentation are already removed,
since they are not needed any more starting from commit ef09989594bf
("scripts/kernel-doc: add internal hyperlink to DOC: sections")

Michal

>> + *
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |   | Bits  | Description                                                  |
>> + *  +===+=======+==============================================================+
>> + *  | 0 | 31    | **ORIGIN** - originator of the message                       |
>> + *  |   |       |   - _`GUC_HXG_ORIGIN_HOST` = 0                               |
>> + *  |   |       |   - _`GUC_HXG_ORIGIN_GUC` = 1                                |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 30:28 | **TYPE** - message type                                      |
>> + *  |   |       |   - _`GUC_HXG_TYPE_REQUEST` = 0                              |
>> + *  |   |       |   - _`GUC_HXG_TYPE_EVENT` = 1                                |
>> + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3                     |
>> + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5                    |
>> + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6                     |
>> + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7                     |
>> + *  |   +-------+--------------------------------------------------------------+
>> + *  |   | 27:0  | **AUX** - auxiliary data (depends on TYPE)                   |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  | 1 | 31:0  | optional payload (depends on TYPE)                           |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  |...|       |                                                              |
>> + *  +---+-------+--------------------------------------------------------------+
>> + *  | n | 31:0  |                                                              |
>> + *  +---+-------+--------------------------------------------------------------+
>> + */
>> +
>> +#define GUC_HXG_MSG_MIN_LEN			1u
>> +#define GUC_HXG_MSG_0_ORIGIN			(0x1 << 31)
>> +#define   GUC_HXG_ORIGIN_HOST			0u
>> +#define   GUC_HXG_ORIGIN_GUC			1u
>> +#define GUC_HXG_MSG_0_TYPE			(0x7 << 28)
>> +#define   GUC_HXG_TYPE_REQUEST			0u
>> +#define   GUC_HXG_TYPE_EVENT			1u
>> +#define   GUC_HXG_TYPE_NO_RESPONSE_BUSY		3u
>> +#define   GUC_HXG_TYPE_NO_RESPONSE_RETRY	5u
>> +#define   GUC_HXG_TYPE_RESPONSE_FAILURE		6u
>> +#define   GUC_HXG_TYPE_RESPONSE_SUCCESS		7u
>> +#define GUC_HXG_MSG_0_AUX			(0xfff << 0)
>> +
>> +/**
>> + * DOC: HXG Request
>> + *
>> + * The `HXG Request`_ message should be used to initiate
RE: [PATCH 1/3] virtio-gpu uapi: Add VIRTIO_GPU_F_EXPLICIT_FLUSH feature
Hi Gerd,

> On Tue, May 11, 2021 at 01:36:08AM -0700, Vivek Kasireddy wrote:
> > This feature enables the Guest to wait until a flush has been
> > performed on a buffer it has submitted to the Host.
>
> This needs a virtio-spec update documenting the new feature.

[Kasireddy, Vivek] Yes, I was planning to do that after getting your thoughts on this feature.

> > +	VIRTIO_GPU_CMD_WAIT_FLUSH,
>
> Why a new command?
>
> If I understand it correctly you want wait until
> VIRTIO_GPU_CMD_RESOURCE_FLUSH is done. We could
> extend the VIRTIO_GPU_CMD_RESOURCE_FLUSH command
> for that instead.

[Kasireddy, Vivek] VIRTIO_GPU_CMD_RESOURCE_FLUSH can trigger/queue a redraw that may be performed synchronously or asynchronously depending on the UI (Glarea is async and gtk-egl is sync but can be made async). I'd like to make the Guest wait until the actual redraw happens (until glFlush or eglSwapBuffers, again depending on the UI).

However, as part of this feature (explicit flush), I'd like to make the Guest wait until the current resource (as specified by resource_flush or set_scanout) is flushed or synchronized. But for a different feature I am thinking of (explicit sync), I'd like to make the Guest wait for the previous buffer/resource submitted (available via old_state->fb).

I think it may be possible to accomplish both features by overloading resource_flush, but given the various combinations of Guests (Android/Chrome OS, Windows, Linux) and Hosts (Android/Chrome OS, Linux) that are or will be supported with virtio-gpu + i915, I figured adding a new command might be cleaner.

Thanks,
Vivek

> take care,
>   Gerd
Re: [PATCH v2] drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
On Mon, May 10, 2021 at 11:33 PM Kai-Heng Feng wrote:
>
> On Fri, Apr 30, 2021 at 12:57 PM Kai-Heng Feng wrote:
> >
> > Screen flickers rapidly when two 4K 60Hz monitors are in use. This issue
> > doesn't happen when one monitor is 4K 60Hz (pixelclock 594MHz) and
> > another one is 4K 30Hz (pixelclock 297MHz).
> >
> > The issue is gone after setting "power_dpm_force_performance_level" to
> > "high". Following the indication, we found that the issue occurs when
> > sclk is too low.
> >
> > So resolve the issue by disabling sclk switching when there are two
> > monitors requires high pixelclock (> 297MHz).
> >
> > v2:
> >  - Only apply the fix to Oland.
> >
> > Signed-off-by: Kai-Heng Feng
>
> A gentle ping...

Applied.  Thanks for the reminder.

Alex

> > ---
> >  drivers/gpu/drm/radeon/radeon.h    | 1 +
> >  drivers/gpu/drm/radeon/radeon_pm.c | 8
> >  drivers/gpu/drm/radeon/si_dpm.c    | 3 +++
> >  3 files changed, 12 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> > index 42281fce552e6..56ed5634cebef 100644
> > --- a/drivers/gpu/drm/radeon/radeon.h
> > +++ b/drivers/gpu/drm/radeon/radeon.h
> > @@ -1549,6 +1549,7 @@ struct radeon_dpm {
> >         void                    *priv;
> >         u32                     new_active_crtcs;
> >         int                     new_active_crtc_count;
> > +       int                     high_pixelclock_count;
> >         u32                     current_active_crtcs;
> >         int                     current_active_crtc_count;
> >         bool                    single_display;
> > diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
> > index 0c1950f4e146f..3861c0b98fcf3 100644
> > --- a/drivers/gpu/drm/radeon/radeon_pm.c
> > +++ b/drivers/gpu/drm/radeon/radeon_pm.c
> > @@ -1767,6 +1767,7 @@ static void radeon_pm_compute_clocks_dpm(struct radeon_device *rdev)
> >         struct drm_device *ddev = rdev->ddev;
> >         struct drm_crtc *crtc;
> >         struct radeon_crtc *radeon_crtc;
> > +       struct radeon_connector *radeon_connector;
> >
> >         if (!rdev->pm.dpm_enabled)
> >                 return;
> > @@ -1776,6 +1777,7 @@ static void radeon_pm_compute_clocks_dpm(struct radeon_device *rdev)
> >         /* update active crtc counts */
> >         rdev->pm.dpm.new_active_crtcs = 0;
> >         rdev->pm.dpm.new_active_crtc_count = 0;
> > +       rdev->pm.dpm.high_pixelclock_count = 0;
> >         if (rdev->num_crtc && rdev->mode_info.mode_config_initialized) {
> >                 list_for_each_entry(crtc, &ddev->mode_config.crtc_list, head) {
> > @@ -1783,6 +1785,12 @@ static void radeon_pm_compute_clocks_dpm(struct radeon_device *rdev)
> >                         if (crtc->enabled) {
> >                                 rdev->pm.dpm.new_active_crtcs |= (1 << radeon_crtc->crtc_id);
> >                                 rdev->pm.dpm.new_active_crtc_count++;
> > +                               if (!radeon_crtc->connector)
> > +                                       continue;
> > +
> > +                               radeon_connector = to_radeon_connector(radeon_crtc->connector);
> > +                               if (radeon_connector->pixelclock_for_modeset > 297000)
> > +                                       rdev->pm.dpm.high_pixelclock_count++;
> >                         }
> >                 }
> >         }
> > diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
> > index 9186095518047..3cc2b96a7f368 100644
> > --- a/drivers/gpu/drm/radeon/si_dpm.c
> > +++ b/drivers/gpu/drm/radeon/si_dpm.c
> > @@ -2979,6 +2979,9 @@ static void si_apply_state_adjust_rules(struct radeon_device *rdev,
> >                     (rdev->pdev->device == 0x6605)) {
> >                         max_sclk = 75000;
> >                 }
> > +
> > +               if (rdev->pm.dpm.high_pixelclock_count > 1)
> > +                       disable_sclk_switching = true;
> >         }
> >
> >         if (rps->vce_active) {
> > --
> > 2.30.2
> >
Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array
On Tue, May 11, 2021 at 07:43:30PM +0200, Daniel Vetter wrote:
> On Tue, May 11, 2021 at 10:01:28AM -0700, Matthew Brost wrote:
> > On Tue, May 11, 2021 at 05:26:34PM +0200, Daniel Vetter wrote:
> > > On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote:
> > > > Add lrc descriptor context lookup array which can resolve the
> > > > intel_context from the lrc descriptor index. In addition to lookup, it
> > > > can determine if the lrc descriptor context is currently registered with
> > > > the GuC by checking if an entry for a descriptor index is present.
> > > > Future patches in the series will make use of this array.
> > > >
> > > > Cc: John Harrison
> > > > Signed-off-by: Matthew Brost
> > > > ---
> > > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
> > > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +--
> > > >  2 files changed, 35 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > index d84f37afb9d8..2eb6c497e43c 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > @@ -6,6 +6,8 @@
> > > >  #ifndef _INTEL_GUC_H_
> > > >  #define _INTEL_GUC_H_
> > > >
> > > > +#include "linux/xarray.h"
> > > > +
> > > >  #include "intel_uncore.h"
> > > >  #include "intel_guc_fw.h"
> > > >  #include "intel_guc_fwif.h"
> > > > @@ -47,6 +49,9 @@ struct intel_guc {
> > > >  	struct i915_vma *lrc_desc_pool;
> > > >  	void *lrc_desc_pool_vaddr;
> > > >
> > > > +	/* guc_id to intel_context lookup */
> > > > +	struct xarray context_lookup;
> > >
> > > The current code sets a disastrous example, but for stuff like this it's
> > > always good to explain the locking, and who's holding references and how
> > > you're handling cycles. Since I guess the intel_context also holds the
> > > guc_id alive somehow.
> >
> > I think (?) I know what you mean by this comment. How about adding:
> >
> > 'If an entry in the context_lookup is present, that means a context
> > associated with the guc_id is registered with the GuC. We use this xarray as a
> > lookup mechanism when the GuC communicates with the i915 about the context.'
>
> So no idea how this works, but generally we put a "Protected by" or
> similar in here (so you get a nice link plus something you can use as a
> jump label in your ide too). Plus since intel_context has some lifetime
> rules, explaining whether you're allowed to use the pointer after you
> unlock, or whether you need to grab a reference or what exactly is going
> on. Usually there's three options:
>
> - No refcounting, you cannot access a pointer obtained through this after
>   you unlock.
> - Weak reference, you upgrade to a full reference with
>   kref_get_unless_zero. If that fails it indicates a lookup failure, since
>   you raced with destruction. If it succeeds you can use the pointer after
>   unlock.
> - Strong reference, you get your own reference that stays valid with
>   kref_get().

I think the rules for this are 'if this exists in the xarray, we have a ref'.
Likewise if the GuC knows about the context we have a ref to the context.

> I'm just bringing this up because the current i915-gem code is full of
> very tricky locking and lifetime rules, and explains roughly nothing of it
> in the data structures. Minimally some hints about the locking/lifetime
> rules of important structs should be there.

Agree. I'll add some comments here and to other structures this code uses.

> For locking rules it's good to double-down on them by adding
> lockdep_assert_held to all relevant functions (where appropriate only
> ofc).

Agree. I think I mostly do that in this series. That being said, the locking is
going to be a bit ugly until we switch to the DRM scheduler, because currently
multiple processes can enter the GuC backend in parallel. With the DRM
scheduler we allow a single point of entry, which simplifies things quite a
bit. The current locking rules are explained in the documentation patch:
'Update GuC documentation'. As the locking evolves, so will the documentation +
lockdep asserts.

Matt

> What I generally don't think makes sense is to then also document the
> locking in the kerneldoc for the functions. That tends to be one place too
> many and ime just gets out of date and not useful at all.
>
> > > Again holds for the entire series, where it makes sense (as in we don't
> > > expect to rewrite the entire code anyway).
> >
> > Slightly out of order, but one of the last patches in the series, 'Update GuC
> > documentation', adds a big section of comments that attempts to clarify how all
> > of this code works. I likely should add a section explaining the data structures
> > as well.
>
> Yeah that would be nice.
> -Daniel
>
> > Matt
> >
> > > -Daniel
> > >
> > > > +
> > > >  	/* Control params for fw initialization
Re: [RFC] Implicit vs explicit user fence sync
Am 11.05.21 um 18:48 schrieb Daniel Vetter:

[SNIP]

Why? If you allow implicit fencing then you can end up with
- an implicit userspace fence as the in-fence
- but an explicit dma_fence as the out fence
Which is not allowed. So there's really no way to make this work, except if you stall in the ioctl, which also doesn't work.

Ok, wait a second. I really don't understand what's going on here. The out fence is just to let the userspace know when the frame is displayed. Or rather when the old frame is no longer displayed so that it can be reused, right? Then why does that need to be a dma_fence? We don't use that for memory management anywhere, don't we?

So you have to do an uapi change here. At that point we might as well do it right. I mean in the worst case we might need to allow user fences with sync_files as well when that is really used outside of Android. But I still don't see the fundamental problem here.

Regards,
Christian.

Of course if you only care about some specific compositors (or maybe only the -amdgpu Xorg driver even) then this isn't a concern, but atomic is cross-driver so we can't do that. Or at least I don't see a way how to do this without causing endless amounts of fun down the road. So I have a plan here, what was yours?

As far as I see that should still work perfectly fine and I have the strong feeling I'm missing something here. Transporting fences between processes is not the fundamental problem here, but rather the question how we represent all this in the kernel? In other words I think what you outlined above is just approaching it from the wrong side again. Instead of looking at what the kernel needs to support this, you take a look at userspace and the requirements there.

Uh ... that was my idea here? That's why I put "build userspace fences in userspace only" as the very first thing. Then extend to winsys and atomic/display and all these cases where things get more tricky.

I agree that transporting the fences is easy, which is why it's not interesting trying to solve that problem first. Which is kinda what you're trying to do here by adding implicit userspace fences (well not even that, just a bunch of function calls without any semantics attached to them). So if there's more here, you need to flesh it out more or I just don't get what you're actually trying to demonstrate.

Well I'm trying to figure out why you see it as such a problem to keep implicit sync around. As far as I can tell it is completely orthogonal if we use implicit/explicit and dma_fence/user_fence. It's just a different implementation inside the kernel.

See above. It falls apart with the atomic ioctl.
-Daniel
Re: [Intel-gfx] [RFC PATCH 4/5] drm/i915: Introduce 'set parallel submit' extension
On Tue, May 11, 2021 at 05:11:44PM +0200, Daniel Vetter wrote: > On Thu, May 06, 2021 at 10:30:48AM -0700, Matthew Brost wrote: > > i915_drm.h updates for 'set parallel submit' extension. > > > > Cc: Tvrtko Ursulin > > Cc: Tony Ye > > CC: Carl Zhang > > Cc: Daniel Vetter > > Cc: Jason Ekstrand > > Signed-off-by: Matthew Brost > > --- > > include/uapi/drm/i915_drm.h | 126 > > 1 file changed, 126 insertions(+) > > > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > > index 26d2e135aa31..0175b12b33b8 100644 > > --- a/include/uapi/drm/i915_drm.h > > +++ b/include/uapi/drm/i915_drm.h > > @@ -1712,6 +1712,7 @@ struct drm_i915_gem_context_param { > > * Extensions: > > * i915_context_engines_load_balance > > (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE) > > * i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND) > > + * i915_context_engines_parallel_submit > > (I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT) > > Hm, just realized, but I don't think this hyperlinks correctly, and I'm > also not sure this formats very well as a nice list. Using item lists > should look pretty nice, like we're doing for the various kms properties, > e.g. > > FOO: > Explain what FOO does > > BAR: > Explain what BAR does. struct bar also automatically generates a link > > Please check with make htmldocs and polish this a bit (might need a small > prep patch). > I agree the doc should look nice. To get there I might need to chat with you on IRC as I'm new to this. 
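Applied to the extension list under discussion, Daniel's suggested definition-list style would look roughly like this in the kernel-doc comment (illustrative only — exact wording and indentation would need verifying with `make htmldocs`):

```c
/**
 * DOC: context engines extensions
 *
 * Supported extensions:
 *
 * I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE:
 *	See struct i915_context_engines_load_balance; Sphinx turns the
 *	struct name into a cross-reference automatically.
 *
 * I915_CONTEXT_ENGINES_EXT_BOND:
 *	See struct i915_context_engines_bond.
 *
 * I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT:
 *	See struct i915_context_engines_parallel_submit.
 */
```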
> > */ > > #define I915_CONTEXT_PARAM_ENGINES 0xa > > > > @@ -1894,9 +1895,134 @@ struct i915_context_param_engines { > > __u64 extensions; /* linked chain of extension blocks, 0 terminates */ > > #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0 /* see > > i915_context_engines_load_balance */ > > #define I915_CONTEXT_ENGINES_EXT_BOND 1 /* see i915_context_engines_bond */ > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see > > i915_context_engines_parallel_submit */ > > struct i915_engine_class_instance engines[0]; > > } __attribute__((packed)); > > > > +/* > > + * i915_context_engines_parallel_submit: > > + * > > + * Setup a gem context to allow multiple BBs to be submitted in a single > > execbuf > > + * IOCTL. Those BBs will then be scheduled to run on the GPU in parallel. > > + * > > + * All hardware contexts in the engine set are configured for parallel > > + * submission (i.e. once this gem context is configured for parallel > > submission, > > + * all the hardware contexts, regardless if a BB is available on each > > individual > > + * context, will be submitted to the GPU in parallel). A user can submit > > BBs to a > > + * subset of the hardware contexts, in a single execbuf IOCTL, but it is > > not > > + * recommended as it may reserve physical engines with nothing to run on > > them. > > + * It is highly recommended to configure the gem context with N hardware > > contexts and then > > + * always submit N BBs in a single IOCTL. > > + * > > + * There are two currently defined ways to control the placement of the > > + * hardware contexts on physical engines: default behavior (no flags) and > > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added in the > > + * future as new hardware / use cases arise. Details of how to use this > > + * interface are below, above the flags. 
> > + * > > + * Returns -EINVAL if the hardware context placement configuration is invalid or > > if the > > + * placement configuration isn't supported on the platform / submission > > + * interface. > > + * Returns -ENODEV if the extension isn't supported on the platform / > > submission > > + * interface. > > + */ > > +struct i915_context_engines_parallel_submit { > > + struct i915_user_extension base; > > Ok this is good, since it makes sure we can't possibly use this in > CTX_SETPARAM. > Yep, this is at context creation time. Technically you still can call this over and over on the same gem context, but Jason is taking that ability away, I believe. I've also told the media team to set up the context once and not touch it again. > > + > > +/* > > + * Default placement behavior (currently unsupported): > > + * > > + * Rather than restricting parallel submission to a single class with a > > + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a > > mode that > > + * enables parallel submission across multiple engine classes. In this > > case each > > + * context's logical engine mask indicates where that context can be placed. > > It is > > + * implied in this mode that all contexts have mutually exclusive placement > > (e.g. > > + * if one context is running on CS0 no other contexts can run on CS0). > > + * > > + * Example 1 pseudo code: > > + * CSX[Y] = engine class X, logical instance Y > > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE > > + * set_engines(INVALID, INVALID) > > + * set_load_balance(engine_index=0, num_siblings=2,
Re: [PATCH] drm: fix semicolon.cocci warnings
On Tue, May 11, 2021 at 7:17 PM Daniel Vetter wrote: > On Wed, May 12, 2021 at 12:11:23AM +0800, kernel test robot wrote: > > From: kernel test robot > > > > drivers/gpu/drm/kmb/kmb_dsi.c:284:3-4: Unneeded semicolon > > drivers/gpu/drm/kmb/kmb_dsi.c:304:3-4: Unneeded semicolon > > drivers/gpu/drm/kmb/kmb_dsi.c:321:3-4: Unneeded semicolon > > drivers/gpu/drm/kmb/kmb_dsi.c:340:3-4: Unneeded semicolon > > drivers/gpu/drm/kmb/kmb_dsi.c:364:2-3: Unneeded semicolon > > > > > > Remove unneeded semicolon. > > > > Generated by: scripts/coccinelle/misc/semicolon.cocci > > > > Fixes: ade896460e4a ("drm: DRM_KMB_DISPLAY should depend on ARCH_KEEMBAY") This Fixes-tag is completely bogus. The right one is Fixes: 98521f4d4b4cb265 ("drm/kmb: Mipi DSI part of the display driver") > > CC: Geert Uytterhoeven > > Reported-by: kernel test robot > > Signed-off-by: kernel test robot Reviewed-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH v3] drm/i915: Invoke another _DSM to enable MUX on HP Workstation laptops
On Mon, Apr 26, 2021 at 11:24:10PM +0800, Kai-Heng Feng wrote: > On HP Fury G7 Workstations, graphics output is re-routed from Intel GFX > to discrete GFX after S3. This is not desirable, because userspace will > treat connected display as a new one, losing display settings. > > The expected behavior is to let discrete GFX drives all external > displays. > > The platform in question uses ACPI method \_SB.PCI0.HGME to enable MUX. > The method is inside the another _DSM, so add the _DSM and call it > accordingly. > > I also tested some MUX-less and iGPU only laptops with that _DSM, no > regression was found. > > v3: > - Remove BXT from names. > - Change the parameter type. > - Fold the function into intel_modeset_init_hw(). > > v2: > - Forward declare struct pci_dev. > > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3113 > References: > https://lore.kernel.org/intel-gfx/1460040732-31417-4-git-send-email-animesh.ma...@intel.com/ > Signed-off-by: Kai-Heng Feng > --- > drivers/gpu/drm/i915/display/intel_acpi.c| 18 ++ > drivers/gpu/drm/i915/display/intel_acpi.h| 3 +++ > drivers/gpu/drm/i915/display/intel_display.c | 2 ++ > 3 files changed, 23 insertions(+) > > diff --git a/drivers/gpu/drm/i915/display/intel_acpi.c > b/drivers/gpu/drm/i915/display/intel_acpi.c > index 833d0c1be4f1..d008d3976261 100644 > --- a/drivers/gpu/drm/i915/display/intel_acpi.c > +++ b/drivers/gpu/drm/i915/display/intel_acpi.c > @@ -13,12 +13,17 @@ > #include "intel_display_types.h" > > #define INTEL_DSM_REVISION_ID 1 /* For Calpella anyway... */ > +#define INTEL_DSM_FN_PLATFORM_MUX_ENABLE 0 /* No args */ This block of defines is for the other DSM. We don't want to mix these up. We also want to name it according to the spec, so something like GET_BIOS_DATA_FUNCS_SUPPORTED. Similarly for the intel_dsm_enable_mux() wrapper function. + it needs a comment to document that some BIOSes abuse it to do MUX initialization and whatnot. 
We should perhaps rename all the old DSM stuff to something a bit less generic as well... > #define INTEL_DSM_FN_PLATFORM_MUX_INFO 1 /* No args */ > > static const guid_t intel_dsm_guid = > GUID_INIT(0x7ed873d3, 0xc2d0, 0x4e4f, > 0xa8, 0x54, 0x0f, 0x13, 0x17, 0xb0, 0x1c, 0x2c); > > +static const guid_t intel_dsm_guid2 = > + GUID_INIT(0x3e5b41c6, 0xeb1d, 0x4260, > + 0x9d, 0x15, 0xc7, 0x1f, 0xba, 0xda, 0xe4, 0x14); > + > static char *intel_dsm_port_name(u8 id) > { > switch (id) { > @@ -176,6 +181,19 @@ void intel_unregister_dsm_handler(void) > { > } > > +void intel_dsm_enable_mux(struct drm_i915_private *i915) > +{ > + struct pci_dev *pdev = i915->drm.pdev; > + acpi_handle dhandle; > + > + dhandle = ACPI_HANDLE(&pdev->dev); > + if (!dhandle) > + return; > + > + acpi_evaluate_dsm(dhandle, &intel_dsm_guid2, INTEL_DSM_REVISION_ID, > + INTEL_DSM_FN_PLATFORM_MUX_ENABLE, NULL); > +} > + > /* > * ACPI Specification, Revision 5.0, Appendix B.3.2 _DOD (Enumerate All > Devices > * Attached to the Display Adapter). 
> diff --git a/drivers/gpu/drm/i915/display/intel_acpi.h > b/drivers/gpu/drm/i915/display/intel_acpi.h > index e8b068661d22..def013cf6308 100644 > --- a/drivers/gpu/drm/i915/display/intel_acpi.h > +++ b/drivers/gpu/drm/i915/display/intel_acpi.h > @@ -11,11 +11,14 @@ struct drm_i915_private; > #ifdef CONFIG_ACPI > void intel_register_dsm_handler(void); > void intel_unregister_dsm_handler(void); > +void intel_dsm_enable_mux(struct drm_i915_private *i915); > void intel_acpi_device_id_update(struct drm_i915_private *i915); > #else > static inline void intel_register_dsm_handler(void) { return; } > static inline void intel_unregister_dsm_handler(void) { return; } > static inline > +void intel_dsm_enable_mux(struct drm_i915_private *i915) { return; } > +static inline > void intel_acpi_device_id_update(struct drm_i915_private *i915) { return; } > #endif /* CONFIG_ACPI */ > > diff --git a/drivers/gpu/drm/i915/display/intel_display.c > b/drivers/gpu/drm/i915/display/intel_display.c > index a10e26380ef3..d79dae370b20 100644 > --- a/drivers/gpu/drm/i915/display/intel_display.c > +++ b/drivers/gpu/drm/i915/display/intel_display.c > @@ -11472,6 +11472,8 @@ void intel_modeset_init_hw(struct drm_i915_private > *i915) > { > struct intel_cdclk_state *cdclk_state; > > + intel_dsm_enable_mux(i915); > + This should probably be somewhere around where we do all the other semi ACPI related init (OpRegion/etc.). > if (!HAS_DISPLAY(i915)) > return; > > -- > 2.30.2 -- Ville Syrjälä Intel
Re: [v3 1/2] dt-bindings: backlight: add DisplayPort aux backlight
On 01-05-2021 03:08, Doug Anderson wrote: Hi, On Fri, Apr 30, 2021 at 8:10 AM wrote: On 30-04-2021 02:33, Doug Anderson wrote: > Hi, > > On Thu, Apr 29, 2021 at 11:04 AM Rob Herring wrote: >> >> On Mon, Apr 26, 2021 at 11:29:15AM +0530, Rajeev Nandan wrote: >> > Add bindings for DisplayPort aux backlight driver. >> > >> > Changes in v2: >> > - New >> > >> > Signed-off-by: Rajeev Nandan >> > --- >> > .../bindings/leds/backlight/dp-aux-backlight.yaml | 49 ++ >> > 1 file changed, 49 insertions(+) >> > create mode 100644 Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml >> > >> > diff --git a/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml b/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml >> > new file mode 100644 >> > index ..0fa8bf0 >> > --- /dev/null >> > +++ b/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml >> > @@ -0,0 +1,49 @@ >> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) >> > +%YAML 1.2 >> > +--- >> > +$id: http://devicetree.org/schemas/leds/backlight/dp-aux-backlight.yaml# >> > +$schema: http://devicetree.org/meta-schemas/core.yaml# >> > + >> > +title: DisplayPort aux backlight driver bindings >> > + >> > +maintainers: >> > + - Rajeev Nandan >> > + >> > +description: >> > + Backlight driver to control the brightness over DisplayPort aux channel. >> > + >> > +allOf: >> > + - $ref: common.yaml# >> > + >> > +properties: >> > + compatible: >> > +const: dp-aux-backlight >> > + >> > + ddc-i2c-bus: >> > +$ref: /schemas/types.yaml#/definitions/phandle >> > +description: >> > + A phandle to the system I2C controller connected to the DDC bus used >> > + for the DisplayPort AUX channel. >> > + >> > + enable-gpios: >> > +maxItems: 1 >> > +description: GPIO specifier for backlight enable pin. 
>> > + >> > + max-brightness: true >> > + >> > +required: >> > + - compatible >> > + - ddc-i2c-bus >> > + >> > +additionalProperties: false >> > + >> > +examples: >> > + - | >> > +backlight { >> > +compatible = "dp-aux-backlight"; >> > +ddc-i2c-bus = <_bridge>; >> > +enable-gpios = < 12 GPIO_ACTIVE_HIGH>; >> >> So the DDC bus is connected to a backlight and also a panel? This >> binding is not reflecting the h/w, but rather what you want for some >> driver. >> >> There's only one thing here and that's an eDP panel which supports >> backlight control via DP aux channel. You can figure all that out from >> the panel's compatible and/or reading the EDID. >> >> You might also be interested in this thread: >> >> https://lore.kernel.org/lkml/yiksdtjcihgnv...@orome.fritz.box/ > > I think Rajeev needs to rework everything anyway as per: > > https://lore.kernel.org/r/87zgxl5qar@intel.com > > ...but you're right that it makes sense not to model the backlight as > a separate node in the device tree. The panel driver can handle > setting up the backlight. > > -Doug It was not a good idea to create a separate backlight driver and use ddc-i2c-bus to get access to DP aux. I am working to move the code to the panel driver and to utilize the new DRM helper functions (drm_edp_backlight_*) Lyude has added [1]. To use these helper functions, the panel driver should have access to the "struct drm_dp_aux *". The simple-panel has a "ddc-i2c-bus" property to give the panel access to the DDC bus and is currently being used to get the EDID from the panel. Can I use the same ddc bus i2c_adapter to get the "struct drm_dp_aux *"? As per the suggestion [2], I get the "struct drm_dp_aux *" from the i2c_adapter of ddc bus (maybe I didn't understand the suggestion correctly), and, it turned out, the way I have implemented is not the right way [3]. So, I am afraid to use the same method in the panel driver. 
[1] https://lore.kernel.org/dri-devel/871rb5bcf9@intel.com/ [2] https://www.spinics.net/lists/dri-devel/msg295429.html [3] https://lore.kernel.org/dri-devel/2021042616.4lc3ekxjugjr3...@maple.lan/ So it's definitely up to maintainers, not me. ...but I guess I would have expected something like a new property called "ddc-aux-bus". Then you'd have to create a new API call called something like "of_find_ddc_aux_adapter_by_node()" that would allow you to find it. To implement the first suggestion, I can think of the following way to get the "struct drm_dp_aux" in the panel_simple_probe function: - Create a new panel-simple DT property "ddc-aux-bus", a phandle to the platform device that implements the AUX channel. - Create a global list of drm_dp_aux in drm_dp_helper.c. Initialize list head in drm_dp_aux_init(), add the drm_dp_aux onto the list in drm_dp_aux_register(). Similarly, remove the drm_dp_aux from list in drm_dp_aux_unregister(). - Create a new function of_drm_find_dp_aux_by_node() to get the expected drm_dp_aux from this global list. Please let me know your views on this implementation. Below is the
Re: [Intel-gfx] [RFC PATCH 5/5] drm/i915: Update execbuf IOCTL to accept N BBs
On Tue, May 11, 2021 at 05:13:54PM +0200, Daniel Vetter wrote: > On Thu, May 06, 2021 at 10:30:49AM -0700, Matthew Brost wrote: > > Add I915_EXEC_NUMBER_BB_* to drm_i915_gem_execbuffer2.flags which allows > > submitting N BBs per IOCTL. > > > > Cc: Tvrtko Ursulin > > Cc: Tony Ye > > CC: Carl Zhang > > Cc: Daniel Vetter > > Cc: Jason Ekstrand > > Signed-off-by: Matthew Brost > > I dropped my big question on the previous patch already, I'll check this > out again when it's all squashed into the parallel extension patch so we > have everything in one commit. I think we just drop this and only allow N BBs per IOCTL as discussed in patch #2 of this series. Matt > -Daniel > > > --- > > include/uapi/drm/i915_drm.h | 21 - > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > > index 0175b12b33b8..d3072cad4a7e 100644 > > --- a/include/uapi/drm/i915_drm.h > > +++ b/include/uapi/drm/i915_drm.h > > @@ -1291,7 +1291,26 @@ struct drm_i915_gem_execbuffer2 { > > */ > > #define I915_EXEC_USE_EXTENSIONS (1 << 21) > > > > -#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1)) > > +/* > > + * Number of BBs in execbuf2 IOCTL - 1, used to submit more than one BB in a > > single > > + * execbuf2 IOCTL. > > + * > > + * Return -EINVAL if more than 1 BB (value 0) is specified if > > + * I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT hasn't been called on the gem > > + * context first. Also returns -EINVAL if gem context has been setup with > > + * I915_PARALLEL_NO_PREEMPT_MID_BATCH and the number of BBs is not equal to the > > total > > + * number of hardware contexts in the gem context. 
> > + */ > > +#define I915_EXEC_NUMBER_BB_LSB(22) > > +#define I915_EXEC_NUMBER_BB_MASK (0x3f << I915_EXEC_NUMBER_BB_LSB) > > +#define I915_EXEC_NUMBER_BB_MSB(27) > > +#define i915_execbuffer2_set_number_bb(eb2, num_bb) \ > > + (eb2).flags = ((eb2).flags & ~I915_EXEC_NUMBER_BB_MASK) | \ > > + (((num_bb - 1) << I915_EXEC_NUMBER_BB_LSB) & I915_EXEC_NUMBER_BB_MASK) > > +#define i915_execbuffer2_get_number_bb(eb2) \ > > + ((((eb2).flags & I915_EXEC_NUMBER_BB_MASK) >> I915_EXEC_NUMBER_BB_LSB) > > + + 1) > > + > > +#define __I915_EXEC_UNKNOWN_FLAGS (-(1 << (I915_EXEC_NUMBER_BB_MSB + 1))) > > > > #define I915_EXEC_CONTEXT_ID_MASK (0xffffffff) > > #define i915_execbuffer2_set_context_id(eb2, context) \ > > -- > > 2.28.0 > > > > ___ > > Intel-gfx mailing list > > intel-...@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
Re: [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages
On Tue, May 11, 2021 at 05:16:38PM +0200, Daniel Vetter wrote: > On Thu, May 06, 2021 at 12:13:34PM -0700, Matthew Brost wrote: > > From: Michal Wajdeczko > > > > New GuC firmware will unify format of MMIO and CTB H2G messages. > > Introduce their definitions now to allow gradual transition of > > our code to match new changes. > > > > Signed-off-by: Michal Wajdeczko > > Signed-off-by: Matthew Brost > > Cc: Michał Winiarski > > --- > > .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++ > > 1 file changed, 226 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h > > b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h > > index 775e21f3058c..1c264819aa03 100644 > > --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h > > +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h > > @@ -6,6 +6,232 @@ > > #ifndef _ABI_GUC_MESSAGES_ABI_H > > #define _ABI_GUC_MESSAGES_ABI_H > > > > +/** > > + * DOC: HXG Message > > These aren't useful if we don't pull them in somewhere in the > Documentation/gpu hierarchy. General comment, and also please check that > it all renders correctly still. > Sure. Let me figure this out before my next rev. > btw if you respin a patch not originally by you we generally add a (v1) to > the original s-o-b line (or wherever the version split was) and explain in > the usual changelog in the commit message what was changed. > Still new to this process. Will do. Matt > This holds for the entire series ofc. > -Daniel
_HXG Message: > > + * > > + * > > +---+---+--+ > > + * | | Bits | Description > > | > > + * > > +===+===+==+ > > + * | | | > > | > > + * | 0 |31 | **ORIGIN** - originator of the message > > | > > + * | | | - _`GUC_HXG_ORIGIN_HOST` = 0 > > | > > + * | | | - _`GUC_HXG_ORIGIN_GUC` = 1 > > | > > + * | | | > > | > > + * | > > +---+--+ > > + * | | 30:28 | **TYPE** - message type > > | > > + * | | | - _`GUC_HXG_TYPE_REQUEST` = 0 > > | > > + * | | | - _`GUC_HXG_TYPE_EVENT` = 1 > > | > > + * | | | - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3 > > | > > + * | | | - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5 > > | > > + * | | | - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6 > > | > > + * | | | - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7 > > | > > + * | > > +---+--+ > > + * | | 27:0 | **AUX** - auxiliary data (depends TYPE) > > | > > + * > > +---+---+--+ > > + * | 1 | 31:0 | optional payload (depends on TYPE) > > | > > + * +---+---+ > > | > > + * |...| | > > | > > + * +---+---+ > > | > > + * | n | 31:0 | > > | > > + * > > +---+---+--+ > > + */ > > + > > +#define GUC_HXG_MSG_MIN_LEN1u > > +#define GUC_HXG_MSG_0_ORIGIN (0x1 << 31) > > +#define GUC_HXG_ORIGIN_HOST 0u > > +#define GUC_HXG_ORIGIN_GUC 1u > > +#define GUC_HXG_MSG_0_TYPE (0x7 << 28) > > +#define GUC_HXG_TYPE_REQUEST 0u > > +#define GUC_HXG_TYPE_EVENT 1u > > +#define GUC_HXG_TYPE_NO_RESPONSE_BUSY3u > > +#define GUC_HXG_TYPE_NO_RESPONSE_RETRY 5u > > +#define GUC_HXG_TYPE_RESPONSE_FAILURE6u > > +#define GUC_HXG_TYPE_RESPONSE_SUCCESS7u > > +#define GUC_HXG_MSG_0_AUX (0xfff << 0) > > + > > +/** > > + * DOC: HXG Request > > + * > > + * The `HXG Request`_ message should be used to initiate synchronous > > activity > > + * for which confirmation or return data is expected. > > + * > > + * The recipient of this message shall use
Re: [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object
On Tue, May 11, 2021 at 05:18:22PM +0200, Daniel Vetter wrote: > On Thu, May 06, 2021 at 12:13:46PM -0700, Matthew Brost wrote: > > Introduce i915_sched_engine object which is lower level data structure > > that i915_scheduler / generic code can operate on without touching > > execlist specific structures. This allows additional submission backends > > to be added without breaking the layer. > > Maybe add a comment here that this is de facto a detour since we're now > aiming to use drm/scheduler instead. But also since the current code is a > bit of a mess, we expect this detour to be overall faster since we can then > refactor in-tree. > Agree. I think in the end we will still have an i915_sched_engine which more or less encapsulates a 'struct drm_gpu_scheduler' plus a few common variables between the execlist and GuC backends. Matt > Maybe also highlight this a bit more in the rfc to make sure this is > clear. > -Daniel > > > > > Cc: Daniele Ceraolo Spurio > > Signed-off-by: Matthew Brost > > --- > > drivers/gpu/drm/i915/gem/i915_gem_wait.c | 4 +- > > drivers/gpu/drm/i915/gt/intel_engine.h| 16 - > > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 77 ++-- > > .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 4 +- > > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 10 +- > > drivers/gpu/drm/i915/gt/intel_engine_types.h | 42 +-- > > drivers/gpu/drm/i915/gt/intel_engine_user.c | 2 +- > > .../drm/i915/gt/intel_execlists_submission.c | 350 +++--- > > .../gpu/drm/i915/gt/intel_ring_submission.c | 13 +- > > drivers/gpu/drm/i915/gt/mock_engine.c | 17 +- > > drivers/gpu/drm/i915/gt/selftest_execlists.c | 36 +- > > drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 6 +- > > drivers/gpu/drm/i915/gt/selftest_lrc.c| 6 +- > > drivers/gpu/drm/i915/gt/selftest_reset.c | 2 +- > > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 75 ++-- > > drivers/gpu/drm/i915/i915_gpu_error.c | 7 +- > > drivers/gpu/drm/i915/i915_request.c | 50 +-- > > drivers/gpu/drm/i915/i915_request.h | 2 +- > > 
drivers/gpu/drm/i915/i915_scheduler.c | 168 - > > drivers/gpu/drm/i915/i915_scheduler.h | 65 +++- > > drivers/gpu/drm/i915/i915_scheduler_types.h | 63 > > 21 files changed, 575 insertions(+), 440 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c > > b/drivers/gpu/drm/i915/gem/i915_gem_wait.c > > index 4b9856d5ba14..af1fbf8e2a9a 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c > > @@ -104,8 +104,8 @@ static void fence_set_priority(struct dma_fence *fence, > > engine = rq->engine; > > > > rcu_read_lock(); /* RCU serialisation for set-wedged protection */ > > - if (engine->schedule) > > - engine->schedule(rq, attr); > > + if (engine->sched_engine->schedule) > > + engine->sched_engine->schedule(rq, attr); > > rcu_read_unlock(); > > } > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h > > b/drivers/gpu/drm/i915/gt/intel_engine.h > > index 8d9184920c51..988d9688ae4d 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_engine.h > > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h > > @@ -123,20 +123,6 @@ execlists_active(const struct intel_engine_execlists > > *execlists) > > return active; > > } > > > > -static inline void > > -execlists_active_lock_bh(struct intel_engine_execlists *execlists) > > -{ > > - local_bh_disable(); /* prevent local softirq and lock recursion */ > > - tasklet_lock(&execlists->tasklet); > > -} > > - > > -static inline void > > -execlists_active_unlock_bh(struct intel_engine_execlists *execlists) > > -{ > > - tasklet_unlock(&execlists->tasklet); > > - local_bh_enable(); /* restore softirq, and kick ksoftirqd! 
*/ > > -} > > - > > struct i915_request * > > execlists_unwind_incomplete_requests(struct intel_engine_execlists > > *execlists); > > > > @@ -257,8 +243,6 @@ intel_engine_find_active_request(struct intel_engine_cs > > *engine); > > > > u32 intel_engine_context_size(struct intel_gt *gt, u8 class); > > > > -void intel_engine_init_active(struct intel_engine_cs *engine, > > - unsigned int subclass); > > #define ENGINE_PHYSICAL0 > > #define ENGINE_MOCK1 > > #define ENGINE_VIRTUAL 2 > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > index 828e1669f92c..ec82a7ec0c8d 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > @@ -8,6 +8,7 @@ > > #include "gem/i915_gem_context.h" > > > > #include "i915_drv.h" > > +#include "i915_scheduler.h" > > > > #include "intel_breadcrumbs.h" > > #include "intel_context.h" > > @@ -326,9 +327,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum > > intel_engine_id id) > > if (engine->context_size) > >
Re: [Intel-gfx] [RFC PATCH 2/5] drm/doc/rfc: i915 new parallel submission uAPI plan
On Tue, May 11, 2021 at 04:49:58PM +0200, Daniel Vetter wrote: > On Thu, May 06, 2021 at 10:30:46AM -0700, Matthew Brost wrote: > > Add entry for i915 new parallel submission uAPI plan. > > > > Cc: Tvrtko Ursulin > > Cc: Tony Ye > > CC: Carl Zhang > > Cc: Daniel Vetter > > Cc: Jason Ekstrand > > Signed-off-by: Matthew Brost > > --- > > Documentation/gpu/rfc/i915_scheduler.rst | 56 +++- > > 1 file changed, 54 insertions(+), 2 deletions(-) > > > > diff --git a/Documentation/gpu/rfc/i915_scheduler.rst > > b/Documentation/gpu/rfc/i915_scheduler.rst > > index fa6780a11c86..e3455b33edfe 100644 > > --- a/Documentation/gpu/rfc/i915_scheduler.rst > > +++ b/Documentation/gpu/rfc/i915_scheduler.rst > > @@ -13,7 +13,8 @@ i915 with the DRM scheduler is: > > modparam enable_guc > > * Lots of rework will need to be done to integrate with DRM scheduler so > > no need to nit pick everything in the code, it just should be > > - functional and not regress execlists > > + functional, no major coding style / layering errors, and not regress > > + execlists > > I guess this hunk should be in the previous patch? > Yep, noticed this after sending. > > * Update IGTs / selftests as needed to work with GuC submission > > * Enable CI on supported platforms for a baseline > > * Rework / get CI healthy for GuC submission in place as needed > > @@ -67,4 +68,55 @@ levels too. > > > > New parallel submission uAPI > > > > -Details to come in a following patch. > > +The existing bonding uAPI is completely broken with GuC submission because > > +whether a submission is a single context submit or parallel submit isn't > > known > > +until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple > > +contexts in parallel with the GuC the context must be explicitly registered > > with > > +N contexts and all N contexts must be submitted in a single command to the > > GuC. > > +This interface doesn't support dynamically changing between N contexts as > > the > > +bonding uAPI does. 
Hence the need for a new parallel submission interface. > > Also > > +the legacy bonding uAPI is quite confusing and not intuitive at all. > > I think you should sit together with Jason on irc or so for a bit and get > an earful of how it's all broken irrespective of GuC submission or not. > Just to hammer in our case :-) > Sounds like a fun conversation, will do. > > + > > +The new parallel submission uAPI consists of 3 parts: > > + > > +* Export engines logical mapping > > +* A 'set_parallel' extension to configure contexts for parallel > > + submission > > +* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL > > + > > +Export engines logical mapping > > +-- > > +Certain use cases require BBs to be placed on engine instances in logical > > order > > +(e.g. split-frame on gen11+). The logical mapping of engine instances can > > change > > +based on fusing. Rather than making UMDs be aware of fusing, simply expose > > the > > +logical mapping with the existing query engine info IOCTL. Also the GuC > > +submission interface currently only supports submitting multiple contexts > > to > > +engines in logical order. > > Maybe highlight more that this is a new restriction with GuC compared to > execlist, which is why we need to expose this information to userspace. > Also on the platforms thus far supported in upstream there's at most 2 > engines of the same type, so really not an issue. > Sure. This is a limitation of the GuC interface + really isn't needed unless we have more than 2 engines of the same type. > > + > > +A single bit will be added to drm_i915_engine_info.flags indicating that > > the > > +logical instance has been returned and a new field, > > +drm_i915_engine_info.logical_instance, returns the logical instance. > > + > > +A 'set_parallel' extension to configure contexts for parallel submission > > + > > +The 'set_parallel' extension configures N contexts for parallel > > submission. 
It > > +is a setup step that should be called before using any of the contexts. See > > +I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for > > +similar existing examples. Once the N contexts are configured for parallel > > +submission the execbuf2 IOCTL can be called submitting 1-N BBs in a single > > IOCTL. > > +Although submitting less than N BBs is allowed it is not recommended as > > that > > +will likely leave parts of the hardware reserved and idle. Initially only > > +support GuC submission. Execlist support can be added later if needed. > > Can we just require that you always submit N batchbuffers, or does this > create a problem for userspace? Allowing things just because is generally > not a good idea with uapi, it's better to limit and then allow when > there's a need. > Yes, we can
Re: [PATCH v6 10/16] drm/amdgpu: Guard against write accesses after device removal
On 2021-05-11 2:50 a.m., Christian König wrote: On 10.05.21 at 18:36, Andrey Grodzovsky wrote: This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed. v5: Protect more places where memcpy_to/from_io takes place Protect IB submissions v6: Switch to !drm_dev_enter instead of scoping entire code with brackets. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 9 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 17 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 63 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 70 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 49 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 31 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 11 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22 -- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 44 ++-- drivers/gpu/drm/amd/amdgpu/psp_v12_0.c | 8 +-- drivers/gpu/drm/amd/amdgpu/psp_v3_1.c | 8 +-- drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 26 --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 22 +++--- .../drm/amd/pm/powerplay/smumgr/smu7_smumgr.c | 2 + 17 files changed, 257 insertions(+), 145 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index a0bff4713672..94c415176cdc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -71,6 +71,8 @@ #include #include +#include <drm/drm_drv.h> + MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin"); MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin"); @@ -281,7 +283,10 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos, unsigned long flags; uint32_t hi = ~0; uint64_t last; + int idx; + if (!drm_dev_enter(&adev->ddev, &idx)) + return; #ifdef CONFIG_64BIT last = min(pos + size, adev->gmc.visible_vram_size); 
@@ -299,8 +304,10 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos, memcpy_fromio(buf, addr, count); } - if (count == size) + if (count == size) { + drm_dev_exit(idx); return; + } Maybe use a goto instead, but really just a nit pick. pos += count; buf += count / 4; @@ -323,6 +330,8 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos, *buf++ = RREG32_NO_KIQ(mmMM_DATA); } spin_unlock_irqrestore(&adev->mmio_idx_lock, flags); + + drm_dev_exit(idx); } /* diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index 4d32233cde92..04ba5eef1e88 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c @@ -31,6 +31,8 @@ #include "amdgpu_ras.h" #include "amdgpu_xgmi.h" +#include <drm/drm_drv.h> + /** * amdgpu_gmc_pdb0_alloc - allocate vram for pdb0 * @@ -151,6 +153,10 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, void *cpu_pt_addr, { void __iomem *ptr = (void *)cpu_pt_addr; uint64_t value; + int idx; + + if (!drm_dev_enter(&adev->ddev, &idx)) + return 0; /* * The following is for PTE only. GART does not have PDEs. 
@@ -158,6 +164,9 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device *adev, void *cpu_pt_addr, value = addr & 0x0000FFFFFFFFF000ULL; value |= flags; writeq(value, ptr + (gpu_page_idx * 8)); + + drm_dev_exit(idx); + return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index 148a3b481b12..62fcbd446c71 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -30,6 +30,7 @@ #include #include +#include <drm/drm_drv.h> #include "amdgpu.h" #include "atom.h" @@ -137,7 +138,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, bool secure; unsigned i; - int r = 0; + int idx, r = 0; bool need_pipe_sync = false; if (num_ibs == 0) @@ -169,13 +170,16 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, return -EINVAL; } + if (!drm_dev_enter(&adev->ddev, &idx)) + return -ENODEV; + alloc_size = ring->funcs->emit_frame_size + num_ibs * ring->funcs->emit_ib_size; r = amdgpu_ring_alloc(ring, alloc_size); if (r) { dev_err(adev->dev, "scheduling IB failed (%d).\n", r); - return r; + goto exit; } need_ctx_switch = ring->current_ctx != fence_ctx; @@ -205,7 +209,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, r =
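The guard pattern the patch above applies everywhere can be modeled in plain userspace C. This is a minimal sketch under stated assumptions, not the kernel API: `dev_stub`, `dev_enter()` and `dev_exit()` are invented stand-ins for `struct drm_device`, `drm_dev_enter()` and `drm_dev_exit()` (which internally use an SRCU read-side section), and the single `exit:` label is the shape Christian's "maybe use a goto" nit asks for, so every return path drops the reference:

```c
#include <stdbool.h>

/* Stand-in for struct drm_device: just the liveness state we need here. */
struct dev_stub {
    bool unplugged;   /* set when the device is hot-unplugged */
    int enter_count;  /* models the read-side section drm_dev_enter() opens */
};

/* Models drm_dev_enter(): refuses access once the device is gone. */
static bool dev_enter(struct dev_stub *d, int *idx)
{
    if (d->unplugged)
        return false;
    *idx = ++d->enter_count;
    return true;
}

/* Models drm_dev_exit(): closes the section opened by dev_enter(). */
static void dev_exit(struct dev_stub *d, int idx)
{
    (void)idx;
    d->enter_count--;
}

/* The pattern from the patch: bail out early with -ENODEV if the device is
 * gone, and funnel all later returns through one exit label. */
static int guarded_mmio_write(struct dev_stub *d, int *mmio, int value)
{
    int idx, ret = 0;

    if (!dev_enter(d, &idx))
        return -19; /* -ENODEV */

    if (!mmio) {
        ret = -22; /* -EINVAL */
        goto exit;
    }
    *mmio = value; /* the access that must never hit a removed device */
exit:
    dev_exit(d, idx);
    return ret;
}
```

The point of the shape is that once `unplugged` is set, no register write can happen at all, and the early-return form avoids wrapping whole function bodies in an extra brace level (the v6 change in the patch).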
Re: [PATCH 1/2] drm: Fix dirtyfb stalls
On Tue, May 11, 2021 at 10:42:58AM -0700, Rob Clark wrote: > On Tue, May 11, 2021 at 10:21 AM Daniel Vetter wrote: > > > > On Tue, May 11, 2021 at 10:19:57AM -0700, Rob Clark wrote: > > > On Tue, May 11, 2021 at 9:44 AM Daniel Vetter wrote: > > > > > > > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote: > > > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter > > > > > wrote: > > > > > > > > > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark > > > > > > wrote: > > > > > > > > > > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter > > > > > > > wrote: > > > > > > > > > > > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote: > > > > > > > > > From: Rob Clark > > > > > > > > > > > > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank > > > > > > > > > on "video > > > > > > > > > mode" type displays, which is pointless and unnecessary. Add > > > > > > > > > an > > > > > > > > > optional helper vfunc to determine if a plane is attached to > > > > > > > > > a CRTC > > > > > > > > > that actually needs dirtyfb, and skip over them. > > > > > > > > > > > > > > > > > > Signed-off-by: Rob Clark > > > > > > > > > > > > > > > > So this is a bit annoying because the idea of all these "remap > > > > > > > > legacy uapi > > > > > > > > to atomic constructs" helpers is that they shouldn't need/use > > > > > > > > anything > > > > > > > > beyond what userspace also has available. So adding hacks for > > > > > > > > them feels > > > > > > > > really bad. > > > > > > > > > > > > > > I suppose the root problem is that userspace doesn't know if > > > > > > > dirtyfb > > > > > > > (or similar) is actually required or is a no-op. > > > > > > > > > > > > > > But it is perhaps less of a problem because this essentially boils > > > > > > > down to "x11 vs wayland", and it seems like wayland compositors > > > > > > > for > > > > > > > non-vsync'd rendering just pageflips and throws away extra frames > > > > > > > from > > > > > > > the app? 
> > > > > > > > > > > > Yeah it's about not adequately batching up rendering and syncing > > > > > > with > > > > > > hw. bare metal x11 is just especially stupid about it :-) > > > > > > > > > > > > > > Also I feel like it's not entirely the right thing to do here > > > > > > > > either. > > > > > > > > We've had this problem already on the fbcon emulation side > > > > > > > > (which also > > > > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), > > > > > > > > and the fix > > > > > > > > there was to have a worker which batches up all the updates and > > > > > > > > avoids any > > > > > > > > stalls in bad places. > > > > > > > > > > > > > > I'm not too worried about fbcon not being able to render faster > > > > > > > than > > > > > > > vblank. OTOH it is a pretty big problem for x11 > > > > > > > > > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do > > > > > > the same with fbcon, which trivially can get ahead of vblank > > > > > > otherwise > > > > > > (it sometimes flushes each character, so you have to pile them up > > > > > > into > > > > > > a single update if that's still pending). > > > > > > > > > > > > > > Since this is for frontbuffer rendering userspace only we can > > > > > > > > probably get > > > > > > > > away with assuming there's only a single fb, so the > > > > > > > > implementation becomes > > > > > > > > pretty simple: > > > > > > > > > > > > > > > > - 1 worker, and we keep track of a single pending fb > > > > > > > > - if there's already a dirty fb pending on a different fb, we > > > > > > > > stall for > > > > > > > > the worker to start processing that one already (i.e. 
the fb > > > > > > > > we track is > > > > > > > > reset to NULL) > > > > > > > > - if it's pending on the same fb we just toss away all the > > > > > > > > updates and go > > > > > > > > with a full update, since merging the clip rects is too much > > > > > > > > work :-) I > > > > > > > > think there's helpers so you could be slightly more clever > > > > > > > > and just have > > > > > > > > an overall bounding box > > > > > > > > > > > > > > This doesn't really fix the problem, you still end up delaying > > > > > > > sending > > > > > > > the next back-buffer to mesa > > > > > > > > > > > > With this the dirtyfb would never block. Also glorious frontbuffer > > > > > > tracking corruption is possible, but that's not the kernel's > > > > > > problem. > > > > > > So how would anything get held up in userspace. > > > > > > > > > > the part about stalling if a dirtyfb is pending was what I was worried > > > > > about.. but I suppose you meant the worker stalling, rather than > > > > > userspace stalling (where I had interpreted it the other way around). > > > > > As soon as userspace needs to stall, you're losing again. > > > > > > > > Nah, I did mean userspace stalling, so we can't pile up unlimited > > > > amounts > > > > of dirtyfb requests in the kernel. > > >
Re: [Intel-gfx] [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset
On Tue, May 11, 2021 at 10:12:32AM -0700, Matthew Brost wrote: > On Tue, May 11, 2021 at 06:28:25PM +0200, Daniel Vetter wrote: > > On Thu, May 06, 2021 at 12:14:28PM -0700, Matthew Brost wrote: > > > We receive notification of an engine reset from GuC at its > > > completion. Meaning GuC has potentially cleared any HW state > > > we may have been interested in capturing. GuC resumes scheduling > > > on the engine post-reset, as the resets are meant to be transparent, > > > further muddling our error state. > > > > > > There is ongoing work to define an API for a GuC debug state dump. The > > > suggestion for now is to manually disable FW initiated resets in cases > > > where debug state is needed. > > > > > > Signed-off-by: Matthew Brost > > > > This looks a bit backwards to me: > > > > Definitely a bit hacky but this patch does the best to capture the error as it > can, > > > - I figured we should capture error state when we get the G2H, in which > > case I hope we do know which the offending context was that got shot. > > > > We know which context was shot based on the G2H. See 'hung_ce' in this patch. Ah maybe I should read more. Would be good to have comments on how the locking works here, especially around reset it tends to be tricky. Comments in the data structs/members. > > > - For now we're missing the hw state, but we should still be able to > > capture the buffers userspace wants us to capture. So that could be > > wired up already? > > Which buffers exactly? We dump all buffers associated with the context. There's an opt-in list that userspace can set in execbuf. Maybe that's the one you mean. -Daniel > > > > > But yeah register state capturing needs support from GuC fw. > > > > I think this is a big enough miss in GuC features that we should list it > > on the rfc as a thing to fix. > > Agree this needs to be fixed. 
> > Matt > > > -Daniel > > > > > --- > > > drivers/gpu/drm/i915/gt/intel_context.c | 20 +++ > > > drivers/gpu/drm/i915/gt/intel_context.h | 3 ++ > > > drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++- > > > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 -- > > > drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ > > > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +-- > > > drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++--- > > > 7 files changed, 91 insertions(+), 26 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > > > b/drivers/gpu/drm/i915/gt/intel_context.c > > > index 2f01437056a8..3fe7794b2bfd 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_context.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > > > @@ -514,6 +514,26 @@ struct i915_request > > > *intel_context_create_request(struct intel_context *ce) > > > return rq; > > > } > > > > > > +struct i915_request *intel_context_find_active_request(struct > > > intel_context *ce) > > > +{ > > > + struct i915_request *rq, *active = NULL; > > > + unsigned long flags; > > > + > > > + GEM_BUG_ON(!intel_engine_uses_guc(ce->engine)); > > > + > > > + spin_lock_irqsave(&ce->guc_active.lock, flags); > > > + list_for_each_entry_reverse(rq, &ce->guc_active.requests, > > > + sched.link) { > > > + if (i915_request_completed(rq)) > > > + break; > > > + > > > + active = rq; > > > + } > > > + spin_unlock_irqrestore(&ce->guc_active.lock, flags); > > > + > > > + return active; > > > +} > > > + > > > #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) > > > #include "selftest_context.c" > > > #endif > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h > > > b/drivers/gpu/drm/i915/gt/intel_context.h > > > index 9b211ca5ecc7..d2b499ed8a05 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_context.h > > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h > > > @@ -195,6 +195,9 @@ int intel_context_prepare_remote_request(struct > > > intel_context *ce, > > > > > > struct i915_request *intel_context_create_request(struct 
intel_context > > > *ce); > > > > > > +struct i915_request * > > > +intel_context_find_active_request(struct intel_context *ce); > > > + > > > static inline struct intel_ring *__intel_context_ring_size(u64 sz) > > > { > > > return u64_to_ptr(struct intel_ring, sz); > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h > > > b/drivers/gpu/drm/i915/gt/intel_engine.h > > > index 3321d0917a99..bb94963a9fa2 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_engine.h > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h > > > @@ -242,7 +242,7 @@ ktime_t intel_engine_get_busy_time(struct > > > intel_engine_cs *engine, > > > ktime_t *now); > > > > > > struct i915_request * > > > -intel_engine_find_active_request(struct intel_engine_cs *engine); > > > +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine); > > > > > > u32 intel_engine_context_size(struct intel_gt *gt, u8 class); > > >
Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array
On Tue, May 11, 2021 at 10:01:28AM -0700, Matthew Brost wrote: > On Tue, May 11, 2021 at 05:26:34PM +0200, Daniel Vetter wrote: > > On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote: > > > Add lrc descriptor context lookup array which can resolve the > > > intel_context from the lrc descriptor index. In addition to lookup, it > > > can determine if the lrc descriptor context is currently registered with > > > the GuC by checking if an entry for a descriptor index is present. > > > Future patches in the series will make use of this array. > > > > > > Cc: John Harrison > > > Signed-off-by: Matthew Brost > > > --- > > > drivers/gpu/drm/i915/gt/uc/intel_guc.h| 5 +++ > > > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +-- > > > 2 files changed, 35 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > > index d84f37afb9d8..2eb6c497e43c 100644 > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > > @@ -6,6 +6,8 @@ > > > #ifndef _INTEL_GUC_H_ > > > #define _INTEL_GUC_H_ > > > > > > +#include "linux/xarray.h" > > > + > > > #include "intel_uncore.h" > > > #include "intel_guc_fw.h" > > > #include "intel_guc_fwif.h" > > > @@ -47,6 +49,9 @@ struct intel_guc { > > > struct i915_vma *lrc_desc_pool; > > > void *lrc_desc_pool_vaddr; > > > > > > + /* guc_id to intel_context lookup */ > > > + struct xarray context_lookup; > > > > The current code sets a disastrous example, but for stuff like this it's > > always good to explain the locking, and who's holding references and how > > you're handling cycles. Since I guess the intel_context also holds the > > guc_id alive somehow. > > > > I think (?) I know what you mean by this comment. How about adding: > > 'If an entry in the context_lookup is present, that means a context > associated with the guc_id is registered with the GuC. 
We use this xarray as a > lookup mechanism when the GuC communicates with the i915 about the context.' So no idea how this works, but generally we put a "Protected by <lock>" or similar in here (so you get a nice link plus something you can use as jump label in your ide too). Plus since intel_context has some lifetime rules, explaining whether you're allowed to use the pointer after you unlock, or whether you need to grab a reference or what exactly is going on. Usually there's three options: - No refcounting, you cannot access a pointer obtained through this after you unlock. - Weak reference, you upgrade to a full reference with kref_get_unless_zero. If that fails it indicates a lookup failure, since you raced with destruction. If it succeeds you can use the pointer after unlock. - Strong reference, you get your own reference that stays valid with kref_get(). I'm just bringing this up because the current i915-gem code is full of very tricky locking and lifetime rules, and explains roughly nothing of it in the data structures. Minimally some hints about the locking/lifetime rules of important structs should be there. For locking rules it's good to double-down on them by adding lockdep_assert_held to all relevant functions (where appropriate only ofc). What I generally don't think makes sense is to then also document the locking in the kerneldoc for the functions. That tends to be one place too many and ime just gets out of date and not useful at all. > > Again holds for the entire series, where it makes sense (as in we don't > > expect to rewrite the entire code anyway). > > Slightly out of order but one of the last patches in the series, 'Update GuC > documentation' adds a big section of comments that attempts to clarify how all > of this code works. I likely should add a section explaining the data > structures > as well. Yeah that would be nice. 
-Daniel > > Matt > > > -Daniel > > > > > + > > > /* Control params for fw initialization */ > > > u32 params[GUC_CTL_MAX_DWORDS]; > > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > index 6acc1ef34f92..c2b6d27404b7 100644 > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct > > > rb_node *rb) > > > return rb_entry(rb, struct i915_priolist, node); > > > } > > > > > > -/* Future patches will use this function */ > > > -__attribute__ ((unused)) > > > static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 > > > index) > > > { > > > struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr; > > > @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct > > > intel_guc *guc, u32 index) > > > return &base[index]; > > > } > > > > > > +static inline struct intel_context *__get_context(struct intel_guc *guc, > > >
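Daniel's three reference-handling options above can be made concrete with a small userspace sketch of the middle one, the weak reference. This is a model, not kernel code: `ref_get_unless_zero()` is a stand-in for `kref_get_unless_zero()`, and `ctx_stub`/`lookup_ctx()` are invented names for a guc_id-to-context table entry and its lookup:

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal refcount modeling the kernel's struct kref. */
struct ref { int count; };

/* Models kref_get_unless_zero(): upgrading a weak reference fails once the
 * last strong reference is gone, i.e. the lookup raced with destruction. */
static bool ref_get_unless_zero(struct ref *r)
{
    if (r->count == 0)
        return false;
    r->count++;
    return true;
}

/* Models kref_put() minus the release callback; returns true on last put. */
static bool ref_put(struct ref *r)
{
    return --r->count == 0;
}

/* A context as it might sit in a guc_id -> context table: the table holds
 * only a weak reference, so a lookup must upgrade it before use. */
struct ctx_stub {
    struct ref ref;
    int guc_id;
};

/* Option 2 from the mail: NULL means the upgrade failed and the caller must
 * treat it as a lookup miss; on success the caller owns a new reference
 * that stays valid after the table lock is dropped. */
static struct ctx_stub *lookup_ctx(struct ctx_stub *slot)
{
    if (!slot || !ref_get_unless_zero(&slot->ref))
        return NULL;
    return slot;
}
```

The design point is exactly the one in the review: without the conditional get, a pointer fished out of the xarray could be freed the moment the lock is released.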
Re: [PATCH 1/2] drm: Fix dirtyfb stalls
On Tue, May 11, 2021 at 10:21 AM Daniel Vetter wrote: > > On Tue, May 11, 2021 at 10:19:57AM -0700, Rob Clark wrote: > > On Tue, May 11, 2021 at 9:44 AM Daniel Vetter wrote: > > > > > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote: > > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter wrote: > > > > > > > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark wrote: > > > > > > > > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter > > > > > > wrote: > > > > > > > > > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote: > > > > > > > > From: Rob Clark > > > > > > > > > > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on > > > > > > > > "video > > > > > > > > mode" type displays, which is pointless and unnecessary. Add an > > > > > > > > optional helper vfunc to determine if a plane is attached to a > > > > > > > > CRTC > > > > > > > > that actually needs dirtyfb, and skip over them. > > > > > > > > > > > > > > > > Signed-off-by: Rob Clark > > > > > > > > > > > > > > So this is a bit annoying because the idea of all these "remap > > > > > > > legacy uapi > > > > > > > to atomic constructs" helpers is that they shouldn't need/use > > > > > > > anything > > > > > > > beyond what userspace also has available. So adding hacks for > > > > > > > them feels > > > > > > > really bad. > > > > > > > > > > > > I suppose the root problem is that userspace doesn't know if dirtyfb > > > > > > (or similar) is actually required or is a no-op. > > > > > > > > > > > > But it is perhaps less of a problem because this essentially boils > > > > > > down to "x11 vs wayland", and it seems like wayland compositors for > > > > > > non-vsync'd rendering just pageflips and throws away extra frames > > > > > > from > > > > > > the app? > > > > > > > > > > Yeah it's about not adequately batching up rendering and syncing with > > > > > hw. 
bare metal x11 is just especially stupid about it :-) > > > > > > > > > > > > Also I feel like it's not entirely the right thing to do here > > > > > > > either. > > > > > > > We've had this problem already on the fbcon emulation side (which > > > > > > > also > > > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), > > > > > > > and the fix > > > > > > > there was to have a worker which batches up all the updates and > > > > > > > avoids any > > > > > > > stalls in bad places. > > > > > > > > > > > > I'm not too worried about fbcon not being able to render faster than > > > > > > vblank. OTOH it is a pretty big problem for x11 > > > > > > > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do > > > > > the same with fbcon, which trivially can get ahead of vblank otherwise > > > > > (it sometimes flushes each character, so you have to pile them up into > > > > > a single update if that's still pending). > > > > > > > > > > > > Since this is for frontbuffer rendering userspace only we can > > > > > > > probably get > > > > > > > away with assuming there's only a single fb, so the > > > > > > > implementation becomes > > > > > > > pretty simple: > > > > > > > > > > > > > > - 1 worker, and we keep track of a single pending fb > > > > > > > - if there's already a dirty fb pending on a different fb, we > > > > > > > stall for > > > > > > > the worker to start processing that one already (i.e. 
the fb we > > > > > > > track is > > > > > > > reset to NULL) > > > > > > > - if it's pending on the same fb we just toss away all the > > > > > > > updates and go > > > > > > > with a full update, since merging the clip rects is too much > > > > > > > work :-) I > > > > > > > think there's helpers so you could be slightly more clever and > > > > > > > just have > > > > > > > an overall bounding box > > > > > > > > > > > > This doesn't really fix the problem, you still end up delaying > > > > > > sending > > > > > > the next back-buffer to mesa > > > > > > > > > > With this the dirtyfb would never block. Also glorious frontbuffer > > > > > tracking corruption is possible, but that's not the kernel's problem. > > > > > So how would anything get held up in userspace. > > > > the part about stalling if a dirtyfb is pending was what I was worried > > > > about.. but I suppose you meant the worker stalling, rather than > > > > userspace stalling (where I had interpreted it the other way around). > > > > As soon as userspace needs to stall, you're losing again. > > > > > > Nah, I did mean userspace stalling, so we can't pile up unlimited amounts > > > of dirtyfb requests in the kernel. > > > > > > But also I never expect userspace that uses dirtyfb to actually hit this > > > stall point (otherwise we'd need to look at this again). It would really > > > be only there as defense against abuse. > > > > I don't believe modesetting ddx throttles dirtyfb, it (indirectly) > > calls this from its BlockHandler.. so if you do end up blocking after > > the N'th dirtyfb, you are still going to end
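Daniel's worker sketch above (one worker, one tracked fb, clip rects collapsed into a full update) can be written out as a small state machine. This is a hedged model of the proposal only: `dirty_state` and `queue_dirtyfb()` are invented names, the actual worker/flush machinery is reduced to counters, and none of this is real drm helper code:

```c
#include <stddef.h>

/* One pending frontbuffer update, as in the proposal: a single tracked fb,
 * with piled-up updates on the same fb degraded to one full update. */
struct dirty_state {
    const void *pending_fb; /* NULL means nothing is queued */
    int full_update;        /* set when same-fb updates were merged */
    int flushes;            /* times the worker had to finish a prior fb */
};

/* Returns 0 if the request was queued without blocking, 1 if the caller had
 * to wait for the worker to flush the previously pending fb first (the
 * only stall point, meant as defense against abuse). */
static int queue_dirtyfb(struct dirty_state *s, const void *fb)
{
    int stalled = 0;

    if (s->pending_fb && s->pending_fb != fb) {
        /* A different fb is pending: model "stall until the worker has
         * started processing it, i.e. the tracked fb is reset to NULL". */
        s->pending_fb = NULL;
        s->full_update = 0;
        s->flushes++;
        stalled = 1;
    } else if (s->pending_fb == fb) {
        /* Same fb: toss the clip rects, go with a full update instead of
         * merging rectangles. */
        s->full_update = 1;
        return 0;
    }
    s->pending_fb = fb;
    return stalled;
}
```

The key property being argued for in the thread is visible in the model: repeated dirtyfb calls on the same fb never block, so a frontbuffer-rendering client can run ahead of vblank, and at most one flush is ever waited on.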
Re: [PATCH] drm/doc/rfc: drop the i915_gem_lmem.h header
On Tue, May 11, 2021 at 07:28:08PM +0200, Daniel Vetter wrote: > On Tue, May 11, 2021 at 06:03:56PM +0100, Matthew Auld wrote: > > The proper headers have now landed in include/uapi/drm/i915_drm.h, so we > > can drop i915_gem_lmem.h and instead just reference the real headers for > > pulling in the kernel doc. > > > > Suggested-by: Daniel Vetter > > Signed-off-by: Matthew Auld > > Reviewed-by: Daniel Vetter > > I guess we need to have a note that when we land the pciid for dg1 to move > all the remaining bits over to real docs and delete the i915 lmem rfc. But > everything in due time. One thing I forgot: The include stanza will I think result in the explicitly included functions not showing up in the normal driver uapi docs. Which I think is fine while we settle all this. Or do I get this wrong? -Daniel > -Daniel > > > --- > > Documentation/gpu/rfc/i915_gem_lmem.h | 237 > > Documentation/gpu/rfc/i915_gem_lmem.rst | 6 +- > > 2 files changed, 3 insertions(+), 240 deletions(-) > > delete mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h > > > > diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h > > b/Documentation/gpu/rfc/i915_gem_lmem.h > > deleted file mode 100644 > > index d9c61bea0556.. 
> > --- a/Documentation/gpu/rfc/i915_gem_lmem.h > > +++ /dev/null > > @@ -1,237 +0,0 @@ > > -/** > > - * enum drm_i915_gem_memory_class - Supported memory classes > > - */ > > -enum drm_i915_gem_memory_class { > > - /** @I915_MEMORY_CLASS_SYSTEM: System memory */ > > - I915_MEMORY_CLASS_SYSTEM = 0, > > - /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */ > > - I915_MEMORY_CLASS_DEVICE, > > -}; > > - > > -/** > > - * struct drm_i915_gem_memory_class_instance - Identify particular memory > > region > > - */ > > -struct drm_i915_gem_memory_class_instance { > > - /** @memory_class: See enum drm_i915_gem_memory_class */ > > - __u16 memory_class; > > - > > - /** @memory_instance: Which instance */ > > - __u16 memory_instance; > > -}; > > - > > -/** > > - * struct drm_i915_memory_region_info - Describes one region as known to > > the > > - * driver. > > - * > > - * Note that we reserve some stuff here for potential future work. As an > > example > > - * we might want expose the capabilities for a given region, which could > > include > > - * things like if the region is CPU mappable/accessible, what are the > > supported > > - * mapping types etc. > > - * > > - * Note that to extend struct drm_i915_memory_region_info and struct > > - * drm_i915_query_memory_regions in the future the plan is to do the > > following: > > - * > > - * .. code-block:: C > > - * > > - * struct drm_i915_memory_region_info { > > - * struct drm_i915_gem_memory_class_instance region; > > - * union { > > - * __u32 rsvd0; > > - * __u32 new_thing1; > > - * }; > > - * ... > > - * union { > > - * __u64 rsvd1[8]; > > - * struct { > > - * __u64 new_thing2; > > - * __u64 new_thing3; > > - * ... > > - * }; > > - * }; > > - * }; > > - * > > - * With this things should remain source compatible between versions for > > - * userspace, even as we add new fields. > > - * > > - * Note this is using both struct drm_i915_query_item and struct > > drm_i915_query. 
> > - * For this new query we are adding the new query id > > DRM_I915_QUERY_MEMORY_REGIONS > > - * at _i915_query_item.query_id. > > - */ > > -struct drm_i915_memory_region_info { > > - /** @region: The class:instance pair encoding */ > > - struct drm_i915_gem_memory_class_instance region; > > - > > - /** @rsvd0: MBZ */ > > - __u32 rsvd0; > > - > > - /** @probed_size: Memory probed by the driver (-1 = unknown) */ > > - __u64 probed_size; > > - > > - /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ > > - __u64 unallocated_size; > > - > > - /** @rsvd1: MBZ */ > > - __u64 rsvd1[8]; > > -}; > > - > > -/** > > - * struct drm_i915_query_memory_regions > > - * > > - * The region info query enumerates all regions known to the driver by > > filling > > - * in an array of struct drm_i915_memory_region_info structures. > > - * > > - * Example for getting the list of supported regions: > > - * > > - * .. code-block:: C > > - * > > - * struct drm_i915_query_memory_regions *info; > > - * struct drm_i915_query_item item = { > > - * .query_id = DRM_I915_QUERY_MEMORY_REGIONS; > > - * }; > > - * struct drm_i915_query query = { > > - * .num_items = 1, > > - * .items_ptr = (uintptr_t)&item, > > - * }; > > - * int err, i; > > - * > > - * // First query the size of the blob we need, this needs to be large > > - * // enough to hold our array of regions. The kernel will fill out the > > - * // item.length for us, which is the number of bytes we need. > > - * err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
Re: [PATCH] drm/doc/rfc: drop the i915_gem_lmem.h header
On Tue, May 11, 2021 at 06:03:56PM +0100, Matthew Auld wrote: > The proper headers have now landed in include/uapi/drm/i915_drm.h, so we > can drop i915_gem_lmem.h and instead just reference the real headers for > pulling in the kernel doc. > > Suggested-by: Daniel Vetter > Signed-off-by: Matthew Auld Reviewed-by: Daniel Vetter I guess we need to have a note that when we land the pciid for dg1 to move all the remaining bits over to real docs and delete the i915 lmem rfc. But everything in due time. -Daniel > --- > Documentation/gpu/rfc/i915_gem_lmem.h | 237 > Documentation/gpu/rfc/i915_gem_lmem.rst | 6 +- > 2 files changed, 3 insertions(+), 240 deletions(-) > delete mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h > > diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h > b/Documentation/gpu/rfc/i915_gem_lmem.h > deleted file mode 100644 > index d9c61bea0556.. > --- a/Documentation/gpu/rfc/i915_gem_lmem.h > +++ /dev/null > @@ -1,237 +0,0 @@ > -/** > - * enum drm_i915_gem_memory_class - Supported memory classes > - */ > -enum drm_i915_gem_memory_class { > - /** @I915_MEMORY_CLASS_SYSTEM: System memory */ > - I915_MEMORY_CLASS_SYSTEM = 0, > - /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */ > - I915_MEMORY_CLASS_DEVICE, > -}; > - > -/** > - * struct drm_i915_gem_memory_class_instance - Identify particular memory > region > - */ > -struct drm_i915_gem_memory_class_instance { > - /** @memory_class: See enum drm_i915_gem_memory_class */ > - __u16 memory_class; > - > - /** @memory_instance: Which instance */ > - __u16 memory_instance; > -}; > - > -/** > - * struct drm_i915_memory_region_info - Describes one region as known to the > - * driver. > - * > - * Note that we reserve some stuff here for potential future work. As an > example > - * we might want expose the capabilities for a given region, which could > include > - * things like if the region is CPU mappable/accessible, what are the > supported > - * mapping types etc. 
> - * > - * Note that to extend struct drm_i915_memory_region_info and struct > - * drm_i915_query_memory_regions in the future the plan is to do the > following: > - * > - * .. code-block:: C > - * > - * struct drm_i915_memory_region_info { > - * struct drm_i915_gem_memory_class_instance region; > - * union { > - * __u32 rsvd0; > - * __u32 new_thing1; > - * }; > - * ... > - * union { > - * __u64 rsvd1[8]; > - * struct { > - * __u64 new_thing2; > - * __u64 new_thing3; > - * ... > - * }; > - * }; > - * }; > - * > - * With this things should remain source compatible between versions for > - * userspace, even as we add new fields. > - * > - * Note this is using both struct drm_i915_query_item and struct > drm_i915_query. > - * For this new query we are adding the new query id > DRM_I915_QUERY_MEMORY_REGIONS > - * at _i915_query_item.query_id. > - */ > -struct drm_i915_memory_region_info { > - /** @region: The class:instance pair encoding */ > - struct drm_i915_gem_memory_class_instance region; > - > - /** @rsvd0: MBZ */ > - __u32 rsvd0; > - > - /** @probed_size: Memory probed by the driver (-1 = unknown) */ > - __u64 probed_size; > - > - /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ > - __u64 unallocated_size; > - > - /** @rsvd1: MBZ */ > - __u64 rsvd1[8]; > -}; > - > -/** > - * struct drm_i915_query_memory_regions > - * > - * The region info query enumerates all regions known to the driver by > filling > - * in an array of struct drm_i915_memory_region_info structures. > - * > - * Example for getting the list of supported regions: > - * > - * .. 
code-block:: C > - * > - * struct drm_i915_query_memory_regions *info; > - * struct drm_i915_query_item item = { > - * .query_id = DRM_I915_QUERY_MEMORY_REGIONS; > - * }; > - * struct drm_i915_query query = { > - * .num_items = 1, > - * .items_ptr = (uintptr_t)&item, > - * }; > - * int err, i; > - * > - * // First query the size of the blob we need, this needs to be large > - * // enough to hold our array of regions. The kernel will fill out the > - * // item.length for us, which is the number of bytes we need. > - * err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query); > - * if (err) ... > - * > - * info = calloc(1, item.length); > - * // Now that we allocated the required number of bytes, we call the ioctl > - * // again, this time with the data_ptr pointing to our newly allocated > - * // blob, which the kernel can then populate with the all the region > info. > - * item.data_ptr = (uintptr_t)info, > - * > - * err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query); > - * if (err) ... > - * > - * // We can now access each region
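The removed header's example uses the common two-pass query pattern: call the ioctl once with no buffer to learn the required size, allocate, then call again to fill the buffer. A hedged, self-contained userspace sketch of the same pattern follows; `fake_query_ioctl()` and `query_regions()` are invented stand-ins for `DRM_IOCTL_I915_QUERY` and the calling code, and the two-int "region blob" is purely illustrative:

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct drm_i915_query_item: length is filled in on the
 * first pass, data_ptr is consumed on the second. */
struct query_item {
    size_t length;
    void *data_ptr;
};

/* Models the kernel side of the query ioctl for one item: with no buffer
 * attached it reports the required size, otherwise it copies the data out. */
static int fake_query_ioctl(struct query_item *item)
{
    static const int regions[2] = { 100, 200 }; /* pretend region blob */

    if (!item->data_ptr) {
        item->length = sizeof(regions);
        return 0;
    }
    if (item->length < sizeof(regions))
        return -1;
    memcpy(item->data_ptr, regions, sizeof(regions));
    return 0;
}

/* The userspace two-pass pattern from the header's example. */
static int *query_regions(size_t *len)
{
    struct query_item item = {0};
    int *blob;

    if (fake_query_ioctl(&item))      /* pass 1: how many bytes? */
        return NULL;
    blob = calloc(1, item.length);
    if (!blob)
        return NULL;
    item.data_ptr = blob;
    if (fake_query_ioctl(&item)) {    /* pass 2: fill the blob in */
        free(blob);
        return NULL;
    }
    *len = item.length;
    return blob;
}
```

The size-then-data handshake is what lets the uapi return a variable-length array without userspace guessing buffer sizes.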
Re: [PATCH] component: Move host device to end of device lists on binding
On Tue, May 11, 2021 at 10:19:09AM -0700, Stephen Boyd wrote: > Quoting Daniel Vetter (2021-05-11 06:39:36) > > On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki > > wrote: > > > > > > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd wrote: > > > > > > [cut] > > > > > > > > > > > > > > > > > > I will try it, but then I wonder about things like system wide > > > > > > suspend/resume too. The drm encoder chain would need to reimplement > > > > > > the > > > > > > logic for system wide suspend/resume so that any PM ops attached to > > > > > > the > > > > > > msm device run in the correct order. Right now the bridge PM ops > > > > > > will > > > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run. > > > > > > After this change, the msm PM ops will run, the bridge PM ops will > > > > > > run, > > > > > > and then the i2c bus PM ops will run. It feels like that could be a > > > > > > problem if we're suspending the DSI encoder while the bridge is > > > > > > still > > > > > > active. > > > > > > > > > > Yup suspend/resume has the exact same problem as shutdown. > > > > > > > > I think suspend/resume has the exact opposite problem. At least I think > > > > the correct order is to suspend the bridge, then the encoder, i.e. DSI, > > > > like is happening today. It looks like drm_atomic_helper_shutdown() > > > > operates from the top down when we want bottom up? I admit I have no > > > > idea what is supposed to happen here. > > > > > > Why would the system-wide suspend ordering be different from the > > > shutdown ordering? > > > > At least my point was that both shutdown and suspend/resume have the > > same problem, and the right fix is (I think at least) to add these > > hooks to the component.c aggregate ops structure. Hence just adding > > new callbacks for shutdown will be an incomplete solution. 
> > To add proper hooks to component.c we'll need to make the aggregate > device into a 'struct device' and make a bus for them that essentially > adds the aggregate device to the bus once all the components are > registered. The bind/unbind can be ported to probe/remove, and then the > aggregate driver can get PM ops that run before the component devices > run their PM ops. > > Let me go try it out and see if I can make it minimally invasive so that > the migration path is simple. Thanks for volunteering. Please cc Greg KH so we make sure we're not doing this wrongly wrt the device model. -Daniel > > I don't feel like changing the global device order is the right > > approach, since essentially that's what component was meant to fix. > > Except it's incomplete since it only provides a solution for > > bind/unbind and not for shutdown or suspend/resume as other global > > state changes. I think some drivers "fixed" this by putting stuff like > > drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to > > make sure that everything is ready with that trick. But that doesn't > > compose very well :-/ > > Yeah it looks like msm is using prepare/complete for this so that it can > jump in early and suspend the display pipeline before the components > suspend themselves. The shutdown path only has one callback so we can't > play the same games. Yeah there's tons of hacks. i915 component usage with audio has similar tricks to make suspend/resume work. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH] component: Move host device to end of device lists on binding
Quoting Russell King - ARM Linux admin (2021-05-11 07:42:37) > On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote: > > Within the component device framework this usually isn't that bad > > because the real driver work is done at bind time via > > component{,master}_ops::bind(). It becomes a problem when the driver > > core, or host driver, wants to operate on the component device outside > > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The > > driver core doesn't understand the relationship between the host device > > and the component devices and could possibly try to operate on component > > devices when they're already removed from the system or shut down. > > You really are not supposed to be doing anything with component devices > once they have been unbound. You can do stuff with them only between the > bind() and the unbind() callbacks for the host device. Got it. The device is not unbound in this case so this isn't the problem. > > Access to the host devices outside of that is totally undefined and > should not be done. > > The shutdown callback should be fine as long as the other devices are > still bound, but there will be implications if the shutdown order > matters. > > However, randomly pulling devices around in the DPM list sounds to me > like a very bad idea. What happens if such re-orderings result in a > child device being shutdown after a parent device has been shut down? > Fair enough. I'll cook up a 'component' bus and see if that can fix this properly. It will add a new device for the aggregate driver that does the bind/unbind so the host/parent device will still be ordered on the DPM list at the same place. The new aggregate device will be after the components and we'll attach the PM ops and shutdown hooks to that.
Re: [PATCH 1/2] drm: Fix dirtyfb stalls
On Tue, May 11, 2021 at 10:19:57AM -0700, Rob Clark wrote: > On Tue, May 11, 2021 at 9:44 AM Daniel Vetter wrote: > > > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote: > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter wrote: > > > > > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark wrote: > > > > > > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter wrote: > > > > > > > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote: > > > > > > > From: Rob Clark > > > > > > > > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on > > > > > > > "video > > > > > > > mode" type displays, which is pointless and unnecessary. Add an > > > > > > > optional helper vfunc to determine if a plane is attached to a > > > > > > > CRTC > > > > > > > that actually needs dirtyfb, and skip over them. > > > > > > > > > > > > > > Signed-off-by: Rob Clark > > > > > > > > > > > > So this is a bit annoying because the idea of all these "remap > > > > > > legacy uapi > > > > > > to atomic constructs" helpers is that they shouldn't need/use > > > > > > anything > > > > > > beyond what userspace also has available. So adding hacks for them > > > > > > feels > > > > > > really bad. > > > > > > > > > > I suppose the root problem is that userspace doesn't know if dirtyfb > > > > > (or similar) is actually required or is a no-op. > > > > > > > > > > But it is perhaps less of a problem because this essentially boils > > > > > down to "x11 vs wayland", and it seems like wayland compositors for > > > > > non-vsync'd rendering just pageflips and throws away extra frames from > > > > > the app? > > > > > > > > Yeah it's about not adequately batching up rendering and syncing with > > > > hw. bare metal x11 is just especially stupid about it :-) > > > > > > > > > > Also I feel like it's not entirely the right thing to do here > > > > > > either. 
> > > > > > We've had this problem already on the fbcon emulation side (which > > > > > > also > > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), and > > > > > > the fix > > > > > > there was to have a worker which batches up all the updates and > > > > > > avoids any > > > > > > stalls in bad places. > > > > > > > > > > I'm not too worried about fbcon not being able to render faster than > > > > > vblank. OTOH it is a pretty big problem for x11 > > > > > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do > > > > the same with fbcon, which trivially can get ahead of vblank otherwise > > > > (if sometimes flushes each character, so you have to pile them up into > > > > a single update if that's still pending). > > > > > > > > > > Since this is for frontbuffer rendering userspace only we can > > > > > > probably get > > > > > > away with assuming there's only a single fb, so the implementation > > > > > > becomes > > > > > > pretty simple: > > > > > > > > > > > > - 1 worker, and we keep track of a single pending fb > > > > > > - if there's already a dirty fb pending on a different fb, we stall > > > > > > for > > > > > > the worker to start processing that one already (i.e. the fb we > > > > > > track is > > > > > > reset to NULL) > > > > > > - if it's pending on the same fb we just toss away all the updates > > > > > > and go > > > > > > with a full update, since merging the clip rects is too much work > > > > > > :-) I > > > > > > think there's helpers so you could be slightly more clever and > > > > > > just have > > > > > > an overall bounding box > > > > > > > > > > This doesn't really fix the problem, you still end up delaying sending > > > > > the next back-buffer to mesa > > > > > > > > With this the dirtyfb would never block. Also glorious frontbuffer > > > > tracking corruption is possible, but that's not the kernel's problem. > > > > So how would anything get held up in userspace. 
> > > > > > the part about stalling if a dirtyfb is pending was what I was worried > > > about.. but I suppose you meant the worker stalling, rather than > > > userspace stalling (where I had interpreted it the other way around). > > > As soon as userspace needs to stall, you're losing again. > > > > Nah, I did mean userspace stalling, so we can't pile up unlimited amounts > > of dirtyfb request in the kernel. > > > > But also I never expect userspace that uses dirtyfb to actually hit this > > stall point (otherwise we'd need to look at this again). It would really > > be only there as defense against abuse. > > I don't believe modesetting ddx throttles dirtyfb, it (indirectly) > calls this from it's BlockHandler.. so if you do end up blocking after > the N'th dirtyfb, you are still going to end up stalling for vblank, > you are just deferring that for a frame or two.. Nope, that's not what I mean. By default we pile up the updates, so you _never_ stall. The worker then takes the entire update every time it runs and batches them up. We _only_ stall when we get a dirtyfb with a
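Daniel's proposed scheme — one worker, a single tracked fb, and pending clip rects collapsed into an overall bounding box rather than a merged list — can be sketched in miniature. Everything below (type and function names) is illustrative rather than the kernel's actual implementation; it assumes drm_clip_rect-style semantics where x1/y1 are inclusive and x2/y2 exclusive:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal clip-rect type mirroring struct drm_clip_rect semantics. */
struct clip_rect { int x1, y1, x2, y2; };

/* State shared between the dirtyfb entry point and the worker. In the
 * kernel this would sit behind a lock; single-threaded here for clarity. */
struct pending_dirty {
	bool has_rect;		/* any update queued since the worker last ran? */
	struct clip_rect bbox;	/* bounding box of all queued clip rects */
};

/* dirtyfb side: merge one more clip rect into the pending bounding box.
 * Cheaper than keeping the full list, at the cost of possibly repainting
 * extra area — "merging the clip rects is too much work". Never blocks. */
static void dirty_merge(struct pending_dirty *p, const struct clip_rect *r)
{
	if (!p->has_rect) {
		p->bbox = *r;
		p->has_rect = true;
		return;
	}
	if (r->x1 < p->bbox.x1) p->bbox.x1 = r->x1;
	if (r->y1 < p->bbox.y1) p->bbox.y1 = r->y1;
	if (r->x2 > p->bbox.x2) p->bbox.x2 = r->x2;
	if (r->y2 > p->bbox.y2) p->bbox.y2 = r->y2;
}

/* Worker side: take the entire accumulated update in one go, leaving the
 * pending state empty so new dirtyfb calls start a fresh bounding box. */
static bool dirty_take(struct pending_dirty *p, struct clip_rect *out)
{
	if (!p->has_rect)
		return false;
	*out = p->bbox;
	p->has_rect = false;
	return true;
}
```

The stall Daniel mentions would only kick in as abuse defense, when a dirtyfb arrives for a *different* fb while one is still pending; in the common single-fb frontbuffer case the merge path above is all that runs.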
Re: [PATCH] component: Move host device to end of device lists on binding
On Tue, May 11, 2021 at 7:00 PM Stephen Boyd wrote: > > Quoting Rafael J. Wysocki (2021-05-11 03:52:06) > > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd wrote: > > > > [cut] > > > > > > > > > > > > > > I will try it, but then I wonder about things like system wide > > > > > suspend/resume too. The drm encoder chain would need to reimplement > > > > > the > > > > > logic for system wide suspend/resume so that any PM ops attached to > > > > > the > > > > > msm device run in the correct order. Right now the bridge PM ops will > > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run. > > > > > After this change, the msm PM ops will run, the bridge PM ops will > > > > > run, > > > > > and then the i2c bus PM ops will run. It feels like that could be a > > > > > problem if we're suspending the DSI encoder while the bridge is still > > > > > active. > > > > > > > > Yup suspend/resume has the exact same problem as shutdown. > > > > > > I think suspend/resume has the exact opposite problem. At least I think > > > the correct order is to suspend the bridge, then the encoder, i.e. DSI, > > > like is happening today. It looks like drm_atomic_helper_shutdown() > > > operates from the top down when we want bottom up? I admit I have no > > > idea what is supposed to happen here. > > > > Why would the system-wide suspend ordering be different from the > > shutdown ordering? > > I don't really know. I'm mostly noting that today the order of suspend > is to suspend the bridge device first and then the aggregate device. If > the suspend of the aggregate device is traversing the devices like > drm_atomic_helper_shutdown() then it would operate on the bridge device > after it has been suspended, like is happening during shutdown. But it > looks like that isn't happening. 
At least for the msm driver we're > suspending the aggregate device after the bridge, and there are some > weird usages of prepare and complete in there (see msm_pm_prepare() and > msm_pm_complete) which makes me think that it's all working around this > component code. Well, it looks like the "prepare" phase is used sort-of against the rules (because "prepare" is not supposed to make changes to the hardware configuration or at least that is not its role) in order to work around an ordering issue that is present in shutdown which doesn't have a "prepare" phase. > The prepare phase is going to suspend the display pipeline, and then the > bridge device will run its suspend hooks, and then the aggregate driver > will run its suspend hooks. If we had a proper device for the aggregate > device instead of the bind/unbind component hooks we could clean this > up. I'm not sufficiently familiar with the component code to add anything constructive here, but generally speaking it looks like the "natural" dpm_list ordering does not match the order in which the devices in question should be suspended (or shut down for that matter), so indeed it is necessary to reorder dpm_list this way or another. Please also note that it generally may not be sufficient to reorder dpm_list if the devices are suspended and resumed asynchronously during system-wide transitions, because in that case the callbacks of different devices are only started in the dpm_list order, but they may be completed in a different order (and of course they may run in parallel with each other). Shutdown is simpler, because it runs the callback synchronously for all devices IIRC.
Re: [Intel-gfx] [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset
On Tue, May 11, 2021 at 06:28:25PM +0200, Daniel Vetter wrote: > On Thu, May 06, 2021 at 12:14:28PM -0700, Matthew Brost wrote: > > We receive notification of an engine reset from GuC at its > > completion. Meaning GuC has potentially cleared any HW state > > we may have been interested in capturing. GuC resumes scheduling > > on the engine post-reset, as the resets are meant to be transparent, > > further muddling our error state. > > > > There is ongoing work to define an API for a GuC debug state dump. The > > suggestion for now is to manually disable FW initiated resets in cases > > where debug state is needed. > > > > Signed-off-by: Matthew Brost > > This looks a bit backwards to me: > Definitely a bit hacky but this patch does the best to capture the error as it can, > - I figured we should capture error state when we get the G2H, in which > case I hope we do know which the offending context was that got shot. > We know which context was shot based on the G2H. See 'hung_ce' in this patch. > - For now we're missing the hw state, but we should still be able to > capture the buffers userspace wants us to capture. So that could be > wired up already? Which buffers exactly? We dump all buffers associated with the context. > > But yeah register state capturing needs support from GuC fw. > > I think this is a big enough miss in GuC features that we should list it > on the rfc as a thing to fix. Agree this needs to be fixed. 
Matt > -Daniel > > > --- > > drivers/gpu/drm/i915/gt/intel_context.c | 20 +++ > > drivers/gpu/drm/i915/gt/intel_context.h | 3 ++ > > drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++- > > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 -- > > drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ > > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +-- > > drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++--- > > 7 files changed, 91 insertions(+), 26 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > > b/drivers/gpu/drm/i915/gt/intel_context.c > > index 2f01437056a8..3fe7794b2bfd 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_context.c > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > > @@ -514,6 +514,26 @@ struct i915_request > > *intel_context_create_request(struct intel_context *ce) > > return rq; > > } > > > > +struct i915_request *intel_context_find_active_request(struct > > intel_context *ce) > > +{ > > + struct i915_request *rq, *active = NULL; > > + unsigned long flags; > > + > > + GEM_BUG_ON(!intel_engine_uses_guc(ce->engine)); > > + > > + spin_lock_irqsave(&ce->guc_active.lock, flags); > > + list_for_each_entry_reverse(rq, &ce->guc_active.requests, > > + sched.link) { > > + if (i915_request_completed(rq)) > > + break; > > + > > + active = rq; > > + } > > + spin_unlock_irqrestore(&ce->guc_active.lock, flags); > > + > > + return active; > > + } > > + > > #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) > > #include "selftest_context.c" > > #endif > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h > > b/drivers/gpu/drm/i915/gt/intel_context.h > > index 9b211ca5ecc7..d2b499ed8a05 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_context.h > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h > > @@ -195,6 +195,9 @@ int intel_context_prepare_remote_request(struct > > intel_context *ce, > > > > struct i915_request *intel_context_create_request(struct intel_context > > *ce); > > > > +struct i915_request * > > +intel_context_find_active_request(struct intel_context
*ce); > > + > > static inline struct intel_ring *__intel_context_ring_size(u64 sz) > > { > > return u64_to_ptr(struct intel_ring, sz); > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h > > b/drivers/gpu/drm/i915/gt/intel_engine.h > > index 3321d0917a99..bb94963a9fa2 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_engine.h > > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h > > @@ -242,7 +242,7 @@ ktime_t intel_engine_get_busy_time(struct > > intel_engine_cs *engine, > >ktime_t *now); > > > > struct i915_request * > > -intel_engine_find_active_request(struct intel_engine_cs *engine); > > +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine); > > > > u32 intel_engine_context_size(struct intel_gt *gt, u8 class); > > > > @@ -316,4 +316,23 @@ intel_engine_get_sibling(struct intel_engine_cs > > *engine, unsigned int sibling) > > return engine->cops->get_sibling(engine, sibling); > > } > > > > +static inline void > > +intel_engine_set_hung_context(struct intel_engine_cs *engine, > > + struct intel_context *ce) > > +{ > > + engine->hung_ce = ce; > > +} > > + > > +static inline void > > +intel_engine_clear_hung_context(struct intel_engine_cs *engine) > > +{ > > + intel_engine_set_hung_context(engine, NULL); > > +} > > + > >
Re: [PATCH] component: Move host device to end of device lists on binding
Quoting Daniel Vetter (2021-05-11 06:39:36) > On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki wrote: > > > > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd wrote: > > > > [cut] > > > > > > > > > > > > > > I will try it, but then I wonder about things like system wide > > > > > suspend/resume too. The drm encoder chain would need to reimplement > > > > > the > > > > > logic for system wide suspend/resume so that any PM ops attached to > > > > > the > > > > > msm device run in the correct order. Right now the bridge PM ops will > > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run. > > > > > After this change, the msm PM ops will run, the bridge PM ops will > > > > > run, > > > > > and then the i2c bus PM ops will run. It feels like that could be a > > > > > problem if we're suspending the DSI encoder while the bridge is still > > > > > active. > > > > > > > > Yup suspend/resume has the exact same problem as shutdown. > > > > > > I think suspend/resume has the exact opposite problem. At least I think > > > the correct order is to suspend the bridge, then the encoder, i.e. DSI, > > > like is happening today. It looks like drm_atomic_helper_shutdown() > > > operates from the top down when we want bottom up? I admit I have no > > > idea what is supposed to happen here. > > > > Why would the system-wide suspend ordering be different from the > > shutdown ordering? > > At least my point was that both shutdown and suspend/resume have the > same problem, and the right fix is (I think at least) to add these > hooks to the component.c aggregate ops structure. Hence just adding > new callbacks for shutdown will be an incomplete solution. To add proper hooks to component.c we'll need to make the aggregate device into a 'struct device' and make a bus for them that essentially adds the aggregate device to the bus once all the components are registered. 
The bind/unbind can be ported to probe/remove, and then the aggregate driver can get PM ops that run before the component devices run their PM ops. Let me go try it out and see if I can make it minimally invasive so that the migration path is simple. > > I don't feel like changing the global device order is the right > approach, since essentially that's what component was meant to fix. > Except it's incomplete since it only provides a solution for > bind/unbind and not for shutdown or suspend/resume as other global > state changes. I think some drivers "fixed" this by putting stuff like > drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to > make sure that everything is ready with that trick. But that doesn't > compose very well :-/ Yeah it looks like msm is using prepare/complete for this so that it can jump in early and suspend the display pipeline before the components suspend themselves. The shutdown path only has one callback so we can't play the same games.
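The ordering Stephen is after falls out of how the dpm_list is walked: suspend traverses it in reverse registration order, resume forward. If the aggregate device is only registered once all components have probed, it lands at the tail, so it suspends first and resumes last — before/after the components' own PM ops. A toy userspace model of that ordering (all names invented; the real list lives in drivers/base/power/main.c):

```c
#include <assert.h>
#include <string.h>

/* Toy dpm_list: devices are appended in registration order. */
enum { MAX_DEV = 8 };

static const char *dpm_list[MAX_DEV];
static int ndev;

static const char *suspend_order[MAX_DEV];
static int nsusp;

static void toy_device_register(const char *name)
{
	dpm_list[ndev++] = name;	/* appended at the tail */
}

/* Suspend walks the list in reverse registration order, so the
 * last-registered device (the aggregate) is suspended first. */
static void toy_dpm_suspend(void)
{
	for (int i = ndev - 1; i >= 0; i--)
		suspend_order[nsusp++] = dpm_list[i];
}
```

Registering "i2c-bus", then "bridge", then the aggregate last reproduces the desired sequence without moving any existing device around in the list — which is the point of making the aggregate a real `struct device` instead of reordering its host.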
Re: [PATCH] drm: fix semicolon.cocci warnings
On Wed, May 12, 2021 at 12:11:23AM +0800, kernel test robot wrote: > From: kernel test robot > > drivers/gpu/drm/kmb/kmb_dsi.c:284:3-4: Unneeded semicolon > drivers/gpu/drm/kmb/kmb_dsi.c:304:3-4: Unneeded semicolon > drivers/gpu/drm/kmb/kmb_dsi.c:321:3-4: Unneeded semicolon > drivers/gpu/drm/kmb/kmb_dsi.c:340:3-4: Unneeded semicolon > drivers/gpu/drm/kmb/kmb_dsi.c:364:2-3: Unneeded semicolon > > > Remove unneeded semicolon. > > Generated by: scripts/coccinelle/misc/semicolon.cocci > > Fixes: ade896460e4a ("drm: DRM_KMB_DISPLAY should depend on ARCH_KEEMBAY") > CC: Geert Uytterhoeven > Reported-by: kernel test robot > Signed-off-by: kernel test robot Applied to drm-misc-next for 5.14, thanks for the patch. -Daniel > --- > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > master > head: 1140ab592e2ebf8153d2b322604031a8868ce7a5 > commit: ade896460e4a62f5e4a892a98d254937f6f5b64c drm: DRM_KMB_DISPLAY should > depend on ARCH_KEEMBAY > :: branch date: 18 hours ago > :: commit date: 6 months ago > > kmb_dsi.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > --- a/drivers/gpu/drm/kmb/kmb_dsi.c > +++ b/drivers/gpu/drm/kmb/kmb_dsi.c > @@ -281,7 +281,7 @@ static u32 mipi_get_datatype_params(u32 > default: > DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode); > return -EINVAL; > - }; > + } > break; > case DSI_LP_DT_PPS_YCBCR422_16B: > data_type_param.size_constraint_pixels = 2; > @@ -301,7 +301,7 @@ static u32 mipi_get_datatype_params(u32 > default: > DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode); > return -EINVAL; > - }; > + } > break; > case DSI_LP_DT_LPPS_YCBCR422_20B: > case DSI_LP_DT_PPS_YCBCR422_24B: > @@ -318,7 +318,7 @@ static u32 mipi_get_datatype_params(u32 > default: > DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode); > return -EINVAL; > - }; > + } > break; > case DSI_LP_DT_PPS_RGB565_16B: > data_type_param.size_constraint_pixels = 1; > @@ -337,7 +337,7 @@ static u32 mipi_get_datatype_params(u32 > default: > 
DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode); > return -EINVAL; > - }; > + } > break; > case DSI_LP_DT_PPS_RGB666_18B: > data_type_param.size_constraint_pixels = 4; > @@ -361,7 +361,7 @@ static u32 mipi_get_datatype_params(u32 > default: > DRM_ERROR("DSI: Invalid data_type %d\n", data_type); > return -EINVAL; > - }; > + } > > *params = data_type_param; > return 0; -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
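The fix above is purely syntactic: a `switch` statement's closing brace needs no semicolon, and semicolon.cocci flags the redundant `};`. A contrived, compilable reduction of the kmb_dsi pattern (names and values invented, not the driver's actual table):

```c
#include <assert.h>

/* Inner switch nested in an outer one, as in mipi_get_datatype_params().
 * The inner switch's closing brace previously carried a stray ';' --
 * harmless to the compiler, but flagged by scripts/coccinelle. */
static int size_constraint(int data_type, int data_mode)
{
	int pixels = -1;

	switch (data_type) {
	case 0:
		switch (data_mode) {
		case 16:
			pixels = 2;
			break;
		default:
			return -1;	/* invalid data_mode */
		}			/* no ';' needed after this brace */
		break;
	default:
		return -1;		/* invalid data_type */
	}
	return pixels;
}
```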
Re: [PATCH] Documentation: gpu: Mention the requirements for new properties
On Tue, May 11, 2021 at 05:55:12PM +0200, Maxime Ripard wrote: > New KMS properties come with a bunch of requirements to avoid each > driver from running their own, inconsistent, set of properties, > eventually leading to issues like property conflicts, inconsistencies > between drivers and semantics, etc. > > Let's document what we expect. > > Signed-off-by: Maxime Ripard > --- > Documentation/gpu/drm-kms.rst | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst > index 87e5023e3f55..30f4c376f419 100644 > --- a/Documentation/gpu/drm-kms.rst > +++ b/Documentation/gpu/drm-kms.rst > @@ -463,6 +463,24 @@ KMS Properties > This section of the documentation is primarily aimed at user-space > developers. > For the driver APIs, see the other sections. > > +Requirements > + > + > +KMS drivers might need to add extra properties to support new features. > +Each new property introduced in a driver needs to meet a few > +requirements, in addition to the ones mentioned above: > + > +- It must be standardized, with some documentation to describe how the > + property can be used. > + > +- It must provide a generic helper in the core code to register that > + property on the object it attaches to. Maybe also include anything that drivers might want to precompute, e.g. we have helpers for cliprects. > + > +- Its content must be decoded by the core and provided in the object's > + associated state structure. > + > +- An IGT test must be submitted. "... where reasonable." We have that disclaimer already here: https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#testing-requirements-for-userspace-api I think it would be good to cross-reference the uapi rules in general https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements With the bikesheds addressed: Reviewed-by: Daniel Vetter But this needs ideally a pile of acks from most display driver teams. 
-Daniel > + > Property Types and Blob Property Support > > > -- > 2.31.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH 1/2] drm: Fix dirtyfb stalls
On Tue, May 11, 2021 at 9:44 AM Daniel Vetter wrote: > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote: > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter wrote: > > > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark wrote: > > > > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter wrote: > > > > > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote: > > > > > > From: Rob Clark > > > > > > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on > > > > > > "video > > > > > > mode" type displays, which is pointless and unnecessary. Add an > > > > > > optional helper vfunc to determine if a plane is attached to a CRTC > > > > > > that actually needs dirtyfb, and skip over them. > > > > > > > > > > > > Signed-off-by: Rob Clark > > > > > > > > > > So this is a bit annoying because the idea of all these "remap legacy > > > > > uapi > > > > > to atomic constructs" helpers is that they shouldn't need/use anything > > > > > beyond what userspace also has available. So adding hacks for them > > > > > feels > > > > > really bad. > > > > > > > > I suppose the root problem is that userspace doesn't know if dirtyfb > > > > (or similar) is actually required or is a no-op. > > > > > > > > But it is perhaps less of a problem because this essentially boils > > > > down to "x11 vs wayland", and it seems like wayland compositors for > > > > non-vsync'd rendering just pageflips and throws away extra frames from > > > > the app? > > > > > > Yeah it's about not adequately batching up rendering and syncing with > > > hw. bare metal x11 is just especially stupid about it :-) > > > > > > > > Also I feel like it's not entirely the right thing to do here either. 
> > > > > We've had this problem already on the fbcon emulation side (which also > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), and > > > > > the fix > > > > > there was to have a worker which batches up all the updates and > > > > > avoids any > > > > > stalls in bad places. > > > > > > > > I'm not too worried about fbcon not being able to render faster than > > > > vblank. OTOH it is a pretty big problem for x11 > > > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do > > > the same with fbcon, which trivially can get ahead of vblank otherwise > > > (if sometimes flushes each character, so you have to pile them up into > > > a single update if that's still pending). > > > > > > > > Since this is for frontbuffer rendering userspace only we can > > > > > probably get > > > > > away with assuming there's only a single fb, so the implementation > > > > > becomes > > > > > pretty simple: > > > > > > > > > > - 1 worker, and we keep track of a single pending fb > > > > > - if there's already a dirty fb pending on a different fb, we stall > > > > > for > > > > > the worker to start processing that one already (i.e. the fb we > > > > > track is > > > > > reset to NULL) > > > > > - if it's pending on the same fb we just toss away all the updates > > > > > and go > > > > > with a full update, since merging the clip rects is too much work > > > > > :-) I > > > > > think there's helpers so you could be slightly more clever and just > > > > > have > > > > > an overall bounding box > > > > > > > > This doesn't really fix the problem, you still end up delaying sending > > > > the next back-buffer to mesa > > > > > > With this the dirtyfb would never block. Also glorious frontbuffer > > > tracking corruption is possible, but that's not the kernel's problem. > > > So how would anything get held up in userspace. > > > > the part about stalling if a dirtyfb is pending was what I was worried > > about.. 
but I suppose you meant the worker stalling, rather than > > userspace stalling (where I had interpreted it the other way around). > > As soon as userspace needs to stall, you're losing again. > > Nah, I did mean userspace stalling, so we can't pile up unlimited amounts > of dirtyfb request in the kernel. > > But also I never expect userspace that uses dirtyfb to actually hit this > stall point (otherwise we'd need to look at this again). It would really > be only there as defense against abuse. I don't believe modesetting ddx throttles dirtyfb, it (indirectly) calls this from it's BlockHandler.. so if you do end up blocking after the N'th dirtyfb, you are still going to end up stalling for vblank, you are just deferring that for a frame or two.. The thing is, for a push style panel, you don't necessarily have to wait for "vblank" (because "vblank" isn't necessarily a real thing), so in that scenario dirtyfb could in theory be fast. What you want to do is fundamentally different for push vs pull style displays. > > > > But we could re-work drm_framebuffer_funcs::dirty to operate on a > > > > per-crtc basis and hoist the loop and check if dirtyfb is needed out > > > > of drm_atomic_helper_dirtyfb() > > > > > > That's still using information that userspace doesn't
Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array
On Tue, May 11, 2021 at 05:26:34PM +0200, Daniel Vetter wrote: > On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote: > > Add lrc descriptor context lookup array which can resolve the > > intel_context from the lrc descriptor index. In addition to lookup, it > > can determine if the lrc descriptor context is currently registered with > > the GuC by checking if an entry for a descriptor index is present. > > Future patches in the series will make use of this array. > > > > Cc: John Harrison > > Signed-off-by: Matthew Brost > > --- > > drivers/gpu/drm/i915/gt/uc/intel_guc.h| 5 +++ > > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +-- > > 2 files changed, 35 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > index d84f37afb9d8..2eb6c497e43c 100644 > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > @@ -6,6 +6,8 @@ > > #ifndef _INTEL_GUC_H_ > > #define _INTEL_GUC_H_ > > > > +#include "linux/xarray.h" > > + > > #include "intel_uncore.h" > > #include "intel_guc_fw.h" > > #include "intel_guc_fwif.h" > > @@ -47,6 +49,9 @@ struct intel_guc { > > struct i915_vma *lrc_desc_pool; > > void *lrc_desc_pool_vaddr; > > > > + /* guc_id to intel_context lookup */ > > + struct xarray context_lookup; > > The current code sets a disastrous example, but for stuff like this it's > always good to explain the locking, and who's holding references and how > you're handling cycles. Since I guess the intel_context also holds the > guc_id alive somehow. > I think (?) I know what you mean by this comment. How about adding: 'If an entry in the context_lookup is present, that means a context associated with the guc_id is registered with the GuC. We use this xarray as a lookup mechanism when the GuC communicates with the i915 about the context.' 
> Again holds for the entire series, where it makes sense (as in we don't > expect to rewrite the entire code anyway). Slightly out of order but one of the last patches in the series, 'Update GuC documentation' adds a big section of comments that attempts to clarify how all of this code works. I likely should add a section explaining the data structures as well. Matt > -Daniel > > > + > > /* Control params for fw initialization */ > > u32 params[GUC_CTL_MAX_DWORDS]; > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > index 6acc1ef34f92..c2b6d27404b7 100644 > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct > > rb_node *rb) > > return rb_entry(rb, struct i915_priolist, node); > > } > > > > -/* Future patches will use this function */ > > -__attribute__ ((unused)) > > static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 > > index) > > { > > struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr; > > @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct > > intel_guc *guc, u32 index) > > return &base[index]; > > } > > > > +static inline struct intel_context *__get_context(struct intel_guc *guc, > > u32 id) > > +{ > > + struct intel_context *ce = xa_load(&guc->context_lookup, id); > > + > > + GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS); > > + > > + return ce; > > +} > > + > > static int guc_lrc_desc_pool_create(struct intel_guc *guc) > > { > > u32 size; > > @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc > > *guc) > > i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP); > > } > > > > +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id) > > +{ > > + struct guc_lrc_desc *desc = __get_lrc_desc(guc, id); > > + > > + memset(desc, 0, sizeof(*desc)); > > + xa_erase_irq(&guc->context_lookup, 
id); > > +} > > + > > +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id) > > +{ > > + return __get_context(guc, id); > > +} > > + > > +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, > > + struct intel_context *ce) > > +{ > > + xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC); > > +} > > + > > static void guc_add_request(struct intel_guc *guc, struct i915_request *rq) > > { > > /* Leaving stub as this function will be used in future patches */ > > @@ -404,6 +430,8 @@ int intel_guc_submission_init(struct intel_guc *guc) > > */ > > GEM_BUG_ON(!guc->lrc_desc_pool); > > > > + xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ); > > + > > return 0; > > } > > > > -- > > 2.28.0 > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
[PATCH] drm/doc/rfc: drop the i915_gem_lmem.h header
The proper headers have now landed in include/uapi/drm/i915_drm.h, so we can drop i915_gem_lmem.h and instead just reference the real headers for pulling in the kernel doc. Suggested-by: Daniel Vetter Signed-off-by: Matthew Auld --- Documentation/gpu/rfc/i915_gem_lmem.h | 237 Documentation/gpu/rfc/i915_gem_lmem.rst | 6 +- 2 files changed, 3 insertions(+), 240 deletions(-) delete mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h deleted file mode 100644 index d9c61bea0556.. --- a/Documentation/gpu/rfc/i915_gem_lmem.h +++ /dev/null @@ -1,237 +0,0 @@ -/** - * enum drm_i915_gem_memory_class - Supported memory classes - */ -enum drm_i915_gem_memory_class { - /** @I915_MEMORY_CLASS_SYSTEM: System memory */ - I915_MEMORY_CLASS_SYSTEM = 0, - /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */ - I915_MEMORY_CLASS_DEVICE, -}; - -/** - * struct drm_i915_gem_memory_class_instance - Identify particular memory region - */ -struct drm_i915_gem_memory_class_instance { - /** @memory_class: See enum drm_i915_gem_memory_class */ - __u16 memory_class; - - /** @memory_instance: Which instance */ - __u16 memory_instance; -}; - -/** - * struct drm_i915_memory_region_info - Describes one region as known to the - * driver. - * - * Note that we reserve some stuff here for potential future work. As an example - * we might want expose the capabilities for a given region, which could include - * things like if the region is CPU mappable/accessible, what are the supported - * mapping types etc. - * - * Note that to extend struct drm_i915_memory_region_info and struct - * drm_i915_query_memory_regions in the future the plan is to do the following: - * - * .. code-block:: C - * - * struct drm_i915_memory_region_info { - * struct drm_i915_gem_memory_class_instance region; - * union { - * __u32 rsvd0; - * __u32 new_thing1; - * }; - * ... 
- * union { - * __u64 rsvd1[8]; - * struct { - * __u64 new_thing2; - * __u64 new_thing3; - * ... - * }; - * }; - * }; - * - * With this things should remain source compatible between versions for - * userspace, even as we add new fields. - * - * Note this is using both struct drm_i915_query_item and struct drm_i915_query. - * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS - * at &drm_i915_query_item.query_id. - */ -struct drm_i915_memory_region_info { - /** @region: The class:instance pair encoding */ - struct drm_i915_gem_memory_class_instance region; - - /** @rsvd0: MBZ */ - __u32 rsvd0; - - /** @probed_size: Memory probed by the driver (-1 = unknown) */ - __u64 probed_size; - - /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ - __u64 unallocated_size; - - /** @rsvd1: MBZ */ - __u64 rsvd1[8]; -}; - -/** - * struct drm_i915_query_memory_regions - * - * The region info query enumerates all regions known to the driver by filling - * in an array of struct drm_i915_memory_region_info structures. - * - * Example for getting the list of supported regions: - * - * .. code-block:: C - * - * struct drm_i915_query_memory_regions *info; - * struct drm_i915_query_item item = { - * .query_id = DRM_I915_QUERY_MEMORY_REGIONS; - * }; - * struct drm_i915_query query = { - * .num_items = 1, - * .items_ptr = (uintptr_t)&item, - * }; - * int err, i; - * - * // First query the size of the blob we need, this needs to be large - * // enough to hold our array of regions. The kernel will fill out the - * // item.length for us, which is the number of bytes we need. - * err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query); - * if (err) ... - * - * info = calloc(1, item.length); - * // Now that we allocated the required number of bytes, we call the ioctl - * // again, this time with the data_ptr pointing to our newly allocated - * // blob, which the kernel can then populate with the all the region info. 
- * item.data_ptr = (uintptr_t)&info, - * - * err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query); - * if (err) ... - * - * // We can now access each region in the array - * for (i = 0; i < info->num_regions; i++) { - * struct drm_i915_memory_region_info mr = info->regions[i]; - * u16 class = mr.region.class; - * u16 instance = mr.region.instance; - * - * - * } - * - * free(info); - */ -struct drm_i915_query_memory_regions { - /** @num_regions: Number of supported regions */ - __u32 num_regions; - - /** @rsvd: MBZ
Re: [PATCH] drm/i915: Add relocation exceptions for two other platforms
On Tue, May 11, 2021 at 10:31:39AM +0200, Zbigniew Kempczyński wrote: > We have established previously we stop using relocations starting > from gen12 platforms with Tigerlake as an exception. Unfortunately > we need extend transition period and support relocations for two > other igfx platforms - Rocketlake and Alderlake. > > Signed-off-by: Zbigniew Kempczyński > Cc: Dave Airlie > Cc: Daniel Vetter > Cc: Jason Ekstrand So the annoying thing here is that now media-driver is fixed: https://github.com/intel/media-driver/commit/144020c37770083974bedf59902b70b8f444c799 Which means igt is really the only thing left. Dave, is this still ok for an acked exception, or is this now leaning towards "just fix igt"? -Daniel > --- > drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > index 297143511f99..f80da1d6d9b2 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > @@ -496,11 +496,15 @@ eb_validate_vma(struct i915_execbuffer *eb, > struct drm_i915_gem_exec_object2 *entry, > struct i915_vma *vma) > { > - /* Relocations are disallowed for all platforms after TGL-LP. This > - * also covers all platforms with local memory. > + /* > + * Relocations are disallowed starting from gen12 with some exceptions > + * - TGL/RKL/ADL. >*/ > if (entry->relocation_count && > - INTEL_GEN(eb->i915) >= 12 && !IS_TIGERLAKE(eb->i915)) > + INTEL_GEN(eb->i915) >= 12 && !(IS_TIGERLAKE(eb->i915) || > +IS_ROCKETLAKE(eb->i915) || > +IS_ALDERLAKE_S(eb->i915) || > +IS_ALDERLAKE_P(eb->i915))) > return -EINVAL; > > if (unlikely(entry->flags & eb->invalid_flags)) > -- > 2.26.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH RFC 1/3] drm: Add drm_plane_add_modifiers()
On Mon, May 10, 2021 at 09:49:38PM -0400, Tina Zhang wrote: > Add a function to add modifiers to a plane. > > Signed-off-by: Tina Zhang For one, new functions for drivers need kerneldoc. But the real issue here is that you're supposed to supply the modifiers when creating the plane, not later on. So this function doesn't make sense. Please fix virtio code to use the existing functions (drm_universal_plane_init() to be specific), or explain why that's not possible. -Daniel > --- > drivers/gpu/drm/drm_plane.c | 41 + > include/drm/drm_plane.h | 3 +++ > 2 files changed, 44 insertions(+) > > diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c > index b570a480090a..793b16d84f86 100644 > --- a/drivers/gpu/drm/drm_plane.c > +++ b/drivers/gpu/drm/drm_plane.c > @@ -288,6 +288,47 @@ int drm_universal_plane_init(struct drm_device *dev, > struct drm_plane *plane, > } > EXPORT_SYMBOL(drm_universal_plane_init); > > +int drm_plane_add_modifiers(struct drm_device *dev, > + struct drm_plane *plane, > + const uint64_t *format_modifiers) > +{ > + struct drm_mode_config *config = &dev->mode_config; > + const uint64_t *temp_modifiers = format_modifiers; > + unsigned int format_modifier_count = 0; > + > + /* > + * Only considering adding modifiers when no modifier was > + * added to that plane before. 
> + */ > + if (!temp_modifiers || plane->modifier_count) > + return -EINVAL; > + > + while (*temp_modifiers++ != DRM_FORMAT_MOD_INVALID) > + format_modifier_count++; > + > + if (format_modifier_count) > + config->allow_fb_modifiers = true; > + > + plane->modifier_count = format_modifier_count; > + plane->modifiers = kmalloc_array(format_modifier_count, > + sizeof(format_modifiers[0]), > + GFP_KERNEL); > + > + if (format_modifier_count && !plane->modifiers) { > + DRM_DEBUG_KMS("out of memory when allocating plane\n"); > + return -ENOMEM; > + } > + > + memcpy(plane->modifiers, format_modifiers, > +format_modifier_count * sizeof(format_modifiers[0])); > + if (config->allow_fb_modifiers) > + create_in_format_blob(dev, plane); > + > + return 0; > +} > +EXPORT_SYMBOL(drm_plane_add_modifiers); > + > + > int drm_plane_register_all(struct drm_device *dev) > { > unsigned int num_planes = 0; > diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h > index 50c23eb432b7..0dacdeffc3bc 100644 > --- a/include/drm/drm_plane.h > +++ b/include/drm/drm_plane.h > @@ -827,6 +827,9 @@ int drm_universal_plane_init(struct drm_device *dev, >const uint64_t *format_modifiers, >enum drm_plane_type type, >const char *name, ...); > +int drm_plane_add_modifiers(struct drm_device *dev, > +struct drm_plane *plane, > +const uint64_t *format_modifiers); > int drm_plane_init(struct drm_device *dev, > struct drm_plane *plane, > uint32_t possible_crtcs, > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH] component: Move host device to end of device lists on binding
Quoting Rafael J. Wysocki (2021-05-11 03:52:06) > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd wrote: > > [cut] > > > > > > > > > > I will try it, but then I wonder about things like system wide > > > > suspend/resume too. The drm encoder chain would need to reimplement the > > > > logic for system wide suspend/resume so that any PM ops attached to the > > > > msm device run in the correct order. Right now the bridge PM ops will > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run. > > > > After this change, the msm PM ops will run, the bridge PM ops will run, > > > > and then the i2c bus PM ops will run. It feels like that could be a > > > > problem if we're suspending the DSI encoder while the bridge is still > > > > active. > > > > > > Yup suspend/resume has the exact same problem as shutdown. > > > > I think suspend/resume has the exact opposite problem. At least I think > > the correct order is to suspend the bridge, then the encoder, i.e. DSI, > > like is happening today. It looks like drm_atomic_helper_shutdown() > > operates from the top down when we want bottom up? I admit I have no > > idea what is supposed to happen here. > > Why would the system-wide suspend ordering be different from the > shutdown ordering? I don't really know. I'm mostly noting that today the order of suspend is to suspend the bridge device first and then the aggregate device. If the suspend of the aggregate device is traversing the devices like drm_atomic_helper_shutdown() then it would operate on the bridge device after it has been suspended, like is happening during shutdown. But it looks like that isn't happening. At least for the msm driver we're suspending the aggregate device after the bridge, and there are some weird usages of prepare and complete in there (see msm_pm_prepare() and msm_pm_complete) which makes me think that it's all working around this component code. 
The prepare phase is going to suspend the display pipeline, and then the bridge device will run its suspend hooks, and then the aggregate driver will run its suspend hooks. If we had a proper device for the aggregate device instead of the bind/unbind component hooks we could clean this up.
Re: [RFC] Implicit vs explicit user fence sync
On Tue, May 11, 2021 at 05:32:29PM +0200, Christian König wrote: > Am 11.05.21 um 16:23 schrieb Daniel Vetter: > > On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote: > > > Am 11.05.21 um 09:31 schrieb Daniel Vetter: > > > > [SNIP] > > > > > > And that's just the one ioctl I know is big trouble, I'm sure we'll > > > > > > find > > > > > > more funny corner cases when we roll out explicit user fencing. > > > > > I think we can just ignore sync_file. As far as it concerns me that > > > > > UAPI is > > > > > pretty much dead. > > > > Uh that's rather bold. Android is built on it. Currently atomic kms is > > > > built on it. > > > To be honest I don't think we care about Android at all. > > we = amd or we = upstream here? > > we = amd, for everybody else that is certainly a different topic. > > But for now AMD is the only one running into this problem. > > Could be that Nouveau sees this as well with the next hw generation, but who > knows? > > > > > Why is this not much of a problem if it's just within one driver? > > > Because inside the same driver I can easily add the waits before > > > submitting > > > the MM work as necessary. > > What is MM work here now? > > MM=multimedia, e.g. UVD, VCE, VCN engines on AMD hardware. > > > > > > > > Adding implicit synchronization on top of that is then rather > > > > > > > trivial. > > > > > > Well that's what I disagree with, since I already see some problems > > > > > > that I > > > > > > don't think we can overcome (the atomic ioctl is one). And that's > > > > > > with us > > > > > > only having a fairly theoretical understanding of the overall > > > > > > situation. > > > > > But how should we then ever support user fences with the atomic IOCTL? > > > > > > > > > > We can't wait in user space since that will disable the support for > > > > > waiting > > > > > in the hardware. 
> > > > Well, figure it out :-) > > > > > > > > This is exactly why I'm not seeing anything solved with just rolling a > > > > function call to a bunch of places, because it's pretending all things > > > > are > > > > solved when clearly that's not the case. > > > > > > > > I really think what we need is to first figure out how to support > > > > userspace fences as explicit entities across the stack, maybe with > > > > something like this order: > > > > 1. enable them purely within a single userspace driver (like vk with > > > > winsys disabled, or something else like that except not amd because > > > > there's this amdkfd split for "real" compute) > > > > 1a. including atomic ioctl, e.g. for vk direct display support this can > > > > be > > > > used without cross-process sharing, new winsys protocols and all that > > > > fun > > > > 2. figure out how to transport these userspace fences with something > > > > like > > > > drm_syncobj > > > > 2a. figure out the compat story for drivers which dont do userspace > > > > fences > > > > 2b. figure out how to absorb the overhead if the winsys/compositor > > > > doesn't > > > > support explicit sync > > > > 3. maybe figure out how to make this all happen magically with implicit > > > > sync, if we really, really care > > > > > > > > If we do 3 before we've nailed all these problems, we're just > > > > guaranteeing > > > > we'll get the wrong solutions and so we'll then have 3 ways of doing > > > > userspace fences > > > > - the butchered implicit one that didn't quite work > > > > - the explicit one > > > > - the not-so-butchered implicit one with the lessons from the properly > > > > done explicit one > > > > > > > > The thing is, if you have no idea how to integrate userspace fences > > > > explicitly into atomic ioctl, then you definitely have no idea how to do > > > > it implicitly :-) > > > Well I agree on that. But the question is still how would you do explicit > > > with atomic? 
> > If you supply an userpace fence (is that what we call them now) as > > in-fence, then your only allowed to get a userspace fence as out-fence. > > Yeah, that part makes perfectly sense. But I don't see the problem with > that? > > > That way we > > - don't block anywhere we shouldn't > > - don't create a dma_fence out of a userspace fence > > > > The problem is this completely breaks your "magically make implicit > > fencing with userspace fences" plan. > > Why? If you allow implicit fencing then you can end up with - an implicit userspace fence as the in-fence - but an explicit dma_fence as the out fence Which is not allowed. So there's really no way to make this work, except if you stall in the ioctl, which also doesn't work. So you have to do an uapi change here. At that point we might as well do it right. Of course if you only care about some specific compositors (or maybe only the -amdgpu Xorg driver even) then this isn't a concern, but atomic is cross-driver so we can't do that. Or at least I don't see a way how to do this without causing endless amounts of fun down the road. > > So I have a plan here,
Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote: > > -Original Message- > > From: Martin Peres > > Sent: Tuesday, May 11, 2021 1:06 AM > > To: Daniel Vetter > > Cc: Jason Ekstrand ; Brost, Matthew > > ; intel-gfx ; > > dri-devel ; Ursulin, Tvrtko > > ; Ekstrand, Jason ; > > Ceraolo Spurio, Daniele ; Bloomfield, Jon > > ; Vetter, Daniel ; > > Harrison, John C > > Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915 > > > > On 10/05/2021 19:33, Daniel Vetter wrote: > > > On Mon, May 10, 2021 at 3:55 PM Martin Peres > > wrote: > > >> > > >> On 10/05/2021 02:11, Jason Ekstrand wrote: > > >>> On May 9, 2021 12:12:36 Martin Peres wrote: > > >>> > > Hi, > > > > On 06/05/2021 22:13, Matthew Brost wrote: > > > Basic GuC submission support. This is the first bullet point in the > > > upstreaming plan covered in the following RFC [1]. > > > > > > At a very high level the GuC is a piece of firmware which sits between > > > the i915 and the GPU. It offloads some of the scheduling of contexts > > > from the i915 and programs the GPU to submit contexts. The i915 > > > communicates with the GuC and the GuC communicates with the > > GPU. > > > > May I ask what will GuC command submission do that execlist > > won't/can't > > do? And what would be the impact on users? Even forgetting the > > troubled > > history of GuC (instability, performance regression, poor level of user > > support, 6+ years of trying to upstream it...), adding this much code > > and doubling the amount of validation needed should come with a > > rationale making it feel worth it... and I am not seeing here. Would > > you > > mind providing the rationale behind this work? > > > > > > > > GuC submission will be disabled by default on all current upstream > > > platforms behind a module parameter - enable_guc. A value of 3 will > > > enable submission and HuC loading via the GuC. GuC submission > > should > > > work on all gen11+ platforms assuming the GuC firmware is present. 
> > > > What is the plan here when it comes to keeping support for execlist? I > > am afraid that landing GuC support in Linux is the first step towards > > killing the execlist, which would force users to use proprietary > > firmwares that even most Intel engineers have little influence over. > > Indeed, if "drm/i915/guc: Disable semaphores when using GuC > > scheduling" > > which states "Disable semaphores when using GuC scheduling as > > semaphores > > are broken in the current GuC firmware." is anything to go by, it means > > that even Intel developers seem to prefer working around the GuC > > firmware, rather than fixing it. > > >>> > > >>> Yes, landing GuC support may be the first step in removing execlist > > >>> support. The inevitable reality is that GPU scheduling is coming and > > >>> likely to be there only path in the not-too-distant future. (See also > > >>> the ongoing thread with AMD about fences.) I'm not going to pass > > >>> judgement on whether or not this is a good thing. I'm just reading the > > >>> winds and, in my view, this is where things are headed for good or ill. > > >>> > > >>> In answer to the question above, the answer to "what do we gain from > > >>> GuC?" may soon be, "you get to use your GPU." We're not there yet > > and, > > >>> again, I'm not necessarily advocating for it, but that is likely where > > >>> things are headed. > > >> > > >> This will be a sad day, especially since it seems fundamentally opposed > > >> with any long-term support, on top of taking away user freedom to > > >> fix/tweak their system when Intel won't. > > >> > > >>> A firmware-based submission model isn't a bad design IMO and, aside > > from > > >>> the firmware freedom issues, I think there are actual advantages to the > > >>> model. 
Immediately, it'll unlock a few features like parallel submission > > >>> (more on that in a bit) and long-running compute because they're > > >>> implemented in GuC and the work to implement them properly in the > > >>> execlist scheduler is highly non-trivial. Longer term, it may (no > > >>> guarantees) unlock some performance by getting the kernel out of the > > way. > > >> > > >> Oh, I definitely agree with firmware-based submission model not being a > > >> bad design. I was even cheering for it in 2015. Experience with it made > > >> me regret that deeply since :s > > >> > > >> But with the DRM scheduler being responsible for most things, I fail to > > >> see what we could offload in the GuC except context switching (like > > >> every other manufacturer). The problem is, the GuC does way more than > > >> just switching registers in bulk, and if the number of revisions of the > > >> GuC is anything to go by, it is way too complex for me to feel > > >> comfortable with it. > > > > > > We need to flesh out that part of the
Re: [PATCH 1/2] drm: Fix dirtyfb stalls
On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote: > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter wrote: > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark wrote: > > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter wrote: > > > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote: > > > > > From: Rob Clark > > > > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on "video > > > > > mode" type displays, which is pointless and unnecessary. Add an > > > > > optional helper vfunc to determine if a plane is attached to a CRTC > > > > > that actually needs dirtyfb, and skip over them. > > > > > > > > > > Signed-off-by: Rob Clark > > > > > > > > So this is a bit annoying because the idea of all these "remap legacy > > > > uapi > > > > to atomic constructs" helpers is that they shouldn't need/use anything > > > > beyond what userspace also has available. So adding hacks for them feels > > > > really bad. > > > > > > I suppose the root problem is that userspace doesn't know if dirtyfb > > > (or similar) is actually required or is a no-op. > > > > > > But it is perhaps less of a problem because this essentially boils > > > down to "x11 vs wayland", and it seems like wayland compositors for > > > non-vsync'd rendering just pageflips and throws away extra frames from > > > the app? > > > > Yeah it's about not adequately batching up rendering and syncing with > > hw. bare metal x11 is just especially stupid about it :-) > > > > > > Also I feel like it's not entirely the right thing to do here either. > > > > We've had this problem already on the fbcon emulation side (which also > > > > shouldn't be able to peek behind the atomic kms uapi curtain), and the > > > > fix > > > > there was to have a worker which batches up all the updates and avoids > > > > any > > > > stalls in bad places. > > > > > > I'm not too worried about fbcon not being able to render faster than > > > vblank. 
OTOH it is a pretty big problem for x11 > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do > > the same with fbcon, which trivially can get ahead of vblank otherwise > > (if sometimes flushes each character, so you have to pile them up into > > a single update if that's still pending). > > > > > > Since this is for frontbuffer rendering userspace only we can probably > > > > get > > > > away with assuming there's only a single fb, so the implementation > > > > becomes > > > > pretty simple: > > > > > > > > - 1 worker, and we keep track of a single pending fb > > > > - if there's already a dirty fb pending on a different fb, we stall for > > > > the worker to start processing that one already (i.e. the fb we track > > > > is > > > > reset to NULL) > > > > - if it's pending on the same fb we just toss away all the updates and > > > > go > > > > with a full update, since merging the clip rects is too much work :-) > > > > I > > > > think there's helpers so you could be slightly more clever and just > > > > have > > > > an overall bounding box > > > > > > This doesn't really fix the problem, you still end up delaying sending > > > the next back-buffer to mesa > > > > With this the dirtyfb would never block. Also glorious frontbuffer > > tracking corruption is possible, but that's not the kernel's problem. > > So how would anything get held up in userspace. > > the part about stalling if a dirtyfb is pending was what I was worried > about.. but I suppose you meant the worker stalling, rather than > userspace stalling (where I had interpreted it the other way around). > As soon as userspace needs to stall, you're losing again. Nah, I did mean userspace stalling, so we can't pile up unlimited amounts of dirtyfb request in the kernel. But also I never expect userspace that uses dirtyfb to actually hit this stall point (otherwise we'd need to look at this again). It would really be only there as defense against abuse. 
> > > But we could re-work drm_framebuffer_funcs::dirty to operate on a > > > per-crtc basis and hoist the loop and check if dirtyfb is needed out > > > of drm_atomic_helper_dirtyfb() > > > > That's still using information that userspace doesn't have, which is a > > bit irky. We might as well go with your thing here then. > > arguably, this is something we should expose to userspace.. for DSI > command-mode panels, you probably want to make a different decision > with regard to how many buffers in your flip-chain.. > > Possibly we should add/remove the fb_damage_clips property depending > on the display type (ie. video/pull vs cmd/push mode)? I'm not sure whether atomic actually needs this exposed: - clients will do full flips for every frame anyway, I've not heard of anyone seriously doing frontbuffer rendering. - transporting the cliprects around and then tossing them if the driver doesn't need them in their flip is probably not a measurable win But yeah if I'm wrong and we have a need here and it's useful, then exposing
Re: [PATCH v6 04/15] swiotlb: Add restricted DMA pool initialization
On Mon, May 10, 2021 at 11:03 PM Christoph Hellwig wrote: > > > +#ifdef CONFIG_DMA_RESTRICTED_POOL > > +#include > > +#include > > +#include > > +#include > > +#include > > +#endif > > I don't think any of this belongs into swiotlb.c. Marking > swiotlb_init_io_tlb_mem non-static and having all this code in a separate > file is probably a better idea. Will do in the next version. > > > +#ifdef CONFIG_DMA_RESTRICTED_POOL > > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem, > > + struct device *dev) > > +{ > > + struct io_tlb_mem *mem = rmem->priv; > > + unsigned long nslabs = rmem->size >> IO_TLB_SHIFT; > > + > > + if (dev->dma_io_tlb_mem) > > + return 0; > > + > > + /* Since multiple devices can share the same pool, the private data, > > + * io_tlb_mem struct, will be initialized by the first device attached > > + * to it. > > + */ > > This is not the normal kernel comment style. Will fix this in the next version. > > > +#ifdef CONFIG_ARM > > + if (!PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) { > > + kfree(mem); > > + return -EINVAL; > > + } > > +#endif /* CONFIG_ARM */ > > And this is weird. Why would ARM have such a restriction? And if we have > such rstrictions it absolutely belongs into an arch helper. Now I think the CONFIG_ARM can just be removed? The goal here is to make sure we're using linear map and can safely use phys_to_dma/dma_to_phys. > > > + swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false); > > + > > + rmem->priv = mem; > > + > > +#ifdef CONFIG_DEBUG_FS > > + if (!debugfs_dir) > > + debugfs_dir = debugfs_create_dir("swiotlb", NULL); > > + > > + swiotlb_create_debugfs(mem, rmem->name, debugfs_dir); > > Doesn't the debugfs_create_dir belong into swiotlb_create_debugfs? Also > please use IS_ENABLEd or a stub to avoid ifdefs like this. Will move it into swiotlb_create_debugfs and use IS_ENABLED in the next version.
Re: [PATCH v6 05/15] swiotlb: Add a new get_io_tlb_mem getter
On Mon, May 10, 2021 at 11:03 PM Christoph Hellwig wrote: > > > +static inline struct io_tlb_mem *get_io_tlb_mem(struct device *dev) > > +{ > > +#ifdef CONFIG_DMA_RESTRICTED_POOL > > + if (dev && dev->dma_io_tlb_mem) > > + return dev->dma_io_tlb_mem; > > +#endif /* CONFIG_DMA_RESTRICTED_POOL */ > > + > > + return io_tlb_default_mem; > > Given that we're also looking into a not addressing restricted pool > I'd rather always assign the active pool to dev->dma_io_tlb_mem and > do away with this helper. Where do you think is the proper place to do the assignment? First time calling swiotlb_map? or in of_dma_configure_id?
Re: [PATCH v6 08/15] swiotlb: Bounce data from/to restricted DMA pool if available
On Mon, May 10, 2021 at 11:05 PM Christoph Hellwig wrote: > > > +static inline bool is_dev_swiotlb_force(struct device *dev) > > +{ > > +#ifdef CONFIG_DMA_RESTRICTED_POOL > > + if (dev->dma_io_tlb_mem) > > + return true; > > +#endif /* CONFIG_DMA_RESTRICTED_POOL */ > > + return false; > > +} > > + > > > /* If SWIOTLB is active, use its maximum mapping size */ > > if (is_swiotlb_active(dev) && > > - (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE)) > > + (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE || > > + is_dev_swiotlb_force(dev))) > > This is a mess. I think the right way is to have an always_bounce flag > in the io_tlb_mem structure instead. Then the global swiotlb_force can > go away and be replace with this and the fact that having no > io_tlb_mem structure at all means forced no buffering (after a little > refactoring). Will do in the next version.
Re: [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin
On Tue, May 11, 2021 at 05:37:54PM +0200, Daniel Vetter wrote: > On Thu, May 06, 2021 at 12:14:03PM -0700, Matthew Brost wrote: > > Disable engine barriers for unpinning with GuC. This feature isn't > > needed with the GuC as it disables context scheduling before unpinning > > which guarantees the HW will not reference the context. Hence it is > > not necessary to defer unpinning until a kernel context request > > completes on each engine in the context engine mask. > > > > Cc: John Harrison > > Signed-off-by: Matthew Brost > > Signed-off-by: Daniele Ceraolo Spurio > > Instead of these ifs in the code, can we push this barrier business down > into backends? > Not a bad idea. This is an example of what I think of as implicit behavior of the backend creeping into the higher levels. > Not in this series, but as one of the things to sort out as part of the > conversion to drm/scheduler. Agree. After basic GuC submission gets merged maybe we go through the code and remove all the implicit backend assumptions. 
Matt > -Daniel > > > --- > > drivers/gpu/drm/i915/gt/intel_context.c| 2 +- > > drivers/gpu/drm/i915/gt/intel_context.h| 1 + > > drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++ > > drivers/gpu/drm/i915/i915_active.c | 3 +++ > > 4 files changed, 15 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > > b/drivers/gpu/drm/i915/gt/intel_context.c > > index 1499b8aace2a..7f97753ab164 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_context.c > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > > @@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct > > intel_context *ce) > > > > __i915_active_acquire(&ce->active); > > > > - if (intel_context_is_barrier(ce)) > > + if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine)) > > return 0; > > > > /* Preallocate tracking nodes */ > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h > > b/drivers/gpu/drm/i915/gt/intel_context.h > > index 92ecbab8c1cd..9b211ca5ecc7 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_context.h > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h > > @@ -16,6 +16,7 @@ > > #include "intel_engine_types.h" > > #include "intel_ring_types.h" > > #include "intel_timeline_types.h" > > +#include "uc/intel_guc_submission.h" > > > > #define CE_TRACE(ce, fmt, ...) do { > > \ > > const struct intel_context *ce__ = (ce);\ > > diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c > > b/drivers/gpu/drm/i915/gt/selftest_context.c > > index 26685b927169..fa7b99a671dd 100644 > > --- a/drivers/gpu/drm/i915/gt/selftest_context.c > > +++ b/drivers/gpu/drm/i915/gt/selftest_context.c > > @@ -209,7 +209,13 @@ static int __live_active_context(struct > > intel_engine_cs *engine) > > * This test makes sure that the context is kept alive until a > > * subsequent idle-barrier (emitted when the engine wakeref hits 0 > > * with no more outstanding requests). 
> > +* > > +* In GuC submission mode we don't use idle barriers and we instead > > +* get a message from the GuC to signal that it is safe to unpin the > > +* context from memory. > > */ > > + if (intel_engine_uses_guc(engine)) > > + return 0; > > > > if (intel_engine_pm_is_awake(engine)) { > > pr_err("%s is awake before starting %s!\n", > > @@ -357,7 +363,11 @@ static int __live_remote_context(struct > > intel_engine_cs *engine) > > * on the context image remotely (intel_context_prepare_remote_request), > > * which inserts foreign fences into intel_context.active, does not > > * clobber the idle-barrier. > > +* > > +* In GuC submission mode we don't use idle barriers. > > */ > > + if (intel_engine_uses_guc(engine)) > > + return 0; > > > > if (intel_engine_pm_is_awake(engine)) { > > pr_err("%s is awake before starting %s!\n", > > diff --git a/drivers/gpu/drm/i915/i915_active.c > > b/drivers/gpu/drm/i915/i915_active.c > > index b1aa1c482c32..9a264898bb91 100644 > > --- a/drivers/gpu/drm/i915/i915_active.c > > +++ b/drivers/gpu/drm/i915/i915_active.c > > @@ -968,6 +968,9 @@ void i915_active_acquire_barrier(struct i915_active > > *ref) > > > > GEM_BUG_ON(i915_active_is_idle(ref)); > > > > + if (llist_empty(&ref->preallocated_barriers)) > > + return; > > + > > /* > > * Transfer the list of preallocated barriers into the > > * i915_active rbtree, but only as proto-nodes. They will be > > -- > > 2.28.0 > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
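Daniel's suggestion of pushing the barrier decision down into the submission backend, rather than testing `intel_engine_uses_guc()` at each call site, amounts to a vtable hook. A minimal userspace model of that shape — all names here are hypothetical, not the real i915 interface:

```c
#include <assert.h>
#include <stddef.h>

/* Model of moving the "do we need unpin barriers?" decision into the
 * submission backend, so higher levels stop hard-coding backend checks.
 * Hypothetical names only. */

struct backend_ops {
	/* nonzero if the backend defers unpinning behind per-engine
	 * idle barriers */
	int (*needs_unpin_barriers)(void);
};

static int execlists_needs_barriers(void) { return 1; }

/* GuC disables context scheduling before unpin, so the HW cannot be
 * referencing the context: no barriers are needed. */
static int guc_needs_barriers(void) { return 0; }

static const struct backend_ops execlists_ops = {
	.needs_unpin_barriers = execlists_needs_barriers,
};
static const struct backend_ops guc_ops = {
	.needs_unpin_barriers = guc_needs_barriers,
};

/* The caller no longer knows which backend it is talking to; it just
 * asks whether to preallocate barrier tracking nodes. */
int context_active_acquire_preallocates(const struct backend_ops *ops)
{
	return ops->needs_unpin_barriers();
}
```

The design point is that a new backend then only has to fill in its own ops table instead of growing another `||` clause in `intel_context_active_acquire()`.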
Re: [PATCH] Documentation: gpu: Mention the requirements for new properties
On Tue, May 11, 2021 at 11:55 AM Maxime Ripard wrote: > > New KMS properties come with a bunch of requirements to avoid each > driver from running their own, inconsistent, set of properties, > eventually leading to issues like property conflicts, inconsistencies > between drivers and semantics, etc. > > Let's document what we expect. > > Signed-off-by: Maxime Ripard > --- > Documentation/gpu/drm-kms.rst | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst > index 87e5023e3f55..30f4c376f419 100644 > --- a/Documentation/gpu/drm-kms.rst > +++ b/Documentation/gpu/drm-kms.rst > @@ -463,6 +463,24 @@ KMS Properties > This section of the documentation is primarily aimed at user-space > developers. > For the driver APIs, see the other sections. > > +Requirements > + > + > +KMS drivers might need to add extra properties to support new features. > +Each new property introduced in a driver need to meet a few > +requirements, in addition to the one mentioned above.: > + > +- It must be standardized, with some documentation to describe the "to describe how the" With that fixed, it looks good to me. Alex > + property can be used. > + > +- It must provide a generic helper in the core code to register that > + property on the object it attaches to. > + > +- Its content must be decoded by the core and provided in the object > + associated state structure. > + > +- An IGT test must be submitted. > + > Property Types and Blob Property Support > > > -- > 2.31.1 >
Re: [Intel-gfx] [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset
On Thu, May 06, 2021 at 12:14:28PM -0700, Matthew Brost wrote: > We receive notification of an engine reset from GuC at its > completion. Meaning GuC has potentially cleared any HW state > we may have been interested in capturing. GuC resumes scheduling > on the engine post-reset, as the resets are meant to be transparent, > further muddling our error state. > > There is ongoing work to define an API for a GuC debug state dump. The > suggestion for now is to manually disable FW initiated resets in cases > where debug state is needed. > > Signed-off-by: Matthew Brost This looks a bit backwards to me: - I figured we should capture error state when we get the G2H, in which case I hope we do know which the offending context was that got shot. - For now we're missing the hw state, but we should still be able to capture the buffers userspace wants us to capture. So that could be wired up already? But yeah register state capturing needs support from GuC fw. I think this is a big enough miss in GuC features that we should list it on the rfc as a thing to fix. 
-Daniel > --- > drivers/gpu/drm/i915/gt/intel_context.c | 20 +++ > drivers/gpu/drm/i915/gt/intel_context.h | 3 ++ > drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++- > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 -- > drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +-- > drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++--- > 7 files changed, 91 insertions(+), 26 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > b/drivers/gpu/drm/i915/gt/intel_context.c > index 2f01437056a8..3fe7794b2bfd 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context.c > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > @@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct > intel_context *ce) > return rq; > } > > +struct i915_request *intel_context_find_active_request(struct intel_context > *ce) > +{ > + struct i915_request *rq, *active = NULL; > + unsigned long flags; > + > + GEM_BUG_ON(!intel_engine_uses_guc(ce->engine)); > + > + spin_lock_irqsave(&ce->guc_active.lock, flags); > + list_for_each_entry_reverse(rq, &ce->guc_active.requests, > + sched.link) { > + if (i915_request_completed(rq)) > + break; > + > + active = rq; > + } > + spin_unlock_irqrestore(&ce->guc_active.lock, flags); > + > + return active; > +} > + > #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) > #include "selftest_context.c" > #endif > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h > b/drivers/gpu/drm/i915/gt/intel_context.h > index 9b211ca5ecc7..d2b499ed8a05 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context.h > +++ b/drivers/gpu/drm/i915/gt/intel_context.h > @@ -195,6 +195,9 @@ int intel_context_prepare_remote_request(struct > intel_context *ce, > > struct i915_request *intel_context_create_request(struct intel_context *ce); > > +struct i915_request * > +intel_context_find_active_request(struct intel_context *ce); > + > static inline struct intel_ring *__intel_context_ring_size(u64 sz) > { > return u64_to_ptr(struct intel_ring, sz); > 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h > b/drivers/gpu/drm/i915/gt/intel_engine.h > index 3321d0917a99..bb94963a9fa2 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine.h > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h > @@ -242,7 +242,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs > *engine, > ktime_t *now); > > struct i915_request * > -intel_engine_find_active_request(struct intel_engine_cs *engine); > +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine); > > u32 intel_engine_context_size(struct intel_gt *gt, u8 class); > > @@ -316,4 +316,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, > unsigned int sibling) > return engine->cops->get_sibling(engine, sibling); > } > > +static inline void > +intel_engine_set_hung_context(struct intel_engine_cs *engine, > + struct intel_context *ce) > +{ > + engine->hung_ce = ce; > +} > + > +static inline void > +intel_engine_clear_hung_context(struct intel_engine_cs *engine) > +{ > + intel_engine_set_hung_context(engine, NULL); > +} > + > +static inline struct intel_context * > +intel_engine_get_hung_context(struct intel_engine_cs *engine) > +{ > + return engine->hung_ce; > +} > + > #endif /* _INTEL_RINGBUFFER_H_ */ > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > index 10300db1c9a6..ad3987289f09 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > @@ -1727,7 +1727,7 @@ void intel_engine_dump(struct intel_engine_cs *engine, > drm_printf(m, "\tRequests:\n");
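The reverse walk in the new intel_context_find_active_request() — newest to oldest, stopping at the first completed request — picks out the oldest request that has not yet completed, i.e. the likely hung one. A toy model of that loop in plain C, without the kernel's list machinery:

```c
#include <assert.h>
#include <stddef.h>

/* Model of intel_context_find_active_request(): reqs[] is ordered
 * oldest -> newest. Walking in reverse and breaking at the first
 * completed request leaves "active" pointing at the oldest request
 * that has not completed (everything older than a completed request
 * is completed too). */

struct req { int seqno; int completed; };

const struct req *find_active_request(const struct req *reqs, size_t n)
{
	const struct req *active = NULL;

	for (size_t i = n; i-- > 0;) {	/* newest to oldest */
		if (reqs[i].completed)
			break;
		active = &reqs[i];
	}
	return active;
}
```

With requests 1 and 2 completed and 3 and 4 still in flight, the walk returns request 3 — the one most plausibly responsible for the hang.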
Re: [Intel-gfx] [RFC PATCH 68/97] drm/i915/guc: Handle context reset notification
On Thu, May 06, 2021 at 12:14:22PM -0700, Matthew Brost wrote: > GuC will issue a reset on detecting an engine hang and will notify > the driver via a G2H message. The driver will service the notification > by resetting the guilty context to a simple state or banning it > completely. > > Cc: Matthew Brost > Cc: John Harrison > Signed-off-by: Matthew Brost Entirely aside, but I wonder whether we shouldn't just make non-recoverable contexts the only thing we support. But probably a too big can of worms. -Daniel > --- > drivers/gpu/drm/i915/gt/uc/intel_guc.h| 2 ++ > drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++ > drivers/gpu/drm/i915/i915_trace.h | 10 ++ > 4 files changed, 53 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > index 277b4496a20e..a2abe1c422e3 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > @@ -263,6 +263,8 @@ int intel_guc_deregister_done_process_msg(struct > intel_guc *guc, > const u32 *msg, u32 len); > int intel_guc_sched_done_process_msg(struct intel_guc *guc, >const u32 *msg, u32 len); > +int intel_guc_context_reset_process_msg(struct intel_guc *guc, > + const u32 *msg, u32 len); > > void intel_guc_submission_reset_prepare(struct intel_guc *guc); > void intel_guc_submission_reset(struct intel_guc *guc, bool stalled); > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > index b3194d753b13..9c84b2ba63a8 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > @@ -941,6 +941,12 @@ static int ct_process_request(struct intel_guc_ct *ct, > struct ct_incoming_msg *r > CT_ERROR(ct, "schedule context failed %x %*ph\n", > action, 4 * len, payload); > break; > + case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION: > + ret = intel_guc_context_reset_process_msg(guc, payload, len); > + if 
(unlikely(ret)) > + CT_ERROR(ct, "context reset notification failed %x > %*ph\n", > + action, 4 * len, payload); > + break; > default: > ret = -EOPNOTSUPP; > break; > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > index 2c3791fc24b7..940017495731 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > @@ -2192,6 +2192,41 @@ int intel_guc_sched_done_process_msg(struct intel_guc > *guc, > return 0; > } > > +static void guc_context_replay(struct intel_context *ce) > +{ > + struct i915_sched_engine *sched_engine = ce->engine->sched_engine; > + > + __guc_reset_context(ce, true); > + i915_sched_engine_hi_kick(sched_engine); > +} > + > +static void guc_handle_context_reset(struct intel_guc *guc, > + struct intel_context *ce) > +{ > + trace_intel_context_reset(ce); > + guc_context_replay(ce); > +} > + > +int intel_guc_context_reset_process_msg(struct intel_guc *guc, > + const u32 *msg, u32 len) > +{ > + struct intel_context *ce; > + int desc_idx = msg[0]; > + > + if (unlikely(len != 1)) { > + drm_dbg(_to_gt(guc)->i915->drm, "Invalid length %u", len); > + return -EPROTO; > + } > + > + ce = g2h_context_lookup(guc, desc_idx); > + if (unlikely(!ce)) > + return -EPROTO; > + > + guc_handle_context_reset(guc, ce); > + > + return 0; > +} > + > void intel_guc_log_submission_info(struct intel_guc *guc, > struct drm_printer *p) > { > diff --git a/drivers/gpu/drm/i915/i915_trace.h > b/drivers/gpu/drm/i915/i915_trace.h > index 97c2e83984ed..c095c4d39456 100644 > --- a/drivers/gpu/drm/i915/i915_trace.h > +++ b/drivers/gpu/drm/i915/i915_trace.h > @@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context, > __entry->guc_sched_state_no_lock) > ); > > +DEFINE_EVENT(intel_context, intel_context_reset, > + TP_PROTO(struct intel_context *ce), > + TP_ARGS(ce) > +); > + > DEFINE_EVENT(intel_context, intel_context_register, >TP_PROTO(struct intel_context *ce), 
>TP_ARGS(ce) > @@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq) > { > } > > +static inline void > +trace_intel_context_reset(struct intel_context *ce) > +{ > +} > + > static inline void > trace_intel_context_register(struct intel_context *ce) > { > -- > 2.28.0 > > ___ >
Re: [PATCH v6 2/3] drm/mediatek: init panel orientation property
Hi, Hsin-Yi: Hsin-Yi Wang wrote on Thu, Apr 29, 2021 at 12:28 PM: > > Init panel orientation property after connector is initialized. Let the > panel driver decide the orientation value later. Acked-by: Chun-Kuang Hu > > Signed-off-by: Hsin-Yi Wang > --- > drivers/gpu/drm/mediatek/mtk_dsi.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c > b/drivers/gpu/drm/mediatek/mtk_dsi.c > index ae403c67cbd9..9da1fd649131 100644 > --- a/drivers/gpu/drm/mediatek/mtk_dsi.c > +++ b/drivers/gpu/drm/mediatek/mtk_dsi.c > @@ -964,6 +964,13 @@ static int mtk_dsi_encoder_init(struct drm_device *drm, > struct mtk_dsi *dsi) > ret = PTR_ERR(dsi->connector); > goto err_cleanup_encoder; > } > + > + ret = drm_connector_init_panel_orientation_property(dsi->connector); > + if (ret) { > + DRM_ERROR("Unable to init panel orientation\n"); > + goto err_cleanup_encoder; > + } > + > drm_connector_attach_encoder(dsi->connector, &dsi->encoder); > > return 0; > -- > 2.31.1 >
[PATCH] drm: fix semicolon.cocci warnings
From: kernel test robot drivers/gpu/drm/kmb/kmb_dsi.c:284:3-4: Unneeded semicolon drivers/gpu/drm/kmb/kmb_dsi.c:304:3-4: Unneeded semicolon drivers/gpu/drm/kmb/kmb_dsi.c:321:3-4: Unneeded semicolon drivers/gpu/drm/kmb/kmb_dsi.c:340:3-4: Unneeded semicolon drivers/gpu/drm/kmb/kmb_dsi.c:364:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Fixes: ade896460e4a ("drm: DRM_KMB_DISPLAY should depend on ARCH_KEEMBAY") CC: Geert Uytterhoeven Reported-by: kernel test robot Signed-off-by: kernel test robot --- tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 1140ab592e2ebf8153d2b322604031a8868ce7a5 commit: ade896460e4a62f5e4a892a98d254937f6f5b64c drm: DRM_KMB_DISPLAY should depend on ARCH_KEEMBAY :: branch date: 18 hours ago :: commit date: 6 months ago kmb_dsi.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) --- a/drivers/gpu/drm/kmb/kmb_dsi.c +++ b/drivers/gpu/drm/kmb/kmb_dsi.c @@ -281,7 +281,7 @@ static u32 mipi_get_datatype_params(u32 default: DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode); return -EINVAL; - }; + } break; case DSI_LP_DT_PPS_YCBCR422_16B: data_type_param.size_constraint_pixels = 2; @@ -301,7 +301,7 @@ static u32 mipi_get_datatype_params(u32 default: DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode); return -EINVAL; - }; + } break; case DSI_LP_DT_LPPS_YCBCR422_20B: case DSI_LP_DT_PPS_YCBCR422_24B: @@ -318,7 +318,7 @@ static u32 mipi_get_datatype_params(u32 default: DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode); return -EINVAL; - }; + } break; case DSI_LP_DT_PPS_RGB565_16B: data_type_param.size_constraint_pixels = 1; @@ -337,7 +337,7 @@ static u32 mipi_get_datatype_params(u32 default: DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode); return -EINVAL; - }; + } break; case DSI_LP_DT_PPS_RGB666_18B: data_type_param.size_constraint_pixels = 4; @@ -361,7 +361,7 @@ static u32 mipi_get_datatype_params(u32 default: DRM_ERROR("DSI: Invalid data_type 
%d\n", data_type); return -EINVAL; - }; + } *params = data_type_param; return 0;
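The warning itself is cosmetic: a semicolon after a switch's closing brace is parsed as an empty statement, so the code is legal C either way and behavior is identical. A small compilable illustration of the before/after shape (hypothetical function, not the kmb code):

```c
#include <assert.h>

/* "Before": the stray semicolon after the switch brace is an empty
 * statement -- harmless, but flagged by semicolon.cocci. */
int classify_before(int mode)
{
	int size;

	switch (mode) {
	case 0:
		size = 2;
		break;
	default:
		return -1;
	};	/* <- the unneeded semicolon */
	return size;
}

/* "After": identical behavior, no empty statement. */
int classify_after(int mode)
{
	int size;

	switch (mode) {
	case 0:
		size = 2;
		break;
	default:
		return -1;
	}
	return size;
}
```

The only subtlety is inside constructs like `if`/`else` chains, where an extra empty statement can detach an `else`; inside a function body as here it changes nothing, which is why the coccinelle fix is safe to apply mechanically.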
Re: [PATCH v6 06/16] drm/amdgpu: Handle IOMMU enabled case.
On 2021-05-11 11:56 a.m., Alex Deucher wrote: On Mon, May 10, 2021 at 12:37 PM Andrey Grodzovsky wrote: Handle all DMA IOMMU gropup related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate v6: Drop the BO unamp list Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 3 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 9 + drivers/gpu/drm/amd/amdgpu/cik_ih.c| 1 - drivers/gpu/drm/amd/amdgpu/cz_ih.c | 1 - drivers/gpu/drm/amd/amdgpu/iceland_ih.c| 1 - drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 3 --- drivers/gpu/drm/amd/amdgpu/si_ih.c | 1 - drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 1 - drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 3 --- 11 files changed, 13 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 18598eda18f6..a0bff4713672 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3256,7 +3256,6 @@ static const struct attribute *amdgpu_dev_attributes[] = { NULL }; - /** * amdgpu_device_init - initialize the driver * @@ -3698,12 +3697,13 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) amdgpu_ucode_sysfs_fini(adev); sysfs_remove_files(>dev->kobj, amdgpu_dev_attributes); - amdgpu_fbdev_fini(adev); amdgpu_irq_fini_hw(adev); amdgpu_device_ip_fini_early(adev); + + amdgpu_gart_dummy_page_fini(adev); } void amdgpu_device_fini_sw(struct amdgpu_device *adev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index c5a9a4fb10d2..354e68081b53 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev) * * Frees the dummy page used by the driver (all asics). 
*/ -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) { if (!adev->dummy_page_addr) return; @@ -375,5 +375,4 @@ int amdgpu_gart_init(struct amdgpu_device *adev) */ void amdgpu_gart_fini(struct amdgpu_device *adev) { - amdgpu_gart_dummy_page_fini(adev); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h index a25fe97b0196..78dc7a23da56 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h @@ -58,6 +58,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev); void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev); int amdgpu_gart_init(struct amdgpu_device *adev); void amdgpu_gart_fini(struct amdgpu_device *adev); +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev); int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, int pages); int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 233b64dab94b..a14973a7a9c9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -361,6 +361,15 @@ void amdgpu_irq_fini_hw(struct amdgpu_device *adev) if (!amdgpu_device_has_dc_support(adev)) flush_work(>hotplug_work); } + + if (adev->irq.ih_soft.ring) + amdgpu_ih_ring_fini(adev, >irq.ih_soft); Why is the ih_soft handled here and in the various ih sw_fini functions? Post last rebase new ASICs i think were added which i missed. Taking care of this with prev. comment by Christian together right now. 
Andrey + if (adev->irq.ih.ring) + amdgpu_ih_ring_fini(adev, &adev->irq.ih); + if (adev->irq.ih1.ring) + amdgpu_ih_ring_fini(adev, &adev->irq.ih1); + if (adev->irq.ih2.ring) + amdgpu_ih_ring_fini(adev, &adev->irq.ih2); } /** diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c index 183d44a6583c..df385ffc9768 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c @@ -310,7 +310,6 @@ static int cik_ih_sw_fini(void *handle) struct amdgpu_device *adev = (struct amdgpu_device *)handle; amdgpu_irq_fini_sw(adev); - amdgpu_ih_ring_fini(adev, &adev->irq.ih); amdgpu_irq_remove_domain(adev); return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c b/drivers/gpu/drm/amd/amdgpu/cz_ih.c index d32743949003..b8c47e0cf37a 100644 --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c @@ -302,7 +302,6 @@ static int
Re: [PATCH v6 06/16] drm/amdgpu: Handle IOMMU enabled case.
On Mon, May 10, 2021 at 12:37 PM Andrey Grodzovsky wrote: > > Handle all DMA IOMMU gropup related dependencies before the > group is removed. > > v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate > v6: Drop the BO unamp list > > Signed-off-by: Andrey Grodzovsky > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 3 +-- > drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 9 + > drivers/gpu/drm/amd/amdgpu/cik_ih.c| 1 - > drivers/gpu/drm/amd/amdgpu/cz_ih.c | 1 - > drivers/gpu/drm/amd/amdgpu/iceland_ih.c| 1 - > drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 3 --- > drivers/gpu/drm/amd/amdgpu/si_ih.c | 1 - > drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 1 - > drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 3 --- > 11 files changed, 13 insertions(+), 15 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 18598eda18f6..a0bff4713672 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3256,7 +3256,6 @@ static const struct attribute *amdgpu_dev_attributes[] > = { > NULL > }; > > - > /** > * amdgpu_device_init - initialize the driver > * > @@ -3698,12 +3697,13 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) > amdgpu_ucode_sysfs_fini(adev); > sysfs_remove_files(>dev->kobj, amdgpu_dev_attributes); > > - > amdgpu_fbdev_fini(adev); > > amdgpu_irq_fini_hw(adev); > > amdgpu_device_ip_fini_early(adev); > + > + amdgpu_gart_dummy_page_fini(adev); > } > > void amdgpu_device_fini_sw(struct amdgpu_device *adev) > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c > index c5a9a4fb10d2..354e68081b53 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c > @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device > *adev) > * > * Frees 
the dummy page used by the driver (all asics). > */ > -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) > +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) > { > if (!adev->dummy_page_addr) > return; > @@ -375,5 +375,4 @@ int amdgpu_gart_init(struct amdgpu_device *adev) > */ > void amdgpu_gart_fini(struct amdgpu_device *adev) > { > - amdgpu_gart_dummy_page_fini(adev); > } > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h > index a25fe97b0196..78dc7a23da56 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h > @@ -58,6 +58,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev); > void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev); > int amdgpu_gart_init(struct amdgpu_device *adev); > void amdgpu_gart_fini(struct amdgpu_device *adev); > +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev); > int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, >int pages); > int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset, > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > index 233b64dab94b..a14973a7a9c9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > @@ -361,6 +361,15 @@ void amdgpu_irq_fini_hw(struct amdgpu_device *adev) > if (!amdgpu_device_has_dc_support(adev)) > flush_work(>hotplug_work); > } > + > + if (adev->irq.ih_soft.ring) > + amdgpu_ih_ring_fini(adev, >irq.ih_soft); Why is the ih_soft handled here and in the various ih sw_fini functions? 
> + if (adev->irq.ih.ring) > + amdgpu_ih_ring_fini(adev, &adev->irq.ih); > + if (adev->irq.ih1.ring) > + amdgpu_ih_ring_fini(adev, &adev->irq.ih1); > + if (adev->irq.ih2.ring) > + amdgpu_ih_ring_fini(adev, &adev->irq.ih2); > } > > /** > diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c > b/drivers/gpu/drm/amd/amdgpu/cik_ih.c > index 183d44a6583c..df385ffc9768 100644 > --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c > @@ -310,7 +310,6 @@ static int cik_ih_sw_fini(void *handle) > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > amdgpu_irq_fini_sw(adev); > - amdgpu_ih_ring_fini(adev, &adev->irq.ih); > amdgpu_irq_remove_domain(adev); > > return 0; > diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c > b/drivers/gpu/drm/amd/amdgpu/cz_ih.c > index d32743949003..b8c47e0cf37a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c > +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c > @@ -302,7 +302,6 @@ static int cz_ih_sw_fini(void
[PATCH] Documentation: gpu: Mention the requirements for new properties
New KMS properties come with a bunch of requirements to avoid each driver from running their own, inconsistent, set of properties, eventually leading to issues like property conflicts, inconsistencies between drivers and semantics, etc. Let's document what we expect. Signed-off-by: Maxime Ripard --- Documentation/gpu/drm-kms.rst | 18 ++ 1 file changed, 18 insertions(+) diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst index 87e5023e3f55..30f4c376f419 100644 --- a/Documentation/gpu/drm-kms.rst +++ b/Documentation/gpu/drm-kms.rst @@ -463,6 +463,24 @@ KMS Properties This section of the documentation is primarily aimed at user-space developers. For the driver APIs, see the other sections. +Requirements + + +KMS drivers might need to add extra properties to support new features. +Each new property introduced in a driver need to meet a few +requirements, in addition to the one mentioned above.: + +- It must be standardized, with some documentation to describe the + property can be used. + +- It must provide a generic helper in the core code to register that + property on the object it attaches to. + +- Its content must be decoded by the core and provided in the object + associated state structure. + +- An IGT test must be submitted. + Property Types and Blob Property Support -- 2.31.1
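Of the four requirements, the third — content decoded by the core into the object's associated state structure — is the structural one: drivers should read a typed field, never the raw property value. A toy userspace model of that split (illustrative names only, not the actual KMS helpers, which live in the DRM core and decode into per-object state on atomic commit):

```c
#include <assert.h>

/* Userspace sets a raw u64 property value; the core validates and
 * decodes it exactly once into a typed field in the state; drivers
 * only ever read the typed field. All names are hypothetical. */

enum panel_orientation {
	ORIENT_BAD = -1,
	ORIENT_NORMAL = 0,
	ORIENT_LEFT_UP = 1,
};

struct connector_state {
	enum panel_orientation orientation;
};

/* What a generic core helper would do at commit time: reject
 * undocumented values, then decode into the state. */
int core_decode_orientation(struct connector_state *state,
			    unsigned long long raw)
{
	if (raw > 1)
		return -22;	/* -EINVAL */
	state->orientation = (enum panel_orientation)raw;
	return 0;
}

/* Convenience wrapper for exercising the decode path. */
enum panel_orientation decode_or_bad(unsigned long long raw)
{
	struct connector_state s = { ORIENT_NORMAL };

	if (core_decode_orientation(&s, raw))
		return ORIENT_BAD;
	return s.orientation;
}
```

Centralizing the decode is what prevents two drivers from interpreting the same property name differently — exactly the inconsistency the requirements are meant to rule out.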
Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL
On Fri, May 7, 2021 at 7:45 PM Tejun Heo wrote: > > Hello, > > On Fri, May 07, 2021 at 06:30:56PM -0400, Alex Deucher wrote: > > Maybe we are speaking past each other. I'm not following. We got > > here because a device specific cgroup didn't make sense. With my > > Linux user hat on, that makes sense. I don't want to write code to a > > bunch of device specific interfaces if I can avoid it. But as for > > temporal vs spatial partitioning of the GPU, the argument seems to be > > a sort of hand-wavy one that both spatial and temporal partitioning > > make sense on CPUs, but only temporal partitioning makes sense on > > GPUs. I'm trying to understand that assertion. There are some GPUs > > Spatial partitioning as implemented in cpuset isn't a desirable model. It's > there partly because it has historically been there. It doesn't really > require dynamic hierarchical distribution of anything and is more of a way > to batch-update per-task configuration, which is how it's actually > implemented. It's broken too in that it interferes with per-task affinity > settings. So, not exactly a good example to follow. In addition, this sort > of partitioning requires more hardware knowledge and GPUs are worse than > CPUs in that hardwares differ more. > > Features like this are trivial to implement from userland side by making > per-process settings inheritable and restricting who can update the > settings. > > > that can more easily be temporally partitioned and some that can be > > more easily spatially partitioned. It doesn't seem any different than > > CPUs. > > Right, it doesn't really matter how the resource is distributed. What > matters is how granular and generic the distribution can be. If gpus can > implement work-conserving proportional distribution, that's something which > is widely useful and inherently requires dynamic scheduling from kernel > side. 
If it's about setting per-vendor affinities, this is way too much > cgroup interface for a feature which can be easily implemented outside > cgroup. Just do per-process (or whatever handles gpus use) and confine their > configurations from cgroup side however way. > > While the specific theme changes a bit, we're basically having the same > discussion with the same conclusion over the past however many months. > Hopefully, the point is clear by now. Thanks, that helps a lot. Alex
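Tejun's bar for a cgroup interface — work-conserving proportional distribution — has a precise meaning: weights divide capacity only among *active* groups, so an idle group's share is redistributed rather than wasted. A sketch of the arithmetic (integer math, illustration only):

```c
#include <assert.h>
#include <stddef.h>

/* Work-conserving proportional shares: each active group receives
 * capacity * weight / sum(weights of active groups). Idle groups are
 * excluded from the denominator, so their share flows to the active
 * groups instead of going unused. */
void distribute(unsigned int capacity, const unsigned int *weight,
		const int *active, unsigned int *out, size_t n)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (active[i])
			total += weight[i];
	for (i = 0; i < n; i++)
		out[i] = (active[i] && total) ?
			 capacity * weight[i] / total : 0;
}
```

With weights {1, 1, 2} and the heavy group idle, the two light groups split the whole capacity 50/50; once all three are active the split becomes 25/25/50. Implementing this for a GPU requires the kernel to make that reallocation dynamically as groups go busy and idle — which is exactly the scheduling capability Tejun argues must exist before a cgroup controller makes sense.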
Re: [PATCH v6 06/16] drm/amdgpu: Handle IOMMU enabled case.
On 2021-05-11 2:44 a.m., Christian König wrote: Am 10.05.21 um 18:36 schrieb Andrey Grodzovsky: Handle all DMA IOMMU gropup related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate v6: Drop the BO unamp list Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 3 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 9 + drivers/gpu/drm/amd/amdgpu/cik_ih.c | 1 - drivers/gpu/drm/amd/amdgpu/cz_ih.c | 1 - drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 1 - drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 3 --- drivers/gpu/drm/amd/amdgpu/si_ih.c | 1 - drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 1 - drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 3 --- 11 files changed, 13 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 18598eda18f6..a0bff4713672 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3256,7 +3256,6 @@ static const struct attribute *amdgpu_dev_attributes[] = { NULL }; - /** * amdgpu_device_init - initialize the driver * @@ -3698,12 +3697,13 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) amdgpu_ucode_sysfs_fini(adev); sysfs_remove_files(>dev->kobj, amdgpu_dev_attributes); - amdgpu_fbdev_fini(adev); amdgpu_irq_fini_hw(adev); amdgpu_device_ip_fini_early(adev); + + amdgpu_gart_dummy_page_fini(adev); I think you should probably just call amdgpu_gart_fini() here. 
} void amdgpu_device_fini_sw(struct amdgpu_device *adev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c index c5a9a4fb10d2..354e68081b53 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev) * * Frees the dummy page used by the driver (all asics). */ -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev) { if (!adev->dummy_page_addr) return; @@ -375,5 +375,4 @@ int amdgpu_gart_init(struct amdgpu_device *adev) */ void amdgpu_gart_fini(struct amdgpu_device *adev) { - amdgpu_gart_dummy_page_fini(adev); } Well either you remove amdgpu_gart_fini() or just call amdgpu_gart_fini() instead of amdgpu_gart_dummy_page_fini(). diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h index a25fe97b0196..78dc7a23da56 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h @@ -58,6 +58,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev); void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev); int amdgpu_gart_init(struct amdgpu_device *adev); void amdgpu_gart_fini(struct amdgpu_device *adev); +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev); int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset, int pages); int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 233b64dab94b..a14973a7a9c9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -361,6 +361,15 @@ void amdgpu_irq_fini_hw(struct amdgpu_device *adev) if (!amdgpu_device_has_dc_support(adev)) flush_work(>hotplug_work); } + + if (adev->irq.ih_soft.ring) + amdgpu_ih_ring_fini(adev, 
&adev->irq.ih_soft); + if (adev->irq.ih.ring) + amdgpu_ih_ring_fini(adev, &adev->irq.ih); + if (adev->irq.ih1.ring) + amdgpu_ih_ring_fini(adev, &adev->irq.ih1); + if (adev->irq.ih2.ring) + amdgpu_ih_ring_fini(adev, &adev->irq.ih2); You should probably make the function NULL-safe instead of checking here. Christian. Agree, in fact it already does this check inside amdgpu_ih_ring_fini, so I will just drop the checks. Andrey } /** diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c index 183d44a6583c..df385ffc9768 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c @@ -310,7 +310,6 @@ static int cik_ih_sw_fini(void *handle) struct amdgpu_device *adev = (struct amdgpu_device *)handle; amdgpu_irq_fini_sw(adev); - amdgpu_ih_ring_fini(adev, &adev->irq.ih); amdgpu_irq_remove_domain(adev); return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c b/drivers/gpu/drm/amd/amdgpu/cz_ih.c index d32743949003..b8c47e0cf37a 100644 --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
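The review point above — make the fini function NULL-safe rather than guarding every call site — is a common kernel teardown idiom: the fini checks its own state, early-outs when there is nothing to free, and leaves the object in a state where calling it again is harmless. A minimal userspace sketch of that idiom follows; the `ih_ring` type and field names are illustrative stand-ins, not the real amdgpu structures:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for an interrupt-handler ring. */
struct ih_ring {
	void *ring;	/* NULL when the ring was never allocated */
};

/*
 * NULL-safe, idempotent fini: callers no longer need
 * "if (adev->irq.ih.ring)" guards, and a second call is a no-op.
 */
void ih_ring_fini(struct ih_ring *ih)
{
	if (!ih || !ih->ring)
		return;
	free(ih->ring);
	ih->ring = NULL;	/* makes repeated teardown safe */
}
```

With this shape, the `if (adev->irq.ih*.ring)` checks in the hunk above can simply be dropped, which is what Andrey agrees to do.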
Re: [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin
On Thu, May 06, 2021 at 12:14:03PM -0700, Matthew Brost wrote: > Disable engine barriers for unpinning with GuC. This feature isn't > needed with the GuC as it disables context scheduling before unpinning > which guarantees the HW will not reference the context. Hence it is > not necessary to defer unpinning until a kernel context request > completes on each engine in the context engine mask. > > Cc: John Harrison > Signed-off-by: Matthew Brost > Signed-off-by: Daniele Ceraolo Spurio Instead of these ifs in the code, can we push this barrier business down into backends? Not in this series, but as one of the things to sort out as part of the conversion to drm/scheduler. -Daniel > --- > drivers/gpu/drm/i915/gt/intel_context.c| 2 +- > drivers/gpu/drm/i915/gt/intel_context.h| 1 + > drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++ > drivers/gpu/drm/i915/i915_active.c | 3 +++ > 4 files changed, 15 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c > b/drivers/gpu/drm/i915/gt/intel_context.c > index 1499b8aace2a..7f97753ab164 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context.c > +++ b/drivers/gpu/drm/i915/gt/intel_context.c > @@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct > intel_context *ce) > > __i915_active_acquire(>active); > > - if (intel_context_is_barrier(ce)) > + if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine)) > return 0; > > /* Preallocate tracking nodes */ > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h > b/drivers/gpu/drm/i915/gt/intel_context.h > index 92ecbab8c1cd..9b211ca5ecc7 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context.h > +++ b/drivers/gpu/drm/i915/gt/intel_context.h > @@ -16,6 +16,7 @@ > #include "intel_engine_types.h" > #include "intel_ring_types.h" > #include "intel_timeline_types.h" > +#include "uc/intel_guc_submission.h" > > #define CE_TRACE(ce, fmt, ...) 
do { \ > const struct intel_context *ce__ = (ce);\ > diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c > b/drivers/gpu/drm/i915/gt/selftest_context.c > index 26685b927169..fa7b99a671dd 100644 > --- a/drivers/gpu/drm/i915/gt/selftest_context.c > +++ b/drivers/gpu/drm/i915/gt/selftest_context.c > @@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs > *engine) >* This test makes sure that the context is kept alive until a >* subsequent idle-barrier (emitted when the engine wakeref hits 0 >* with no more outstanding requests). > + * > + * In GuC submission mode we don't use idle barriers and we instead > + * get a message from the GuC to signal that it is safe to unpin the > + * context from memory. >*/ > + if (intel_engine_uses_guc(engine)) > + return 0; > > if (intel_engine_pm_is_awake(engine)) { > pr_err("%s is awake before starting %s!\n", > @@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs > *engine) >* on the context image remotely (intel_context_prepare_remote_request), >* which inserts foreign fences into intel_context.active, does not >* clobber the idle-barrier. > + * > + * In GuC submission mode we don't use idle barriers. >*/ > + if (intel_engine_uses_guc(engine)) > + return 0; > > if (intel_engine_pm_is_awake(engine)) { > pr_err("%s is awake before starting %s!\n", > diff --git a/drivers/gpu/drm/i915/i915_active.c > b/drivers/gpu/drm/i915/i915_active.c > index b1aa1c482c32..9a264898bb91 100644 > --- a/drivers/gpu/drm/i915/i915_active.c > +++ b/drivers/gpu/drm/i915/i915_active.c > @@ -968,6 +968,9 @@ void i915_active_acquire_barrier(struct i915_active *ref) > > GEM_BUG_ON(i915_active_is_idle(ref)); > > + if (llist_empty(>preallocated_barriers)) > + return; > + > /* >* Transfer the list of preallocated barriers into the >* i915_active rbtree, but only as proto-nodes. They will be > -- > 2.28.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
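The two halves of the patch above fit together: when the firmware scheduler owns context lifetime, no idle-barrier tracking nodes are preallocated, and `i915_active_acquire_barrier()` gains an early return when the preallocated list is empty. The control flow can be sketched like this — all names here are made up for illustration, not the i915 API:

```c
#include <assert.h>
#include <stdbool.h>

struct ctx {
	bool uses_fw_scheduler;		/* stands in for intel_engine_uses_guc() */
	int preallocated_barriers;	/* stands in for the llist of nodes */
};

/* Mirrors intel_context_active_acquire(): skip barrier preallocation
 * when the firmware guarantees the HW no longer references the context. */
void ctx_active_acquire(struct ctx *c)
{
	if (c->uses_fw_scheduler)
		return;			/* no barriers needed */
	c->preallocated_barriers = 2;	/* pretend we preallocate nodes */
}

/* Mirrors the llist_empty() early return added to
 * i915_active_acquire_barrier(): nothing preallocated, nothing to do. */
void ctx_acquire_barrier(struct ctx *c, int *transferred)
{
	if (c->preallocated_barriers == 0)
		return;
	*transferred += c->preallocated_barriers;
	c->preallocated_barriers = 0;
}
```

The early return is what keeps the GuC path from tripping over the barrier bookkeeping that the execlists path still relies on.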
Re: [RFC] Implicit vs explicit user fence sync
Am 11.05.21 um 16:23 schrieb Daniel Vetter: On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote: Am 11.05.21 um 09:31 schrieb Daniel Vetter: [SNIP] And that's just the one ioctl I know is big trouble, I'm sure we'll find more funny corner cases when we roll out explicit user fencing. I think we can just ignore sync_file. As far as it concerns me that UAPI is pretty much dead. Uh that's rather bold. Android is built on it. Currently atomic kms is built on it. To be honest I don't think we care about Android at all. we = amd or we = upstream here? we = amd, for everybody else that is certainly a different topic. But for now AMD is the only one running into this problem. Could be that Nouveau sees this as well with the next hw generation, but who knows? Why is this not much of a problem if it's just within one driver? Because inside the same driver I can easily add the waits before submitting the MM work as necessary. What is MM work here now? MM=multimedia, e.g. UVD, VCE, VCN engines on AMD hardware. Adding implicit synchronization on top of that is then rather trivial. Well that's what I disagree with, since I already see some problems that I don't think we can overcome (the atomic ioctl is one). And that's with us only having a fairly theoretical understanding of the overall situation. But how should we then ever support user fences with the atomic IOCTL? We can't wait in user space since that will disable the support for waiting in the hardware. Well, figure it out :-) This is exactly why I'm not seeing anything solved with just rolling a function call to a bunch of places, because it's pretending all things are solved when clearly that's not the case. I really think what we need is to first figure out how to support userspace fences as explicit entities across the stack, maybe with something like this order: 1. 
enable them purely within a single userspace driver (like vk with winsys disabled, or something else like that except not amd because there's this amdkfd split for "real" compute) 1a. including atomic ioctl, e.g. for vk direct display support this can be used without cross-process sharing, new winsys protocols and all that fun 2. figure out how to transport these userspace fences with something like drm_syncobj 2a. figure out the compat story for drivers which dont do userspace fences 2b. figure out how to absorb the overhead if the winsys/compositor doesn't support explicit sync 3. maybe figure out how to make this all happen magically with implicit sync, if we really, really care If we do 3 before we've nailed all these problems, we're just guaranteeing we'll get the wrong solutions and so we'll then have 3 ways of doing userspace fences - the butchered implicit one that didn't quite work - the explicit one - the not-so-butchered implicit one with the lessons from the properly done explicit one The thing is, if you have no idea how to integrate userspace fences explicitly into atomic ioctl, then you definitely have no idea how to do it implicitly :-) Well I agree on that. But the question is still how would you do explicit with atomic? If you supply an userpace fence (is that what we call them now) as in-fence, then your only allowed to get a userspace fence as out-fence. Yeah, that part makes perfectly sense. But I don't see the problem with that? That way we - don't block anywhere we shouldn't - don't create a dma_fence out of a userspace fence The problem is this completely breaks your "magically make implicit fencing with userspace fences" plan. Why? So I have a plan here, what was yours? As far as I see that should still work perfectly fine and I have the strong feeling I'm missing something here. Transporting fences between processes is not the fundamental problem here, but rather the question how we represent all this in the kernel? 
In other words I think what you outlined above is just approaching it from the wrong side again. Instead of looking at what the kernel needs to support this, you take a look at userspace and the requirements there. Uh ... that was my idea here? That's why I put "build userspace fences in userspace only" as the very first thing. Then extend to winsys and atomic/display and all these cases where things get more tricky. I agree that transporting the fences is easy, which is why it's not interesting trying to solve that problem first. Which is kinda what you're trying to do here by adding implicit userspace fences (well not even that, just a bunch of function calls without any semantics attached to them). So if there's more here, you need to flesh it out more or I just don't get what you're actually trying to demonstrate. Well I'm trying to figure out why you see it as such a problem to keep implicit sync around. As far as I can tell it is completely orthogonal if we use implicit/explicit and dma_fence/user_fence. It's just a different implementation inside the kernel. Christian. -Daniel
RE: [RFC PATCH 00/97] Basic GuC submission support in the i915
> -Original Message- > From: Martin Peres > Sent: Tuesday, May 11, 2021 1:06 AM > To: Daniel Vetter > Cc: Jason Ekstrand ; Brost, Matthew > ; intel-gfx ; > dri-devel ; Ursulin, Tvrtko > ; Ekstrand, Jason ; > Ceraolo Spurio, Daniele ; Bloomfield, Jon > ; Vetter, Daniel ; > Harrison, John C > Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915 > > On 10/05/2021 19:33, Daniel Vetter wrote: > > On Mon, May 10, 2021 at 3:55 PM Martin Peres > wrote: > >> > >> On 10/05/2021 02:11, Jason Ekstrand wrote: > >>> On May 9, 2021 12:12:36 Martin Peres wrote: > >>> > Hi, > > On 06/05/2021 22:13, Matthew Brost wrote: > > Basic GuC submission support. This is the first bullet point in the > > upstreaming plan covered in the following RFC [1]. > > > > At a very high level the GuC is a piece of firmware which sits between > > the i915 and the GPU. It offloads some of the scheduling of contexts > > from the i915 and programs the GPU to submit contexts. The i915 > > communicates with the GuC and the GuC communicates with the > GPU. > > May I ask what will GuC command submission do that execlist > won't/can't > do? And what would be the impact on users? Even forgetting the > troubled > history of GuC (instability, performance regression, poor level of user > support, 6+ years of trying to upstream it...), adding this much code > and doubling the amount of validation needed should come with a > rationale making it feel worth it... and I am not seeing here. Would you > mind providing the rationale behind this work? > > > > > GuC submission will be disabled by default on all current upstream > > platforms behind a module parameter - enable_guc. A value of 3 will > > enable submission and HuC loading via the GuC. GuC submission > should > > work on all gen11+ platforms assuming the GuC firmware is present. > > What is the plan here when it comes to keeping support for execlist? 
I > am afraid that landing GuC support in Linux is the first step towards > killing the execlist, which would force users to use proprietary > firmwares that even most Intel engineers have little influence over. > Indeed, if "drm/i915/guc: Disable semaphores when using GuC > scheduling" > which states "Disable semaphores when using GuC scheduling as > semaphores > are broken in the current GuC firmware." is anything to go by, it means > that even Intel developers seem to prefer working around the GuC > firmware, rather than fixing it. > >>> > >>> Yes, landing GuC support may be the first step in removing execlist > >>> support. The inevitable reality is that GPU scheduling is coming and > >>> likely to be there only path in the not-too-distant future. (See also > >>> the ongoing thread with AMD about fences.) I'm not going to pass > >>> judgement on whether or not this is a good thing. I'm just reading the > >>> winds and, in my view, this is where things are headed for good or ill. > >>> > >>> In answer to the question above, the answer to "what do we gain from > >>> GuC?" may soon be, "you get to use your GPU." We're not there yet > and, > >>> again, I'm not necessarily advocating for it, but that is likely where > >>> things are headed. > >> > >> This will be a sad day, especially since it seems fundamentally opposed > >> with any long-term support, on top of taking away user freedom to > >> fix/tweak their system when Intel won't. > >> > >>> A firmware-based submission model isn't a bad design IMO and, aside > from > >>> the firmware freedom issues, I think there are actual advantages to the > >>> model. Immediately, it'll unlock a few features like parallel submission > >>> (more on that in a bit) and long-running compute because they're > >>> implemented in GuC and the work to implement them properly in the > >>> execlist scheduler is highly non-trivial. Longer term, it may (no > >>> guarantees) unlock some performance by getting the kernel out of the > way. 
> >> > >> Oh, I definitely agree with firmware-based submission model not being a > >> bad design. I was even cheering for it in 2015. Experience with it made > >> me regret that deeply since :s > >> > >> But with the DRM scheduler being responsible for most things, I fail to > >> see what we could offload in the GuC except context switching (like > >> every other manufacturer). The problem is, the GuC does way more than > >> just switching registers in bulk, and if the number of revisions of the > >> GuC is anything to go by, it is way too complex for me to feel > >> comfortable with it. > > > > We need to flesh out that part of the plan more, but we're not going > > to use drm scheduler for everything. It's only to handle the dma-fence > > legacy side of things, which means: > > - timeout handling for batches that take too long > > - dma_fence dependency sorting/handling > > - boosting of context from
Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array
On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote: > Add lrc descriptor context lookup array which can resolve the > intel_context from the lrc descriptor index. In addition to lookup, it > can determine in the lrc descriptor context is currently registered with > the GuC by checking if an entry for a descriptor index is present. > Future patches in the series will make use of this array. > > Cc: John Harrison > Signed-off-by: Matthew Brost > --- > drivers/gpu/drm/i915/gt/uc/intel_guc.h| 5 +++ > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +-- > 2 files changed, 35 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > index d84f37afb9d8..2eb6c497e43c 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > @@ -6,6 +6,8 @@ > #ifndef _INTEL_GUC_H_ > #define _INTEL_GUC_H_ > > +#include "linux/xarray.h" > + > #include "intel_uncore.h" > #include "intel_guc_fw.h" > #include "intel_guc_fwif.h" > @@ -47,6 +49,9 @@ struct intel_guc { > struct i915_vma *lrc_desc_pool; > void *lrc_desc_pool_vaddr; > > + /* guc_id to intel_context lookup */ > + struct xarray context_lookup; The current code sets a disastrous example, but for stuff like this it's always good to explain the locking, and who's holding references and how you're handling cycles. Since I guess the intel_context also holds the guc_id alive somehow. Again holds for the entire series, where it makes sense (as in we don't expect to rewrite the entire code anyway). 
-Daniel > + > /* Control params for fw initialization */ > u32 params[GUC_CTL_MAX_DWORDS]; > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > index 6acc1ef34f92..c2b6d27404b7 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct > rb_node *rb) > return rb_entry(rb, struct i915_priolist, node); > } > > -/* Future patches will use this function */ > -__attribute__ ((unused)) > static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index) > { > struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr; > @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct > intel_guc *guc, u32 index) > return &base[index]; > } > > +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 > id) > +{ > + struct intel_context *ce = xa_load(&guc->context_lookup, id); > + > + GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS); > + > + return ce; > +} > + > static int guc_lrc_desc_pool_create(struct intel_guc *guc) > { > u32 size; > @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc > *guc) > i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP); > } > > +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id) > +{ > + struct guc_lrc_desc *desc = __get_lrc_desc(guc, id); > + > + memset(desc, 0, sizeof(*desc)); > + xa_erase_irq(&guc->context_lookup, id); > +} > + > +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id) > +{ > + return __get_context(guc, id); > +} > + > +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, > +struct intel_context *ce) > +{ > + xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC); > +} > + > static void guc_add_request(struct intel_guc *guc, struct i915_request *rq) > { > /* Leaving stub as this function will be used in future patches */ > @@ -404,6 
+430,8 @@ int intel_guc_submission_init(struct intel_guc *guc) >*/ > GEM_BUG_ON(!guc->lrc_desc_pool); > > + xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ); > + > return 0; > } > > -- > 2.28.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
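The semantics the patch builds on top of the xarray are small: register a context under its guc_id, look it up by id, treat "an entry is present" as "registered with the GuC", and erase the entry on reset. A plain fixed-size table captures the same contract in userspace — the kernel uses an xarray for sparseness and IRQ-safe locking, and every name below (`set_desc_registered`, `MAX_DESCRIPTORS`, and so on) is invented for the sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DESCRIPTORS 64	/* stands in for GUC_MAX_LRC_DESCRIPTORS */

struct context { int id; };	/* stands in for intel_context */

/* A plain array stands in for the xarray. */
static struct context *lookup[MAX_DESCRIPTORS];

/* like set_lrc_desc_registered() -> xa_store_irq() */
void set_desc_registered(unsigned int id, struct context *ce)
{
	assert(id < MAX_DESCRIPTORS);
	lookup[id] = ce;
}

/* like __get_context() -> xa_load() */
struct context *get_context(unsigned int id)
{
	assert(id < MAX_DESCRIPTORS);
	return lookup[id];
}

/* "Registered with the GuC" is simply "an entry is present". */
bool desc_registered(unsigned int id)
{
	return get_context(id) != NULL;
}

/* like reset_lrc_desc() -> xa_erase_irq() */
void reset_desc(unsigned int id)
{
	assert(id < MAX_DESCRIPTORS);
	lookup[id] = NULL;
}
```

This also illustrates Daniel's review point: the table stores bare pointers, so something else (here, the caller; in i915, presumably the reference the guc_id holds on the context) must keep the object alive while an entry is present.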
Re: [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object
On Thu, May 06, 2021 at 12:13:46PM -0700, Matthew Brost wrote: > Introduce i915_sched_engine object which is lower level data structure > that i915_scheduler / generic code can operate on without touching > execlist specific structures. This allows additional submission backends > to be added without breaking the layer. Maybe add a comment here that this is defacto a detour since we're now aiming to use drm/scheduler instead. But also since the current code is a bit a mess, we expect this detour to be overall faster since we can then refactor in-tree. Maybe also highlight this a bit more in the rfc to make sure this is clear. -Daniel > > Cc: Daniele Ceraolo Spurio > Signed-off-by: Matthew Brost > --- > drivers/gpu/drm/i915/gem/i915_gem_wait.c | 4 +- > drivers/gpu/drm/i915/gt/intel_engine.h| 16 - > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 77 ++-- > .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 4 +- > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 10 +- > drivers/gpu/drm/i915/gt/intel_engine_types.h | 42 +-- > drivers/gpu/drm/i915/gt/intel_engine_user.c | 2 +- > .../drm/i915/gt/intel_execlists_submission.c | 350 +++--- > .../gpu/drm/i915/gt/intel_ring_submission.c | 13 +- > drivers/gpu/drm/i915/gt/mock_engine.c | 17 +- > drivers/gpu/drm/i915/gt/selftest_execlists.c | 36 +- > drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 6 +- > drivers/gpu/drm/i915/gt/selftest_lrc.c| 6 +- > drivers/gpu/drm/i915/gt/selftest_reset.c | 2 +- > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 75 ++-- > drivers/gpu/drm/i915/i915_gpu_error.c | 7 +- > drivers/gpu/drm/i915/i915_request.c | 50 +-- > drivers/gpu/drm/i915/i915_request.h | 2 +- > drivers/gpu/drm/i915/i915_scheduler.c | 168 - > drivers/gpu/drm/i915/i915_scheduler.h | 65 +++- > drivers/gpu/drm/i915/i915_scheduler_types.h | 63 > 21 files changed, 575 insertions(+), 440 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c > b/drivers/gpu/drm/i915/gem/i915_gem_wait.c > index 4b9856d5ba14..af1fbf8e2a9a 100644 > --- 
a/drivers/gpu/drm/i915/gem/i915_gem_wait.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c > @@ -104,8 +104,8 @@ static void fence_set_priority(struct dma_fence *fence, > engine = rq->engine; > > rcu_read_lock(); /* RCU serialisation for set-wedged protection */ > - if (engine->schedule) > - engine->schedule(rq, attr); > + if (engine->sched_engine->schedule) > + engine->sched_engine->schedule(rq, attr); > rcu_read_unlock(); > } > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h > b/drivers/gpu/drm/i915/gt/intel_engine.h > index 8d9184920c51..988d9688ae4d 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine.h > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h > @@ -123,20 +123,6 @@ execlists_active(const struct intel_engine_execlists > *execlists) > return active; > } > > -static inline void > -execlists_active_lock_bh(struct intel_engine_execlists *execlists) > -{ > - local_bh_disable(); /* prevent local softirq and lock recursion */ > - tasklet_lock(>tasklet); > -} > - > -static inline void > -execlists_active_unlock_bh(struct intel_engine_execlists *execlists) > -{ > - tasklet_unlock(>tasklet); > - local_bh_enable(); /* restore softirq, and kick ksoftirqd! 
*/ > -} > - > struct i915_request * > execlists_unwind_incomplete_requests(struct intel_engine_execlists > *execlists); > > @@ -257,8 +243,6 @@ intel_engine_find_active_request(struct intel_engine_cs > *engine); > > u32 intel_engine_context_size(struct intel_gt *gt, u8 class); > > -void intel_engine_init_active(struct intel_engine_cs *engine, > - unsigned int subclass); > #define ENGINE_PHYSICAL 0 > #define ENGINE_MOCK 1 > #define ENGINE_VIRTUAL 2 > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > index 828e1669f92c..ec82a7ec0c8d 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > @@ -8,6 +8,7 @@ > #include "gem/i915_gem_context.h" > > #include "i915_drv.h" > +#include "i915_scheduler.h" > > #include "intel_breadcrumbs.h" > #include "intel_context.h" > @@ -326,9 +327,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum > intel_engine_id id) > if (engine->context_size) > DRIVER_CAPS(i915)->has_logical_contexts = true; > > - /* Nothing to do here, execute in order of dependencies */ > - engine->schedule = NULL; > - > ewma__engine_latency_init(>latency); > seqcount_init(>stats.lock); > > @@ -583,9 +581,6 @@ void intel_engine_init_execlists(struct intel_engine_cs > *engine) > memset(execlists->pending, 0, sizeof(execlists->pending)); > execlists->active = > memset(execlists->inflight, 0,
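The shape of the refactor above is that scheduling state and hooks move out of `intel_engine_cs` into a separate backend-neutral object, so generic code only ever dereferences `engine->sched_engine` and a NULL `schedule` hook means "execute in order of dependencies". A minimal sketch of that layering, with illustrative types (not the real i915 structures):

```c
#include <assert.h>
#include <stddef.h>

struct request;			/* opaque, like i915_request */

/* Backend-neutral scheduler object, like i915_sched_engine. */
struct sched_engine {
	/* NULL means "no scheduler: execute in submission order". */
	void (*schedule)(struct request *rq, int prio);
};

struct engine {
	struct sched_engine *sched_engine;
};

int last_prio;			/* records what the backend hook saw */

/* A toy backend hook, standing in for an execlists/GuC implementation. */
void record_schedule(struct request *rq, int prio)
{
	(void)rq;
	last_prio = prio;
}

/* Generic code after the refactor: check and call through sched_engine,
 * as in fence_set_priority() in the diff, never engine->schedule. */
void set_priority(struct engine *e, struct request *rq, int prio)
{
	if (e->sched_engine->schedule)
		e->sched_engine->schedule(rq, prio);
}
```

Because generic code only sees the function pointer, a new submission backend (here, GuC) can be added by supplying a different `sched_engine` without touching the execlists-specific structures.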
Re: [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages
On Thu, May 06, 2021 at 12:13:34PM -0700, Matthew Brost wrote: > From: Michal Wajdeczko > > New GuC firmware will unify format of MMIO and CTB H2G messages. > Introduce their definitions now to allow gradual transition of > our code to match new changes. > > Signed-off-by: Michal Wajdeczko > Signed-off-by: Matthew Brost > Cc: Michał Winiarski > --- > .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++ > 1 file changed, 226 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h > b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h > index 775e21f3058c..1c264819aa03 100644 > --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h > +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h > @@ -6,6 +6,232 @@ > #ifndef _ABI_GUC_MESSAGES_ABI_H > #define _ABI_GUC_MESSAGES_ABI_H > > +/** > + * DOC: HXG Message These aren't useful if we don't pull them in somewhere in the Documentation/gpu hierarchy. General comment, and also please check that it all renders correctly still. btw if you respin a patch not originally by you we generally add a (v1) to the original s-o-b line (or whever the version split was) and explain in the usual changelog in the commit message what was changed. This holds for the entire series ofc. -Daniel > + * > + * All messages exchanged with GuC are defined using 32 bit dwords. > + * First dword is treated as a message header. Remaining dwords are optional. > + * > + * .. 
_HXG Message: > + * > + * > +---+---+--+ > + * | | Bits | Description > | > + * > +===+===+==+ > + * | | | > | > + * | 0 |31 | **ORIGIN** - originator of the message > | > + * | | | - _`GUC_HXG_ORIGIN_HOST` = 0 > | > + * | | | - _`GUC_HXG_ORIGIN_GUC` = 1 > | > + * | | | > | > + * | > +---+--+ > + * | | 30:28 | **TYPE** - message type > | > + * | | | - _`GUC_HXG_TYPE_REQUEST` = 0 > | > + * | | | - _`GUC_HXG_TYPE_EVENT` = 1 > | > + * | | | - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3 > | > + * | | | - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5 > | > + * | | | - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6 > | > + * | | | - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7 > | > + * | > +---+--+ > + * | | 27:0 | **AUX** - auxiliary data (depends TYPE) > | > + * > +---+---+--+ > + * | 1 | 31:0 | optional payload (depends on TYPE) > | > + * +---+---+ > | > + * |...| | > | > + * +---+---+ > | > + * | n | 31:0 | > | > + * > +---+---+--+ > + */ > + > +#define GUC_HXG_MSG_MIN_LEN 1u > +#define GUC_HXG_MSG_0_ORIGIN (0x1 << 31) > +#define GUC_HXG_ORIGIN_HOST0u > +#define GUC_HXG_ORIGIN_GUC 1u > +#define GUC_HXG_MSG_0_TYPE (0x7 << 28) > +#define GUC_HXG_TYPE_REQUEST 0u > +#define GUC_HXG_TYPE_EVENT 1u > +#define GUC_HXG_TYPE_NO_RESPONSE_BUSY 3u > +#define GUC_HXG_TYPE_NO_RESPONSE_RETRY 5u > +#define GUC_HXG_TYPE_RESPONSE_FAILURE 6u > +#define GUC_HXG_TYPE_RESPONSE_SUCCESS 7u > +#define GUC_HXG_MSG_0_AUX(0xfff << 0) > + > +/** > + * DOC: HXG Request > + * > + * The `HXG Request`_ message should be used to initiate synchronous activity > + * for which confirmation or return data is expected. > + * > + * The recipient of this message shall use `HXG Response`_, `HXG Failure`_ > + * or `HXG Retry`_ message as a definite reply, and may use `HXG Busy`_ > + * message as a intermediate reply. > + * > + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code. > + * > + * _HXG Request: > + * > + * > +---+---+--+ > + * | | Bits | Description
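The header layout documented above — ORIGIN in bit 31, TYPE in bits 30:28, AUX in bits 27:0 — is easy to exercise with a few helpers. (Note the `GUC_HXG_MSG_0_AUX (0xfff << 0)` define as quoted looks extraction-truncated: a 27:0 field needs the 28-bit mask `0xfffffff`, which is what the sketch uses.) Helper names below are mine, not i915's:

```c
#include <assert.h>
#include <stdint.h>

#define HXG_ORIGIN_SHIFT 31
#define HXG_TYPE_SHIFT   28
#define HXG_TYPE_MASK    0x7u
#define HXG_AUX_MASK     0xfffffffu	/* bits 27:0, i.e. 28 bits wide */

/* Build message dword 0 from its three documented fields. */
uint32_t hxg_pack(uint32_t origin, uint32_t type, uint32_t aux)
{
	return ((origin & 0x1u) << HXG_ORIGIN_SHIFT) |
	       ((type & HXG_TYPE_MASK) << HXG_TYPE_SHIFT) |
	       (aux & HXG_AUX_MASK);
}

uint32_t hxg_origin(uint32_t dw0) { return dw0 >> HXG_ORIGIN_SHIFT; }
uint32_t hxg_type(uint32_t dw0)   { return (dw0 >> HXG_TYPE_SHIFT) & HXG_TYPE_MASK; }
uint32_t hxg_aux(uint32_t dw0)    { return dw0 & HXG_AUX_MASK; }
```

For example, a GuC-originated `GUC_HXG_TYPE_RESPONSE_SUCCESS` (origin 1, type 7) with AUX data packs into the top nibble `0xF`, which matches reading the table top-down.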
Re: [Intel-gfx] [RFC PATCH 5/5] drm/i915: Update execbuf IOCTL to accept N BBs
On Thu, May 06, 2021 at 10:30:49AM -0700, Matthew Brost wrote: > Add I915_EXEC_NUMBER_BB_* to drm_i915_gem_execbuffer2.flags which allows > submitting N BBs per IOCTL. > > Cc: Tvrtko Ursulin > Cc: Tony Ye > CC: Carl Zhang > Cc: Daniel Vetter > Cc: Jason Ekstrand > Signed-off-by: Matthew Brost I dropped my big question on the previous patch already, I'll check this out again when it's all squashed into the parallel extension patch so we have everything in one commit. -Daniel > --- > include/uapi/drm/i915_drm.h | 21 - > 1 file changed, 20 insertions(+), 1 deletion(-) > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index 0175b12b33b8..d3072cad4a7e 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -1291,7 +1291,26 @@ struct drm_i915_gem_execbuffer2 { > */ > #define I915_EXEC_USE_EXTENSIONS (1 << 21) > > -#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1)) > +/* > + * Number of BB in execbuf2 IOCTL - 1, used to submit more than BB in a > single > + * execbuf2 IOCTL. > + * > + * Return -EINVAL if more than 1 BB (value 0) is specified if > + * I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT hasn't been called on the gem > + * context first. Also returns -EINVAL if gem context has been setup with > + * I915_PARALLEL_NO_PREEMPT_MID_BATCH and the number BBs not equal to the > total > + * number hardware contexts in the gem context. 
> + */ > +#define I915_EXEC_NUMBER_BB_LSB (22) > +#define I915_EXEC_NUMBER_BB_MASK (0x3f << I915_EXEC_NUMBER_BB_LSB) > +#define I915_EXEC_NUMBER_BB_MSB (27) > +#define i915_execbuffer2_set_number_bb(eb2, num_bb) \ > + (eb2).flags = ((eb2).flags & ~I915_EXEC_NUMBER_BB_MASK) | \ > + (((num_bb - 1) << I915_EXEC_NUMBER_BB_LSB) & I915_EXEC_NUMBER_BB_MASK) > +#define i915_execbuffer2_get_number_bb(eb2) \ > + ((((eb2).flags & I915_EXEC_NUMBER_BB_MASK) >> I915_EXEC_NUMBER_BB_LSB) + 1) > + > +#define __I915_EXEC_UNKNOWN_FLAGS (-(1 << (I915_EXEC_NUMBER_BB_MSB + 1))) > > #define I915_EXEC_CONTEXT_ID_MASK(0x) > #define i915_execbuffer2_set_context_id(eb2, context) \ > -- > 2.28.0 > > ___ > Intel-gfx mailing list > intel-...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
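The encoding in the patch is "number of BBs minus one" stored in a 6-bit field at bits 27:22 of `execbuffer2.flags`, so a field value of 0 keeps the old single-BB behaviour. Rewritten as plain functions (the macro forms in the patch are equivalent), this is easy to check:

```c
#include <assert.h>
#include <stdint.h>

/* Field layout from the patch: 6 bits at 27:22 of flags. */
#define NUMBER_BB_LSB  22
#define NUMBER_BB_MASK (0x3fu << NUMBER_BB_LSB)

/* like i915_execbuffer2_set_number_bb(): store num_bb - 1 */
uint32_t set_number_bb(uint32_t flags, uint32_t num_bb)
{
	return (flags & ~NUMBER_BB_MASK) |
	       (((num_bb - 1) << NUMBER_BB_LSB) & NUMBER_BB_MASK);
}

/* like i915_execbuffer2_get_number_bb(): field + 1 */
uint32_t get_number_bb(uint32_t flags)
{
	return ((flags & NUMBER_BB_MASK) >> NUMBER_BB_LSB) + 1;
}
```

The minus-one bias is the backwards-compatibility trick: old userspace that never touches bits 27:22 leaves the field at 0, which decodes as exactly one batch buffer.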
Re: [Intel-gfx] [RFC PATCH 1/5] drm/doc/rfc: i915 GuC submission / DRM scheduler integration plan
On Tue, May 11, 2021 at 03:58:43PM +0100, Daniel Stone wrote: > Hi, > > On Tue, 11 May 2021 at 15:34, Daniel Vetter wrote: > > On Thu, May 06, 2021 at 10:30:45AM -0700, Matthew Brost wrote: > > > +No major changes are required to the uAPI for basic GuC submission. The > > > only > > > +change is a new scheduler attribute: > > > I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP. > > > +This attribute indicates the 2k i915 user priority levels are statically > > > mapped > > > +into 3 levels as follows: > > > + > > > +* -1k to -1 Low priority > > > +* 0 Medium priority > > > +* 1 to 1k High priority > > > + > > > +This is needed because the GuC only has 4 priority bands. The highest > > > priority > > > +band is reserved with the kernel. This aligns with the DRM scheduler > > > priority > > > +levels too. > > > > Please Cc: mesa and get an ack from Jason Ekstrand or Ken Graunke on this, > > just to be sure. > > A reference to the actual specs this targets would help. I don't have > oneAPI to hand if it's relevant, but the two in graphics world are > https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt > and > https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority > - both of them pretty much say that the implementation may do anything > or nothing at all, so this isn't a problem for spec conformance, only > a matter of user priority (sorry). Good point, Matt please also include the level0 spec here (aside from egl/vk extensions). Might need to ping Michal Mrozek internally and cc: him on this one here too. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
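The static mapping under discussion in this thread — roughly 2k i915 user priority levels collapsing into three GuC bands, with the fourth, highest band reserved for the kernel — is trivial to state as code. A sketch (band names are mine; the boundaries are the ones quoted above: -1k to -1 low, 0 medium, 1 to 1k high):

```c
#include <assert.h>

enum band { BAND_LOW, BAND_MEDIUM, BAND_HIGH };

/*
 * Map an i915 user priority (-1023..1023) onto the three available
 * GuC priority bands, per I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
 */
enum band map_user_priority(int prio)
{
	if (prio < 0)
		return BAND_LOW;	/* -1k .. -1 */
	if (prio == 0)
		return BAND_MEDIUM;
	return BAND_HIGH;		/* 1 .. 1k */
}
```

The point raised in the thread is that this is lossy — two distinct user priorities within the same band become indistinguishable — which is why an ack from the userspace driver side (Mesa, Level Zero) is being requested.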
Re: [PATCH v6 01/16] drm/ttm: Remap all page faults to per process dummy page.
Am 11.05.21 um 16:44 schrieb Andrey Grodzovsky: On 2021-05-11 2:38 a.m., Christian König wrote: Am 10.05.21 um 18:36 schrieb Andrey Grodzovsky: On device removal reroute all CPU mappings to dummy page. v3: Remove loop to find DRM file and instead access it by vma->vm_file->private_data. Move dummy page installation into a separate function. v4: Map the entire BOs VA space into on demand allocated dummy page on the first fault for that BO. v5: Remove duplicate return. v6: Polish ttm_bo_vm_dummy_page, remove superfluous code. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 57 - include/drm/ttm/ttm_bo_api.h | 2 ++ 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index b31b18058965..e5a9615519d1 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -34,6 +34,8 @@ #include #include #include +#include +#include #include #include #include @@ -380,19 +382,72 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf, } EXPORT_SYMBOL(ttm_bo_vm_fault_reserved); +static void ttm_bo_release_dummy_page(struct drm_device *dev, void *res) +{ + struct page *dummy_page = (struct page *)res; + + __free_page(dummy_page); +} + +vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot) +{ + struct vm_area_struct *vma = vmf->vma; + struct ttm_buffer_object *bo = vma->vm_private_data; + struct drm_device *ddev = bo->base.dev; + vm_fault_t ret = VM_FAULT_NOPAGE; + unsigned long address; + unsigned long pfn; + struct page *page; + + /* Allocate new dummy page to map all the VA range in this VMA to it */ + page = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!page) + return VM_FAULT_OOM; + + pfn = page_to_pfn(page); + + /* Prefault the entire VMA range right away to avoid further faults */ + for (address = vma->vm_start; address < vma->vm_end; address += PAGE_SIZE) { + + if (unlikely(address >= vma->vm_end)) + break; That extra check can be removed as far as I can see. + + if (vma->vm_flags & VM_MIXEDMAP) + ret = vmf_insert_mixed_prot(vma, address, + __pfn_to_pfn_t(pfn, PFN_DEV), + prot); + else + ret = vmf_insert_pfn_prot(vma, address, pfn, prot); + } + + /* Set the page to be freed using drmm release action */ + if (drmm_add_action_or_reset(ddev, ttm_bo_release_dummy_page, page)) + return VM_FAULT_OOM; You should probably move that before inserting the page into the VMA and also free the allocated page if it goes wrong. drmm_add_action_or_reset will automatically release the page if the add action fails, that's the 'reset' part of the function. Ah! Ok that makes it even more important that you do this before you insert the page into any VMA. Otherwise userspace has access to a freed page with the rather ugly consequences. Christian. Andrey Apart from that patch looks good to me, Christian. + + return ret; +} +EXPORT_SYMBOL(ttm_bo_vm_dummy_page); + vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; pgprot_t prot; struct ttm_buffer_object *bo = vma->vm_private_data; + struct drm_device *ddev = bo->base.dev; vm_fault_t ret; + int idx; ret = ttm_bo_vm_reserve(bo, vmf); if (ret) return ret; prot = vma->vm_page_prot; - ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1); + if (drm_dev_enter(ddev, &idx)) { + ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1); + drm_dev_exit(idx); + } else { + ret = ttm_bo_vm_dummy_page(vmf, prot); + } if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) return ret; diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 639521880c29..254ede97f8e3 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -620,4 +620,6 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr, void *buf, int len, int write); bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all); +vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t
prot); + #endif
Re: [Intel-gfx] [RFC PATCH 4/5] drm/i915: Introduce 'set parallel submit' extension
On Thu, May 06, 2021 at 10:30:48AM -0700, Matthew Brost wrote: > i915_drm.h updates for 'set parallel submit' extension. > > Cc: Tvrtko Ursulin > Cc: Tony Ye > CC: Carl Zhang > Cc: Daniel Vetter > Cc: Jason Ekstrand > Signed-off-by: Matthew Brost > --- > include/uapi/drm/i915_drm.h | 126 > 1 file changed, 126 insertions(+) > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index 26d2e135aa31..0175b12b33b8 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -1712,6 +1712,7 @@ struct drm_i915_gem_context_param { > * Extensions: > * i915_context_engines_load_balance > (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE) > * i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND) > + * i915_context_engines_parallel_submit > (I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT) Hm, just realized, but I don't think this hyperlinks correctly, and I'm also not sure this formats very well as a nice list. Using item lists should look pretty nice like we're doing for the various kms properties, e.g. FOO: Explain what FOO does BAR: Explain what BAR does. struct bar also automatically generates a link Please check with make htmldocs and polish this a bit (might need a small prep patch). > */ > #define I915_CONTEXT_PARAM_ENGINES 0xa > > @@ -1894,9 +1895,134 @@ struct i915_context_param_engines { > __u64 extensions; /* linked chain of extension blocks, 0 terminates */ > #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0 /* see > i915_context_engines_load_balance */ > #define I915_CONTEXT_ENGINES_EXT_BOND 1 /* see i915_context_engines_bond */ > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see > i915_context_engines_parallel_submit */ > struct i915_engine_class_instance engines[0]; > } __attribute__((packed)); > > +/* > + * i915_context_engines_parallel_submit: > + * > + * Setup a gem context to allow multiple BBs to be submitted in a single > execbuf > + * IOCTL. Those BBs will then be scheduled to run on the GPU in parallel.
> + * > + * All hardware contexts in the engine set are configured for parallel > + * submission (i.e. once this gem context is configured for parallel > submission, > + * all the hardware contexts, regardless if a BB is available on each > individual > + * context, will be submitted to the GPU in parallel). A user can submit BBs > to > + * a subset of the hardware contexts, in a single execbuf IOCTL, but it is not > + * recommended as it may reserve physical engines with nothing to run on > them. > + * Highly recommended to configure the gem context with N hardware contexts > then > + * always submit N BBs in a single IOCTL. > + * > + * There are two currently defined ways to control the placement of the > + * hardware contexts on physical engines: default behavior (no flags) and > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added in the > + * future as new hardware / use cases arise. Details of how to use this > + * interface are below, above the flags. > + * > + * Returns -EINVAL if hardware context placement configuration invalid or if > the > + * placement configuration isn't supported on the platform / submission > + * interface. > + * Returns -ENODEV if extension isn't supported on the platform / submission > + * interface. > + */ > +struct i915_context_engines_parallel_submit { > + struct i915_user_extension base; Ok this is good, since it makes sure we can't possibly use this in CTX_SETPARAM. > + > +/* > + * Default placement behavior (currently unsupported): > + * > + * Rather than restricting parallel submission to a single class with a > + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a mode > that > + * enables parallel submission across multiple engine classes. In this case > each > + * context's logical engine mask indicates where that context can be placed. It > is > + * implied in this mode that all contexts have mutually exclusive placement > (e.g. > + * if one context is running CS0 no other contexts can run on CS0).
> + * > + * Example 1 pseudo code: > + * CSX[Y] = engine class X, logical instance Y > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE > + * set_engines(INVALID, INVALID) > + * set_load_balance(engine_index=0, num_siblings=2, engines=CS0[0],CS0[1]) > + * set_load_balance(engine_index=1, num_siblings=2, engines=CS1[0],CS1[1]) > + * set_parallel() > + * > + * Results in the following valid placements: > + * CS0[0], CS1[0] > + * CS0[0], CS1[1] > + * CS0[1], CS1[0] > + * CS0[1], CS1[1] > + * > + * Example 2 pseudo code: > + * CS[X] = generic engine of same class, logical instance X > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE > + * set_engines(INVALID, INVALID) > + * set_load_balance(engine_index=0, num_siblings=3, > engines=CS[0],CS[1],CS[2]) > + * set_load_balance(engine_index=1, num_siblings=3, > engines=CS[0],CS[1],CS[2]) > + *
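The item-list style Daniel asks for above might look roughly like the following kernel-doc sketch. The wording is hypothetical and the exact markup would need checking with `make htmldocs`, but naming a struct in an item generates a cross-reference link automatically:

```c
/**
 * DOC: context engines extensions (illustrative sketch only)
 *
 * Extensions:
 *
 * i915_context_engines_load_balance:
 *	Explain what load balancing does; referencing the struct name
 *	here lets kernel-doc hyperlink it in the rendered HTML.
 *
 * i915_context_engines_bond:
 *	Explain bonding here.
 *
 * i915_context_engines_parallel_submit:
 *	Explain parallel submission here.
 */
```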
Re: [Intel-gfx] [RFC PATCH 1/5] drm/doc/rfc: i915 GuC submission / DRM scheduler integration plan
Hi, On Tue, 11 May 2021 at 15:34, Daniel Vetter wrote: > On Thu, May 06, 2021 at 10:30:45AM -0700, Matthew Brost wrote: > > +No major changes are required to the uAPI for basic GuC submission. The > > only > > +change is a new scheduler attribute: > > I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP. > > +This attribute indicates the 2k i915 user priority levels are statically > > mapped > > +into 3 levels as follows: > > + > > +* -1k to -1 Low priority > > +* 0 Medium priority > > +* 1 to 1k High priority > > + > > +This is needed because the GuC only has 4 priority bands. The highest > > priority > > +band is reserved with the kernel. This aligns with the DRM scheduler > > priority > > +levels too. > > Please Cc: mesa and get an ack from Jason Ekstrand or Ken Graunke on this, > just to be sure. A reference to the actual specs this targets would help. I don't have oneAPI to hand if it's relevant, but the two in graphics world are https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt and https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority - both of them pretty much say that the implementation may do anything or nothing at all, so this isn't a problem for spec conformance, only a matter of user priority (sorry). Cheers, Daniel
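The static mapping discussed above is simple enough to sketch. The following is an illustrative stand-alone model, not i915 code: the enum names and helper are invented here purely to show how 2k user priority levels collapse into the three bands when I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP is set:

```c
#include <stdio.h>

enum band { BAND_LOW, BAND_MEDIUM, BAND_HIGH };

/* Model of the static map described in the RFC text:
 * -1k..-1 -> low, 0 -> medium, 1..1k -> high. */
static enum band map_user_priority(int prio)
{
	if (prio < 0)
		return BAND_LOW;
	if (prio == 0)
		return BAND_MEDIUM;
	return BAND_HIGH;
}
```

Since the spec-side priority hints are best-effort anyway (as noted above for EGL/Vulkan), a coarse map like this is conformant as long as relative ordering within the bands is preserved.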
Re: [RFC PATCH 3/5] drm/i915: Expose logical engine instance to user
On Thu, May 06, 2021 at 10:30:47AM -0700, Matthew Brost wrote: > Expose logical engine instance to user via query engine info IOCTL. This > is required for split-frame workloads as these need to be placed on > engines in a logically contiguous order. The logical mapping can change > based on fusing. Rather than having user have knowledge of the fusing we > simply just expose the logical mapping with the existing query engine > info IOCTL. > > Cc: Tvrtko Ursulin > Cc: Tony Ye > CC: Carl Zhang > Cc: Daniel Vetter > Cc: Jason Ekstrand > Signed-off-by: Matthew Brost > --- > include/uapi/drm/i915_drm.h | 7 ++- Two things on all these 3 patches: - Until we've merged the uapi it shouldn't show up in uapi headers. See what Matt A. has done with a fake local header in Documentation/gpu/rfc which you can pull in. - Since this one is tiny I think just the text in the rfc is good enough, I'd drop this. - Squash the others in with the parallel submit rfc patch so that the structs and long-form text are all in one patch please, makes reviewing the overall thing a bit simpler. Rule is to have a complete change per patch, and then not split things further. > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index 9f331ad629f5..26d2e135aa31 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -2396,14 +2396,19 @@ struct drm_i915_engine_info { > > /** @flags: Engine flags. */ > __u64 flags; > +#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE (1 << 0) > > /** @capabilities: Capabilities of this engine.
*/ > __u64 capabilities; > #define I915_VIDEO_CLASS_CAPABILITY_HEVC (1 << 0) > #define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC (1 << 1) > > + /** Logical engine instance */ I think in the final version that we merge with the uapi this should: - explain why we need this - link to relevant other uapi like the paralle submit extension Cheers, Daniel > + __u16 logical_instance; > + > /** @rsvd1: Reserved fields. */ > - __u64 rsvd1[4]; > + __u16 rsvd1[3]; > + __u64 rsvd2[3]; > }; > > /** > -- > 2.28.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
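Because the new field lands in previously reserved space, userspace must gate on the flag before trusting it. A hedged userspace-side sketch follows; the flag value matches the quoted patch, but the struct here is a trimmed stand-in, not the real drm_i915_engine_info:

```c
#include <stdint.h>

#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE (1u << 0)

/* Stand-in with only the two fields discussed above. */
struct engine_info_mock {
	uint64_t flags;
	uint16_t logical_instance;
};

/* Only trust logical_instance when the kernel set the flag;
 * return -1 when no logical instance was reported. */
static int engine_logical_instance(const struct engine_info_mock *info)
{
	if (!(info->flags & I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE))
		return -1;
	return (int)info->logical_instance;
}
```

An older kernel leaves the reserved bytes zeroed, so the flag check is what keeps a UMD from reading garbage as a logical instance.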
Re: [PATCH v6 04/16] drm/amdkfd: Split kfd suspend from devie exit
On 2021-05-11 2:40 a.m., Christian König wrote: Am 10.05.21 um 18:36 schrieb Andrey Grodzovsky: Helps to expedite HW related stuff to amdgpu_pci_remove Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 ++- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 5f6696a3c778..2b06dee9a0ce 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -170,7 +170,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev) } } -void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev) +void amdgpu_amdkfd_device_fini_sw(struct amdgpu_device *adev) { if (adev->kfd.dev) { kgd2kfd_device_exit(adev->kfd.dev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 14f68c028126..f8e10af99c28 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -127,7 +127,7 @@ void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev, const void *ih_ring_entry); void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev); void amdgpu_amdkfd_device_init(struct amdgpu_device *adev); -void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev); +void amdgpu_amdkfd_device_fini_sw(struct amdgpu_device *adev); int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum kgd_engine_type engine, uint32_t vmid, uint64_t gpu_addr, uint32_t *ib_cmd, uint32_t ib_len); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 357b9bf62a1c..ab6d2a43c9a3 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -858,10 +858,11 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, return kfd->init_complete; } + + Looks like unnecessary white space change to
me. void kgd2kfd_device_exit(struct kfd_dev *kfd) { if (kfd->init_complete) { - kgd2kfd_suspend(kfd, false); Where is the call to this function now? Christian. In patch 'drm/amdgpu: Add early fini callback' in amdgpu_device_ip_fini_early->amdgpu_amdkfd_suspend->kgd2kfd_suspend Andrey device_queue_manager_uninit(kfd->dqm); kfd_interrupt_exit(kfd); kfd_topology_remove_device(kfd);
Re: [PATCH] component: Move host device to end of device lists on binding
On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote: > Within the component device framework this usually isn't that bad > because the real driver work is done at bind time via > component{,master}_ops::bind(). It becomes a problem when the driver > core, or host driver, wants to operate on the component device outside > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The > driver core doesn't understand the relationship between the host device > and the component devices and could possibly try to operate on component > devices when they're already removed from the system or shut down. You really are not supposed to be doing anything with component devices once they have been unbound. You can do stuff with them only between the bind() and the unbind() callbacks for the host device. Access to the host devices outside of that is totally undefined and should not be done. The shutdown callback should be fine as long as the other devices are still bound, but there will be implications if the shutdown order matters. However, randomly pulling devices around in the DPM list sounds to me like a very bad idea. What happens if such re-orderings result in a child device being shutdown after a parent device has been shut down? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [Intel-gfx] [RFC PATCH 2/5] drm/doc/rfc: i915 new parallel submission uAPI plan
On Thu, May 06, 2021 at 10:30:46AM -0700, Matthew Brost wrote: > Add entry for i915 new parallel submission uAPI plan. > > Cc: Tvrtko Ursulin > Cc: Tony Ye > CC: Carl Zhang > Cc: Daniel Vetter > Cc: Jason Ekstrand > Signed-off-by: Matthew Brost > --- > Documentation/gpu/rfc/i915_scheduler.rst | 56 +++- > 1 file changed, 54 insertions(+), 2 deletions(-) > > diff --git a/Documentation/gpu/rfc/i915_scheduler.rst > b/Documentation/gpu/rfc/i915_scheduler.rst > index fa6780a11c86..e3455b33edfe 100644 > --- a/Documentation/gpu/rfc/i915_scheduler.rst > +++ b/Documentation/gpu/rfc/i915_scheduler.rst > @@ -13,7 +13,8 @@ i915 with the DRM scheduler is: > modparam enable_guc > * Lots of rework will need to be done to integrate with DRM scheduler so > no need to nit pick everything in the code, it just should be > - functional and not regress execlists > + functional, no major coding style / layering errors, and not regress > + execlists I guess this hunk should be in the previous patch? > * Update IGTs / selftests as needed to work with GuC submission > * Enable CI on supported platforms for a baseline > * Rework / get CI healthy for GuC submission in place as needed > @@ -67,4 +68,55 @@ levels too. > > New parallel submission uAPI > > -Details to come in a following patch. > +The existing bonding uAPI is completely broken with GuC submission because > +whether a submission is a single context submit or parallel submit isn't > known > +until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple > +contexts in parallel with the GuC the context must be explicitly registered > with > +N contexts and all N contexts must be submitted in a single command to the > GuC. > +This interface doesn't support dynamically changing between N contexts as > the > +bonding uAPI does. Hence the need for a new parallel submission interface. > Also > +the legacy bonding uAPI is quite confusing and not intuitive at all.
I think you should sit together with Jason on irc or so for a bit and get an earful of how it's all broken irrespective of GuC submission or not. Just to hammer in our case :-) > + > +The new parallel submission uAPI consists of 3 parts: > + > +* Export engines logical mapping > +* A 'set_parallel' extension to configure contexts for parallel > + submission > +* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL > + > +Export engines logical mapping > +-- > +Certain use cases require BBs to be placed on engine instances in logical > order > +(e.g. split-frame on gen11+). The logical mapping of engine instances can > change > +based on fusing. Rather than making UMDs be aware of fusing, simply expose > the > +logical mapping with the existing query engine info IOCTL. Also the GuC > +submission interface currently only supports submitting multiple contexts to > +engines in logical order. Maybe highlight more that this is a new restriction with GuC compared to execlist, which is why we need to expose this information to userspace. Also on the platforms thus far supported in upstream there's at most 2 engines of the same type, so really not an issue. > + > +A single bit will be added to drm_i915_engine_info.flags indicating that the > +logical instance has been returned and a new field, > +drm_i915_engine_info.logical_instance, returns the logical instance. > + > +A 'set_parallel' extension to configure contexts for parallel submission > + > +The 'set_parallel' extension configures N contexts for parallel submission. > It > +is a setup step that should be called before using any of the contexts. See > +I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for > +similar existing examples. Once the N contexts are configured for parallel > +submission the execbuf2 IOCTL can be called submitting 1-N BBs in a single > IOCTL.
> +Although submitting less than N BBs is allowed it is not recommended as that > +will likely leave parts of the hardware reserved and idle. Initially only > +support GuC submission. Execlist support can be added later if needed. Can we just require that you always submit N batchbuffers, or does this create a problem for userspace? Allowing things just because is generally not a good idea with uapi, it's better to limit and then allow when there's a need. Ofc if we already have a need then explain why and that's all fine. Also detailed comments on the kerneldoc I'll do in the next patches. > + > +Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and > +i915_context_engines_parallel_submit to the uAPI to implement this extension. > + > +Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL > +--- > +Contexts that have been configured with the
Re: [PATCH v6 01/16] drm/ttm: Remap all page faults to per process dummy page.
On 2021-05-11 2:38 a.m., Christian König wrote: Am 10.05.21 um 18:36 schrieb Andrey Grodzovsky: On device removal reroute all CPU mappings to dummy page. v3: Remove loop to find DRM file and instead access it by vma->vm_file->private_data. Move dummy page installation into a separate function. v4: Map the entire BOs VA space into on demand allocated dummy page on the first fault for that BO. v5: Remove duplicate return. v6: Polish ttm_bo_vm_dummy_page, remove superfluous code. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 57 - include/drm/ttm/ttm_bo_api.h | 2 ++ 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index b31b18058965..e5a9615519d1 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -34,6 +34,8 @@ #include #include #include +#include +#include #include #include #include @@ -380,19 +382,72 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf, } EXPORT_SYMBOL(ttm_bo_vm_fault_reserved); +static void ttm_bo_release_dummy_page(struct drm_device *dev, void *res) +{ + struct page *dummy_page = (struct page *)res; + + __free_page(dummy_page); +} + +vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot) +{ + struct vm_area_struct *vma = vmf->vma; + struct ttm_buffer_object *bo = vma->vm_private_data; + struct drm_device *ddev = bo->base.dev; + vm_fault_t ret = VM_FAULT_NOPAGE; + unsigned long address; + unsigned long pfn; + struct page *page; + + /* Allocate new dummy page to map all the VA range in this VMA to it */ + page = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!page) + return VM_FAULT_OOM; + + pfn = page_to_pfn(page); + + /* Prefault the entire VMA range right away to avoid further faults */ + for (address = vma->vm_start; address < vma->vm_end; address += PAGE_SIZE) { + + if (unlikely(address >= vma->vm_end)) + break; That extra check can be removed as far as I can see. + + if (vma->vm_flags & VM_MIXEDMAP) + ret = vmf_insert_mixed_prot(vma, address, + __pfn_to_pfn_t(pfn, PFN_DEV), + prot); + else + ret = vmf_insert_pfn_prot(vma, address, pfn, prot); + } + + /* Set the page to be freed using drmm release action */ + if (drmm_add_action_or_reset(ddev, ttm_bo_release_dummy_page, page)) + return VM_FAULT_OOM; You should probably move that before inserting the page into the VMA and also free the allocated page if it goes wrong. drmm_add_action_or_reset will automatically release the page if the add action fails, that's the 'reset' part of the function. Andrey Apart from that patch looks good to me, Christian. + + return ret; +} +EXPORT_SYMBOL(ttm_bo_vm_dummy_page); + vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; pgprot_t prot; struct ttm_buffer_object *bo = vma->vm_private_data; + struct drm_device *ddev = bo->base.dev; vm_fault_t ret; + int idx; ret = ttm_bo_vm_reserve(bo, vmf); if (ret) return ret; prot = vma->vm_page_prot; - ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1); + if (drm_dev_enter(ddev, &idx)) { + ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1); + drm_dev_exit(idx); + } else { + ret = ttm_bo_vm_dummy_page(vmf, prot); + } if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) return ret; diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 639521880c29..254ede97f8e3 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -620,4 +620,6 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr, void *buf, int len, int write); bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all); +vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot); + #endif
Re: [PATCH v3 1/1] kernel.h: Split out panic and oops helpers
On 5/11/21 2:41 AM, Andy Shevchenko wrote: kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out panic and oops helpers. There are several purposes of doing this: - dropping dependency in bug.h - dropping a loop by moving out panic_notifier.h - unload kernel.h from something which has its own domain At the same time convert users tree-wide to use new headers, although for the time being include new header back to kernel.h to avoid twisted indirected includes for existing users. Signed-off-by: Andy Shevchenko Reviewed-by: Bjorn Andersson Acked-by: Mike Rapoport Acked-by: Corey Minyard Acked-by: Christian Brauner Acked-by: Arnd Bergmann Acked-by: Kees Cook Acked-by: Wei Liu Acked-by: Rasmus Villemoes Co-developed-by: Andrew Morton Signed-off-by: Andrew Morton Acked-by: Sebastian Reichel Acked-by: Luis Chamberlain Acked-by: Stephen Boyd Acked-by: Thomas Bogendoerfer Acked-by: Helge Deller # parisc --- v3: rebased on top of v5.13-rc1, collected a few more tags Note WRT Andrew's SoB tag above: I have added it since part of the cases I took from him. Andrew, feel free to amend or tell me how you want me to do. Acked-by: Alex Elder . . . diff --git a/drivers/net/ipa/ipa_smp2p.c b/drivers/net/ipa/ipa_smp2p.c index a5f7a79a1923..34b68dc43886 100644 --- a/drivers/net/ipa/ipa_smp2p.c +++ b/drivers/net/ipa/ipa_smp2p.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include . . .
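For driver authors, the practical effect of the split is mostly an include change. The following is a hedged kernel-style sketch of typical usage after the split (the header name matches this patch's description; the callback and driver names here are hypothetical):

```c
/* After the split, code using the panic notifier chain includes the
 * dedicated header instead of relying on <linux/kernel.h> pulling it in. */
#include <linux/panic_notifier.h>
#include <linux/notifier.h>

static int my_panic_cb(struct notifier_block *nb, unsigned long action,
		       void *data)
{
	/* e.g. flush hardware state or emit a last log record */
	return NOTIFY_DONE;
}

static struct notifier_block my_panic_nb = {
	.notifier_call = my_panic_cb,
};

static void my_driver_init_panic_hook(void)
{
	atomic_notifier_chain_register(&panic_notifier_list, &my_panic_nb);
}
```

This is exactly the dependency the cover text describes: users of `panic_notifier_list` no longer drag in all of kernel.h.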
Re: [PATCH 1/1] drm/vc4: Remove redundant error printing in vc4_ioremap_regs()
On Tue, May 11, 2021 at 05:29:23PM +0800, Zhen Lei wrote: > When devm_ioremap_resource() fails, a clear enough error message will be > printed by its subfunction __devm_ioremap_resource(). The error > information contains the device name, failure cause, and possibly resource > information. > > Therefore, remove the error printing here to simplify code and reduce the > binary size. > > Reported-by: Hulk Robot > Signed-off-by: Zhen Lei Merged, thanks Maxime
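The resulting driver pattern looks roughly like this generic kernel-style sketch (an illustration of the idiom, not the exact vc4 hunk):

```c
/* devm_ioremap_resource() already prints a detailed error on failure
 * via __devm_ioremap_resource(), so the caller just propagates the
 * error pointer without its own dev_err(). */
static void __iomem *ioremap_regs_sketch(struct platform_device *pdev,
					 int index)
{
	struct resource *res;

	res = platform_get_resource(pdev, IORESOURCE_MEM, index);
	/* no dev_err() here -- the core already logged device name,
	 * cause, and resource details */
	return devm_ioremap_resource(&pdev->dev, res);
}
```

Callers check the result with IS_ERR()/PTR_ERR() as usual; only the redundant print goes away.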
Re: [Intel-gfx] [RFC PATCH 1/5] drm/doc/rfc: i915 GuC submission / DRM scheduler integration plan
On Thu, May 06, 2021 at 10:30:45AM -0700, Matthew Brost wrote: > Add entry for i915 GuC submission / DRM scheduler integration plan. > Follow up patch with details of new parallel submission uAPI to come. > > Cc: Jon Bloomfield > Cc: Jason Ekstrand > Cc: Dave Airlie > Cc: Daniel Vetter > Cc: Jason Ekstrand > Cc: dri-devel@lists.freedesktop.org > Signed-off-by: Matthew Brost Would be good to Cc: some drm/scheduler folks here for the next round: $ scripts/get_maintainer.pl -f -- drivers/gpu/drm/scheduler/ says we have maybe the following missing: "Christian König" Luben Tuikov Alex Deucher Steven Price Lee Jones did a ton of warning fixes over the entire tree, so doesn't care about drm/scheduler design directly. > --- > Documentation/gpu/rfc/i915_scheduler.rst | 70 > Documentation/gpu/rfc/index.rst | 4 ++ > 2 files changed, 74 insertions(+) > create mode 100644 Documentation/gpu/rfc/i915_scheduler.rst > > diff --git a/Documentation/gpu/rfc/i915_scheduler.rst > b/Documentation/gpu/rfc/i915_scheduler.rst > new file mode 100644 > index ..fa6780a11c86 > --- /dev/null > +++ b/Documentation/gpu/rfc/i915_scheduler.rst > @@ -0,0 +1,70 @@ > += > +I915 GuC Submission/DRM Scheduler Section > += > + > +Upstream plan > += > +For upstream the overall plan for landing GuC submission and integrating the > +i915 with the DRM scheduler is: > + > +* Merge basic GuC submission > + * Basic submission support for all gen11+ platforms > + * Not enabled by default on any current platforms but can be enabled via > + modparam enable_guc > + * Lots of rework will need to be done to integrate with DRM scheduler so > + no need to nit pick everything in the code, it just should be > + functional and not regress execlists > + * Update IGTs / selftests as needed to work with GuC submission > + * Enable CI on supported platforms for a baseline > + * Rework / get CI healthy for GuC submission in place as needed > +* Merge new parallel submission uAPI > + * Bonding uAPI completely incompatible with
GuC submission Maybe clarify that this isn't the only issue with the bonding uapi, so perhaps add "Plus it has severe design issues in general, which is why we want to retire it no matter what". Or something like that. Not sure we should go into full details here, maybe as part of the next patch about parallel submit and all that. > + * New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step > + which configures contexts N wide > + * After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to > + a context in a single execbuf IOCTL and the batches run on the GPU in > + parallel > + * Initially only for GuC submission but execlists can be supported if > + needed > +* Convert the i915 to use the DRM scheduler > + * GuC submission backend fully integrated with DRM scheduler > + * All request queues removed from backend (e.g. all backpressure > + handled in DRM scheduler) > + * Resets / cancels hook in DRM scheduler > + * Watchdog hooks into DRM scheduler > + * Lots of complexity of the GuC backend can be pulled out once > + integrated with DRM scheduler (e.g. state machine gets > + simpler, locking gets simpler, etc...) > + * Execlist backend will do the minimum required to hook in the DRM > + scheduler so it can live next to the fully integrated GuC backend > + * Legacy interface > + * Features like timeslicing / preemption / virtual engines would > + be difficult to integrate with the DRM scheduler and these > + features are not required for GuC submission as the GuC does > + these things for us > + * ROI low on fully integrating into DRM scheduler > + * Fully integrating would add lots of complexity to DRM > + scheduler > + * Port i915 priority inheritance / boosting feature in DRM scheduler Maybe a few words on what this does and why we care? Just so drm/scheduler people know what's coming. > + * Remove in-order completion assumptions from DRM scheduler I think it'd be good to put a few words here why we need this.
We want to use drm scheduler for dependencies, but rely on the hw/fw scheduler (or well backend for execlist) to handle preemption, round-robin and that kind of stuff. Hence we want to have all runnable requests in the backend (excluding backpressure and stuff like that), and they can complete out-of-order. Maybe also highlight this one in the commit message to get drm/scheduler folks' attention on this and the previous one for discussion. > + * Pull out i915 priority levels and use DRM priority levels > + * Optimize DRM scheduler as needed
Re: [PATCH v7 0/3] drm/i915/display: Try YCbCr420 color when RGB fails
On Mon, May 10, 2021 at 03:33:46PM +0200, Werner Sembach wrote: > When encoder validation of a display mode fails, retry with less bandwidth > heavy YCbCr420 color mode, if available. This enables some HDMI 1.4 setups > to support 4k60Hz output, which previously failed silently. > > AMDGPU had nearly the exact same issue. This problem description is > therefore copied from my commit message of the AMDGPU patch. > > On some setups, while the monitor and the gpu support display modes with > pixel clocks of up to 600MHz, the link encoder might not. This prevents > YCbCr444 and RGB encoding for 4k60Hz, but YCbCr420 encoding might still be > possible. However, which color mode is used is decided before the link > encoder capabilities are checked. This patch fixes the problem by retrying > to find a display mode with YCbCr420 enforced and using it, if it is > valid. > > This patchset is revision 7. Fixed a rebase issue in 1/3 and moved message > from error output to debug output in 2/3. Looks good and CI seems happy. Series pushed to drm-intel-next. Thanks. -- Ville Syrjälä Intel
Re: [PATCH 6/7] drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend
On 5/11/21 4:09 PM, Christian König wrote: Am 11.05.21 um 16:06 schrieb Thomas Hellström (Intel): On 5/11/21 3:58 PM, Christian König wrote: Am 11.05.21 um 15:25 schrieb Thomas Hellström: Most logical place to introduce TTM buffer objects is as an i915 gem object backend. We need to add some ops to account for added functionality like delayed delete and LRU list manipulation. Initially we support only LMEM and SYSTEM memory, but SYSTEM (which in this case means evicted LMEM objects) is not visible to i915 GEM yet. The plan is to move the i915 gem system region over to the TTM system memory type in upcoming patches. We set up GPU bindings directly both from LMEM and from the system region, as there is no need to use the legacy TTM_TT memory type. We reserve that for future porting of GGTT bindings to TTM. There are some changes to TTM to allow for purging system memory buffer objects and to refuse swapping of some objects: Unfortunately i915 gem still relies heavily on short-term object pinning, and we've chosen to keep short-term-pinned buffer objects on the TTM LRU lists for now, meaning that we need some sort of mechanism to tell TTM they are not swappable. A longer term goal is to get rid of the short-term pinning. Well just use the eviction_valuable interface for this. Yes, we do that for vram/lmem eviction, but we have nothing similar for system swapping. Do I understand you correctly that you want me to add a call to eviction_valuable() also for that instead of swap_possible()? You should already have that. eviction_valuable is called in both cases. Hmm. I can only see it called from ttm_mem_evict_first() which is not in the swapping path? Or do I miss something? Thanks, Thomas
Re: [RFC] Implicit vs explicit user fence sync
On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote: > On 11.05.21 at 09:31, Daniel Vetter wrote: > > [SNIP] > > > > And that's just the one ioctl I know is big trouble, I'm sure we'll find > > > > more funny corner cases when we roll out explicit user fencing. > > > I think we can just ignore sync_file. As far as it concerns me that UAPI > > > is > > > pretty much dead. > > Uh that's rather bold. Android is built on it. Currently atomic kms is > > built on it. > > To be honest I don't think we care about Android at all. we = amd or we = upstream here? > > > What we should support is drm_syncobj, but that also only as an in-fence > > > since that's what our hardware supports. > > Convince Android folks, minimally. Probably a lot more. Yes with hindsight > > we should have just gone for drm_syncobj instead of the sync_file thing, > > but hindsight and all that. > > > > This is kinda why I don't think trying to support the existing uapi with > > userspace fences underneath with some magic tricks is a good idea. It's > > just a pile of work, plus it's not really architecturally clean. > > > > > > Another one that looks very sketchy right now is buffer sharing between > > > > different userspace drivers, like compute <-> media (if you have some > > > > fancy AI pipeline in your media workload, as an example). > > > Yeah, we are certainly going to get that. But only inside the same driver, > > > so not much of a problem. > > Why is this not much of a problem if it's just within one driver? > > Because inside the same driver I can easily add the waits before submitting > the MM work as necessary. What is MM work here now? > > > > > Adding implicit synchronization on top of that is then rather trivial. > > > > Well that's what I disagree with, since I already see some problems > > > > that I > > > > don't think we can overcome (the atomic ioctl is one). And that's with > > > > us > > > > only having a fairly theoretical understanding of the overall situation.
> > > But how should we then ever support user fences with the atomic IOCTL? > > > > > > We can't wait in user space since that will disable the support for > > > waiting > > > in the hardware. > > Well, figure it out :-) > > > > This is exactly why I'm not seeing anything solved with just rolling a > > function call to a bunch of places, because it's pretending all things are > > solved when clearly that's not the case. > > > > I really think what we need is to first figure out how to support > > userspace fences as explicit entities across the stack, maybe with > > something like this order: > > 1. enable them purely within a single userspace driver (like vk with > > winsys disabled, or something else like that except not amd because > > there's this amdkfd split for "real" compute) > > 1a. including atomic ioctl, e.g. for vk direct display support this can be > > used without cross-process sharing, new winsys protocols and all that fun > > 2. figure out how to transport these userspace fences with something like > > drm_syncobj > > 2a. figure out the compat story for drivers which don't do userspace fences > > 2b. figure out how to absorb the overhead if the winsys/compositor doesn't > > support explicit sync > > 3. maybe figure out how to make this all happen magically with implicit > > sync, if we really, really care > > > > If we do 3 before we've nailed all these problems, we're just guaranteeing > > we'll get the wrong solutions and so we'll then have 3 ways of doing > > userspace fences > > - the butchered implicit one that didn't quite work > > - the explicit one > > - the not-so-butchered implicit one with the lessons from the properly > > done explicit one > > > > The thing is, if you have no idea how to integrate userspace fences > > explicitly into atomic ioctl, then you definitely have no idea how to do > > it implicitly :-) > > Well I agree on that. But the question is still how would you do explicit > with atomic?
If you supply a userspace fence (is that what we call them now?) as in-fence, then you're only allowed to get a userspace fence as out-fence. That way we - don't block anywhere we shouldn't - don't create a dma_fence out of a userspace fence The problem is this completely breaks your "magically make implicit fencing with userspace fences" plan. So I have a plan here, what was yours? > Transporting fences between processes is not the fundamental problem here, > but rather the question how we represent all this in the kernel? > > In other words I think what you outlined above is just approaching it from > the wrong side again. Instead of looking what the kernel needs to support > this you take a look at userspace and the requirements there. Uh ... that was my idea here? That's why I put "build userspace fences in userspace only" as the very first thing. Then extend to winsys and atomic/display and all these cases where things get more tricky. I agree that transporting the fences is easy, which is
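The constraint stated at the top of this mail can be modelled in a couple of lines (the types and names are invented for illustration; nothing here is a real DRM API):

```c
#include <assert.h>

/* Two fence flavours: kernel dma_fences may be waited on inside the
 * kernel, userspace fences may not. */
enum fence_kind { KERNEL_DMA_FENCE, USERSPACE_FENCE };

/*
 * The rule under discussion: a userspace in-fence "infects" the commit,
 * so the only legal out-fence is again a userspace fence; a plain
 * dma_fence input keeps the classic dma_fence output.
 */
static enum fence_kind out_fence_kind(enum fence_kind in_fence)
{
	return in_fence == USERSPACE_FENCE ? USERSPACE_FENCE
					   : KERNEL_DMA_FENCE;
}
```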
[PATCH] drm/msm/dpu: fix smart dma support
Downstream driver uses dpu->caps->smart_dma_rev to update sspp->cap->features with the bit corresponding to the supported SmartDMA version. Upstream driver does not do this, resulting in SSPP subdriver not enabling setup_multirect callback. Make SSPP subdriver check global smart_dma_rev to decide if setup_multirect should be enabled. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 10 +- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h | 16 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c| 9 + 3 files changed, 22 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c index b569030a0847..036334e3d99d 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c @@ -157,7 +157,7 @@ static const struct dpu_caps sdm845_dpu_caps = { .max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH, .max_mixer_blendstages = 0xb, .qseed_type = DPU_SSPP_SCALER_QSEED3, - .smart_dma_rev = DPU_SSPP_SMART_DMA_V2, + .smart_dma_rev = DPU_SMART_DMA_V2, .ubwc_version = DPU_HW_UBWC_VER_20, .has_src_split = true, .has_dim_layer = true, @@ -173,7 +173,7 @@ static const struct dpu_caps sc7180_dpu_caps = { .max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH, .max_mixer_blendstages = 0x9, .qseed_type = DPU_SSPP_SCALER_QSEED4, - .smart_dma_rev = DPU_SSPP_SMART_DMA_V2, + .smart_dma_rev = DPU_SMART_DMA_V2, .ubwc_version = DPU_HW_UBWC_VER_20, .has_dim_layer = true, .has_idle_pc = true, @@ -185,7 +185,7 @@ static const struct dpu_caps sm8150_dpu_caps = { .max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH, .max_mixer_blendstages = 0xb, .qseed_type = DPU_SSPP_SCALER_QSEED3, - .smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */ + .smart_dma_rev = DPU_SMART_DMA_V2, /* TODO: v2.5 */ .ubwc_version = DPU_HW_UBWC_VER_30, .has_src_split = true, .has_dim_layer = true, @@ -201,7 +201,7 @@ static const struct dpu_caps sm8250_dpu_caps = { .max_mixer_width = 
DEFAULT_DPU_OUTPUT_LINE_WIDTH, .max_mixer_blendstages = 0xb, .qseed_type = DPU_SSPP_SCALER_QSEED3LITE, - .smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */ + .smart_dma_rev = DPU_SMART_DMA_V2, /* TODO: v2.5 */ .ubwc_version = DPU_HW_UBWC_VER_40, .has_src_split = true, .has_dim_layer = true, @@ -215,7 +215,7 @@ static const struct dpu_caps sc7280_dpu_caps = { .max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH, .max_mixer_blendstages = 0x7, .qseed_type = DPU_SSPP_SCALER_QSEED4, - .smart_dma_rev = DPU_SSPP_SMART_DMA_V2, + .smart_dma_rev = DPU_SMART_DMA_V2, .ubwc_version = DPU_HW_UBWC_VER_30, .has_dim_layer = true, .has_idle_pc = true, diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h index 4dfd8a20ad5c..04ebccd92d4e 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h @@ -70,6 +70,18 @@ enum { DPU_HW_UBWC_VER_40 = 0x400, }; +/** + * SmartDMA support + * @DPU_SMART_DMA_UNSUPPORTED, SmartDMA not support + * @DPU_SMART_DMA_V1, SmartDMA 1.0 support + * @DPU_SMART_DMA_V2, SmartDMA 2.0 support + */ +enum { + DPU_SMART_DMA_UNSUPPORTED, + DPU_SMART_DMA_V1, + DPU_SMART_DMA_V2, +}; + /** * MDP TOP BLOCK features * @DPU_MDP_PANIC_PER_PIPE Panic configuration needs to be be done per pipe @@ -104,8 +116,6 @@ enum { * @DPU_SSPP_QOS,SSPP support QoS control, danger/safe/creq * @DPU_SSPP_QOS_8LVL, SSPP support 8-level QoS control * @DPU_SSPP_EXCL_RECT, SSPP supports exclusion rect - * @DPU_SSPP_SMART_DMA_V1, SmartDMA 1.0 support - * @DPU_SSPP_SMART_DMA_V2, SmartDMA 2.0 support * @DPU_SSPP_TS_PREFILL Supports prefill with traffic shaper * @DPU_SSPP_TS_PREFILL_REC1 Supports prefill with traffic shaper multirec * @DPU_SSPP_CDP Supports client driven prefetch @@ -124,8 +134,6 @@ enum { DPU_SSPP_QOS, DPU_SSPP_QOS_8LVL, DPU_SSPP_EXCL_RECT, - DPU_SSPP_SMART_DMA_V1, - DPU_SSPP_SMART_DMA_V2, DPU_SSPP_TS_PREFILL, DPU_SSPP_TS_PREFILL_REC1, DPU_SSPP_CDP, diff --git 
a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c index 34d81aa16041..3ce4c5cd5d05 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c @@ -647,7 +647,8 @@ static void dpu_hw_sspp_setup_cdp(struct dpu_hw_pipe *ctx, } static void _setup_layer_ops(struct dpu_hw_pipe *c, - unsigned long features) + unsigned long features, + int smart_dma_rev) { if
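The functional change is easiest to see stripped of the driver plumbing: the SmartDMA revision now comes from the global catalog caps rather than per-SSPP feature bits, and the layer-ops setup keys off it. A simplified model (types and names condensed for illustration; this is not the literal dpu_hw_sspp.c code):

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the new catalog-level enum from the patch. */
enum { SMART_DMA_UNSUPPORTED, SMART_DMA_V1, SMART_DMA_V2 };

struct layer_ops {
	void (*setup_multirect)(void);
};

static void setup_multirect_impl(void)
{
	/* would program the SSPP multirect registers here */
}

/* Wire up setup_multirect only when the catalog reports SmartDMA. */
static void setup_layer_ops(struct layer_ops *ops, int smart_dma_rev)
{
	ops->setup_multirect = NULL;
	if (smart_dma_rev == SMART_DMA_V1 || smart_dma_rev == SMART_DMA_V2)
		ops->setup_multirect = setup_multirect_impl;
}
```

The bug being fixed is exactly the `NULL` branch: before the patch the per-SSPP feature bit was never set upstream, so the callback stayed unset even on SmartDMA-capable hardware.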
[PATCH] drm/msm/dpu: simplify dpu_core_irq_en/disable helpers
dpu_core_irq_en/disable helpers are always called with the irq_count equal to 1. Merge them with _dpu_core_irq_en/disable functions and make them handle just one interrupt index at a time. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c | 50 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h | 20 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 4 +- 3 files changed, 18 insertions(+), 56 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c index c10761ea191c..0ee9ac21e24a 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c @@ -63,11 +63,11 @@ int dpu_core_irq_idx_lookup(struct dpu_kms *dpu_kms, } /** - * _dpu_core_irq_enable - enable core interrupt given by the index + * dpu_core_irq_enable - enable core interrupt given by the index * @dpu_kms: Pointer to dpu kms context * @irq_idx: interrupt index */ -static int _dpu_core_irq_enable(struct dpu_kms *dpu_kms, int irq_idx) +int dpu_core_irq_enable(struct dpu_kms *dpu_kms, int irq_idx) { unsigned long irq_flags; int ret = 0, enable_count; @@ -85,6 +85,8 @@ static int _dpu_core_irq_enable(struct dpu_kms *dpu_kms, int irq_idx) } enable_count = atomic_read(&dpu_kms->irq_obj.enable_counts[irq_idx]); + if (enable_count) + DRM_ERROR("irq_idx=%d enable_count=%d\n", irq_idx, enable_count); DRM_DEBUG_KMS("irq_idx=%d enable_count=%d\n", irq_idx, enable_count); trace_dpu_core_irq_enable_idx(irq_idx, enable_count); @@ -109,31 +111,12 @@ static int _dpu_core_irq_enable(struct dpu_kms *dpu_kms, int irq_idx) return ret; } -int dpu_core_irq_enable(struct dpu_kms *dpu_kms, int *irq_idxs, u32 irq_count) -{ - int i, ret = 0, counts; - - if (!irq_idxs || !irq_count) { - DPU_ERROR("invalid params\n"); - return -EINVAL; - } - - counts = atomic_read(&dpu_kms->irq_obj.enable_counts[irq_idxs[0]]); - if (counts) - DRM_ERROR("irq_idx=%d enable_count=%d\n", irq_idxs[0], counts); - - for (i = 0; (i < irq_count) && 
!ret; i++) - ret = _dpu_core_irq_enable(dpu_kms, irq_idxs[i]); - - return ret; -} - /** - * _dpu_core_irq_disable - disable core interrupt given by the index + * dpu_core_irq_disable - disable core interrupt given by the index * @dpu_kms: Pointer to dpu kms context * @irq_idx: interrupt index */ -static int _dpu_core_irq_disable(struct dpu_kms *dpu_kms, int irq_idx) +int dpu_core_irq_disable(struct dpu_kms *dpu_kms, int irq_idx) { int ret = 0, enable_count; @@ -148,6 +131,8 @@ static int _dpu_core_irq_disable(struct dpu_kms *dpu_kms, int irq_idx) } enable_count = atomic_read(_kms->irq_obj.enable_counts[irq_idx]); + if (enable_count > 1) + DRM_ERROR("irq_idx=%d enable_count=%d\n", irq_idx, enable_count); DRM_DEBUG_KMS("irq_idx=%d enable_count=%d\n", irq_idx, enable_count); trace_dpu_core_irq_disable_idx(irq_idx, enable_count); @@ -164,25 +149,6 @@ static int _dpu_core_irq_disable(struct dpu_kms *dpu_kms, int irq_idx) return ret; } -int dpu_core_irq_disable(struct dpu_kms *dpu_kms, int *irq_idxs, u32 irq_count) -{ - int i, ret = 0, counts; - - if (!irq_idxs || !irq_count) { - DPU_ERROR("invalid params\n"); - return -EINVAL; - } - - counts = atomic_read(_kms->irq_obj.enable_counts[irq_idxs[0]]); - if (counts == 2) - DRM_ERROR("irq_idx=%d enable_count=%d\n", irq_idxs[0], counts); - - for (i = 0; (i < irq_count) && !ret; i++) - ret = _dpu_core_irq_disable(dpu_kms, irq_idxs[i]); - - return ret; -} - u32 dpu_core_irq_read(struct dpu_kms *dpu_kms, int irq_idx, bool clear) { if (!dpu_kms->hw_intr) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h index e30775e6585b..2ac781738e83 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h @@ -43,34 +43,30 @@ int dpu_core_irq_idx_lookup( uint32_t instance_idx); /** - * dpu_core_irq_enable - IRQ helper function for enabling one or more IRQs + * dpu_core_irq_enable - IRQ helper function for enabling IRQ * @dpu_kms: DPU handle 
- * @irq_idxs: Array of irq index - * @irq_count: Number of irq_idx provided in the array + * @irq_idx: irq index * @return:0 for success enabling IRQ, otherwise failure * * This function increments count on each enable and decrements on each - * disable. Interrupts is enabled if count is 0 before increment. + * disable. Interrupt is enabled if count is 0 before increment. */ int dpu_core_irq_enable( struct dpu_kms *dpu_kms, -
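What survives the simplification is the per-index refcount: the hardware mask is only touched when the count crosses zero. A toy model of that invariant (names invented for illustration; the real dpu_core_irq code also takes locks and validates irq_idx):

```c
#include <assert.h>

#define NUM_IRQS 8

static int enable_counts[NUM_IRQS];	/* models irq_obj.enable_counts[] */
static int hw_unmasked[NUM_IRQS];	/* models the actual interrupt mask */

static void core_irq_enable(int irq_idx)
{
	if (enable_counts[irq_idx]++ == 0)
		hw_unmasked[irq_idx] = 1;	/* first user: unmask in HW */
}

static void core_irq_disable(int irq_idx)
{
	if (--enable_counts[irq_idx] == 0)
		hw_unmasked[irq_idx] = 0;	/* last user: mask in HW */
}
```

The new DRM_ERROR paths in the patch simply flag the cases where the count is unexpectedly nonzero on enable (or above one on disable), which the removed array wrappers used to check on the first index only.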
Re: [PATCH 6/7] drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend
On 11.05.21 at 16:06, Thomas Hellström (Intel) wrote: On 5/11/21 3:58 PM, Christian König wrote: On 11.05.21 at 15:25, Thomas Hellström wrote: Most logical place to introduce TTM buffer objects is as an i915 gem object backend. We need to add some ops to account for added functionality like delayed delete and LRU list manipulation. Initially we support only LMEM and SYSTEM memory, but SYSTEM (which in this case means evicted LMEM objects) is not visible to i915 GEM yet. The plan is to move the i915 gem system region over to the TTM system memory type in upcoming patches. We set up GPU bindings directly both from LMEM and from the system region, as there is no need to use the legacy TTM_TT memory type. We reserve that for future porting of GGTT bindings to TTM. There are some changes to TTM to allow for purging system memory buffer objects and to refuse swapping of some objects: Unfortunately i915 gem still relies heavily on short-term object pinning, and we've chosen to keep short-term-pinned buffer objects on the TTM LRU lists for now, meaning that we need some sort of mechanism to tell TTM they are not swappable. A longer term goal is to get rid of the short-term pinning. Well just use the eviction_valuable interface for this. Yes, we do that for vram/lmem eviction, but we have nothing similar for system swapping. Do I understand you correctly that you want me to add a call to eviction_valuable() also for that instead of swap_possible()? You should already have that. eviction_valuable is called in both cases. In general please make separate patches for the TTM changes and for the i915 changes using them for easier review. I'll respin with a split. Do you want me to do the same also for the other two patches that minimally touch TTM? Yes, that makes it much easier to review the general usefulness of interface changes. Thanks, Christian. Thanks, Thomas
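For readers not familiar with the interface being discussed: eviction_valuable is a per-BO driver callback TTM consults before moving an object out of its current placement, and a driver that keeps short-term-pinned objects on the LRU can simply answer no for them. The shape of the check, heavily simplified (stub types for illustration, not the real ttm_bo/ttm_device_funcs API):

```c
#include <assert.h>
#include <stdbool.h>

struct bo {
	int pin_count;	/* stand-in for the driver's short-term pin state */
};

/* TTM would call this before evicting/swapping; pinned objects say no. */
static bool eviction_valuable(const struct bo *bo)
{
	return bo->pin_count == 0;
}
```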
[PATCH 1/2] drm/msm/dpu: simplify clocks handling
DPU driver contains code to parse clock items from device tree into special data struct and then enable/disable/set rate for the clocks using that data struct. However, the DPU driver itself uses only the parsing and enabling/disabling part (the rate setting is used by the DP driver). Move this implementation to the DP driver (which actually uses rate setting) and replace hand-coded enable/disable/get loops in the DPU with the respective clk_bulk operations. The put operation is removed completely because it is handled using devres instead. DP implementation is unchanged for now. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/Makefile | 2 +- drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 24 ++- drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.h | 6 +- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 46 +++-- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 4 +- drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c | 26 +++ .../dpu1/dpu_io_util.c => dp/dp_clk_util.c} | 69 +-- .../dpu1/dpu_io_util.h => dp/dp_clk_util.h} | 2 - drivers/gpu/drm/msm/dp/dp_parser.h| 2 +- drivers/gpu/drm/msm/msm_drv.c | 49 + drivers/gpu/drm/msm/msm_drv.h | 1 + 11 files changed, 84 insertions(+), 147 deletions(-) rename drivers/gpu/drm/msm/{disp/dpu1/dpu_io_util.c => dp/dp_clk_util.c} (61%) rename drivers/gpu/drm/msm/{disp/dpu1/dpu_io_util.h => dp/dp_clk_util.h} (92%) diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile index 610d630326bb..6621b75e3c7b 100644 --- a/drivers/gpu/drm/msm/Makefile +++ b/drivers/gpu/drm/msm/Makefile @@ -71,7 +71,6 @@ msm-y := \ disp/dpu1/dpu_hw_top.o \ disp/dpu1/dpu_hw_util.o \ disp/dpu1/dpu_hw_vbif.o \ - disp/dpu1/dpu_io_util.o \ disp/dpu1/dpu_kms.o \ disp/dpu1/dpu_mdss.o \ disp/dpu1/dpu_plane.o \ @@ -104,6 +103,7 @@ msm-$(CONFIG_DRM_MSM_GPU_STATE) += adreno/a6xx_gpu_state.o msm-$(CONFIG_DRM_MSM_DP)+= dp/dp_aux.o \ dp/dp_catalog.o \ + dp/dp_clk_util.o \ dp/dp_ctrl.o \ dp/dp_display.o \ dp/dp_drm.o \ diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c index 7cba5bbdf4b7..ec3595b48bef 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c @@ -284,17 +284,6 @@ void dpu_core_perf_crtc_release_bw(struct drm_crtc *crtc) } } -static int _dpu_core_perf_set_core_clk_rate(struct dpu_kms *kms, u64 rate) -{ - struct dss_clk *core_clk = kms->perf.core_clk; - - if (core_clk->max_rate && (rate > core_clk->max_rate)) - rate = core_clk->max_rate; - - core_clk->rate = rate; - return dev_pm_opp_set_rate(>pdev->dev, core_clk->rate); -} - static u64 _dpu_core_perf_get_core_clk_rate(struct dpu_kms *kms) { u64 clk_rate = kms->perf.perf_tune.min_core_clk; @@ -306,7 +295,7 @@ static u64 _dpu_core_perf_get_core_clk_rate(struct dpu_kms *kms) dpu_cstate = to_dpu_crtc_state(crtc->state); clk_rate = max(dpu_cstate->new_perf.core_clk_rate, clk_rate); - clk_rate = clk_round_rate(kms->perf.core_clk->clk, + clk_rate = clk_round_rate(kms->perf.core_clk, clk_rate); } } @@ -405,10 +394,11 @@ int dpu_core_perf_crtc_update(struct drm_crtc *crtc, trace_dpu_core_perf_update_clk(kms->dev, stop_req, clk_rate); - ret = _dpu_core_perf_set_core_clk_rate(kms, clk_rate); + if (clk_rate > kms->perf.max_core_clk_rate) + clk_rate = kms->perf.max_core_clk_rate; + ret = dev_pm_opp_set_rate(>pdev->dev, clk_rate); if (ret) { - DPU_ERROR("failed to set %s clock rate %llu\n", - kms->perf.core_clk->clk_name, clk_rate); + DPU_ERROR("failed to set core clock rate %llu\n", clk_rate); return ret; } @@ -529,13 +519,13 @@ void dpu_core_perf_destroy(struct dpu_core_perf *perf) int dpu_core_perf_init(struct dpu_core_perf *perf, struct drm_device *dev, struct dpu_mdss_cfg *catalog, - struct dss_clk *core_clk) + struct clk *core_clk) { perf->dev = dev; perf->catalog = catalog; perf->core_clk = core_clk; - perf->max_core_clk_rate = core_clk->max_rate; + perf->max_core_clk_rate = clk_get_rate(core_clk); if (!perf->max_core_clk_rate) { DPU_DEBUG("optional max core clk rate, use 
default\n"); perf->max_core_clk_rate = DPU_PERF_DEFAULT_MAX_CORE_CLK_RATE; diff --git
[PATCH 0/2] drm/msm: rework clock handling
msm_dss_clk_*() functions significantly duplicate clk_bulk_* family of functions. Drop custom code and use bulk clocks directly. Dmitry Baryshkov (2): drm/msm/dpu: simplify clocks handling drm/msm/dp: rewrite dss_module_power to use bulk clock functions drivers/gpu/drm/msm/Makefile | 1 - drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 24 +--- drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.h | 6 +- drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c | 187 -- drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.h | 40 -- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 46 ++- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 4 +- drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c | 26 ++-- drivers/gpu/drm/msm/dp/dp_ctrl.c | 19 ++- drivers/gpu/drm/msm/dp/dp_parser.c| 21 ++- drivers/gpu/drm/msm/dp/dp_parser.h| 17 ++- drivers/gpu/drm/msm/dp/dp_power.c | 81 ++- drivers/gpu/drm/msm/msm_drv.c | 49 +++ drivers/gpu/drm/msm/msm_drv.h | 1 + 14 files changed, 164 insertions(+), 358 deletions(-) delete mode 100644 drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c delete mode 100644 drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.h
[PATCH 2/2] drm/msm/dp: rewrite dss_module_power to use bulk clock functions
In order to simplify DP code, drop hand-coded loops over clock arrays, replacing them with clk_bulk_* functions. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/Makefile | 1 - drivers/gpu/drm/msm/dp/dp_clk_util.c | 120 --- drivers/gpu/drm/msm/dp/dp_clk_util.h | 38 - drivers/gpu/drm/msm/dp/dp_ctrl.c | 19 ++--- drivers/gpu/drm/msm/dp/dp_parser.c | 21 - drivers/gpu/drm/msm/dp/dp_parser.h | 17 +++- drivers/gpu/drm/msm/dp/dp_power.c| 81 +- 7 files changed, 83 insertions(+), 214 deletions(-) delete mode 100644 drivers/gpu/drm/msm/dp/dp_clk_util.c delete mode 100644 drivers/gpu/drm/msm/dp/dp_clk_util.h diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile index 6621b75e3c7b..0c1a559dd2fc 100644 --- a/drivers/gpu/drm/msm/Makefile +++ b/drivers/gpu/drm/msm/Makefile @@ -103,7 +103,6 @@ msm-$(CONFIG_DRM_MSM_GPU_STATE) += adreno/a6xx_gpu_state.o msm-$(CONFIG_DRM_MSM_DP)+= dp/dp_aux.o \ dp/dp_catalog.o \ - dp/dp_clk_util.o \ dp/dp_ctrl.o \ dp/dp_display.o \ dp/dp_drm.o \ diff --git a/drivers/gpu/drm/msm/dp/dp_clk_util.c b/drivers/gpu/drm/msm/dp/dp_clk_util.c deleted file mode 100644 index 44a4fc59ff31.. --- a/drivers/gpu/drm/msm/dp/dp_clk_util.c +++ /dev/null @@ -1,120 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* Copyright (c) 2012-2015, 2017-2018, The Linux Foundation. - * All rights reserved. - */ - -#include -#include -#include -#include -#include - -#include - -#include "dp_clk_util.h" - -void msm_dss_put_clk(struct dss_clk *clk_arry, int num_clk) -{ - int i; - - for (i = num_clk - 1; i >= 0; i--) { - if (clk_arry[i].clk) - clk_put(clk_arry[i].clk); - clk_arry[i].clk = NULL; - } -} - -int msm_dss_get_clk(struct device *dev, struct dss_clk *clk_arry, int num_clk) -{ - int i, rc = 0; - - for (i = 0; i < num_clk; i++) { - clk_arry[i].clk = clk_get(dev, clk_arry[i].clk_name); - rc = PTR_ERR_OR_ZERO(clk_arry[i].clk); - if (rc) { - DEV_ERR("%pS->%s: '%s' get failed. 
rc=%d\n", - __builtin_return_address(0), __func__, - clk_arry[i].clk_name, rc); - goto error; - } - } - - return rc; - -error: - for (i--; i >= 0; i--) { - if (clk_arry[i].clk) - clk_put(clk_arry[i].clk); - clk_arry[i].clk = NULL; - } - - return rc; -} - -int msm_dss_clk_set_rate(struct dss_clk *clk_arry, int num_clk) -{ - int i, rc = 0; - - for (i = 0; i < num_clk; i++) { - if (clk_arry[i].clk) { - if (clk_arry[i].type != DSS_CLK_AHB) { - DEV_DBG("%pS->%s: '%s' rate %ld\n", - __builtin_return_address(0), __func__, - clk_arry[i].clk_name, - clk_arry[i].rate); - rc = clk_set_rate(clk_arry[i].clk, - clk_arry[i].rate); - if (rc) { - DEV_ERR("%pS->%s: %s failed. rc=%d\n", - __builtin_return_address(0), - __func__, - clk_arry[i].clk_name, rc); - break; - } - } - } else { - DEV_ERR("%pS->%s: '%s' is not available\n", - __builtin_return_address(0), __func__, - clk_arry[i].clk_name); - rc = -EPERM; - break; - } - } - - return rc; -} - -int msm_dss_enable_clk(struct dss_clk *clk_arry, int num_clk, int enable) -{ - int i, rc = 0; - - if (enable) { - for (i = 0; i < num_clk; i++) { - DEV_DBG("%pS->%s: enable '%s'\n", - __builtin_return_address(0), __func__, - clk_arry[i].clk_name); - rc = clk_prepare_enable(clk_arry[i].clk); - if (rc) - DEV_ERR("%pS->%s: %s en fail. rc=%d\n", - __builtin_return_address(0), - __func__, - clk_arry[i].clk_name, rc); - - if (rc && i) { - msm_dss_enable_clk(_arry[i - 1], - i - 1, false); - break; - }
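The hand-coded loop being deleted above is exactly what clk_bulk_prepare_enable() provides: enable the clocks in order and, on failure, roll back the ones already enabled. A standalone model of that rollback contract (stub "clocks" for illustration; in the driver these are struct clk_bulk_data handled by devm_clk_bulk_get() and clk_bulk_prepare_enable()):

```c
#include <assert.h>

struct clk_stub {
	int enabled;
	int fail_on_enable;	/* simulate clk_prepare_enable() failing */
};

/* Mirrors the clk_bulk contract: all-or-nothing enable with rollback. */
static int bulk_enable(struct clk_stub *clks, int num)
{
	int i;

	for (i = 0; i < num; i++) {
		if (clks[i].fail_on_enable)
			goto rollback;
		clks[i].enabled = 1;
	}
	return 0;

rollback:
	while (--i >= 0)
		clks[i].enabled = 0;	/* undo in reverse order */
	return -1;
}
```

Letting the clk framework do this removes both the error-prone manual unwind (note the suspicious `i - 1` recursion in the deleted msm_dss_enable_clk()) and the per-clock error printing.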
Re: [PATCH 6/7] drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend
On 5/11/21 3:58 PM, Christian König wrote: On 11.05.21 at 15:25, Thomas Hellström wrote: Most logical place to introduce TTM buffer objects is as an i915 gem object backend. We need to add some ops to account for added functionality like delayed delete and LRU list manipulation. Initially we support only LMEM and SYSTEM memory, but SYSTEM (which in this case means evicted LMEM objects) is not visible to i915 GEM yet. The plan is to move the i915 gem system region over to the TTM system memory type in upcoming patches. We set up GPU bindings directly both from LMEM and from the system region, as there is no need to use the legacy TTM_TT memory type. We reserve that for future porting of GGTT bindings to TTM. There are some changes to TTM to allow for purging system memory buffer objects and to refuse swapping of some objects: Unfortunately i915 gem still relies heavily on short-term object pinning, and we've chosen to keep short-term-pinned buffer objects on the TTM LRU lists for now, meaning that we need some sort of mechanism to tell TTM they are not swappable. A longer term goal is to get rid of the short-term pinning. Well just use the eviction_valuable interface for this. Yes, we do that for vram/lmem eviction, but we have nothing similar for system swapping. Do I understand you correctly that you want me to add a call to eviction_valuable() also for that instead of swap_possible()? In general please make separate patches for the TTM changes and for the i915 changes using them for easier review. I'll respin with a split. Do you want me to do the same also for the other two patches that minimally touch TTM? Thanks, Thomas
Re: [PATCH 6/7] drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend
On 11.05.21 at 15:25, Thomas Hellström wrote: Most logical place to introduce TTM buffer objects is as an i915 gem object backend. We need to add some ops to account for added functionality like delayed delete and LRU list manipulation. Initially we support only LMEM and SYSTEM memory, but SYSTEM (which in this case means evicted LMEM objects) is not visible to i915 GEM yet. The plan is to move the i915 gem system region over to the TTM system memory type in upcoming patches. We set up GPU bindings directly both from LMEM and from the system region, as there is no need to use the legacy TTM_TT memory type. We reserve that for future porting of GGTT bindings to TTM. There are some changes to TTM to allow for purging system memory buffer objects and to refuse swapping of some objects: Unfortunately i915 gem still relies heavily on short-term object pinning, and we've chosen to keep short-term-pinned buffer objects on the TTM LRU lists for now, meaning that we need some sort of mechanism to tell TTM they are not swappable. A longer term goal is to get rid of the short-term pinning. Well just use the eviction_valuable interface for this. In general please make separate patches for the TTM changes and for the i915 changes using them for easier review. Christian. Remove the old lmem backend.
Cc: Christian König Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 83 --- drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 5 - drivers/gpu/drm/i915/gem/i915_gem_object.c| 126 +++-- drivers/gpu/drm/i915/gem/i915_gem_object.h| 9 + .../gpu/drm/i915/gem/i915_gem_object_types.h | 18 + drivers/gpu/drm/i915/gem/i915_gem_region.c| 6 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 534 ++ drivers/gpu/drm/i915/gem/i915_gem_ttm.h | 48 ++ drivers/gpu/drm/i915/gt/intel_region_lmem.c | 3 +- drivers/gpu/drm/i915/i915_gem.c | 5 +- drivers/gpu/drm/i915/intel_memory_region.c| 1 - drivers/gpu/drm/i915/intel_memory_region.h| 1 - drivers/gpu/drm/i915/intel_region_ttm.c | 5 +- drivers/gpu/drm/i915/intel_region_ttm.h | 7 +- drivers/gpu/drm/ttm/ttm_bo.c | 12 + include/drm/ttm/ttm_device.h | 9 + 17 files changed, 733 insertions(+), 140 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.c create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 958ccc1edfed..ef0d884a9e2d 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -155,6 +155,7 @@ gem-y += \ gem/i915_gem_stolen.o \ gem/i915_gem_throttle.o \ gem/i915_gem_tiling.o \ + gem/i915_gem_ttm.o \ gem/i915_gem_ttm_bo_util.o \ gem/i915_gem_userptr.o \ gem/i915_gem_wait.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index f42803ea48f2..2b8cd15de1d9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -4,73 +4,10 @@ */ #include "intel_memory_region.h" -#include "intel_region_ttm.h" #include "gem/i915_gem_region.h" #include "gem/i915_gem_lmem.h" #include "i915_drv.h" -static void lmem_put_pages(struct drm_i915_gem_object *obj, - struct sg_table *pages) -{ - intel_region_ttm_node_free(obj->mm.region, obj->mm.st_mm_node); - obj->mm.dirty = false; - 
sg_free_table(pages); - kfree(pages); -} - -static int lmem_get_pages(struct drm_i915_gem_object *obj) -{ - unsigned int flags; - struct sg_table *pages; - - flags = I915_ALLOC_MIN_PAGE_SIZE; - if (obj->flags & I915_BO_ALLOC_CONTIGUOUS) - flags |= I915_ALLOC_CONTIGUOUS; - - obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region, -obj->base.size, -flags); - if (IS_ERR(obj->mm.st_mm_node)) - return PTR_ERR(obj->mm.st_mm_node); - - /* Range manager is always contigous */ - if (obj->mm.region->is_range_manager) - obj->flags |= I915_BO_ALLOC_CONTIGUOUS; - pages = intel_region_ttm_node_to_st(obj->mm.region, obj->mm.st_mm_node); - if (IS_ERR(pages)) - return PTR_ERR(pages); - - __i915_gem_object_set_pages(obj, pages, - i915_sg_dma_page_sizes(pages->sgl)); - - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) { - void __iomem *vaddr = - i915_gem_object_lmem_io_map(obj, 0, obj->base.size); - - if (!vaddr) { - struct sg_table *pages = -
Re: [PATCH 0/2] drm/qxl: two one-liner fixes.
On 11.05.21 at 12:45, Gerd Hoffmann wrote: Gerd Hoffmann (2): drm/qxl: drop redundant code drm/qxl: balance dumb_shadow_bo pin drivers/gpu/drm/qxl/qxl_display.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) Acked-by: Thomas Zimmermann -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer
[PATCH v2 1/1] drm/mediatek: Remove redundant error printing
When devm_ioremap_resource() fails, a clear enough error message will be printed by its subfunction __devm_ioremap_resource(). The error information contains the device name, failure cause, and possibly resource information. Therefore, remove the error printing here to simplify code and reduce the binary size. Reported-by: Hulk Robot Signed-off-by: Zhen Lei --- drivers/gpu/drm/mediatek/mtk_cec.c| 7 ++- drivers/gpu/drm/mediatek/mtk_disp_ccorr.c | 4 +--- drivers/gpu/drm/mediatek/mtk_disp_ovl.c | 4 +--- drivers/gpu/drm/mediatek/mtk_disp_rdma.c | 4 +--- drivers/gpu/drm/mediatek/mtk_dpi.c| 7 ++- drivers/gpu/drm/mediatek/mtk_dsi.c| 1 - 6 files changed, 7 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/mediatek/mtk_cec.c b/drivers/gpu/drm/mediatek/mtk_cec.c index e9cef5c0c8f7eff..c47b54936cfa6b8 100644 --- a/drivers/gpu/drm/mediatek/mtk_cec.c +++ b/drivers/gpu/drm/mediatek/mtk_cec.c @@ -195,11 +195,8 @@ static int mtk_cec_probe(struct platform_device *pdev) res = platform_get_resource(pdev, IORESOURCE_MEM, 0); cec->regs = devm_ioremap_resource(dev, res); - if (IS_ERR(cec->regs)) { - ret = PTR_ERR(cec->regs); - dev_err(dev, "Failed to ioremap cec: %d\n", ret); - return ret; - } + if (IS_ERR(cec->regs)) + return PTR_ERR(cec->regs); cec->clk = devm_clk_get(dev, NULL); if (IS_ERR(cec->clk)) { diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ccorr.c b/drivers/gpu/drm/mediatek/mtk_disp_ccorr.c index 141cb36b9c07b74..2b9923e5c6382f7 100644 --- a/drivers/gpu/drm/mediatek/mtk_disp_ccorr.c +++ b/drivers/gpu/drm/mediatek/mtk_disp_ccorr.c @@ -173,10 +173,8 @@ static int mtk_disp_ccorr_probe(struct platform_device *pdev) res = platform_get_resource(pdev, IORESOURCE_MEM, 0); priv->regs = devm_ioremap_resource(dev, res); - if (IS_ERR(priv->regs)) { - dev_err(dev, "failed to ioremap ccorr\n"); + if (IS_ERR(priv->regs)) return PTR_ERR(priv->regs); - } #if IS_REACHABLE(CONFIG_MTK_CMDQ) ret = cmdq_dev_get_client_reg(dev, >cmdq_reg, 0); diff --git 
a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c index 961f87f8d4d156f..48927135c247537 100644 --- a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c +++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c @@ -395,10 +395,8 @@ static int mtk_disp_ovl_probe(struct platform_device *pdev) res = platform_get_resource(pdev, IORESOURCE_MEM, 0); priv->regs = devm_ioremap_resource(dev, res); - if (IS_ERR(priv->regs)) { - dev_err(dev, "failed to ioremap ovl\n"); + if (IS_ERR(priv->regs)) return PTR_ERR(priv->regs); - } #if IS_REACHABLE(CONFIG_MTK_CMDQ) ret = cmdq_dev_get_client_reg(dev, >cmdq_reg, 0); if (ret) diff --git a/drivers/gpu/drm/mediatek/mtk_disp_rdma.c b/drivers/gpu/drm/mediatek/mtk_disp_rdma.c index 728aaadfea8cfcc..e8d31b4c12b7727 100644 --- a/drivers/gpu/drm/mediatek/mtk_disp_rdma.c +++ b/drivers/gpu/drm/mediatek/mtk_disp_rdma.c @@ -294,10 +294,8 @@ static int mtk_disp_rdma_probe(struct platform_device *pdev) res = platform_get_resource(pdev, IORESOURCE_MEM, 0); priv->regs = devm_ioremap_resource(dev, res); - if (IS_ERR(priv->regs)) { - dev_err(dev, "failed to ioremap rdma\n"); + if (IS_ERR(priv->regs)) return PTR_ERR(priv->regs); - } #if IS_REACHABLE(CONFIG_MTK_CMDQ) ret = cmdq_dev_get_client_reg(dev, >cmdq_reg, 0); if (ret) diff --git a/drivers/gpu/drm/mediatek/mtk_dpi.c b/drivers/gpu/drm/mediatek/mtk_dpi.c index bea91c81626e154..f8020bc046cb63f 100644 --- a/drivers/gpu/drm/mediatek/mtk_dpi.c +++ b/drivers/gpu/drm/mediatek/mtk_dpi.c @@ -741,11 +741,8 @@ static int mtk_dpi_probe(struct platform_device *pdev) } mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); dpi->regs = devm_ioremap_resource(dev, mem); - if (IS_ERR(dpi->regs)) { - ret = PTR_ERR(dpi->regs); - dev_err(dev, "Failed to ioremap mem resource: %d\n", ret); - return ret; - } + if (IS_ERR(dpi->regs)) + return PTR_ERR(dpi->regs); dpi->engine_clk = devm_clk_get(dev, "engine"); if (IS_ERR(dpi->engine_clk)) { diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c 
b/drivers/gpu/drm/mediatek/mtk_dsi.c index ae403c67cbd922d..89e351dfab88177 100644 --- a/drivers/gpu/drm/mediatek/mtk_dsi.c +++ b/drivers/gpu/drm/mediatek/mtk_dsi.c @@ -1062,7 +1062,6 @@ static int mtk_dsi_probe(struct platform_device *pdev) dsi->regs = devm_ioremap_resource(dev, regs); if (IS_ERR(dsi->regs)) { ret = PTR_ERR(dsi->regs); - dev_err(dev, "Failed to ioremap memory: %d\n", ret); goto err_unregister_host; } -- 2.26.0.106.g9fadedd
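The whole mediatek cleanup hinges on the kernel's error-pointer convention: devm_ioremap_resource() returns either a valid mapping or an encoded errno, so the caller can propagate `PTR_ERR()` in one line without its own dev_err(). A standalone userspace sketch of that convention (these helpers mirror include/linux/err.h, but this reimplementation is for illustration only, not the kernel API):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Userspace sketch of the kernel's ERR_PTR/IS_ERR convention: errnos
 * are encoded in the top 4095 values of the pointer range, so a
 * single return value carries either a pointer or an error code.
 */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The simplified probe pattern from the patch: no dev_err(), one line. */
static int probe_regs(void *regs)
{
	if (IS_ERR(regs))
		return PTR_ERR(regs);	/* was: ret = PTR_ERR(); dev_err(); return ret */
	return 0;
}
```

This is also why the v2 simplification suggested by Baruch Siach is safe: `return PTR_ERR(cec->regs);` carries exactly the same value the removed temporary did.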
[PATCH v2 0/1] drm/mediatek: Remove redundant error printing
v1 --> v2: 1. Combine the modifications of several drm/mediatek files into one patch. 2. According to Baruch Siach's review comment, simplify the following code snippets: -ret = PTR_ERR(cec->regs); -return ret; +return PTR_ERR(cec->regs); Zhen Lei (1): drm/mediatek: Remove redundant error printing drivers/gpu/drm/mediatek/mtk_cec.c| 7 ++- drivers/gpu/drm/mediatek/mtk_disp_ccorr.c | 4 +--- drivers/gpu/drm/mediatek/mtk_disp_ovl.c | 4 +--- drivers/gpu/drm/mediatek/mtk_disp_rdma.c | 4 +--- drivers/gpu/drm/mediatek/mtk_dpi.c| 7 ++- drivers/gpu/drm/mediatek/mtk_dsi.c| 1 - 6 files changed, 7 insertions(+), 20 deletions(-) -- 2.26.0.106.g9fadedd
Re: [PATCH] component: Move host device to end of device lists on binding
On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki wrote: > > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd wrote: > > [cut] > > > > > > > > > > I will try it, but then I wonder about things like system wide > > > > suspend/resume too. The drm encoder chain would need to reimplement the > > > > logic for system wide suspend/resume so that any PM ops attached to the > > > > msm device run in the correct order. Right now the bridge PM ops will > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run. > > > > After this change, the msm PM ops will run, the bridge PM ops will run, > > > > and then the i2c bus PM ops will run. It feels like that could be a > > > > problem if we're suspending the DSI encoder while the bridge is still > > > > active. > > > > > > Yup suspend/resume has the exact same problem as shutdown. > > > > I think suspend/resume has the exact opposite problem. At least I think > > the correct order is to suspend the bridge, then the encoder, i.e. DSI, > > like is happening today. It looks like drm_atomic_helper_shutdown() > > operates from the top down when we want bottom up? I admit I have no > > idea what is supposed to happen here. > > Why would the system-wide suspend ordering be different from the > shutdown ordering? At least my point was that both shutdown and suspend/resume have the same problem, and the right fix is (I think at least) to add these hooks to the component.c aggregate ops structure. Hence just adding new callbacks for shutdown will be an incomplete solution. I don't feel like changing the global device order is the right approach, since essentially that's what component was meant to fix. Except it's incomplete since it only provides a solution for bind/unbind and not for shutdown or suspend/resume and other global state changes. I think some drivers "fixed" this by putting stuff like drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to make sure that everything is ready with that trick.
But that doesn't compose very well :-/ -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
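To make the proposal above concrete, here is a purely hypothetical sketch of what "adding these hooks to the component.c aggregate ops structure" might look like: the aggregate (master) tears down first, then its components, mirroring the ordering component.c already enforces for bind/unbind. None of these names are the real component.c API; this is a shape, not an implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical extension of the component framework's aggregate ops:
 * a shutdown hook alongside bind/unbind, so teardown ordering is
 * solved where bind ordering already is, instead of by reordering
 * the global device list.
 */
struct aggregate_ops {
	void (*bind)(void);
	void (*unbind)(void);
	void (*shutdown)(void);	/* proposed addition */
};

static int shutdown_order[2], n_shut;

static void master_shutdown(void)    { shutdown_order[n_shut++] = 0; }
static void component_shutdown(void) { shutdown_order[n_shut++] = 1; }

/* Aggregate-aware teardown: master (e.g. encoder chain) first, parts after. */
static void aggregate_shutdown(const struct aggregate_ops *master_ops,
			       void (*component_cb)(void))
{
	if (master_ops->shutdown)
		master_ops->shutdown();
	component_cb();
}
```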
[PATCH 7/7] drm/i915/lmem: Verify checks for lmem residency
Since objects can be migrated or evicted when not pinned or locked, update the checks for lmem residency or future residency so that the value returned is not immediately stale. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/display/intel_display.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 42 +++- drivers/gpu/drm/i915/gem/i915_gem_object.c | 29 ++ drivers/gpu/drm/i915/gem/i915_gem_object.h | 4 ++ 4 files changed, 75 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index de1f13d203b5..b95def2d5af3 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -11615,7 +11615,7 @@ intel_user_framebuffer_create(struct drm_device *dev, /* object is backed with LMEM for discrete */ i915 = to_i915(obj->base.dev); - if (HAS_LMEM(i915) && !i915_gem_object_is_lmem(obj)) { + if (HAS_LMEM(i915) && !i915_gem_object_validates_to_lmem(obj)) { /* object is "remote", not in local memory */ i915_gem_object_put(obj); return ERR_PTR(-EREMOTE); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index 2b8cd15de1d9..d539dffa1554 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -23,10 +23,50 @@ i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj, return io_mapping_map_wc(>mm.region->iomap, offset, size); } +/** + * i915_gem_object_validates_to_lmem - Whether the object is resident in + * lmem when pages are present. + * @obj: The object to check. + * + * Migratable objects residency may change from under us if the object is + * not pinned or locked. This function is intended to be used to check whether + * the object can only reside in lmem when pages are present. + * + * Return: Whether the object is always resident in lmem when pages are + * present. 
+ */ +bool i915_gem_object_validates_to_lmem(struct drm_i915_gem_object *obj) +{ + struct intel_memory_region *mr = READ_ONCE(obj->mm.region); + + return !i915_gem_object_migratable(obj) && + mr && (mr->type == INTEL_MEMORY_LOCAL || + mr->type == INTEL_MEMORY_STOLEN_LOCAL); +} + +/** + * i915_gem_object_is_lmem - Whether the object is resident in + * lmem + * @obj: The object to check. + * + * Even if an object is allowed to migrate and change memory region, + * this function checks whether it will always be present in lmem when + * valid *or* if that's not the case, whether it's currently resident in lmem. + * For migratable and evictable objects, the latter only makes sense when + * the object is locked. + * + * Return: Whether the object migratable but resident in lmem, or not + * migratable and will be present in lmem when valid. + */ bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj) { - struct intel_memory_region *mr = obj->mm.region; + struct intel_memory_region *mr = READ_ONCE(obj->mm.region); +#ifdef CONFIG_LOCKDEP + if (i915_gem_object_migratable(obj) && + i915_gem_object_evictable(obj)) + assert_object_held(obj); +#endif return mr && (mr->type == INTEL_MEMORY_LOCAL || mr->type == INTEL_MEMORY_STOLEN_LOCAL); } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index c53488f391dd..0475b1c94454 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -458,6 +458,35 @@ bool i915_gem_object_evictable(struct drm_i915_gem_object *obj) return pin_count == 0; } +/** + * i915_gem_object_migratable - Whether the object is migratable out of the + * current region. + * @obj: Pointer to the object. + * + * Return: Whether the object is allowed to be resident in other + * regions than the current while pages are present. 
+ */ +bool i915_gem_object_migratable(struct drm_i915_gem_object *obj) +{ + struct intel_memory_region *mr = READ_ONCE(obj->mm.region); + struct intel_memory_region *placement; + int i; + + if (!mr) + return false; + + if (!obj->mm.n_placements) + return false; + + for (i = 0; i < obj->mm.n_placements; ++i) { + placement = obj->mm.placements[i]; + if (placement != mr) + return true; + } + + return false; +} + void i915_gem_init__objects(struct drm_i915_private *i915) { INIT_WORK(>mm.free_work, __i915_gem_free_work); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index ae5930e307d5..a3ad8cf4eefd 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -596,6 +596,10 @@ void
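The placement test at the heart of i915_gem_object_migratable() above reduces to a simple rule: an object is migratable iff any allowed placement differs from its current region, and an object with no placement list is fixed. A minimal standalone sketch of that rule (regions modeled as plain ints; the real code walks `obj->mm.placements[]` of `struct intel_memory_region *`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the migratability rule: scan the allowed placements and
 * report true as soon as one differs from the current region.
 */
static bool is_migratable(const int *placements, size_t n, int current)
{
	if (n == 0)
		return false;	/* no placement list: region is fixed */
	for (size_t i = 0; i < n; i++)
		if (placements[i] != current)
			return true;	/* object may move elsewhere */
	return false;
}
```

This is why the residency helpers split in two: a non-migratable object in LMEM "validates to" LMEM unconditionally, while a migratable one is only known to be resident while its lock is held.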
[PATCH 6/7] drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend
Most logical place to introduce TTM buffer objects is as an i915 gem object backend. We need to add some ops to account for added functionality like delayed delete and LRU list manipulation. Initially we support only LMEM and SYSTEM memory, but SYSTEM (which in this case means evicted LMEM objects) is not visible to i915 GEM yet. The plan is to move the i915 gem system region over to the TTM system memory type in upcoming patches. We set up GPU bindings directly both from LMEM and from the system region, as there is no need to use the legacy TTM_TT memory type. We reserve that for future porting of GGTT bindings to TTM. There are some changes to TTM to allow for purging system memory buffer objects and to refuse swapping of some objects: Unfortunately i915 gem still relies heavily on short-term object pinning, and we've chosen to keep short-term-pinned buffer objects on the TTM LRU lists for now, meaning that we need some sort of mechanism to tell TTM they are not swappable. A longer term goal is to get rid of the short-term pinning. Remove the old lmem backend. 
Cc: Christian König Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 83 --- drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 5 - drivers/gpu/drm/i915/gem/i915_gem_object.c| 126 +++-- drivers/gpu/drm/i915/gem/i915_gem_object.h| 9 + .../gpu/drm/i915/gem/i915_gem_object_types.h | 18 + drivers/gpu/drm/i915/gem/i915_gem_region.c| 6 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 534 ++ drivers/gpu/drm/i915/gem/i915_gem_ttm.h | 48 ++ drivers/gpu/drm/i915/gt/intel_region_lmem.c | 3 +- drivers/gpu/drm/i915/i915_gem.c | 5 +- drivers/gpu/drm/i915/intel_memory_region.c| 1 - drivers/gpu/drm/i915/intel_memory_region.h| 1 - drivers/gpu/drm/i915/intel_region_ttm.c | 5 +- drivers/gpu/drm/i915/intel_region_ttm.h | 7 +- drivers/gpu/drm/ttm/ttm_bo.c | 12 + include/drm/ttm/ttm_device.h | 9 + 17 files changed, 733 insertions(+), 140 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.c create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 958ccc1edfed..ef0d884a9e2d 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -155,6 +155,7 @@ gem-y += \ gem/i915_gem_stolen.o \ gem/i915_gem_throttle.o \ gem/i915_gem_tiling.o \ + gem/i915_gem_ttm.o \ gem/i915_gem_ttm_bo_util.o \ gem/i915_gem_userptr.o \ gem/i915_gem_wait.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index f42803ea48f2..2b8cd15de1d9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -4,73 +4,10 @@ */ #include "intel_memory_region.h" -#include "intel_region_ttm.h" #include "gem/i915_gem_region.h" #include "gem/i915_gem_lmem.h" #include "i915_drv.h" -static void lmem_put_pages(struct drm_i915_gem_object *obj, - struct sg_table *pages) -{ - intel_region_ttm_node_free(obj->mm.region, obj->mm.st_mm_node); - obj->mm.dirty = false; - 
sg_free_table(pages); - kfree(pages); -} - -static int lmem_get_pages(struct drm_i915_gem_object *obj) -{ - unsigned int flags; - struct sg_table *pages; - - flags = I915_ALLOC_MIN_PAGE_SIZE; - if (obj->flags & I915_BO_ALLOC_CONTIGUOUS) - flags |= I915_ALLOC_CONTIGUOUS; - - obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region, -obj->base.size, -flags); - if (IS_ERR(obj->mm.st_mm_node)) - return PTR_ERR(obj->mm.st_mm_node); - - /* Range manager is always contigous */ - if (obj->mm.region->is_range_manager) - obj->flags |= I915_BO_ALLOC_CONTIGUOUS; - pages = intel_region_ttm_node_to_st(obj->mm.region, obj->mm.st_mm_node); - if (IS_ERR(pages)) - return PTR_ERR(pages); - - __i915_gem_object_set_pages(obj, pages, - i915_sg_dma_page_sizes(pages->sgl)); - - if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) { - void __iomem *vaddr = - i915_gem_object_lmem_io_map(obj, 0, obj->base.size); - - if (!vaddr) { - struct sg_table *pages = - __i915_gem_object_unset_pages(obj); - - if (!IS_ERR_OR_NULL(pages)) - lmem_put_pages(obj, pages); - } - - memset_io(vaddr, 0, obj->base.size); - io_mapping_unmap(vaddr); - } - - return 0; -} -
[PATCH 5/7] drm/i915/ttm, drm/ttm: Add a generic TTM memcpy move for page-based iomem
The internal ttm_bo_util memcpy uses vmap functionality, and while it probably might be possible to use it for copying in- and out of sglist represented io memory, using io_mem_reserve() / io_mem_free() callbacks, that would cause problems with fault(). Instead, implement a method mapping page-by-page using kmap_local() semantics. As an additional benefit we then avoid the occasional global TLB flushes of vmap() and the consumption of vmap space, eliminating a critical point of failure, and with a slight change of semantics we could also push the memcpy out async for testing and async driver development purposes. Pushing out async can be done since there is no memory allocation going on that could violate the dma_fence lockdep rules. Note that drivers that don't want to use struct io_mapping but rely on memremap functionality, and that don't want to use scatterlists for VRAM may well define specialized (hopefully reusable) iterators for their particular environment. Cc: Christian König Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/Makefile | 1 + .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.c | 155 ++ .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.h | 141 drivers/gpu/drm/ttm/ttm_bo.c | 1 + 4 files changed, 298 insertions(+) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index cb8823570996..958ccc1edfed 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -155,6 +155,7 @@ gem-y += \ gem/i915_gem_stolen.o \ gem/i915_gem_throttle.o \ gem/i915_gem_tiling.o \ + gem/i915_gem_ttm_bo_util.o \ gem/i915_gem_userptr.o \ gem/i915_gem_wait.o \ gem/i915_gemfs.o diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c new file mode 100644 index ..1116d7df1461 --- /dev/null +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c @@ -0,0 +1,155 @@
+// SPDX-License-Identifier: MIT +/* + * Copyright © 2021 Intel Corporation + */ + +/** + * DOC: Usage and intentions. + * + * This file contains functionality that we might want to move into + * ttm_bo_util.c if there is a common interest. + * Currently a kmap_local only memcpy with support for page-based iomem regions, + * and fast memcpy from write-combined memory. + */ + +#include +#include +#include +#include + +#include "i915_memcpy.h" + +#include "gem/i915_gem_ttm_bo_util.h" + +static void i915_ttm_kmap_iter_tt_kmap_local(struct i915_ttm_kmap_iter *iter, +struct dma_buf_map *dmap, +pgoff_t i) +{ + struct i915_ttm_kmap_iter_tt *iter_tt = + container_of(iter, typeof(*iter_tt), base); + + dma_buf_map_set_vaddr(dmap, kmap_local_page(iter_tt->tt->pages[i])); +} + +static void i915_ttm_kmap_iter_iomap_kmap_local(struct i915_ttm_kmap_iter *iter, + struct dma_buf_map *dmap, + pgoff_t i) +{ + struct i915_ttm_kmap_iter_iomap *iter_io = + container_of(iter, typeof(*iter_io), base); + void __iomem *addr; + +retry: + while (i >= iter_io->cache.end) { + iter_io->cache.sg = iter_io->cache.sg ? 
+ sg_next(iter_io->cache.sg) : iter_io->st->sgl; + iter_io->cache.i = iter_io->cache.end; + iter_io->cache.end += sg_dma_len(iter_io->cache.sg) >> + PAGE_SHIFT; + iter_io->cache.offs = sg_dma_address(iter_io->cache.sg) - + iter_io->start; + } + + if (i < iter_io->cache.i) { + iter_io->cache.end = 0; + iter_io->cache.sg = NULL; + goto retry; + } + + addr = io_mapping_map_local_wc(iter_io->iomap, iter_io->cache.offs + + (((resource_size_t)i - iter_io->cache.i) + << PAGE_SHIFT)); + dma_buf_map_set_vaddr_iomem(dmap, addr); +} + +struct i915_ttm_kmap_iter_ops i915_ttm_kmap_iter_tt_ops = { + .kmap_local = i915_ttm_kmap_iter_tt_kmap_local +}; + +struct i915_ttm_kmap_iter_ops i915_ttm_kmap_iter_io_ops = { + .kmap_local = i915_ttm_kmap_iter_iomap_kmap_local +}; + +static void kunmap_local_dma_buf_map(struct dma_buf_map *map) +{ + if (map->is_iomem) + io_mapping_unmap_local(map->vaddr_iomem); + else + kunmap_local(map->vaddr); +} + +/** + * i915_ttm_move_memcpy - Helper to perform a memcpy ttm move operation. + * @bo: The struct ttm_buffer_object. + * @new_mem: The struct ttm_resource we're moving to (copy destination). + * @new_kmap: A struct i915_ttm_kmap_iter
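The core idea of the patch — replacing one big vmap() of the whole buffer with a per-page map/copy/unmap loop — can be sketched in isolation. The map callback below stands in for kmap_local_page() / io_mapping_map_local_wc(); everything here is hypothetical userspace scaffolding, not the i915 or TTM API:

```c
#include <assert.h>
#include <string.h>

#define PAGE_SZ 8	/* toy page size for the sketch */

/*
 * Minimal analog of the kmap_local-style move: each page is "mapped",
 * copied and implicitly "unmapped" in turn, so no large contiguous
 * kernel virtual mapping of the whole object is ever needed.
 */
struct kmap_iter {
	char *(*map)(struct kmap_iter *it, size_t page);
	char *backing;
};

static char *array_map(struct kmap_iter *it, size_t page)
{
	return it->backing + page * PAGE_SZ;	/* "map" one page into view */
}

static void move_memcpy(struct kmap_iter *dst, struct kmap_iter *src,
			size_t npages)
{
	for (size_t i = 0; i < npages; i++)	/* one local map per page */
		memcpy(dst->map(dst, i), src->map(src, i), PAGE_SZ);
}
```

The real iterators additionally cache their position in the scatterlist (the `iter_io->cache` fields above) so that successive page lookups don't rescan the list from its head.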
[PATCH 3/7] drm/i915/ttm, drm/ttm: Initialize the ttm device and memory managers.
Temporarily remove the buddy allocator and related selftests and hook up the TTM range manager for i915 regions. In order to support some of the mock region-related selftests, we need to be able to initialize the TTM range-manager standalone without a struct ttm_device. Add two functions to allow that to the TTM api. Finally modify the mock region selftests somewhat to account for a fragmenting manager. Cc: Christian König Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/Kconfig | 1 + drivers/gpu/drm/i915/Makefile | 2 +- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 58 +- .../gpu/drm/i915/gem/i915_gem_object_types.h | 6 +- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 3 +- drivers/gpu/drm/i915/gem/i915_gem_region.c| 120 --- drivers/gpu/drm/i915/gem/i915_gem_region.h| 4 - drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 10 +- drivers/gpu/drm/i915/gem/i915_gem_stolen.h| 9 +- drivers/gpu/drm/i915/gt/intel_gt.c| 2 - drivers/gpu/drm/i915/gt/intel_region_lmem.c | 27 +- drivers/gpu/drm/i915/i915_buddy.c | 435 -- drivers/gpu/drm/i915/i915_buddy.h | 131 --- drivers/gpu/drm/i915/i915_drv.c | 8 + drivers/gpu/drm/i915/i915_drv.h | 7 +- drivers/gpu/drm/i915/i915_gem.c | 1 + drivers/gpu/drm/i915/i915_globals.c | 1 - drivers/gpu/drm/i915/i915_globals.h | 1 - drivers/gpu/drm/i915/i915_scatterlist.c | 70 ++ drivers/gpu/drm/i915/i915_scatterlist.h | 35 + drivers/gpu/drm/i915/intel_memory_region.c| 180 ++-- drivers/gpu/drm/i915/intel_memory_region.h| 44 +- drivers/gpu/drm/i915/intel_region_ttm.c | 246 ++ drivers/gpu/drm/i915/intel_region_ttm.h | 29 + drivers/gpu/drm/i915/selftests/i915_buddy.c | 789 -- .../drm/i915/selftests/i915_mock_selftests.h | 1 - .../drm/i915/selftests/intel_memory_region.c | 133 +-- drivers/gpu/drm/i915/selftests/mock_region.c | 51 +- drivers/gpu/drm/ttm/ttm_range_manager.c | 55 +- include/drm/ttm/ttm_bo_driver.h | 23 + 31 files changed, 715 insertions(+), 1771 deletions(-) delete mode 100644 
drivers/gpu/drm/i915/i915_buddy.c delete mode 100644 drivers/gpu/drm/i915/i915_buddy.h create mode 100644 drivers/gpu/drm/i915/intel_region_ttm.c create mode 100644 drivers/gpu/drm/i915/intel_region_ttm.h delete mode 100644 drivers/gpu/drm/i915/selftests/i915_buddy.c diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig index 1e1cb245fca7..b63d374dff23 100644 --- a/drivers/gpu/drm/i915/Kconfig +++ b/drivers/gpu/drm/i915/Kconfig @@ -26,6 +26,7 @@ config DRM_I915 select SND_HDA_I915 if SND_HDA_CORE select CEC_CORE if CEC_NOTIFIER select VMAP_PFN + select DRM_TTM help Choose this option if you have a system that has "Intel Graphics Media Accelerator" or "HD Graphics" integrated graphics, diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index d0d936d9137b..cb8823570996 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -50,6 +50,7 @@ i915-y += i915_drv.o \ intel_memory_region.o \ intel_pch.o \ intel_pm.o \ + intel_region_ttm.o \ intel_runtime_pm.o \ intel_sideband.o \ intel_step.o \ @@ -160,7 +161,6 @@ gem-y += \ i915-y += \ $(gem-y) \ i915_active.o \ - i915_buddy.o \ i915_cmd_parser.o \ i915_gem_evict.o \ i915_gem_gtt.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index f44bdd08f7cb..f42803ea48f2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -4,16 +4,70 @@ */ #include "intel_memory_region.h" +#include "intel_region_ttm.h" #include "gem/i915_gem_region.h" #include "gem/i915_gem_lmem.h" #include "i915_drv.h" +static void lmem_put_pages(struct drm_i915_gem_object *obj, + struct sg_table *pages) +{ + intel_region_ttm_node_free(obj->mm.region, obj->mm.st_mm_node); + obj->mm.dirty = false; + sg_free_table(pages); + kfree(pages); +} + +static int lmem_get_pages(struct drm_i915_gem_object *obj) +{ + unsigned int flags; + struct sg_table *pages; + + flags = I915_ALLOC_MIN_PAGE_SIZE; + if 
(obj->flags & I915_BO_ALLOC_CONTIGUOUS) + flags |= I915_ALLOC_CONTIGUOUS; + + obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region, +obj->base.size, +flags); + if (IS_ERR(obj->mm.st_mm_node)) + return PTR_ERR(obj->mm.st_mm_node); + + /* Range manager
[PATCH 4/7] drm/i915/ttm: Embed a ttm buffer object in the i915 gem object
Embed a struct ttm_buffer_object into the i915 gem object, making sure we alias the gem object part. It's a bit unfortunate that the struct ttm_buffer_object embeds a gem object since we otherwise could make the TTM part private to the TTM backend, and use the usual i915 gem object for the other backends. To make this a bit more storage efficient for the other backends, we'd have to use a pointer for the gem object which would require a lot of changes in the driver. We postpone that for later. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 7 +++ drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 12 +++- 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index abadf0994ad0..c8953e3f5c70 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -62,6 +62,13 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, const struct drm_i915_gem_object_ops *ops, struct lock_class_key *key, unsigned flags) { + /* +* A gem object is embedded both in a struct ttm_buffer_object :/ and +* in a drm_i915_gem_object. Make sure they are aliased. +*/ + BUILD_BUG_ON(offsetof(typeof(*obj), base) != +offsetof(typeof(*obj), __do_not_access.base)); + spin_lock_init(&obj->vma.lock); INIT_LIST_HEAD(&obj->vma.list); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index dbd7fffe956e..98f69d8fd37d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -10,6 +10,7 @@ #include #include +#include #include #include "i915_active.h" @@ -99,7 +100,16 @@ struct i915_gem_object_page_iter { }; struct drm_i915_gem_object { - struct drm_gem_object base; + /* +* We might have reason to revisit the below since it wastes +* a lot of space for non-ttm gem objects.
+* In any case, always use the accessors for the ttm_buffer_object +* when accessing it. +*/ + union { + struct drm_gem_object base; + struct ttm_buffer_object __do_not_access; + }; const struct drm_i915_gem_object_ops *ops; -- 2.30.2
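The union-plus-BUILD_BUG_ON trick in patch 4 can be shown in a self-contained form: the gem object must sit at the same offset whether reached directly or through the embedded TTM view, and the compile-time check enforces that. The structure names below are invented stand-ins for drm_gem_object / ttm_buffer_object / drm_i915_gem_object:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the three structs involved in the aliasing. */
struct gem_base { int handle; };

struct ttm_bo {
	struct gem_base base;	/* TTM's bo embeds the gem object first */
	int ttm_private;
};

struct i915_obj {
	union {
		struct gem_base base;		/* generic gem view */
		struct ttm_bo __do_not_access;	/* TTM view; base must alias */
	};
	int driver_state;
};

/* The patch's BUILD_BUG_ON() reduces to this compile-time check. */
_Static_assert(offsetof(struct i915_obj, base) ==
	       offsetof(struct i915_obj, __do_not_access.base),
	       "gem object views must alias");
```

Because both union members start with the same struct, code holding either view reads and writes the same gem object fields.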
[PATCH 1/7] drm/i915: Untangle the vma pages_mutex
From: Thomas Hellström Any sleeping dma_resv lock taken while the vma pages_mutex is held will cause a lockdep splat. Move the i915_gem_object_pin_pages() call out of the pages_mutex critical section. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/i915_vma.c | 33 +++-- 1 file changed, 19 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index a6cd0fa62847..7b1c0f4e60d7 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -800,32 +800,37 @@ static bool try_qad_pin(struct i915_vma *vma, unsigned int flags) static int vma_get_pages(struct i915_vma *vma) { int err = 0; + bool pinned_pages = false; if (atomic_add_unless(>pages_count, 1, 0)) return 0; + if (vma->obj) { + err = i915_gem_object_pin_pages(vma->obj); + if (err) + return err; + pinned_pages = true; + } + /* Allocations ahoy! */ - if (mutex_lock_interruptible(>pages_mutex)) - return -EINTR; + if (mutex_lock_interruptible(>pages_mutex)) { + err = -EINTR; + goto unpin; + } if (!atomic_read(>pages_count)) { - if (vma->obj) { - err = i915_gem_object_pin_pages(vma->obj); - if (err) - goto unlock; - } - err = vma->ops->set_pages(vma); - if (err) { - if (vma->obj) - i915_gem_object_unpin_pages(vma->obj); + if (err) goto unlock; - } + pinned_pages = false; } atomic_inc(>pages_count); unlock: mutex_unlock(>pages_mutex); +unpin: + if (pinned_pages) + __i915_gem_object_unpin_pages(vma->obj); return err; } @@ -838,10 +843,10 @@ static void __vma_put_pages(struct i915_vma *vma, unsigned int count) if (atomic_sub_return(count, >pages_count) == 0) { vma->ops->clear_pages(vma); GEM_BUG_ON(vma->pages); - if (vma->obj) - i915_gem_object_unpin_pages(vma->obj); } mutex_unlock(>pages_mutex); + if (vma->obj) + i915_gem_object_unpin_pages(vma->obj); } static void vma_put_pages(struct i915_vma *vma) -- 2.30.2
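The shape of the fix in patch 1 — hoist the potentially sleeping pin out of the mutex critical section and undo it on every failure path — can be modeled with counters instead of real locks. All names below are illustrative; the real code uses i915_gem_object_pin_pages() and vma->pages_mutex:

```c
#include <assert.h>

/* Balance counter standing in for the object's page pin refcount. */
static int pin_count;

static int pin_pages(void)    { pin_count++; return 0; }
static void unpin_pages(void) { pin_count--; }

static int vma_get_pages(int set_pages_err)
{
	int err = pin_pages();		/* may sleep: do it before locking */
	if (err)
		return err;

	/* --- pages_mutex critical section would start here --- */
	err = set_pages_err;		/* stand-in for vma->ops->set_pages() */
	/* --- and end here: no sleeping locks taken while held --- */

	if (err)
		unpin_pages();		/* failure: drop the early pin */
	return err;
}
```

The invariant the patch preserves is visible in the counter: exactly one pin survives per successful call, and failed calls leave the count untouched.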
[PATCH 2/7] drm/i915: Don't free shared locks while shared
We are currently sharing the VM reservation locks across a number of gem objects with page-table memory. Since TTM will individualize the reservation locks when freeing objects, including accessing the shared locks, make sure that the shared locks are not freed until that is done. For PPGTT we add an additional refcount, for GGTT we flush the object freeing workqueue before freeing the shared lock. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/gem/i915_gem_object.c| 3 ++ .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 + drivers/gpu/drm/i915/gt/intel_ggtt.c | 13 -- drivers/gpu/drm/i915/gt/intel_gtt.c | 45 +++ drivers/gpu/drm/i915/gt/intel_gtt.h | 29 +++- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 2 +- 6 files changed, 80 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 28144410df86..abadf0994ad0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -252,6 +252,9 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, if (obj->mm.n_placements > 1) kfree(obj->mm.placements); + if (obj->resv_shared_from) + i915_vm_resv_put(obj->resv_shared_from); + /* But keep the pointer alive for RCU-protected lookups */ call_rcu(&obj->rcu, __i915_gem_free_object_rcu); cond_resched(); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 0727d0c76aa0..450340a73186 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -149,6 +149,7 @@ struct drm_i915_gem_object { * when i915_gem_ww_ctx_backoff() or i915_gem_ww_ctx_fini() are called.
*/ struct list_head obj_link; + struct dma_resv *resv_shared_from; union { struct rcu_head rcu; diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 35069ca5d7de..128d781e429f 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -746,7 +746,13 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt) mutex_unlock(>vm.mutex); i915_address_space_fini(>vm); - dma_resv_fini(>vm.resv); + /* +* Make sure our pagetable gem objects have been freed, +* so that nobody shares our reservation object anymore. +*/ + i915_gem_flush_free_objects(ggtt->vm.i915); + GEM_WARN_ON(kref_read(>vm.resv_ref) != 1); + dma_resv_fini(>vm._resv); arch_phys_wc_del(ggtt->mtrr); @@ -829,6 +835,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size) return -ENOMEM; } + kref_init(>vm.resv_ref); ret = setup_scratch_page(>vm); if (ret) { drm_err(>drm, "Scratch setup failed\n"); @@ -1135,7 +1142,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct intel_gt *gt) ggtt->vm.gt = gt; ggtt->vm.i915 = i915; ggtt->vm.dma = i915->drm.dev; - dma_resv_init(>vm.resv); + dma_resv_init(>vm._resv); if (INTEL_GEN(i915) <= 5) ret = i915_gmch_probe(ggtt); @@ -1144,7 +1151,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct intel_gt *gt) else ret = gen8_gmch_probe(ggtt); if (ret) { - dma_resv_fini(>vm.resv); + dma_resv_fini(>vm._resv); return ret; } diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 9b98f9d9faa3..695b22b17644 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -22,8 +22,11 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz) * object underneath, with the idea that one object_lock() will lock * them all at once. 
*/ - if (!IS_ERR(obj)) - obj->base.resv = >resv; + if (!IS_ERR(obj)) { + obj->base.resv = i915_vm_resv_get(vm); + obj->resv_shared_from = obj->base.resv; + } + return obj; } @@ -40,8 +43,11 @@ struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz) * object underneath, with the idea that one object_lock() will lock * them all at once. */ - if (!IS_ERR(obj)) - obj->base.resv = >resv; + if (!IS_ERR(obj)) { + obj->base.resv = i915_vm_resv_get(vm); + obj->resv_shared_from = obj->base.resv; + } + return obj; } @@ -102,7 +108,7 @@ void __i915_vm_close(struct i915_address_space *vm) int i915_vm_lock_objects(struct i915_address_space *vm,
[PATCH 0/7] drm/i915: Move LMEM (VRAM) management over to TTM
This is an initial patch series to move discrete memory management over to TTM. It will be followed up shortly with adding more functionality. The buddy allocator is temporarily removed along with its selftests and is replaced with the TTM range manager, and some selftests are adjusted to account for introduced fragmentation. Work is ongoing to reintroduce the buddy allocator as a TTM resource manager. A new memcpy ttm move is introduced that uses kmap_local() functionality rather than vmap(). Among other things stated in the patch commit message it helps us deal with page-based LMEM memory. It is generic enough to replace the ttm memcpy move with some additional work if so desired. On x86 it also enables prefetching reads from write-combined memory. Finally the old i915 gem object LMEM backend is replaced with an i915 gem object TTM backend and some additional i915 gem object ops are introduced to support the added functionality. Currently it is used only to support management and eviction of the LMEM region, but work is underway to extend the support to system memory. In this way we use TTM the way it was originally intended, having the GPU binding taken care of by driver code. The intention is to follow up with - System memory support - TTM CPU pagefaulting - Pipelined accelerated moves / migration Thomas Hellström (7): drm/i915: Untangle the vma pages_mutex drm/i915: Don't free shared locks while shared drm/i915/ttm, drm/ttm: Initialize the ttm device and memory managers.
  drm/i915/ttm: Embed a ttm buffer object in the i915 gem object
  drm/i915/ttm, drm/ttm: Add a generic TTM memcpy move for page-based iomem
  drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend
  drm/i915/lmem: Verify checks for lmem residency

 drivers/gpu/drm/i915/Kconfig                  |   1 +
 drivers/gpu/drm/i915/Makefile                 |   4 +-
 drivers/gpu/drm/i915/display/intel_display.c  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c      |  71 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.h      |   5 -
 drivers/gpu/drm/i915/gem/i915_gem_object.c    | 161 +++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  13 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  37 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_region.c    | 126 +--
 drivers/gpu/drm/i915/gem/i915_gem_region.h    |   4 -
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h    |   9 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 534
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h       |  48 ++
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.c   | 155
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.h   | 141
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |  13 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            |   2 -
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  45 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  29 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   2 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  30 +-
 drivers/gpu/drm/i915/i915_buddy.c             | 435 --
 drivers/gpu/drm/i915/i915_buddy.h             | 131 ---
 drivers/gpu/drm/i915/i915_drv.c               |   8 +
 drivers/gpu/drm/i915/i915_drv.h               |   7 +-
 drivers/gpu/drm/i915/i915_gem.c               |   6 +-
 drivers/gpu/drm/i915/i915_globals.c           |   1 -
 drivers/gpu/drm/i915/i915_globals.h           |   1 -
 drivers/gpu/drm/i915/i915_scatterlist.c       |  70 ++
 drivers/gpu/drm/i915/i915_scatterlist.h       |  35 +
 drivers/gpu/drm/i915/i915_vma.c               |  33 +-
 drivers/gpu/drm/i915/intel_memory_region.c    | 181 ++--
 drivers/gpu/drm/i915/intel_memory_region.h    |  45 +-
 drivers/gpu/drm/i915/intel_region_ttm.c       | 247 ++
 drivers/gpu/drm/i915/intel_region_ttm.h       |  32 +
 drivers/gpu/drm/i915/selftests/i915_buddy.c   | 789 --
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 -
 .../drm/i915/selftests/intel_memory_region.c  | 133 +--
 drivers/gpu/drm/i915/selftests/mock_region.c  |  51 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  |  13 +
 drivers/gpu/drm/ttm/ttm_range_manager.c       |  55 +-
 include/drm/ttm/ttm_bo_driver.h               |  23 +
 include/drm/ttm/ttm_device.h                  |   9 +
 46 files changed, 1876 insertions(+), 1879 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.h
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.h
 delete mode 100644 drivers/gpu/drm/i915/i915_buddy.c
 delete mode 100644 drivers/gpu/drm/i915/i915_buddy.h
 create mode 100644 drivers/gpu/drm/i915/intel_region_ttm.c
 create mode 100644 drivers/gpu/drm/i915/intel_region_ttm.h
 delete mode 100644 drivers/gpu/drm/i915/selftests/i915_buddy.c

-- 
2.30.2
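The kmap_local()-based memcpy move mentioned in the cover letter works page by page: each source and destination page is mapped briefly, a page worth of data is copied, and the mappings are dropped before moving on, so no mapping of the whole object is ever built (unlike a vmap() of all pages up front). Below is a rough userspace sketch of that per-page structure, with a plain memcpy standing in for the kmap_local_page()/kunmap_local() pairs; `PAGE_SZ` and `memcpy_pagewise` are illustrative names, not the kernel API.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096	/* illustrative page size */

/* Copy npages pages one at a time.  In the kernel version each
 * iteration would kmap_local_page() the source and destination
 * pages, copy PAGE_SZ bytes, and kunmap_local() both mappings
 * again immediately, so at most two pages are mapped at once. */
static void memcpy_pagewise(unsigned char *dst, const unsigned char *src,
			    size_t npages)
{
	for (size_t i = 0; i < npages; i++) {
		/* kmap_local_page(dst page), kmap_local_page(src page) */
		memcpy(dst + i * PAGE_SZ, src + i * PAGE_SZ, PAGE_SZ);
		/* kunmap_local() both before the next iteration */
	}
}
```

Keeping the mapping window this small is what makes the move workable for page-based LMEM, where a contiguous kernel mapping of the whole buffer may not be available.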