linux-next: build failure after merge of the drm-misc tree

2021-05-11 Thread Stephen Rothwell
Hi all,

After merging the drm-misc tree, today's linux-next build (powerpc
allyesconfig) failed like this:

drivers/gpu/drm/nouveau/nouveau_connector.c: In function 
'nouveau_connector_of_detect':
drivers/gpu/drm/nouveau/nouveau_connector.c:463:59: error: 'struct drm_device' 
has no member named 'pdev'; did you mean 'dev'?
  463 |  struct device_node *cn, *dn = pci_device_to_OF_node(dev->pdev);
  |   ^~~~
  |   dev

Caused by commit

  b347e04452ff ("drm: Remove pdev field from struct drm_device")

I have reverted that commit for today.
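
For reference, the usual conversion for this class of breakage is to derive
the PCI device from the generic drm_device.dev pointer instead of the removed
pdev field; a minimal sketch only, not the actual nouveau fixup:

    /* dev->pdev is gone; recover the pci_dev from the embedded struct device */
    struct device_node *cn, *dn =
            pci_device_to_OF_node(to_pci_dev(dev->dev));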

-- 
Cheers,
Stephen Rothwell




[Bug 212957] [radeon] kernel NULL pointer dereference during system boot

2021-05-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=212957

--- Comment #4 from Dennis Foster (m...@dennisfoster.us) ---
Created attachment 296723
  --> https://bugzilla.kernel.org/attachment.cgi?id=296723&action=edit
journalctl - bad commit

Attached is a part of the system log after checking out the bisected commit.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 212957] [radeon] kernel NULL pointer dereference during system boot

2021-05-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=212957

--- Comment #3 from Dennis Foster (m...@dennisfoster.us) ---
(In reply to Alex Deucher from comment #1)
> Can you bisect?

0575ff3d33cd62123991d2a5d0d8459d72592388 is the first bad commit
commit 0575ff3d33cd62123991d2a5d0d8459d72592388
Author: Christian König 
Date:   Thu Oct 8 13:01:35 2020 +0200

drm/radeon: stop using pages with drm_prime_sg_to_page_addr_arrays v2

This is deprecated.

v2: also use ttm_sg_tt_init to avoid allocating the page array.

Signed-off-by: Christian König 
Acked-by: Daniel Vetter 
Link: https://patchwork.freedesktop.org/patch/403832/

 drivers/gpu/drm/radeon/radeon_ttm.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)


I wasn't able to revert this commit on v5.12, because another commit,
c67e62790f5c156705fb162da840c6d89d0af6e0, changed that file drastically; in
particular, drm_prime_sg_to_page_addr_arrays() was replaced with
drm_prime_sg_to_dma_addr_array().

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

linux-next: manual merge of the drm-intel tree with Linus' tree

2021-05-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the drm-intel tree got a conflict in:

  drivers/gpu/drm/i915/intel_pm.c

between commit:

  e7c6e405e171 ("Fix misc new gcc warnings")

from Linus' tree and commit:

  c6deb5e97ded ("drm/i915/pm: Make the wm parameter of print_wm_latency a pointer")

from the drm-intel tree.

I fixed it up (I just used the latter version) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




linux-next: manual merge of the amdgpu tree with the drm-misc tree

2021-05-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the amdgpu tree got a conflict in:

  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c

between commit:

  c777dc9e7933 ("drm/ttm: move the page_alignment into the BO v2")

from the drm-misc tree and commit:

  dd03daec0ff1 ("drm/amdgpu: restructure amdgpu_vram_mgr_new")

from the amdgpu tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index f7235438535f,e2cbe19404c0..
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@@ -448,10 -391,10 +391,10 @@@ static int amdgpu_vram_mgr_new(struct t
pages_per_node = HPAGE_PMD_NR;
  #else
/* default to 2MB */
-   pages_per_node = (2UL << (20UL - PAGE_SHIFT));
+   pages_per_node = 2UL << (20UL - PAGE_SHIFT);
  #endif
-   pages_per_node = max((uint32_t)pages_per_node,
-tbo->page_alignment);
+   pages_per_node = max_t(uint32_t, pages_per_node,
 - mem->page_alignment);
++ tbo->page_alignment);
num_nodes = DIV_ROUND_UP(mem->num_pages, pages_per_node);
}
  
@@@ -469,38 -412,29 +412,29 @@@
mem->start = 0;
pages_left = mem->num_pages;
  
-   spin_lock(&mgr->lock);
-   for (i = 0; pages_left >= pages_per_node; ++i) {
-   unsigned long pages = rounddown_pow_of_two(pages_left);
+   /* Limit maximum size to 2GB due to SG table limitations */
+   pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
  
-   /* Limit maximum size to 2GB due to SG table limitations */
-   pages = min(pages, (2UL << (30 - PAGE_SHIFT)));
- 
-   r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
-   pages_per_node, 0,
-   place->fpfn, lpfn,
-   mode);
-   if (unlikely(r))
-   break;
- 
-   vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
-   amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
-   pages_left -= pages;
-   }
- 
-   for (; pages_left; ++i) {
-   unsigned long pages = min(pages_left, pages_per_node);
+   i = 0;
+   spin_lock(&mgr->lock);
+   while (pages_left) {
 -  uint32_t alignment = mem->page_alignment;
 +  uint32_t alignment = tbo->page_alignment;
  
-   if (pages == pages_per_node)
+   if (pages >= pages_per_node)
alignment = pages_per_node;
  
-   r = drm_mm_insert_node_in_range(mm, &nodes[i],
-   pages, alignment, 0,
-   place->fpfn, lpfn,
-   mode);
-   if (unlikely(r))
+   r = drm_mm_insert_node_in_range(mm, &nodes[i], pages, alignment,
+   0, place->fpfn, lpfn, mode);
+   if (unlikely(r)) {
+   if (pages > pages_per_node) {
+   if (is_power_of_2(pages))
+   pages = pages / 2;
+   else
+   pages = rounddown_pow_of_two(pages);
+   continue;
+   }
goto error;
+   }
  
vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
amdgpu_vram_mgr_virt_start(mem, &nodes[i]);




Re: [v3 1/2] dt-bindings: backlight: add DisplayPort aux backlight

2021-05-11 Thread Doug Anderson
Hi,

On Tue, May 11, 2021 at 11:12 AM  wrote:
>
> On 01-05-2021 03:08, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Apr 30, 2021 at 8:10 AM  wrote:
> >>
> >> On 30-04-2021 02:33, Doug Anderson wrote:
> >> > Hi,
> >> >
> >> > On Thu, Apr 29, 2021 at 11:04 AM Rob Herring  wrote:
> >> >>
> >> >> On Mon, Apr 26, 2021 at 11:29:15AM +0530, Rajeev Nandan wrote:
> >> >> > Add bindings for DisplayPort aux backlight driver.
> >> >> >
> >> >> > Changes in v2:
> >> >> > - New
> >> >> >
> >> >> > Signed-off-by: Rajeev Nandan 
> >> >> > ---
> >> >> >  .../bindings/leds/backlight/dp-aux-backlight.yaml  | 49 
> >> >> > ++
> >> >> >  1 file changed, 49 insertions(+)
> >> >> >  create mode 100644 
> >> >> > Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> >
> >> >> > diff --git 
> >> >> > a/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> >  
> >> >> > b/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> > new file mode 100644
> >> >> > index ..0fa8bf0
> >> >> > --- /dev/null
> >> >> > +++ 
> >> >> > b/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> > @@ -0,0 +1,49 @@
> >> >> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> >> >> > +%YAML 1.2
> >> >> > +---
> >> >> > +$id: 
> >> >> > http://devicetree.org/schemas/leds/backlight/dp-aux-backlight.yaml#
> >> >> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> >> >> > +
> >> >> > +title: DisplayPort aux backlight driver bindings
> >> >> > +
> >> >> > +maintainers:
> >> >> > +  - Rajeev Nandan 
> >> >> > +
> >> >> > +description:
> >> >> > +  Backlight driver to control the brightness over DisplayPort aux 
> >> >> > channel.
> >> >> > +
> >> >> > +allOf:
> >> >> > +  - $ref: common.yaml#
> >> >> > +
> >> >> > +properties:
> >> >> > +  compatible:
> >> >> > +const: dp-aux-backlight
> >> >> > +
> >> >> > +  ddc-i2c-bus:
> >> >> > +$ref: /schemas/types.yaml#/definitions/phandle
> >> >> > +description:
> >> >> > +  A phandle to the system I2C controller connected to the DDC 
> >> >> > bus used
> >> >> > +  for the DisplayPort AUX channel.
> >> >> > +
> >> >> > +  enable-gpios:
> >> >> > +maxItems: 1
> >> >> > +description: GPIO specifier for backlight enable pin.
> >> >> > +
> >> >> > +  max-brightness: true
> >> >> > +
> >> >> > +required:
> >> >> > +  - compatible
> >> >> > +  - ddc-i2c-bus
> >> >> > +
> >> >> > +additionalProperties: false
> >> >> > +
> >> >> > +examples:
> >> >> > +  - |
> >> >> > +backlight {
> >> >> > +compatible = "dp-aux-backlight";
> >> >> > +ddc-i2c-bus = <_bridge>;
> >> >> > +enable-gpios = < 12 GPIO_ACTIVE_HIGH>;
> >> >>
> >> >> So the DDC bus is connected to a backlight and also a panel? This
> >> >> binding is not reflecting the h/w, but rather what you want for some
> >> >> driver.
> >> >>
> >> >> There's only one thing here and that's an eDP panel which supports
> >> >> backlight control via DP aux channel. You can figure all that out from
> >> >> the panel's compatible and/or reading the EDID.
> >> >>
> >> >> You might also be interested in this thread:
> >> >>
> >> >> https://lore.kernel.org/lkml/yiksdtjcihgnv...@orome.fritz.box/
> >> >
> >> > I think Rajeev needs to rework everything anyway as per:
> >> >
> >> > https://lore.kernel.org/r/87zgxl5qar@intel.com
> >> >
> >> > ...but you're right that it makes sense not to model the backlight as
> >> > a separate node in the device tree. The panel driver can handle
> >> > setting up the backlight.
> >> >
> >> > -Doug
> >>
> >> It was not a good idea to create a separate backlight driver and use
> >> ddc-i2c-bus to get access to DP aux. I am working to move the code
> >> to the panel driver and to utilize the new DRM helper functions
> >> (drm_edp_backlight_*) Lyude has added [1].
> >>
> >> To use these helper functions, the panel driver should have access to
> >> the
> >> "struct drm_dp_aux *". The simple-panel has a "ddc-i2c-bus" property
> >> to give the panel access to the DDC bus and is currently being used to
> >> get the EDID from the panel. Can I use the same ddc bus i2c_adapter to
> >> get
> >> the "struct drm_dp_aux *"?
> >>
> >> As per the suggestion [2], I get the "struct drm_dp_aux *" from the
> >> i2c_adapter of ddc bus (maybe I didn't understand the suggestion
> >> correctly),
> >> and, it turned out, the way I have implemented is not the right way
> >> [3].
> >> So, I am afraid to use the same method in the panel driver.
> >>
> >>
> >> [1] https://lore.kernel.org/dri-devel/871rb5bcf9@intel.com/
> >> [2] https://www.spinics.net/lists/dri-devel/msg295429.html
> >> [3]
> >> https://lore.kernel.org/dri-devel/2021042616.4lc3ekxjugjr3...@maple.lan/
> >
> > So it's definitely up to maintainers, not me. ...but I guess I would
> > have expected something like a new property called "ddc-aux-bus". Then
> > you'd have to create a new API call called something 

Re: [v3 1/2] dt-bindings: backlight: add DisplayPort aux backlight

2021-05-11 Thread Laurent Pinchart
Hi Rajeevny,

On Tue, May 11, 2021 at 11:41:57PM +0530, rajee...@codeaurora.org wrote:
> On 01-05-2021 03:08, Doug Anderson wrote:
> > On Fri, Apr 30, 2021 at 8:10 AM  wrote:
> >> On 30-04-2021 02:33, Doug Anderson wrote:
> >> > On Thu, Apr 29, 2021 at 11:04 AM Rob Herring  wrote:
> >> >> On Mon, Apr 26, 2021 at 11:29:15AM +0530, Rajeev Nandan wrote:
> >> >> > Add bindings for DisplayPort aux backlight driver.
> >> >> >
> >> >> > Changes in v2:
> >> >> > - New
> >> >> >
> >> >> > Signed-off-by: Rajeev Nandan 
> >> >> > ---
> >> >> >  .../bindings/leds/backlight/dp-aux-backlight.yaml  | 49 
> >> >> > ++
> >> >> >  1 file changed, 49 insertions(+)
> >> >> >  create mode 100644 
> >> >> > Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> >
> >> >> > diff --git 
> >> >> > a/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> >  
> >> >> > b/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> > new file mode 100644
> >> >> > index ..0fa8bf0
> >> >> > --- /dev/null
> >> >> > +++ 
> >> >> > b/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
> >> >> > @@ -0,0 +1,49 @@
> >> >> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> >> >> > +%YAML 1.2
> >> >> > +---
> >> >> > +$id: 
> >> >> > http://devicetree.org/schemas/leds/backlight/dp-aux-backlight.yaml#
> >> >> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> >> >> > +
> >> >> > +title: DisplayPort aux backlight driver bindings
> >> >> > +
> >> >> > +maintainers:
> >> >> > +  - Rajeev Nandan 
> >> >> > +
> >> >> > +description:
> >> >> > +  Backlight driver to control the brightness over DisplayPort aux 
> >> >> > channel.
> >> >> > +
> >> >> > +allOf:
> >> >> > +  - $ref: common.yaml#
> >> >> > +
> >> >> > +properties:
> >> >> > +  compatible:
> >> >> > +const: dp-aux-backlight
> >> >> > +
> >> >> > +  ddc-i2c-bus:
> >> >> > +$ref: /schemas/types.yaml#/definitions/phandle
> >> >> > +description:
> >> >> > +  A phandle to the system I2C controller connected to the DDC 
> >> >> > bus used
> >> >> > +  for the DisplayPort AUX channel.
> >> >> > +
> >> >> > +  enable-gpios:
> >> >> > +maxItems: 1
> >> >> > +description: GPIO specifier for backlight enable pin.
> >> >> > +
> >> >> > +  max-brightness: true
> >> >> > +
> >> >> > +required:
> >> >> > +  - compatible
> >> >> > +  - ddc-i2c-bus
> >> >> > +
> >> >> > +additionalProperties: false
> >> >> > +
> >> >> > +examples:
> >> >> > +  - |
> >> >> > +backlight {
> >> >> > +compatible = "dp-aux-backlight";
> >> >> > +ddc-i2c-bus = <_bridge>;
> >> >> > +enable-gpios = < 12 GPIO_ACTIVE_HIGH>;
> >> >>
> >> >> So the DDC bus is connected to a backlight and also a panel? This
> >> >> binding is not reflecting the h/w, but rather what you want for some
> >> >> driver.
> >> >>
> >> >> There's only one thing here and that's an eDP panel which supports
> >> >> backlight control via DP aux channel. You can figure all that out from
> >> >> the panel's compatible and/or reading the EDID.
> >> >>
> >> >> You might also be interested in this thread:
> >> >>
> >> >> https://lore.kernel.org/lkml/yiksdtjcihgnv...@orome.fritz.box/
> >> >
> >> > I think Rajeev needs to rework everything anyway as per:
> >> >
> >> > https://lore.kernel.org/r/87zgxl5qar@intel.com
> >> >
> >> > ...but you're right that it makes sense not to model the backlight as
> >> > a separate node in the device tree. The panel driver can handle
> >> > setting up the backlight.
> >> 
> >> It was not a good idea to create a separate backlight driver and use
> >> ddc-i2c-bus to get access to DP aux. I am working to move the code
> >> to the panel driver and to utilize the new DRM helper functions
> >> (drm_edp_backlight_*) Lyude has added [1].
> >> 
> >> To use these helper functions, the panel driver should have access to the
> >> "struct drm_dp_aux *". The simple-panel has a "ddc-i2c-bus" property
> >> to give the panel access to the DDC bus and is currently being used to
> >> get the EDID from the panel. Can I use the same ddc bus i2c_adapter to get
> >> the "struct drm_dp_aux *"?
> >> 
> >> As per the suggestion [2], I get the "struct drm_dp_aux *" from the
> >> i2c_adapter of ddc bus (maybe I didn't understand the suggestion 
> >> correctly),
> >> and, it turned out, the way I have implemented is not the right way [3].
> >> So, I am afraid to use the same method in the panel driver.
> >> 
> >> 
> >> [1] https://lore.kernel.org/dri-devel/871rb5bcf9@intel.com/
> >> [2] https://www.spinics.net/lists/dri-devel/msg295429.html
> >> [3]
> >> https://lore.kernel.org/dri-devel/2021042616.4lc3ekxjugjr3...@maple.lan/
> > 
> > So it's definitely up to maintainers, not me. ...but I guess I would
> > have expected something like a new property called "ddc-aux-bus". Then
> > you'd have to create a new API call called something like
> > 

Re: [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages

2021-05-11 Thread Michal Wajdeczko



On 11.05.2021 17:16, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:13:34PM -0700, Matthew Brost wrote:
>> From: Michal Wajdeczko 
>>
>> New GuC firmware will unify format of MMIO and CTB H2G messages.
>> Introduce their definitions now to allow gradual transition of
>> our code to match new changes.
>>
>> Signed-off-by: Michal Wajdeczko 
>> Signed-off-by: Matthew Brost 
>> Cc: Michał Winiarski 
>> ---
>>  .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++
>>  1 file changed, 226 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h 
>> b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
>> index 775e21f3058c..1c264819aa03 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
>> @@ -6,6 +6,232 @@
>>  #ifndef _ABI_GUC_MESSAGES_ABI_H
>>  #define _ABI_GUC_MESSAGES_ABI_H
>>  
>> +/**
>> + * DOC: HXG Message
> 
> These aren't useful if we don't pull them in somewhere in the
> Documentation/gpu hierarchy. General comment, and also please check that
> it all renders correctly still.

Patch that connects all these DOC sections into i915.rst is still on
private branch, where I'm trying to verify all html rendering, and ...

> 
> btw if you respin a patch not originally by you we generally add a (v1) to
> the original s-o-b line (or whever the version split was) and explain in
> the usual changelog in the commit message what was changed.
> 
> This holds for the entire series ofc.
> -Daniel
> 
>> + *
>> + * All messages exchanged with GuC are defined using 32 bit dwords.
>> + * First dword is treated as a message header. Remaining dwords are 
>> optional.
>> + *
>> + * .. _HXG Message:

where such workarounds from early documentation are already removed,
since they are not needed any more starting from commit ef09989594bf
("scripts/kernel-doc: add internal hyperlink to DOC: sections")

Michal

>> + *
>> + *  +---+-------+----------------------------------------------+
>> + *  |   | Bits  | Description                                  |
>> + *  +===+=======+==============================================+
>> + *  |   |       |                                              |
>> + *  | 0 |    31 | **ORIGIN** - originator of the message       |
>> + *  |   |       |   - _`GUC_HXG_ORIGIN_HOST` = 0               |
>> + *  |   |       |   - _`GUC_HXG_ORIGIN_GUC` = 1                |
>> + *  |   |       |                                              |
>> + *  |   +-------+----------------------------------------------+
>> + *  |   | 30:28 | **TYPE** - message type                      |
>> + *  |   |       |   - _`GUC_HXG_TYPE_REQUEST` = 0              |
>> + *  |   |       |   - _`GUC_HXG_TYPE_EVENT` = 1                |
>> + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3     |
>> + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5    |
>> + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6     |
>> + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7     |
>> + *  |   +-------+----------------------------------------------+
>> + *  |   |  27:0 | **AUX** - auxiliary data (depends on TYPE)   |
>> + *  +---+-------+----------------------------------------------+
>> + *  | 1 |  31:0 | optional payload (depends on TYPE)           |
>> + *  +---+-------+                                              |
>> + *  |...|       |                                              |
>> + *  +---+-------+                                              |
>> + *  | n |  31:0 |                                              |
>> + *  +---+-------+----------------------------------------------+
>> + */
>> +
>> +#define GUC_HXG_MSG_MIN_LEN              1u
>> +#define GUC_HXG_MSG_0_ORIGIN             (0x1 << 31)
>> +#define   GUC_HXG_ORIGIN_HOST            0u
>> +#define   GUC_HXG_ORIGIN_GUC             1u
>> +#define GUC_HXG_MSG_0_TYPE               (0x7 << 28)
>> +#define   GUC_HXG_TYPE_REQUEST           0u
>> +#define   GUC_HXG_TYPE_EVENT             1u
>> +#define   GUC_HXG_TYPE_NO_RESPONSE_BUSY  3u
>> +#define   GUC_HXG_TYPE_NO_RESPONSE_RETRY 5u
>> +#define   GUC_HXG_TYPE_RESPONSE_FAILURE  6u
>> +#define   GUC_HXG_TYPE_RESPONSE_SUCCESS  7u
>> +#define GUC_HXG_MSG_0_AUX                (0xfffffff << 0)
>> +
>> +/**
>> + * DOC: HXG Request
>> + *
>> + * The `HXG Request`_ message should be used to initiate 

RE: [PATCH 1/3] virtio-gpu uapi: Add VIRTIO_GPU_F_EXPLICIT_FLUSH feature

2021-05-11 Thread Kasireddy, Vivek
Hi Gerd,

> On Tue, May 11, 2021 at 01:36:08AM -0700, Vivek Kasireddy wrote:
> > This feature enables the Guest to wait until a flush has been
> > performed on a buffer it has submitted to the Host.
> 
> This needs a virtio-spec update documenting the new feature.
[Kasireddy, Vivek] Yes, I was planning to do that after getting your 
thoughts on this feature.

> 
> > +   VIRTIO_GPU_CMD_WAIT_FLUSH,
> 
> Why a new command?
> 
> If I understand it correctly you want wait until
> VIRTIO_GPU_CMD_RESOURCE_FLUSH is done.  We could
> extend the VIRTIO_GPU_CMD_RESOURCE_FLUSH command
> for that instead.
[Kasireddy, Vivek] VIRTIO_GPU_CMD_RESOURCE_FLUSH can trigger/queue a
redraw that may be performed synchronously or asynchronously depending on the
UI (Glarea is async and gtk-egl is sync but can be made async). I'd like to 
make the
Guest wait until the actual redraw happens (until GlFLush or eglSwapBuffers, 
again
depending on the UI). 

However, as part of this feature (explicit flush), I'd like to make the Guest 
wait until
the current resource (as specified by resource_flush or set_scanout) is flushed 
or
synchronized. But for a different feature I am thinking of (explicit sync), I'd 
like to
make the Guest wait for the previous buffer/resource submitted (available via 
old_state->fb).

I think it may be possible to accomplish both features by overloading 
resource_flush
but given the various combinations of Guests (Android/Chrome OS, Windows, Linux)
and Hosts (Android/Chrome OS, Linux) that are or will be supported with 
virtio-gpu +
i915, I figured adding a new command might be cleaner.
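
To make that concrete, a hypothetical shape for such a command (not part of
the proposal or the virtio-gpu spec; names and layout are assumptions):

    /* hypothetical layout, reusing the standard virtio-gpu control header */
    struct virtio_gpu_wait_flush {
            struct virtio_gpu_ctrl_hdr hdr; /* hdr.type = VIRTIO_GPU_CMD_WAIT_FLUSH */
            __le32 resource_id;             /* resource whose host-side flush to wait for */
            __le32 padding;
    };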

Thanks,
Vivek


> 
> take care,
>   Gerd



Re: [PATCH v2] drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected

2021-05-11 Thread Alex Deucher
On Mon, May 10, 2021 at 11:33 PM Kai-Heng Feng
 wrote:
>
> On Fri, Apr 30, 2021 at 12:57 PM Kai-Heng Feng
>  wrote:
> >
> > Screen flickers rapidly when two 4K 60Hz monitors are in use. This issue
> > doesn't happen when one monitor is 4K 60Hz (pixelclock 594MHz) and
> > another one is 4K 30Hz (pixelclock 297MHz).
> >
> > The issue is gone after setting "power_dpm_force_performance_level" to
> > "high". Following the indication, we found that the issue occurs when
> > sclk is too low.
> >
> > So resolve the issue by disabling sclk switching when there are two
> > monitors requires high pixelclock (> 297MHz).
> >
> > v2:
> >  - Only apply the fix to Oland.
> > Signed-off-by: Kai-Heng Feng 
>
> A gentle ping...

Applied.  Thanks for the reminder.

Alex


>
> > ---
> >  drivers/gpu/drm/radeon/radeon.h| 1 +
> >  drivers/gpu/drm/radeon/radeon_pm.c | 8 
> >  drivers/gpu/drm/radeon/si_dpm.c| 3 +++
> >  3 files changed, 12 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/radeon/radeon.h 
> > b/drivers/gpu/drm/radeon/radeon.h
> > index 42281fce552e6..56ed5634cebef 100644
> > --- a/drivers/gpu/drm/radeon/radeon.h
> > +++ b/drivers/gpu/drm/radeon/radeon.h
> > @@ -1549,6 +1549,7 @@ struct radeon_dpm {
> > void*priv;
> > u32 new_active_crtcs;
> > int new_active_crtc_count;
> > +   int high_pixelclock_count;
> > u32 current_active_crtcs;
> > int current_active_crtc_count;
> > bool single_display;
> > diff --git a/drivers/gpu/drm/radeon/radeon_pm.c 
> > b/drivers/gpu/drm/radeon/radeon_pm.c
> > index 0c1950f4e146f..3861c0b98fcf3 100644
> > --- a/drivers/gpu/drm/radeon/radeon_pm.c
> > +++ b/drivers/gpu/drm/radeon/radeon_pm.c
> > @@ -1767,6 +1767,7 @@ static void radeon_pm_compute_clocks_dpm(struct 
> > radeon_device *rdev)
> > struct drm_device *ddev = rdev->ddev;
> > struct drm_crtc *crtc;
> > struct radeon_crtc *radeon_crtc;
> > +   struct radeon_connector *radeon_connector;
> >
> > if (!rdev->pm.dpm_enabled)
> > return;
> > @@ -1776,6 +1777,7 @@ static void radeon_pm_compute_clocks_dpm(struct 
> > radeon_device *rdev)
> > /* update active crtc counts */
> > rdev->pm.dpm.new_active_crtcs = 0;
> > rdev->pm.dpm.new_active_crtc_count = 0;
> > +   rdev->pm.dpm.high_pixelclock_count = 0;
> > if (rdev->num_crtc && rdev->mode_info.mode_config_initialized) {
> > list_for_each_entry(crtc,
> > &ddev->mode_config.crtc_list, head) {
> > @@ -1783,6 +1785,12 @@ static void radeon_pm_compute_clocks_dpm(struct 
> > radeon_device *rdev)
> > if (crtc->enabled) {
> > rdev->pm.dpm.new_active_crtcs |= (1 << 
> > radeon_crtc->crtc_id);
> > rdev->pm.dpm.new_active_crtc_count++;
> > +   if (!radeon_crtc->connector)
> > +   continue;
> > +
> > +   radeon_connector = 
> > to_radeon_connector(radeon_crtc->connector);
> > +   if 
> > (radeon_connector->pixelclock_for_modeset > 297000)
> > +   
> > rdev->pm.dpm.high_pixelclock_count++;
> > }
> > }
> > }
> > diff --git a/drivers/gpu/drm/radeon/si_dpm.c 
> > b/drivers/gpu/drm/radeon/si_dpm.c
> > index 9186095518047..3cc2b96a7f368 100644
> > --- a/drivers/gpu/drm/radeon/si_dpm.c
> > +++ b/drivers/gpu/drm/radeon/si_dpm.c
> > @@ -2979,6 +2979,9 @@ static void si_apply_state_adjust_rules(struct 
> > radeon_device *rdev,
> > (rdev->pdev->device == 0x6605)) {
> > max_sclk = 75000;
> > }
> > +
> > +   if (rdev->pm.dpm.high_pixelclock_count > 1)
> > +   disable_sclk_switching = true;
> > }
> >
> > if (rps->vce_active) {
> > --
> > 2.30.2
> >


Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 07:43:30PM +0200, Daniel Vetter wrote:
> On Tue, May 11, 2021 at 10:01:28AM -0700, Matthew Brost wrote:
> > On Tue, May 11, 2021 at 05:26:34PM +0200, Daniel Vetter wrote:
> > > On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote:
> > > > Add lrc descriptor context lookup array which can resolve the
> > > > intel_context from the lrc descriptor index. In addition to lookup, it
> > > > can determine if the lrc descriptor context is currently registered with
> > > > the GuC by checking if an entry for a descriptor index is present.
> > > > Future patches in the series will make use of this array.
> > > > 
> > > > Cc: John Harrison 
> > > > Signed-off-by: Matthew Brost 
> > > > ---
> > > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  5 +++
> > > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +--
> > > >  2 files changed, 35 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
> > > > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > index d84f37afb9d8..2eb6c497e43c 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > > @@ -6,6 +6,8 @@
> > > >  #ifndef _INTEL_GUC_H_
> > > >  #define _INTEL_GUC_H_
> > > >  
> > > > +#include "linux/xarray.h"
> > > > +
> > > >  #include "intel_uncore.h"
> > > >  #include "intel_guc_fw.h"
> > > >  #include "intel_guc_fwif.h"
> > > > @@ -47,6 +49,9 @@ struct intel_guc {
> > > > struct i915_vma *lrc_desc_pool;
> > > > void *lrc_desc_pool_vaddr;
> > > >  
> > > > +   /* guc_id to intel_context lookup */
> > > > +   struct xarray context_lookup;
> > > 
> > > The current code sets a disastrous example, but for stuff like this it's
> > > always good to explain the locking, and who's holding references and how
> > > you're handling cycles. Since I guess the intel_context also holds the
> > > guc_id alive somehow.
> > > 
> > 
> > I think (?) I know what you mean by this comment. How about adding:
> > 
> > 'If an entry in the context_lookup is present, that means a context
> > associated with the guc_id is registered with the GuC. We use this xarray
> > as a lookup mechanism when the GuC communicates with the i915 about the
> > context.'
> 
> So no idea how this works, but generally we put a "Protected by <lock>"
> or similar in here (so you get a nice link plus something
> you can use as jump label in your ide too). Plus since intel_context has
> some lifetime rules, explaining whether you're allowed to use the pointer
> after you unlock, or whether you need to grab a reference or what exactly
> is going on. Usually there's three options:
> 
> - No refcounting, you cannot access a pointer obtained through this after
>   you unlock.
> - Weak reference, you upgrade to a full reference with
>   kref_get_unless_zero. If that fails it indicates a lookup failure, since
>   you raced with destruction. If it succeeds you can use the pointer after
>   unlock.
> - Strong reference, you get your own reference that stays valid with
>   kref_get().
> 

I think the rules for this are 'if this exists in the xarray, we have ref'.
Likewise if the GuC knows about the context we have a ref to the context. 
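
A minimal sketch of the weak-reference flavour described above, adapted to the
xarray in this patch (illustrative only, not the code under review; assumes
intel_context is kref-counted via ce->ref):

    static struct intel_context *
    guc_context_lookup(struct intel_guc *guc, u32 guc_id)
    {
            struct intel_context *ce;

            xa_lock_irq(&guc->context_lookup);
            ce = xa_load(&guc->context_lookup, guc_id);
            /* upgrade the weak reference; failure means we raced with destruction */
            if (ce && !kref_get_unless_zero(&ce->ref))
                    ce = NULL;
            xa_unlock_irq(&guc->context_lookup);

            return ce;
    }

With the "entry present implies a reference is held" rule this reduces to a
plain xa_load() plus a reference grab; the kref_get_unless_zero() form is the
defensive variant Daniel lists.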

> I'm just bringing this up because the current i915-gem code is full of
> very tricky locking and lifetime rules, and explains roughly nothing of it
> in the data structures. Minimally some hints about the locking/lifetime
> rules of important structs should be there.
>

Agree. I'll add some comments here and to other structures this code uses.
 
> For locking rules it's good to double-down on them by adding
> lockdep_assert_held to all relevant functions (where appropriate only
> ofc).
>

Agree. I think I mostly do that in series. That being said the locking is going
to be a bit ugly until we switch to the DRM scheduler because currently multiple
processes can enter the GuC backend in parallel. With the DRM scheduler we allow
a single point of entry which simplifies things quite a bit.

The current locking rules are explained in the documentation patch: 'Update GuC
documentation'. As the locking evolves so will the documentation + lockdep
asserts.
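
For example (sketch only; "submission_lock" is a hypothetical name):

    static void guc_submission_update_locked(struct intel_guc *guc)
    {
            lockdep_assert_held(&guc->submission_lock);
            /* ... touch only state that submission_lock protects ... */
    }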

Matt
 
> What I generally don't think makes sense is to then also document the
> locking in the kerneldoc for the functions. That tends to be one place too
> many and ime just gets out of date and not useful at all.
> 
> > > Again holds for the entire series, where it makes sense (as in we don't
> > > expect to rewrite the entire code anyway).
> > 
> > Slightly out of order but one of the last patches in the series, 'Update GuC
> > documentation' adds a big section of comments that attempts to clarify how 
> > all
> > of this code works. I likely should add a section explaining the data 
> > structures
> > as well.
> 
> Yeah that would be nice.
> -Daniel
> 
> 
> > 
> > Matt
> > 
> > > -Daniel
> > > 
> > > > +
> > > > /* Control params for fw initialization 

Re: [RFC] Implicit vs explicit user fence sync

2021-05-11 Thread Christian König

On 11.05.21 at 18:48, Daniel Vetter wrote:

[SNIP]

Why?

If you allow implicit fencing then you can end up with
- an implicit userspace fence as the in-fence
- but an explicit dma_fence as the out fence

Which is not allowed. So there's really no way to make this work, except
if you stall in the ioctl, which also doesn't work.


Ok, wait a second. I really don't understand what's going on here.

The out fence is just to let the userspace know when the frame is 
displayed. Or rather when the old frame is no longer displayed so that 
it can be reused, right?


Then why does that need to be a dma_fence? We don't use that for memory 
management anywhere, don't we?



So you have to do an uapi change here. At that point we might as well do
it right.


I mean in the worst case we might need to allow user fences with 
sync_files as well when that is really used outside of Android.


But I still don't see the fundamental problem here.

Regards,
Christian.


Of course if you only care about some specific compositors (or maybe only
the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
cross-driver so we can't do that. Or at least I don't see a way how to do
this without causing endless amounts of fun down the road.


So I have a plan here, what was yours?

As far as I see that should still work perfectly fine and I have the strong
feeling I'm missing something here.


Transporting fences between processes is not the fundamental problem here,
but rather the question how we represent all this in the kernel?

In other words I think what you outlined above is just approaching it from
the wrong side again. Instead of looking what the kernel needs to support
this you take a look at userspace and the requirements there.

Uh ... that was my idea here? That's why I put "build userspace fences in
userspace only" as the very first thing. Then extend to winsys and
atomic/display and all these cases where things get more tricky.

I agree that transporting the fences is easy, which is why it's not
interesting trying to solve that problem first. Which is kinda what you're
trying to do here by adding implicit userspace fences (well not even that,
just a bunch of function calls without any semantics attached to them).

So if there's more here, you need to flesh it out more or I just dont get
what you're actually trying to demonstrate.

Well I'm trying to figure out why you see it as such a problem to keep
implicit sync around.

As far as I can tell it is completely orthogonal if we use implicit/explicit
and dma_fence/user_fence.

It's just a different implementation inside the kernel.

See above. It falls apart with the atomic ioctl.
-Daniel


Re: [Intel-gfx] [RFC PATCH 4/5] drm/i915: Introduce 'set parallel submit' extension

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 05:11:44PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 10:30:48AM -0700, Matthew Brost wrote:
> > i915_drm.h updates for 'set parallel submit' extension.
> > 
> > Cc: Tvrtko Ursulin 
> > Cc: Tony Ye 
> > CC: Carl Zhang 
> > Cc: Daniel Vetter 
> > Cc: Jason Ekstrand 
> > Signed-off-by: Matthew Brost 
> > ---
> >  include/uapi/drm/i915_drm.h | 126 
> >  1 file changed, 126 insertions(+)
> > 
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 26d2e135aa31..0175b12b33b8 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -1712,6 +1712,7 @@ struct drm_i915_gem_context_param {
> >   * Extensions:
> >   *   i915_context_engines_load_balance 
> > (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
> >   *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
> > + *   i915_context_engines_parallel_submit 
> > (I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)
> 
> Hm just realized, but I don't think this hyperlinks correctly, and I'm
> also not sure this formats very well as a nice list. Using item lists
> should look pretty nice like we're doing for the various kms properties,
> e.g.
> 
> FOO:
>   Explain what FOO does
> 
> BAR:
>   Explain what BAR does. struct bar also automatically generates a link
> 
> Please check with make htmldocs and polish this a bit (might need a small
> prep patch).
> 

I agree the doc should look nice. To get there I might need to chat with you on
IRC as I'm new to this. 

> >   */
> >  #define I915_CONTEXT_PARAM_ENGINES 0xa
> >  
> > @@ -1894,9 +1895,134 @@ struct i915_context_param_engines {
> > __u64 extensions; /* linked chain of extension blocks, 0 terminates */
> >  #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0 /* see 
> > i915_context_engines_load_balance */
> >  #define I915_CONTEXT_ENGINES_EXT_BOND 1 /* see i915_context_engines_bond */
> > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > i915_context_engines_parallel_submit */
> > struct i915_engine_class_instance engines[0];
> >  } __attribute__((packed));
> >  
> > +/*
> > + * i915_context_engines_parallel_submit:
> > + *
> > + * Setup a gem context to allow multiple BBs to be submitted in a single 
> > execbuf
> > + * IOCTL. Those BBs will then be scheduled to run on the GPU in parallel.
> > + *
> > + * All hardware contexts in the engine set are configured for parallel
> > + * submission (i.e. once this gem context is configured for parallel 
> > submission,
> > + * all the hardware contexts, regardless if a BB is available on each 
> > individual
> > + * context, will be submitted to the GPU in parallel). A user can submit 
> > BBs to
> > + * subset of the hardware contexts, in a single execbuf IOCTL, but it is 
> > not
> > + * recommended as it may reserve physical engines with nothing to run on 
> > them.
> > + * Highly recommended to configure the gem context with N hardware 
> > contexts then
> > + * always submit N BBs in a single IOCTL.
> > + *
> > + * There are two currently defined ways to control the placement of the
> > + * hardware contexts on physical engines: default behavior (no flags) and
> > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added in the
> > + * future as new hardware / use cases arise. Details of how to use this
> > + * interface are below, above the flags.
> > + *
> > + * Returns -EINVAL if hardware context placement configuration invalid or 
> > if the
> > + * placement configuration isn't supported on the platform / submission
> > + * interface.
> > + * Returns -ENODEV if extension isn't supported on the platform / 
> > submission
> > + * interface.
> > + */
> > +struct i915_context_engines_parallel_submit {
> > +   struct i915_user_extension base;
> 
> Ok this is good, since it makes sure we can't possible use this in
> CTX_SETPARAM.
> 

Yep, this is at context creation time. Technically you still can call this over
and over on the same gem context but Jason is taking that ability away I
believe. I've also told the media team to setup the context once and don't touch
it again.

> > +
> > +/*
> > + * Default placement behavior (currently unsupported):
> > + *
> > + * Rather than restricting parallel submission to a single class with a
> > + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a 
> > mode that
> > + * enables parallel submission across multiple engine classes. In this 
> > case each
> > + * context's logical engine mask indicates where that context can placed. 
> > It is
> > + * implied in this mode that all contexts have mutual exclusive placement 
> > (e.g.
> > + * if one context is running CS0 no other contexts can run on CS0).
> > + *
> > + * Example 1 pseudo code:
> > + * CSX[Y] = engine class X, logical instance Y
> > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > + * set_engines(INVALID, INVALID)
> > + * set_load_balance(engine_index=0, num_siblings=2, 

Re: [PATCH] drm: fix semicolon.cocci warnings

2021-05-11 Thread Geert Uytterhoeven
On Tue, May 11, 2021 at 7:17 PM Daniel Vetter  wrote:
> On Wed, May 12, 2021 at 12:11:23AM +0800, kernel test robot wrote:
> > From: kernel test robot 
> >
> > drivers/gpu/drm/kmb/kmb_dsi.c:284:3-4: Unneeded semicolon
> > drivers/gpu/drm/kmb/kmb_dsi.c:304:3-4: Unneeded semicolon
> > drivers/gpu/drm/kmb/kmb_dsi.c:321:3-4: Unneeded semicolon
> > drivers/gpu/drm/kmb/kmb_dsi.c:340:3-4: Unneeded semicolon
> > drivers/gpu/drm/kmb/kmb_dsi.c:364:2-3: Unneeded semicolon
> >
> >
> >  Remove unneeded semicolon.
> >
> > Generated by: scripts/coccinelle/misc/semicolon.cocci
> >
> > Fixes: ade896460e4a ("drm: DRM_KMB_DISPLAY should depend on ARCH_KEEMBAY")

This Fixed-tag is completely bogus.  The right one is
Fixes: 98521f4d4b4cb265 ("drm/kmb: Mipi DSI part of the display driver")

> > CC: Geert Uytterhoeven 
> > Reported-by: kernel test robot 
> > Signed-off-by: kernel test robot 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH v3] drm/i915: Invoke another _DSM to enable MUX on HP Workstation laptops

2021-05-11 Thread Ville Syrjälä
On Mon, Apr 26, 2021 at 11:24:10PM +0800, Kai-Heng Feng wrote:
> On HP Fury G7 Workstations, graphics output is re-routed from Intel GFX
> to discrete GFX after S3. This is not desirable, because userspace will
> treat connected display as a new one, losing display settings.
> 
> The expected behavior is to let discrete GFX drives all external
> displays.
> 
> The platform in question uses ACPI method \_SB.PCI0.HGME to enable MUX.
> The method is inside the another _DSM, so add the _DSM and call it
> accordingly.
> 
> I also tested some MUX-less and iGPU only laptops with that _DSM, no
> regression was found.
> 
> v3:
>  - Remove BXT from names.
>  - Change the parameter type.
>  - Fold the function into intel_modeset_init_hw().
> 
> v2:
>  - Forward declare struct pci_dev.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3113
> References: 
> https://lore.kernel.org/intel-gfx/1460040732-31417-4-git-send-email-animesh.ma...@intel.com/
> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/gpu/drm/i915/display/intel_acpi.c| 18 ++
>  drivers/gpu/drm/i915/display/intel_acpi.h|  3 +++
>  drivers/gpu/drm/i915/display/intel_display.c |  2 ++
>  3 files changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_acpi.c 
> b/drivers/gpu/drm/i915/display/intel_acpi.c
> index 833d0c1be4f1..d008d3976261 100644
> --- a/drivers/gpu/drm/i915/display/intel_acpi.c
> +++ b/drivers/gpu/drm/i915/display/intel_acpi.c
> @@ -13,12 +13,17 @@
>  #include "intel_display_types.h"
>  
>  #define INTEL_DSM_REVISION_ID 1 /* For Calpella anyway... */
> +#define INTEL_DSM_FN_PLATFORM_MUX_ENABLE 0 /* No args */

This block of defines is for the other DSM. We don't want to
mix these up. We also want to name it according to the spec,
so something like GET_BIOS_DATA_FUNCS_SUPPORTED. Similarly
for the intel_dsm_enable_mux() wrapper function. + it needs
a comment to document that some BIOSes abuse it to do MUX
initialization and whatnot.

We should perhaps rename all the old DSM stuff to
something a bit less generic as well...

>  #define INTEL_DSM_FN_PLATFORM_MUX_INFO 1 /* No args */
>  
>  static const guid_t intel_dsm_guid =
>   GUID_INIT(0x7ed873d3, 0xc2d0, 0x4e4f,
> 0xa8, 0x54, 0x0f, 0x13, 0x17, 0xb0, 0x1c, 0x2c);
>  
> +static const guid_t intel_dsm_guid2 =
> + GUID_INIT(0x3e5b41c6, 0xeb1d, 0x4260,
> +   0x9d, 0x15, 0xc7, 0x1f, 0xba, 0xda, 0xe4, 0x14);
> +
>  static char *intel_dsm_port_name(u8 id)
>  {
>   switch (id) {
> @@ -176,6 +181,19 @@ void intel_unregister_dsm_handler(void)
>  {
>  }
>  
> +void intel_dsm_enable_mux(struct drm_i915_private *i915)
> +{
> + struct pci_dev *pdev = i915->drm.pdev;
> + acpi_handle dhandle;
> +
> + dhandle = ACPI_HANDLE(&pdev->dev);
> + if (!dhandle)
> + return;
> +
> + acpi_evaluate_dsm(dhandle, &intel_dsm_guid2, INTEL_DSM_REVISION_ID,
> +   INTEL_DSM_FN_PLATFORM_MUX_ENABLE, NULL);
> +}
> +
>  /*
>   * ACPI Specification, Revision 5.0, Appendix B.3.2 _DOD (Enumerate All 
> Devices
>   * Attached to the Display Adapter).
> diff --git a/drivers/gpu/drm/i915/display/intel_acpi.h 
> b/drivers/gpu/drm/i915/display/intel_acpi.h
> index e8b068661d22..def013cf6308 100644
> --- a/drivers/gpu/drm/i915/display/intel_acpi.h
> +++ b/drivers/gpu/drm/i915/display/intel_acpi.h
> @@ -11,11 +11,14 @@ struct drm_i915_private;
>  #ifdef CONFIG_ACPI
>  void intel_register_dsm_handler(void);
>  void intel_unregister_dsm_handler(void);
> +void intel_dsm_enable_mux(struct drm_i915_private *i915);
>  void intel_acpi_device_id_update(struct drm_i915_private *i915);
>  #else
>  static inline void intel_register_dsm_handler(void) { return; }
>  static inline void intel_unregister_dsm_handler(void) { return; }
>  static inline
> +void intel_dsm_enable_mux(struct drm_i915_private *i915) { return; }
> +static inline
>  void intel_acpi_device_id_update(struct drm_i915_private *i915) { return; }
>  #endif /* CONFIG_ACPI */
>  
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
> b/drivers/gpu/drm/i915/display/intel_display.c
> index a10e26380ef3..d79dae370b20 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -11472,6 +11472,8 @@ void intel_modeset_init_hw(struct drm_i915_private 
> *i915)
>  {
>   struct intel_cdclk_state *cdclk_state;
>  
> + intel_dsm_enable_mux(i915);
> +

This should probably be somewhere around where we do all the other
semi ACPI related init (OpRegion/etc.).

>   if (!HAS_DISPLAY(i915))
>   return;
>  
> -- 
> 2.30.2

-- 
Ville Syrjälä
Intel


Re: [v3 1/2] dt-bindings: backlight: add DisplayPort aux backlight

2021-05-11 Thread rajeevny

On 01-05-2021 03:08, Doug Anderson wrote:

Hi,

On Fri, Apr 30, 2021 at 8:10 AM  wrote:


On 30-04-2021 02:33, Doug Anderson wrote:
> Hi,
>
> On Thu, Apr 29, 2021 at 11:04 AM Rob Herring  wrote:
>>
>> On Mon, Apr 26, 2021 at 11:29:15AM +0530, Rajeev Nandan wrote:
>> > Add bindings for DisplayPort aux backlight driver.
>> >
>> > Changes in v2:
>> > - New
>> >
>> > Signed-off-by: Rajeev Nandan 
>> > ---
>> >  .../bindings/leds/backlight/dp-aux-backlight.yaml  | 49 
++
>> >  1 file changed, 49 insertions(+)
>> >  create mode 100644 
Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
>> >
>> > diff --git 
a/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml 
b/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
>> > new file mode 100644
>> > index ..0fa8bf0
>> > --- /dev/null
>> > +++ 
b/Documentation/devicetree/bindings/leds/backlight/dp-aux-backlight.yaml
>> > @@ -0,0 +1,49 @@
>> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> > +%YAML 1.2
>> > +---
>> > +$id: http://devicetree.org/schemas/leds/backlight/dp-aux-backlight.yaml#
>> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> > +
>> > +title: DisplayPort aux backlight driver bindings
>> > +
>> > +maintainers:
>> > +  - Rajeev Nandan 
>> > +
>> > +description:
>> > +  Backlight driver to control the brightness over DisplayPort aux channel.
>> > +
>> > +allOf:
>> > +  - $ref: common.yaml#
>> > +
>> > +properties:
>> > +  compatible:
>> > +const: dp-aux-backlight
>> > +
>> > +  ddc-i2c-bus:
>> > +$ref: /schemas/types.yaml#/definitions/phandle
>> > +description:
>> > +  A phandle to the system I2C controller connected to the DDC bus used
>> > +  for the DisplayPort AUX channel.
>> > +
>> > +  enable-gpios:
>> > +maxItems: 1
>> > +description: GPIO specifier for backlight enable pin.
>> > +
>> > +  max-brightness: true
>> > +
>> > +required:
>> > +  - compatible
>> > +  - ddc-i2c-bus
>> > +
>> > +additionalProperties: false
>> > +
>> > +examples:
>> > +  - |
>> > +backlight {
>> > +compatible = "dp-aux-backlight";
>> > +ddc-i2c-bus = <_bridge>;
>> > +enable-gpios = < 12 GPIO_ACTIVE_HIGH>;
>>
>> So the DDC bus is connected to a backlight and also a panel? This
>> binding is not reflecting the h/w, but rather what you want for some
>> driver.
>>
>> There's only one thing here and that's an eDP panel which supports
>> backlight control via DP aux channel. You can figure all that out from
>> the panel's compatible and/or reading the EDID.
>>
>> You might also be interested in this thread:
>>
>> https://lore.kernel.org/lkml/yiksdtjcihgnv...@orome.fritz.box/
>
> I think Rajeev needs to rework everything anyway as per:
>
> https://lore.kernel.org/r/87zgxl5qar@intel.com
>
> ...but you're right that it makes sense not to model the backlight as
> a separate node in the device tree. The panel driver can handle
> setting up the backlight.
>
> -Doug

It was not a good idea to create a separate backlight driver and use
ddc-i2c-bus to get access to DP aux. I am working to move the code
to the panel driver and to utilize the new DRM helper functions
(drm_edp_backlight_*) Lyude has added [1].

To use these helper functions, the panel driver should have access to
the
"struct drm_dp_aux *". The simple-panel has a "ddc-i2c-bus" property
to give the panel access to the DDC bus and is currently being used to
get the EDID from the panel. Can I use the same ddc bus i2c_adapter to
get
the "struct drm_dp_aux *"?

As per the suggestion [2], I get the "struct drm_dp_aux *" from the
i2c_adapter of ddc bus (maybe I didn't understand the suggestion
correctly),
and, it turned out, the way I have implemented is not the right way 
[3].

So, I am afraid to use the same method in the panel driver.


[1] https://lore.kernel.org/dri-devel/871rb5bcf9@intel.com/
[2] https://www.spinics.net/lists/dri-devel/msg295429.html
[3]
https://lore.kernel.org/dri-devel/2021042616.4lc3ekxjugjr3...@maple.lan/


So it's definitely up to maintainers, not me. ...but I guess I would
have expected something like a new property called "ddc-aux-bus". Then
you'd have to create a new API call called something like
"of_find_ddc_aux_adapter_by_node()" that would allow you to find it.



To implement the first suggestion, I can think of the following way
to get the "struct drm_dp_aux" in the panel_simple_probe function:

- Create a new panel-simple DT property "ddc-aux-bus", a phandle to the
platform device that implements the AUX channel.

- Create a global list of drm_dp_aux in drm_dp_helper.c. Initialize list 
head
in drm_dp_aux_init(), add the drm_dp_aux onto the list in 
drm_dp_aux_register().

Similarly, remove the drm_dp_aux from list in drm_dp_aux_unregister().

- Create a new function of_drm_find_dp_aux_by_node() to get the expected
drm_dp_aux from this global list.

Please let me know your views on this implementation.
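
A minimal sketch of the registry part of that list (list_entry and
of_drm_find_dp_aux_by_node() are the proposed additions, not existing DRM API):

    #include <linux/list.h>
    #include <linux/mutex.h>
    #include <linux/of.h>
    #include <drm/drm_dp_helper.h>

    static LIST_HEAD(drm_dp_aux_list);
    static DEFINE_MUTEX(drm_dp_aux_list_lock);

    /* called from drm_dp_aux_register(), with a matching del in unregister */
    void drm_dp_aux_list_add(struct drm_dp_aux *aux)
    {
            mutex_lock(&drm_dp_aux_list_lock);
            list_add_tail(&aux->list_entry, &drm_dp_aux_list); /* new member */
            mutex_unlock(&drm_dp_aux_list_lock);
    }

    struct drm_dp_aux *of_drm_find_dp_aux_by_node(struct device_node *np)
    {
            struct drm_dp_aux *aux, *found = NULL;

            mutex_lock(&drm_dp_aux_list_lock);
            list_for_each_entry(aux, &drm_dp_aux_list, list_entry) {
                    if (aux->dev && aux->dev->of_node == np) {
                            found = aux;
                            break;
                    }
            }
            mutex_unlock(&drm_dp_aux_list_lock);

            return found;
    }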

Below is the 

Re: [Intel-gfx] [RFC PATCH 5/5] drm/i915: Update execbuf IOCTL to accept N BBs

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 05:13:54PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 10:30:49AM -0700, Matthew Brost wrote:
> > Add I915_EXEC_NUMBER_BB_* to drm_i915_gem_execbuffer2.flags which allows
> > submitting N BBs per IOCTL.
> > 
> > Cc: Tvrtko Ursulin 
> > Cc: Tony Ye 
> > CC: Carl Zhang 
> > Cc: Daniel Vetter 
> > Cc: Jason Ekstrand 
> > Signed-off-by: Matthew Brost 
> 
> I dropped my big question on the previous patch already, I'll check this
> out again when it's all squashed into the parallel extension patch so we
> have everything in one commit.

I think we just drop this and only allow N BBs per IOCTL as discussed in patch
#2 of this series.

Matt

> -Daniel
> 
> > ---
> >  include/uapi/drm/i915_drm.h | 21 -
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 0175b12b33b8..d3072cad4a7e 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -1291,7 +1291,26 @@ struct drm_i915_gem_execbuffer2 {
> >   */
> >  #define I915_EXEC_USE_EXTENSIONS   (1 << 21)
> >  
> > -#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1))
> > +/*
> > + * Number of BB in execbuf2 IOCTL - 1, used to submit more than BB in a 
> > single
> > + * execbuf2 IOCTL.
> > + *
> > + * Return -EINVAL if more than 1 BB (value 0) is specified if
> > + * I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT hasn't been called on the gem
> > + * context first. Also returns -EINVAL if gem context has been setup with
> > + * I915_PARALLEL_NO_PREEMPT_MID_BATCH and the number BBs not equal to the 
> > total
> > + * number hardware contexts in the gem context.
> > + */
> > +#define I915_EXEC_NUMBER_BB_LSB(22)
> > +#define I915_EXEC_NUMBER_BB_MASK   (0x3f << I915_EXEC_NUMBER_BB_LSB)
> > +#define I915_EXEC_NUMBER_BB_MSB(27)
> > +#define i915_execbuffer2_set_number_bb(eb2, num_bb) \
> > +   (eb2).flags = ((eb2).flags & ~I915_EXEC_NUMBER_BB_MASK) | \
> > +   (((num_bb - 1) << I915_EXEC_NUMBER_BB_LSB) & I915_EXEC_NUMBER_BB_MASK)
> > +#define i915_execbuffer2_get_number_bb(eb2) \
> > +   ((((eb2).flags & I915_EXEC_NUMBER_BB_MASK) >> I915_EXEC_NUMBER_BB_LSB) + 1)
> > +
> > +#define __I915_EXEC_UNKNOWN_FLAGS (-(1 << (I915_EXEC_NUMBER_BB_MSB + 1)))
> >  
> >  #define I915_EXEC_CONTEXT_ID_MASK  (0x)
> >  #define i915_execbuffer2_set_context_id(eb2, context) \
> > -- 
> > 2.28.0
> > 
> > ___
> > Intel-gfx mailing list
> > intel-...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 05:16:38PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:13:34PM -0700, Matthew Brost wrote:
> > From: Michal Wajdeczko 
> > 
> > New GuC firmware will unify format of MMIO and CTB H2G messages.
> > Introduce their definitions now to allow gradual transition of
> > our code to match new changes.
> > 
> > Signed-off-by: Michal Wajdeczko 
> > Signed-off-by: Matthew Brost 
> > Cc: Michał Winiarski 
> > ---
> >  .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++
> >  1 file changed, 226 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h 
> > b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> > index 775e21f3058c..1c264819aa03 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> > @@ -6,6 +6,232 @@
> >  #ifndef _ABI_GUC_MESSAGES_ABI_H
> >  #define _ABI_GUC_MESSAGES_ABI_H
> >  
> > +/**
> > + * DOC: HXG Message
> 
> These aren't useful if we don't pull them in somewhere in the
> Documentation/gpu hierarchy. General comment, and also please check that
> it all renders correctly still.
>

Sure. Let me figure this out before my next rev.
 
> btw if you respin a patch not originally by you we generally add a (v1) to
> the original s-o-b line (or whever the version split was) and explain in
> the usual changelog in the commit message what was changed.
> 

Still new to this process. Will do.

Matt

> This holds for the entire series ofc.
> -Daniel
> 
> > + *
> > + * All messages exchanged with GuC are defined using 32 bit dwords.
> > + * First dword is treated as a message header. Remaining dwords are 
> > optional.
> > + *
> > + * .. _HXG Message:
> > + *
> > + *  +---+-------+----------------------------------------------+
> > + *  |   | Bits  | Description                                  |
> > + *  +===+=======+==============================================+
> > + *  |   |       |                                              |
> > + *  | 0 |    31 | **ORIGIN** - originator of the message       |
> > + *  |   |       |   - _`GUC_HXG_ORIGIN_HOST` = 0               |
> > + *  |   |       |   - _`GUC_HXG_ORIGIN_GUC` = 1                |
> > + *  |   |       |                                              |
> > + *  |   +-------+----------------------------------------------+
> > + *  |   | 30:28 | **TYPE** - message type                      |
> > + *  |   |       |   - _`GUC_HXG_TYPE_REQUEST` = 0              |
> > + *  |   |       |   - _`GUC_HXG_TYPE_EVENT` = 1                |
> > + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3     |
> > + *  |   |       |   - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5    |
> > + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6     |
> > + *  |   |       |   - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7     |
> > + *  |   +-------+----------------------------------------------+
> > + *  |   |  27:0 | **AUX** - auxiliary data (depends on TYPE)   |
> > + *  +---+-------+----------------------------------------------+
> > + *  | 1 |  31:0 | optional payload (depends on TYPE)           |
> > + *  +---+-------+                                              |
> > + *  |...|       |                                              |
> > + *  +---+-------+                                              |
> > + *  | n |  31:0 |                                              |
> > + *  +---+-------+----------------------------------------------+
> > + */
> > +
> > +#define GUC_HXG_MSG_MIN_LEN              1u
> > +#define GUC_HXG_MSG_0_ORIGIN             (0x1 << 31)
> > +#define   GUC_HXG_ORIGIN_HOST            0u
> > +#define   GUC_HXG_ORIGIN_GUC             1u
> > +#define GUC_HXG_MSG_0_TYPE               (0x7 << 28)
> > +#define   GUC_HXG_TYPE_REQUEST           0u
> > +#define   GUC_HXG_TYPE_EVENT             1u
> > +#define   GUC_HXG_TYPE_NO_RESPONSE_BUSY  3u
> > +#define   GUC_HXG_TYPE_NO_RESPONSE_RETRY 5u
> > +#define   GUC_HXG_TYPE_RESPONSE_FAILURE  6u
> > +#define   GUC_HXG_TYPE_RESPONSE_SUCCESS  7u
> > +#define GUC_HXG_MSG_0_AUX                (0xfffffff << 0)
> > +
> > +/**
> > + * DOC: HXG Request
> > + *
> > + * The `HXG Request`_ message should be used to initiate synchronous 
> > activity
> > + * for which confirmation or return data is expected.
> > + *
> > + * The recipient of this message shall use 

Re: [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 05:18:22PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:13:46PM -0700, Matthew Brost wrote:
> > Introduce i915_sched_engine object which is lower level data structure
> > that i915_scheduler / generic code can operate on without touching
> > execlist specific structures. This allows additional submission backends
> > to be added without breaking the layer.
> 
> Maybe add a comment here that this is defacto a detour since we're now
> aiming to use drm/scheduler instead. But also since the current code is a
> bit a mess, we expect this detour to be overall faster since we can then
> refactor in-tree.
> 

Agree. I think in the end we will still have a i915_sched_engine which more or
less encapsulates a 'struct drm_gpu_scheduler' plus a few common variables
between the execlist and GuC backends.
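
Roughly (hypothetical shape only, not the structure in this series):

    struct i915_sched_engine {
            struct drm_gpu_scheduler base;  /* the drm/scheduler instance */

            /* state shared by the execlists and GuC submission backends */
            spinlock_t lock;
            struct list_head requests;

            void (*schedule)(struct i915_request *rq,
                             const struct i915_sched_attr *attr);
    };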

Matt

> Maybe also highlight this a bit more in the rfc to make sure this is
> clear.
> -Daniel
> 
> > 
> > Cc: Daniele Ceraolo Spurio 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_wait.c  |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_engine.h|  16 -
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c |  77 ++--
> >  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   4 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_pm.c |  10 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  42 +--
> >  drivers/gpu/drm/i915/gt/intel_engine_user.c   |   2 +-
> >  .../drm/i915/gt/intel_execlists_submission.c  | 350 +++---
> >  .../gpu/drm/i915/gt/intel_ring_submission.c   |  13 +-
> >  drivers/gpu/drm/i915/gt/mock_engine.c |  17 +-
> >  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  36 +-
> >  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   6 +-
> >  drivers/gpu/drm/i915/gt/selftest_lrc.c|   6 +-
> >  drivers/gpu/drm/i915/gt/selftest_reset.c  |   2 +-
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  75 ++--
> >  drivers/gpu/drm/i915/i915_gpu_error.c |   7 +-
> >  drivers/gpu/drm/i915/i915_request.c   |  50 +--
> >  drivers/gpu/drm/i915/i915_request.h   |   2 +-
> >  drivers/gpu/drm/i915/i915_scheduler.c | 168 -
> >  drivers/gpu/drm/i915/i915_scheduler.h |  65 +++-
> >  drivers/gpu/drm/i915/i915_scheduler_types.h   |  63 
> >  21 files changed, 575 insertions(+), 440 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > index 4b9856d5ba14..af1fbf8e2a9a 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > @@ -104,8 +104,8 @@ static void fence_set_priority(struct dma_fence *fence,
> > engine = rq->engine;
> >  
> > rcu_read_lock(); /* RCU serialisation for set-wedged protection */
> > -   if (engine->schedule)
> > -   engine->schedule(rq, attr);
> > +   if (engine->sched_engine->schedule)
> > +   engine->sched_engine->schedule(rq, attr);
> > rcu_read_unlock();
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
> > b/drivers/gpu/drm/i915/gt/intel_engine.h
> > index 8d9184920c51..988d9688ae4d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> > @@ -123,20 +123,6 @@ execlists_active(const struct intel_engine_execlists 
> > *execlists)
> > return active;
> >  }
> >  
> > -static inline void
> > -execlists_active_lock_bh(struct intel_engine_execlists *execlists)
> > -{
> > -   local_bh_disable(); /* prevent local softirq and lock recursion */
> > -   tasklet_lock(&execlists->tasklet);
> > -}
> > -
> > -static inline void
> > -execlists_active_unlock_bh(struct intel_engine_execlists *execlists)
> > -{
> > -   tasklet_unlock(&execlists->tasklet);
> > -   local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
> > -}
> > -
> >  struct i915_request *
> >  execlists_unwind_incomplete_requests(struct intel_engine_execlists 
> > *execlists);
> >  
> > @@ -257,8 +243,6 @@ intel_engine_find_active_request(struct intel_engine_cs 
> > *engine);
> >  
> >  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
> >  
> > -void intel_engine_init_active(struct intel_engine_cs *engine,
> > - unsigned int subclass);
> >  #define ENGINE_PHYSICAL0
> >  #define ENGINE_MOCK1
> >  #define ENGINE_VIRTUAL 2
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index 828e1669f92c..ec82a7ec0c8d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -8,6 +8,7 @@
> >  #include "gem/i915_gem_context.h"
> >  
> >  #include "i915_drv.h"
> > +#include "i915_scheduler.h"
> >  
> >  #include "intel_breadcrumbs.h"
> >  #include "intel_context.h"
> > @@ -326,9 +327,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
> > intel_engine_id id)
> > if (engine->context_size)
> > 

Re: [Intel-gfx] [RFC PATCH 2/5] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 04:49:58PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 10:30:46AM -0700, Matthew Brost wrote:
> > Add entry fpr i915 new parallel submission uAPI plan.
> > 
> > Cc: Tvrtko Ursulin 
> > Cc: Tony Ye 
> > CC: Carl Zhang 
> > Cc: Daniel Vetter 
> > Cc: Jason Ekstrand 
> > Signed-off-by: Matthew Brost 
> > ---
> >  Documentation/gpu/rfc/i915_scheduler.rst | 56 +++-
> >  1 file changed, 54 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/gpu/rfc/i915_scheduler.rst 
> > b/Documentation/gpu/rfc/i915_scheduler.rst
> > index fa6780a11c86..e3455b33edfe 100644
> > --- a/Documentation/gpu/rfc/i915_scheduler.rst
> > +++ b/Documentation/gpu/rfc/i915_scheduler.rst
> > @@ -13,7 +13,8 @@ i915 with the DRM scheduler is:
> >   modparam enable_guc
> > * Lots of rework will need to be done to integrate with DRM scheduler so
> >   no need to nit pick everything in the code, it just should be
> > - functional and not regress execlists
> > + functional, no major coding style / layering errors, and not regress
> > + execlists
> 
> I guess this hunk should be in the previous patch?
> 

Yep, noticed this after sending.

> > * Update IGTs / selftests as needed to work with GuC submission
> > * Enable CI on supported platforms for a baseline
> > * Rework / get CI heathly for GuC submission in place as needed
> > @@ -67,4 +68,55 @@ levels too.
> >  
> >  New parallel submission uAPI
> >  
> > -Details to come in a following patch.
> > +The existing bonding uAPI is completely broken with GuC submission because
> > +whether a submission is a single context submit or parallel submit isn't 
> > known
> > +until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple
> > +contexts in parallel with the GuC the context must be explictly registered 
> > with
> > +N contexts and all N contexts must be submitted in a single command to the 
> > GuC.
> > +This interfaces doesn't support dynamically changing between N contexts as 
> > the
> > +bonding uAPI does. Hence the need for a new parallel submission interface. 
> > Also
> > +the legacy bonding uAPI is quite confusing and not intuitive at all.
> 
> I think you should sit together with Jason on irc or so for a bit and get
> an earful of how it's all broken irrespective of GuC submission or not.
> Just to hammer in our case :-)
>

Sounds like a fun conversation, will do.
 
> > +
> > +The new parallel submission uAPI consists of 3 parts:
> > +
> > +* Export engines logical mapping
> > +* A 'set_parallel' extension to configure contexts for parallel
> > +  submission
> > +* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
> > +
> > +Export engines logical mapping
> > +--
> > +Certain use cases require BBs to be placed on engine instances in logical 
> > order
> > +(e.g. split-frame on gen11+). The logical mapping of engine instances can 
> > change
> > +based on fusing. Rather than making UMDs be aware of fusing, simply expose 
> > the
> > +logical mapping with the existing query engine info IOCTL. Also the GuC
> > +submission interface currently only supports submitting multiple contexts 
> > to
> > +engines in logical order.
> 
> Maybe highlight more that this is a new restriction with GuC compared to
> execlist, which is why we need to expose this information to userspace.
> Also on the platforms thus far supported in upstream there's at most 2
> engines of the same type, so really not an issue.
>

Sure. This is a limitation of the GuC interface + really isn't needed unless we
have more than 2 engines of the same type.
 
> > +
> > +A single bit will be added to drm_i915_engine_info.flags indicating that 
> > the
> > +logical instance has been returned and a new field,
> > +drm_i915_engine_info.logical_instance, returns the logical instance.
> > +
> > +A 'set_parallel' extension to configure contexts for parallel submission
> > +
> > +The 'set_parallel' extension configures N contexts for parallel 
> > submission. It
> > +is setup step that should be called before using any of the contexts. See
> > +I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
> > +similar existing examples. Once the N contexts are configured for parallel
> > +submission the execbuf2 IOCTL can be called submiting 1-N BBs in a single 
> > IOCTL.
> > +Although submitting less than N BBs is allowed it is not recommended as 
> > that
> > +will likely leave parts of the hardware reserved and idle. Initially only
> > +support GuC submission. Execlist support can be added later if needed.
> 
> Can we just require that you always submit N batchbuffers, or does this
> create a problem for userspace? Allowing things just because is generally
> not a good idea with uapi, it's better to limit and then allow when
> there's a need.
>

Yes, we can 

Re: [PATCH v6 10/16] drm/amdgpu: Guard against write accesses after device removal

2021-05-11 Thread Andrey Grodzovsky




On 2021-05-11 2:50 a.m., Christian König wrote:

On 10.05.21 at 18:36, Andrey Grodzovsky wrote:

This should prevent writing to memory or IO ranges possibly
already allocated for other uses after our device is removed.

v5:
Protect more places wher memcopy_to/form_io takes place
Protect IB submissions

v6: Switch to !drm_dev_enter instead of scoping entire code
with brackets.

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 11 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c   |  9 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c    | 17 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c   | 63 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h   |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  | 70 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  | 49 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c   | 31 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c   | 11 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 22 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  7 +-
  drivers/gpu/drm/amd/amdgpu/psp_v11_0.c    | 44 ++--
  drivers/gpu/drm/amd/amdgpu/psp_v12_0.c    |  8 +--
  drivers/gpu/drm/amd/amdgpu/psp_v3_1.c |  8 +--
  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 26 ---
  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 22 +++---
  .../drm/amd/pm/powerplay/smumgr/smu7_smumgr.c |  2 +
  17 files changed, 257 insertions(+), 145 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index a0bff4713672..94c415176cdc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -71,6 +71,8 @@
  #include 
  #include 
+#include <drm/drm_drv.h>
+
  MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -281,7 +283,10 @@ void amdgpu_device_vram_access(struct 
amdgpu_device *adev, loff_t pos,

  unsigned long flags;
  uint32_t hi = ~0;
  uint64_t last;
+    int idx;
+ if (!drm_dev_enter(&adev->ddev, &idx))
+ return;
  #ifdef CONFIG_64BIT
  last = min(pos + size, adev->gmc.visible_vram_size);
@@ -299,8 +304,10 @@ void amdgpu_device_vram_access(struct 
amdgpu_device *adev, loff_t pos,

  memcpy_fromio(buf, addr, count);
  }
-    if (count == size)
+    if (count == size) {
+    drm_dev_exit(idx);
  return;
+    }


Maybe use a goto instead, but really just a nit pick.




  pos += count;
  buf += count / 4;
@@ -323,6 +330,8 @@ void amdgpu_device_vram_access(struct 
amdgpu_device *adev, loff_t pos,

  *buf++ = RREG32_NO_KIQ(mmMM_DATA);
  }
  spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
+
+    drm_dev_exit(idx);
  }
  /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c

index 4d32233cde92..04ba5eef1e88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -31,6 +31,8 @@
  #include "amdgpu_ras.h"
  #include "amdgpu_xgmi.h"
+#include <drm/drm_drv.h>
+
  /**
   * amdgpu_gmc_pdb0_alloc - allocate vram for pdb0
   *
@@ -151,6 +153,10 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device 
*adev, void *cpu_pt_addr,

  {
  void __iomem *ptr = (void *)cpu_pt_addr;
  uint64_t value;
+    int idx;
+
+    if (!drm_dev_enter(&adev->ddev, &idx))
+    return 0;
  /*
   * The following is for PTE only. GART does not have PDEs.
@@ -158,6 +164,9 @@ int amdgpu_gmc_set_pte_pde(struct amdgpu_device 
*adev, void *cpu_pt_addr,

  value = addr & 0x0000FFFFFFFFF000ULL;
  value |= flags;
  writeq(value, ptr + (gpu_page_idx * 8));
+
+    drm_dev_exit(idx);
+
  return 0;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c

index 148a3b481b12..62fcbd446c71 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -30,6 +30,7 @@
  #include 
  #include 
+#include <drm/drm_drv.h>
  #include "amdgpu.h"
  #include "atom.h"
@@ -137,7 +138,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, 
unsigned num_ibs,

  bool secure;
  unsigned i;
-    int r = 0;
+    int idx, r = 0;
  bool need_pipe_sync = false;
  if (num_ibs == 0)
@@ -169,13 +170,16 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, 
unsigned num_ibs,

  return -EINVAL;
  }
+    if (!drm_dev_enter(&adev->ddev, &idx))
+    return -ENODEV;
+
  alloc_size = ring->funcs->emit_frame_size + num_ibs *
  ring->funcs->emit_ib_size;
  r = amdgpu_ring_alloc(ring, alloc_size);
  if (r) {
  dev_err(adev->dev, "scheduling IB failed (%d).\n", r);
-    return r;
+    goto exit;
  }
  need_ctx_switch = ring->current_ctx != fence_ctx;
@@ -205,7 +209,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, 
unsigned num_ibs,

  r = 

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 10:42:58AM -0700, Rob Clark wrote:
> On Tue, May 11, 2021 at 10:21 AM Daniel Vetter  wrote:
> >
> > On Tue, May 11, 2021 at 10:19:57AM -0700, Rob Clark wrote:
> > > On Tue, May 11, 2021 at 9:44 AM Daniel Vetter  wrote:
> > > >
> > > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > > > > From: Rob Clark 
> > > > > > > > >
> > > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank 
> > > > > > > > > on "video
> > > > > > > > > mode" type displays, which is pointless and unnecessary.  Add 
> > > > > > > > > an
> > > > > > > > > optional helper vfunc to determine if a plane is attached to 
> > > > > > > > > a CRTC
> > > > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Rob Clark 
> > > > > > > >
> > > > > > > > So this is a bit annoying because the idea of all these "remap 
> > > > > > > > legacy uapi
> > > > > > > > to atomic constructs" helpers is that they shouldn't need/use 
> > > > > > > > anything
> > > > > > > > beyond what userspace also has available. So adding hacks for 
> > > > > > > > them feels
> > > > > > > > really bad.
> > > > > > >
> > > > > > > I suppose the root problem is that userspace doesn't know if 
> > > > > > > dirtyfb
> > > > > > > (or similar) is actually required or is a no-op.
> > > > > > >
> > > > > > > But it is perhaps less of a problem because this essentially boils
> > > > > > > down to "x11 vs wayland", and it seems like wayland compositors 
> > > > > > > for
> > > > > > > non-vsync'd rendering just pageflips and throws away extra frames 
> > > > > > > from
> > > > > > > the app?
> > > > > >
> > > > > > Yeah it's about not adequately batching up rendering and syncing 
> > > > > > with
> > > > > > hw. bare metal x11 is just especially stupid about it :-)
> > > > > >
> > > > > > > > Also I feel like it's not entirely the right thing to do here 
> > > > > > > > either.
> > > > > > > > We've had this problem already on the fbcon emulation side 
> > > > > > > > (which also
> > > > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), 
> > > > > > > > and the fix
> > > > > > > > there was to have a worker which batches up all the updates and 
> > > > > > > > avoids any
> > > > > > > > stalls in bad places.
> > > > > > >
> > > > > > > I'm not too worried about fbcon not being able to render faster 
> > > > > > > than
> > > > > > > vblank.  OTOH it is a pretty big problem for x11
> > > > > >
> > > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > > > > the same with fbcon, which trivially can get ahead of vblank 
> > > > > > otherwise
> > > > > > (if sometimes flushes each character, so you have to pile them up 
> > > > > > into
> > > > > > a single update if that's still pending).
> > > > > >
> > > > > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > > > > probably get
> > > > > > > > away with assuming there's only a single fb, so the 
> > > > > > > > implementation becomes
> > > > > > > > pretty simple:
> > > > > > > >
> > > > > > > > - 1 worker, and we keep track of a single pending fb
> > > > > > > > - if there's already a dirty fb pending on a different fb, we 
> > > > > > > > stall for
> > > > > > > >   the worker to start processing that one already (i.e. the fb 
> > > > > > > > we track is
> > > > > > > >   reset to NULL)
> > > > > > > > - if it's pending on the same fb we just toss away all the 
> > > > > > > > updates and go
> > > > > > > >   with a full update, since merging the clip rects is too much 
> > > > > > > > work :-) I
> > > > > > > >   think there's helpers so you could be slightly more clever 
> > > > > > > > and just have
> > > > > > > >   an overall bounding box
> > > > > > >
> > > > > > > This doesn't really fix the problem, you still end up delaying 
> > > > > > > sending
> > > > > > > the next back-buffer to mesa
> > > > > >
> > > > > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > > > > tracking corruption is possible, but that's not the kernel's 
> > > > > > problem.
> > > > > > So how would anything get held up in userspace.
> > > > >
> > > > > the part about stalling if a dirtyfb is pending was what I was worried
> > > > > about.. but I suppose you meant the worker stalling, rather than
> > > > > userspace stalling (where I had interpreted it the other way around).
> > > > > As soon as userspace needs to stall, you're losing again.
> > > >
> > > > Nah, I did mean userspace stalling, so we can't pile up unlimited 
> > > > amounts
> > > > of dirtyfb request in the kernel.
> > > 

Re: [Intel-gfx] [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 10:12:32AM -0700, Matthew Brost wrote:
> On Tue, May 11, 2021 at 06:28:25PM +0200, Daniel Vetter wrote:
> > On Thu, May 06, 2021 at 12:14:28PM -0700, Matthew Brost wrote:
> > > We receive notification of an engine reset from GuC at its
> > > completion. Meaning GuC has potentially cleared any HW state
> > > we may have been interested in capturing. GuC resumes scheduling
> > > on the engine post-reset, as the resets are meant to be transparent,
> > > further muddling our error state.
> > > 
> > > There is ongoing work to define an API for a GuC debug state dump. The
> > > suggestion for now is to manually disable FW initiated resets in cases
> > > where debug state is needed.
> > > 
> > > Signed-off-by: Matthew Brost 
> > 
> > This looks a bit backwards to me:
> > 
> 
> Definitely a bit hacky but this patch does the best to capture the error as it
> can,
> 
> > - I figured we should capture error state when we get the G2H, in which
> >   case I hope we do know which the offending context was that got shot.
> >
> 
> We know which context was shot based on the G2H. See 'hung_ce' in this patch.

Ah, maybe I should read more. Would be good to have comments on how the
locking works here, especially around reset, which tends to be tricky.
Please put those comments in the data structs/members.

> 
> > - For now we're missing the hw state, but we should still be able to
> >   capture the buffers userspace wants us to capture. So that could be
> >   wired up already?
> 
> Which buffers exactly? We dump all buffers associated with the context. 

There's an opt-in list that userspace can set in execbuf. Maybe that's the
one you mean.
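
I.e. the buffers userspace flags for capture in execbuf, along the lines of
the below (from memory, bo_handle is just a placeholder):

        struct drm_i915_gem_exec_object2 obj = {
                .handle = bo_handle, /* the BO userspace wants in the dump */
                .flags = EXEC_OBJECT_CAPTURE,
        };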
-Daniel

> 
> > 
> > But yeah register state capturing needs support from GuC fw.
> >
> > I think this is a big enough miss in GuC features that we should list it
> > on the rfc as a thing to fix.
> 
> Agree this needs to be fixed.
> 
> Matt
> 
> > -Daniel
> > 
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_context.c   | 20 +++
> > >  drivers/gpu/drm/i915/gt/intel_context.h   |  3 ++
> > >  drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++-
> > >  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 --
> > >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
> > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +--
> > >  drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++---
> > >  7 files changed, 91 insertions(+), 26 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > > b/drivers/gpu/drm/i915/gt/intel_context.c
> > > index 2f01437056a8..3fe7794b2bfd 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > @@ -514,6 +514,26 @@ struct i915_request 
> > > *intel_context_create_request(struct intel_context *ce)
> > >   return rq;
> > >  }
> > >  
> > > +struct i915_request *intel_context_find_active_request(struct 
> > > intel_context *ce)
> > > +{
> > > + struct i915_request *rq, *active = NULL;
> > > + unsigned long flags;
> > > +
> > > + GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
> > > +
> > > + spin_lock_irqsave(&ce->guc_active.lock, flags);
> > > + list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > > + sched.link) {
> > > + if (i915_request_completed(rq))
> > > + break;
> > > +
> > > + active = rq;
> > > + }
> > > + spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > > +
> > > + return active;
> > > +}
> > > +
> > >  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> > >  #include "selftest_context.c"
> > >  #endif
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > > b/drivers/gpu/drm/i915/gt/intel_context.h
> > > index 9b211ca5ecc7..d2b499ed8a05 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > > @@ -195,6 +195,9 @@ int intel_context_prepare_remote_request(struct 
> > > intel_context *ce,
> > >  
> > >  struct i915_request *intel_context_create_request(struct intel_context 
> > > *ce);
> > >  
> > > +struct i915_request *
> > > +intel_context_find_active_request(struct intel_context *ce);
> > > +
> > >  static inline struct intel_ring *__intel_context_ring_size(u64 sz)
> > >  {
> > >   return u64_to_ptr(struct intel_ring, sz);
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
> > > b/drivers/gpu/drm/i915/gt/intel_engine.h
> > > index 3321d0917a99..bb94963a9fa2 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> > > @@ -242,7 +242,7 @@ ktime_t intel_engine_get_busy_time(struct 
> > > intel_engine_cs *engine,
> > >  ktime_t *now);
> > >  
> > >  struct i915_request *
> > > -intel_engine_find_active_request(struct intel_engine_cs *engine);
> > > +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
> > >  
> > >  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
> > > 

Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 10:01:28AM -0700, Matthew Brost wrote:
> On Tue, May 11, 2021 at 05:26:34PM +0200, Daniel Vetter wrote:
> > On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote:
> > > Add lrc descriptor context lookup array which can resolve the
> > > intel_context from the lrc descriptor index. In addition to lookup, it
> > > can determine in the lrc descriptor context is currently registered with
> > > the GuC by checking if an entry for a descriptor index is present.
> > > Future patches in the series will make use of this array.
> > > 
> > > Cc: John Harrison 
> > > Signed-off-by: Matthew Brost 
> > > ---
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  5 +++
> > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +--
> > >  2 files changed, 35 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
> > > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > index d84f37afb9d8..2eb6c497e43c 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > @@ -6,6 +6,8 @@
> > >  #ifndef _INTEL_GUC_H_
> > >  #define _INTEL_GUC_H_
> > >  
> > > +#include "linux/xarray.h"
> > > +
> > >  #include "intel_uncore.h"
> > >  #include "intel_guc_fw.h"
> > >  #include "intel_guc_fwif.h"
> > > @@ -47,6 +49,9 @@ struct intel_guc {
> > >   struct i915_vma *lrc_desc_pool;
> > >   void *lrc_desc_pool_vaddr;
> > >  
> > > + /* guc_id to intel_context lookup */
> > > + struct xarray context_lookup;
> > 
> > The current code sets a disastrous example, but for stuff like this it's
> > always good to explain the locking, and who's holding references and how
> > you're handling cycles. Since I guess the intel_context also holds the
> > guc_id alive somehow.
> > 
> 
> I think (?) I know what you mean by this comment. How about adding:
> 
'If an entry in the context_lookup is present, that means a context
associated with the guc_id is registered with the GuC. We use this xarray as a
lookup mechanism when the GuC communicates with the i915 about the context.'

So no idea how this works, but generally we put a "Protected by
" or similar in here (so you get a nice link plus something
you can use as jump label in your ide too). Plus since intel_context has
some lifetime rules, explaining whether you're allowed to use the pointer
after you unlock, or whether you need to grab a reference or what exactly
is going on. Usually there are three options:

- No refcounting, you cannot access a pointer obtained through this after
  you unlock.
- Weak reference, you upgrade to a full reference with
  kref_get_unless_zero. If that fails it indicates a lookup failure, since
  you raced with destruction. If it succeeds you can use the pointer after
  unlock.
- Strong reference, you get your own reference that stays valid with
  kref_get().
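
E.g. for the weak reference option the lookup helper would look roughly like
this - completely made up code, just to illustrate the pattern, assuming the
xarray is protected by its own xa_lock and intel_context keeps its kref:

static struct intel_context *
guc_context_lookup(struct intel_guc *guc, u32 guc_id)
{
        struct intel_context *ce;
        unsigned long flags;

        xa_lock_irqsave(&guc->context_lookup, flags);
        ce = xa_load(&guc->context_lookup, guc_id);
        if (ce && !kref_get_unless_zero(&ce->ref))
                ce = NULL; /* raced with destruction, treat as lookup miss */
        xa_unlock_irqrestore(&guc->context_lookup, flags);

        return ce; /* caller drops the reference with intel_context_put() */
}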

I'm just bringing this up because the current i915-gem code is full of
very tricky locking and lifetime rules, and explains roughly nothing of it
in the data structures. Minimally some hints about the locking/lifetime
rules of important structs should be there.

For locking rules it's good to double-down on them by adding
lockdep_assert_held to all relevant functions (where appropriate only
ofc).

What I generally don't think makes sense is to then also document the
locking in the kerneldoc for the functions. That tends to be one place too
many and ime just gets out of date and not useful at all.

> > Again holds for the entire series, where it makes sense (as in we don't
> > expect to rewrite the entire code anyway).
> 
> Slightly out of order but one of the last patches in the series, 'Update GuC
> documentation' adds a big section of comments that attempts to clarify how all
> of this code works. I likely should add a section explaining the data 
> structures
> as well.

Yeah that would be nice.
-Daniel


> 
> Matt
> 
> > -Daniel
> > 
> > > +
> > >   /* Control params for fw initialization */
> > >   u32 params[GUC_CTL_MAX_DWORDS];
> > >  
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index 6acc1ef34f92..c2b6d27404b7 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct 
> > > rb_node *rb)
> > >   return rb_entry(rb, struct i915_priolist, node);
> > >  }
> > >  
> > > -/* Future patches will use this function */
> > > -__attribute__ ((unused))
> > >  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 
> > > index)
> > >  {
> > >   struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> > > @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct 
> > > intel_guc *guc, u32 index)
> > >   return &base[index];
> > >  }
> > >  
> > > +static inline struct intel_context *__get_context(struct intel_guc *guc, 
> > > 

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-11 Thread Rob Clark
On Tue, May 11, 2021 at 10:21 AM Daniel Vetter  wrote:
>
> On Tue, May 11, 2021 at 10:19:57AM -0700, Rob Clark wrote:
> > On Tue, May 11, 2021 at 9:44 AM Daniel Vetter  wrote:
> > >
> > > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:
> > > > >
> > > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
> > > > > >
> > > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > > > From: Rob Clark 
> > > > > > > >
> > > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > > > "video
> > > > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > > > optional helper vfunc to determine if a plane is attached to a 
> > > > > > > > CRTC
> > > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > > >
> > > > > > > > Signed-off-by: Rob Clark 
> > > > > > >
> > > > > > > So this is a bit annoying because the idea of all these "remap 
> > > > > > > legacy uapi
> > > > > > > to atomic constructs" helpers is that they shouldn't need/use 
> > > > > > > anything
> > > > > > > beyond what userspace also has available. So adding hacks for 
> > > > > > > them feels
> > > > > > > really bad.
> > > > > >
> > > > > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > > > > (or similar) is actually required or is a no-op.
> > > > > >
> > > > > > But it is perhaps less of a problem because this essentially boils
> > > > > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > > > > non-vsync'd rendering just pageflips and throws away extra frames 
> > > > > > from
> > > > > > the app?
> > > > >
> > > > > Yeah it's about not adequately batching up rendering and syncing with
> > > > > hw. bare metal x11 is just especially stupid about it :-)
> > > > >
> > > > > > > Also I feel like it's not entirely the right thing to do here 
> > > > > > > either.
> > > > > > > We've had this problem already on the fbcon emulation side (which 
> > > > > > > also
> > > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), 
> > > > > > > and the fix
> > > > > > > there was to have a worker which batches up all the updates and 
> > > > > > > avoids any
> > > > > > > stalls in bad places.
> > > > > >
> > > > > > I'm not too worried about fbcon not being able to render faster than
> > > > > > vblank.  OTOH it is a pretty big problem for x11
> > > > >
> > > > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > > > the same with fbcon, which trivially can get ahead of vblank otherwise
> > > > > (if sometimes flushes each character, so you have to pile them up into
> > > > > a single update if that's still pending).
> > > > >
> > > > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > > > probably get
> > > > > > > away with assuming there's only a single fb, so the 
> > > > > > > implementation becomes
> > > > > > > pretty simple:
> > > > > > >
> > > > > > > - 1 worker, and we keep track of a single pending fb
> > > > > > > - if there's already a dirty fb pending on a different fb, we 
> > > > > > > stall for
> > > > > > >   the worker to start processing that one already (i.e. the fb we 
> > > > > > > track is
> > > > > > >   reset to NULL)
> > > > > > > - if it's pending on the same fb we just toss away all the 
> > > > > > > updates and go
> > > > > > >   with a full update, since merging the clip rects is too much 
> > > > > > > work :-) I
> > > > > > >   think there's helpers so you could be slightly more clever and 
> > > > > > > just have
> > > > > > >   an overall bounding box
> > > > > >
> > > > > > This doesn't really fix the problem, you still end up delaying 
> > > > > > sending
> > > > > > the next back-buffer to mesa
> > > > >
> > > > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > > > tracking corruption is possible, but that's not the kernel's problem.
> > > > > So how would anything get held up in userspace.
> > > >
> > > > the part about stalling if a dirtyfb is pending was what I was worried
> > > > about.. but I suppose you meant the worker stalling, rather than
> > > > userspace stalling (where I had interpreted it the other way around).
> > > > As soon as userspace needs to stall, you're losing again.
> > >
> > > Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
> > > of dirtyfb request in the kernel.
> > >
> > > But also I never expect userspace that uses dirtyfb to actually hit this
> > > stall point (otherwise we'd need to look at this again). It would really
> > > be only there as defense against abuse.
> >
> > I don't believe modesetting ddx throttles dirtyfb, it (indirectly)
> > calls this from it's BlockHandler.. so if you do end up blocking after
> > the N'th dirtyfb, you are still going to end 

Re: [PATCH] drm/doc/rfc: drop the i915_gem_lmem.h header

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 07:28:08PM +0200, Daniel Vetter wrote:
> On Tue, May 11, 2021 at 06:03:56PM +0100, Matthew Auld wrote:
> > The proper headers have now landed in include/uapi/drm/i915_drm.h, so we
> > can drop i915_gem_lmem.h and instead just reference the real headers for
> > pulling in the kernel doc.
> > 
> > Suggested-by: Daniel Vetter 
> > Signed-off-by: Matthew Auld 
> 
> Reviewed-by: Daniel Vetter 
> 
> I guess we need to have a note that when we land the pciid for dg1 to move
> all the remaining bits over to real docs and delete the i915 lmem rfc. But
> everything in due time.

One thing I forgot: The include stanza will I think result in the
explicitly included functions not showing up in the normal driver uapi
docs. Which I think is fine while we settle all this. Or do I get this
wrong?
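
For reference, I mean a stanza along these lines (from memory, I haven't
checked the exact names against the patch):

.. kernel-doc:: include/uapi/drm/i915_drm.h
   :functions: drm_i915_memory_region_info drm_i915_query_memory_regions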
-Daniel

> -Daniel
> 
> > ---
> >  Documentation/gpu/rfc/i915_gem_lmem.h   | 237 
> >  Documentation/gpu/rfc/i915_gem_lmem.rst |   6 +-
> >  2 files changed, 3 insertions(+), 240 deletions(-)
> >  delete mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
> > 
> > diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h 
> > b/Documentation/gpu/rfc/i915_gem_lmem.h
> > deleted file mode 100644
> > index d9c61bea0556..
> > --- a/Documentation/gpu/rfc/i915_gem_lmem.h
> > +++ /dev/null
> > @@ -1,237 +0,0 @@
> > -/**
> > - * enum drm_i915_gem_memory_class - Supported memory classes
> > - */
> > -enum drm_i915_gem_memory_class {
> > -   /** @I915_MEMORY_CLASS_SYSTEM: System memory */
> > -   I915_MEMORY_CLASS_SYSTEM = 0,
> > -   /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
> > -   I915_MEMORY_CLASS_DEVICE,
> > -};
> > -
> > -/**
> > - * struct drm_i915_gem_memory_class_instance - Identify particular memory 
> > region
> > - */
> > -struct drm_i915_gem_memory_class_instance {
> > -   /** @memory_class: See enum drm_i915_gem_memory_class */
> > -   __u16 memory_class;
> > -
> > -   /** @memory_instance: Which instance */
> > -   __u16 memory_instance;
> > -};
> > -
> > -/**
> > - * struct drm_i915_memory_region_info - Describes one region as known to 
> > the
> > - * driver.
> > - *
> > - * Note that we reserve some stuff here for potential future work. As an 
> > example
> > - * we might want expose the capabilities for a given region, which could 
> > include
> > - * things like if the region is CPU mappable/accessible, what are the 
> > supported
> > - * mapping types etc.
> > - *
> > - * Note that to extend struct drm_i915_memory_region_info and struct
> > - * drm_i915_query_memory_regions in the future the plan is to do the 
> > following:
> > - *
> > - * .. code-block:: C
> > - *
> > - * struct drm_i915_memory_region_info {
> > - * struct drm_i915_gem_memory_class_instance region;
> > - * union {
> > - * __u32 rsvd0;
> > - * __u32 new_thing1;
> > - * };
> > - * ...
> > - * union {
> > - * __u64 rsvd1[8];
> > - * struct {
> > - * __u64 new_thing2;
> > - * __u64 new_thing3;
> > - * ...
> > - * };
> > - * };
> > - * };
> > - *
> > - * With this things should remain source compatible between versions for
> > - * userspace, even as we add new fields.
> > - *
> > - * Note this is using both struct drm_i915_query_item and struct 
> > drm_i915_query.
> > - * For this new query we are adding the new query id 
> > DRM_I915_QUERY_MEMORY_REGIONS
> > - * at &drm_i915_query_item.query_id.
> > - */
> > -struct drm_i915_memory_region_info {
> > -   /** @region: The class:instance pair encoding */
> > -   struct drm_i915_gem_memory_class_instance region;
> > -
> > -   /** @rsvd0: MBZ */
> > -   __u32 rsvd0;
> > -
> > -   /** @probed_size: Memory probed by the driver (-1 = unknown) */
> > -   __u64 probed_size;
> > -
> > -   /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
> > -   __u64 unallocated_size;
> > -
> > -   /** @rsvd1: MBZ */
> > -   __u64 rsvd1[8];
> > -};
> > -
> > -/**
> > - * struct drm_i915_query_memory_regions
> > - *
> > - * The region info query enumerates all regions known to the driver by 
> > filling
> > - * in an array of struct drm_i915_memory_region_info structures.
> > - *
> > - * Example for getting the list of supported regions:
> > - *
> > - * .. code-block:: C
> > - *
> > - * struct drm_i915_query_memory_regions *info;
> > - * struct drm_i915_query_item item = {
> > - * .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
> > - * };
> > - * struct drm_i915_query query = {
> > - * .num_items = 1,
> > - * .items_ptr = (uintptr_t)&item,
> > - * };
> > - * int err, i;
> > - *
> > - * // First query the size of the blob we need, this needs to be large
> > - * // enough to hold our array of regions. The kernel will fill out the
> > - * // item.length for us, which is the number of bytes we need.
> > - * err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);

Re: [PATCH] drm/doc/rfc: drop the i915_gem_lmem.h header

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 06:03:56PM +0100, Matthew Auld wrote:
> The proper headers have now landed in include/uapi/drm/i915_drm.h, so we
> can drop i915_gem_lmem.h and instead just reference the real headers for
> pulling in the kernel doc.
> 
> Suggested-by: Daniel Vetter 
> Signed-off-by: Matthew Auld 

Reviewed-by: Daniel Vetter 

I guess we need to have a note that when we land the pciid for dg1 to move
all the remaining bits over to real docs and delete the i915 lmem rfc. But
everything in due time.
-Daniel

> ---
>  Documentation/gpu/rfc/i915_gem_lmem.h   | 237 
>  Documentation/gpu/rfc/i915_gem_lmem.rst |   6 +-
>  2 files changed, 3 insertions(+), 240 deletions(-)
>  delete mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h
> 
> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h 
> b/Documentation/gpu/rfc/i915_gem_lmem.h
> deleted file mode 100644
> index d9c61bea0556..
> --- a/Documentation/gpu/rfc/i915_gem_lmem.h
> +++ /dev/null
> @@ -1,237 +0,0 @@
> -/**
> - * enum drm_i915_gem_memory_class - Supported memory classes
> - */
> -enum drm_i915_gem_memory_class {
> - /** @I915_MEMORY_CLASS_SYSTEM: System memory */
> - I915_MEMORY_CLASS_SYSTEM = 0,
> - /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
> - I915_MEMORY_CLASS_DEVICE,
> -};
> -
> -/**
> - * struct drm_i915_gem_memory_class_instance - Identify particular memory 
> region
> - */
> -struct drm_i915_gem_memory_class_instance {
> - /** @memory_class: See enum drm_i915_gem_memory_class */
> - __u16 memory_class;
> -
> - /** @memory_instance: Which instance */
> - __u16 memory_instance;
> -};
> -
> -/**
> - * struct drm_i915_memory_region_info - Describes one region as known to the
> - * driver.
> - *
> - * Note that we reserve some stuff here for potential future work. As an 
> example
> - * we might want expose the capabilities for a given region, which could 
> include
> - * things like if the region is CPU mappable/accessible, what are the 
> supported
> - * mapping types etc.
> - *
> - * Note that to extend struct drm_i915_memory_region_info and struct
> - * drm_i915_query_memory_regions in the future the plan is to do the 
> following:
> - *
> - * .. code-block:: C
> - *
> - *   struct drm_i915_memory_region_info {
> - *   struct drm_i915_gem_memory_class_instance region;
> - *   union {
> - *   __u32 rsvd0;
> - *   __u32 new_thing1;
> - *   };
> - *   ...
> - *   union {
> - *   __u64 rsvd1[8];
> - *   struct {
> - *   __u64 new_thing2;
> - *   __u64 new_thing3;
> - *   ...
> - *   };
> - *   };
> - *   };
> - *
> - * With this things should remain source compatible between versions for
> - * userspace, even as we add new fields.
> - *
> - * Note this is using both struct drm_i915_query_item and struct 
> drm_i915_query.
> - * For this new query we are adding the new query id 
> DRM_I915_QUERY_MEMORY_REGIONS
> - * at &drm_i915_query_item.query_id.
> - */
> -struct drm_i915_memory_region_info {
> - /** @region: The class:instance pair encoding */
> - struct drm_i915_gem_memory_class_instance region;
> -
> - /** @rsvd0: MBZ */
> - __u32 rsvd0;
> -
> - /** @probed_size: Memory probed by the driver (-1 = unknown) */
> - __u64 probed_size;
> -
> - /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
> - __u64 unallocated_size;
> -
> - /** @rsvd1: MBZ */
> - __u64 rsvd1[8];
> -};
> -
> -/**
> - * struct drm_i915_query_memory_regions
> - *
> - * The region info query enumerates all regions known to the driver by 
> filling
> - * in an array of struct drm_i915_memory_region_info structures.
> - *
> - * Example for getting the list of supported regions:
> - *
> - * .. code-block:: C
> - *
> - *   struct drm_i915_query_memory_regions *info;
> - *   struct drm_i915_query_item item = {
> - *   .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
> - *   };
> - *   struct drm_i915_query query = {
> - *   .num_items = 1,
> - *   .items_ptr = (uintptr_t)&item,
> - *   };
> - *   int err, i;
> - *
> - *   // First query the size of the blob we need, this needs to be large
> - *   // enough to hold our array of regions. The kernel will fill out the
> - *   // item.length for us, which is the number of bytes we need.
> - *   err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
> - *   if (err) ...
> - *
> - *   info = calloc(1, item.length);
> - *   // Now that we allocated the required number of bytes, we call the ioctl
> - *   // again, this time with the data_ptr pointing to our newly allocated
> - *   // blob, which the kernel can then populate with the all the region 
> info.
> - *   item.data_ptr = (uintptr_t)&info,
> - *
> - *   err = ioctl(fd, DRM_IOCTL_I915_QUERY, );
> - *   if (err) ...
> - *
> - *   // We can now access each region 

Re: [PATCH] component: Move host device to end of device lists on binding

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 10:19:09AM -0700, Stephen Boyd wrote:
> Quoting Daniel Vetter (2021-05-11 06:39:36)
> > On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki  
> > wrote:
> > >
> > > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd  wrote:
> > >
> > > [cut]
> > >
> > > >
> > > > >
> > > > > > I will try it, but then I wonder about things like system wide
> > > > > > suspend/resume too. The drm encoder chain would need to reimplement 
> > > > > > the
> > > > > > logic for system wide suspend/resume so that any PM ops attached to 
> > > > > > the
> > > > > > msm device run in the correct order. Right now the bridge PM ops 
> > > > > > will
> > > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > > After this change, the msm PM ops will run, the bridge PM ops will 
> > > > > > run,
> > > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > > problem if we're suspending the DSI encoder while the bridge is 
> > > > > > still
> > > > > > active.
> > > > >
> > > > > Yup suspend/resume has the exact same problem as shutdown.
> > > >
> > > > I think suspend/resume has the exact opposite problem. At least I think
> > > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > > operates from the top down when we want bottom up? I admit I have no
> > > > idea what is supposed to happen here.
> > >
> > > Why would the system-wide suspend ordering be different from the
> > > shutdown ordering?
> >
> > At least my point was that both shutdown and suspend/resume have the
> > same problem, and the righ fix is (I think at least) to add these
> > hooks to the component.c aggregate ops structure. Hence just adding
> > new callbacks for shutdown will be an incomplete solution.
> 
> To add proper hooks to component.c we'll need to make the aggregate
> device into a 'struct device' and make a bus for them that essentially
> adds the aggregate device to the bus once all the components are
> registered. The bind/unbind can be ported to probe/remove, and then the
> aggregate driver can get PM ops that run before the component devices
> run their PM ops.
> 
> Let me go try it out and see if I can make it minimally invasive so that
> the migration path is simple.

Thanks for volunteering. Please cc Greg KH so we make sure we're not
doing this wrongly wrt the device model.
-Daniel

> > I don't feel like changing the global device order is the right
> > approach, since essentially that's what component was meant to fix.
> > Except it's incomplete since it only provides a solution for
> > bind/unbind and not for shutdown or suspend/resume as other global
> > state changes. I think some drivers "fixed" this by putting stuff like
> > drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
> > make sure that everything is ready with that trick. But that doesn't
> > compose very well :-/
> 
> Yeah it looks like msm is using prepare/complete for this so that it can
> jump in early and suspend the display pipeline before the components
> suspend themselves. The shutdown path only has one callback so we can't
> play the same games.

Yeah there's tons of hacks. i915 component usage with audio has similar
tricks to make suspend/resume work.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] component: Move host device to end of device lists on binding

2021-05-11 Thread Stephen Boyd
Quoting Russell King - ARM Linux admin (2021-05-11 07:42:37)
> On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> > Within the component device framework this usually isn't that bad
> > because the real driver work is done at bind time via
> > component{,master}_ops::bind(). It becomes a problem when the driver
> > core, or host driver, wants to operate on the component device outside
> > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> > driver core doesn't understand the relationship between the host device
> > and the component devices and could possibly try to operate on component
> > devices when they're already removed from the system or shut down.
>
> You really are not supposed to be doing anything with component devices
> once they have been unbound. You can do stuff with them only between the
> bind() and the unbind() callbacks for the host device.

Got it. The device is not unbound in this case so this isn't the
problem.

>
> Access to the host devices outside of that is totally undefined and
> should not be done.
>
> The shutdown callback should be fine as long as the other devices are
> still bound, but there will be implications if the shutdown order
> matters.
>
> However, randomly pulling devices around in the DPM list sounds to me
> like a very bad idea. What happens if such re-orderings result in a
> child device being shutdown after a parent device has been shut down?
>

Fair enough. I'll cook up a 'component' bus and see if that can fix this
properly. It will add a new device for the aggregate driver that does
the bind/unbind so the host/parent device will still be ordered on the
DPM list at the same place. The new aggregate device will be after the
components and we'll attach the PM ops and shutdown hooks to that.
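
Roughly what I'm imagining, totally untested and with all names made up:

#include <linux/device.h>

struct aggregate_device {
        struct device dev; /* registered only once all components are there */
};

static int aggregate_probe(struct device *dev)
{
        /* what the component master bind() callback does today */
        return 0;
}

static void aggregate_shutdown(struct device *dev)
{
        /*
         * e.g. drm_atomic_helper_shutdown() would run here, before the
         * component devices see their own shutdown callbacks
         */
}

static struct bus_type aggregate_bus = {
        .name     = "aggregate",
        .probe    = aggregate_probe,
        .shutdown = aggregate_shutdown,
};

That way the drm_atomic_helper_shutdown()/suspend calls can live in the
aggregate driver's hooks instead of the prepare/complete tricks msm does
today.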


Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 10:19:57AM -0700, Rob Clark wrote:
> On Tue, May 11, 2021 at 9:44 AM Daniel Vetter  wrote:
> >
> > On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:
> > > >
> > > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
> > > > >
> > > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:
> > > > > >
> > > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > > From: Rob Clark 
> > > > > > >
> > > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > > "video
> > > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > > optional helper vfunc to determine if a plane is attached to a 
> > > > > > > CRTC
> > > > > > > that actually needs dirtyfb, and skip over them.
> > > > > > >
> > > > > > > Signed-off-by: Rob Clark 
> > > > > >
> > > > > > So this is a bit annoying because the idea of all these "remap 
> > > > > > legacy uapi
> > > > > > to atomic constructs" helpers is that they shouldn't need/use 
> > > > > > anything
> > > > > > beyond what userspace also has available. So adding hacks for them 
> > > > > > feels
> > > > > > really bad.
> > > > >
> > > > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > > > (or similar) is actually required or is a no-op.
> > > > >
> > > > > But it is perhaps less of a problem because this essentially boils
> > > > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > > > non-vsync'd rendering just pageflips and throws away extra frames from
> > > > > the app?
> > > >
> > > > Yeah it's about not adequately batching up rendering and syncing with
> > > > hw. bare metal x11 is just especially stupid about it :-)
> > > >
> > > > > > Also I feel like it's not entirely the right thing to do here 
> > > > > > either.
> > > > > > We've had this problem already on the fbcon emulation side (which 
> > > > > > also
> > > > > > shouldn't be able to peek behind the atomic kms uapi curtain), and 
> > > > > > the fix
> > > > > > there was to have a worker which batches up all the updates and 
> > > > > > avoids any
> > > > > > stalls in bad places.
> > > > >
> > > > > I'm not too worried about fbcon not being able to render faster than
> > > > > vblank.  OTOH it is a pretty big problem for x11
> > > >
> > > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > > the same with fbcon, which trivially can get ahead of vblank otherwise
> > > > (if sometimes flushes each character, so you have to pile them up into
> > > > a single update if that's still pending).
> > > >
> > > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > > probably get
> > > > > > away with assuming there's only a single fb, so the implementation 
> > > > > > becomes
> > > > > > pretty simple:
> > > > > >
> > > > > > - 1 worker, and we keep track of a single pending fb
> > > > > > - if there's already a dirty fb pending on a different fb, we stall 
> > > > > > for
> > > > > >   the worker to start processing that one already (i.e. the fb we 
> > > > > > track is
> > > > > >   reset to NULL)
> > > > > > - if it's pending on the same fb we just toss away all the updates 
> > > > > > and go
> > > > > >   with a full update, since merging the clip rects is too much work 
> > > > > > :-) I
> > > > > >   think there's helpers so you could be slightly more clever and 
> > > > > > just have
> > > > > >   an overall bounding box
> > > > >
> > > > > This doesn't really fix the problem, you still end up delaying sending
> > > > > the next back-buffer to mesa
> > > >
> > > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > > tracking corruption is possible, but that's not the kernel's problem.
> > > > So how would anything get held up in userspace.
> > >
> > > the part about stalling if a dirtyfb is pending was what I was worried
> > > about.. but I suppose you meant the worker stalling, rather than
> > > userspace stalling (where I had interpreted it the other way around).
> > > As soon as userspace needs to stall, you're losing again.
> >
> > Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
> > of dirtyfb request in the kernel.
> >
> > But also I never expect userspace that uses dirtyfb to actually hit this
> > stall point (otherwise we'd need to look at this again). It would really
> > be only there as defense against abuse.
> 
> I don't believe modesetting ddx throttles dirtyfb, it (indirectly)
> calls this from it's BlockHandler.. so if you do end up blocking after
> the N'th dirtyfb, you are still going to end up stalling for vblank,
> you are just deferring that for a frame or two..

Nope, that's not what I mean.

By default we pile up the updates, so you _never_ stall. The worker then
takes the entire update every time it runs and batches them up.
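
Hand-wavy sketch of the worker side, just to make it concrete - all names
made up, and the error handling plus the actual stall path left out:

struct dirtyfb_work {
        spinlock_t lock;
        struct drm_framebuffer *fb; /* single pending fb, NULL when idle */
        struct drm_rect bbox;       /* merged bounding box of all clips */
        struct work_struct work;
};

static void dirtyfb_queue(struct dirtyfb_work *dw,
                          struct drm_framebuffer *fb,
                          const struct drm_rect *clip)
{
        spin_lock(&dw->lock);
        if (dw->fb == fb) {
                /* same fb: just grow the pending bounding box */
                dw->bbox.x1 = min(dw->bbox.x1, clip->x1);
                dw->bbox.y1 = min(dw->bbox.y1, clip->y1);
                dw->bbox.x2 = max(dw->bbox.x2, clip->x2);
                dw->bbox.y2 = max(dw->bbox.y2, clip->y2);
        } else {
                /* different fb pending: the real thing would flush the
                 * worker here first, this sketch just starts over */
                dw->fb = fb;
                dw->bbox = *clip;
        }
        spin_unlock(&dw->lock);
        schedule_work(&dw->work);
}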

We _only_ stall when we get a dirtyfb with a 

Re: [PATCH] component: Move host device to end of device lists on binding

2021-05-11 Thread Rafael J. Wysocki
On Tue, May 11, 2021 at 7:00 PM Stephen Boyd  wrote:
>
> Quoting Rafael J. Wysocki (2021-05-11 03:52:06)
> > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd  wrote:
> >
> > [cut]
> >
> > >
> > > >
> > > > > I will try it, but then I wonder about things like system wide
> > > > > suspend/resume too. The drm encoder chain would need to reimplement 
> > > > > the
> > > > > logic for system wide suspend/resume so that any PM ops attached to 
> > > > > the
> > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > After this change, the msm PM ops will run, the bridge PM ops will 
> > > > > run,
> > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > active.
> > > >
> > > > Yup suspend/resume has the exact same problem as shutdown.
> > >
> > > I think suspend/resume has the exact opposite problem. At least I think
> > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > operates from the top down when we want bottom up? I admit I have no
> > > idea what is supposed to happen here.
> >
> > Why would the system-wide suspend ordering be different from the
> > shutdown ordering?
>
> I don't really know. I'm mostly noting that today the order of suspend
> is to suspend the bridge device first and then the aggregate device. If
> the suspend of the aggregate device is traversing the devices like
> drm_atomic_helper_shutdown() then it would operate on the bridge device
> after it has been suspended, like is happening during shutdown. But it
> looks like that isn't happening. At least for the msm driver we're
> suspending the aggregate device after the bridge, and there are some
> weird usages of prepare and complete in there (see msm_pm_prepare() and
> msm_pm_complete) which makes me think that it's all working around this
> component code.

Well, it looks like the "prepare" phase is used sort-of against the
rules (because "prepare" is not supposed to make changes to the
hardware configuration, or at least that is not its role) in order to
work around an ordering issue that is present in shutdown, which
doesn't have a "prepare" phase.

> The prepare phase is going to suspend the display pipeline, and then the
> bridge device will run its suspend hooks, and then the aggregate driver
> will run its suspend hooks. If we had a proper device for the aggregate
> device instead of the bind/unbind component hooks we could clean this
> up.

I'm not sufficiently familiar with the component code to add anything
constructive here, but generally speaking it looks like the "natural"
dpm_list ordering does not match the order in which the devices in
question should be suspended (or shut down for that matter), so indeed
it is necessary to reorder dpm_list one way or another.

Please also note that it generally may not be sufficient to reorder
dpm_list if the devices are suspended and resumed asynchronously
during system-wide transitions, because in that case the callbacks of
different devices are only started in the dpm_list order, but they may
be completed in a different order (and of course they may run in
parallel with each other).

Shutdown is simpler, because it runs the callback synchronously for
all devices IIRC.


Re: [Intel-gfx] [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 06:28:25PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:14:28PM -0700, Matthew Brost wrote:
> > We receive notification of an engine reset from GuC at its
> > completion. Meaning GuC has potentially cleared any HW state
> > we may have been interested in capturing. GuC resumes scheduling
> > on the engine post-reset, as the resets are meant to be transparent,
> > further muddling our error state.
> > 
> > There is ongoing work to define an API for a GuC debug state dump. The
> > suggestion for now is to manually disable FW initiated resets in cases
> > where debug state is needed.
> > 
> > Signed-off-by: Matthew Brost 
> 
> This looks a bit backwards to me:
> 

Definitely a bit hacky, but this patch does the best it can to capture the
error.

> - I figured we should capture error state when we get the G2H, in which
>   case I hope we do know which the offending context was that got shot.
>

We know which context was shot based on the G2H. See 'hung_ce' in this patch.

> - For now we're missing the hw state, but we should still be able to
>   capture the buffers userspace wants us to capture. So that could be
>   wired up already?

Which buffers exactly? We dump all buffers associated with the context. 

> 
> But yeah register state capturing needs support from GuC fw.
>
> I think this is a big enough miss in GuC features that we should list it
> on the rfc as a thing to fix.

Agree this needs to be fixed.

Matt

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context.c   | 20 +++
> >  drivers/gpu/drm/i915/gt/intel_context.h   |  3 ++
> >  drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++-
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 --
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +--
> >  drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++---
> >  7 files changed, 91 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 2f01437056a8..3fe7794b2bfd 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -514,6 +514,26 @@ struct i915_request 
> > *intel_context_create_request(struct intel_context *ce)
> > return rq;
> >  }
> >  
> > +struct i915_request *intel_context_find_active_request(struct 
> > intel_context *ce)
> > +{
> > +   struct i915_request *rq, *active = NULL;
> > +   unsigned long flags;
> > +
> > +   GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
> > +
> > +   spin_lock_irqsave(&ce->guc_active.lock, flags);
> > +   list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > +   sched.link) {
> > +   if (i915_request_completed(rq))
> > +   break;
> > +
> > +   active = rq;
> > +   }
> > +   spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > +
> > +   return active;
> > +}
> > +
> >  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> >  #include "selftest_context.c"
> >  #endif
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > b/drivers/gpu/drm/i915/gt/intel_context.h
> > index 9b211ca5ecc7..d2b499ed8a05 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -195,6 +195,9 @@ int intel_context_prepare_remote_request(struct 
> > intel_context *ce,
> >  
> >  struct i915_request *intel_context_create_request(struct intel_context 
> > *ce);
> >  
> > +struct i915_request *
> > +intel_context_find_active_request(struct intel_context *ce);
> > +
> >  static inline struct intel_ring *__intel_context_ring_size(u64 sz)
> >  {
> > return u64_to_ptr(struct intel_ring, sz);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
> > b/drivers/gpu/drm/i915/gt/intel_engine.h
> > index 3321d0917a99..bb94963a9fa2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> > @@ -242,7 +242,7 @@ ktime_t intel_engine_get_busy_time(struct 
> > intel_engine_cs *engine,
> >ktime_t *now);
> >  
> >  struct i915_request *
> > -intel_engine_find_active_request(struct intel_engine_cs *engine);
> > +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
> >  
> >  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
> >  
> > @@ -316,4 +316,23 @@ intel_engine_get_sibling(struct intel_engine_cs 
> > *engine, unsigned int sibling)
> > return engine->cops->get_sibling(engine, sibling);
> >  }
> >  
> > +static inline void
> > +intel_engine_set_hung_context(struct intel_engine_cs *engine,
> > + struct intel_context *ce)
> > +{
> > +   engine->hung_ce = ce;
> > +}
> > +
> > +static inline void
> > +intel_engine_clear_hung_context(struct intel_engine_cs *engine)
> > +{
> > +   intel_engine_set_hung_context(engine, NULL);
> > +}
> > +
> > 

Re: [PATCH] component: Move host device to end of device lists on binding

2021-05-11 Thread Stephen Boyd
Quoting Daniel Vetter (2021-05-11 06:39:36)
> On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki  wrote:
> >
> > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd  wrote:
> >
> > [cut]
> >
> > >
> > > >
> > > > > I will try it, but then I wonder about things like system wide
> > > > > suspend/resume too. The drm encoder chain would need to reimplement 
> > > > > the
> > > > > logic for system wide suspend/resume so that any PM ops attached to 
> > > > > the
> > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > After this change, the msm PM ops will run, the bridge PM ops will 
> > > > > run,
> > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > active.
> > > >
> > > > Yup suspend/resume has the exact same problem as shutdown.
> > >
> > > I think suspend/resume has the exact opposite problem. At least I think
> > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > operates from the top down when we want bottom up? I admit I have no
> > > idea what is supposed to happen here.
> >
> > Why would the system-wide suspend ordering be different from the
> > shutdown ordering?
>
> At least my point was that both shutdown and suspend/resume have the
> same problem, and the righ fix is (I think at least) to add these
> hooks to the component.c aggregate ops structure. Hence just adding
> new callbacks for shutdown will be an incomplete solution.

To add proper hooks to component.c we'll need to make the aggregate
device into a 'struct device' and add a bus for these aggregate devices
that essentially adds each one to the bus once all of its components are
registered. The bind/unbind hooks can be ported to probe/remove, and then the
aggregate driver can get PM ops that run before the component devices
run their PM ops.

Let me go try it out and see if I can make it minimally invasive so that
the migration path is simple.
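
Roughly the shape I have in mind -- purely a sketch, every name below
(aggregate_bus, aggregate_driver, aggregate_pm_ops) is invented for
illustration and not an existing API:

#include <linux/device.h>
#include <linux/pm.h>

/* The aggregate gets a real struct device, registered on its own bus. */
static struct bus_type aggregate_bus = {
        .name = "aggregate",
        /* .match() would only succeed once all components have registered */
};

struct aggregate_driver {
        struct device_driver driver;            /* lives on aggregate_bus */
        int (*probe)(struct device *adev);      /* replaces component bind() */
        void (*remove)(struct device *adev);    /* replaces component unbind() */
};

/*
 * With a real device the PM callbacks attach here, and the driver core
 * orders them on dpm_list like any other device, instead of the aggregate
 * piggy-backing on one of its components.
 */
static const struct dev_pm_ops aggregate_pm_ops = {
        SET_SYSTEM_SLEEP_PM_OPS(pm_generic_suspend, pm_generic_resume)
};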

>
> I don't feel like changing the global device order is the right
> approach, since essentially that's what component was meant to fix.
> Except it's incomplete since it only provides a solution for
> bind/unbind and not for shutdown or suspend/resume as other global
> state changes. I think some drivers "fixed" this by putting stuff like
> drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
> make sure that everything is ready with that trick. But that doesn't
> compose very well :-/

Yeah it looks like msm is using prepare/complete for this so that it can
jump in early and suspend the display pipeline before the components
suspend themselves. The shutdown path only has one callback so we can't
play the same games.


Re: [PATCH] drm: fix semicolon.cocci warnings

2021-05-11 Thread Daniel Vetter
On Wed, May 12, 2021 at 12:11:23AM +0800, kernel test robot wrote:
> From: kernel test robot 
> 
> drivers/gpu/drm/kmb/kmb_dsi.c:284:3-4: Unneeded semicolon
> drivers/gpu/drm/kmb/kmb_dsi.c:304:3-4: Unneeded semicolon
> drivers/gpu/drm/kmb/kmb_dsi.c:321:3-4: Unneeded semicolon
> drivers/gpu/drm/kmb/kmb_dsi.c:340:3-4: Unneeded semicolon
> drivers/gpu/drm/kmb/kmb_dsi.c:364:2-3: Unneeded semicolon
> 
> 
>  Remove unneeded semicolon.
> 
> Generated by: scripts/coccinelle/misc/semicolon.cocci
> 
> Fixes: ade896460e4a ("drm: DRM_KMB_DISPLAY should depend on ARCH_KEEMBAY")
> CC: Geert Uytterhoeven 
> Reported-by: kernel test robot 
> Signed-off-by: kernel test robot 

Applied to drm-misc-next for 5.14, thanks for the patch.
-Daniel

> ---
> 
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> head:   1140ab592e2ebf8153d2b322604031a8868ce7a5
> commit: ade896460e4a62f5e4a892a98d254937f6f5b64c drm: DRM_KMB_DISPLAY should 
> depend on ARCH_KEEMBAY
> :: branch date: 18 hours ago
> :: commit date: 6 months ago
> 
>  kmb_dsi.c |   10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> --- a/drivers/gpu/drm/kmb/kmb_dsi.c
> +++ b/drivers/gpu/drm/kmb/kmb_dsi.c
> @@ -281,7 +281,7 @@ static u32 mipi_get_datatype_params(u32
>   default:
>   DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode);
>   return -EINVAL;
> - };
> + }
>   break;
>   case DSI_LP_DT_PPS_YCBCR422_16B:
>   data_type_param.size_constraint_pixels = 2;
> @@ -301,7 +301,7 @@ static u32 mipi_get_datatype_params(u32
>   default:
>   DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode);
>   return -EINVAL;
> - };
> + }
>   break;
>   case DSI_LP_DT_LPPS_YCBCR422_20B:
>   case DSI_LP_DT_PPS_YCBCR422_24B:
> @@ -318,7 +318,7 @@ static u32 mipi_get_datatype_params(u32
>   default:
>   DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode);
>   return -EINVAL;
> - };
> + }
>   break;
>   case DSI_LP_DT_PPS_RGB565_16B:
>   data_type_param.size_constraint_pixels = 1;
> @@ -337,7 +337,7 @@ static u32 mipi_get_datatype_params(u32
>   default:
>   DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode);
>   return -EINVAL;
> - };
> + }
>   break;
>   case DSI_LP_DT_PPS_RGB666_18B:
>   data_type_param.size_constraint_pixels = 4;
> @@ -361,7 +361,7 @@ static u32 mipi_get_datatype_params(u32
>   default:
>   DRM_ERROR("DSI: Invalid data_type %d\n", data_type);
>   return -EINVAL;
> - };
> + }
>  
>   *params = data_type_param;
>   return 0;

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] Documentation: gpu: Mention the requirements for new properties

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 05:55:12PM +0200, Maxime Ripard wrote:
> New KMS properties come with a bunch of requirements to avoid each
> driver from running their own, inconsistent, set of properties,
> eventually leading to issues like property conflicts, inconsistencies
> between drivers and semantics, etc.
> 
> Let's document what we expect.
> 
> Signed-off-by: Maxime Ripard 
> ---
>  Documentation/gpu/drm-kms.rst | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
> index 87e5023e3f55..30f4c376f419 100644
> --- a/Documentation/gpu/drm-kms.rst
> +++ b/Documentation/gpu/drm-kms.rst
> @@ -463,6 +463,24 @@ KMS Properties
>  This section of the documentation is primarily aimed at user-space 
> developers.
>  For the driver APIs, see the other sections.
>  
> +Requirements
> +
> +
> +KMS drivers might need to add extra properties to support new features.
> +Each new property introduced in a driver need to meet a few
> +requirements, in addition to the one mentioned above.:
> +
> +- It must be standardized, with some documentation to describe the
> +  property can be used.
> +
> +- It must provide a generic helper in the core code to register that
> +  property on the object it attaches to.

Maybe also include anything that drivers might want to precompute, e.g. we
have helpers for cliprects.

> +
> +- Its content must be decoded by the core and provided in the object

object's
> +  associated state structure.
> +
> +- An IGT test must be submitted.

"... where reasonable."

We have that disclaimer already here:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#testing-requirements-for-userspace-api

I think it would be good to cross-reference the uapi rules in general

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

With the bikesheds addressed:

Reviewed-by: Daniel Vetter 

But ideally this needs a pile of acks from most display driver teams.
-Daniel

> +
>  Property Types and Blob Property Support
>  
>  
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-11 Thread Rob Clark
On Tue, May 11, 2021 at 9:44 AM Daniel Vetter  wrote:
>
> On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> > On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:
> > >
> > > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
> > > >
> > > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:
> > > > >
> > > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > > From: Rob Clark 
> > > > > >
> > > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on 
> > > > > > "video
> > > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > > optional helper vfunc to determine if a plane is attached to a CRTC
> > > > > > that actually needs dirtyfb, and skip over them.
> > > > > >
> > > > > > Signed-off-by: Rob Clark 
> > > > >
> > > > > So this is a bit annoying because the idea of all these "remap legacy 
> > > > > uapi
> > > > > to atomic constructs" helpers is that they shouldn't need/use anything
> > > > > beyond what userspace also has available. So adding hacks for them 
> > > > > feels
> > > > > really bad.
> > > >
> > > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > > (or similar) is actually required or is a no-op.
> > > >
> > > > But it is perhaps less of a problem because this essentially boils
> > > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > > non-vsync'd rendering just pageflips and throws away extra frames from
> > > > the app?
> > >
> > > Yeah it's about not adequately batching up rendering and syncing with
> > > hw. bare metal x11 is just especially stupid about it :-)
> > >
> > > > > Also I feel like it's not entirely the right thing to do here either.
> > > > > We've had this problem already on the fbcon emulation side (which also
> > > > > shouldn't be able to peek behind the atomic kms uapi curtain), and 
> > > > > the fix
> > > > > there was to have a worker which batches up all the updates and 
> > > > > avoids any
> > > > > stalls in bad places.
> > > >
> > > > I'm not too worried about fbcon not being able to render faster than
> > > > vblank.  OTOH it is a pretty big problem for x11
> > >
> > > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > > the same with fbcon, which trivially can get ahead of vblank otherwise
> > > (if sometimes flushes each character, so you have to pile them up into
> > > a single update if that's still pending).
> > >
> > > > > Since this is for frontbuffer rendering userspace only we can 
> > > > > probably get
> > > > > away with assuming there's only a single fb, so the implementation 
> > > > > becomes
> > > > > pretty simple:
> > > > >
> > > > > - 1 worker, and we keep track of a single pending fb
> > > > > - if there's already a dirty fb pending on a different fb, we stall 
> > > > > for
> > > > >   the worker to start processing that one already (i.e. the fb we 
> > > > > track is
> > > > >   reset to NULL)
> > > > > - if it's pending on the same fb we just toss away all the updates 
> > > > > and go
> > > > >   with a full update, since merging the clip rects is too much work 
> > > > > :-) I
> > > > >   think there's helpers so you could be slightly more clever and just 
> > > > > have
> > > > >   an overall bounding box
> > > >
> > > > This doesn't really fix the problem, you still end up delaying sending
> > > > the next back-buffer to mesa
> > >
> > > With this the dirtyfb would never block. Also glorious frontbuffer
> > > tracking corruption is possible, but that's not the kernel's problem.
> > > So how would anything get held up in userspace.
> >
> > the part about stalling if a dirtyfb is pending was what I was worried
> > about.. but I suppose you meant the worker stalling, rather than
> > userspace stalling (where I had interpreted it the other way around).
> > As soon as userspace needs to stall, you're losing again.
>
> Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
> of dirtyfb request in the kernel.
>
> But also I never expect userspace that uses dirtyfb to actually hit this
> stall point (otherwise we'd need to look at this again). It would really
> be only there as defense against abuse.

I don't believe modesetting ddx throttles dirtyfb; it (indirectly)
calls this from its BlockHandler.. so if you do end up blocking after
the N'th dirtyfb, you are still going to end up stalling for vblank,
you are just deferring that for a frame or two..

The thing is, for a push style panel, you don't necessarily have to
wait for "vblank" (because "vblank" isn't necessarily a real thing),
so in that scenario dirtyfb could in theory be fast.  What you want to
do is fundamentally different for push vs pull style displays.
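
To make that concrete, this is roughly the shape of the check on the
helper side (sketch only, not the exact patch -- the needs_dirtyfb hook
name is illustrative):

#include <drm/drm_modeset_helper_vtables.h>
#include <drm/drm_plane.h>

/*
 * Optional per-plane hook so the legacy dirtyfb helper can skip planes
 * on "pull"/video-mode CRTCs, which scan out continuously and never need
 * an explicit flush.  Push/command-mode panels keep the current
 * behaviour, since for them dirtyfb is what triggers the push.
 */
static bool plane_needs_dirtyfb(struct drm_plane *plane)
{
        const struct drm_plane_helper_funcs *funcs = plane->helper_private;

        if (funcs && funcs->needs_dirtyfb)      /* assumed new vfunc */
                return funcs->needs_dirtyfb(plane);

        return true;    /* default: keep flushing, as today */
}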

> > > > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > > > per-crtc basis and hoist the loop and check if dirtyfb is needed out
> > > > of drm_atomic_helper_dirtyfb()
> > >
> > > That's still using information that userspace doesn't 

Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 05:26:34PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote:
> > Add lrc descriptor context lookup array which can resolve the
> > intel_context from the lrc descriptor index. In addition to lookup, it
> > can determine in the lrc descriptor context is currently registered with
> > the GuC by checking if an entry for a descriptor index is present.
> > Future patches in the series will make use of this array.
> > 
> > Cc: John Harrison 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  5 +++
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +--
> >  2 files changed, 35 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index d84f37afb9d8..2eb6c497e43c 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -6,6 +6,8 @@
> >  #ifndef _INTEL_GUC_H_
> >  #define _INTEL_GUC_H_
> >  
> > +#include "linux/xarray.h"
> > +
> >  #include "intel_uncore.h"
> >  #include "intel_guc_fw.h"
> >  #include "intel_guc_fwif.h"
> > @@ -47,6 +49,9 @@ struct intel_guc {
> > struct i915_vma *lrc_desc_pool;
> > void *lrc_desc_pool_vaddr;
> >  
> > +   /* guc_id to intel_context lookup */
> > +   struct xarray context_lookup;
> 
> The current code sets a disastrous example, but for stuff like this it's
> always good to explain the locking, and who's holding references and how
> you're handling cycles. Since I guess the intel_context also holds the
> guc_id alive somehow.
> 

I think (?) I know what you mean by this comment. How about adding:

'If an entry in the context_lookup is present, that means a context
associated with the guc_id is registered with the GuC. We use this xarray as a
lookup mechanism when the GuC communicates with the i915 about the context.'
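
To illustrate the intended usage, a G2H handler would do roughly this
(sketch -- the handler name and error handling here are made up):

static int guc_handle_context_event(struct intel_guc *guc, u32 guc_id)
{
        struct intel_context *ce = xa_load(&guc->context_lookup, guc_id);

        /* no entry -> this guc_id is not (or no longer) registered */
        if (!ce)
                return -EPROTO;

        /* ... act on ce (reset notification, done handshake, etc.) ... */
        return 0;
}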

> Again holds for the entire series, where it makes sense (as in we don't
> expect to rewrite the entire code anyway).

Slightly out of order, but one of the last patches in the series, 'Update GuC
documentation', adds a big section of comments that attempts to clarify how all
of this code works. I likely should add a section explaining the data structures
as well.

Matt

> -Daniel
> 
> > +
> > /* Control params for fw initialization */
> > u32 params[GUC_CTL_MAX_DWORDS];
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 6acc1ef34f92..c2b6d27404b7 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct 
> > rb_node *rb)
> > return rb_entry(rb, struct i915_priolist, node);
> >  }
> >  
> > -/* Future patches will use this function */
> > -__attribute__ ((unused))
> >  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 
> > index)
> >  {
> > struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> > @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct 
> > intel_guc *guc, u32 index)
> > return &base[index];
> >  }
> >  
> > +static inline struct intel_context *__get_context(struct intel_guc *guc, 
> > u32 id)
> > +{
> > +   struct intel_context *ce = xa_load(&guc->context_lookup, id);
> > +
> > +   GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
> > +
> > +   return ce;
> > +}
> > +
> >  static int guc_lrc_desc_pool_create(struct intel_guc *guc)
> >  {
> > u32 size;
> > @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc 
> > *guc)
> > i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
> >  }
> >  
> > +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> > +{
> > +   struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > +
> > +   memset(desc, 0, sizeof(*desc));
> > +   xa_erase_irq(&guc->context_lookup, id);
> > +}
> > +
> > +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > +{
> > +   return __get_context(guc, id);
> > +}
> > +
> > +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> > +  struct intel_context *ce)
> > +{
> > +   xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > +}
> > +
> >  static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >  {
> > /* Leaving stub as this function will be used in future patches */
> > @@ -404,6 +430,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >  */
> > GEM_BUG_ON(!guc->lrc_desc_pool);
> >  
> > +   xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> > +
> > return 0;
> >  }
> >  
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


[PATCH] drm/doc/rfc: drop the i915_gem_lmem.h header

2021-05-11 Thread Matthew Auld
The proper headers have now landed in include/uapi/drm/i915_drm.h, so we
can drop i915_gem_lmem.h and instead just reference the real headers for
pulling in the kernel doc.

Suggested-by: Daniel Vetter 
Signed-off-by: Matthew Auld 
---
 Documentation/gpu/rfc/i915_gem_lmem.h   | 237 
 Documentation/gpu/rfc/i915_gem_lmem.rst |   6 +-
 2 files changed, 3 insertions(+), 240 deletions(-)
 delete mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h

diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h 
b/Documentation/gpu/rfc/i915_gem_lmem.h
deleted file mode 100644
index d9c61bea0556..
--- a/Documentation/gpu/rfc/i915_gem_lmem.h
+++ /dev/null
@@ -1,237 +0,0 @@
-/**
- * enum drm_i915_gem_memory_class - Supported memory classes
- */
-enum drm_i915_gem_memory_class {
-   /** @I915_MEMORY_CLASS_SYSTEM: System memory */
-   I915_MEMORY_CLASS_SYSTEM = 0,
-   /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
-   I915_MEMORY_CLASS_DEVICE,
-};
-
-/**
- * struct drm_i915_gem_memory_class_instance - Identify particular memory 
region
- */
-struct drm_i915_gem_memory_class_instance {
-   /** @memory_class: See enum drm_i915_gem_memory_class */
-   __u16 memory_class;
-
-   /** @memory_instance: Which instance */
-   __u16 memory_instance;
-};
-
-/**
- * struct drm_i915_memory_region_info - Describes one region as known to the
- * driver.
- *
- * Note that we reserve some stuff here for potential future work. As an 
example
- * we might want expose the capabilities for a given region, which could 
include
- * things like if the region is CPU mappable/accessible, what are the supported
- * mapping types etc.
- *
- * Note that to extend struct drm_i915_memory_region_info and struct
- * drm_i915_query_memory_regions in the future the plan is to do the following:
- *
- * .. code-block:: C
- *
- * struct drm_i915_memory_region_info {
- * struct drm_i915_gem_memory_class_instance region;
- * union {
- * __u32 rsvd0;
- * __u32 new_thing1;
- * };
- * ...
- * union {
- * __u64 rsvd1[8];
- * struct {
- * __u64 new_thing2;
- * __u64 new_thing3;
- * ...
- * };
- * };
- * };
- *
- * With this things should remain source compatible between versions for
- * userspace, even as we add new fields.
- *
- * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
- * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS
- * at _i915_query_item.query_id.
- */
-struct drm_i915_memory_region_info {
-   /** @region: The class:instance pair encoding */
-   struct drm_i915_gem_memory_class_instance region;
-
-   /** @rsvd0: MBZ */
-   __u32 rsvd0;
-
-   /** @probed_size: Memory probed by the driver (-1 = unknown) */
-   __u64 probed_size;
-
-   /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
-   __u64 unallocated_size;
-
-   /** @rsvd1: MBZ */
-   __u64 rsvd1[8];
-};
-
-/**
- * struct drm_i915_query_memory_regions
- *
- * The region info query enumerates all regions known to the driver by filling
- * in an array of struct drm_i915_memory_region_info structures.
- *
- * Example for getting the list of supported regions:
- *
- * .. code-block:: C
- *
- * struct drm_i915_query_memory_regions *info;
- * struct drm_i915_query_item item = {
- * .query_id = DRM_I915_QUERY_MEMORY_REGIONS;
- * };
- * struct drm_i915_query query = {
- * .num_items = 1,
- * .items_ptr = (uintptr_t)&item,
- * };
- * int err, i;
- *
- * // First query the size of the blob we need, this needs to be large
- * // enough to hold our array of regions. The kernel will fill out the
- * // item.length for us, which is the number of bytes we need.
- * err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
- * if (err) ...
- *
- * info = calloc(1, item.length);
- * // Now that we allocated the required number of bytes, we call the ioctl
- * // again, this time with the data_ptr pointing to our newly allocated
- * // blob, which the kernel can then populate with the all the region 
info.
- * item.data_ptr = (uintptr_t)&info,
- *
- * err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
- * if (err) ...
- *
- * // We can now access each region in the array
- * for (i = 0; i < info->num_regions; i++) {
- * struct drm_i915_memory_region_info mr = info->regions[i];
- * u16 class = mr.region.class;
- * u16 instance = mr.region.instance;
- *
- * 
- * }
- *
- * free(info);
- */
-struct drm_i915_query_memory_regions {
-   /** @num_regions: Number of supported regions */
-   __u32 num_regions;
-
-   /** @rsvd: MBZ 

Re: [PATCH] drm/i915: Add relocation exceptions for two other platforms

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 10:31:39AM +0200, Zbigniew Kempczyński wrote:
> We have established previously we stop using relocations starting
> from gen12 platforms with Tigerlake as an exception. Unfortunately
> we need extend transition period and support relocations for two
> other igfx platforms - Rocketlake and Alderlake.
> 
> Signed-off-by: Zbigniew Kempczyński 
> Cc: Dave Airlie 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 

So the annoying thing here is that now media-driver is fixed:

https://github.com/intel/media-driver/commit/144020c37770083974bedf59902b70b8f444c799

Which means igt is really the only thing left.

Dave, is this still ok for an acked exception, or is this now leaning
towards "just fix igt"?
-Daniel
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 297143511f99..f80da1d6d9b2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -496,11 +496,15 @@ eb_validate_vma(struct i915_execbuffer *eb,
>   struct drm_i915_gem_exec_object2 *entry,
>   struct i915_vma *vma)
>  {
> - /* Relocations are disallowed for all platforms after TGL-LP.  This
> -  * also covers all platforms with local memory.
> + /*
> +  * Relocations are disallowed starting from gen12 with some exceptions
> +  * - TGL/RKL/ADL.
>*/
>   if (entry->relocation_count &&
> - INTEL_GEN(eb->i915) >= 12 && !IS_TIGERLAKE(eb->i915))
> + INTEL_GEN(eb->i915) >= 12 && !(IS_TIGERLAKE(eb->i915) ||
> +IS_ROCKETLAKE(eb->i915) ||
> +IS_ALDERLAKE_S(eb->i915) ||
> +IS_ALDERLAKE_P(eb->i915)))
>   return -EINVAL;
>  
>   if (unlikely(entry->flags & eb->invalid_flags))
> -- 
> 2.26.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH RFC 1/3] drm: Add drm_plane_add_modifiers()

2021-05-11 Thread Daniel Vetter
On Mon, May 10, 2021 at 09:49:38PM -0400, Tina Zhang wrote:
> Add a function to add modifiers to a plane.
> 
> Signed-off-by: Tina Zhang 

For one, new functions for drivers need kerneldoc.

But the real issue here is that you're supposed to supply the modifiers
when creating the plane, not later on. So this function doesn't make
sense.

Please fix the virtio code to use the existing functions
(drm_universal_plane_init() to be specific), or explain why that's not
possible.
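
I.e. something along these lines at plane creation time (sketch;
my_plane and my_plane_funcs are placeholders for the driver's own
types):

#include <drm/drm_fourcc.h>
#include <drm/drm_plane.h>

struct my_plane {                       /* placeholder wrapper type */
        struct drm_plane base;
};

static const struct drm_plane_funcs my_plane_funcs = {
        /* the driver's usual plane funcs go here */
};

static int my_plane_init(struct drm_device *dev, struct my_plane *plane,
                         u32 possible_crtcs)
{
        static const u32 formats[] = { DRM_FORMAT_XRGB8888 };
        static const u64 modifiers[] = {
                DRM_FORMAT_MOD_LINEAR,
                DRM_FORMAT_MOD_INVALID,  /* the list must be terminated */
        };

        return drm_universal_plane_init(dev, &plane->base, possible_crtcs,
                                        &my_plane_funcs, formats,
                                        ARRAY_SIZE(formats), modifiers,
                                        DRM_PLANE_TYPE_PRIMARY, NULL);
}
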
-Daniel
> ---
>  drivers/gpu/drm/drm_plane.c | 41 +
>  include/drm/drm_plane.h |  3 +++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c
> index b570a480090a..793b16d84f86 100644
> --- a/drivers/gpu/drm/drm_plane.c
> +++ b/drivers/gpu/drm/drm_plane.c
> @@ -288,6 +288,47 @@ int drm_universal_plane_init(struct drm_device *dev, 
> struct drm_plane *plane,
>  }
>  EXPORT_SYMBOL(drm_universal_plane_init);
>  
> +int drm_plane_add_modifiers(struct drm_device *dev,
> +   struct drm_plane *plane,
> +   const uint64_t *format_modifiers)
> +{
> + struct drm_mode_config *config = &dev->mode_config;
> + const uint64_t *temp_modifiers = format_modifiers;
> + unsigned int format_modifier_count = 0;
> +
> + /*
> +  * Only considering adding modifiers when no modifier was
> +  * added to that plane before.
> +  */
> + if (!temp_modifiers || plane->modifier_count)
> + return -EINVAL;
> +
> + while (*temp_modifiers++ != DRM_FORMAT_MOD_INVALID)
> + format_modifier_count++;
> +
> + if (format_modifier_count)
> + config->allow_fb_modifiers = true;
> +
> + plane->modifier_count = format_modifier_count;
> + plane->modifiers = kmalloc_array(format_modifier_count,
> +  sizeof(format_modifiers[0]),
> +  GFP_KERNEL);
> +
> + if (format_modifier_count && !plane->modifiers) {
> + DRM_DEBUG_KMS("out of memory when allocating plane\n");
> + return -ENOMEM;
> + }
> +
> + memcpy(plane->modifiers, format_modifiers,
> +format_modifier_count * sizeof(format_modifiers[0]));
> + if (config->allow_fb_modifiers)
> + create_in_format_blob(dev, plane);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_plane_add_modifiers);
> +
> +
>  int drm_plane_register_all(struct drm_device *dev)
>  {
>   unsigned int num_planes = 0;
> diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
> index 50c23eb432b7..0dacdeffc3bc 100644
> --- a/include/drm/drm_plane.h
> +++ b/include/drm/drm_plane.h
> @@ -827,6 +827,9 @@ int drm_universal_plane_init(struct drm_device *dev,
>const uint64_t *format_modifiers,
>enum drm_plane_type type,
>const char *name, ...);
> +int drm_plane_add_modifiers(struct drm_device *dev,
> +struct drm_plane *plane,
> +const uint64_t *format_modifiers);
>  int drm_plane_init(struct drm_device *dev,
>  struct drm_plane *plane,
>  uint32_t possible_crtcs,
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] component: Move host device to end of device lists on binding

2021-05-11 Thread Stephen Boyd
Quoting Rafael J. Wysocki (2021-05-11 03:52:06)
> On Mon, May 10, 2021 at 9:08 PM Stephen Boyd  wrote:
>
> [cut]
>
> >
> > >
> > > > I will try it, but then I wonder about things like system wide
> > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > active.
> > >
> > > Yup suspend/resume has the exact same problem as shutdown.
> >
> > I think suspend/resume has the exact opposite problem. At least I think
> > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > like is happening today. It looks like drm_atomic_helper_shutdown()
> > operates from the top down when we want bottom up? I admit I have no
> > idea what is supposed to happen here.
>
> Why would the system-wide suspend ordering be different from the
> shutdown ordering?

I don't really know. I'm mostly noting that today the order of suspend
is to suspend the bridge device first and then the aggregate device. If
the suspend of the aggregate device is traversing the devices like
drm_atomic_helper_shutdown() then it would operate on the bridge device
after it has been suspended, like is happening during shutdown. But it
looks like that isn't happening. At least for the msm driver we're
suspending the aggregate device after the bridge, and there are some
weird usages of prepare and complete in there (see msm_pm_prepare() and
msm_pm_complete()) which makes me think that it's all working around this
component code.

The prepare phase is going to suspend the display pipeline, and then the
bridge device will run its suspend hooks, and then the aggregate driver
will run its suspend hooks. If we had a proper device for the aggregate
device instead of the bind/unbind component hooks we could clean this
up.


Re: [RFC] Implicit vs explicit user fence sync

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 05:32:29PM +0200, Christian König wrote:
> Am 11.05.21 um 16:23 schrieb Daniel Vetter:
> > On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:
> > > Am 11.05.21 um 09:31 schrieb Daniel Vetter:
> > > > [SNIP]
> > > > > > And that's just the one ioctl I know is big trouble, I'm sure we'll 
> > > > > > find
> > > > > > more funny corner cases when we roll out explicit user fencing.
> > > > > I think we can just ignore sync_file. As far as it concerns me that 
> > > > > UAPI is
> > > > > pretty much dead.
> > > > Uh that's rather bold. Android is built on it. Currently atomic kms is
> > > > built on it.
> > > To be honest I don't think we care about Android at all.
> > we = amd or we = upstream here?
> 
> we = amd, for everybody else that is certainly a different topic.
> 
> But for now AMD is the only one running into this problem.
> 
> Could be that Nouveau sees this as well with the next hw generation, but who
> knows?
> 
> > > > Why is this not much of a problem if it's just within one driver?
> > > Because inside the same driver I can easily add the waits before 
> > > submitting
> > > the MM work as necessary.
> > What is MM work here now?
> 
> MM=multimedia, e.g. UVD, VCE, VCN engines on AMD hardware.
> 
> > > > > > > Adding implicit synchronization on top of that is then rather 
> > > > > > > trivial.
> > > > > > Well that's what I disagree with, since I already see some problems 
> > > > > > that I
> > > > > > don't think we can overcome (the atomic ioctl is one). And that's 
> > > > > > with us
> > > > > > only having a fairly theoretical understanding of the overall 
> > > > > > situation.
> > > > > But how should we then ever support user fences with the atomic IOCTL?
> > > > > 
> > > > > We can't wait in user space since that will disable the support for 
> > > > > waiting
> > > > > in the hardware.
> > > > Well, figure it out :-)
> > > > 
> > > > This is exactly why I'm not seeing anything solved with just rolling a
> > > > function call to a bunch of places, because it's pretending all things 
> > > > are
> > > > solved when clearly that's not the case.
> > > > 
> > > > I really think what we need is to first figure out how to support
> > > > userspace fences as explicit entities across the stack, maybe with
> > > > something like this order:
> > > > 1. enable them purely within a single userspace driver (like vk with
> > > > winsys disabled, or something else like that except not amd because
> > > > there's this amdkfd split for "real" compute)
> > > > 1a. including atomic ioctl, e.g. for vk direct display support this can 
> > > > be
> > > > used without cross-process sharing, new winsys protocols and all that 
> > > > fun
> > > > 2. figure out how to transport these userspace fences with something 
> > > > like
> > > > drm_syncobj
> > > > 2a. figure out the compat story for drivers which dont do userspace 
> > > > fences
> > > > 2b. figure out how to absorb the overhead if the winsys/compositor 
> > > > doesn't
> > > > support explicit sync
> > > > 3. maybe figure out how to make this all happen magically with implicit
> > > > sync, if we really, really care
> > > > 
> > > > If we do 3 before we've nailed all these problems, we're just 
> > > > guaranteeing
> > > > we'll get the wrong solutions and so we'll then have 3 ways of doing
> > > > userspace fences
> > > > - the butchered implicit one that didn't quite work
> > > > - the explicit one
> > > > - the not-so-butchered implicit one with the lessons from the properly
> > > > done explicit one
> > > > 
> > > > The thing is, if you have no idea how to integrate userspace fences
> > > > explicitly into atomic ioctl, then you definitely have no idea how to do
> > > > it implicitly :-)
> > > Well I agree on that. But the question is still how would you do explicit
> > > with atomic?
> > If you supply an userpace fence (is that what we call them now) as
> > in-fence, then your only allowed to get a userspace fence as out-fence.
> 
> Yeah, that part makes perfectly sense. But I don't see the problem with
> that?
> 
> > That way we
> > - don't block anywhere we shouldn't
> > - don't create a dma_fence out of a userspace fence
> > 
> > The problem is this completely breaks your "magically make implicit
> > fencing with userspace fences" plan.
> 
> Why?

If you allow implicit fencing then you can end up with
- an implicit userspace fence as the in-fence
- but an explicit dma_fence as the out fence

Which is not allowed. So there's really no way to make this work, except
if you stall in the ioctl, which also doesn't work.

So you have to do an uapi change here. At that point we might as well do
it right.

Of course if you only care about some specific compositors (or maybe only
the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
cross-driver so we can't do that. Or at least I don't see a way to do
this without causing endless amounts of fun down the road.

> > So I have a plan here, 

Re: [RFC PATCH 00/97] Basic GuC submission support in the i915

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 08:26:59AM -0700, Bloomfield, Jon wrote:
> > -Original Message-
> > From: Martin Peres 
> > Sent: Tuesday, May 11, 2021 1:06 AM
> > To: Daniel Vetter 
> > Cc: Jason Ekstrand ; Brost, Matthew
> > ; intel-gfx ;
> > dri-devel ; Ursulin, Tvrtko
> > ; Ekstrand, Jason ;
> > Ceraolo Spurio, Daniele ; Bloomfield, Jon
> > ; Vetter, Daniel ;
> > Harrison, John C 
> > Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
> > 
> > On 10/05/2021 19:33, Daniel Vetter wrote:
> > > On Mon, May 10, 2021 at 3:55 PM Martin Peres 
> > wrote:
> > >>
> > >> On 10/05/2021 02:11, Jason Ekstrand wrote:
> > >>> On May 9, 2021 12:12:36 Martin Peres  wrote:
> > >>>
> >  Hi,
> > 
> >  On 06/05/2021 22:13, Matthew Brost wrote:
> > > Basic GuC submission support. This is the first bullet point in the
> > > upstreaming plan covered in the following RFC [1].
> > >
> > > At a very high level the GuC is a piece of firmware which sits between
> > > the i915 and the GPU. It offloads some of the scheduling of contexts
> > > from the i915 and programs the GPU to submit contexts. The i915
> > > communicates with the GuC and the GuC communicates with the
> > GPU.
> > 
> >  May I ask what will GuC command submission do that execlist
> > won't/can't
> >  do? And what would be the impact on users? Even forgetting the
> > troubled
> >  history of GuC (instability, performance regression, poor level of user
> >  support, 6+ years of trying to upstream it...), adding this much code
> >  and doubling the amount of validation needed should come with a
> >  rationale making it feel worth it... and I am not seeing here. Would 
> >  you
> >  mind providing the rationale behind this work?
> > 
> > >
> > > GuC submission will be disabled by default on all current upstream
> > > platforms behind a module parameter - enable_guc. A value of 3 will
> > > enable submission and HuC loading via the GuC. GuC submission
> > should
> > > work on all gen11+ platforms assuming the GuC firmware is present.
> > 
> >  What is the plan here when it comes to keeping support for execlist? I
> >  am afraid that landing GuC support in Linux is the first step towards
> >  killing the execlist, which would force users to use proprietary
> >  firmwares that even most Intel engineers have little influence over.
> >  Indeed, if "drm/i915/guc: Disable semaphores when using GuC
> > scheduling"
> >  which states "Disable semaphores when using GuC scheduling as
> > semaphores
> >  are broken in the current GuC firmware." is anything to go by, it means
> >  that even Intel developers seem to prefer working around the GuC
> >  firmware, rather than fixing it.
> > >>>
> > >>> Yes, landing GuC support may be the first step in removing execlist
> > >>> support. The inevitable reality is that GPU scheduling is coming and
> > >>> likely to be there only path in the not-too-distant future. (See also
> > >>> the ongoing thread with AMD about fences.) I'm not going to pass
> > >>> judgement on whether or not this is a good thing.  I'm just reading the
> > >>> winds and, in my view, this is where things are headed for good or ill.
> > >>>
> > >>> In answer to the question above, the answer to "what do we gain from
> > >>> GuC?" may soon be, "you get to use your GPU."  We're not there yet
> > and,
> > >>> again, I'm not necessarily advocating for it, but that is likely where
> > >>> things are headed.
> > >>
> > >> This will be a sad day, especially since it seems fundamentally opposed
> > >> with any long-term support, on top of taking away user freedom to
> > >> fix/tweak their system when Intel won't.
> > >>
> > >>> A firmware-based submission model isn't a bad design IMO and, aside
> > from
> > >>> the firmware freedom issues, I think there are actual advantages to the
> > >>> model. Immediately, it'll unlock a few features like parallel submission
> > >>> (more on that in a bit) and long-running compute because they're
> > >>> implemented in GuC and the work to implement them properly in the
> > >>> execlist scheduler is highly non-trivial. Longer term, it may (no
> > >>> guarantees) unlock some performance by getting the kernel out of the
> > way.
> > >>
> > >> Oh, I definitely agree with firmware-based submission model not being a
> > >> bad design. I was even cheering for it in 2015. Experience with it made
> > >> me regret that deeply since :s
> > >>
> > >> But with the DRM scheduler being responsible for most things, I fail to
> > >> see what we could offload in the GuC except context switching (like
> > >> every other manufacturer). The problem is, the GuC does way more than
> > >> just switching registers in bulk, and if the number of revisions of the
> > >> GuC is anything to go by, it is way too complex for me to feel
> > >> comfortable with it.
> > >
> > > We need to flesh out that part of the 

Re: [PATCH 1/2] drm: Fix dirtyfb stalls

2021-05-11 Thread Daniel Vetter
On Mon, May 10, 2021 at 12:06:05PM -0700, Rob Clark wrote:
> On Mon, May 10, 2021 at 10:44 AM Daniel Vetter  wrote:
> >
> > On Mon, May 10, 2021 at 6:51 PM Rob Clark  wrote:
> > >
> > > On Mon, May 10, 2021 at 9:14 AM Daniel Vetter  wrote:
> > > >
> > > > On Sat, May 08, 2021 at 12:56:38PM -0700, Rob Clark wrote:
> > > > > From: Rob Clark 
> > > > >
> > > > > drm_atomic_helper_dirtyfb() will end up stalling for vblank on "video
> > > > > mode" type displays, which is pointless and unnecessary.  Add an
> > > > > optional helper vfunc to determine if a plane is attached to a CRTC
> > > > > that actually needs dirtyfb, and skip over them.
> > > > >
> > > > > Signed-off-by: Rob Clark 
> > > >
> > > > So this is a bit annoying because the idea of all these "remap legacy 
> > > > uapi
> > > > to atomic constructs" helpers is that they shouldn't need/use anything
> > > > beyond what userspace also has available. So adding hacks for them feels
> > > > really bad.
> > >
> > > I suppose the root problem is that userspace doesn't know if dirtyfb
> > > (or similar) is actually required or is a no-op.
> > >
> > > But it is perhaps less of a problem because this essentially boils
> > > down to "x11 vs wayland", and it seems like wayland compositors for
> > > non-vsync'd rendering just pageflips and throws away extra frames from
> > > the app?
> >
> > Yeah it's about not adequately batching up rendering and syncing with
> > hw. bare metal x11 is just especially stupid about it :-)
> >
> > > > Also I feel like it's not entirely the right thing to do here either.
> > > > We've had this problem already on the fbcon emulation side (which also
> > > > shouldn't be able to peek behind the atomic kms uapi curtain), and the 
> > > > fix
> > > > there was to have a worker which batches up all the updates and avoids 
> > > > any
> > > > stalls in bad places.
> > >
> > > I'm not too worried about fbcon not being able to render faster than
> > > vblank.  OTOH it is a pretty big problem for x11
> >
> > That's why we'd let the worker get ahead at most one dirtyfb. We do
> > the same with fbcon, which trivially can get ahead of vblank otherwise
> > (if sometimes flushes each character, so you have to pile them up into
> > a single update if that's still pending).
> >
> > > > Since this is for frontbuffer rendering userspace only we can probably 
> > > > get
> > > > away with assuming there's only a single fb, so the implementation 
> > > > becomes
> > > > pretty simple:
> > > >
> > > > - 1 worker, and we keep track of a single pending fb
> > > > - if there's already a dirty fb pending on a different fb, we stall for
> > > >   the worker to start processing that one already (i.e. the fb we track 
> > > > is
> > > >   reset to NULL)
> > > > - if it's pending on the same fb we just toss away all the updates and 
> > > > go
> > > >   with a full update, since merging the clip rects is too much work :-) 
> > > > I
> > > >   think there's helpers so you could be slightly more clever and just 
> > > > have
> > > >   an overall bounding box
> > >
> > > This doesn't really fix the problem, you still end up delaying sending
> > > the next back-buffer to mesa
> >
> > With this the dirtyfb would never block. Also glorious frontbuffer
> > tracking corruption is possible, but that's not the kernel's problem.
> > So how would anything get held up in userspace.
> 
> the part about stalling if a dirtyfb is pending was what I was worried
> about.. but I suppose you meant the worker stalling, rather than
> userspace stalling (where I had interpreted it the other way around).
> As soon as userspace needs to stall, you're losing again.

Nah, I did mean userspace stalling, so we can't pile up unlimited amounts
of dirtyfb requests in the kernel.

But also I never expect userspace that uses dirtyfb to actually hit this
stall point (otherwise we'd need to look at this again). It would really
be only there as defense against abuse.
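
Rough sketch of what I mean, purely for illustration (all names here are
invented, error handling omitted):

#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <drm/drm_framebuffer.h>

/*
 * One worker, at most one pending fb.  dirtyfb callers never wait for
 * vblank; at worst they wait for the already-queued update to go out
 * when they hand in a *different* fb, which is only the abuse case.
 */
struct dirtyfb_flush {
        struct work_struct work;
        struct mutex lock;
        struct drm_framebuffer *pending_fb;  /* cleared when the worker runs */
};

static void dirtyfb_queue(struct dirtyfb_flush *f, struct drm_framebuffer *fb)
{
        mutex_lock(&f->lock);
        while (f->pending_fb && f->pending_fb != fb) {
                mutex_unlock(&f->lock);
                flush_work(&f->work);  /* wait for the queued update to finish */
                mutex_lock(&f->lock);
        }
        /* same fb: skip clip-rect merging, the worker just does a full update */
        f->pending_fb = fb;
        mutex_unlock(&f->lock);

        schedule_work(&f->work);  /* no-op if already queued */
}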

> > > But we could re-work drm_framebuffer_funcs::dirty to operate on a
> > > per-crtc basis and hoist the loop and check if dirtyfb is needed out
> > > of drm_atomic_helper_dirtyfb()
> >
> > That's still using information that userspace doesn't have, which is a
> > bit irky. We might as well go with your thing here then.
> 
> arguably, this is something we should expose to userspace.. for DSI
> command-mode panels, you probably want to make a different decision
> with regard to how many buffers in your flip-chain..
> 
> Possibly we should add/remove the fb_damage_clips property depending
> on the display type (ie. video/pull vs cmd/push mode)?

I'm not sure whether atomic actually needs this exposed:
- clients will do full flips for every frame anyway, I've not heard of
  anyone seriously doing frontbuffer rendering.
- transporting the cliprects around and then tossing them if the driver
  doesn't need them in their flip is probably not a measurable win

But yeah if I'm wrong and we have a need here and it's useful, then
exposing 

Re: [PATCH v6 04/15] swiotlb: Add restricted DMA pool initialization

2021-05-11 Thread Claire Chang
On Mon, May 10, 2021 at 11:03 PM Christoph Hellwig  wrote:
>
> > +#ifdef CONFIG_DMA_RESTRICTED_POOL
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#endif
>
> I don't think any of this belongs into swiotlb.c.  Marking
> swiotlb_init_io_tlb_mem non-static and having all this code in a separate
> file is probably a better idea.

Will do in the next version.

>
> > +#ifdef CONFIG_DMA_RESTRICTED_POOL
> > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> > + struct device *dev)
> > +{
> > + struct io_tlb_mem *mem = rmem->priv;
> > + unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
> > +
> > + if (dev->dma_io_tlb_mem)
> > + return 0;
> > +
> > + /* Since multiple devices can share the same pool, the private data,
> > +  * io_tlb_mem struct, will be initialized by the first device attached
> > +  * to it.
> > +  */
>
> This is not the normal kernel comment style.

Will fix this in the next version.

>
> > +#ifdef CONFIG_ARM
> > + if (!PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
> > + kfree(mem);
> > + return -EINVAL;
> > + }
> > +#endif /* CONFIG_ARM */
>
> And this is weird.  Why would ARM have such a restriction?  And if we have
> such rstrictions it absolutely belongs into an arch helper.

Now I think the CONFIG_ARM check can just be removed?
The goal here is to make sure we're using the linear map and can safely
use phys_to_dma/dma_to_phys.

>
> > + swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
> > +
> > + rmem->priv = mem;
> > +
> > +#ifdef CONFIG_DEBUG_FS
> > + if (!debugfs_dir)
> > + debugfs_dir = debugfs_create_dir("swiotlb", NULL);
> > +
> > + swiotlb_create_debugfs(mem, rmem->name, debugfs_dir);
>
> Doesn't the debugfs_create_dir belong into swiotlb_create_debugfs?  Also
> please use IS_ENABLEd or a stub to avoid ifdefs like this.

Will move it into swiotlb_create_debugfs and use IS_ENABLED in the next version.


Re: [PATCH v6 05/15] swiotlb: Add a new get_io_tlb_mem getter

2021-05-11 Thread Claire Chang
On Mon, May 10, 2021 at 11:03 PM Christoph Hellwig  wrote:
>
> > +static inline struct io_tlb_mem *get_io_tlb_mem(struct device *dev)
> > +{
> > +#ifdef CONFIG_DMA_RESTRICTED_POOL
> > + if (dev && dev->dma_io_tlb_mem)
> > + return dev->dma_io_tlb_mem;
> > +#endif /* CONFIG_DMA_RESTRICTED_POOL */
> > +
> > + return io_tlb_default_mem;
>
> Given that we're also looking into a not addressing restricted pool
> I'd rather always assign the active pool to dev->dma_io_tlb_mem and
> do away with this helper.

Where do you think is the proper place to do the assignment? The first
time swiotlb_map is called, or in of_dma_configure_id?


Re: [PATCH v6 08/15] swiotlb: Bounce data from/to restricted DMA pool if available

2021-05-11 Thread Claire Chang
On Mon, May 10, 2021 at 11:05 PM Christoph Hellwig  wrote:
>
> > +static inline bool is_dev_swiotlb_force(struct device *dev)
> > +{
> > +#ifdef CONFIG_DMA_RESTRICTED_POOL
> > + if (dev->dma_io_tlb_mem)
> > + return true;
> > +#endif /* CONFIG_DMA_RESTRICTED_POOL */
> > + return false;
> > +}
> > +
>
> >   /* If SWIOTLB is active, use its maximum mapping size */
> >   if (is_swiotlb_active(dev) &&
> > - (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
> > + (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE ||
> > +  is_dev_swiotlb_force(dev)))
>
> This is a mess.  I think the right way is to have an always_bounce flag
> in the io_tlb_mem structure instead.  Then the global swiotlb_force can
> go away and be replace with this and the fact that having no
> io_tlb_mem structure at all means forced no buffering (after a little
> refactoring).

Will do in the next version.


Re: [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 05:37:54PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 12:14:03PM -0700, Matthew Brost wrote:
> > Disable engine barriers for unpinning with GuC. This feature isn't
> > needed with the GuC as it disables context scheduling before unpinning
> > which guarantees the HW will not reference the context. Hence it is
> > not necessary to defer unpinning until a kernel context request
> > completes on each engine in the context engine mask.
> > 
> > Cc: John Harrison 
> > Signed-off-by: Matthew Brost 
> > Signed-off-by: Daniele Ceraolo Spurio 
> 
> Instead of these ifs in the code, can we push this barrier business down
> into backends?
> 

Not a bad idea. This is an example of what I think of as implicit behavior of the
backend creeping into the higher levels.

> Not in this series, but as one of the things to sort out as part of the
> conversion to drm/scheduler.

Agree. After basic GuC submission gets merged, maybe we go through the code and
remove all the implicit backend assumptions.
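
e.g. something in this direction (sketch -- the needs_unpin_barriers
vfunc doesn't exist, it's only here to show the idea):

/*
 * Let the submission backend answer "does unpin need engine idle
 * barriers?" through intel_context_ops, instead of the common code
 * checking intel_engine_uses_guc() directly.
 */
static bool context_needs_unpin_barriers(const struct intel_context *ce)
{
        if (intel_context_is_barrier(ce))
                return false;

        /* hypothetical hook: execlists would return true, GuC false */
        return ce->ops->needs_unpin_barriers(ce);
}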

Matt

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_context.c|  2 +-
> >  drivers/gpu/drm/i915/gt/intel_context.h|  1 +
> >  drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++
> >  drivers/gpu/drm/i915/i915_active.c |  3 +++
> >  4 files changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 1499b8aace2a..7f97753ab164 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct 
> > intel_context *ce)
> >  
> > __i915_active_acquire(&ce->active);
> >  
> > -   if (intel_context_is_barrier(ce))
> > +   if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
> > return 0;
> >  
> > /* Preallocate tracking nodes */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > b/drivers/gpu/drm/i915/gt/intel_context.h
> > index 92ecbab8c1cd..9b211ca5ecc7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -16,6 +16,7 @@
> >  #include "intel_engine_types.h"
> >  #include "intel_ring_types.h"
> >  #include "intel_timeline_types.h"
> > +#include "uc/intel_guc_submission.h"
> >  
> >  #define CE_TRACE(ce, fmt, ...) do {
> > \
> > const struct intel_context *ce__ = (ce);\
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c 
> > b/drivers/gpu/drm/i915/gt/selftest_context.c
> > index 26685b927169..fa7b99a671dd 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_context.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_context.c
> > @@ -209,7 +209,13 @@ static int __live_active_context(struct 
> > intel_engine_cs *engine)
> >  * This test makes sure that the context is kept alive until a
> >  * subsequent idle-barrier (emitted when the engine wakeref hits 0
> >  * with no more outstanding requests).
> > +*
> > +* In GuC submission mode we don't use idle barriers and we instead
> > +* get a message from the GuC to signal that it is safe to unpin the
> > +* context from memory.
> >  */
> > +   if (intel_engine_uses_guc(engine))
> > +   return 0;
> >  
> > if (intel_engine_pm_is_awake(engine)) {
> > pr_err("%s is awake before starting %s!\n",
> > @@ -357,7 +363,11 @@ static int __live_remote_context(struct 
> > intel_engine_cs *engine)
> >  * on the context image remotely (intel_context_prepare_remote_request),
> >  * which inserts foreign fences into intel_context.active, does not
> >  * clobber the idle-barrier.
> > +*
> > +* In GuC submission mode we don't use idle barriers.
> >  */
> > +   if (intel_engine_uses_guc(engine))
> > +   return 0;
> >  
> > if (intel_engine_pm_is_awake(engine)) {
> > pr_err("%s is awake before starting %s!\n",
> > diff --git a/drivers/gpu/drm/i915/i915_active.c 
> > b/drivers/gpu/drm/i915/i915_active.c
> > index b1aa1c482c32..9a264898bb91 100644
> > --- a/drivers/gpu/drm/i915/i915_active.c
> > +++ b/drivers/gpu/drm/i915/i915_active.c
> > @@ -968,6 +968,9 @@ void i915_active_acquire_barrier(struct i915_active 
> > *ref)
> >  
> > GEM_BUG_ON(i915_active_is_idle(ref));
> >  
> > +   if (llist_empty(&ref->preallocated_barriers))
> > +   return;
> > +
> > /*
> >  * Transfer the list of preallocated barriers into the
> >  * i915_active rbtree, but only as proto-nodes. They will be
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [PATCH] Documentation: gpu: Mention the requirements for new properties

2021-05-11 Thread Alex Deucher
On Tue, May 11, 2021 at 11:55 AM Maxime Ripard  wrote:
>
> New KMS properties come with a bunch of requirements to avoid each
> driver from running their own, inconsistent, set of properties,
> eventually leading to issues like property conflicts, inconsistencies
> between drivers and semantics, etc.
>
> Let's document what we expect.
>
> Signed-off-by: Maxime Ripard 
> ---
>  Documentation/gpu/drm-kms.rst | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
> index 87e5023e3f55..30f4c376f419 100644
> --- a/Documentation/gpu/drm-kms.rst
> +++ b/Documentation/gpu/drm-kms.rst
> @@ -463,6 +463,24 @@ KMS Properties
>  This section of the documentation is primarily aimed at user-space 
> developers.
>  For the driver APIs, see the other sections.
>
> +Requirements
> +
> +
> +KMS drivers might need to add extra properties to support new features.
> +Each new property introduced in a driver need to meet a few
> +requirements, in addition to the one mentioned above.:
> +
> +- It must be standardized, with some documentation to describe the

"to describe how the"

With that fixed, it looks good to me.

Alex

> +  property can be used.
> +
> +- It must provide a generic helper in the core code to register that
> +  property on the object it attaches to.
> +
> +- Its content must be decoded by the core and provided in the object
> +  associated state structure.
> +
> +- An IGT test must be submitted.
> +
>  Property Types and Blob Property Support
>  
>
> --
> 2.31.1
>


Re: [Intel-gfx] [RFC PATCH 74/97] drm/i915/guc: Capture error state on context reset

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 12:14:28PM -0700, Matthew Brost wrote:
> We receive notification of an engine reset from GuC at its
> completion. Meaning GuC has potentially cleared any HW state
> we may have been interested in capturing. GuC resumes scheduling
> on the engine post-reset, as the resets are meant to be transparent,
> further muddling our error state.
> 
> There is ongoing work to define an API for a GuC debug state dump. The
> suggestion for now is to manually disable FW initiated resets in cases
> where debug state is needed.
> 
> Signed-off-by: Matthew Brost 

This looks a bit backwards to me:

- I figured we should capture error state when we get the G2H, in which
  case I hope we do know which the offending context was that got shot.

- For now we're missing the hw state, but we should still be able to
  capture the buffers userspace wants us to capture. So that could be
  wired up already?

But yeah register state capturing needs support from GuC fw.

I think this is a big enough miss in GuC features that we should list it
on the rfc as a thing to fix.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_context.c   | 20 +++
>  drivers/gpu/drm/i915/gt/intel_context.h   |  3 ++
>  drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++-
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 --
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +--
>  drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++---
>  7 files changed, 91 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> b/drivers/gpu/drm/i915/gt/intel_context.c
> index 2f01437056a8..3fe7794b2bfd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct 
> intel_context *ce)
>   return rq;
>  }
>  
> +struct i915_request *intel_context_find_active_request(struct intel_context 
> *ce)
> +{
> + struct i915_request *rq, *active = NULL;
> + unsigned long flags;
> +
> + GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
> +
> + spin_lock_irqsave(&ce->guc_active.lock, flags);
> + list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> + sched.link) {
> + if (i915_request_completed(rq))
> + break;
> +
> + active = rq;
> + }
> + spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> +
> + return active;
> +}
> +
>  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>  #include "selftest_context.c"
>  #endif
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> b/drivers/gpu/drm/i915/gt/intel_context.h
> index 9b211ca5ecc7..d2b499ed8a05 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -195,6 +195,9 @@ int intel_context_prepare_remote_request(struct 
> intel_context *ce,
>  
>  struct i915_request *intel_context_create_request(struct intel_context *ce);
>  
> +struct i915_request *
> +intel_context_find_active_request(struct intel_context *ce);
> +
>  static inline struct intel_ring *__intel_context_ring_size(u64 sz)
>  {
>   return u64_to_ptr(struct intel_ring, sz);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
> b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 3321d0917a99..bb94963a9fa2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -242,7 +242,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs 
> *engine,
>  ktime_t *now);
>  
>  struct i915_request *
> -intel_engine_find_active_request(struct intel_engine_cs *engine);
> +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
>  
>  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
>  
> @@ -316,4 +316,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, 
> unsigned int sibling)
>   return engine->cops->get_sibling(engine, sibling);
>  }
>  
> +static inline void
> +intel_engine_set_hung_context(struct intel_engine_cs *engine,
> +   struct intel_context *ce)
> +{
> + engine->hung_ce = ce;
> +}
> +
> +static inline void
> +intel_engine_clear_hung_context(struct intel_engine_cs *engine)
> +{
> + intel_engine_set_hung_context(engine, NULL);
> +}
> +
> +static inline struct intel_context *
> +intel_engine_get_hung_context(struct intel_engine_cs *engine)
> +{
> + return engine->hung_ce;
> +}
> +
>  #endif /* _INTEL_RINGBUFFER_H_ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 10300db1c9a6..ad3987289f09 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1727,7 +1727,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   drm_printf(m, "\tRequests:\n");

Re: [Intel-gfx] [RFC PATCH 68/97] drm/i915/guc: Handle context reset notification

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 12:14:22PM -0700, Matthew Brost wrote:
> GuC will issue a reset on detecting an engine hang and will notify
> the driver via a G2H message. The driver will service the notification
> by resetting the guilty context to a simple state or banning it
> completely.
> 
> Cc: Matthew Brost 
> Cc: John Harrison 
> Signed-off-by: Matthew Brost 

Entirely aside, but I wonder whether we shouldn't just make
non-recoverable contexts the only thing we support. But probably a too big
can of worms.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  2 ++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++
>  drivers/gpu/drm/i915/i915_trace.h | 10 ++
>  4 files changed, 53 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 277b4496a20e..a2abe1c422e3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -263,6 +263,8 @@ int intel_guc_deregister_done_process_msg(struct 
> intel_guc *guc,
> const u32 *msg, u32 len);
>  int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>const u32 *msg, u32 len);
> +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> + const u32 *msg, u32 len);
>  
>  void intel_guc_submission_reset_prepare(struct intel_guc *guc);
>  void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index b3194d753b13..9c84b2ba63a8 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -941,6 +941,12 @@ static int ct_process_request(struct intel_guc_ct *ct, 
> struct ct_incoming_msg *r
>   CT_ERROR(ct, "schedule context failed %x %*ph\n",
> action, 4 * len, payload);
>   break;
> + case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
> + ret = intel_guc_context_reset_process_msg(guc, payload, len);
> + if (unlikely(ret))
> + CT_ERROR(ct, "context reset notification failed %x 
> %*ph\n",
> +   action, 4 * len, payload);
> + break;
>   default:
>   ret = -EOPNOTSUPP;
>   break;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 2c3791fc24b7..940017495731 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2192,6 +2192,41 @@ int intel_guc_sched_done_process_msg(struct intel_guc 
> *guc,
>   return 0;
>  }
>  
> +static void guc_context_replay(struct intel_context *ce)
> +{
> + struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> +
> + __guc_reset_context(ce, true);
> + i915_sched_engine_hi_kick(sched_engine);
> +}
> +
> +static void guc_handle_context_reset(struct intel_guc *guc,
> +  struct intel_context *ce)
> +{
> + trace_intel_context_reset(ce);
> + guc_context_replay(ce);
> +}
> +
> +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> + const u32 *msg, u32 len)
> +{
> + struct intel_context *ce;
> + int desc_idx = msg[0];
> +
> + if (unlikely(len != 1)) {
> + drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> + return -EPROTO;
> + }
> +
> + ce = g2h_context_lookup(guc, desc_idx);
> + if (unlikely(!ce))
> + return -EPROTO;
> +
> + guc_handle_context_reset(guc, ce);
> +
> + return 0;
> +}
> +
>  void intel_guc_log_submission_info(struct intel_guc *guc,
>  struct drm_printer *p)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_trace.h 
> b/drivers/gpu/drm/i915/i915_trace.h
> index 97c2e83984ed..c095c4d39456 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
> __entry->guc_sched_state_no_lock)
>  );
>  
> +DEFINE_EVENT(intel_context, intel_context_reset,
> +  TP_PROTO(struct intel_context *ce),
> +  TP_ARGS(ce)
> +);
> +
>  DEFINE_EVENT(intel_context, intel_context_register,
>TP_PROTO(struct intel_context *ce),
>TP_ARGS(ce)
> @@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
>  {
>  }
>  
> +static inline void
> +trace_intel_context_reset(struct intel_context *ce)
> +{
> +}
> +
>  static inline void
>  trace_intel_context_register(struct intel_context *ce)
>  {
> -- 
> 2.28.0
> 
> 

Re: [PATCH v6 2/3] drm/mediatek: init panel orientation property

2021-05-11 Thread Chun-Kuang Hu
Hi, Hsin-Yi:

Hsin-Yi Wang  wrote on Thu, Apr 29, 2021 at 12:28 PM:
>
> Init panel orientation property after connector is initialized. Let the
> panel driver decide the orientation value later.

Acked-by: Chun-Kuang Hu 

>
> Signed-off-by: Hsin-Yi Wang 
> ---
>  drivers/gpu/drm/mediatek/mtk_dsi.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c 
> b/drivers/gpu/drm/mediatek/mtk_dsi.c
> index ae403c67cbd9..9da1fd649131 100644
> --- a/drivers/gpu/drm/mediatek/mtk_dsi.c
> +++ b/drivers/gpu/drm/mediatek/mtk_dsi.c
> @@ -964,6 +964,13 @@ static int mtk_dsi_encoder_init(struct drm_device *drm, 
> struct mtk_dsi *dsi)
> ret = PTR_ERR(dsi->connector);
> goto err_cleanup_encoder;
> }
> +
> +   ret = drm_connector_init_panel_orientation_property(dsi->connector);
> +   if (ret) {
> +   DRM_ERROR("Unable to init panel orientation\n");
> +   goto err_cleanup_encoder;
> +   }
> +
> drm_connector_attach_encoder(dsi->connector, &dsi->encoder);
>
> return 0;
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>


[PATCH] drm: fix semicolon.cocci warnings

2021-05-11 Thread kernel test robot
From: kernel test robot 

drivers/gpu/drm/kmb/kmb_dsi.c:284:3-4: Unneeded semicolon
drivers/gpu/drm/kmb/kmb_dsi.c:304:3-4: Unneeded semicolon
drivers/gpu/drm/kmb/kmb_dsi.c:321:3-4: Unneeded semicolon
drivers/gpu/drm/kmb/kmb_dsi.c:340:3-4: Unneeded semicolon
drivers/gpu/drm/kmb/kmb_dsi.c:364:2-3: Unneeded semicolon


 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: ade896460e4a ("drm: DRM_KMB_DISPLAY should depend on ARCH_KEEMBAY")
CC: Geert Uytterhoeven 
Reported-by: kernel test robot 
Signed-off-by: kernel test robot 
---

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   1140ab592e2ebf8153d2b322604031a8868ce7a5
commit: ade896460e4a62f5e4a892a98d254937f6f5b64c drm: DRM_KMB_DISPLAY should 
depend on ARCH_KEEMBAY
:: branch date: 18 hours ago
:: commit date: 6 months ago

 kmb_dsi.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/gpu/drm/kmb/kmb_dsi.c
+++ b/drivers/gpu/drm/kmb/kmb_dsi.c
@@ -281,7 +281,7 @@ static u32 mipi_get_datatype_params(u32
default:
DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode);
return -EINVAL;
-   };
+   }
break;
case DSI_LP_DT_PPS_YCBCR422_16B:
data_type_param.size_constraint_pixels = 2;
@@ -301,7 +301,7 @@ static u32 mipi_get_datatype_params(u32
default:
DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode);
return -EINVAL;
-   };
+   }
break;
case DSI_LP_DT_LPPS_YCBCR422_20B:
case DSI_LP_DT_PPS_YCBCR422_24B:
@@ -318,7 +318,7 @@ static u32 mipi_get_datatype_params(u32
default:
DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode);
return -EINVAL;
-   };
+   }
break;
case DSI_LP_DT_PPS_RGB565_16B:
data_type_param.size_constraint_pixels = 1;
@@ -337,7 +337,7 @@ static u32 mipi_get_datatype_params(u32
default:
DRM_ERROR("DSI: Invalid data_mode %d\n", data_mode);
return -EINVAL;
-   };
+   }
break;
case DSI_LP_DT_PPS_RGB666_18B:
data_type_param.size_constraint_pixels = 4;
@@ -361,7 +361,7 @@ static u32 mipi_get_datatype_params(u32
default:
DRM_ERROR("DSI: Invalid data_type %d\n", data_type);
return -EINVAL;
-   };
+   }
 
*params = data_type_param;
return 0;


Re: [PATCH v6 06/16] drm/amdgpu: Handle IOMMU enabled case.

2021-05-11 Thread Andrey Grodzovsky




On 2021-05-11 11:56 a.m., Alex Deucher wrote:

On Mon, May 10, 2021 at 12:37 PM Andrey Grodzovsky
 wrote:


Handle all DMA IOMMU group related dependencies before the
group is removed.

v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate
v6: Drop the BO unmap list

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 3 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   | 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 9 +
  drivers/gpu/drm/amd/amdgpu/cik_ih.c| 1 -
  drivers/gpu/drm/amd/amdgpu/cz_ih.c | 1 -
  drivers/gpu/drm/amd/amdgpu/iceland_ih.c| 1 -
  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 3 ---
  drivers/gpu/drm/amd/amdgpu/si_ih.c | 1 -
  drivers/gpu/drm/amd/amdgpu/tonga_ih.c  | 1 -
  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 3 ---
  11 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 18598eda18f6..a0bff4713672 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3256,7 +3256,6 @@ static const struct attribute *amdgpu_dev_attributes[] = {
 NULL
  };

-
  /**
   * amdgpu_device_init - initialize the driver
   *
@@ -3698,12 +3697,13 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 amdgpu_ucode_sysfs_fini(adev);
 sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);

-
 amdgpu_fbdev_fini(adev);

 amdgpu_irq_fini_hw(adev);

 amdgpu_device_ip_fini_early(adev);
+
+   amdgpu_gart_dummy_page_fini(adev);
  }

  void amdgpu_device_fini_sw(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index c5a9a4fb10d2..354e68081b53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device 
*adev)
   *
   * Frees the dummy page used by the driver (all asics).
   */
-static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
  {
 if (!adev->dummy_page_addr)
 return;
@@ -375,5 +375,4 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
   */
  void amdgpu_gart_fini(struct amdgpu_device *adev)
  {
-   amdgpu_gart_dummy_page_fini(adev);
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
index a25fe97b0196..78dc7a23da56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -58,6 +58,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
  void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
  int amdgpu_gart_init(struct amdgpu_device *adev);
  void amdgpu_gart_fini(struct amdgpu_device *adev);
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev);
  int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
int pages);
  int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 233b64dab94b..a14973a7a9c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -361,6 +361,15 @@ void amdgpu_irq_fini_hw(struct amdgpu_device *adev)
 if (!amdgpu_device_has_dc_support(adev))
  flush_work(&adev->hotplug_work);
 }
+
+   if (adev->irq.ih_soft.ring)
+   amdgpu_ih_ring_fini(adev, &adev->irq.ih_soft);


Why is the ih_soft handled here and in the various ih sw_fini functions?


Post last rebase new ASICs I think were added which I missed.
Taking care of this together with Christian's previous comment right now.

Andrey




+   if (adev->irq.ih.ring)
+   amdgpu_ih_ring_fini(adev, &adev->irq.ih);
+   if (adev->irq.ih1.ring)
+   amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
+   if (adev->irq.ih2.ring)
+   amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
  }

  /**
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c 
b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
index 183d44a6583c..df385ffc9768 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
@@ -310,7 +310,6 @@ static int cik_ih_sw_fini(void *handle)
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

 amdgpu_irq_fini_sw(adev);
-   amdgpu_ih_ring_fini(adev, &adev->irq.ih);
 amdgpu_irq_remove_domain(adev);

 return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c 
b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
index d32743949003..b8c47e0cf37a 100644
--- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
@@ -302,7 +302,6 @@ static int 

Re: [PATCH v6 06/16] drm/amdgpu: Handle IOMMU enabled case.

2021-05-11 Thread Alex Deucher
On Mon, May 10, 2021 at 12:37 PM Andrey Grodzovsky
 wrote:
>
> Handle all DMA IOMMU group related dependencies before the
> group is removed.
>
> v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate
> v6: Drop the BO unmap list
>
> Signed-off-by: Andrey Grodzovsky 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 3 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 9 +
>  drivers/gpu/drm/amd/amdgpu/cik_ih.c| 1 -
>  drivers/gpu/drm/amd/amdgpu/cz_ih.c | 1 -
>  drivers/gpu/drm/amd/amdgpu/iceland_ih.c| 1 -
>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 3 ---
>  drivers/gpu/drm/amd/amdgpu/si_ih.c | 1 -
>  drivers/gpu/drm/amd/amdgpu/tonga_ih.c  | 1 -
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 3 ---
>  11 files changed, 13 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 18598eda18f6..a0bff4713672 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3256,7 +3256,6 @@ static const struct attribute *amdgpu_dev_attributes[] 
> = {
> NULL
>  };
>
> -
>  /**
>   * amdgpu_device_init - initialize the driver
>   *
> @@ -3698,12 +3697,13 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
> amdgpu_ucode_sysfs_fini(adev);
> sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
>
> -
> amdgpu_fbdev_fini(adev);
>
> amdgpu_irq_fini_hw(adev);
>
> amdgpu_device_ip_fini_early(adev);
> +
> +   amdgpu_gart_dummy_page_fini(adev);
>  }
>
>  void amdgpu_device_fini_sw(struct amdgpu_device *adev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index c5a9a4fb10d2..354e68081b53 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device 
> *adev)
>   *
>   * Frees the dummy page used by the driver (all asics).
>   */
> -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
> +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
>  {
> if (!adev->dummy_page_addr)
> return;
> @@ -375,5 +375,4 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
>   */
>  void amdgpu_gart_fini(struct amdgpu_device *adev)
>  {
> -   amdgpu_gart_dummy_page_fini(adev);
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> index a25fe97b0196..78dc7a23da56 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
> @@ -58,6 +58,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
>  void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
>  int amdgpu_gart_init(struct amdgpu_device *adev);
>  void amdgpu_gart_fini(struct amdgpu_device *adev);
> +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev);
>  int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
>int pages);
>  int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 233b64dab94b..a14973a7a9c9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -361,6 +361,15 @@ void amdgpu_irq_fini_hw(struct amdgpu_device *adev)
> if (!amdgpu_device_has_dc_support(adev))
> flush_work(&adev->hotplug_work);
> }
> +
> +   if (adev->irq.ih_soft.ring)
> +   amdgpu_ih_ring_fini(adev, &adev->irq.ih_soft);

Why is the ih_soft handled here and in the various ih sw_fini functions?

> +   if (adev->irq.ih.ring)
> +   amdgpu_ih_ring_fini(adev, &adev->irq.ih);
> +   if (adev->irq.ih1.ring)
> +   amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
> +   if (adev->irq.ih2.ring)
> +   amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
>  }
>
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> index 183d44a6583c..df385ffc9768 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> @@ -310,7 +310,6 @@ static int cik_ih_sw_fini(void *handle)
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> amdgpu_irq_fini_sw(adev);
> -   amdgpu_ih_ring_fini(adev, &adev->irq.ih);
> amdgpu_irq_remove_domain(adev);
>
> return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
> index d32743949003..b8c47e0cf37a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
> @@ -302,7 +302,6 @@ static int cz_ih_sw_fini(void 

[PATCH] Documentation: gpu: Mention the requirements for new properties

2021-05-11 Thread Maxime Ripard
New KMS properties come with a bunch of requirements to avoid each
driver from running their own, inconsistent, set of properties,
eventually leading to issues like property conflicts, inconsistencies
between drivers and semantics, etc.

Let's document what we expect.

Signed-off-by: Maxime Ripard 
---
 Documentation/gpu/drm-kms.rst | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
index 87e5023e3f55..30f4c376f419 100644
--- a/Documentation/gpu/drm-kms.rst
+++ b/Documentation/gpu/drm-kms.rst
@@ -463,6 +463,24 @@ KMS Properties
 This section of the documentation is primarily aimed at user-space developers.
 For the driver APIs, see the other sections.
 
+Requirements
+
+
+KMS drivers might need to add extra properties to support new features.
+Each new property introduced in a driver need to meet a few
+requirements, in addition to the one mentioned above.:
+
+- It must be standardized, with some documentation to describe the
+  property can be used.
+
+- It must provide a generic helper in the core code to register that
+  property on the object it attaches to.
+
+- Its content must be decoded by the core and provided in the object
+  associated state structure.
+
+- An IGT test must be submitted.
+
 Property Types and Blob Property Support
 
 
-- 
2.31.1
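
As an illustration of the helper + decoded-state requirements, a new property
could look roughly like the sketch below. The property name, enum values and
the connector/state members are invented for the example; only
drm_property_create_enum() and drm_object_attach_property() are existing core
APIs, and the decode step would live in the core get/set_property code rather
than in each driver:

	static const struct drm_prop_enum_list drm_example_enum_list[] = {
		{ 0, "Off" },
		{ 1, "On" },
	};

	int drm_connector_attach_example_property(struct drm_connector *connector)
	{
		struct drm_property *prop;

		prop = drm_property_create_enum(connector->dev, 0, "example",
						drm_example_enum_list,
						ARRAY_SIZE(drm_example_enum_list));
		if (!prop)
			return -ENOMEM;

		drm_object_attach_property(&connector->base, prop, 0);
		connector->example_property = prop;

		return 0;
	}

	/* with the decoded value kept by the core, e.g. in drm_connector_state: */
	/*	unsigned int example; */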



Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-05-11 Thread Alex Deucher
On Fri, May 7, 2021 at 7:45 PM Tejun Heo  wrote:
>
> Hello,
>
> On Fri, May 07, 2021 at 06:30:56PM -0400, Alex Deucher wrote:
> > Maybe we are speaking past each other.  I'm not following.  We got
> > here because a device specific cgroup didn't make sense.  With my
> > Linux user hat on, that makes sense.  I don't want to write code to a
> > bunch of device specific interfaces if I can avoid it.  But as for
> > temporal vs spatial partitioning of the GPU, the argument seems to be
> > a sort of hand-wavy one that both spatial and temporal partitioning
> > make sense on CPUs, but only temporal partitioning makes sense on
> > GPUs.  I'm trying to understand that assertion.  There are some GPUs
>
> Spatial partitioning as implemented in cpuset isn't a desirable model. It's
> there partly because it has historically been there. It doesn't really
> require dynamic hierarchical distribution of anything and is more of a way
> to batch-update per-task configuration, which is how it's actually
> implemented. It's broken too in that it interferes with per-task affinity
> settings. So, not exactly a good example to follow. In addition, this sort
> of partitioning requires more hardware knowledge and GPUs are worse than
> CPUs in that hardwares differ more.
>
> Features like this are trivial to implement from userland side by making
> per-process settings inheritable and restricting who can update the
> settings.
>
> > that can more easily be temporally partitioned and some that can be
> > more easily spatially partitioned.  It doesn't seem any different than
> > CPUs.
>
> Right, it doesn't really matter how the resource is distributed. What
> matters is how granular and generic the distribution can be. If gpus can
> implement work-conserving proportional distribution, that's something which
> is widely useful and inherently requires dynamic scheduling from kernel
> side. If it's about setting per-vendor affinities, this is way too much
> cgroup interface for a feature which can be easily implemented outside
> cgroup. Just do per-process (or whatever handles gpus use) and confine their
> configurations from cgroup side however way.
>
> While the specific theme changes a bit, we're basically having the same
> discussion with the same conclusion over the past however many months.
> Hopefully, the point is clear by now.

Thanks, that helps a lot.

Alex


Re: [PATCH v6 06/16] drm/amdgpu: Handle IOMMU enabled case.

2021-05-11 Thread Andrey Grodzovsky




On 2021-05-11 2:44 a.m., Christian König wrote:

Am 10.05.21 um 18:36 schrieb Andrey Grodzovsky:

Handle all DMA IOMMU group related dependencies before the
group is removed.

v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate
v6: Drop the BO unmap list

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 3 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   | 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 9 +
  drivers/gpu/drm/amd/amdgpu/cik_ih.c    | 1 -
  drivers/gpu/drm/amd/amdgpu/cz_ih.c | 1 -
  drivers/gpu/drm/amd/amdgpu/iceland_ih.c    | 1 -
  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 3 ---
  drivers/gpu/drm/amd/amdgpu/si_ih.c | 1 -
  drivers/gpu/drm/amd/amdgpu/tonga_ih.c  | 1 -
  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 3 ---
  11 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 18598eda18f6..a0bff4713672 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3256,7 +3256,6 @@ static const struct attribute 
*amdgpu_dev_attributes[] = {

  NULL
  };
-
  /**
   * amdgpu_device_init - initialize the driver
   *
@@ -3698,12 +3697,13 @@ void amdgpu_device_fini_hw(struct 
amdgpu_device *adev)

  amdgpu_ucode_sysfs_fini(adev);
  sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
-
  amdgpu_fbdev_fini(adev);
  amdgpu_irq_fini_hw(adev);
  amdgpu_device_ip_fini_early(adev);
+
+    amdgpu_gart_dummy_page_fini(adev);


I think you should probably just call amdgpu_gart_fini() here.


  }
  void amdgpu_device_fini_sw(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c

index c5a9a4fb10d2..354e68081b53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct 
amdgpu_device *adev)

   *
   * Frees the dummy page used by the driver (all asics).
   */
-static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
  {
  if (!adev->dummy_page_addr)
  return;
@@ -375,5 +375,4 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
   */
  void amdgpu_gart_fini(struct amdgpu_device *adev)
  {
-    amdgpu_gart_dummy_page_fini(adev);
  }


Well either you remove amdgpu_gart_fini() or just call 
amdgpu_gart_fini() instead of amdgpu_gart_dummy_page_fini().


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h

index a25fe97b0196..78dc7a23da56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -58,6 +58,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device 
*adev);

  void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
  int amdgpu_gart_init(struct amdgpu_device *adev);
  void amdgpu_gart_fini(struct amdgpu_device *adev);
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev);
  int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
 int pages);
  int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c

index 233b64dab94b..a14973a7a9c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -361,6 +361,15 @@ void amdgpu_irq_fini_hw(struct amdgpu_device *adev)
  if (!amdgpu_device_has_dc_support(adev))
  flush_work(&adev->hotplug_work);
  }
+
+    if (adev->irq.ih_soft.ring)
+    amdgpu_ih_ring_fini(adev, &adev->irq.ih_soft);
+    if (adev->irq.ih.ring)
+    amdgpu_ih_ring_fini(adev, &adev->irq.ih);
+    if (adev->irq.ih1.ring)
+    amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
+    if (adev->irq.ih2.ring)
+    amdgpu_ih_ring_fini(adev, &adev->irq.ih2);


You should probably make the function NULL safe instead of checking here.

Christian.


Agree, in fact it already does this check inside amdgpu_ih_ring_fini
so I will just drop the checks.
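
Something like this, roughly (sketch only, relying on amdgpu_ih_ring_fini()
itself being a no-op for a ring that was never allocated):

	amdgpu_ih_ring_fini(adev, &adev->irq.ih_soft);
	amdgpu_ih_ring_fini(adev, &adev->irq.ih);
	amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
	amdgpu_ih_ring_fini(adev, &adev->irq.ih2);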

Andrey




  }
  /**
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c 
b/drivers/gpu/drm/amd/amdgpu/cik_ih.c

index 183d44a6583c..df385ffc9768 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
@@ -310,7 +310,6 @@ static int cik_ih_sw_fini(void *handle)
  struct amdgpu_device *adev = (struct amdgpu_device *)handle;
  amdgpu_irq_fini_sw(adev);
-    amdgpu_ih_ring_fini(adev, &adev->irq.ih);
  amdgpu_irq_remove_domain(adev);
  return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c 
b/drivers/gpu/drm/amd/amdgpu/cz_ih.c

index d32743949003..b8c47e0cf37a 100644
--- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c

Re: [RFC PATCH 49/97] drm/i915/guc: Disable engine barriers with GuC during unpin

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 12:14:03PM -0700, Matthew Brost wrote:
> Disable engine barriers for unpinning with GuC. This feature isn't
> needed with the GuC as it disables context scheduling before unpinning
> which guarantees the HW will not reference the context. Hence it is
> not necessary to defer unpinning until a kernel context request
> completes on each engine in the context engine mask.
> 
> Cc: John Harrison 
> Signed-off-by: Matthew Brost 
> Signed-off-by: Daniele Ceraolo Spurio 

Instead of these ifs in the code, can we push this barrier business down
into backends?

Not in this series, but as one of the things to sort out as part of the
conversion to drm/scheduler.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_context.c|  2 +-
>  drivers/gpu/drm/i915/gt/intel_context.h|  1 +
>  drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++
>  drivers/gpu/drm/i915/i915_active.c |  3 +++
>  4 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> b/drivers/gpu/drm/i915/gt/intel_context.c
> index 1499b8aace2a..7f97753ab164 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct 
> intel_context *ce)
>  
>   __i915_active_acquire(&ce->active);
>  
> - if (intel_context_is_barrier(ce))
> + if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
>   return 0;
>  
>   /* Preallocate tracking nodes */
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> b/drivers/gpu/drm/i915/gt/intel_context.h
> index 92ecbab8c1cd..9b211ca5ecc7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -16,6 +16,7 @@
>  #include "intel_engine_types.h"
>  #include "intel_ring_types.h"
>  #include "intel_timeline_types.h"
> +#include "uc/intel_guc_submission.h"
>  
>  #define CE_TRACE(ce, fmt, ...) do {  \
>   const struct intel_context *ce__ = (ce);\
> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c 
> b/drivers/gpu/drm/i915/gt/selftest_context.c
> index 26685b927169..fa7b99a671dd 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_context.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c
> @@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs 
> *engine)
>* This test makes sure that the context is kept alive until a
>* subsequent idle-barrier (emitted when the engine wakeref hits 0
>* with no more outstanding requests).
> +  *
> +  * In GuC submission mode we don't use idle barriers and we instead
> +  * get a message from the GuC to signal that it is safe to unpin the
> +  * context from memory.
>*/
> + if (intel_engine_uses_guc(engine))
> + return 0;
>  
>   if (intel_engine_pm_is_awake(engine)) {
>   pr_err("%s is awake before starting %s!\n",
> @@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs 
> *engine)
>* on the context image remotely (intel_context_prepare_remote_request),
>* which inserts foreign fences into intel_context.active, does not
>* clobber the idle-barrier.
> +  *
> +  * In GuC submission mode we don't use idle barriers.
>*/
> + if (intel_engine_uses_guc(engine))
> + return 0;
>  
>   if (intel_engine_pm_is_awake(engine)) {
>   pr_err("%s is awake before starting %s!\n",
> diff --git a/drivers/gpu/drm/i915/i915_active.c 
> b/drivers/gpu/drm/i915/i915_active.c
> index b1aa1c482c32..9a264898bb91 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -968,6 +968,9 @@ void i915_active_acquire_barrier(struct i915_active *ref)
>  
>   GEM_BUG_ON(i915_active_is_idle(ref));
>  
> + if (llist_empty(&ref->preallocated_barriers))
> + return;
> +
>   /*
>* Transfer the list of preallocated barriers into the
>* i915_active rbtree, but only as proto-nodes. They will be
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [RFC] Implicit vs explicit user fence sync

2021-05-11 Thread Christian König

Am 11.05.21 um 16:23 schrieb Daniel Vetter:

On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:

Am 11.05.21 um 09:31 schrieb Daniel Vetter:

[SNIP]

And that's just the one ioctl I know is big trouble, I'm sure we'll find
more funny corner cases when we roll out explicit user fencing.

I think we can just ignore sync_file. As far as it concerns me that UAPI is
pretty much dead.

Uh that's rather bold. Android is built on it. Currently atomic kms is
built on it.

To be honest I don't think we care about Android at all.

we = amd or we = upstream here?


we = amd, for everybody else that is certainly a different topic.

But for now AMD is the only one running into this problem.

Could be that Nouveau sees this as well with the next hw generation, but 
who knows?



Why is this not much of a problem if it's just within one driver?

Because inside the same driver I can easily add the waits before submitting
the MM work as necessary.

What is MM work here now?


MM=multimedia, e.g. UVD, VCE, VCN engines on AMD hardware.


Adding implicit synchronization on top of that is then rather trivial.

Well that's what I disagree with, since I already see some problems that I
don't think we can overcome (the atomic ioctl is one). And that's with us
only having a fairly theoretical understanding of the overall situation.

But how should we then ever support user fences with the atomic IOCTL?

We can't wait in user space since that will disable the support for waiting
in the hardware.

Well, figure it out :-)

This is exactly why I'm not seeing anything solved with just rolling a
function call to a bunch of places, because it's pretending all things are
solved when clearly that's not the case.

I really think what we need is to first figure out how to support
userspace fences as explicit entities across the stack, maybe with
something like this order:
1. enable them purely within a single userspace driver (like vk with
winsys disabled, or something else like that except not amd because
there's this amdkfd split for "real" compute)
1a. including atomic ioctl, e.g. for vk direct display support this can be
used without cross-process sharing, new winsys protocols and all that fun
2. figure out how to transport these userspace fences with something like
drm_syncobj
2a. figure out the compat story for drivers which dont do userspace fences
2b. figure out how to absorb the overhead if the winsys/compositor doesn't
support explicit sync
3. maybe figure out how to make this all happen magically with implicit
sync, if we really, really care

If we do 3 before we've nailed all these problems, we're just guaranteeing
we'll get the wrong solutions and so we'll then have 3 ways of doing
userspace fences
- the butchered implicit one that didn't quite work
- the explicit one
- the not-so-butchered implicit one with the lessons from the properly
done explicit one

The thing is, if you have no idea how to integrate userspace fences
explicitly into atomic ioctl, then you definitely have no idea how to do
it implicitly :-)

Well I agree on that. But the question is still how would you do explicit
with atomic?

If you supply a userspace fence (is that what we call them now) as
in-fence, then you're only allowed to get a userspace fence as out-fence.


Yeah, that part makes perfectly sense. But I don't see the problem with 
that?



That way we
- don't block anywhere we shouldn't
- don't create a dma_fence out of a userspace fence

The problem is this completely breaks your "magically make implicit
fencing with userspace fences" plan.


Why?


So I have a plan here, what was yours?


As far as I see that should still work perfectly fine and I have the 
strong feeling I'm missing something here.



Transporting fences between processes is not the fundamental problem here,
but rather the question how we represent all this in the kernel?

In other words I think what you outlined above is just approaching it from
the wrong side again. Instead of looking what the kernel needs to support
this you take a look at userspace and the requirements there.

Uh ... that was my idea here? That's why I put "build userspace fences in
userspace only" as the very first thing. Then extend to winsys and
atomic/display and all these cases where things get more tricky.

I agree that transporting the fences is easy, which is why it's not
interesting trying to solve that problem first. Which is kinda what you're
trying to do here by adding implicit userspace fences (well not even that,
just a bunch of function calls without any semantics attached to them).

So if there's more here, you need to flesh it out more or I just dont get
what you're actually trying to demonstrate.


Well I'm trying to figure out why you see it as such a problem to keep 
implicit sync around.


As far as I can tell it is completely orthogonal whether we use 
implicit/explicit and dma_fence/user_fence.


It's just a different implementation inside the kernel.

Christian.


-Daniel



RE: [RFC PATCH 00/97] Basic GuC submission support in the i915

2021-05-11 Thread Bloomfield, Jon
> -Original Message-
> From: Martin Peres 
> Sent: Tuesday, May 11, 2021 1:06 AM
> To: Daniel Vetter 
> Cc: Jason Ekstrand ; Brost, Matthew
> ; intel-gfx ;
> dri-devel ; Ursulin, Tvrtko
> ; Ekstrand, Jason ;
> Ceraolo Spurio, Daniele ; Bloomfield, Jon
> ; Vetter, Daniel ;
> Harrison, John C 
> Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915
> 
> On 10/05/2021 19:33, Daniel Vetter wrote:
> > On Mon, May 10, 2021 at 3:55 PM Martin Peres 
> wrote:
> >>
> >> On 10/05/2021 02:11, Jason Ekstrand wrote:
> >>> On May 9, 2021 12:12:36 Martin Peres  wrote:
> >>>
>  Hi,
> 
>  On 06/05/2021 22:13, Matthew Brost wrote:
> > Basic GuC submission support. This is the first bullet point in the
> > upstreaming plan covered in the following RFC [1].
> >
> > At a very high level the GuC is a piece of firmware which sits between
> > the i915 and the GPU. It offloads some of the scheduling of contexts
> > from the i915 and programs the GPU to submit contexts. The i915
> > communicates with the GuC and the GuC communicates with the
> GPU.
> 
>  May I ask what will GuC command submission do that execlist
> won't/can't
>  do? And what would be the impact on users? Even forgetting the
> troubled
>  history of GuC (instability, performance regression, poor level of user
>  support, 6+ years of trying to upstream it...), adding this much code
>  and doubling the amount of validation needed should come with a
>  rationale making it feel worth it... and I am not seeing here. Would you
>  mind providing the rationale behind this work?
> 
> >
> > GuC submission will be disabled by default on all current upstream
> > platforms behind a module parameter - enable_guc. A value of 3 will
> > enable submission and HuC loading via the GuC. GuC submission
> should
> > work on all gen11+ platforms assuming the GuC firmware is present.
> 
>  What is the plan here when it comes to keeping support for execlist? I
>  am afraid that landing GuC support in Linux is the first step towards
>  killing the execlist, which would force users to use proprietary
>  firmwares that even most Intel engineers have little influence over.
>  Indeed, if "drm/i915/guc: Disable semaphores when using GuC
> scheduling"
>  which states "Disable semaphores when using GuC scheduling as
> semaphores
>  are broken in the current GuC firmware." is anything to go by, it means
>  that even Intel developers seem to prefer working around the GuC
>  firmware, rather than fixing it.
> >>>
> >>> Yes, landing GuC support may be the first step in removing execlist
> >>> support. The inevitable reality is that GPU scheduling is coming and
> >>> likely to be there only path in the not-too-distant future. (See also
> >>> the ongoing thread with AMD about fences.) I'm not going to pass
> >>> judgement on whether or not this is a good thing.  I'm just reading the
> >>> winds and, in my view, this is where things are headed for good or ill.
> >>>
> >>> In answer to the question above, the answer to "what do we gain from
> >>> GuC?" may soon be, "you get to use your GPU."  We're not there yet
> and,
> >>> again, I'm not necessarily advocating for it, but that is likely where
> >>> things are headed.
> >>
> >> This will be a sad day, especially since it seems fundamentally opposed
> >> with any long-term support, on top of taking away user freedom to
> >> fix/tweak their system when Intel won't.
> >>
> >>> A firmware-based submission model isn't a bad design IMO and, aside
> from
> >>> the firmware freedom issues, I think there are actual advantages to the
> >>> model. Immediately, it'll unlock a few features like parallel submission
> >>> (more on that in a bit) and long-running compute because they're
> >>> implemented in GuC and the work to implement them properly in the
> >>> execlist scheduler is highly non-trivial. Longer term, it may (no
> >>> guarantees) unlock some performance by getting the kernel out of the
> way.
> >>
> >> Oh, I definitely agree with firmware-based submission model not being a
> >> bad design. I was even cheering for it in 2015. Experience with it made
> >> me regret that deeply since :s
> >>
> >> But with the DRM scheduler being responsible for most things, I fail to
> >> see what we could offload in the GuC except context switching (like
> >> every other manufacturer). The problem is, the GuC does way more than
> >> just switching registers in bulk, and if the number of revisions of the
> >> GuC is anything to go by, it is way too complex for me to feel
> >> comfortable with it.
> >
> > We need to flesh out that part of the plan more, but we're not going
> > to use drm scheduler for everything. It's only to handle the dma-fence
> > legacy side of things, which means:
> > - timeout handling for batches that take too long
> > - dma_fence dependency sorting/handling
> > - boosting of context from 

Re: [RFC PATCH 43/97] drm/i915/guc: Add lrc descriptor context lookup array

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 12:13:57PM -0700, Matthew Brost wrote:
> Add lrc descriptor context lookup array which can resolve the
> intel_context from the lrc descriptor index. In addition to lookup, it
> can determine in the lrc descriptor context is currently registered with
> the GuC by checking if an entry for a descriptor index is present.
> Future patches in the series will make use of this array.
> 
> Cc: John Harrison 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h|  5 +++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +--
>  2 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index d84f37afb9d8..2eb6c497e43c 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -6,6 +6,8 @@
>  #ifndef _INTEL_GUC_H_
>  #define _INTEL_GUC_H_
>  
> +#include "linux/xarray.h"
> +
>  #include "intel_uncore.h"
>  #include "intel_guc_fw.h"
>  #include "intel_guc_fwif.h"
> @@ -47,6 +49,9 @@ struct intel_guc {
>   struct i915_vma *lrc_desc_pool;
>   void *lrc_desc_pool_vaddr;
>  
> + /* guc_id to intel_context lookup */
> + struct xarray context_lookup;

The current code sets a disastrous example, but for stuff like this it's
always good to explain the locking, and who's holding references and how
you're handling cycles. Since I guess the intel_context also holds the
guc_id alive somehow.
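
For example, the locking/lifetime note could be as small as this (sketch only,
the exact rules are for the patch author to confirm):

	/*
	 * guc_id to intel_context lookup. Updates go through the xarray's
	 * internal xa_lock (the _irq variants used in this patch), and an
	 * intel_context must stay referenced for as long as its guc_id is
	 * present in the array.
	 */
	struct xarray context_lookup;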

Again holds for the entire series, where it makes sense (as in we don't
expect to rewrite the entire code anyway).
-Daniel

> +
>   /* Control params for fw initialization */
>   u32 params[GUC_CTL_MAX_DWORDS];
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 6acc1ef34f92..c2b6d27404b7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct 
> rb_node *rb)
>   return rb_entry(rb, struct i915_priolist, node);
>  }
>  
> -/* Future patches will use this function */
> -__attribute__ ((unused))
>  static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>  {
>   struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> @@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct 
> intel_guc *guc, u32 index)
>   return [index];
>  }
>  
> +static inline struct intel_context *__get_context(struct intel_guc *guc, u32 
> id)
> +{
> + struct intel_context *ce = xa_load(&guc->context_lookup, id);
> +
> + GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
> +
> + return ce;
> +}
> +
>  static int guc_lrc_desc_pool_create(struct intel_guc *guc)
>  {
>   u32 size;
> @@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc 
> *guc)
>   i915_vma_unpin_and_release(>lrc_desc_pool, I915_VMA_RELEASE_MAP);
>  }
>  
> +static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> +{
> + struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> +
> + memset(desc, 0, sizeof(*desc));
> + xa_erase_irq(&guc->context_lookup, id);
> +}
> +
> +static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> +{
> + return __get_context(guc, id);
> +}
> +
> +static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> +struct intel_context *ce)
> +{
> + xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> +}
> +
>  static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>  {
>   /* Leaving stub as this function will be used in future patches */
> @@ -404,6 +430,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
>*/
>   GEM_BUG_ON(!guc->lrc_desc_pool);
>  
> + xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> +
>   return 0;
>  }
>  
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [RFC PATCH 32/97] drm/i915: Introduce i915_sched_engine object

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 12:13:46PM -0700, Matthew Brost wrote:
> Introduce i915_sched_engine object which is lower level data structure
> that i915_scheduler / generic code can operate on without touching
> execlist specific structures. This allows additional submission backends
> to be added without breaking the layer.

Maybe add a comment here that this is de facto a detour since we're now
aiming to use drm/scheduler instead. But also since the current code is a
bit of a mess, we expect this detour to be overall faster since we can then
refactor in-tree.

Maybe also highlight this a bit more in the rfc to make sure this is
clear.
-Daniel

> 
> Cc: Daniele Ceraolo Spurio 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_wait.c  |   4 +-
>  drivers/gpu/drm/i915/gt/intel_engine.h|  16 -
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c |  77 ++--
>  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   4 +-
>  drivers/gpu/drm/i915/gt/intel_engine_pm.c |  10 +-
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  42 +--
>  drivers/gpu/drm/i915/gt/intel_engine_user.c   |   2 +-
>  .../drm/i915/gt/intel_execlists_submission.c  | 350 +++---
>  .../gpu/drm/i915/gt/intel_ring_submission.c   |  13 +-
>  drivers/gpu/drm/i915/gt/mock_engine.c |  17 +-
>  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  36 +-
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   6 +-
>  drivers/gpu/drm/i915/gt/selftest_lrc.c|   6 +-
>  drivers/gpu/drm/i915/gt/selftest_reset.c  |   2 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  75 ++--
>  drivers/gpu/drm/i915/i915_gpu_error.c |   7 +-
>  drivers/gpu/drm/i915/i915_request.c   |  50 +--
>  drivers/gpu/drm/i915/i915_request.h   |   2 +-
>  drivers/gpu/drm/i915/i915_scheduler.c | 168 -
>  drivers/gpu/drm/i915/i915_scheduler.h |  65 +++-
>  drivers/gpu/drm/i915/i915_scheduler_types.h   |  63 
>  21 files changed, 575 insertions(+), 440 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> index 4b9856d5ba14..af1fbf8e2a9a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> @@ -104,8 +104,8 @@ static void fence_set_priority(struct dma_fence *fence,
>   engine = rq->engine;
>  
>   rcu_read_lock(); /* RCU serialisation for set-wedged protection */
> - if (engine->schedule)
> - engine->schedule(rq, attr);
> + if (engine->sched_engine->schedule)
> + engine->sched_engine->schedule(rq, attr);
>   rcu_read_unlock();
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
> b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 8d9184920c51..988d9688ae4d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -123,20 +123,6 @@ execlists_active(const struct intel_engine_execlists 
> *execlists)
>   return active;
>  }
>  
> -static inline void
> -execlists_active_lock_bh(struct intel_engine_execlists *execlists)
> -{
> - local_bh_disable(); /* prevent local softirq and lock recursion */
> - tasklet_lock(&execlists->tasklet);
> -}
> -
> -static inline void
> -execlists_active_unlock_bh(struct intel_engine_execlists *execlists)
> -{
> - tasklet_unlock(&execlists->tasklet);
> - local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
> -}
> -
>  struct i915_request *
>  execlists_unwind_incomplete_requests(struct intel_engine_execlists 
> *execlists);
>  
> @@ -257,8 +243,6 @@ intel_engine_find_active_request(struct intel_engine_cs 
> *engine);
>  
>  u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
>  
> -void intel_engine_init_active(struct intel_engine_cs *engine,
> -   unsigned int subclass);
>  #define ENGINE_PHYSICAL  0
>  #define ENGINE_MOCK  1
>  #define ENGINE_VIRTUAL   2
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 828e1669f92c..ec82a7ec0c8d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -8,6 +8,7 @@
>  #include "gem/i915_gem_context.h"
>  
>  #include "i915_drv.h"
> +#include "i915_scheduler.h"
>  
>  #include "intel_breadcrumbs.h"
>  #include "intel_context.h"
> @@ -326,9 +327,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
> intel_engine_id id)
>   if (engine->context_size)
>   DRIVER_CAPS(i915)->has_logical_contexts = true;
>  
> - /* Nothing to do here, execute in order of dependencies */
> - engine->schedule = NULL;
> -
>   ewma__engine_latency_init(&engine->latency);
>   seqcount_init(&engine->stats.lock);
>  
> @@ -583,9 +581,6 @@ void intel_engine_init_execlists(struct intel_engine_cs 
> *engine)
>   memset(execlists->pending, 0, sizeof(execlists->pending));
>   execlists->active =
>   memset(execlists->inflight, 0, 

Re: [RFC PATCH 20/97] drm/i915/guc: Introduce unified HXG messages

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 12:13:34PM -0700, Matthew Brost wrote:
> From: Michal Wajdeczko 
> 
> New GuC firmware will unify format of MMIO and CTB H2G messages.
> Introduce their definitions now to allow gradual transition of
> our code to match new changes.
> 
> Signed-off-by: Michal Wajdeczko 
> Signed-off-by: Matthew Brost 
> Cc: Michał Winiarski 
> ---
>  .../gpu/drm/i915/gt/uc/abi/guc_messages_abi.h | 226 ++
>  1 file changed, 226 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h 
> b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> index 775e21f3058c..1c264819aa03 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_messages_abi.h
> @@ -6,6 +6,232 @@
>  #ifndef _ABI_GUC_MESSAGES_ABI_H
>  #define _ABI_GUC_MESSAGES_ABI_H
>  
> +/**
> + * DOC: HXG Message

These aren't useful if we don't pull them in somewhere in the
Documentation/gpu hierarchy. General comment, and also please check that
it all renders correctly still.

btw if you respin a patch not originally by you we generally add a (v1) to
the original s-o-b line (or wherever the version split was) and explain in
the usual changelog in the commit message what was changed.

This holds for the entire series ofc.
-Daniel

> + *
> + * All messages exchanged with GuC are defined using 32 bit dwords.
> + * First dword is treated as a message header. Remaining dwords are optional.
> + *
> + * .. _HXG Message:
> + *
> + *  
> +---+---+--+
> + *  |   | Bits  | Description
>   |
> + *  
> +===+===+==+
> + *  |   |   |
>   |
> + *  | 0 |31 | **ORIGIN** - originator of the message 
>   |
> + *  |   |   |   - _`GUC_HXG_ORIGIN_HOST` = 0 
>   |
> + *  |   |   |   - _`GUC_HXG_ORIGIN_GUC` = 1  
>   |
> + *  |   |   |
>   |
> + *  |   
> +---+--+
> + *  |   | 30:28 | **TYPE** - message type
>   |
> + *  |   |   |   - _`GUC_HXG_TYPE_REQUEST` = 0
>   |
> + *  |   |   |   - _`GUC_HXG_TYPE_EVENT` = 1  
>   |
> + *  |   |   |   - _`GUC_HXG_TYPE_NO_RESPONSE_BUSY` = 3   
>   |
> + *  |   |   |   - _`GUC_HXG_TYPE_NO_RESPONSE_RETRY` = 5  
>   |
> + *  |   |   |   - _`GUC_HXG_TYPE_RESPONSE_FAILURE` = 6   
>   |
> + *  |   |   |   - _`GUC_HXG_TYPE_RESPONSE_SUCCESS` = 7   
>   |
> + *  |   
> +---+--+
> + *  |   |  27:0 | **AUX** - auxiliary data (depends TYPE)
>   |
> + *  
> +---+---+--+
> + *  | 1 |  31:0 | optional payload (depends on TYPE) 
>   |
> + *  +---+---+
>   |
> + *  |...|   |
>   |
> + *  +---+---+
>   |
> + *  | n |  31:0 |
>   |
> + *  
> +---+---+--+
> + */
> +
> +#define GUC_HXG_MSG_MIN_LEN  1u
> +#define GUC_HXG_MSG_0_ORIGIN (0x1 << 31)
> +#define   GUC_HXG_ORIGIN_HOST0u
> +#define   GUC_HXG_ORIGIN_GUC 1u
> +#define GUC_HXG_MSG_0_TYPE   (0x7 << 28)
> +#define   GUC_HXG_TYPE_REQUEST   0u
> +#define   GUC_HXG_TYPE_EVENT 1u
> +#define   GUC_HXG_TYPE_NO_RESPONSE_BUSY  3u
> +#define   GUC_HXG_TYPE_NO_RESPONSE_RETRY 5u
> +#define   GUC_HXG_TYPE_RESPONSE_FAILURE  6u
> +#define   GUC_HXG_TYPE_RESPONSE_SUCCESS  7u
> +#define GUC_HXG_MSG_0_AUX (0xfffffff << 0)
> +
> +/**
> + * DOC: HXG Request
> + *
> + * The `HXG Request`_ message should be used to initiate synchronous activity
> + * for which confirmation or return data is expected.
> + *
> + * The recipient of this message shall use `HXG Response`_, `HXG Failure`_
> + * or `HXG Retry`_ message as a definite reply, and may use `HXG Busy`_
> + * message as an intermediate reply.
> + *
> + * Format of @DATA0 and all @DATAn fields depends on the @ACTION code.
> + *
> + * .. _HXG Request:
> + *
> + *  
> +---+---+--+
> + *  |   | Bits  | Description   
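To make the bit layout above concrete, here is a small, self-contained sketch (plain C, not part of the patch; the helper names are invented for illustration) that packs and unpacks the header dword with the ORIGIN/TYPE/AUX fields from the table:

    #include <stdint.h>
    #include <stdio.h>

    /* Field layout per the HXG tables above: bit 31 ORIGIN, bits 30:28 TYPE,
     * bits 27:0 AUX.  Values mirror the quoted defines. */
    #define HXG_ORIGIN_SHIFT 31
    #define HXG_TYPE_SHIFT   28
    #define HXG_TYPE_MASK    0x7u
    #define HXG_AUX_MASK     0x0fffffffu

    static uint32_t hxg_pack(uint32_t origin, uint32_t type, uint32_t aux)
    {
            return (origin << HXG_ORIGIN_SHIFT) |
                   ((type & HXG_TYPE_MASK) << HXG_TYPE_SHIFT) |
                   (aux & HXG_AUX_MASK);
    }

    int main(void)
    {
            /* GUC_HXG_ORIGIN_HOST = 0, GUC_HXG_TYPE_REQUEST = 0 */
            uint32_t msg0 = hxg_pack(0, 0, 0x123);

            printf("header=%#x origin=%u type=%u aux=%#x\n", msg0,
                   msg0 >> HXG_ORIGIN_SHIFT,
                   (msg0 >> HXG_TYPE_SHIFT) & HXG_TYPE_MASK,
                   msg0 & HXG_AUX_MASK);
            return 0;
    }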

Re: [Intel-gfx] [RFC PATCH 5/5] drm/i915: Update execbuf IOCTL to accept N BBs

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 10:30:49AM -0700, Matthew Brost wrote:
> Add I915_EXEC_NUMBER_BB_* to drm_i915_gem_execbuffer2.flags which allows
> submitting N BBs per IOCTL.
> 
> Cc: Tvrtko Ursulin 
> Cc: Tony Ye 
> CC: Carl Zhang 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 
> Signed-off-by: Matthew Brost 

I dropped my big question on the previous patch already, I'll check this
out again when it's all squashed into the parallel extension patch so we
have everything in one commit.
-Daniel

> ---
>  include/uapi/drm/i915_drm.h | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 0175b12b33b8..d3072cad4a7e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1291,7 +1291,26 @@ struct drm_i915_gem_execbuffer2 {
>   */
>  #define I915_EXEC_USE_EXTENSIONS (1 << 21)
>  
> -#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1))
> +/*
> + * Number of BB in execbuf2 IOCTL - 1, used to submit more than BB in a 
> single
> + * execbuf2 IOCTL.
> + *
> + * Return -EINVAL if more than 1 BB (value > 0) is specified if
> + * I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT hasn't been called on the gem
> + * context first. Also returns -EINVAL if gem context has been setup with
> + * I915_PARALLEL_NO_PREEMPT_MID_BATCH and the number of BBs not equal to the
> + * total number of hardware contexts in the gem context.
> + */
> +#define I915_EXEC_NUMBER_BB_LSB  (22)
> +#define I915_EXEC_NUMBER_BB_MASK (0x3f << I915_EXEC_NUMBER_BB_LSB)
> +#define I915_EXEC_NUMBER_BB_MSB  (27)
> +#define i915_execbuffer2_set_number_bb(eb2, num_bb) \
> + (eb2).flags = ((eb2).flags & ~I915_EXEC_NUMBER_BB_MASK) | \
> + (((num_bb - 1) << I915_EXEC_NUMBER_BB_LSB) & I915_EXEC_NUMBER_BB_MASK)
> +#define i915_execbuffer2_get_number_bb(eb2) \
> + ((((eb2).flags & I915_EXEC_NUMBER_BB_MASK) >> I915_EXEC_NUMBER_BB_LSB) 
> + 1)
> +
> +#define __I915_EXEC_UNKNOWN_FLAGS (-(1 << (I915_EXEC_NUMBER_BB_MSB + 1)))
>  
>  #define I915_EXEC_CONTEXT_ID_MASK(0x)
>  #define i915_execbuffer2_set_context_id(eb2, context) \
> -- 
> 2.28.0
> 
> ___
> Intel-gfx mailing list
> intel-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
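For what it's worth, a tiny standalone sketch of how the set/get helpers quoted above compose the flags field; the struct below is only a stand-in for drm_i915_gem_execbuffer2 (just the flags member matters here) and the shortened macro names are illustrative:

    #include <assert.h>
    #include <stdint.h>

    struct eb2_stub { uint64_t flags; };    /* stand-in for drm_i915_gem_execbuffer2 */

    #define NUMBER_BB_LSB  22
    #define NUMBER_BB_MASK (0x3fULL << NUMBER_BB_LSB)

    /* Mirrors i915_execbuffer2_set/get_number_bb from the quoted hunk. */
    #define set_number_bb(eb2, num_bb) \
            ((eb2).flags = ((eb2).flags & ~NUMBER_BB_MASK) | \
             ((((uint64_t)(num_bb) - 1) << NUMBER_BB_LSB) & NUMBER_BB_MASK))
    #define get_number_bb(eb2) \
            ((((eb2).flags & NUMBER_BB_MASK) >> NUMBER_BB_LSB) + 1)

    int main(void)
    {
            struct eb2_stub eb2 = { .flags = 0 };

            set_number_bb(eb2, 4);                  /* ask for 4 BBs in one ioctl */
            assert(get_number_bb(eb2) == 4);        /* field value 3 decodes back to 4 */
            return 0;
    }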


Re: [Intel-gfx] [RFC PATCH 1/5] drm/doc/rfc: i915 GuC submission / DRM scheduler integration plan

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 03:58:43PM +0100, Daniel Stone wrote:
> Hi,
> 
> On Tue, 11 May 2021 at 15:34, Daniel Vetter  wrote:
> > On Thu, May 06, 2021 at 10:30:45AM -0700, Matthew Brost wrote:
> > > +No major changes are required to the uAPI for basic GuC submission. The 
> > > only
> > > +change is a new scheduler attribute: 
> > > I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
> > > +This attribute indicates the 2k i915 user priority levels are statically 
> > > mapped
> > > +into 3 levels as follows:
> > > +
> > > +* -1k to -1 Low priority
> > > +* 0 Medium priority
> > > +* 1 to 1k High priority
> > > +
> > > +This is needed because the GuC only has 4 priority bands. The highest 
> > > priority
> > > +band is reserved with the kernel. This aligns with the DRM scheduler 
> > > priority
> > > +levels too.
> >
> > Please Cc: mesa and get an ack from Jason Ekstrand or Ken Graunke on this,
> > just to be sure.
> 
> A reference to the actual specs this targets would help. I don't have
> oneAPI to hand if it's relevant, but the two in graphics world are
> https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
> and 
> https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority
> - both of them pretty much say that the implementation may do anything
> or nothing at all, so this isn't a problem for spec conformance, only
> a matter of user priority (sorry).

Good point, Matt please also include the level0 spec here (aside from
egl/vk extensions). Might need to ping Michal Mrozek internally and cc:
him on this one here too.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
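The static mapping quoted above is simple enough to write down; a sketch only, with the enum names invented for illustration (the real code would use whatever priority defines the GuC interface ends up with):

    /* Illustrative only: collapse the 2k i915 user priority levels into the
     * three bands described above; the fourth (highest) GuC band stays
     * reserved for the kernel. */
    enum guc_band { GUC_PRIO_LOW, GUC_PRIO_MEDIUM, GUC_PRIO_HIGH };

    static enum guc_band map_user_prio(int prio)    /* prio in [-1023, 1023] */
    {
            if (prio < 0)
                    return GUC_PRIO_LOW;            /* -1k to -1 */
            if (prio == 0)
                    return GUC_PRIO_MEDIUM;         /* 0 */
            return GUC_PRIO_HIGH;                   /* 1 to 1k */
    }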


Re: [PATCH v6 01/16] drm/ttm: Remap all page faults to per process dummy page.

2021-05-11 Thread Christian König




On 11.05.21 at 16:44, Andrey Grodzovsky wrote:


On 2021-05-11 2:38 a.m., Christian König wrote:

On 10.05.21 at 18:36, Andrey Grodzovsky wrote:

On device removal reroute all CPU mappings to dummy page.

v3:
Remove loop to find DRM file and instead access it
by vma->vm_file->private_data. Move dummy page installation
into a separate function.

v4:
Map the entire BOs VA space into on demand allocated dummy page
on the first fault for that BO.

v5: Remove duplicate return.

v6: Polish ttm_bo_vm_dummy_page, remove superfluous code.

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/ttm/ttm_bo_vm.c | 57 
-

  include/drm/ttm/ttm_bo_api.h    |  2 ++
  2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c 
b/drivers/gpu/drm/ttm/ttm_bo_vm.c

index b31b18058965..e5a9615519d1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -34,6 +34,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  #include 
  #include 
@@ -380,19 +382,72 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct 
vm_fault *vmf,

  }
  EXPORT_SYMBOL(ttm_bo_vm_fault_reserved);
  +static void ttm_bo_release_dummy_page(struct drm_device *dev, 
void *res)

+{
+    struct page *dummy_page = (struct page *)res;
+
+    __free_page(dummy_page);
+}
+
+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot)
+{
+    struct vm_area_struct *vma = vmf->vma;
+    struct ttm_buffer_object *bo = vma->vm_private_data;
+    struct drm_device *ddev = bo->base.dev;
+    vm_fault_t ret = VM_FAULT_NOPAGE;
+    unsigned long address;
+    unsigned long pfn;
+    struct page *page;
+
+    /* Allocate new dummy page to map all the VA range in this VMA 
to it*/

+    page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+    if (!page)
+    return VM_FAULT_OOM;
+
+    pfn = page_to_pfn(page);
+
+    /* Prefault the entire VMA range right away to avoid further 
faults */
+    for (address = vma->vm_start; address < vma->vm_end; address += 
PAGE_SIZE) {

+



+    if (unlikely(address >= vma->vm_end))
+    break;


That extra check can be removed as far as I can see.



+
+    if (vma->vm_flags & VM_MIXEDMAP)
+    ret = vmf_insert_mixed_prot(vma, address,
+    __pfn_to_pfn_t(pfn, PFN_DEV),
+    prot);
+    else
+    ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
+    }
+



+    /* Set the page to be freed using drmm release action */
+    if (drmm_add_action_or_reset(ddev, ttm_bo_release_dummy_page, 
page))

+    return VM_FAULT_OOM;


You should probably move that before inserting the page into the VMA 
and also free the allocated page if it goes wrong.



drmm_add_action_or_reset will automatically release the page if the 
add action fails, that's the 'reset' part of the function.


Ah! Ok that makes it even more important that you do this before you 
insert the page into any VMA.


Otherwise userspace has access to a freed page with the rather ugly 
consequences.


Christian.



Andrey




Apart from that patch looks good to me,
Christian.


+
+    return ret;
+}
+EXPORT_SYMBOL(ttm_bo_vm_dummy_page);
+
  vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
  {
  struct vm_area_struct *vma = vmf->vma;
  pgprot_t prot;
  struct ttm_buffer_object *bo = vma->vm_private_data;
+    struct drm_device *ddev = bo->base.dev;
  vm_fault_t ret;
+    int idx;
    ret = ttm_bo_vm_reserve(bo, vmf);
  if (ret)
  return ret;
    prot = vma->vm_page_prot;
-    ret = ttm_bo_vm_fault_reserved(vmf, prot, 
TTM_BO_VM_NUM_PREFAULT, 1);

+    if (drm_dev_enter(ddev, &idx)) {
+    ret = ttm_bo_vm_fault_reserved(vmf, prot, 
TTM_BO_VM_NUM_PREFAULT, 1);

+    drm_dev_exit(idx);
+    } else {
+    ret = ttm_bo_vm_dummy_page(vmf, prot);
+    }
  if (ret == VM_FAULT_RETRY && !(vmf->flags & 
FAULT_FLAG_RETRY_NOWAIT))

  return ret;
  diff --git a/include/drm/ttm/ttm_bo_api.h 
b/include/drm/ttm/ttm_bo_api.h

index 639521880c29..254ede97f8e3 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -620,4 +620,6 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, 
unsigned long addr,

   void *buf, int len, int write);
  bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all);
  +vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t 
prot);

+
  #endif






Re: [Intel-gfx] [RFC PATCH 4/5] drm/i915: Introduce 'set parallel submit' extension

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 10:30:48AM -0700, Matthew Brost wrote:
> i915_drm.h updates for 'set parallel submit' extension.
> 
> Cc: Tvrtko Ursulin 
> Cc: Tony Ye 
> CC: Carl Zhang 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 
> Signed-off-by: Matthew Brost 
> ---
>  include/uapi/drm/i915_drm.h | 126 
>  1 file changed, 126 insertions(+)
> 
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 26d2e135aa31..0175b12b33b8 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1712,6 +1712,7 @@ struct drm_i915_gem_context_param {
>   * Extensions:
>   *   i915_context_engines_load_balance 
> (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
>   *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
> + *   i915_context_engines_parallel_submit 
> (I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)

Hm just relalized, but I don't think this hyperlinsk correctly, and I'm
also not sure this formats very well as a nice list. Using item lists
should look pretty nice like we're doing for the various kms properties,
e.g.

FOO:
  Explain what FOO does

BAR:
  Explain what BAR does. struct bar also automatically generates a link

Please check with make htmldocs and polish this a bit (might need a small
prep patch).

>   */
>  #define I915_CONTEXT_PARAM_ENGINES   0xa
>  
> @@ -1894,9 +1895,134 @@ struct i915_context_param_engines {
>   __u64 extensions; /* linked chain of extension blocks, 0 terminates */
>  #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0 /* see 
> i915_context_engines_load_balance */
>  #define I915_CONTEXT_ENGINES_EXT_BOND 1 /* see i915_context_engines_bond */
> +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> i915_context_engines_parallel_submit */
>   struct i915_engine_class_instance engines[0];
>  } __attribute__((packed));
>  
> +/*
> + * i915_context_engines_parallel_submit:
> + *
> + * Setup a gem context to allow multiple BBs to be submitted in a single 
> execbuf
> + * IOCTL. Those BBs will then be scheduled to run on the GPU in parallel.
> + *
> + * All hardware contexts in the engine set are configured for parallel
> + * submission (i.e. once this gem context is configured for parallel 
> submission,
> + * all the hardware contexts, regardless if a BB is available on each 
> individual
> + * context, will be submitted to the GPU in parallel). A user can submit BBs 
> to
> + * subset of the hardware contexts, in a single execbuf IOCTL, but it is not
> + * recommended as it may reserve physical engines with nothing to run on 
> them.
> + * Highly recommended to configure the gem context with N hardware contexts 
> then
> + * always submit N BBs in a single IOCTL.
> + *
> + * There are two currently defined ways to control the placement of the
> + * hardware contexts on physical engines: default behavior (no flags) and
> + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added in the
> + * future as new hardware / use cases arise. Details of how to use this
> + * interface are described below, above the flags.
> + *
> + * Returns -EINVAL if the hardware context placement configuration is invalid or if 
> the
> + * placement configuration isn't supported on the platform / submission
> + * interface.
> + * Returns -ENODEV if extension isn't supported on the platform / submission
> + * interface.
> + */
> +struct i915_context_engines_parallel_submit {
> + struct i915_user_extension base;

Ok this is good, since it makes sure we can't possible use this in
CTX_SETPARAM.

> +
> +/*
> + * Default placement behavior (currently unsupported):
> + *
> + * Rather than restricting parallel submission to a single class with a
> + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a mode 
> that
> + * enables parallel submission across multiple engine classes. In this case 
> each
> + * context's logical engine mask indicates where that context can be placed. It 
> is
> + * implied in this mode that all contexts have mutually exclusive placement 
> (e.g.
> + * if one context is running CS0 no other contexts can run on CS0).
> + *
> + * Example 1 pseudo code:
> + * CSX[Y] = engine class X, logical instance Y
> + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> + * set_engines(INVALID, INVALID)
> + * set_load_balance(engine_index=0, num_siblings=2, engines=CS0[0],CS0[1])
> + * set_load_balance(engine_index=1, num_siblings=2, engines=CS1[0],CS1[1])
> + * set_parallel()
> + *
> + * Results in the following valid placements:
> + * CS0[0], CS1[0]
> + * CS0[0], CS1[1]
> + * CS0[1], CS1[0]
> + * CS0[1], CS1[1]
> + *
> + * Example 2 pseudo code:
> + * CS[X] = generic engine of same class, logical instance X
> + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> + * set_engines(INVALID, INVALID)
> + * set_load_balance(engine_index=0, num_siblings=3, 
> engines=CS[0],CS[1],CS[2])
> + * set_load_balance(engine_index=1, num_siblings=3, 
> engines=CS[0],CS[1],CS[2])
> + * 

Re: [Intel-gfx] [RFC PATCH 1/5] drm/doc/rfc: i915 GuC submission / DRM scheduler integration plan

2021-05-11 Thread Daniel Stone
Hi,

On Tue, 11 May 2021 at 15:34, Daniel Vetter  wrote:
> On Thu, May 06, 2021 at 10:30:45AM -0700, Matthew Brost wrote:
> > +No major changes are required to the uAPI for basic GuC submission. The 
> > only
> > +change is a new scheduler attribute: 
> > I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP.
> > +This attribute indicates the 2k i915 user priority levels are statically 
> > mapped
> > +into 3 levels as follows:
> > +
> > +* -1k to -1 Low priority
> > +* 0 Medium priority
> > +* 1 to 1k High priority
> > +
> > +This is needed because the GuC only has 4 priority bands. The highest 
> > priority
> > +band is reserved with the kernel. This aligns with the DRM scheduler 
> > priority
> > +levels too.
>
> Please Cc: mesa and get an ack from Jason Ekstrand or Ken Graunke on this,
> just to be sure.

A reference to the actual specs this targets would help. I don't have
oneAPI to hand if it's relevant, but the two in graphics world are
https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
and 
https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/chap5.html#devsandqueues-priority
- both of them pretty much say that the implementation may do anything
or nothing at all, so this isn't a problem for spec conformance, only
a matter of user priority (sorry).

Cheers,
Daniel


Re: [RFC PATCH 3/5] drm/i915: Expose logical engine instance to user

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 10:30:47AM -0700, Matthew Brost wrote:
> Expose logical engine instance to user via query engine info IOCTL. This
> is required for split-frame workloads as these need to be placed on
> engines in a logically contiguous order. The logical mapping can change
> based on fusing. Rather than having user have knowledge of the fusing we
> simply just expose the logical mapping with the existing query engine
> info IOCTL.
> 
> Cc: Tvrtko Ursulin 
> Cc: Tony Ye 
> CC: Carl Zhang 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 
> Signed-off-by: Matthew Brost 
> ---
>  include/uapi/drm/i915_drm.h | 7 ++-

Two things on all these 3 patches:

- Until we've merged the uapi it shouldn't show up in uapi headers. See
  what Matt A. has done with a fake local header in Documentation/gpu/rfc
  which you can pull in.

- Since this one is tiny I think just the text in the rfc is good enough,
  I'd drop this.

- Squash the others in with the parallel submit rfc patch so that the
  structs and long-form text are all in one patch please, makes reviewing
  the overall thing a bit simpler. Rule is to have a complete change per
  patch, and then not split things further.

>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 9f331ad629f5..26d2e135aa31 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -2396,14 +2396,19 @@ struct drm_i915_engine_info {
>  
>   /** @flags: Engine flags. */
>   __u64 flags;
> +#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE(1 << 0)
>  
>   /** @capabilities: Capabilities of this engine. */
>   __u64 capabilities;
>  #define I915_VIDEO_CLASS_CAPABILITY_HEVC (1 << 0)
>  #define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC  (1 << 1)
>  
> + /** Logical engine instance */

I think in the final version that we merge with the uapi this should:
- explain why we need this
- link to relevant other uapi like the paralle submit extension

Cheers, Daniel

> + __u16 logical_instance;
> +
>   /** @rsvd1: Reserved fields. */
> - __u64 rsvd1[4];
> + __u16 rsvd1[3];
> + __u64 rsvd2[3];
>  };
>  
>  /**
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
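For completeness, consuming this from userspace would look roughly like the sketch below, assuming the proposed flag/field land as quoted and that the engine-info blob has already been filled in via a DRM_I915_QUERY_ENGINE_INFO item of the existing i915 query ioctl:

    #include <stdio.h>
    #include <drm/i915_drm.h>   /* struct drm_i915_query_engine_info et al. */

    /* Sketch only: walk an engine-info blob and print the proposed logical
     * instance next to the physical one.  'info' is assumed to have been
     * filled in by DRM_IOCTL_I915_QUERY beforehand. */
    static void dump_logical_map(const struct drm_i915_query_engine_info *info)
    {
            unsigned int i;

            for (i = 0; i < info->num_engines; i++) {
                    const struct drm_i915_engine_info *e = &info->engines[i];

                    if (!(e->flags & I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE))
                            continue;       /* kernel without this uAPI */

                    printf("class %u: physical %u -> logical %u\n",
                           (unsigned int)e->engine.engine_class,
                           (unsigned int)e->engine.engine_instance,
                           (unsigned int)e->logical_instance);
            }
    }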


Re: [PATCH v6 04/16] drm/amdkfd: Split kfd suspend from device exit

2021-05-11 Thread Andrey Grodzovsky




On 2021-05-11 2:40 a.m., Christian König wrote:

On 10.05.21 at 18:36, Andrey Grodzovsky wrote:

Helps to expedite HW related stuff to amdgpu_pci_remove

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_device.c    | 3 ++-
  3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c

index 5f6696a3c778..2b06dee9a0ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -170,7 +170,7 @@ void amdgpu_amdkfd_device_init(struct 
amdgpu_device *adev)

  }
  }
-void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev)
+void amdgpu_amdkfd_device_fini_sw(struct amdgpu_device *adev)
  {
  if (adev->kfd.dev) {
  kgd2kfd_device_exit(adev->kfd.dev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h

index 14f68c028126..f8e10af99c28 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -127,7 +127,7 @@ void amdgpu_amdkfd_interrupt(struct amdgpu_device 
*adev,

  const void *ih_ring_entry);
  void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
  void amdgpu_amdkfd_device_init(struct amdgpu_device *adev);
-void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
+void amdgpu_amdkfd_device_fini_sw(struct amdgpu_device *adev);
  int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum 
kgd_engine_type engine,

  uint32_t vmid, uint64_t gpu_addr,
  uint32_t *ib_cmd, uint32_t ib_len);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c

index 357b9bf62a1c..ab6d2a43c9a3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -858,10 +858,11 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
  return kfd->init_complete;
  }
+
+


Looks like unnecessary white space change to me.


  void kgd2kfd_device_exit(struct kfd_dev *kfd)
  {
  if (kfd->init_complete) {
-    kgd2kfd_suspend(kfd, false);


Where is the call to this function now?

Christian.


In patch 'drm/amdgpu: Add early fini callback' in
amdgpu_device_ip_fini_early->amdgpu_amdkfd_suspend->kgd2kfd_suspend

Andrey




  device_queue_manager_uninit(kfd->dqm);
  kfd_interrupt_exit(kfd);
  kfd_topology_remove_device(kfd);




Re: [PATCH] component: Move host device to end of device lists on binding

2021-05-11 Thread Russell King - ARM Linux admin
On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> Within the component device framework this usually isn't that bad
> because the real driver work is done at bind time via
> component{,master}_ops::bind(). It becomes a problem when the driver
> core, or host driver, wants to operate on the component device outside
> of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> driver core doesn't understand the relationship between the host device
> and the component devices and could possibly try to operate on component
> devices when they're already removed from the system or shut down.

You really are not supposed to be doing anything with component devices
once they have been unbound. You can do stuff with them only between the
bind() and the unbind() callbacks for the host device.

Access to the host devices outside of that is totally undefined and
should not be done.

The shutdown callback should be fine as long as the other devices are
still bound, but there will be implications if the shutdown order
matters.

However, randomly pulling devices around in the DPM list sounds to me
like a very bad idea. What happens if such re-orderings result in a
child device being shut down after a parent device has been shut down?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [Intel-gfx] [RFC PATCH 2/5] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 10:30:46AM -0700, Matthew Brost wrote:
Add entry for i915 new parallel submission uAPI plan.
> 
> Cc: Tvrtko Ursulin 
> Cc: Tony Ye 
> CC: Carl Zhang 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 
> Signed-off-by: Matthew Brost 
> ---
>  Documentation/gpu/rfc/i915_scheduler.rst | 56 +++-
>  1 file changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/gpu/rfc/i915_scheduler.rst 
> b/Documentation/gpu/rfc/i915_scheduler.rst
> index fa6780a11c86..e3455b33edfe 100644
> --- a/Documentation/gpu/rfc/i915_scheduler.rst
> +++ b/Documentation/gpu/rfc/i915_scheduler.rst
> @@ -13,7 +13,8 @@ i915 with the DRM scheduler is:
> modparam enable_guc
>   * Lots of rework will need to be done to integrate with DRM scheduler so
> no need to nit pick everything in the code, it just should be
> -   functional and not regress execlists
> +   functional, no major coding style / layering errors, and not regress
> +   execlists

I guess this hunk should be in the previous patch?

>   * Update IGTs / selftests as needed to work with GuC submission
>   * Enable CI on supported platforms for a baseline
>   * Rework / get CI healthy for GuC submission in place as needed
> @@ -67,4 +68,55 @@ levels too.
>  
>  New parallel submission uAPI
>  
> -Details to come in a following patch.
> +The existing bonding uAPI is completely broken with GuC submission because
> +whether a submission is a single context submit or parallel submit isn't 
> known
> +until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple
> +contexts in parallel with the GuC the context must be explicitly registered 
> with
> +N contexts and all N contexts must be submitted in a single command to the 
> GuC.
> +This interface doesn't support dynamically changing between N contexts as 
> the
> +bonding uAPI does. Hence the need for a new parallel submission interface. 
> Also
> +the legacy bonding uAPI is quite confusing and not intuitive at all.

I think you should sit together with Jason on irc or so for a bit and get
an earful of how it's all broken irrespective of GuC submission or not.
Just to hammer in our case :-)

> +
> +The new parallel submission uAPI consists of 3 parts:
> +
> +* Export engines logical mapping
> +* A 'set_parallel' extension to configure contexts for parallel
> +  submission
> +* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
> +
> +Export engines logical mapping
> +--
> +Certain use cases require BBs to be placed on engine instances in logical 
> order
> +(e.g. split-frame on gen11+). The logical mapping of engine instances can 
> change
> +based on fusing. Rather than making UMDs be aware of fusing, simply expose 
> the
> +logical mapping with the existing query engine info IOCTL. Also the GuC
> +submission interface currently only supports submitting multiple contexts to
> +engines in logical order.

Maybe highlight more that this is a new restriction with GuC compared to
execlist, which is why we need to expose this information to userspace.
Also on the platforms thus far supported in upstream there's at most 2
engines of the same type, so really not an issue.

> +
> +A single bit will be added to drm_i915_engine_info.flags indicating that the
> +logical instance has been returned and a new field,
> +drm_i915_engine_info.logical_instance, returns the logical instance.
> +
> +A 'set_parallel' extension to configure contexts for parallel submission
> +
> +The 'set_parallel' extension configures N contexts for parallel submission. 
> It
> +is setup step that should be called before using any of the contexts. See
> +I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
> +similar existing examples. Once the N contexts are configured for parallel
> +submission the execbuf2 IOCTL can be called submiting 1-N BBs in a single 
> IOCTL.
> +Although submitting less than N BBs is allowed it is not recommended as that
> +will likely leave parts of the hardware reserved and idle. Initially only
> +support GuC submission. Execlist support can be added later if needed.

Can we just require that you always submit N batchbuffers, or does this
create a problem for userspace? Allowing things just because is generally
not a good idea with uapi, it's better to limit and then allow when
there's a need.

Ofc if we already have a need then explain why and that's all fine.

Also detailed comments on the kerneldoc I'll do in the next patches.

> +
> +Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
> +i915_context_engines_parallel_submit to the uAPI to implement this extension.
> +
> +Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
> +---
> +Contexts that have been configured with the 

Re: [PATCH v6 01/16] drm/ttm: Remap all page faults to per process dummy page.

2021-05-11 Thread Andrey Grodzovsky



On 2021-05-11 2:38 a.m., Christian König wrote:

On 10.05.21 at 18:36, Andrey Grodzovsky wrote:

On device removal reroute all CPU mappings to dummy page.

v3:
Remove loop to find DRM file and instead access it
by vma->vm_file->private_data. Move dummy page installation
into a separate function.

v4:
Map the entire BOs VA space into on demand allocated dummy page
on the first fault for that BO.

v5: Remove duplicate return.

v6: Polish ttm_bo_vm_dummy_page, remove superfluous code.

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/ttm/ttm_bo_vm.c | 57 -
  include/drm/ttm/ttm_bo_api.h    |  2 ++
  2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c 
b/drivers/gpu/drm/ttm/ttm_bo_vm.c

index b31b18058965..e5a9615519d1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -34,6 +34,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  #include 
  #include 
@@ -380,19 +382,72 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct 
vm_fault *vmf,

  }
  EXPORT_SYMBOL(ttm_bo_vm_fault_reserved);
  +static void ttm_bo_release_dummy_page(struct drm_device *dev, void 
*res)

+{
+    struct page *dummy_page = (struct page *)res;
+
+    __free_page(dummy_page);
+}
+
+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot)
+{
+    struct vm_area_struct *vma = vmf->vma;
+    struct ttm_buffer_object *bo = vma->vm_private_data;
+    struct drm_device *ddev = bo->base.dev;
+    vm_fault_t ret = VM_FAULT_NOPAGE;
+    unsigned long address;
+    unsigned long pfn;
+    struct page *page;
+
+    /* Allocate new dummy page to map all the VA range in this VMA 
to it*/

+    page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+    if (!page)
+    return VM_FAULT_OOM;
+
+    pfn = page_to_pfn(page);
+
+    /* Prefault the entire VMA range right away to avoid further 
faults */
+    for (address = vma->vm_start; address < vma->vm_end; address += 
PAGE_SIZE) {

+



+    if (unlikely(address >= vma->vm_end))
+    break;


That extra check can be removed as far as I can see.



+
+    if (vma->vm_flags & VM_MIXEDMAP)
+    ret = vmf_insert_mixed_prot(vma, address,
+    __pfn_to_pfn_t(pfn, PFN_DEV),
+    prot);
+    else
+    ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
+    }
+



+    /* Set the page to be freed using drmm release action */
+    if (drmm_add_action_or_reset(ddev, ttm_bo_release_dummy_page, 
page))

+    return VM_FAULT_OOM;


You should probably move that before inserting the page into the VMA 
and also free the allocated page if it goes wrong.



drmm_add_action_or_reset will automatically release the page if the add 
action fails, that's the 'reset' part of the function.


Andrey




Apart from that patch looks good to me,
Christian.


+
+    return ret;
+}
+EXPORT_SYMBOL(ttm_bo_vm_dummy_page);
+
  vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
  {
  struct vm_area_struct *vma = vmf->vma;
  pgprot_t prot;
  struct ttm_buffer_object *bo = vma->vm_private_data;
+    struct drm_device *ddev = bo->base.dev;
  vm_fault_t ret;
+    int idx;
    ret = ttm_bo_vm_reserve(bo, vmf);
  if (ret)
  return ret;
    prot = vma->vm_page_prot;
-    ret = ttm_bo_vm_fault_reserved(vmf, prot, 
TTM_BO_VM_NUM_PREFAULT, 1);

+    if (drm_dev_enter(ddev, &idx)) {
+    ret = ttm_bo_vm_fault_reserved(vmf, prot, 
TTM_BO_VM_NUM_PREFAULT, 1);

+    drm_dev_exit(idx);
+    } else {
+    ret = ttm_bo_vm_dummy_page(vmf, prot);
+    }
  if (ret == VM_FAULT_RETRY && !(vmf->flags & 
FAULT_FLAG_RETRY_NOWAIT))

  return ret;
  diff --git a/include/drm/ttm/ttm_bo_api.h 
b/include/drm/ttm/ttm_bo_api.h

index 639521880c29..254ede97f8e3 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -620,4 +620,6 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, 
unsigned long addr,

   void *buf, int len, int write);
  bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all);
  +vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot);
+
  #endif




Re: [PATCH v3 1/1] kernel.h: Split out panic and oops helpers

2021-05-11 Thread Alex Elder

On 5/11/21 2:41 AM, Andy Shevchenko wrote:

kernel.h has been used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out panic and
oops helpers.

There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain

At the same time convert users tree-wide to use new headers, although
for the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.

Signed-off-by: Andy Shevchenko 
Reviewed-by: Bjorn Andersson 
Acked-by: Mike Rapoport 
Acked-by: Corey Minyard 
Acked-by: Christian Brauner 
Acked-by: Arnd Bergmann 
Acked-by: Kees Cook 
Acked-by: Wei Liu 
Acked-by: Rasmus Villemoes 
Co-developed-by: Andrew Morton 
Signed-off-by: Andrew Morton 
Acked-by: Sebastian Reichel 
Acked-by: Luis Chamberlain 
Acked-by: Stephen Boyd 
Acked-by: Thomas Bogendoerfer 
Acked-by: Helge Deller  # parisc
---
v3: rebased on top of v5.13-rc1, collected a few more tags

Note WRT Andrew's SoB tag above: I have added it since I took part of the
cases from him. Andrew, feel free to amend or tell me how you want me
to do.



Acked-by: Alex Elder 

. . .


diff --git a/drivers/net/ipa/ipa_smp2p.c b/drivers/net/ipa/ipa_smp2p.c
index a5f7a79a1923..34b68dc43886 100644
--- a/drivers/net/ipa/ipa_smp2p.c
+++ b/drivers/net/ipa/ipa_smp2p.c
@@ -8,6 +8,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  


. . .


Re: [PATCH 1/1] drm/vc4: Remove redundant error printing in vc4_ioremap_regs()

2021-05-11 Thread Maxime Ripard
On Tue, May 11, 2021 at 05:29:23PM +0800, Zhen Lei wrote:
> When devm_ioremap_resource() fails, a clear enough error message will be
> printed by its subfunction __devm_ioremap_resource(). The error
> information contains the device name, failure cause, and possibly resource
> information.
> 
> Therefore, remove the error printing here to simplify code and reduce the
> binary size.
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Zhen Lei 

Merged, thanks

Maxime


signature.asc
Description: PGP signature


Re: [Intel-gfx] [RFC PATCH 1/5] drm/doc/rfc: i915 GuC submission / DRM scheduler integration plan

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 10:30:45AM -0700, Matthew Brost wrote:
> Add entry for i915 GuC submission / DRM scheduler integration plan.
> Follow up patch with details of new parallel submission uAPI to come.
> 
> Cc: Jon Bloomfield 
> Cc: Jason Ekstrand 
> Cc: Dave Airlie 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Matthew Brost 

Would be good to Cc: some drm/scheduler folks here for the next round:

$ scripts/get_maintainer.pl -f -- drivers/gpu/drm/scheduler/

says we have maybe the following missing:

"Christian König" 
Luben Tuikov 
Alex Deucher 
Steven Price 

Lee Jones did a ton of warning fixes over the entire tree, so doesn't care
about drm/scheduler design directly.

> ---
>  Documentation/gpu/rfc/i915_scheduler.rst | 70 
>  Documentation/gpu/rfc/index.rst  |  4 ++
>  2 files changed, 74 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_scheduler.rst
> 
> diff --git a/Documentation/gpu/rfc/i915_scheduler.rst 
> b/Documentation/gpu/rfc/i915_scheduler.rst
> new file mode 100644
> index ..fa6780a11c86
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_scheduler.rst
> @@ -0,0 +1,70 @@
> +=
> +I915 GuC Submission/DRM Scheduler Section
> +=
> +
> +Upstream plan
> +=
> +For upstream the overall plan for landing GuC submission and integrating the
> +i915 with the DRM scheduler is:
> +
> +* Merge basic GuC submission
> + * Basic submission support for all gen11+ platforms
> + * Not enabled by default on any current platforms but can be enabled via
> +   modparam enable_guc
> + * Lots of rework will need to be done to integrate with DRM scheduler so
> +   no need to nit pick everything in the code, it just should be
> +   functional and not regress execlists
> + * Update IGTs / selftests as needed to work with GuC submission
> + * Enable CI on supported platforms for a baseline
> + * Rework / get CI healthy for GuC submission in place as needed
> +* Merge new parallel submission uAPI
> + * Bonding uAPI completely incompatible with GuC submission

Maybe clarify that this isn't the only issue with the bonding uapi, so
perhaps add "Plus it has severe design issues in general, which is why we
want to retire it no matter what". Or something like that. Not sure we
should go into full details here, maybe as part of the next patch about
parallel submit and all that.

> + * New uAPI adds I915_CONTEXT_ENGINES_EXT_PARALLEL context setup step
> +   which configures contexts N wide
> + * After I915_CONTEXT_ENGINES_EXT_PARALLEL a user can submit N batches to
> +   a context in a single execbuf IOCTL and the batches run on the GPU in
> +   parallel
> + * Initially only for GuC submission but execlists can be supported if
> +   needed
> +* Convert the i915 to use the DRM scheduler
> + * GuC submission backend fully integrated with DRM scheduler
> + * All request queues removed from backend (e.g. all backpressure
> +   handled in DRM scheduler)
> + * Resets / cancels hook in DRM scheduler
> + * Watchdog hooks into DRM scheduler
> + * Lots of complexity of the GuC backend can be pulled out once
> +   integrated with DRM scheduler (e.g. state machine gets
> +   simpler, locking gets simpler, etc...)
> + * Execlist backend will do the minimum required to hook in the DRM
> +   scheduler so it can live next to the fully integrated GuC backend
> + * Legacy interface
> + * Features like timeslicing / preemption / virtual engines would
> +   be difficult to integrate with the DRM scheduler and these
> +   features are not required for GuC submission as the GuC does
> +   these things for us
> + * ROI low on fully integrating into DRM scheduler
> + * Fully integrating would add lots of complexity to DRM
> +   scheduler
> + * Port i915 priority inheritance / boosting feature in DRM scheduler

Maybe a few words on what this does and why we care? Just so drm/scheduler
people know what's coming.

> + * Remove in-order completion assumptions from DRM scheduler

I think it'd be good to put a few words here why we need this. We want to
use drm scheduler for dependencies, but rely on the hw/fw scheduler (or
well backend for execlist) to handle preemption, round-robin and that kind
of stuff. Hence we want to have all runnable requests in the backend
(excluding backpressure and stuff like that), and they can complete
out-of-order.

Maybe also highlight this one in the commit message to get drm/scheduler
folks' attention on this and the previous one for discussion.

> + * Pull out i915 priority levels and use DRM priority levels
> + * Optimize DRM scheduler as needed


Re: [PATCH v7 0/3] drm/i915/display: Try YCbCr420 color when RGB fails

2021-05-11 Thread Ville Syrjälä
On Mon, May 10, 2021 at 03:33:46PM +0200, Werner Sembach wrote:
> When encoder validation of a display mode fails, retry with less bandwidth
> heavy YCbCr420 color mode, if available. This enables some HDMI 1.4 setups
> to support 4k60Hz output, which previously failed silently.
> 
> AMDGPU had nearly the exact same issue. This problem description is
> therefore copied from my commit message of the AMDGPU patch.
> 
> On some setups, while the monitor and the gpu support display modes with
> pixel clocks of up to 600MHz, the link encoder might not. This prevents
> YCbCr444 and RGB encoding for 4k60Hz, but YCbCr420 encoding might still be
> possible. However, which color mode is used is decided before the link
> encoder capabilities are checked. This patch fixes the problem by retrying
> to find a display mode with YCbCr420 enforced and using it, if it is
> valid.
> 
> This patchset is revision 7. Fixed a rebase issue in 1/3 and moved message
> from error output to debug output in 2/3.

Looks good and CI seems happy.

Series pushed to drm-intel-next. Thanks.

-- 
Ville Syrjälä
Intel


Re: [PATCH 6/7] drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend

2021-05-11 Thread Thomas Hellström



On 5/11/21 4:09 PM, Christian König wrote:



On 11.05.21 at 16:06, Thomas Hellström (Intel) wrote:


On 5/11/21 3:58 PM, Christian König wrote:

On 11.05.21 at 15:25, Thomas Hellström wrote:

Most logical place to introduce TTM buffer objects is as an i915
gem object backend. We need to add some ops to account for added
functionality like delayed delete and LRU list manipulation.

Initially we support only LMEM and SYSTEM memory, but SYSTEM
(which in this case means evicted LMEM objects) is not
visible to i915 GEM yet. The plan is to move the i915 gem system 
region

over to the TTM system memory type in upcoming patches.

We set up GPU bindings directly both from LMEM and from the system 
region,

as there is no need to use the legacy TTM_TT memory type. We reserve
that for future porting of GGTT bindings to TTM.

There are some changes to TTM to allow for purging system memory 
buffer

objects and to refuse swapping of some objects: Unfortunately i915 gem
still relies heavily on short-term object pinning, and we've chosen to
keep short-term-pinned buffer objects on the TTM LRU lists for now,
meaning that we need some sort of mechanism to tell TTM they are not
swappable. A longer term goal is to get rid of the short-term pinning.


Well just use the eviction_valuable interface for this.


Yes, we do that for vram/lmem eviction, but we have nothing similar 
for system swapping. Do I understand you correctly that you want me 
to add a call to eviction_valuable() also for that instead of 
swap_possible()?


You should already have that. eviction_valuable is called in both cases.

Hmm. I can only see it called from ttm_mem_evict_first() which is not in 
the swapping path? Or do I miss something?


Thanks,

Thomas
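For reference, the callback being discussed is the per-driver eviction_valuable hook in TTM. A veto for short-term-pinned objects would look roughly like the sketch below; the pin-tracking helper name is invented, and whether this hook also ends up covering the swap path is exactly the open question in this thread:

    /* Sketch: refuse to evict objects the driver still treats as short-term
     * pinned, otherwise fall back to TTM's default heuristic.
     * i915_ttm_bo_is_short_term_pinned() is an invented name for whatever
     * bookkeeping the driver uses. */
    static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
                                           const struct ttm_place *place)
    {
            if (i915_ttm_bo_is_short_term_pinned(bo))
                    return false;

            return ttm_bo_eviction_valuable(bo, place);
    }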





Re: [RFC] Implicit vs explicit user fence sync

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:
> On 11.05.21 at 09:31, Daniel Vetter wrote:
> > [SNIP]
> > > > And that's just the one ioctl I know is big trouble, I'm sure we'll find
> > > > more funny corner cases when we roll out explicit user fencing.
> > > I think we can just ignore sync_file. As far as it concerns me that UAPI 
> > > is
> > > pretty much dead.
> > Uh that's rather bold. Android is built on it. Currently atomic kms is
> > built on it.
> 
> To be honest I don't think we care about Android at all.

we = amd or we = upstream here?

> > > What we should support is drm_syncobj, but that also only as an in-fence
> > > since that's what our hardware supports.
> > Convince Android folks, minimally. Probably a lot more. Yes with hindsight
> > we should have just gone for drm_syncobj instead of the sync_file thing,
> > but hindsight and all that.
> > 
> > This is kinda why I don't think trying to support the existing uapi with
> > userspace fences underneath with some magic tricks is a good idea. It's
> > just a pile of work, plus it's not really architecturally clean.
> > 
> > > > Another one that looks very sketchy right now is buffer sharing between
> > > > different userspace drivers, like compute <-> media (if you have some
> > > > fancy AI pipeline in your media workload, as an example).
> > > Yeah, we are certainly going to get that. But only inside the same driver,
> > > so not much of a problem.
> > Why is this not much of a problem if it's just within one driver?
> 
> Because inside the same driver I can easily add the waits before submitting
> the MM work as necessary.

What is MM work here now?

> > > > > Adding implicit synchronization on top of that is then rather trivial.
> > > > Well that's what I disagree with, since I already see some problems 
> > > > that I
> > > > don't think we can overcome (the atomic ioctl is one). And that's with 
> > > > us
> > > > only having a fairly theoretical understanding of the overall situation.
> > > But how should we then ever support user fences with the atomic IOCTL?
> > > 
> > > We can't wait in user space since that will disable the support for 
> > > waiting
> > > in the hardware.
> > Well, figure it out :-)
> > 
> > This is exactly why I'm not seeing anything solved with just rolling a
> > function call to a bunch of places, because it's pretending all things are
> > solved when clearly that's not the case.
> > 
> > I really think what we need is to first figure out how to support
> > userspace fences as explicit entities across the stack, maybe with
> > something like this order:
> > 1. enable them purely within a single userspace driver (like vk with
> > winsys disabled, or something else like that except not amd because
> > there's this amdkfd split for "real" compute)
> > 1a. including atomic ioctl, e.g. for vk direct display support this can be
> > used without cross-process sharing, new winsys protocols and all that fun
> > 2. figure out how to transport these userspace fences with something like
> > drm_syncobj
> > 2a. figure out the compat story for drivers which dont do userspace fences
> > 2b. figure out how to absorb the overhead if the winsys/compositor doesn't
> > support explicit sync
> > 3. maybe figure out how to make this all happen magically with implicit
> > sync, if we really, really care
> > 
> > If we do 3 before we've nailed all these problems, we're just guaranteeing
> > we'll get the wrong solutions and so we'll then have 3 ways of doing
> > userspace fences
> > - the butchered implicit one that didn't quite work
> > - the explicit one
> > - the not-so-butchered implicit one with the lessons from the properly
> >done explicit one
> > 
> > The thing is, if you have no idea how to integrate userspace fences
> > explicitly into atomic ioctl, then you definitely have no idea how to do
> > it implicitly :-)
> 
> Well I agree on that. But the question is still how would you do explicit
> with atomic?

If you supply a userspace fence (is that what we call them now) as an
in-fence, then you're only allowed to get a userspace fence as out-fence.
That way we
- don't block anywhere we shouldn't
- don't create a dma_fence out of a userspace fence

The problem is this completely breaks your "magically make implicit
fencing with userspace fences" plan.

So I have a plan here, what was yours?

> Transporting fences between processes is not the fundamental problem here,
> but rather the question how we represent all this in the kernel?
> 
> In other words I think what you outlined above is just approaching it from
> the wrong side again. Instead of looking what the kernel needs to support
> this you take a look at userspace and the requirements there.

Uh ... that was my idea here? That's why I put "build userspace fences in
userspace only" as the very first thing. Then extend to winsys and
atomic/display and all these cases where things get more tricky.

I agree that transporting the fences is easy, which is 

[PATCH] drm/msm/dpu: fix smart dma support

2021-05-11 Thread Dmitry Baryshkov
Downstream driver uses dpu->caps->smart_dma_rev to update
sspp->cap->features with the bit corresponding to the supported SmartDMA
version. Upstream driver does not do this, resulting in SSPP subdriver
not enbaling setup_multirect callback. Make SSPP subdriver check global
smart_dma_rev to decide if setup_multirect should be enabled.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 10 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h | 16 
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c|  9 +
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
index b569030a0847..036334e3d99d 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
@@ -157,7 +157,7 @@ static const struct dpu_caps sdm845_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0xb,
.qseed_type = DPU_SSPP_SCALER_QSEED3,
-   .smart_dma_rev = DPU_SSPP_SMART_DMA_V2,
+   .smart_dma_rev = DPU_SMART_DMA_V2,
.ubwc_version = DPU_HW_UBWC_VER_20,
.has_src_split = true,
.has_dim_layer = true,
@@ -173,7 +173,7 @@ static const struct dpu_caps sc7180_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0x9,
.qseed_type = DPU_SSPP_SCALER_QSEED4,
-   .smart_dma_rev = DPU_SSPP_SMART_DMA_V2,
+   .smart_dma_rev = DPU_SMART_DMA_V2,
.ubwc_version = DPU_HW_UBWC_VER_20,
.has_dim_layer = true,
.has_idle_pc = true,
@@ -185,7 +185,7 @@ static const struct dpu_caps sm8150_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0xb,
.qseed_type = DPU_SSPP_SCALER_QSEED3,
-   .smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
+   .smart_dma_rev = DPU_SMART_DMA_V2, /* TODO: v2.5 */
.ubwc_version = DPU_HW_UBWC_VER_30,
.has_src_split = true,
.has_dim_layer = true,
@@ -201,7 +201,7 @@ static const struct dpu_caps sm8250_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0xb,
.qseed_type = DPU_SSPP_SCALER_QSEED3LITE,
-   .smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
+   .smart_dma_rev = DPU_SMART_DMA_V2, /* TODO: v2.5 */
.ubwc_version = DPU_HW_UBWC_VER_40,
.has_src_split = true,
.has_dim_layer = true,
@@ -215,7 +215,7 @@ static const struct dpu_caps sc7280_dpu_caps = {
.max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
.max_mixer_blendstages = 0x7,
.qseed_type = DPU_SSPP_SCALER_QSEED4,
-   .smart_dma_rev = DPU_SSPP_SMART_DMA_V2,
+   .smart_dma_rev = DPU_SMART_DMA_V2,
.ubwc_version = DPU_HW_UBWC_VER_30,
.has_dim_layer = true,
.has_idle_pc = true,
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
index 4dfd8a20ad5c..04ebccd92d4e 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
@@ -70,6 +70,18 @@ enum {
DPU_HW_UBWC_VER_40 = 0x400,
 };
 
+/**
+ * SmartDMA support
+ * @DPU_SMART_DMA_UNSUPPORTED,   SmartDMA not supported
+ * @DPU_SMART_DMA_V1,   SmartDMA 1.0 support
+ * @DPU_SMART_DMA_V2,   SmartDMA 2.0 support
+ */
+enum {
+   DPU_SMART_DMA_UNSUPPORTED,
+   DPU_SMART_DMA_V1,
+   DPU_SMART_DMA_V2,
+};
+
 /**
  * MDP TOP BLOCK features
  * @DPU_MDP_PANIC_PER_PIPE Panic configuration needs to be be done per pipe
@@ -104,8 +116,6 @@ enum {
  * @DPU_SSPP_QOS,SSPP support QoS control, danger/safe/creq
  * @DPU_SSPP_QOS_8LVL,   SSPP support 8-level QoS control
  * @DPU_SSPP_EXCL_RECT,  SSPP supports exclusion rect
- * @DPU_SSPP_SMART_DMA_V1,   SmartDMA 1.0 support
- * @DPU_SSPP_SMART_DMA_V2,   SmartDMA 2.0 support
  * @DPU_SSPP_TS_PREFILL  Supports prefill with traffic shaper
  * @DPU_SSPP_TS_PREFILL_REC1 Supports prefill with traffic shaper multirec
  * @DPU_SSPP_CDP Supports client driven prefetch
@@ -124,8 +134,6 @@ enum {
DPU_SSPP_QOS,
DPU_SSPP_QOS_8LVL,
DPU_SSPP_EXCL_RECT,
-   DPU_SSPP_SMART_DMA_V1,
-   DPU_SSPP_SMART_DMA_V2,
DPU_SSPP_TS_PREFILL,
DPU_SSPP_TS_PREFILL_REC1,
DPU_SSPP_CDP,
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
index 34d81aa16041..3ce4c5cd5d05 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
@@ -647,7 +647,8 @@ static void dpu_hw_sspp_setup_cdp(struct dpu_hw_pipe *ctx,
 }
 
 static void _setup_layer_ops(struct dpu_hw_pipe *c,
-   unsigned long features)
+   unsigned long features,
+   int smart_dma_rev)
 {
if 
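The change described in the commit message boils down to keying the multirect setup off the catalog-level SmartDMA revision instead of per-SSPP feature bits, roughly the following inside _setup_layer_ops() (a sketch; the callback name follows the existing SSPP code and is an assumption here):

    /* Enable multirect only when the catalog advertises a SmartDMA
     * revision, rather than per-SSPP feature bits. */
    if (smart_dma_rev == DPU_SMART_DMA_V1 ||
        smart_dma_rev == DPU_SMART_DMA_V2)
            c->ops.setup_multirect = dpu_hw_sspp_setup_multirect;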

[PATCH] drm/msm/dpu: simplify dpu_core_irq_en/disable helpers

2021-05-11 Thread Dmitry Baryshkov
dpu_core_irq_en/disable helpers are always called with the irq_count
equal to 1. Merge them with the _dpu_core_irq_en/disable functions and make them
handle just one interrupt index at a time.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c | 50 
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h | 20 
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c  |  4 +-
 3 files changed, 18 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
index c10761ea191c..0ee9ac21e24a 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
@@ -63,11 +63,11 @@ int dpu_core_irq_idx_lookup(struct dpu_kms *dpu_kms,
 }
 
 /**
- * _dpu_core_irq_enable - enable core interrupt given by the index
+ * dpu_core_irq_enable - enable core interrupt given by the index
  * @dpu_kms:   Pointer to dpu kms context
  * @irq_idx:   interrupt index
  */
-static int _dpu_core_irq_enable(struct dpu_kms *dpu_kms, int irq_idx)
+int dpu_core_irq_enable(struct dpu_kms *dpu_kms, int irq_idx)
 {
unsigned long irq_flags;
int ret = 0, enable_count;
@@ -85,6 +85,8 @@ static int _dpu_core_irq_enable(struct dpu_kms *dpu_kms, int 
irq_idx)
}
 
enable_count = atomic_read(&dpu_kms->irq_obj.enable_counts[irq_idx]);
+   if (enable_count)
+   DRM_ERROR("irq_idx=%d enable_count=%d\n", irq_idx, 
enable_count);
DRM_DEBUG_KMS("irq_idx=%d enable_count=%d\n", irq_idx, enable_count);
trace_dpu_core_irq_enable_idx(irq_idx, enable_count);
 
@@ -109,31 +111,12 @@ static int _dpu_core_irq_enable(struct dpu_kms *dpu_kms, 
int irq_idx)
return ret;
 }
 
-int dpu_core_irq_enable(struct dpu_kms *dpu_kms, int *irq_idxs, u32 irq_count)
-{
-   int i, ret = 0, counts;
-
-   if (!irq_idxs || !irq_count) {
-   DPU_ERROR("invalid params\n");
-   return -EINVAL;
-   }
-
-   counts = atomic_read(&dpu_kms->irq_obj.enable_counts[irq_idxs[0]]);
-   if (counts)
-   DRM_ERROR("irq_idx=%d enable_count=%d\n", irq_idxs[0], counts);
-
-   for (i = 0; (i < irq_count) && !ret; i++)
-   ret = _dpu_core_irq_enable(dpu_kms, irq_idxs[i]);
-
-   return ret;
-}
-
 /**
- * _dpu_core_irq_disable - disable core interrupt given by the index
+ * dpu_core_irq_disable - disable core interrupt given by the index
  * @dpu_kms:   Pointer to dpu kms context
  * @irq_idx:   interrupt index
  */
-static int _dpu_core_irq_disable(struct dpu_kms *dpu_kms, int irq_idx)
+int dpu_core_irq_disable(struct dpu_kms *dpu_kms, int irq_idx)
 {
int ret = 0, enable_count;
 
@@ -148,6 +131,8 @@ static int _dpu_core_irq_disable(struct dpu_kms *dpu_kms, 
int irq_idx)
}
 
enable_count = atomic_read(_kms->irq_obj.enable_counts[irq_idx]);
+   if (enable_count > 1)
+   DRM_ERROR("irq_idx=%d enable_count=%d\n", irq_idx, 
enable_count);
DRM_DEBUG_KMS("irq_idx=%d enable_count=%d\n", irq_idx, enable_count);
trace_dpu_core_irq_disable_idx(irq_idx, enable_count);
 
@@ -164,25 +149,6 @@ static int _dpu_core_irq_disable(struct dpu_kms *dpu_kms, 
int irq_idx)
return ret;
 }
 
-int dpu_core_irq_disable(struct dpu_kms *dpu_kms, int *irq_idxs, u32 irq_count)
-{
-   int i, ret = 0, counts;
-
-   if (!irq_idxs || !irq_count) {
-   DPU_ERROR("invalid params\n");
-   return -EINVAL;
-   }
-
-   counts = atomic_read(_kms->irq_obj.enable_counts[irq_idxs[0]]);
-   if (counts == 2)
-   DRM_ERROR("irq_idx=%d enable_count=%d\n", irq_idxs[0], counts);
-
-   for (i = 0; (i < irq_count) && !ret; i++)
-   ret = _dpu_core_irq_disable(dpu_kms, irq_idxs[i]);
-
-   return ret;
-}
-
 u32 dpu_core_irq_read(struct dpu_kms *dpu_kms, int irq_idx, bool clear)
 {
if (!dpu_kms->hw_intr)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
index e30775e6585b..2ac781738e83 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.h
@@ -43,34 +43,30 @@ int dpu_core_irq_idx_lookup(
uint32_t instance_idx);
 
 /**
- * dpu_core_irq_enable - IRQ helper function for enabling one or more IRQs
+ * dpu_core_irq_enable - IRQ helper function for enabling IRQ
  * @dpu_kms:   DPU handle
- * @irq_idxs:  Array of irq index
- * @irq_count: Number of irq_idx provided in the array
+ * @irq_idx:   irq index
  * @return:0 for success enabling IRQ, otherwise failure
  *
  * This function increments count on each enable and decrements on each
- * disable.  Interrupts is enabled if count is 0 before increment.
+ * disable.  Interrupt is enabled if count is 0 before increment.
  */
 int dpu_core_irq_enable(
struct dpu_kms *dpu_kms,
-  

Re: [PATCH 6/7] drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend

2021-05-11 Thread Christian König




Am 11.05.21 um 16:06 schrieb Thomas Hellström (Intel):


On 5/11/21 3:58 PM, Christian König wrote:

Am 11.05.21 um 15:25 schrieb Thomas Hellström:

Most logical place to introduce TTM buffer objects is as an i915
gem object backend. We need to add some ops to account for added
functionality like delayed delete and LRU list manipulation.

Initially we support only LMEM and SYSTEM memory, but SYSTEM
(which in this case means evicted LMEM objects) is not
visible to i915 GEM yet. The plan is to move the i915 gem system region
over to the TTM system memory type in upcoming patches.

We set up GPU bindings directly both from LMEM and from the system 
region,

as there is no need to use the legacy TTM_TT memory type. We reserve
that for future porting of GGTT bindings to TTM.

There are some changes to TTM to allow for purging system memory buffer
objects and to refuse swapping of some objects: Unfortunately i915 gem
still relies heavily on short-term object pinning, and we've chosen to
keep short-term-pinned buffer objects on the TTM LRU lists for now,
meaning that we need some sort of mechanism to tell TTM they are not
swappable. A longer term goal is to get rid of the short-term pinning.


Well just use the eviction_valuable interface for this.


Yes, we do that for vram/lmem eviction, but we have nothing similar 
for system swapping. Do I understand you correctly that you want me to 
add a call to eviction_valuable() also for that instead of 
swap_possible()?


You should already have that. eviction_valuable is called in both cases.






In general please make separate patches for the TTM changes and for 
the i915 changes using them for easier review.


I'll respin with a split. Do you want me to do the same also for the 
other two patches that minimally touch TTM?


Yes, that makes it much easier to review the general usefulness of 
interface changes.


Thanks,
Christian.



Thanks,

Thomas






[PATCH 1/2] drm/msm/dpu: simplify clocks handling

2021-05-11 Thread Dmitry Baryshkov
The DPU driver contains code to parse clock items from the device tree into
a special data struct and then enable/disable/set the rate of the clocks
using that data struct. However, the DPU driver itself uses only the parsing
and enabling/disabling parts (the rate setting is used by the DP driver).

Move this implementation to the DP driver (which actually uses the rate
setting) and replace the hand-coded enable/disable/get loops in the DPU
with the respective clk_bulk operations. The put operation is removed
completely because it is handled using devres instead.

DP implementation is unchanged for now.
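
For reference, the clk_bulk pattern that the DPU side moves to looks roughly
like this (a minimal sketch against the generic clk_bulk API, not code taken
from this patch; error handling shortened):

	struct clk_bulk_data *clocks;
	int num_clocks, ret;

	num_clocks = devm_clk_bulk_get_all(&pdev->dev, &clocks);
	if (num_clocks < 0)
		return num_clocks;

	ret = clk_bulk_prepare_enable(num_clocks, clocks);
	if (ret)
		return ret;

	/* ... hardware in use ... */

	clk_bulk_disable_unprepare(num_clocks, clocks);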

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/Makefile  |  2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 24 ++-
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.h |  6 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c   | 46 +++--
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   |  4 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c  | 26 +++
 .../dpu1/dpu_io_util.c => dp/dp_clk_util.c}   | 69 +--
 .../dpu1/dpu_io_util.h => dp/dp_clk_util.h}   |  2 -
 drivers/gpu/drm/msm/dp/dp_parser.h|  2 +-
 drivers/gpu/drm/msm/msm_drv.c | 49 +
 drivers/gpu/drm/msm/msm_drv.h |  1 +
 11 files changed, 84 insertions(+), 147 deletions(-)
 rename drivers/gpu/drm/msm/{disp/dpu1/dpu_io_util.c => dp/dp_clk_util.c} (61%)
 rename drivers/gpu/drm/msm/{disp/dpu1/dpu_io_util.h => dp/dp_clk_util.h} (92%)

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 610d630326bb..6621b75e3c7b 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -71,7 +71,6 @@ msm-y := \
disp/dpu1/dpu_hw_top.o \
disp/dpu1/dpu_hw_util.o \
disp/dpu1/dpu_hw_vbif.o \
-   disp/dpu1/dpu_io_util.o \
disp/dpu1/dpu_kms.o \
disp/dpu1/dpu_mdss.o \
disp/dpu1/dpu_plane.o \
@@ -104,6 +103,7 @@ msm-$(CONFIG_DRM_MSM_GPU_STATE) += 
adreno/a6xx_gpu_state.o
 
 msm-$(CONFIG_DRM_MSM_DP)+= dp/dp_aux.o \
dp/dp_catalog.o \
+   dp/dp_clk_util.o \
dp/dp_ctrl.o \
dp/dp_display.o \
dp/dp_drm.o \
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
index 7cba5bbdf4b7..ec3595b48bef 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
@@ -284,17 +284,6 @@ void dpu_core_perf_crtc_release_bw(struct drm_crtc *crtc)
}
 }
 
-static int _dpu_core_perf_set_core_clk_rate(struct dpu_kms *kms, u64 rate)
-{
-   struct dss_clk *core_clk = kms->perf.core_clk;
-
-   if (core_clk->max_rate && (rate > core_clk->max_rate))
-   rate = core_clk->max_rate;
-
-   core_clk->rate = rate;
-   return dev_pm_opp_set_rate(>pdev->dev, core_clk->rate);
-}
-
 static u64 _dpu_core_perf_get_core_clk_rate(struct dpu_kms *kms)
 {
u64 clk_rate = kms->perf.perf_tune.min_core_clk;
@@ -306,7 +295,7 @@ static u64 _dpu_core_perf_get_core_clk_rate(struct dpu_kms 
*kms)
dpu_cstate = to_dpu_crtc_state(crtc->state);
clk_rate = max(dpu_cstate->new_perf.core_clk_rate,
clk_rate);
-   clk_rate = clk_round_rate(kms->perf.core_clk->clk,
+   clk_rate = clk_round_rate(kms->perf.core_clk,
clk_rate);
}
}
@@ -405,10 +394,11 @@ int dpu_core_perf_crtc_update(struct drm_crtc *crtc,
 
trace_dpu_core_perf_update_clk(kms->dev, stop_req, clk_rate);
 
-   ret = _dpu_core_perf_set_core_clk_rate(kms, clk_rate);
+   if (clk_rate > kms->perf.max_core_clk_rate)
+   clk_rate = kms->perf.max_core_clk_rate;
+   ret = dev_pm_opp_set_rate(>pdev->dev, clk_rate);
if (ret) {
-   DPU_ERROR("failed to set %s clock rate %llu\n",
-   kms->perf.core_clk->clk_name, clk_rate);
+   DPU_ERROR("failed to set core clock rate %llu\n", 
clk_rate);
return ret;
}
 
@@ -529,13 +519,13 @@ void dpu_core_perf_destroy(struct dpu_core_perf *perf)
 int dpu_core_perf_init(struct dpu_core_perf *perf,
struct drm_device *dev,
struct dpu_mdss_cfg *catalog,
-   struct dss_clk *core_clk)
+   struct clk *core_clk)
 {
perf->dev = dev;
perf->catalog = catalog;
perf->core_clk = core_clk;
 
-   perf->max_core_clk_rate = core_clk->max_rate;
+   perf->max_core_clk_rate = clk_get_rate(core_clk);
if (!perf->max_core_clk_rate) {
DPU_DEBUG("optional max core clk rate, use default\n");
perf->max_core_clk_rate = DPU_PERF_DEFAULT_MAX_CORE_CLK_RATE;
diff --git 

[PATCH 0/2] drm/msm: rework clock handling

2021-05-11 Thread Dmitry Baryshkov
The msm_dss_clk_*() functions significantly duplicate the clk_bulk_*() family
of functions. Drop the custom code and use the bulk clock API directly.


Dmitry Baryshkov (2):
  drm/msm/dpu: simplify clocks handling
  drm/msm/dp: rewrite dss_module_power to use bulk clock functions

 drivers/gpu/drm/msm/Makefile  |   1 -
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c |  24 +---
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.h |   6 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c   | 187 --
 drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.h   |  40 --
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c   |  46 ++-
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h   |   4 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c  |  26 ++--
 drivers/gpu/drm/msm/dp/dp_ctrl.c  |  19 ++-
 drivers/gpu/drm/msm/dp/dp_parser.c|  21 ++-
 drivers/gpu/drm/msm/dp/dp_parser.h|  17 ++-
 drivers/gpu/drm/msm/dp/dp_power.c |  81 ++-
 drivers/gpu/drm/msm/msm_drv.c |  49 +++
 drivers/gpu/drm/msm/msm_drv.h |   1 +
 14 files changed, 164 insertions(+), 358 deletions(-)
 delete mode 100644 drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.c
 delete mode 100644 drivers/gpu/drm/msm/disp/dpu1/dpu_io_util.h




[PATCH 2/2] drm/msm/dp: rewrite dss_module_power to use bulk clock functions

2021-05-11 Thread Dmitry Baryshkov
In order to simplify the DP code, drop the hand-coded loops over clock arrays,
replacing them with the clk_bulk_*() functions.

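As a rough sketch of the direction (illustrative only: the clock names and the
rate below are made up, not taken from this patch), the DP code can keep named
clocks in a clk_bulk_data array and still set the rate on individual entries:

	static const char * const dp_clk_names[] = { "core_iface", "core_aux" };
	struct clk_bulk_data dp_clks[ARRAY_SIZE(dp_clk_names)];
	int i, ret;

	for (i = 0; i < ARRAY_SIZE(dp_clk_names); i++)
		dp_clks[i].id = dp_clk_names[i];

	ret = devm_clk_bulk_get(dev, ARRAY_SIZE(dp_clk_names), dp_clks);
	if (ret)
		return ret;

	/* rate setting still operates on a single bulk entry when needed */
	ret = clk_set_rate(dp_clks[0].clk, 19200000);
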
Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/Makefile |   1 -
 drivers/gpu/drm/msm/dp/dp_clk_util.c | 120 ---
 drivers/gpu/drm/msm/dp/dp_clk_util.h |  38 -
 drivers/gpu/drm/msm/dp/dp_ctrl.c |  19 ++---
 drivers/gpu/drm/msm/dp/dp_parser.c   |  21 -
 drivers/gpu/drm/msm/dp/dp_parser.h   |  17 +++-
 drivers/gpu/drm/msm/dp/dp_power.c|  81 +-
 7 files changed, 83 insertions(+), 214 deletions(-)
 delete mode 100644 drivers/gpu/drm/msm/dp/dp_clk_util.c
 delete mode 100644 drivers/gpu/drm/msm/dp/dp_clk_util.h

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 6621b75e3c7b..0c1a559dd2fc 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -103,7 +103,6 @@ msm-$(CONFIG_DRM_MSM_GPU_STATE) += 
adreno/a6xx_gpu_state.o
 
 msm-$(CONFIG_DRM_MSM_DP)+= dp/dp_aux.o \
dp/dp_catalog.o \
-   dp/dp_clk_util.o \
dp/dp_ctrl.o \
dp/dp_display.o \
dp/dp_drm.o \
diff --git a/drivers/gpu/drm/msm/dp/dp_clk_util.c 
b/drivers/gpu/drm/msm/dp/dp_clk_util.c
deleted file mode 100644
index 44a4fc59ff31..
--- a/drivers/gpu/drm/msm/dp/dp_clk_util.c
+++ /dev/null
@@ -1,120 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/* Copyright (c) 2012-2015, 2017-2018, The Linux Foundation.
- * All rights reserved.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
-#include "dp_clk_util.h"
-
-void msm_dss_put_clk(struct dss_clk *clk_arry, int num_clk)
-{
-   int i;
-
-   for (i = num_clk - 1; i >= 0; i--) {
-   if (clk_arry[i].clk)
-   clk_put(clk_arry[i].clk);
-   clk_arry[i].clk = NULL;
-   }
-}
-
-int msm_dss_get_clk(struct device *dev, struct dss_clk *clk_arry, int num_clk)
-{
-   int i, rc = 0;
-
-   for (i = 0; i < num_clk; i++) {
-   clk_arry[i].clk = clk_get(dev, clk_arry[i].clk_name);
-   rc = PTR_ERR_OR_ZERO(clk_arry[i].clk);
-   if (rc) {
-   DEV_ERR("%pS->%s: '%s' get failed. rc=%d\n",
-   __builtin_return_address(0), __func__,
-   clk_arry[i].clk_name, rc);
-   goto error;
-   }
-   }
-
-   return rc;
-
-error:
-   for (i--; i >= 0; i--) {
-   if (clk_arry[i].clk)
-   clk_put(clk_arry[i].clk);
-   clk_arry[i].clk = NULL;
-   }
-
-   return rc;
-}
-
-int msm_dss_clk_set_rate(struct dss_clk *clk_arry, int num_clk)
-{
-   int i, rc = 0;
-
-   for (i = 0; i < num_clk; i++) {
-   if (clk_arry[i].clk) {
-   if (clk_arry[i].type != DSS_CLK_AHB) {
-   DEV_DBG("%pS->%s: '%s' rate %ld\n",
-   __builtin_return_address(0), __func__,
-   clk_arry[i].clk_name,
-   clk_arry[i].rate);
-   rc = clk_set_rate(clk_arry[i].clk,
-   clk_arry[i].rate);
-   if (rc) {
-   DEV_ERR("%pS->%s: %s failed. rc=%d\n",
-   __builtin_return_address(0),
-   __func__,
-   clk_arry[i].clk_name, rc);
-   break;
-   }
-   }
-   } else {
-   DEV_ERR("%pS->%s: '%s' is not available\n",
-   __builtin_return_address(0), __func__,
-   clk_arry[i].clk_name);
-   rc = -EPERM;
-   break;
-   }
-   }
-
-   return rc;
-}
-
-int msm_dss_enable_clk(struct dss_clk *clk_arry, int num_clk, int enable)
-{
-   int i, rc = 0;
-
-   if (enable) {
-   for (i = 0; i < num_clk; i++) {
-   DEV_DBG("%pS->%s: enable '%s'\n",
-   __builtin_return_address(0), __func__,
-   clk_arry[i].clk_name);
-   rc = clk_prepare_enable(clk_arry[i].clk);
-   if (rc)
-   DEV_ERR("%pS->%s: %s en fail. rc=%d\n",
-   __builtin_return_address(0),
-   __func__,
-   clk_arry[i].clk_name, rc);
-
-   if (rc && i) {
-   msm_dss_enable_clk(_arry[i - 1],
-   i - 1, false);
-   break;
-   }

Re: [PATCH 6/7] drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend

2021-05-11 Thread Intel



On 5/11/21 3:58 PM, Christian König wrote:

Am 11.05.21 um 15:25 schrieb Thomas Hellström:

Most logical place to introduce TTM buffer objects is as an i915
gem object backend. We need to add some ops to account for added
functionality like delayed delete and LRU list manipulation.

Initially we support only LMEM and SYSTEM memory, but SYSTEM
(which in this case means evicted LMEM objects) is not
visible to i915 GEM yet. The plan is to move the i915 gem system region
over to the TTM system memory type in upcoming patches.

We set up GPU bindings directly both from LMEM and from the system 
region,

as there is no need to use the legacy TTM_TT memory type. We reserve
that for future porting of GGTT bindings to TTM.

There are some changes to TTM to allow for purging system memory buffer
objects and to refuse swapping of some objects: Unfortunately i915 gem
still relies heavily on short-term object pinning, and we've chosen to
keep short-term-pinned buffer objects on the TTM LRU lists for now,
meaning that we need some sort of mechanism to tell TTM they are not
swappable. A longer term goal is to get rid of the short-term pinning.


Well just use the eviction_valuable interface for this.


Yes, we do that for vram/lmem eviction, but we have nothing similar for 
system swapping. Do I understand you correctly that you want me to add a 
call to eviction_valuable() also for that instead of swap_possible()?





In general please make separate patches for the TTM changes and for 
the i915 changes using them for easier review.


I'll respin with a split. Do you want me to do the same also for the 
other two patches that minimally touch TTM?


Thanks,

Thomas




Re: [PATCH 6/7] drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend

2021-05-11 Thread Christian König

Am 11.05.21 um 15:25 schrieb Thomas Hellström:

Most logical place to introduce TTM buffer objects is as an i915
gem object backend. We need to add some ops to account for added
functionality like delayed delete and LRU list manipulation.

Initially we support only LMEM and SYSTEM memory, but SYSTEM
(which in this case means evicted LMEM objects) is not
visible to i915 GEM yet. The plan is to move the i915 gem system region
over to the TTM system memory type in upcoming patches.

We set up GPU bindings directly both from LMEM and from the system region,
as there is no need to use the legacy TTM_TT memory type. We reserve
that for future porting of GGTT bindings to TTM.

There are some changes to TTM to allow for purging system memory buffer
objects and to refuse swapping of some objects: Unfortunately i915 gem
still relies heavily on short-term object pinning, and we've chosen to
keep short-term-pinned buffer objects on the TTM LRU lists for now,
meaning that we need some sort of mechanism to tell TTM they are not
swappable. A longer term goal is to get rid of the short-term pinning.


Well just use the eviction_valuable interface for this.

In general please make separate patches for the TTM changes and for the 
i915 changes using them for easier review.


Christian.



Remove the old lmem backend.

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
  drivers/gpu/drm/i915/Makefile |   1 +
  drivers/gpu/drm/i915/gem/i915_gem_lmem.c  |  83 ---
  drivers/gpu/drm/i915/gem/i915_gem_lmem.h  |   5 -
  drivers/gpu/drm/i915/gem/i915_gem_object.c| 126 +++--
  drivers/gpu/drm/i915/gem/i915_gem_object.h|   9 +
  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +
  drivers/gpu/drm/i915/gem/i915_gem_region.c|   6 +-
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 534 ++
  drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |  48 ++
  drivers/gpu/drm/i915/gt/intel_region_lmem.c   |   3 +-
  drivers/gpu/drm/i915/i915_gem.c   |   5 +-
  drivers/gpu/drm/i915/intel_memory_region.c|   1 -
  drivers/gpu/drm/i915/intel_memory_region.h|   1 -
  drivers/gpu/drm/i915/intel_region_ttm.c   |   5 +-
  drivers/gpu/drm/i915/intel_region_ttm.h   |   7 +-
  drivers/gpu/drm/ttm/ttm_bo.c  |  12 +
  include/drm/ttm/ttm_device.h  |   9 +
  17 files changed, 733 insertions(+), 140 deletions(-)
  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.c
  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 958ccc1edfed..ef0d884a9e2d 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -155,6 +155,7 @@ gem-y += \
gem/i915_gem_stolen.o \
gem/i915_gem_throttle.o \
gem/i915_gem_tiling.o \
+   gem/i915_gem_ttm.o \
gem/i915_gem_ttm_bo_util.o \
gem/i915_gem_userptr.o \
gem/i915_gem_wait.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index f42803ea48f2..2b8cd15de1d9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -4,73 +4,10 @@
   */
  
  #include "intel_memory_region.h"

-#include "intel_region_ttm.h"
  #include "gem/i915_gem_region.h"
  #include "gem/i915_gem_lmem.h"
  #include "i915_drv.h"
  
-static void lmem_put_pages(struct drm_i915_gem_object *obj,

- struct sg_table *pages)
-{
-   intel_region_ttm_node_free(obj->mm.region, obj->mm.st_mm_node);
-   obj->mm.dirty = false;
-   sg_free_table(pages);
-   kfree(pages);
-}
-
-static int lmem_get_pages(struct drm_i915_gem_object *obj)
-{
-   unsigned int flags;
-   struct sg_table *pages;
-
-   flags = I915_ALLOC_MIN_PAGE_SIZE;
-   if (obj->flags & I915_BO_ALLOC_CONTIGUOUS)
-   flags |= I915_ALLOC_CONTIGUOUS;
-
-   obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region,
-obj->base.size,
-flags);
-   if (IS_ERR(obj->mm.st_mm_node))
-   return PTR_ERR(obj->mm.st_mm_node);
-
-   /* Range manager is always contigous */
-   if (obj->mm.region->is_range_manager)
-   obj->flags |= I915_BO_ALLOC_CONTIGUOUS;
-   pages = intel_region_ttm_node_to_st(obj->mm.region, obj->mm.st_mm_node);
-   if (IS_ERR(pages))
-   return PTR_ERR(pages);
-
-   __i915_gem_object_set_pages(obj, pages,
-   i915_sg_dma_page_sizes(pages->sgl));
-
-   if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) {
-   void __iomem *vaddr =
-   i915_gem_object_lmem_io_map(obj, 0, obj->base.size);
-
-   if (!vaddr) {
-   struct sg_table *pages =
-   

Re: [PATCH 0/2] drm/qxl: two one-liner fixes.

2021-05-11 Thread Thomas Zimmermann



Am 11.05.21 um 12:45 schrieb Gerd Hoffmann:



Gerd Hoffmann (2):
   drm/qxl: drop redundant code
   drm/qxl: balance dumb_shadow_bo pin

  drivers/gpu/drm/qxl/qxl_display.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)



Acked-by: Thomas Zimmermann 

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer





[PATCH v2 1/1] drm/mediatek: Remove redundant error printing

2021-05-11 Thread Zhen Lei
When devm_ioremap_resource() fails, its helper __devm_ioremap_resource()
already prints a clear error message containing the device name, the cause
of the failure, and possibly the resource information.

Therefore, remove the redundant error printing here to simplify the code and
reduce the binary size.

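The resulting idiom, taken from the hunks below, is simply:

	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	priv->regs = devm_ioremap_resource(dev, res);
	if (IS_ERR(priv->regs))
		return PTR_ERR(priv->regs);

(The two calls could arguably be collapsed further with the existing
devm_platform_ioremap_resource() helper, but that would be a separate cleanup.)
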
Reported-by: Hulk Robot 
Signed-off-by: Zhen Lei 
---
 drivers/gpu/drm/mediatek/mtk_cec.c| 7 ++-
 drivers/gpu/drm/mediatek/mtk_disp_ccorr.c | 4 +---
 drivers/gpu/drm/mediatek/mtk_disp_ovl.c   | 4 +---
 drivers/gpu/drm/mediatek/mtk_disp_rdma.c  | 4 +---
 drivers/gpu/drm/mediatek/mtk_dpi.c| 7 ++-
 drivers/gpu/drm/mediatek/mtk_dsi.c| 1 -
 6 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_cec.c 
b/drivers/gpu/drm/mediatek/mtk_cec.c
index e9cef5c0c8f7eff..c47b54936cfa6b8 100644
--- a/drivers/gpu/drm/mediatek/mtk_cec.c
+++ b/drivers/gpu/drm/mediatek/mtk_cec.c
@@ -195,11 +195,8 @@ static int mtk_cec_probe(struct platform_device *pdev)
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
cec->regs = devm_ioremap_resource(dev, res);
-   if (IS_ERR(cec->regs)) {
-   ret = PTR_ERR(cec->regs);
-   dev_err(dev, "Failed to ioremap cec: %d\n", ret);
-   return ret;
-   }
+   if (IS_ERR(cec->regs))
+   return PTR_ERR(cec->regs);
 
cec->clk = devm_clk_get(dev, NULL);
if (IS_ERR(cec->clk)) {
diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ccorr.c 
b/drivers/gpu/drm/mediatek/mtk_disp_ccorr.c
index 141cb36b9c07b74..2b9923e5c6382f7 100644
--- a/drivers/gpu/drm/mediatek/mtk_disp_ccorr.c
+++ b/drivers/gpu/drm/mediatek/mtk_disp_ccorr.c
@@ -173,10 +173,8 @@ static int mtk_disp_ccorr_probe(struct platform_device 
*pdev)
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
priv->regs = devm_ioremap_resource(dev, res);
-   if (IS_ERR(priv->regs)) {
-   dev_err(dev, "failed to ioremap ccorr\n");
+   if (IS_ERR(priv->regs))
return PTR_ERR(priv->regs);
-   }
 
 #if IS_REACHABLE(CONFIG_MTK_CMDQ)
ret = cmdq_dev_get_client_reg(dev, >cmdq_reg, 0);
diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c 
b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
index 961f87f8d4d156f..48927135c247537 100644
--- a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
+++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
@@ -395,10 +395,8 @@ static int mtk_disp_ovl_probe(struct platform_device *pdev)
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
priv->regs = devm_ioremap_resource(dev, res);
-   if (IS_ERR(priv->regs)) {
-   dev_err(dev, "failed to ioremap ovl\n");
+   if (IS_ERR(priv->regs))
return PTR_ERR(priv->regs);
-   }
 #if IS_REACHABLE(CONFIG_MTK_CMDQ)
ret = cmdq_dev_get_client_reg(dev, >cmdq_reg, 0);
if (ret)
diff --git a/drivers/gpu/drm/mediatek/mtk_disp_rdma.c 
b/drivers/gpu/drm/mediatek/mtk_disp_rdma.c
index 728aaadfea8cfcc..e8d31b4c12b7727 100644
--- a/drivers/gpu/drm/mediatek/mtk_disp_rdma.c
+++ b/drivers/gpu/drm/mediatek/mtk_disp_rdma.c
@@ -294,10 +294,8 @@ static int mtk_disp_rdma_probe(struct platform_device 
*pdev)
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
priv->regs = devm_ioremap_resource(dev, res);
-   if (IS_ERR(priv->regs)) {
-   dev_err(dev, "failed to ioremap rdma\n");
+   if (IS_ERR(priv->regs))
return PTR_ERR(priv->regs);
-   }
 #if IS_REACHABLE(CONFIG_MTK_CMDQ)
ret = cmdq_dev_get_client_reg(dev, >cmdq_reg, 0);
if (ret)
diff --git a/drivers/gpu/drm/mediatek/mtk_dpi.c 
b/drivers/gpu/drm/mediatek/mtk_dpi.c
index bea91c81626e154..f8020bc046cb63f 100644
--- a/drivers/gpu/drm/mediatek/mtk_dpi.c
+++ b/drivers/gpu/drm/mediatek/mtk_dpi.c
@@ -741,11 +741,8 @@ static int mtk_dpi_probe(struct platform_device *pdev)
}
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
dpi->regs = devm_ioremap_resource(dev, mem);
-   if (IS_ERR(dpi->regs)) {
-   ret = PTR_ERR(dpi->regs);
-   dev_err(dev, "Failed to ioremap mem resource: %d\n", ret);
-   return ret;
-   }
+   if (IS_ERR(dpi->regs))
+   return PTR_ERR(dpi->regs);
 
dpi->engine_clk = devm_clk_get(dev, "engine");
if (IS_ERR(dpi->engine_clk)) {
diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c 
b/drivers/gpu/drm/mediatek/mtk_dsi.c
index ae403c67cbd922d..89e351dfab88177 100644
--- a/drivers/gpu/drm/mediatek/mtk_dsi.c
+++ b/drivers/gpu/drm/mediatek/mtk_dsi.c
@@ -1062,7 +1062,6 @@ static int mtk_dsi_probe(struct platform_device *pdev)
dsi->regs = devm_ioremap_resource(dev, regs);
if (IS_ERR(dsi->regs)) {
ret = PTR_ERR(dsi->regs);
-   dev_err(dev, "Failed to ioremap memory: %d\n", ret);
goto err_unregister_host;
}
 
-- 
2.26.0.106.g9fadedd




[PATCH v2 0/1] drm/mediatek: Remove redundant error printing

2021-05-11 Thread Zhen Lei
v1 --> v2:
1. Combine the modifications of several drm/mediatek files into one patch.
2. According to Baruch Siach's review comment, simplify the following code 
snippets:
   -ret = PTR_ERR(cec->regs);
   -return ret;
   +return PTR_ERR(cec->regs);

Zhen Lei (1):
  drm/mediatek: Remove redundant error printing

 drivers/gpu/drm/mediatek/mtk_cec.c| 7 ++-
 drivers/gpu/drm/mediatek/mtk_disp_ccorr.c | 4 +---
 drivers/gpu/drm/mediatek/mtk_disp_ovl.c   | 4 +---
 drivers/gpu/drm/mediatek/mtk_disp_rdma.c  | 4 +---
 drivers/gpu/drm/mediatek/mtk_dpi.c| 7 ++-
 drivers/gpu/drm/mediatek/mtk_dsi.c| 1 -
 6 files changed, 7 insertions(+), 20 deletions(-)

-- 
2.26.0.106.g9fadedd




Re: [PATCH] component: Move host device to end of device lists on binding

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki  wrote:
>
> On Mon, May 10, 2021 at 9:08 PM Stephen Boyd  wrote:
>
> [cut]
>
> >
> > >
> > > > I will try it, but then I wonder about things like system wide
> > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > active.
> > >
> > > Yup suspend/resume has the exact same problem as shutdown.
> >
> > I think suspend/resume has the exact opposite problem. At least I think
> > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > like is happening today. It looks like drm_atomic_helper_shutdown()
> > operates from the top down when we want bottom up? I admit I have no
> > idea what is supposed to happen here.
>
> Why would the system-wide suspend ordering be different from the
> shutdown ordering?

At least my point was that both shutdown and suspend/resume have the
same problem, and the right fix is (I think at least) to add these
hooks to the component.c aggregate ops structure. Hence just adding
new callbacks for shutdown will be an incomplete solution.

I don't feel like changing the global device order is the right
approach, since essentially that's what component was meant to fix.
Except it's incomplete since it only provides a solution for
bind/unbind and not for shutdown or suspend/resume as other global
state changes. I think some drivers "fixed" this by putting stuff like
drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
make sure that everything is ready with that trick. But that doesn't
compose very well :-/
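
To make the idea concrete, a purely illustrative sketch of the kind of
aggregate-ops extension being discussed; component.c only provides bind/unbind
today, and the extra callbacks below are hypothetical:

	struct component_master_ops {
		int (*bind)(struct device *master);
		void (*unbind)(struct device *master);
		/* hypothetical additions for global state changes */
		void (*shutdown)(struct device *master);
		int (*suspend)(struct device *master);
		int (*resume)(struct device *master);
	};
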
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH 7/7] drm/i915/lmem: Verify checks for lmem residency

2021-05-11 Thread Thomas Hellström
Since objects can be migrated or evicted when not pinned or locked,
update the checks for lmem residency or future residency so that
the value returned is not immediately stale.

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/display/intel_display.c |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 42 +++-
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 29 ++
 drivers/gpu/drm/i915/gem/i915_gem_object.h   |  4 ++
 4 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index de1f13d203b5..b95def2d5af3 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -11615,7 +11615,7 @@ intel_user_framebuffer_create(struct drm_device *dev,
 
/* object is backed with LMEM for discrete */
i915 = to_i915(obj->base.dev);
-   if (HAS_LMEM(i915) && !i915_gem_object_is_lmem(obj)) {
+   if (HAS_LMEM(i915) && !i915_gem_object_validates_to_lmem(obj)) {
/* object is "remote", not in local memory */
i915_gem_object_put(obj);
return ERR_PTR(-EREMOTE);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index 2b8cd15de1d9..d539dffa1554 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -23,10 +23,50 @@ i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj,
return io_mapping_map_wc(>mm.region->iomap, offset, size);
 }
 
+/**
+ * i915_gem_object_validates_to_lmem - Whether the object is resident in
+ * lmem when pages are present.
+ * @obj: The object to check.
+ *
+ * Migratable objects residency may change from under us if the object is
+ * not pinned or locked. This function is intended to be used to check whether
+ * the object can only reside in lmem when pages are present.
+ *
+ * Return: Whether the object is always resident in lmem when pages are
+ * present.
+ */
+bool i915_gem_object_validates_to_lmem(struct drm_i915_gem_object *obj)
+{
+   struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+
+   return !i915_gem_object_migratable(obj) &&
+   mr && (mr->type == INTEL_MEMORY_LOCAL ||
+  mr->type == INTEL_MEMORY_STOLEN_LOCAL);
+}
+
+/**
+ * i915_gem_object_is_lmem - Whether the object is resident in
+ * lmem
+ * @obj: The object to check.
+ *
+ * Even if an object is allowed to migrate and change memory region,
+ * this function checks whether it will always be present in lmem when
+ * valid *or* if that's not the case, whether it's currently resident in lmem.
+ * For migratable and evictable objects, the latter only makes sense when
+ * the object is locked.
+ *
+ * Return: Whether the object migratable but resident in lmem, or not
+ * migratable and will be present in lmem when valid.
+ */
 bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj)
 {
-   struct intel_memory_region *mr = obj->mm.region;
+   struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
 
+#ifdef CONFIG_LOCKDEP
+   if (i915_gem_object_migratable(obj) &&
+   i915_gem_object_evictable(obj))
+   assert_object_held(obj);
+#endif
return mr && (mr->type == INTEL_MEMORY_LOCAL ||
  mr->type == INTEL_MEMORY_STOLEN_LOCAL);
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index c53488f391dd..0475b1c94454 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -458,6 +458,35 @@ bool i915_gem_object_evictable(struct drm_i915_gem_object 
*obj)
return pin_count == 0;
 }
 
+/**
+ * i915_gem_object_migratable - Whether the object is migratable out of the
+ * current region.
+ * @obj: Pointer to the object.
+ *
+ * Return: Whether the object is allowed to be resident in other
+ * regions than the current while pages are present.
+ */
+bool i915_gem_object_migratable(struct drm_i915_gem_object *obj)
+{
+   struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+   struct intel_memory_region *placement;
+   int i;
+
+   if (!mr)
+   return false;
+
+   if (!obj->mm.n_placements)
+   return false;
+
+   for (i = 0; i < obj->mm.n_placements; ++i) {
+   placement = obj->mm.placements[i];
+   if (placement != mr)
+   return true;
+   }
+
+   return false;
+}
+
 void i915_gem_init__objects(struct drm_i915_private *i915)
 {
INIT_WORK(>mm.free_work, __i915_gem_free_work);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index ae5930e307d5..a3ad8cf4eefd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -596,6 +596,10 @@ void 

[PATCH 6/7] drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend

2021-05-11 Thread Thomas Hellström
Most logical place to introduce TTM buffer objects is as an i915
gem object backend. We need to add some ops to account for added
functionality like delayed delete and LRU list manipulation.

Initially we support only LMEM and SYSTEM memory, but SYSTEM
(which in this case means evicted LMEM objects) is not
visible to i915 GEM yet. The plan is to move the i915 gem system region
over to the TTM system memory type in upcoming patches.

We set up GPU bindings directly both from LMEM and from the system region,
as there is no need to use the legacy TTM_TT memory type. We reserve
that for future porting of GGTT bindings to TTM.

There are some changes to TTM to allow for purging system memory buffer
objects and to refuse swapping of some objects: Unfortunately i915 gem
still relies heavily on short-term object pinning, and we've chosen to
keep short-term-pinned buffer objects on the TTM LRU lists for now,
meaning that we need some sort of mechanism to tell TTM they are not
swappable. A longer term goal is to get rid of the short-term pinning.

Remove the old lmem backend.
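
As a rough sketch of the direction suggested in review for the swap refusal
(expressing it via the driver's eviction_valuable hook rather than a separate
flag; the function names below are assumptions, not part of this patch):

	static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
					       const struct ttm_place *place)
	{
		/* i915_ttm_to_gem() is an assumed bo-to-gem helper */
		struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);

		/* short-term-pinned objects must not be evicted or swapped */
		if (!i915_gem_object_evictable(obj))
			return false;

		return ttm_bo_eviction_valuable(bo, place);
	}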

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/Makefile |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c  |  83 ---
 drivers/gpu/drm/i915/gem/i915_gem_lmem.h  |   5 -
 drivers/gpu/drm/i915/gem/i915_gem_object.c| 126 +++--
 drivers/gpu/drm/i915/gem/i915_gem_object.h|   9 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +
 drivers/gpu/drm/i915/gem/i915_gem_region.c|   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 534 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |  48 ++
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |   3 +-
 drivers/gpu/drm/i915/i915_gem.c   |   5 +-
 drivers/gpu/drm/i915/intel_memory_region.c|   1 -
 drivers/gpu/drm/i915/intel_memory_region.h|   1 -
 drivers/gpu/drm/i915/intel_region_ttm.c   |   5 +-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   7 +-
 drivers/gpu/drm/ttm/ttm_bo.c  |  12 +
 include/drm/ttm/ttm_device.h  |   9 +
 17 files changed, 733 insertions(+), 140 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 958ccc1edfed..ef0d884a9e2d 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -155,6 +155,7 @@ gem-y += \
gem/i915_gem_stolen.o \
gem/i915_gem_throttle.o \
gem/i915_gem_tiling.o \
+   gem/i915_gem_ttm.o \
gem/i915_gem_ttm_bo_util.o \
gem/i915_gem_userptr.o \
gem/i915_gem_wait.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index f42803ea48f2..2b8cd15de1d9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -4,73 +4,10 @@
  */
 
 #include "intel_memory_region.h"
-#include "intel_region_ttm.h"
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_lmem.h"
 #include "i915_drv.h"
 
-static void lmem_put_pages(struct drm_i915_gem_object *obj,
- struct sg_table *pages)
-{
-   intel_region_ttm_node_free(obj->mm.region, obj->mm.st_mm_node);
-   obj->mm.dirty = false;
-   sg_free_table(pages);
-   kfree(pages);
-}
-
-static int lmem_get_pages(struct drm_i915_gem_object *obj)
-{
-   unsigned int flags;
-   struct sg_table *pages;
-
-   flags = I915_ALLOC_MIN_PAGE_SIZE;
-   if (obj->flags & I915_BO_ALLOC_CONTIGUOUS)
-   flags |= I915_ALLOC_CONTIGUOUS;
-
-   obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region,
-obj->base.size,
-flags);
-   if (IS_ERR(obj->mm.st_mm_node))
-   return PTR_ERR(obj->mm.st_mm_node);
-
-   /* Range manager is always contigous */
-   if (obj->mm.region->is_range_manager)
-   obj->flags |= I915_BO_ALLOC_CONTIGUOUS;
-   pages = intel_region_ttm_node_to_st(obj->mm.region, obj->mm.st_mm_node);
-   if (IS_ERR(pages))
-   return PTR_ERR(pages);
-
-   __i915_gem_object_set_pages(obj, pages,
-   i915_sg_dma_page_sizes(pages->sgl));
-
-   if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) {
-   void __iomem *vaddr =
-   i915_gem_object_lmem_io_map(obj, 0, obj->base.size);
-
-   if (!vaddr) {
-   struct sg_table *pages =
-   __i915_gem_object_unset_pages(obj);
-
-   if (!IS_ERR_OR_NULL(pages))
-   lmem_put_pages(obj, pages);
-   }
-
-   memset_io(vaddr, 0, obj->base.size);
-   io_mapping_unmap(vaddr);
-   }
-
-   return 0;
-}
-

[PATCH 5/7] drm/i915/ttm, drm/ttm: Add a generic TTM memcpy move for page-based iomem

2021-05-11 Thread Thomas Hellström
The internal ttm_bo_util memcpy uses vmap functionality, and while it
might be possible to use it for copying in and out of sglist-represented
io memory, using the io_mem_reserve() / io_mem_free() callbacks, that
would cause problems with fault().
Instead, implement a method that maps page-by-page using kmap_local()
semantics. As an additional benefit we then avoid the occasional global
TLB flushes of vmap() and the consumption of vmap space, eliminate a
critical point of failure, and with a slight change of semantics we could
also push the memcpy out async for testing and async driver development
purposes.
Pushing out async can be done since there is no memory allocation going on
that could violate the dma_fence lockdep rules.

Note that drivers that don't want to use struct io_mapping but rely on
memremap functionality, and that don't want to use scatterlists for
VRAM, may well define specialized (hopefully reusable) iterators for their
particular environment.

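The core of the page-by-page approach boils down to something like the
following (heavily simplified sketch; the patch hides the mapping behind
kmap-iterator ops so that the iomem side can use io_mapping_map_local_wc()
instead of kmap_local_page()):

	for (i = 0; i < num_pages; ++i) {
		void *dst = kmap_local_page(dst_pages[i]);
		void *src = kmap_local_page(src_pages[i]);

		memcpy(dst, src, PAGE_SIZE);

		kunmap_local(src);
		kunmap_local(dst);
	}
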
Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/Makefile |   1 +
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.c   | 155 ++
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.h   | 141 
 drivers/gpu/drm/ttm/ttm_bo.c  |   1 +
 4 files changed, 298 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index cb8823570996..958ccc1edfed 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -155,6 +155,7 @@ gem-y += \
gem/i915_gem_stolen.o \
gem/i915_gem_throttle.o \
gem/i915_gem_tiling.o \
+   gem/i915_gem_ttm_bo_util.o \
gem/i915_gem_userptr.o \
gem/i915_gem_wait.o \
gem/i915_gemfs.o
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
new file mode 100644
index ..1116d7df1461
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+/**
+ * DOC: Usage and intentions.
+ *
+ * This file contains functionality that we might want to move into
+ * ttm_bo_util.c if there is a common interest.
+ * Currently a kmap_local only memcpy with support for page-based iomem 
regions,
+ * and fast memcpy from write-combined memory.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "i915_memcpy.h"
+
+#include "gem/i915_gem_ttm_bo_util.h"
+
+static void i915_ttm_kmap_iter_tt_kmap_local(struct i915_ttm_kmap_iter *iter,
+struct dma_buf_map *dmap,
+pgoff_t i)
+{
+   struct i915_ttm_kmap_iter_tt *iter_tt =
+   container_of(iter, typeof(*iter_tt), base);
+
+   dma_buf_map_set_vaddr(dmap, kmap_local_page(iter_tt->tt->pages[i]));
+}
+
+static void i915_ttm_kmap_iter_iomap_kmap_local(struct i915_ttm_kmap_iter 
*iter,
+   struct dma_buf_map *dmap,
+   pgoff_t i)
+{
+   struct i915_ttm_kmap_iter_iomap *iter_io =
+   container_of(iter, typeof(*iter_io), base);
+   void __iomem *addr;
+
+retry:
+   while (i >= iter_io->cache.end) {
+   iter_io->cache.sg = iter_io->cache.sg ?
+   sg_next(iter_io->cache.sg) : iter_io->st->sgl;
+   iter_io->cache.i = iter_io->cache.end;
+   iter_io->cache.end += sg_dma_len(iter_io->cache.sg) >>
+   PAGE_SHIFT;
+   iter_io->cache.offs = sg_dma_address(iter_io->cache.sg) -
+   iter_io->start;
+   }
+
+   if (i < iter_io->cache.i) {
+   iter_io->cache.end = 0;
+   iter_io->cache.sg = NULL;
+   goto retry;
+   }
+
+   addr = io_mapping_map_local_wc(iter_io->iomap, iter_io->cache.offs +
+  (((resource_size_t)i - iter_io->cache.i)
+   << PAGE_SHIFT));
+   dma_buf_map_set_vaddr_iomem(dmap, addr);
+}
+
+struct i915_ttm_kmap_iter_ops i915_ttm_kmap_iter_tt_ops = {
+   .kmap_local = i915_ttm_kmap_iter_tt_kmap_local
+};
+
+struct i915_ttm_kmap_iter_ops i915_ttm_kmap_iter_io_ops = {
+   .kmap_local =  i915_ttm_kmap_iter_iomap_kmap_local
+};
+
+static void kunmap_local_dma_buf_map(struct dma_buf_map *map)
+{
+   if (map->is_iomem)
+   io_mapping_unmap_local(map->vaddr_iomem);
+   else
+   kunmap_local(map->vaddr);
+}
+
+/**
+ * i915_ttm_move_memcpy - Helper to perform a memcpy ttm move operation.
+ * @bo: The struct ttm_buffer_object.
+ * @new_mem: The struct ttm_resource we're moving to (copy destination).
+ * @new_kmap: A struct i915_ttm_kmap_iter 

[PATCH 3/7] drm/i915/ttm, drm/ttm: Initialize the ttm device and memory managers.

2021-05-11 Thread Thomas Hellström
Temporarily remove the buddy allocator and related selftests
and hook up the TTM range manager for i915 regions.

In order to support some of the mock region-related selftests, we need to
be able to initialize the TTM range manager standalone, without a struct
ttm_device. Add two functions to the TTM API to allow that.

Finally modify the mock region selftests somewhat to account for a
fragmenting manager.

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/Kconfig  |   1 +
 drivers/gpu/drm/i915/Makefile |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c  |  58 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_region.c| 120 ---
 drivers/gpu/drm/i915/gem/i915_gem_region.h|   4 -
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|   9 +-
 drivers/gpu/drm/i915/gt/intel_gt.c|   2 -
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  27 +-
 drivers/gpu/drm/i915/i915_buddy.c | 435 --
 drivers/gpu/drm/i915/i915_buddy.h | 131 ---
 drivers/gpu/drm/i915/i915_drv.c   |   8 +
 drivers/gpu/drm/i915/i915_drv.h   |   7 +-
 drivers/gpu/drm/i915/i915_gem.c   |   1 +
 drivers/gpu/drm/i915/i915_globals.c   |   1 -
 drivers/gpu/drm/i915/i915_globals.h   |   1 -
 drivers/gpu/drm/i915/i915_scatterlist.c   |  70 ++
 drivers/gpu/drm/i915/i915_scatterlist.h   |  35 +
 drivers/gpu/drm/i915/intel_memory_region.c| 180 ++--
 drivers/gpu/drm/i915/intel_memory_region.h|  44 +-
 drivers/gpu/drm/i915/intel_region_ttm.c   | 246 ++
 drivers/gpu/drm/i915/intel_region_ttm.h   |  29 +
 drivers/gpu/drm/i915/selftests/i915_buddy.c   | 789 --
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 -
 .../drm/i915/selftests/intel_memory_region.c  | 133 +--
 drivers/gpu/drm/i915/selftests/mock_region.c  |  51 +-
 drivers/gpu/drm/ttm/ttm_range_manager.c   |  55 +-
 include/drm/ttm/ttm_bo_driver.h   |  23 +
 31 files changed, 715 insertions(+), 1771 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/i915_buddy.c
 delete mode 100644 drivers/gpu/drm/i915/i915_buddy.h
 create mode 100644 drivers/gpu/drm/i915/intel_region_ttm.c
 create mode 100644 drivers/gpu/drm/i915/intel_region_ttm.h
 delete mode 100644 drivers/gpu/drm/i915/selftests/i915_buddy.c

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 1e1cb245fca7..b63d374dff23 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -26,6 +26,7 @@ config DRM_I915
select SND_HDA_I915 if SND_HDA_CORE
select CEC_CORE if CEC_NOTIFIER
select VMAP_PFN
+   select DRM_TTM
help
  Choose this option if you have a system that has "Intel Graphics
  Media Accelerator" or "HD Graphics" integrated graphics,
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index d0d936d9137b..cb8823570996 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -50,6 +50,7 @@ i915-y += i915_drv.o \
  intel_memory_region.o \
  intel_pch.o \
  intel_pm.o \
+ intel_region_ttm.o \
  intel_runtime_pm.o \
  intel_sideband.o \
  intel_step.o \
@@ -160,7 +161,6 @@ gem-y += \
 i915-y += \
  $(gem-y) \
  i915_active.o \
- i915_buddy.o \
  i915_cmd_parser.o \
  i915_gem_evict.o \
  i915_gem_gtt.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index f44bdd08f7cb..f42803ea48f2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -4,16 +4,70 @@
  */
 
 #include "intel_memory_region.h"
+#include "intel_region_ttm.h"
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_lmem.h"
 #include "i915_drv.h"
 
+static void lmem_put_pages(struct drm_i915_gem_object *obj,
+ struct sg_table *pages)
+{
+   intel_region_ttm_node_free(obj->mm.region, obj->mm.st_mm_node);
+   obj->mm.dirty = false;
+   sg_free_table(pages);
+   kfree(pages);
+}
+
+static int lmem_get_pages(struct drm_i915_gem_object *obj)
+{
+   unsigned int flags;
+   struct sg_table *pages;
+
+   flags = I915_ALLOC_MIN_PAGE_SIZE;
+   if (obj->flags & I915_BO_ALLOC_CONTIGUOUS)
+   flags |= I915_ALLOC_CONTIGUOUS;
+
+   obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region,
+obj->base.size,
+flags);
+   if (IS_ERR(obj->mm.st_mm_node))
+   return PTR_ERR(obj->mm.st_mm_node);
+
+   /* Range manager 

[PATCH 4/7] drm/i915/ttm: Embed a ttm buffer object in the i915 gem object

2021-05-11 Thread Thomas Hellström
Embed a struct ttm_buffer_object into the i915 gem object, making sure
we alias the gem object part. It's a bit unfortunate that the
struct ttm_buffer_object embeds a gem object, since we could otherwise
make the TTM part private to the TTM backend and use the usual
i915 gem object for the other backends.
To make this a bit more storage efficient for the other backends,
we'd have to use a pointer for the gem object which would require
a lot of changes in the driver. We postpone that for later.
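
To make the intended usage concrete, an accessor along these lines (the name
is hypothetical, not part of this patch) is what callers would use instead of
touching __do_not_access directly:

	static inline struct ttm_buffer_object *
	i915_gem_to_ttm(struct drm_i915_gem_object *obj)
	{
		return container_of(&obj->base, struct ttm_buffer_object, base);
	}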

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   |  7 +++
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 12 +++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index abadf0994ad0..c8953e3f5c70 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -62,6 +62,13 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
  const struct drm_i915_gem_object_ops *ops,
  struct lock_class_key *key, unsigned flags)
 {
+   /*
+* A gem object is embedded both in a struct ttm_buffer_object :/ and
+* in a drm_i915_gem_object. Make sure they are aliased.
+*/
+   BUILD_BUG_ON(offsetof(typeof(*obj), base) !=
+offsetof(typeof(*obj), __do_not_access.base));
+
spin_lock_init(>vma.lock);
INIT_LIST_HEAD(>vma.list);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index dbd7fffe956e..98f69d8fd37d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -10,6 +10,7 @@
 #include 
 
 #include 
+#include 
 #include 
 
 #include "i915_active.h"
@@ -99,7 +100,16 @@ struct i915_gem_object_page_iter {
 };
 
 struct drm_i915_gem_object {
-   struct drm_gem_object base;
+   /*
+* We might have reason to revisit the below since it wastes
+* a lot of space for non-ttm gem objects.
+* In any case, always use the accessors for the ttm_buffer_object
+* when accessing it.
+*/
+   union {
+   struct drm_gem_object base;
+   struct ttm_buffer_object __do_not_access;
+   };
 
const struct drm_i915_gem_object_ops *ops;
 
-- 
2.30.2



[PATCH 1/7] drm/i915: Untangle the vma pages_mutex

2021-05-11 Thread Thomas Hellström
From: Thomas Hellström 

Any sleeping dma_resv lock taken while the vma pages_mutex is held
will cause a lockdep splat.
Move the i915_gem_object_pin_pages() call out of the pages_mutex
critical section.

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/i915_vma.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index a6cd0fa62847..7b1c0f4e60d7 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -800,32 +800,37 @@ static bool try_qad_pin(struct i915_vma *vma, unsigned 
int flags)
 static int vma_get_pages(struct i915_vma *vma)
 {
int err = 0;
+   bool pinned_pages = false;
 
if (atomic_add_unless(>pages_count, 1, 0))
return 0;
 
+   if (vma->obj) {
+   err = i915_gem_object_pin_pages(vma->obj);
+   if (err)
+   return err;
+   pinned_pages = true;
+   }
+
/* Allocations ahoy! */
-   if (mutex_lock_interruptible(>pages_mutex))
-   return -EINTR;
+   if (mutex_lock_interruptible(>pages_mutex)) {
+   err = -EINTR;
+   goto unpin;
+   }
 
if (!atomic_read(>pages_count)) {
-   if (vma->obj) {
-   err = i915_gem_object_pin_pages(vma->obj);
-   if (err)
-   goto unlock;
-   }
-
err = vma->ops->set_pages(vma);
-   if (err) {
-   if (vma->obj)
-   i915_gem_object_unpin_pages(vma->obj);
+   if (err)
goto unlock;
-   }
+   pinned_pages = false;
}
atomic_inc(>pages_count);
 
 unlock:
mutex_unlock(>pages_mutex);
+unpin:
+   if (pinned_pages)
+   __i915_gem_object_unpin_pages(vma->obj);
 
return err;
 }
@@ -838,10 +843,10 @@ static void __vma_put_pages(struct i915_vma *vma, 
unsigned int count)
if (atomic_sub_return(count, >pages_count) == 0) {
vma->ops->clear_pages(vma);
GEM_BUG_ON(vma->pages);
-   if (vma->obj)
-   i915_gem_object_unpin_pages(vma->obj);
}
mutex_unlock(>pages_mutex);
+   if (vma->obj)
+   i915_gem_object_unpin_pages(vma->obj);
 }
 
 static void vma_put_pages(struct i915_vma *vma)
-- 
2.30.2



[PATCH 2/7] drm/i915: Don't free shared locks while shared

2021-05-11 Thread Thomas Hellström
We are currently sharing the VM reservation locks across a number of
gem objects with page-table memory. Since TTM will individualize the
reservation locks when freeing objects, including accessing the shared
locks, make sure that the shared locks are not freed until that is done.
For PPGTT we add an additional refcount; for GGTT we flush the object
freeing workqueue before freeing the shared lock.
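
The PPGTT refcounting boils down to get/put helpers roughly like the following
(sketch only; the actual definitions live in the intel_gtt.h hunk, which is not
fully shown here, and the release function name is an assumption):

	static inline struct dma_resv *i915_vm_resv_get(struct i915_address_space *vm)
	{
		kref_get(&vm->resv_ref);
		return &vm->_resv;
	}

	static inline void i915_vm_resv_put(struct dma_resv *resv)
	{
		struct i915_address_space *vm =
			container_of(resv, struct i915_address_space, _resv);

		kref_put(&vm->resv_ref, i915_vm_resv_release);
	}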

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c|  3 ++
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
 drivers/gpu/drm/i915/gt/intel_ggtt.c  | 13 --
 drivers/gpu/drm/i915/gt/intel_gtt.c   | 45 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h   | 29 +++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  2 +-
 6 files changed, 80 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 28144410df86..abadf0994ad0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -252,6 +252,9 @@ static void __i915_gem_free_objects(struct drm_i915_private 
*i915,
if (obj->mm.n_placements > 1)
kfree(obj->mm.placements);
 
+   if (obj->resv_shared_from)
+   i915_vm_resv_put(obj->resv_shared_from);
+
/* But keep the pointer alive for RCU-protected lookups */
call_rcu(>rcu, __i915_gem_free_object_rcu);
cond_resched();
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 0727d0c76aa0..450340a73186 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -149,6 +149,7 @@ struct drm_i915_gem_object {
 * when i915_gem_ww_ctx_backoff() or i915_gem_ww_ctx_fini() are called.
 */
struct list_head obj_link;
+   struct dma_resv *resv_shared_from;
 
union {
struct rcu_head rcu;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 35069ca5d7de..128d781e429f 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -746,7 +746,13 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
 
mutex_unlock(>vm.mutex);
i915_address_space_fini(>vm);
-   dma_resv_fini(>vm.resv);
+   /*
+* Make sure our pagetable gem objects have been freed,
+* so that nobody shares our reservation object anymore.
+*/
+   i915_gem_flush_free_objects(ggtt->vm.i915);
+   GEM_WARN_ON(kref_read(>vm.resv_ref) != 1);
+   dma_resv_fini(>vm._resv);
 
arch_phys_wc_del(ggtt->mtrr);
 
@@ -829,6 +835,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 
size)
return -ENOMEM;
}
 
+   kref_init(>vm.resv_ref);
ret = setup_scratch_page(>vm);
if (ret) {
drm_err(>drm, "Scratch setup failed\n");
@@ -1135,7 +1142,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct 
intel_gt *gt)
ggtt->vm.gt = gt;
ggtt->vm.i915 = i915;
ggtt->vm.dma = i915->drm.dev;
-   dma_resv_init(>vm.resv);
+   dma_resv_init(>vm._resv);
 
if (INTEL_GEN(i915) <= 5)
ret = i915_gmch_probe(ggtt);
@@ -1144,7 +1151,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct 
intel_gt *gt)
else
ret = gen8_gmch_probe(ggtt);
if (ret) {
-   dma_resv_fini(>vm.resv);
+   dma_resv_fini(>vm._resv);
return ret;
}
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 9b98f9d9faa3..695b22b17644 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -22,8 +22,11 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct 
i915_address_space *vm, int sz)
 * object underneath, with the idea that one object_lock() will lock
 * them all at once.
 */
-   if (!IS_ERR(obj))
-   obj->base.resv = >resv;
+   if (!IS_ERR(obj)) {
+   obj->base.resv = i915_vm_resv_get(vm);
+   obj->resv_shared_from = obj->base.resv;
+   }
+
return obj;
 }
 
@@ -40,8 +43,11 @@ struct drm_i915_gem_object *alloc_pt_dma(struct 
i915_address_space *vm, int sz)
 * object underneath, with the idea that one object_lock() will lock
 * them all at once.
 */
-   if (!IS_ERR(obj))
-   obj->base.resv = >resv;
+   if (!IS_ERR(obj)) {
+   obj->base.resv = i915_vm_resv_get(vm);
+   obj->resv_shared_from = obj->base.resv;
+   }
+
return obj;
 }
 
@@ -102,7 +108,7 @@ void __i915_vm_close(struct i915_address_space *vm)
 int i915_vm_lock_objects(struct i915_address_space *vm,
   

[PATCH 0/7] drm/i915: Move LMEM (VRAM) management over to TTM

2021-05-11 Thread Thomas Hellström
This is an initial patch series to move discrete memory management over to
TTM. It will be followed up shortly with additional functionality.

The buddy allocator is temporarily removed along with its selftests. It is
replaced with the TTM range manager, and some selftests are adjusted
to account for the introduced fragmentation. Work is ongoing to reintroduce the
buddy allocator as a TTM resource manager.

A new memcpy ttm move is introduced that uses kmap_local() functionality
rather than vmap(). Among other things stated in the patch commit message,
it helps us deal with page-based LMEM memory. It is generic enough to replace
the ttm memcpy move with some additional work if so desired. On x86 it also
enables prefetching reads from write-combined memory.

Finally, the old i915 gem object LMEM backend is replaced with an
i915 gem object TTM backend, and some additional i915 gem object ops are
introduced to support the added functionality.
Currently it is used only to support management and eviction of the LMEM
region, but work is underway to extend the support to system memory. In this
way we use TTM the way it was originally intended, having the GPU binding
taken care of by driver code.

Intention is to follow up with
- System memory support
- TTM CPU pagefaulting
- Pipelined accelerated moves / migration


Thomas Hellström (7):
  drm/i915: Untangle the vma pages_mutex
  drm/i915: Don't free shared locks while shared
  drm/i915/ttm, drm/ttm: Initialize the ttm device and memory managers.
  drm/i915/ttm: Embed a ttm buffer object in the i915 gem object
  drm/i915/ttm, drm/ttm: Add a generic TTM memcpy move for page-based
iomem
  drm/i915/ttm, drm/ttm: Introduce a TTM i915 gem object backend
  drm/i915/lmem: Verify checks for lmem residency

 drivers/gpu/drm/i915/Kconfig  |   1 +
 drivers/gpu/drm/i915/Makefile |   4 +-
 drivers/gpu/drm/i915/display/intel_display.c  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c  |  71 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.h  |   5 -
 drivers/gpu/drm/i915/gem/i915_gem_object.c| 161 +++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|  13 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  37 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_region.c| 126 +--
 drivers/gpu/drm/i915/gem/i915_gem_region.h|   4 -
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|   9 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 534 
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |  48 ++
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.c   | 155 
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.h   | 141 
 drivers/gpu/drm/i915/gt/intel_ggtt.c  |  13 +-
 drivers/gpu/drm/i915/gt/intel_gt.c|   2 -
 drivers/gpu/drm/i915/gt/intel_gtt.c   |  45 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  29 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |   2 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  30 +-
 drivers/gpu/drm/i915/i915_buddy.c | 435 --
 drivers/gpu/drm/i915/i915_buddy.h | 131 ---
 drivers/gpu/drm/i915/i915_drv.c   |   8 +
 drivers/gpu/drm/i915/i915_drv.h   |   7 +-
 drivers/gpu/drm/i915/i915_gem.c   |   6 +-
 drivers/gpu/drm/i915/i915_globals.c   |   1 -
 drivers/gpu/drm/i915/i915_globals.h   |   1 -
 drivers/gpu/drm/i915/i915_scatterlist.c   |  70 ++
 drivers/gpu/drm/i915/i915_scatterlist.h   |  35 +
 drivers/gpu/drm/i915/i915_vma.c   |  33 +-
 drivers/gpu/drm/i915/intel_memory_region.c| 181 ++--
 drivers/gpu/drm/i915/intel_memory_region.h|  45 +-
 drivers/gpu/drm/i915/intel_region_ttm.c   | 247 ++
 drivers/gpu/drm/i915/intel_region_ttm.h   |  32 +
 drivers/gpu/drm/i915/selftests/i915_buddy.c   | 789 --
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 -
 .../drm/i915/selftests/intel_memory_region.c  | 133 +--
 drivers/gpu/drm/i915/selftests/mock_region.c  |  51 +-
 drivers/gpu/drm/ttm/ttm_bo.c  |  13 +
 drivers/gpu/drm/ttm/ttm_range_manager.c   |  55 +-
 include/drm/ttm/ttm_bo_driver.h   |  23 +
 include/drm/ttm/ttm_device.h  |   9 +
 46 files changed, 1876 insertions(+), 1879 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.h
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.h
 delete mode 100644 drivers/gpu/drm/i915/i915_buddy.c
 delete mode 100644 drivers/gpu/drm/i915/i915_buddy.h
 create mode 100644 drivers/gpu/drm/i915/intel_region_ttm.c
 create mode 100644 drivers/gpu/drm/i915/intel_region_ttm.h
 delete mode 100644 drivers/gpu/drm/i915/selftests/i915_buddy.c

-- 
2.30.2
