Re: [Intel-gfx] [PATCH v6 6/8] drm/i915/pxp: MTL-KCR interrupt ctrl's are in GT-0

2023-04-05 Thread Teres Alexis, Alan Previn
> 
alan:snip

> >   void intel_pxp_irq_enable(struct intel_pxp *pxp)
> >   {
> > -   struct intel_gt *gt = pxp->ctrl_gt;
> > +   struct intel_gt *gt = intel_pxp_get_irq_gt(pxp);
> 
> in this function we use the gt for:
> 
> 1 - the lock: see above about this
> 
> 2 - gen11_gt_reset_one_iir(): this should work with the media GT (we use 
> it for media GuC)
> 
> 3 - writes to the GEN11_CRYPTO_* regs: those should also work with the 
> media GT uncore as these regs are in the same range as the GuC scratch 
> regs and we use the media uncore for those accesses.
> 
alan:snip
> > @@ -83,7 +101,7 @@ void intel_pxp_irq_enable(struct intel_pxp *pxp)
> >   
> >   void intel_pxp_irq_disable(struct intel_pxp *pxp)
> >   {
> > -   struct intel_gt *gt = pxp->ctrl_gt;
> > +   struct intel_gt *gt = intel_pxp_get_irq_gt(pxp);
> >   
> 
> AFAICS this function uses the same 3 cases as above.
> 
> Overall, I am not sure this patch is required. Am I missing something?
> 
alan: context: during development of my initial few revs, I needed to
explicitly do that switch-over to the other gt in order to even get the IRQs
(i.e. as if the forcewake didn't wake up the range)... but upon recent
retesting it seems to work fine. I guess there must have been a bug
somewhere else in my branch.
So yes, I believe this means we can totally remove this patch.



Re: [Intel-gfx] [PATCH v6 7/8] drm/i915/pxp: On MTL, KCR enabling doesn't wait on tee component

2023-04-05 Thread Teres Alexis, Alan Previn
> 
alan:snip
> > @@ -140,10 +141,15 @@ static void pxp_init_full(struct intel_pxp *pxp)
> > if (ret)
> > return;
> >   
> > -   if (HAS_ENGINE(pxp->ctrl_gt, GSC0))
> > +   if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
> > ret = intel_pxp_gsccs_init(pxp);
> > -   else
> > +   if (!ret) {
> > +   with_intel_runtime_pm(&pxp->ctrl_gt->i915->runtime_pm, wakeref)
> > +   intel_pxp_init_hw(pxp);
> 
> personal preference: I'd move this (and the matching call in fini) 
> inside intel_pxp_gsccs_init/fini. That way we can see this as more 
> back-end specific: the gsccs initialize everything immediately, while 
> the tee back-end follows a 2-step approach with the component.
> Not a blocker since it is a personal preference, so with or without the 
> change:
> 
> Reviewed-by: Daniele Ceraolo Spurio 
> 
> Daniele

alan: will make that change too - thanks for the R-b.
alan:snip


Re: [Intel-gfx] [PATCH] drm/atomic-helper: Don't set deadline for modesets

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 06:50:22AM -0700, Rob Clark wrote:
> On Wed, Apr 5, 2023 at 6:31 AM Daniel Vetter  wrote:
> >
> > If the crtc is being switched on or off then the semantics of
> > > computing the timestamp of the next vblank is somewhat ill-defined.
> > > And indeed, the code splats with a warning in the timestamp
> > > computation code. Specifically it hits the check to make sure that
> > > atomic drivers have fully set up the timing constants in the drm_vblank
> > structure, and that's just not the case before the crtc is actually
> > on.
> >
> > For robustness it seems best to just not set deadlines for modesets.
> >
> > v2: Also skip on inactive crtc (Ville)
> >
> > Link: 
> > https://lore.kernel.org/dri-devel/dfc21f18-7e1e-48f0-c05a-d659b9c90...@linaro.org/
> > Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank")
> > Cc: Ville Syrjälä 
> > Cc: Rob Clark 
> > Cc: Daniel Vetter 
> > Cc: Maarten Lankhorst 
> > Cc: Maxime Ripard 
> > Cc: Thomas Zimmermann 
> > Reported-by: Dmitry Baryshkov 
> > Tested-by: Dmitry Baryshkov  # test patch only
> > Cc: Dmitry Baryshkov 
> > Signed-off-by: Daniel Vetter 
> 
> Reviewed-by: Rob Clark 

Merged to drm-misc-next, thanks for review.

> 
> > ---
> >  drivers/gpu/drm/drm_atomic_helper.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index f21b5a74176c..d44fb9b87ef8 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1528,6 +1528,12 @@ static void set_fence_deadline(struct drm_device *dev,
> >         for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
> >                 ktime_t v;
> >
> > +               if (drm_atomic_crtc_needs_modeset(new_crtc_state))
> > +                       continue;
> > +
> > +               if (!new_crtc_state->active)
> > +                       continue;
> > +
> >                 if (drm_crtc_next_vblank_start(crtc, &v))
> >                         continue;
> >
> > --
> > 2.40.0
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
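
For readers following along, the skip logic in the patch above can be modelled outside the kernel. This is only a sketch under stated assumptions: `struct model_crtc_state` and `model_pick_deadline()` are hypothetical names that do not exist in DRM, the three booleans stand in for `drm_atomic_crtc_needs_modeset()`, `new_crtc_state->active` and a successful `drm_crtc_next_vblank_start()`, and choosing the earliest vblank as the deadline is an assumption of the sketch rather than something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of a CRTC state the patch looks at. */
struct model_crtc_state {
	bool needs_modeset;       /* models drm_atomic_crtc_needs_modeset() */
	bool active;              /* models new_crtc_state->active */
	bool vblank_known;        /* models drm_crtc_next_vblank_start() succeeding */
	long long next_vblank_ns; /* only meaningful when vblank_known is true */
};

/*
 * Walk the CRTC states and pick a deadline, skipping CRTCs that are in a
 * modeset or inactive, as the patch does. Returns false, leaving
 * *deadline_ns untouched, when no CRTC qualifies.
 */
bool model_pick_deadline(const struct model_crtc_state *states, size_t n,
			 long long *deadline_ns)
{
	bool found = false;

	assert(deadline_ns != NULL);
	assert(states != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct model_crtc_state *s = &states[i];

		if (s->needs_modeset)  /* v1 of the patch: no deadline for modesets */
			continue;
		if (!s->active)        /* v2 (Ville): also skip inactive CRTCs */
			continue;
		if (!s->vblank_known)  /* timestamp could not be computed */
			continue;

		/* Assumption of this sketch: keep the earliest vblank. */
		if (!found || s->next_vblank_ns < *deadline_ns) {
			*deadline_ns = s->next_vblank_ns;
			found = true;
		}
	}
	return found;
}
```

The point of the two early `continue` checks is that the vblank timestamp is never computed for a CRTC whose timing constants may not be set up yet, which is where the reported warning came from.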


[Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/guc: Disable PL1 power limit when loading GuC firmware

2023-04-05 Thread Patchwork
== Series Details ==

Series: drm/i915/guc: Disable PL1 power limit when loading GuC firmware
URL   : https://patchwork.freedesktop.org/series/116172/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12976 -> Patchwork_116172v1


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/index.html

Participating hosts (37 -> 36)
--

  Missing(1): fi-snb-2520m 

Known issues


  Here are the changes found in Patchwork_116172v1 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_exec_suspend@basic-s3@lmem0:
- bat-dg2-9:  [PASS][1] -> [FAIL][2] ([fdo#103375]) +3 similar 
issues
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12976/bat-dg2-9/igt@gem_exec_suspend@basic...@lmem0.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/bat-dg2-9/igt@gem_exec_suspend@basic...@lmem0.html

  * igt@i915_selftest@live@gt_heartbeat:
- fi-kbl-soraka:  [PASS][3] -> [DMESG-FAIL][4] ([i915#5334] / 
[i915#7872])
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12976/fi-kbl-soraka/igt@i915_selftest@live@gt_heartbeat.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/fi-kbl-soraka/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@slpc:
- bat-rpls-2: NOTRUN -> [DMESG-FAIL][5] ([i915#6367] / [i915#7913] 
/ [i915#7996])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/bat-rpls-2/igt@i915_selftest@l...@slpc.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
- bat-rpls-2: NOTRUN -> [SKIP][6] ([i915#7828])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/bat-rpls-2/igt@kms_chamelium_...@common-hpd-after-suspend.html
- bat-rpls-1: NOTRUN -> [SKIP][7] ([i915#7828])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/bat-rpls-1/igt@kms_chamelium_...@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence@pipe-c-dp-1:
- bat-dg2-8:  [PASS][8] -> [FAIL][9] ([i915#7932])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12976/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc-frame-seque...@pipe-c-dp-1.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc-frame-seque...@pipe-c-dp-1.html

  * igt@kms_pipe_crc_basic@suspend-read-crc:
- bat-rpls-1: NOTRUN -> [SKIP][10] ([i915#1845])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/bat-rpls-1/igt@kms_pipe_crc_ba...@suspend-read-crc.html
- bat-rpls-2: NOTRUN -> [SKIP][11] ([i915#1845])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/bat-rpls-2/igt@kms_pipe_crc_ba...@suspend-read-crc.html

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-dp-3:
- bat-dg2-9:  [PASS][12] -> [FAIL][13] ([fdo#103375] / [i915#7932])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12976/bat-dg2-9/igt@kms_pipe_crc_basic@suspend-read-...@pipe-c-dp-3.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/bat-dg2-9/igt@kms_pipe_crc_basic@suspend-read-...@pipe-c-dp-3.html

  
 Possible fixes 

  * igt@gem_exec_suspend@basic-s3@smem:
- bat-rpls-1: [ABORT][14] ([i915#6687] / [i915#7978]) -> [PASS][15]
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12976/bat-rpls-1/igt@gem_exec_suspend@basic...@smem.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/bat-rpls-1/igt@gem_exec_suspend@basic...@smem.html

  * igt@i915_selftest@live@reset:
- bat-rpls-2: [ABORT][16] ([i915#4983] / [i915#7913] / [i915#7981]) 
-> [PASS][17]
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12976/bat-rpls-2/igt@i915_selftest@l...@reset.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116172v1/bat-rpls-2/igt@i915_selftest@l...@reset.html

  
  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7872]: https://gitlab.freedesktop.org/drm/intel/issues/7872
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7932]: https://gitlab.freedesktop.org/drm/intel/issues/7932
  [i915#7978]: https://gitlab.freedesktop.org/drm/intel/issues/7978
  [i915#7981]: https://gitlab.freedesktop.org/drm/intel/issues/7981
  [i915#7996]: https://gitlab.freedesktop.org/drm/intel/issues/7996


Build changes

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Liu, Yi L
Hi Eric,

> From: Eric Auger 
> Sent: Thursday, April 6, 2023 1:58 AM
[...]
>  diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>  index 25432ef213ee..5a34364e3b94 100644
>  --- a/include/uapi/linux/vfio.h
>  +++ b/include/uapi/linux/vfio.h
>  @@ -650,11 +650,32 @@ enum {
>    * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE +
> 12,
>    *struct 
>  vfio_pci_hot_reset_info)
>    *
>  + * This command is used to query the affected devices in the hot reset 
>  for
>  + * a given device.  User could use the information reported by this 
>  command
>  + * to figure out the affected devices among the devices it has opened.
> the 'opened' terminology does not look sufficient here because it is not
> only a matter of the device being opened using cdev but it also needs to
> have been bound to an iommufd, dev_id being the output of the
> dev-iommufd binding.
> 
> By the way I am now confused. What happens if the reset impacts some
> devices which are not bound to an iommu ctx? Previously we returned the
> iommu group, which always pre-exists, but now you will report an invalid id?

For such devices, the user could use the BDF information to check whether
the affected device is opened by the user. If so, the user should do the
necessary preparation on the device before issuing the hot reset.

Regards,
Yi Liu

>  + * This command always reports the segment, bus and devfn information 
>  for
>  + * each affected device, and selectively report the group_id or the 
>  dev_id
>  + * per the way how the device being queried is opened.
>  + *  - If the device is opened via the traditional group/container 
>  manner,
>  + *this command reports the group_id for each affected device.
>  + *
>  + *  - If the device is opened as a cdev, this command needs to 
>  report
> >>> s/needs to report/reports
> >> got it.
> >>
>  + *dev_id for each affected device and set the
>  + *VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag.  For the
> affected
>  + *devices that are not opened as cdev or bound to different 
>  iommufds
>  + *with the device that is queried, report an invalid dev_id to 
>  avoid
> or not bound at all
> >>> s/bound to different iommufds with the device that is queried/bound to
> >>> iommufds different from the reset device one?
> >> hmmm, I'm not a native speaker here. This _INFO is to query which
> >> devices would be affected if the user wants to hot reset a given device.
> >> So "the queried device" appears to be better. But I'd admit "the queried
> >> device" is also "the reset device". Maybe Alex can help pick one.
> > - If the calling device is opened directly via cdev rather than
> >   accessed through the vfio group, the returned
> >   vfio_pci_dependent_device structure reports the dev_id
> >   rather than the group_id, which is indicated by the
> >   VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag in
> >   vfio_pci_hot_reset_info.  If the reset affects devices that
> >   are not opened within the same iommufd context as the calling
> >   device, IOMMUFD_INVALID_ID will be provided as the dev_id.
> >
> > But that kind of brings to light the question of what does the user do
> > when they encounter this situation.  If the device is not opened, the
> > reset can complete.  If the device is opened by a different user, the
> > reset is blocked.  The only logical conclusion is that the user should
> > try the reset regardless of the result of the info ioctl, which the
> > null-array approach further solidifies as the direction of the API.
> > I'm not liking this.  Thanks,
> >
> > Alex
> 
> Thanks
> 
> Eric
> >
> >
>  + *potential dev_id conflict as dev_id is local to iommufd.  For 
>  such
>  + *affected devices, user shall fall back to use the segment, 
>  bus and
>  + *devfn info to map it to opened device.
>  + *
>    * Return: 0 on success, -errno on failure:
>    *  -enospc = insufficient buffer, -enodev = unsupported for device.
>    */
>   struct vfio_pci_dependent_device {
>  -__u32   group_id;
>  +union {
>  +__u32   group_id;
>  +__u32   dev_id;
>  +};
>   __u16   segment;
>   __u8    bus;
>   __u8    devfn; /* Use PCI_SLOT/PCI_FUNC */
>  @@ -663,6 +684,7 @@ struct vfio_pci_dependent_device {
>   struct vfio_pci_hot_reset_info {
>   __u32   argsz;
>   __u32   flags;
>  +#define VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID  (1 << 0)
>   __u32   count;
>   struct vfio_pci_dependent_device        devices[];
>   };
> >>> Eric



[Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/guc: Disable PL1 power limit when loading GuC firmware

2023-04-05 Thread Patchwork
== Series Details ==

Series: drm/i915/guc: Disable PL1 power limit when loading GuC firmware
URL   : https://patchwork.freedesktop.org/series/116172/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
+./arch/x86/include/asm/bitops.h:117:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:117:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:148:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:148:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:150:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:150:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:154:26: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:154:26: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:156:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:156:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:156:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:156:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:174:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:174:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:176:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:176:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:180:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:180:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:182:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:182:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:182:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:182:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:186:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:186:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:188:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:188:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:192:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:192:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:195:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:195:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:195:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:195:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:237:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:237:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:239:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:239:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:66:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:66:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:92:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:92:1: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:100:17: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:100:17: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:100:23: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:100:23: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:100:9: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:100:9: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:105:1: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:105:1: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:107:9: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:107:9: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:108:9: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:108:9: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:109:9: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:109:9: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:111:10: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:111:10: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:111:14: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:111:14: warning: unreplaced 

Re: [Intel-gfx] [PATCH] i915/guc/slpc: Provide sysfs for efficient freq

2023-04-05 Thread Dixit, Ashutosh
On Wed, 05 Apr 2023 13:12:29 -0700, Rodrigo Vivi wrote:
>
> On Wed, Apr 05, 2023 at 12:42:30PM -0700, Dixit, Ashutosh wrote:
> > On Wed, 05 Apr 2023 06:57:42 -0700, Rodrigo Vivi wrote:
> > >

Hi Rodrigo,

> >
> > > On Fri, Mar 31, 2023 at 08:11:29PM -0700, Dixit, Ashutosh wrote:
> > > > On Fri, 31 Mar 2023 19:00:49 -0700, Vinay Belgaumkar wrote:
> > > > >
> > > >
> > > > Hi Vinay,
> > > >
> > > > > @@ -478,20 +507,15 @@ int intel_guc_slpc_set_min_freq(struct intel_guc_slpc *slpc, u32 val)
> > > > >   val > slpc->max_freq_softlimit)
> > > > >   return -EINVAL;
> > > > >
> > > > > + /* Ignore efficient freq if lower min freq is requested */
> > > > > + ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < slpc->rp1_freq);
> > > > > + if (ret)
> > > > > + goto out;
> > > > > +
> > > >
> > > > I don't agree with this. If we are now providing an interface explicitly to
> > > > ignore RPe, that should be the /only/ way to ignore RPe. There should be no
> > > > other "under the hood" ignoring of RPe. In other words, ignoring RPe should
> > > > be minimized unless explicitly requested.
> > > >
> > > > I don't clearly understand why this was done previously but it makes even
> > > > less sense to me now after this patch.
> > >
> > > well, I had suggested this previously. And just because without this we would
> > > be breaking API expectations.
> > >
> > > When the user selects a minimal frequency, they expect that to stick. But with
> > > the efficient freq enabled in guc, if the minimal is less than the efficient
> > > one, this request is likely ignored.
> > >
> > > Well, even worse is that we are actually caching the request in the soft values.
> > > So we show a minimal, but the hardware without any workload is operating at
> > > efficient.
> > >
> > > So, the thought process was: 'if user requested a very low minimal, we give them
> > > the minimal requested, even if that means to disable the efficient freq.'
> >
> > Hmm, I understand this even less now :)
> >
> > * Why is RPe ignored when min < RPe? Since the freq can be between min and
> >   max? Shouldn't the condition be min > RPe, that is turn RPe off if min
> > >   higher than RPe is requested?
>
> that is not how guc efficient freq selection works. (unless my memory is
> tricking me right now.)
>
> So, if we select a min that is between RPe and RP0, guc will respect and
> use the selected min. So we don't need to disable guc selection of the
> efficient.
>
> This is not true when we select a very low min like RPn. If we select RPn
> as min and guc efficient freq selection is enabled guc will simply ignore
> our request. So the only way to give the user what is asked, is to also
> disable guc's efficient freq selection. (I probably confused you in the
> previous email because I used 'RP0' when I meant 'RPn'. I hope it gets
> clear now).
>
> >
> > * Also isn't RPe dynamic, so we can't say RPe == rp1 when using in KMD?
>
> Oh... yeap, this is an issue indeed. Especially with i915 where we have
> the soft values cached instead of asking guc every time.
>
> That's a good point. The variance is not big, but we will hit corner cases.
> One way is to keep checking and updating every time a sysfs file is touched.

This, I believe, is not possible in all cases. Say the freqs are set through
sysfs first and the workload starts later. In this case RPe will probably
start changing after the workload starts, not when the freqs are set in sysfs.

> Other way is do what you are suggesting and let's just accept and deal
> with the reality that is: "we cannot guarantee a min freq selection if user
> doesn't disable the efficient freq selection".
>
> >
> > * Finally, we know that enabling RPe broke the kernel freq API because RPe
> >   could go over max_freq. So it is actually the max freq which is not
> >   obeyed after RPe is enabled.
>
> Oh! so it was my bad memory indeed and everything is the other way around?
> But I just looked to Xe code, my most recent memory, and I just needed
> to toggle the efficient freq off on the case that I mentioned, when min
> selection is below the efficient one. With that all the API expectation
> that I coded in IGT works neatly.

From what I saw, the following bugs:

https://gitlab.freedesktop.org/drm/intel/-/issues/6806
https://gitlab.freedesktop.org/drm/intel/-/issues/6786

and the following patches in response to these bugs (as well as the
discussion on these patches):

https://patchwork.freedesktop.org/series/111282/
https://patchwork.freedesktop.org/series/110574/

were all due to the fact that the max freq is not obeyed after RPe is
enabled.

These patches were never merged but I will now try to submit them again
after this ignore efficient freq patch gets reviewed and merged.

Thanks.
--
Ashutosh

> >
> > So we ignore RPe in some select cases (which also I don't understand as
> > mentioned above but maybe I am missing something) to 

Re: [Intel-gfx] [PATCH] drm/i915/guc: Disable PL1 power limit when loading GuC firmware

2023-04-05 Thread Dixit, Ashutosh
On Tue, 28 Mar 2023 02:14:42 -0700, Tvrtko Ursulin wrote:
>

Hi Tvrtko,

> On 27/03/2023 18:47, Rodrigo Vivi wrote:
> >
> > +Daniel
> >
> > On Mon, Mar 27, 2023 at 09:58:52AM -0700, Dixit, Ashutosh wrote:
> >> On Sun, 26 Mar 2023 04:52:59 -0700, Rodrigo Vivi wrote:
> >>>
> >>
> >> Hi Rodrigo,
> >>
> >>> On Fri, Mar 24, 2023 at 04:31:22PM -0700, Dixit, Ashutosh wrote:
>  On Fri, 24 Mar 2023 11:15:02 -0700, Belgaumkar, Vinay wrote:
> >
> 
>  Hi Vinay,
> 
>  Thanks for the review. Comments inline below.
> 
> > On 3/15/2023 8:59 PM, Ashutosh Dixit wrote:
> >> On dGfx, the PL1 power limit being enabled and set to a low value 
> >> results
> >> in a low GPU operating freq. It also negates the freq raise operation 
> >> which
> >> is done before GuC firmware load. As a result GuC firmware load can 
> >> time
> >> out. Such timeouts were seen in the GL #8062 bug below (where the PL1 
> >> power
> >> limit was enabled and set to a low value). Therefore disable the PL1 
> >> power
> >> limit when allowed by HW when loading GuC firmware.
> > v3 label missing in subject.
> >>
> >> v2:
> >>- Take mutex (to disallow writes to power1_max) across GuC reset/fw 
> >> load
> >>- Add hwm_power_max_restore to error return code path
> >>
> >> v3 (Jani N):
> >>- Add/remove explanatory comments
> >>- Function renames
> >>- Type corrections
> >>- Locking annotation
> >>
> >> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> >> Signed-off-by: Ashutosh Dixit 
> >> ---
> >>drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 +++
> >>drivers/gpu/drm/i915/i915_hwmon.c | 39 
> >> +++
> >>drivers/gpu/drm/i915/i915_hwmon.h |  7 +
> >>3 files changed, 55 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
> >> b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> >> index 4ccb4be4c9cba..aa8e35a5636a0 100644
> >> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> >> @@ -18,6 +18,7 @@
> >>#include "intel_uc.h"
> >>  #include "i915_drv.h"
> >> +#include "i915_hwmon.h"
> >>  static const struct intel_uc_ops uc_ops_off;
> >>static const struct intel_uc_ops uc_ops_on;
> >> @@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>struct intel_guc *guc = &uc->guc;
> >>struct intel_huc *huc = &uc->huc;
> >>int ret, attempts;
> >> +  bool pl1en;
> >
> > Init to 'false' here
> 
>  See next comment.
> 
> >
> >
> >>GEM_BUG_ON(!intel_uc_supports_guc(uc));
> >>GEM_BUG_ON(!intel_uc_wants_guc(uc));
> >> @@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>else
> >>attempts = 1;
> >>+   /* Disable a potentially low PL1 power limit to allow freq to be raised */
> >> +  i915_hwmon_power_max_disable(gt->i915, &pl1en);
> >> +
> >>intel_rps_raise_unslice(&uc_to_gt(uc)->rps);
> >>while (attempts--) {
> >> @@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> >>}
> >>+   i915_hwmon_power_max_restore(gt->i915, pl1en);
> >> +
> >>guc_info(guc, "submission %s\n", 
> >> str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
> >>guc_info(guc, "SLPC %s\n", 
> >> str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
> >>@@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>/* Return GT back to RPn */
> >>intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> >>+   i915_hwmon_power_max_restore(gt->i915, pl1en);
> >
> > if (pl1en)
> >
> >      i915_hwmon_power_max_enable().
> 
>  IMO it's better not to have checks in the main __uc_init_hw() function 
>  (if
>  we do this we'll need to add 2 checks in __uc_init_hw()). If you really
>  want we could do something like this inside
>  i915_hwmon_power_max_disable/i915_hwmon_power_max_restore. But for now I
>  am not making any changes.
> 
>  (I can send a patch with the changes if you want to take a look but IMO 
>  it
>  will add more logic/code but without real benefits (it will save a rmw if
>  the limit was already disabled, but IMO this code is called so 
>  infrequently
>  (only during GuC resets) as to not have any significant impact)).
> 
> >
> >> +
> >>__uc_sanitize(uc);
> >>if (!ret) {
> >> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c 
> >> b/drivers/gpu/drm/i915/i915_hwmon.c
> >> index ee63a8fd88fc1..769b5bda4d53f 100644
> >> --- 

Re: [Intel-gfx] [PATCH] drm/i915/guc: Disable PL1 power limit when loading GuC firmware

2023-04-05 Thread Dixit, Ashutosh
On Mon, 27 Mar 2023 10:47:25 -0700, Rodrigo Vivi wrote:
>

Hi Rodrigo,

Sorry for the delay, I got pulled away into a couple of other things and
could only now get back to this.

>
> +Daniel
>
> On Mon, Mar 27, 2023 at 09:58:52AM -0700, Dixit, Ashutosh wrote:
> > On Sun, 26 Mar 2023 04:52:59 -0700, Rodrigo Vivi wrote:
> > >
> >
> > Hi Rodrigo,
> >
> > > On Fri, Mar 24, 2023 at 04:31:22PM -0700, Dixit, Ashutosh wrote:
> > > > On Fri, 24 Mar 2023 11:15:02 -0700, Belgaumkar, Vinay wrote:
> > > > >
> > > >
> > > > Hi Vinay,
> > > >
> > > > Thanks for the review. Comments inline below.
> > > >
> > > > > On 3/15/2023 8:59 PM, Ashutosh Dixit wrote:
> > > > > > On dGfx, the PL1 power limit being enabled and set to a low value 
> > > > > > results
> > > > > > in a low GPU operating freq. It also negates the freq raise 
> > > > > > operation which
> > > > > > is done before GuC firmware load. As a result GuC firmware load can 
> > > > > > time
> > > > > > out. Such timeouts were seen in the GL #8062 bug below (where the 
> > > > > > PL1 power
> > > > > > limit was enabled and set to a low value). Therefore disable the 
> > > > > > PL1 power
> > > > > > limit when allowed by HW when loading GuC firmware.
> > > > > v3 label missing in subject.
> > > > > >
> > > > > > v2:
> > > > > >   - Take mutex (to disallow writes to power1_max) across GuC 
> > > > > > reset/fw load
> > > > > >   - Add hwm_power_max_restore to error return code path
> > > > > >
> > > > > > v3 (Jani N):
> > > > > >   - Add/remove explanatory comments
> > > > > >   - Function renames
> > > > > >   - Type corrections
> > > > > >   - Locking annotation
> > > > > >
> > > > > > Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> > > > > > Signed-off-by: Ashutosh Dixit 
> > > > > > ---
> > > > > >   drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 +++
> > > > > >   drivers/gpu/drm/i915/i915_hwmon.c | 39 
> > > > > > +++
> > > > > >   drivers/gpu/drm/i915/i915_hwmon.h |  7 +
> > > > > >   3 files changed, 55 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
> > > > > > b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > > > index 4ccb4be4c9cba..aa8e35a5636a0 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > > > @@ -18,6 +18,7 @@
> > > > > >   #include "intel_uc.h"
> > > > > > #include "i915_drv.h"
> > > > > > +#include "i915_hwmon.h"
> > > > > > static const struct intel_uc_ops uc_ops_off;
> > > > > >   static const struct intel_uc_ops uc_ops_on;
> > > > > > @@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > > struct intel_guc *guc = &uc->guc;
> > > > > > struct intel_huc *huc = &uc->huc;
> > > > > > int ret, attempts;
> > > > > > +   bool pl1en;
> > > > >
> > > > > Init to 'false' here
> > > >
> > > > See next comment.
> > > >
> > > > >
> > > > >
> > > > > > GEM_BUG_ON(!intel_uc_supports_guc(uc));
> > > > > > GEM_BUG_ON(!intel_uc_wants_guc(uc));
> > > > > > @@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > > else
> > > > > > attempts = 1;
> > > > > > +   /* Disable a potentially low PL1 power limit to allow freq to be raised */
> > > > > > +   i915_hwmon_power_max_disable(gt->i915, &pl1en);
> > > > > > +
> > > > > > intel_rps_raise_unslice(&uc_to_gt(uc)->rps);
> > > > > > while (attempts--) {
> > > > > > @@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > > intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> > > > > > }
> > > > > >   + i915_hwmon_power_max_restore(gt->i915, pl1en);
> > > > > > +
> > > > > > guc_info(guc, "submission %s\n", 
> > > > > > str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
> > > > > > guc_info(guc, "SLPC %s\n", 
> > > > > > str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
> > > > > >   @@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > > /* Return GT back to RPn */
> > > > > > intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> > > > > >   + i915_hwmon_power_max_restore(gt->i915, pl1en);
> > > > >
> > > > > if (pl1en)
> > > > >
> > > > >     i915_hwmon_power_max_enable().
> > > >
> > > > IMO it's better not to have checks in the main __uc_init_hw() function (if
> > > > we do this we'll need to add 2 checks in __uc_init_hw()). If you really
> > > > want we could do something like this inside
> > > > i915_hwmon_power_max_disable/i915_hwmon_power_max_restore. But for now I
> > > > am not making any changes.
> > > >
> > > > (I can send a patch with the changes if you want to take a look but IMO it
> > > > will add more logic/code but without real benefits (it will save a rmw if
> > > > the limit was already disabled, but IMO this code is called so infrequently
> > > > (only during GuC resets) as to not have any significant impact)).
> > > >
> > > > >
> > > > 

[Intel-gfx] [PATCH 3/3] drm/i915/hwmon: Block waiting for GuC reset to complete

2023-04-05 Thread Ashutosh Dixit
Instead of erroring out when GuC reset is in progress, block waiting for
GuC reset to complete which is a more reasonable uapi behavior.

Signed-off-by: Ashutosh Dixit 
---
 drivers/gpu/drm/i915/i915_hwmon.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
index 9ab8971679fe3..4343efb48e61b 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -51,6 +51,7 @@ struct hwm_drvdata {
char name[12];
int gt_n;
bool reset_in_progress;
+   wait_queue_head_t wqh;
 };
 
 struct i915_hwmon {
@@ -400,10 +401,15 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
int ret = 0;
u32 nval;
 
+retry:
mutex_lock(&hwmon->hwmon_lock);
if (hwmon->ddat.reset_in_progress) {
-   ret = -EAGAIN;
-   goto unlock;
+   mutex_unlock(&hwmon->hwmon_lock);
+   ret = wait_event_interruptible(ddat->wqh,
+  !hwmon->ddat.reset_in_progress);
+   if (ret)
+   return ret;
+   goto retry;
}
wakeref = intel_runtime_pm_get(ddat->uncore->rpm);
 
@@ -426,7 +432,6 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
 exit:
intel_runtime_pm_put(ddat->uncore->rpm, wakeref);
-unlock:
mutex_unlock(&hwmon->hwmon_lock);
return ret;
 }
@@ -508,6 +513,7 @@ void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old)
intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
 PKG_PWR_LIM_1_EN, old ? PKG_PWR_LIM_1_EN : 0);
hwmon->ddat.reset_in_progress = false;
+   wake_up_all(&hwmon->ddat.wqh);
 
mutex_unlock(&hwmon->hwmon_lock);
 }
@@ -784,6 +790,7 @@ void i915_hwmon_register(struct drm_i915_private *i915)
ddat->uncore = &i915->uncore;
snprintf(ddat->name, sizeof(ddat->name), "i915");
ddat->gt_n = -1;
+   init_waitqueue_head(&ddat->wqh);
 
for_each_gt(gt, i915, i) {
ddat_gt = hwmon->ddat_gt + i;
-- 
2.38.0
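
For readers following the locking change in this patch, here is a userspace analogue only, assuming POSIX threads stand in for the kernel mutex and wait queue. The names limit_state, limit_state_init, reset_begin, reset_end and limit_write are hypothetical and do not exist in i915. pthread_cond_wait plays the role of wait_event_interruptible (without the signal-interruption path) and pthread_cond_broadcast the role of wake_up_all.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the hwmon driver data. */
struct limit_state {
	pthread_mutex_t lock;
	pthread_cond_t reset_done;
	bool reset_in_progress;
	long limit;
};

void limit_state_init(struct limit_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->reset_done, NULL);
	s->reset_in_progress = false;
	s->limit = 0;
}

/* Called before the reset: writers arriving from now on must wait. */
void reset_begin(struct limit_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->reset_in_progress = true;
	pthread_mutex_unlock(&s->lock);
}

/* Called after the reset: clear the flag and wake every waiting writer. */
void reset_end(struct limit_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->reset_in_progress = false;
	pthread_cond_broadcast(&s->reset_done);
	pthread_mutex_unlock(&s->lock);
}

/* Block while a reset is in progress instead of failing with -EAGAIN. */
int limit_write(struct limit_state *s, long val)
{
	pthread_mutex_lock(&s->lock);
	/* Re-check the flag after every wakeup, like the retry loop. */
	while (s->reset_in_progress)
		pthread_cond_wait(&s->reset_done, &s->lock);
	s->limit = val;
	pthread_mutex_unlock(&s->lock);
	return 0;
}
```

A writer that arrives while the flag is set sleeps rather than returning an error, and it re-evaluates the flag under the lock after each wakeup, which corresponds to the retry label in the patch.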



[Intel-gfx] [PATCH v4 0/3] drm/i915/guc: Disable PL1 power limit when loading GuC firmware

2023-04-05 Thread Ashutosh Dixit
Split the v3 patch into 3 patches for easier review, can squash later if needed.

Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 

Ashutosh Dixit (3):
  drm/i915/hwmon: Get mutex and rpm ref just once in hwm_power_max_write
  drm/i915/guc: Disable PL1 power limit when loading GuC firmware
  drm/i915/hwmon: Block waiting for GuC reset to complete

 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 
 drivers/gpu/drm/i915/i915_hwmon.c | 75 ++-
 drivers/gpu/drm/i915/i915_hwmon.h |  7 +++
 3 files changed, 78 insertions(+), 13 deletions(-)

-- 
2.38.0



[Intel-gfx] [PATCH 2/3] drm/i915/guc: Disable PL1 power limit when loading GuC firmware

2023-04-05 Thread Ashutosh Dixit
On dGfx, the PL1 power limit being enabled and set to a low value results
in a low GPU operating freq. It also negates the freq raise operation which
is done before GuC firmware load. As a result GuC firmware load can time
out. Such timeouts were seen in the GL #8062 bug below (where the PL1 power
limit was enabled and set to a low value). Therefore disable the PL1 power
limit when allowed by HW when loading GuC firmware.

v2:
 - Take mutex (to disallow writes to power1_max) across GuC reset/fw load
 - Add hwm_power_max_restore to error return code path

v3 (Jani N):
 - Add/remove explanatory comments
 - Function renames
 - Type corrections
 - Locking annotation

v4:
 - Don't hold the lock across GuC reset (Rodrigo)
 - New locking scheme (suggested by Rodrigo)
 - Eliminate rpm_get in power_max_disable/restore, not needed (Tvrtko)

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
Signed-off-by: Ashutosh Dixit 
---
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 ++
 drivers/gpu/drm/i915/i915_hwmon.c | 40 +++
 drivers/gpu/drm/i915/i915_hwmon.h |  7 +
 3 files changed, 56 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 4ccb4be4c9cba..aa8e35a5636a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -18,6 +18,7 @@
 #include "intel_uc.h"
 
 #include "i915_drv.h"
+#include "i915_hwmon.h"
 
 static const struct intel_uc_ops uc_ops_off;
 static const struct intel_uc_ops uc_ops_on;
@@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
struct intel_guc *guc = &uc->guc;
struct intel_huc *huc = &uc->huc;
int ret, attempts;
+   bool pl1en;
 
GEM_BUG_ON(!intel_uc_supports_guc(uc));
GEM_BUG_ON(!intel_uc_wants_guc(uc));
@@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
else
attempts = 1;
 
+   /* Disable a potentially low PL1 power limit to allow freq to be raised */
+   i915_hwmon_power_max_disable(gt->i915, &pl1en);
+
intel_rps_raise_unslice(&uc_to_gt(uc)->rps);
 
while (attempts--) {
@@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
}
 
+   i915_hwmon_power_max_restore(gt->i915, pl1en);
+
guc_info(guc, "submission %s\n", 
str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
guc_info(guc, "SLPC %s\n", 
str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
 
@@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
/* Return GT back to RPn */
intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
 
+   i915_hwmon_power_max_restore(gt->i915, pl1en);
+
__uc_sanitize(uc);
 
if (!ret) {
diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
index 7f44e809ca155..9ab8971679fe3 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -50,6 +50,7 @@ struct hwm_drvdata {
struct hwm_energy_info ei;  /*  Energy info for energy1_input */
char name[12];
int gt_n;
+   bool reset_in_progress;
 };
 
 struct i915_hwmon {
@@ -400,6 +401,10 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
u32 nval;
 
mutex_lock(&hwmon->hwmon_lock);
+   if (hwmon->ddat.reset_in_progress) {
+   ret = -EAGAIN;
+   goto unlock;
+   }
wakeref = intel_runtime_pm_get(ddat->uncore->rpm);
 
/* Disable PL1 limit and verify, because the limit cannot be disabled on all platforms */
@@ -421,6 +426,7 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
 exit:
intel_runtime_pm_put(ddat->uncore->rpm, wakeref);
+unlock:
mutex_unlock(&hwmon->hwmon_lock);
return ret;
 }
@@ -472,6 +478,40 @@ hwm_power_write(struct hwm_drvdata *ddat, u32 attr, int chan, long val)
}
 }
 
+void i915_hwmon_power_max_disable(struct drm_i915_private *i915, bool *old)
+{
+   struct i915_hwmon *hwmon = i915->hwmon;
+   u32 r;
+
+   if (!hwmon || !i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
+   return;
+
+   mutex_lock(&hwmon->hwmon_lock);
+
+   hwmon->ddat.reset_in_progress = true;
+   r = intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
+PKG_PWR_LIM_1_EN, 0);
+   *old = !!(r & PKG_PWR_LIM_1_EN);
+
+   mutex_unlock(&hwmon->hwmon_lock);
+}
+
+void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old)
+{
+   struct i915_hwmon *hwmon = i915->hwmon;
+
+   if (!hwmon || !i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
+   return;
+
+   mutex_lock(&hwmon->hwmon_lock);
+
+   intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
+PKG_PWR_LIM_1_EN, old ? PKG_PWR_LIM_1_EN : 0);
+   hwmon->ddat.reset_in_progress = 

[Intel-gfx] [PATCH 1/3] drm/i915/hwmon: Get mutex and rpm ref just once in hwm_power_max_write

2023-04-05 Thread Ashutosh Dixit
In preparation for follow-on patches, refactor hwm_power_max_write to take
hwmon_lock and runtime pm wakeref at start of the function and release them
at the end, therefore acquiring these just once each.

Signed-off-by: Ashutosh Dixit 
---
 drivers/gpu/drm/i915/i915_hwmon.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
index 8e7dccc8d3a0e..7f44e809ca155 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -396,31 +396,33 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 {
struct i915_hwmon *hwmon = ddat->hwmon;
intel_wakeref_t wakeref;
+   int ret = 0;
u32 nval;
 
+   mutex_lock(&hwmon->hwmon_lock);
+   wakeref = intel_runtime_pm_get(ddat->uncore->rpm);
+
/* Disable PL1 limit and verify, because the limit cannot be disabled on all platforms */
if (val == PL1_DISABLE) {
-   mutex_lock(&hwmon->hwmon_lock);
-   with_intel_runtime_pm(ddat->uncore->rpm, wakeref) {
-   intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
-PKG_PWR_LIM_1_EN, 0);
-   nval = intel_uncore_read(ddat->uncore, 
hwmon->rg.pkg_rapl_limit);
-   }
-   mutex_unlock(&hwmon->hwmon_lock);
+   intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
+PKG_PWR_LIM_1_EN, 0);
+   nval = intel_uncore_read(ddat->uncore, 
hwmon->rg.pkg_rapl_limit);
 
if (nval & PKG_PWR_LIM_1_EN)
-   return -ENODEV;
-   return 0;
+   ret = -ENODEV;
+   goto exit;
}
 
/* Computation in 64-bits to avoid overflow. Round to nearest. */
nval = DIV_ROUND_CLOSEST_ULL((u64)val << hwmon->scl_shift_power, 
SF_POWER);
nval = PKG_PWR_LIM_1_EN | REG_FIELD_PREP(PKG_PWR_LIM_1, nval);
 
-   hwm_locked_with_pm_intel_uncore_rmw(ddat, hwmon->rg.pkg_rapl_limit,
-   PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1,
-   nval);
-   return 0;
+   intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
+PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
+exit:
+   intel_runtime_pm_put(ddat->uncore->rpm, wakeref);
+   mutex_unlock(&hwmon->hwmon_lock);
+   return ret;
 }
 
 static int
-- 
2.38.0
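
The refactor in this patch takes the lock and the wakeref once at function entry and releases both at a single exit label. The following is a minimal userspace sketch of that shape, not the driver code: struct fake_dev, fake_rmw, the LIM_* constants and the sticky_bits field (modelling hardware that refuses to clear the enable bit) are all hypothetical, and a pthread mutex plus a counter stand in for hwmon_lock and the runtime PM wakeref.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define LIM_EN      (1u << 15)	/* hypothetical enable bit */
#define LIM_MASK    0x7fffu	/* hypothetical limit field */
#define LIM_DISABLE (-1L)	/* hypothetical "disable" request value */

struct fake_dev {
	pthread_mutex_t lock;
	int wakerefs;		/* stands in for the runtime PM wakeref */
	uint32_t reg;		/* stands in for the power limit register */
	uint32_t sticky_bits;	/* bits this fake hardware refuses to clear */
};

/* Read-modify-write: clear 'clear', set 'set', keep sticky bits. Returns old value. */
static uint32_t fake_rmw(struct fake_dev *d, uint32_t clear, uint32_t set)
{
	uint32_t old = d->reg;

	d->reg = (old & ~clear) | set | (old & d->sticky_bits);
	return old;
}

int limit_write(struct fake_dev *d, long val)
{
	int ret = 0;

	/* Acquire both resources exactly once, up front. */
	pthread_mutex_lock(&d->lock);
	d->wakerefs++;

	if (val == LIM_DISABLE) {
		/* Disable and verify, since not every device allows it. */
		fake_rmw(d, LIM_EN, 0);
		if (d->reg & LIM_EN)
			ret = -ENODEV;
		goto exit;
	}

	fake_rmw(d, LIM_EN | LIM_MASK, LIM_EN | ((uint32_t)val & LIM_MASK));
exit:
	/* Single release point for every return path. */
	d->wakerefs--;
	pthread_mutex_unlock(&d->lock);
	return ret;
}
```

Because every path funnels through the exit label, a later patch can add an early-out check near the top without having to duplicate the unlock and put calls.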



Re: [Intel-gfx] [PATCH v9 4/4] drm/i915/selftests: skip comparison of power for discrete graphics

2023-04-05 Thread Dixit, Ashutosh
On Tue, 04 Apr 2023 23:59:30 -0700, Riana Tauro wrote:
>

Hi Riana,

> Hwmon reads the energy/power consumed by discrete soc,
> i.e. energy/power includes the power drawn from non-gfx discrete
> components
>
> This test uses the power consumed by GT to validate RC6
> power consumption.
> Skip comparison of power for discrete graphics
>
> TODO : measure power of GT in discrete graphics and modify the
> condition
>
> v2: update commit message (Anshuman)
>
> Signed-off-by: Riana Tauro 
> Reviewed-by: Anshuman Gupta 
> ---
>  drivers/gpu/drm/i915/gt/selftest_rc6.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> index 682f2fe67b3a..47165f490449 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_rc6.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c
> @@ -107,7 +107,15 @@ int live_rc6_manual(void *arg)
> ktime_to_ns(dt));
>   pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
>   rc0_power, rc6_power);
> - if (2 * rc6_power > rc0_power) {

So this condition is not being met for dGfx?

> +
> + /*
> +  * Condition valid for integrated graphics
> +  * On discrete graphics, hwmon reads the energy/power from
> +  * discrete SOC that includes non-gfx components.

On dGfx, is this true even when we have per-gt level energy available? Or
only when we have device level energy but not per-gt level energy (when
total number of gt's is 1 and we only expose device level energy but not gt
level energy)?


> +  * TODO : Measure power of GT for discrete graphics and
> +  * modify the condition

If we are adding this TODO, how are we planning to do this?

> +  */
> + if (!IS_DGFX(gt->i915) && (2 * rc6_power > rc0_power)) {
>   pr_err("GPU leaked energy while in RC6!\n");
>   err = -EINVAL;
>   goto out_unlock;

Thanks.
--
Ashutosh
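
The condition under review can be read as a small predicate. This is a sketch for discussion, not the selftest code: rc6_leak_detected is a hypothetical helper, and the is_dgfx flag stands in for IS_DGFX(gt->i915).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patched condition:
 * flag a leak only on integrated graphics, and only when RC6 power
 * is more than half of RC0 power.
 */
bool rc6_leak_detected(bool is_dgfx, uint64_t rc0_power_uw, uint64_t rc6_power_uw)
{
	if (is_dgfx)
		return false;	/* comparison skipped on discrete graphics */

	return 2 * rc6_power_uw > rc0_power_uw;
}
```

With this shape, a discrete part never fails the check regardless of the measured values, which is the behaviour the questions above are probing.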


Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Jason Gunthorpe
On Wed, Apr 05, 2023 at 01:49:45PM -0600, Alex Williamson wrote:

> > > QEMU can make a policy decision today because the kernel provides a
> > > sufficiently reliable interface, ie. based on the set of owned groups, a
> > > hot-reset is all but guaranteed to work.
> > 
> > And we don't change that with cdev. If qemu wants to make the policy
> > decision it keeps using the exact same _INFO interface to make that
> > decision, the same way it always has.
> > 
> > We weaken the actual reset action to only consider the security side.
> > 
> > Applications that want this exclusive reset group policy simply must
> > check it on their own. It is a reasonable API design.
> 
> I disagree, as I've argued before, the info ioctl becomes so weak and
> effectively arbitrary from a user perspective at being able to predict
> whether the hot-reset ioctl works that it becomes useless, diminishing
> the entire hot-reset info/execute API.

reset should be strictly more permissive than INFO. If INFO predicts
reset is permitted then reset should succeed.

We don't change INFO so it cannot "becomes so weak"  ??

We don't care about the cases where INFO says it will not succeed but
reset does (temporarily) succeed.

I don't get what argument you are trying to make or what you think is
diminished..

Again, userspace calls INFO, if info says yes then reset *always
works*, exactly just like today.

Userspace will call reset with a 0 length FD list and it uses a
security only check that is strictly more permissive than what
get_info will return. So the new check is simple in the kernel and
always works in the cases we need it to work.

What is getting things into trouble is insisting that RESET have
additional restrictions beyond the minimum checks required for
security.

> > I don't view it as a loophole, it is flexibility to use the API in a
> > way that is different from what qemu wants - eg an app like dpdk may
> > be willing to tolerate a reset group that becomes unavailable after
> > startup. Who knows, why should we force this in the kernel?
> 
> Because look at all the problems it's causing to try to introduce these
> loopholes without also introducing subtle bugs.

These problems are coming from trying to do this integrated version,
not from my approach!

AFAICT there was nothing wrong with my original plan of using the
empty fd list for reset. What Yi has here is some mashup of what you
and I both suggested.

> > Remember the reason we started doing this is because we don't
> > have easy access to the BDF anymore.
> 
> We don't need it, the info ioctl provides the groups, the group
> association can be learned from the DEVICE_GET_INFO ioctl, the
> hot-reset ioctl only requires a single representative fd per affected
> group.  dev-ids not required.

I'm not talking about triggering the ioctl.

I'm talking about whatever else qemu needs to do so that the VM is
aware of the reset groups device-by-device on its side so nested VFIO
in the VM reflects the same data as the hypervisor. Maybe it doesn't
do this right now, but the kernel API should continue to provide the
data.

> > I like leaving this ioctl alone, lets go back to a dedicated ioctl to
> > return the dev_ids.
> 
> I don't see any justification for this.  We could add another PCI
> specific DEVICE_GET_INFO capability to report the bdf if we really need
> it, but reporting the group seems sufficient for this use case.

What I imagine is a single new ioctl 'get reset group 2' or something.
It returns a list of dev_ids in the reset group. It has an output flag
if the reset is reliable. This is the only ioctl user space needs to
call.

The reliable test is done by simply calling the ioctl and throwing
away the dev ids. The mapping of the VM's reset groups is done by
processing the dev_ids to vRIDs and flowing that into the VM somehow.

We don't expose group_ids, and we don't expose BDF. It is much simpler
and cleaner to use.

A BDF DEVICE_GET_INFO and the existing reset INFO will encode the same
data too, it is just not as elegant and requires userspace to do a lot
more work to keep track of the 3 different identifiers.

> > This looks like a very complex uapi compared to the empty list option,
> > but it seems like it would work.
>
> It's the same API that we have now.  What's complex is trying to figure
> out all the subtle side-effects from the loopholes that are being
> proposed in this series.  Thanks,

I might agree with you if we weren't now going backwards - 
ideas didn't work out and Yi has to throw stuff away. :(

Jason


Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

2023-04-05 Thread Yang, Fei
>Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation
>
>On 04/04/2023 19:04, Yang, Fei wrote:
>>> Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set
>>> cache at BO creation
>>>
>>> On 01/04/2023 09:38, fei.y...@intel.com wrote:
>>>> From: Fei Yang 
>>>>
>>>> To comply with the design that buffer objects shall have immutable
>>>> cache setting throughout its life cycle, {set, get}_caching ioctl's
>>>> are no longer supported from MTL onward. With that change caching
>>>> policy can only be set at object creation time. The current code
>>>> applies a default (platform dependent) cache setting for all objects.
>>>> However this is not optimal for performance tuning. The patch
>>>> extends the existing gem_create uAPI to let user set PAT index for
>>>> the object at creation time.
>>>> The new extension is platform independent, so UMD's can switch to
>>>> using this extension for older platforms as well, while {set,
>>>> get}_caching are still supported on these legacy platforms for
>>>> compatibility reasons.
>>>>
>>>> Cc: Chris Wilson 
>>>> Cc: Matt Roper 
>>>> Signed-off-by: Fei Yang 
>>>> Reviewed-by: Andi Shyti 
>>>
>>> Just like the protected content uAPI, there is no way for userspace
>>> to tell this feature is available other than trying using it.
>>>
>>> Given the issues with protected content, is it not something we would want to add?
>> Sorry I'm not aware of the issues with protected content, could you elaborate?
>> There was a long discussion on teams uAPI channel, could you comment
>> there if any concerns?
>>
>> https://teams.microsoft.com/l/message/19:f1767bda6734476ba0a9c7d147b92
>> 8d1@thread.skype/1675860924675?tenantId=46c98d88-e344-4ed4-8496-4ed771
>> 2e255d=379f3ae1-d138-4205-bb65-d4c7d38cb481=16
>> 75860924675=GSE%20OSGC=i915%20uAPI%20changes
>> tedTime=1675860924675=false
>>
>> Thanks,
>> -Fei
>
>
> We wanted to have a getparam to detect protected support and were told
> to detect it by trying to create a context with it.
>
> Now it appears trying to create a protected context can block for several
> seconds.
>
> Since we have to report capabilities to the user even before it creates
> protected contexts, any app is at risk of blocking.

Can we detect this capability by creating a buffer object? This extension is
not blocking; it just provides a way to set the caching policy and should complete
very fast. There is an IGT test I created for this extension (not merged yet),
please take a look at http://intel-gfx-pw.fi.intel.com/series/19149/

I'm not familiar with getparam, will take a look there as well. But I think it
would be easier to just create an object.

-Fei

>-Lionel
>
>
>>
>>> Thanks,
>>>
>>> -Lionel
>>>
>>>
>>>> ---
>>>> drivers/gpu/drm/i915/gem/i915_gem_create.c | 33 
>>>> include/uapi/drm/i915_drm.h| 36 ++
>>>> tools/include/uapi/drm/i915_drm.h  | 36 ++
>>>> 3 files changed, 105 insertions(+)



Re: [Intel-gfx] [PATCH] drm/i915/guc: Don't capture Gen8 regs on Gen12 devices

2023-04-05 Thread Matt Roper
On Wed, Apr 05, 2023 at 02:13:31PM -0700, John Harrison wrote:
> On 4/3/2023 17:34, Matt Roper wrote:
> > On Mon, Apr 03, 2023 at 02:33:34PM -0700, john.c.harri...@intel.com wrote:
> > > From: John Harrison 
> > > 
> > > A pair of pre-Gen12 registers were being included in the Gen12 capture
> > > list. GuC was rejecting those as being invalid and logging errors
> > > about them. So, stop doing it.
> > Looks like these registers existed from gen8-gen11.  With this change,
> > it looks like they also won't be included in the GuC error capture for
> > gen11 (ICL and EHL/JSL) since those platforms return xe_lpd_lists [1]
> > rather than default_lists; do we care about that?  I assume not (since
> > those platforms don't use GuC submission unless you force it with the
> > enable_guc modparam and taint your kernel), but I figured I should point
> > it out.
> Yeah, I think the code is treating Gen11 as Gen12 rather than Gen9 or its
> own thing. I hadn't spotted that before. It certainly seems incorrect.
> 
> > 
> > Reviewed-by: Matt Roper 
> > 
> > 
> > [1] Why is the main list we use called xe_lpd (i.e., the name of ADL-P's
> >  display IP)?  It doesn't seem like we're doing anything with display
> >  registers here so using display IP naming seems really confusing.
> I think because no-one has a clue what name refers to what hardware any more :(.
> 
> What are the official names for IP_VER 9, 11, 12.00, 12.50 and 12.55?

Yeah, the naming is a real mess.  :-(  For graphics IP, the official
terms are supposed to be:

12.00 = Xe_LP
12.10 = Xe_LP+ (basically the same as Xe_LP except for interrupts)
12.50 = Xe_HP
12.55 = Xe_HPG (it's nearly identical to Xe_HP)
12.7x = Xe_LPG

There are separate names for media, although we didn't really start
using them anywhere in the i915 until the separation of IPs started
becoming more important with MTL:

12.00 = Xe_M (or Xe_M+ for DG1, but we treat it the same in the KMD)
12.50 = Xe_XPM
12.55 = Xe_HPM
12.60 = Xe_XPM+
13.00 = Xe_LPM+

and display:

12.00 = Xe_D
13.00 = Xe_LPD (ADL-P) or Xe_HPD (DG2)
14.00 = Xe_LPD+


The pre-12 stuff predates the fancy new marketing-mandated names.  Even
though we're not using "gen" terminology going forward, those old ones
are grandfathered in, so it's still okay to refer to them as gen9,
gen11, etc.


Matt
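
As a quick reference, the graphics IP names listed above can be written as a lookup table. This is only an illustration of the mapping given in this mail: struct ip_name and graphics_ip_name() are hypothetical and not part of i915, and versions are kept as strings because the list contains the wildcard entry 12.7x.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: version string as written in the mail, and its name. */
struct ip_name {
	const char *ver;
	const char *name;
};

/* Graphics IP names, copied from the list in this thread. */
static const struct ip_name graphics_ip_names[] = {
	{ "12.00", "Xe_LP" },
	{ "12.10", "Xe_LP+" },
	{ "12.50", "Xe_HP" },
	{ "12.55", "Xe_HPG" },
	{ "12.7x", "Xe_LPG" },
};

/* Return the name for an exact version string, or NULL if it is not listed. */
const char *graphics_ip_name(const char *ver)
{
	size_t n = sizeof(graphics_ip_names) / sizeof(graphics_ip_names[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(graphics_ip_names[i].ver, ver) == 0)
			return graphics_ip_names[i].name;

	return NULL;	/* pre-12 parts keep the old "gen" terminology */
}
```

Media and display would need their own tables, since the same version number maps to a different name in each IP block.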

> 
> John.
> 
> > 
> > 
> > Matt
> > 
> > > Signed-off-by: John Harrison 
> > > Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error state capture.")
> > > Cc: Alan Previn 
> > > Cc: Umesh Nerlige Ramappa 
> > > Cc: Lucas De Marchi 
> > > Cc: John Harrison 
> > > Cc: Jani Nikula 
> > > Cc: Matt Roper 
> > > Cc: Balasubramani Vivekanandan 
> > > Cc: Daniele Ceraolo Spurio 
> > > ---
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 7 +--
> > >   1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > index cf49188db6a6e..e0e793167d61b 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > @@ -31,12 +31,14 @@
> > >   { FORCEWAKE_MT, 0,  0, "FORCEWAKE" }
> > >   #define COMMON_GEN9BASE_GLOBAL \
> > > - { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
> > > - { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }, \
> > >   { ERROR_GEN6,   0,  0, "ERROR_GEN6" }, \
> > >   { DONE_REG, 0,  0, "DONE_REG" }, \
> > >   { HSW_GTT_CACHE_EN, 0,  0, "HSW_GTT_CACHE_EN" }
> > > +#define GEN9_GLOBAL \
> > > + { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
> > > + { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }
> > > +
> > >   #define COMMON_GEN12BASE_GLOBAL \
> > >   { GEN12_FAULT_TLB_DATA0,0,  0, "GEN12_FAULT_TLB_DATA0" }, \
> > >   { GEN12_FAULT_TLB_DATA1,0,  0, "GEN12_FAULT_TLB_DATA1" }, \
> > > @@ -142,6 +144,7 @@ static const struct __guc_mmio_reg_descr xe_lpd_gsc_inst_regs[] = {
> > >   static const struct __guc_mmio_reg_descr default_global_regs[] = {
> > >   COMMON_BASE_GLOBAL,
> > >   COMMON_GEN9BASE_GLOBAL,
> > > + GEN9_GLOBAL,
> > >   };
> > >   static const struct __guc_mmio_reg_descr default_rc_class_regs[] = {
> > > -- 
> > > 2.39.1
> > > 
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
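
The fix discussed in this thread works by moving two entries out of a shared macro into a Gen9-only macro, so that only the older list picks them up. Below is a self-contained sketch of that macro-composition technique. It is not the i915 source: struct reg_descr is a simplified stand-in for __guc_mmio_reg_descr, the offsets are placeholders, the macro and array names are hypothetical, and it assumes (as the commit message implies) that the Gen12 list also expands the common Gen9 base macro.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct __guc_mmio_reg_descr. */
struct reg_descr {
	unsigned int offset;	/* placeholder value, not a real MMIO offset */
	const char *name;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Entries shared by every list. */
#define COMMON_BASE \
	{ 0x1000, "FORCEWAKE" }

/* Entries valid on Gen9 and later. */
#define COMMON_GEN9BASE \
	{ 0x2000, "ERROR_GEN6" }, \
	{ 0x2004, "DONE_REG" }

/* Entries that must not leak into the Gen12 list. */
#define GEN9_ONLY \
	{ 0x3000, "GEN8_FAULT_TLB_DATA0" }, \
	{ 0x3004, "GEN8_FAULT_TLB_DATA1" }

#define GEN12_ONLY \
	{ 0x4000, "GEN12_FAULT_TLB_DATA0" }, \
	{ 0x4004, "GEN12_FAULT_TLB_DATA1" }

const struct reg_descr gen9_regs[] = {
	COMMON_BASE,
	COMMON_GEN9BASE,
	GEN9_ONLY,
};

const struct reg_descr gen12_regs[] = {
	COMMON_BASE,
	COMMON_GEN9BASE,
	GEN12_ONLY,
};

/* Return 1 if 'name' appears in the list, 0 otherwise. */
int list_has(const struct reg_descr *list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(list[i].name, name) == 0)
			return 1;
	return 0;
}
```

Splitting the macros keeps the shared entries in one place while letting each platform list opt in to the generation-specific ones.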


[Intel-gfx] ✓ Fi.CI.BAT: success for drm/atomic-helper: Don't set deadline for modesets (rev2)

2023-04-05 Thread Patchwork
== Series Details ==

Series: drm/atomic-helper: Don't set deadline for modesets (rev2)
URL   : https://patchwork.freedesktop.org/series/116140/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12974 -> Patchwork_116140v2


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/index.html

Participating hosts (36 -> 35)
--

  Missing(1): fi-snb-2520m 

Known issues


  Here are the changes found in Patchwork_116140v2 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@i915_selftest@live@migrate:
- bat-dg2-11: [PASS][1] -> [DMESG-WARN][2] ([i915#7699])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-dg2-11/igt@i915_selftest@l...@migrate.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-dg2-11/igt@i915_selftest@l...@migrate.html

  * igt@i915_selftest@live@requests:
- bat-rpls-1: [PASS][3] -> [ABORT][4] ([i915#7911])
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-rpls-1/igt@i915_selftest@l...@requests.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-rpls-1/igt@i915_selftest@l...@requests.html

  * igt@i915_selftest@live@workarounds:
- bat-dg1-5:  [PASS][5] -> [ABORT][6] ([i915#4983])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-dg1-5/igt@i915_selftest@l...@workarounds.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-dg1-5/igt@i915_selftest@l...@workarounds.html

  * igt@i915_suspend@basic-s3-without-i915:
- bat-dg2-8:  NOTRUN -> [SKIP][7] ([i915#6645])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-dg2-8/igt@i915_susp...@basic-s3-without-i915.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
- bat-rpls-2: NOTRUN -> [SKIP][8] ([i915#7828])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-rpls-2/igt@kms_chamelium_...@common-hpd-after-suspend.html
- bat-dg2-8:  NOTRUN -> [SKIP][9] ([i915#7828])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-dg2-8/igt@kms_chamelium_...@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@suspend-read-crc:
- bat-rpls-2: NOTRUN -> [SKIP][10] ([i915#1845])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-rpls-2/igt@kms_pipe_crc_ba...@suspend-read-crc.html

  
 Possible fixes 

  * igt@i915_selftest@live@hangcheck:
- bat-dg2-8:  [ABORT][11] ([i915#7913] / [i915#7979]) -> [PASS][12]
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-dg2-8/igt@i915_selftest@l...@hangcheck.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-dg2-8/igt@i915_selftest@l...@hangcheck.html

  * igt@i915_suspend@basic-s3-without-i915:
- bat-rpls-2: [ABORT][13] ([i915#6687] / [i915#7978]) -> [PASS][14]
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-rpls-2/igt@i915_susp...@basic-s3-without-i915.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-rpls-2/igt@i915_susp...@basic-s3-without-i915.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence@pipe-c-dp-1:
- bat-dg2-8:  [FAIL][15] ([i915#7932]) -> [PASS][16]
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc-frame-seque...@pipe-c-dp-1.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc-frame-seque...@pipe-c-dp-1.html

  
 Warnings 

  * igt@i915_selftest@live@slpc:
- bat-rpls-2: [DMESG-FAIL][17] ([i915#6367] / [i915#7913] / 
[i915#7996]) -> [DMESG-FAIL][18] ([i915#6367] / [i915#7913])
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-rpls-2/igt@i915_selftest@l...@slpc.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116140v2/bat-rpls-2/igt@i915_selftest@l...@slpc.html

  
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6645]: https://gitlab.freedesktop.org/drm/intel/issues/6645
  [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687
  [i915#7699]: https://gitlab.freedesktop.org/drm/intel/issues/7699
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7911]: https://gitlab.freedesktop.org/drm/intel/issues/7911
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7932]: https://gitlab.freedesktop.org/drm/intel/issues/7932
  [i915#7978]: https://gitlab.freedesktop.org/drm/intel/issues/7978
  [i915#7979]: https://gitlab.freedesktop.org/drm/intel/issues/7979
  

[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/atomic-helper: Don't set deadline for modesets (rev2)

2023-04-05 Thread Patchwork
== Series Details ==

Series: drm/atomic-helper: Don't set deadline for modesets (rev2)
URL   : https://patchwork.freedesktop.org/series/116140/
State : warning

== Summary ==

Error: dim checkpatch failed
0989a3e3ca58 drm/atomic-helper: Don't set deadline for modesets
-:29: WARNING:BAD_REPORTED_BY_LINK: Reported-by: should be immediately followed 
by Link: with a URL to the report
#29: 
Reported-by: Dmitry Baryshkov 
Tested-by: Dmitry Baryshkov  # test patch only

-:52: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address 
mismatch: 'From: Daniel Vetter ' != 'Signed-off-by: 
Daniel Vetter '

total: 0 errors, 2 warnings, 0 checks, 12 lines checked




Re: [Intel-gfx] [PATCH] drm/i915/guc: Don't capture Gen8 regs on Gen12 devices

2023-04-05 Thread John Harrison

On 4/3/2023 17:34, Matt Roper wrote:

On Mon, Apr 03, 2023 at 02:33:34PM -0700, john.c.harri...@intel.com wrote:

From: John Harrison 

A pair of pre-Gen12 registers were being included in the Gen12 capture
list. GuC was rejecting those as being invalid and logging errors
about them. So, stop doing it.

Looks like these registers existed from gen8-gen11.  With this change,
it looks like they also won't be included in the GuC error capture for
gen11 (ICL and EHL/JSL) since those platforms return xe_lpd_lists [1]
rather than default_lists; do we care about that?  I assume not (since
those platforms don't use GuC submission unless you force it with the
enable_guc modparam and taint your kernel), but I figured I should point
it out.
Yeah, I think the code is treating Gen11 as Gen12 rather than Gen9 or 
it's own thing. I hadn't spotted that before. It certainly seems incorrect.




Reviewed-by: Matt Roper 


[1] Why is the main list we use called xe_lpd (i.e., the name of ADL-P's
 display IP)?  It doesn't seem like we're doing anything with display
 registers here so using display IP naming seems really confusing.
I think because no-one has a clue what name refers to what hardware any 
more :(.


What are the official names for IP_VER 9, 11, 12.00, 12.50 and 12.55?

John.




Matt


Signed-off-by: John Harrison 
Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error state capture.")
Cc: Alan Previn 
Cc: Umesh Nerlige Ramappa 
Cc: Lucas De Marchi 
Cc: John Harrison 
Cc: Jani Nikula 
Cc: Matt Roper 
Cc: Balasubramani Vivekanandan 
Cc: Daniele Ceraolo Spurio 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index cf49188db6a6e..e0e793167d61b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -31,12 +31,14 @@
{ FORCEWAKE_MT, 0,  0, "FORCEWAKE" }
  
  #define COMMON_GEN9BASE_GLOBAL \
-   { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
-   { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }, \
{ ERROR_GEN6,   0,  0, "ERROR_GEN6" }, \
{ DONE_REG, 0,  0, "DONE_REG" }, \
{ HSW_GTT_CACHE_EN, 0,  0, "HSW_GTT_CACHE_EN" }
  
+#define GEN9_GLOBAL \
+   { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
+   { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }
+
  #define COMMON_GEN12BASE_GLOBAL \
{ GEN12_FAULT_TLB_DATA0,0,  0, "GEN12_FAULT_TLB_DATA0" }, \
{ GEN12_FAULT_TLB_DATA1,0,  0, "GEN12_FAULT_TLB_DATA1" }, \
@@ -142,6 +144,7 @@ static const struct __guc_mmio_reg_descr xe_lpd_gsc_inst_regs[] = {
  static const struct __guc_mmio_reg_descr default_global_regs[] = {
COMMON_BASE_GLOBAL,
COMMON_GEN9BASE_GLOBAL,
+   GEN9_GLOBAL,
  };
  
  static const struct __guc_mmio_reg_descr default_rc_class_regs[] = {

--
2.39.1
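For readers following the patch above, here is a minimal sketch of the pattern it relies on: register lists are built by composing per-generation macros into descriptor arrays, so moving an entry from a shared macro into a generation-specific one removes it from every list that only includes the shared macro. Everything in this sketch is hypothetical (the `toy_` names, the offsets, the two-field descriptor); it is not the real `__guc_mmio_reg_descr` layout or the real register set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified descriptor; not the real i915 structure. */
struct toy_reg_descr {
	unsigned int offset;
	const char *name;
};

/* Entries shared by every generation (made-up offset). */
#define TOY_COMMON_GLOBAL \
	{ 0x1000, "TOY_COMMON" }

/* Entries that only exist on the older generation. */
#define TOY_GEN9_GLOBAL \
	{ 0x2000, "TOY_GEN9_DATA0" }, \
	{ 0x2004, "TOY_GEN9_DATA1" }

/* Entries that only exist on the newer generation. */
#define TOY_GEN12_GLOBAL \
	{ 0x3000, "TOY_GEN12_DATA0" }, \
	{ 0x3004, "TOY_GEN12_DATA1" }

#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Each list pulls in the shared macro plus its own generation macro. */
const struct toy_reg_descr toy_gen9_regs[] = {
	TOY_COMMON_GLOBAL,
	TOY_GEN9_GLOBAL,
};

const struct toy_reg_descr toy_gen12_regs[] = {
	TOY_COMMON_GLOBAL,
	TOY_GEN12_GLOBAL,
};

/* Returns 1 if a register with the given name is in the list, else 0. */
int toy_list_has(const struct toy_reg_descr *list, size_t n, const char *name)
{
	size_t i;

	assert(list != NULL && name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(list[i].name, name) == 0)
			return 1;
	return 0;
}
```

With this layout, a lookup of `"TOY_GEN9_DATA0"` succeeds in `toy_gen9_regs` and fails in `toy_gen12_regs`, which mirrors the intent of keeping the Gen8-era registers out of the Gen12 capture list.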





Re: [Intel-gfx] [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

> On Wed, Apr 05, 2023 at 07:42:08PM +0200, Javier Martinez Canillas wrote:

[...]

>> >> Ah, your patch adds it after that indeed. Please ignore my comment then.
>> >
>> > So rb: you?
>> >
>> 
>> Yes, I already provided it in my previous email and it has been picked up by
>> patchwork. I could do it again, but that would probably confuse dim when applying.
>
> Yeah just wanted to confirm I cleared up all your questions. Merged the
> entire series to drm-misc-next, thanks for the review.
>

You are welcome.

>> The only patch from your series that is missing an {r,a}b is #1 right now:
>> 
>> https://patchwork.kernel.org/project/dri-devel/list/?series=736966=both
>
> That's a different one :-)
>

Oh, sorry about that. Somehow I switched threads in my head in the middle
of the response.

> I'll respin with your comments and then let you duke it out about
> patch 1.
> -Daniel
>

Perfect, thanks! It would be good to finally have that issue fixed.

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



[Intel-gfx] ✗ Fi.CI.BAT: failure for i915: Correct description of default value for enable_psr2_sel_fetch

2023-04-05 Thread Patchwork
== Series Details ==

Series: i915: Correct description of default value for enable_psr2_sel_fetch
URL   : https://patchwork.freedesktop.org/series/116150/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12974 -> Patchwork_116150v1


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_116150v1 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_116150v1, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/index.html

Participating hosts (36 -> 36)
--

  Additional (1): fi-kbl-soraka 
  Missing(1): fi-snb-2520m 

Possible new issues
---

  Here are the unknown changes that may have been introduced in Patchwork_116150v1:

### IGT changes ###

#### Possible regressions ####

  * igt@dmabuf@all-tests@dma_fence:
- bat-dg1-7:  [PASS][1] -> [FAIL][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-dg1-7/igt@dmabuf@all-tests@dma_fence.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/bat-dg1-7/igt@dmabuf@all-tests@dma_fence.html

  * igt@dmabuf@all-tests@sanitycheck:
- bat-dg1-7:  [PASS][3] -> [ABORT][4]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-dg1-7/igt@dmabuf@all-te...@sanitycheck.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/bat-dg1-7/igt@dmabuf@all-te...@sanitycheck.html

  
Known issues


  Here are the changes found in Patchwork_116150v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_huc_copy@huc-copy:
- fi-kbl-soraka:  NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#2190])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/fi-kbl-soraka/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@basic:
- fi-kbl-soraka:  NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#4613]) +3 similar issues
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/fi-kbl-soraka/igt@gem_lmem_swapp...@basic.html

  * igt@i915_selftest@live@execlists:
- fi-bsw-n3050:   [PASS][7] -> [ABORT][8] ([i915#7911] / [i915#7913])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/fi-bsw-n3050/igt@i915_selftest@l...@execlists.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/fi-bsw-n3050/igt@i915_selftest@l...@execlists.html

  * igt@i915_selftest@live@gt_pm:
- fi-kbl-soraka:  NOTRUN -> [DMESG-FAIL][9] ([i915#1886])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@mman:
- bat-rpls-1: [PASS][10] -> [TIMEOUT][11] ([i915#6794])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-rpls-1/igt@i915_selftest@l...@mman.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/bat-rpls-1/igt@i915_selftest@l...@mman.html

  * igt@i915_suspend@basic-s3-without-i915:
- bat-dg2-8:  NOTRUN -> [SKIP][12] ([i915#6645])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/bat-dg2-8/igt@i915_susp...@basic-s3-without-i915.html

  * igt@kms_chamelium_frames@hdmi-crc-fast:
- fi-kbl-soraka:  NOTRUN -> [SKIP][13] ([fdo#109271]) +16 similar issues
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/fi-kbl-soraka/igt@kms_chamelium_fra...@hdmi-crc-fast.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
- bat-rpls-2: NOTRUN -> [SKIP][14] ([i915#7828])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/bat-rpls-2/igt@kms_chamelium_...@common-hpd-after-suspend.html
- bat-dg2-8:  NOTRUN -> [SKIP][15] ([i915#7828])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/bat-dg2-8/igt@kms_chamelium_...@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@read-crc:
- bat-dg2-11: NOTRUN -> [SKIP][16] ([i915#5354])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/bat-dg2-11/igt@kms_pipe_crc_ba...@read-crc.html

  * igt@kms_pipe_crc_basic@suspend-read-crc:
- bat-rpls-2: NOTRUN -> [SKIP][17] ([i915#1845])
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/bat-rpls-2/igt@kms_pipe_crc_ba...@suspend-read-crc.html

  
#### Possible fixes ####

  * igt@i915_pm_rps@basic-api:
- bat-dg2-11: [FAIL][18] ([i915#8308]) -> [PASS][19]
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-dg2-11/igt@i915_pm_...@basic-api.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116150v1/bat-dg2-11/igt@i915_pm_...@basic-api.html

  * igt@i915_selftest@live@hangcheck:
- bat-dg2-8:  [ABORT][20] 

Re: [Intel-gfx] [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 07:42:08PM +0200, Javier Martinez Canillas wrote:
> Daniel Vetter  writes:
> 
> > On Wed, Apr 05, 2023 at 06:27:17PM +0200, Javier Martinez Canillas wrote:
> >> Daniel Vetter  writes:
> 
> [...]
> 
> >> >
> >> > The __fill_var is after this. I'm honestly not sure what the exact
> >> 
> >> Ah, your patch adds it after that indeed. Please ignore my comment then.
> >
> > So rb: you?
> >
> 
> Yes, I already provided it in my previous email and it has been picked up by
> patchwork. I could do it again, but that would probably confuse dim when applying.

Yeah just wanted to confirm I cleared up all your questions. Merged the
entire series to drm-misc-next, thanks for the review.

> The only patch from your series that is missing an {r,a}b is #1 right now:
> 
> https://patchwork.kernel.org/project/dri-devel/list/?series=736966=both

That's a different one :-)

I'll respin with your comments and then let you duke it out about
patch 1.
-Daniel

> 
> [...]
> 
> >> > What I'm wondering now is whether too small x/yres won't lead to problems
> >> > of some sorts ... For multi-screen we set the virtual size to be big
> >> > enough for all crtc, and then just set x/yres to be the smallest output.
> >> > That way fbcon knows to only draw as much as is visible on all screens.
> >> > But if you then pan that too much, the bigger screens might not have a
> >> > big enough buffer anymore and things fail (but shouldn't).
> >> >
> >> > Not sure how to fix that tbh.
> >> 
> >> Would this be a problem in practice?
> >
> > I'm frankly not sure. You'd get a black screen for fbcon/fbdev across all
> > outputs, but only if you have userspace doing this intentionally.
> >
> > In a way it's just another artifact of the drm fbdev emulation not using
> > ATOMIC_TEST_ONLY in the various places where it should, and so doesn't
> > really know whether a configuration change will work out.
> >
> > We already have this in obscure multi-monitor cases where adding another
> > screen kills fbcon, because the display hw is running out of fifo or
> > clocks or whatever, and because the drm fbdev code doesn't check but just
> > blindly commits the entire thing as an atomic commit, the overall commit
> > fails.
> >
> > This worked "better" with legacy kms because there we commit per-crtc, so
> > if any specific crtc runs into a limit check, only that one fails to light
> > up.
> >
> > Imo given that no one cared enough yet to write up atomic TEST_ONLY
> > support for fbdev emulation I think we can continue to just ignore this
> > problem.
> >
> 
> Agreed. If that ends up being a problem for people in practice then I guess
> someone can type atomic TEST_ONLY support for the fbdev emulation layer.
> 
> > What should not happen is that fbcon code blows up drawing out of bounds
> > or something like that, resulting in a kernel crash. So from that pov I
> > think it's "safe" :-)
> 
> Great. Thanks a lot for your explanations.
> 
> > -Daniel
> 
> -- 
> Best regards,
> 
> Javier Martinez Canillas
> Core Platforms
> Red Hat
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

2023-04-05 Thread Jordan Justen
On 2023-04-05 00:45:24, Lionel Landwerlin wrote:
> On 04/04/2023 19:04, Yang, Fei wrote:
> >> Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation
> >>
> >> Just like the protected content uAPI, there is no way for userspace to tell
> >> this feature is available other than trying to use it.
> >>
> >> Given the issues with protected content, is it not something we would want to
> >> add?
> > Sorry, I'm not aware of the issues with protected content, could you elaborate?
> > There was a long discussion on the teams uAPI channel, could you comment there
> > if you have any concerns?
> >
> 
> We wanted to have a getparam to detect protected support and were told 
> to detect it by trying to create a context with it.
> 

An extensions system where the detection mechanism is "just try it",
and assume it's not supported if it fails. ??

This seems likely to get more and more problematic as a detection
mechanism as more extensions are added.

> 
> Now it appears trying to create a protected context can block for 
> several seconds.
> 
> Since we have to report capabilities to the user even before it creates 
> protected contexts, any app is at risk of blocking.
> 

This failure path is not causing any re-thinking about using this as
the extension detection mechanism?

Doesn't the ioctl# + input-struct-size + u64-extension# identify the
extension such that the kernel could indicate whether it is supported or
not? (Or, perhaps return an array of the supported extensions so the
umd doesn't have to potentially make many ioctls for each extension of
interest.)
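A rough sketch of the query idea in the paragraph above, written as plain C rather than real i915 uAPI. All names here (`toy_ext_key`, `toy_supported_exts`, `toy_ext_is_supported`, `toy_ext_list_supported`) and all numeric values are hypothetical; the point is only that an (ioctl number, struct size, extension id) triple can be looked up without side effects, and that a list form lets a caller learn everything in one call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: which ioctl, which input struct size, which extension. */
struct toy_ext_key {
	uint32_t ioctl_nr;
	uint32_t struct_size;
	uint64_t ext_id;
};

/* Hypothetical table of supported extensions; the values are made up. */
static const struct toy_ext_key toy_supported_exts[] = {
	{ 0x3c, 32, 1 },
	{ 0x3c, 32, 2 },
	{ 0x2d, 24, 1 },
};

#define TOY_N_SUPPORTED \
	(sizeof(toy_supported_exts) / sizeof(toy_supported_exts[0]))

/* Returns 1 if the exact triple is supported, 0 otherwise. No side effects. */
int toy_ext_is_supported(const struct toy_ext_key *key)
{
	size_t i;

	assert(key != NULL);
	for (i = 0; i < TOY_N_SUPPORTED; i++) {
		const struct toy_ext_key *e = &toy_supported_exts[i];

		if (e->ioctl_nr == key->ioctl_nr &&
		    e->struct_size == key->struct_size &&
		    e->ext_id == key->ext_id)
			return 1;
	}
	return 0;
}

/*
 * Copies up to cap supported extension ids for one ioctl into out and
 * returns the total number available, so a caller can pass out == NULL
 * first to learn how large a buffer to allocate.
 */
size_t toy_ext_list_supported(uint32_t ioctl_nr, uint64_t *out, size_t cap)
{
	size_t i, n = 0;

	for (i = 0; i < TOY_N_SUPPORTED; i++) {
		if (toy_supported_exts[i].ioctl_nr != ioctl_nr)
			continue;
		if (out != NULL && n < cap)
			out[n] = toy_supported_exts[i].ext_id;
		n++;
	}
	return n;
}
```

Unlike "just try it", neither call creates any object, so a slow or blocking setup path would not be exercised merely to report capabilities.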

-Jordan


[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915/tc: demote a kernel-doc comment to a regular comment

2023-04-05 Thread Patchwork
== Series Details ==

Series: series starting with [1/2] drm/i915/tc: demote a kernel-doc comment to a regular comment
URL   : https://patchwork.freedesktop.org/series/116144/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12974 -> Patchwork_116144v1


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/index.html

Participating hosts (36 -> 34)
--

  Missing(2): fi-snb-2520m fi-pnv-d510 

Known issues


  Here are the changes found in Patchwork_116144v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@gt_heartbeat:
- fi-apl-guc: [PASS][1] -> [DMESG-FAIL][2] ([i915#5334])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@mman:
- bat-rpls-1: [PASS][3] -> [TIMEOUT][4] ([i915#6794])
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-rpls-1/igt@i915_selftest@l...@mman.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/bat-rpls-1/igt@i915_selftest@l...@mman.html

  * igt@i915_suspend@basic-s3-without-i915:
- bat-dg2-8:  NOTRUN -> [SKIP][5] ([i915#6645])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/bat-dg2-8/igt@i915_susp...@basic-s3-without-i915.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
- bat-rpls-2: NOTRUN -> [SKIP][6] ([i915#7828])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/bat-rpls-2/igt@kms_chamelium_...@common-hpd-after-suspend.html
- bat-dg2-8:  NOTRUN -> [SKIP][7] ([i915#7828])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/bat-dg2-8/igt@kms_chamelium_...@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@read-crc:
- bat-adlp-9: NOTRUN -> [SKIP][8] ([i915#3546]) +1 similar issue
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/bat-adlp-9/igt@kms_pipe_crc_ba...@read-crc.html

  * igt@kms_pipe_crc_basic@suspend-read-crc:
- bat-rpls-2: NOTRUN -> [SKIP][9] ([i915#1845])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/bat-rpls-2/igt@kms_pipe_crc_ba...@suspend-read-crc.html

  
#### Possible fixes ####

  * igt@i915_pm_rps@basic-api:
- bat-dg2-11: [FAIL][10] ([i915#8308]) -> [PASS][11]
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-dg2-11/igt@i915_pm_...@basic-api.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/bat-dg2-11/igt@i915_pm_...@basic-api.html

  * igt@i915_selftest@live@hangcheck:
- bat-dg2-8:  [ABORT][12] ([i915#7913] / [i915#7979]) -> [PASS][13]
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-dg2-8/igt@i915_selftest@l...@hangcheck.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/bat-dg2-8/igt@i915_selftest@l...@hangcheck.html

  * igt@i915_suspend@basic-s3-without-i915:
- bat-rpls-2: [ABORT][14] ([i915#6687] / [i915#7978]) -> [PASS][15]
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-rpls-2/igt@i915_susp...@basic-s3-without-i915.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/bat-rpls-2/igt@i915_susp...@basic-s3-without-i915.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence@pipe-c-dp-1:
- bat-dg2-8:  [FAIL][16] ([i915#7932]) -> [PASS][17]
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12974/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc-frame-seque...@pipe-c-dp-1.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116144v1/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc-frame-seque...@pipe-c-dp-1.html

  
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#6645]: https://gitlab.freedesktop.org/drm/intel/issues/6645
  [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687
  [i915#6794]: https://gitlab.freedesktop.org/drm/intel/issues/6794
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7932]: https://gitlab.freedesktop.org/drm/intel/issues/7932
  [i915#7978]: https://gitlab.freedesktop.org/drm/intel/issues/7978
  [i915#7979]: https://gitlab.freedesktop.org/drm/intel/issues/7979
  [i915#8308]: https://gitlab.freedesktop.org/drm/intel/issues/8308


Build changes
-

  * Linux: CI_DRM_12974 -> Patchwork_116144v1

  CI-20190529: 20190529
  CI_DRM_12974: 3a48ece2386f032a86b2c25c0e059eb158aab17e @ 

[Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [1/2] drm/i915/tc: demote a kernel-doc comment to a regular comment

2023-04-05 Thread Patchwork
== Series Details ==

Series: series starting with [1/2] drm/i915/tc: demote a kernel-doc comment to a regular comment
URL   : https://patchwork.freedesktop.org/series/116144/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.




Re: [Intel-gfx] [PATCH] i915/guc/slpc: Provide sysfs for efficient freq

2023-04-05 Thread Rodrigo Vivi
On Wed, Apr 05, 2023 at 12:42:30PM -0700, Dixit, Ashutosh wrote:
> On Wed, 05 Apr 2023 06:57:42 -0700, Rodrigo Vivi wrote:
> >
> 
> Hi Rodrigo,
> 
> > On Fri, Mar 31, 2023 at 08:11:29PM -0700, Dixit, Ashutosh wrote:
> > > On Fri, 31 Mar 2023 19:00:49 -0700, Vinay Belgaumkar wrote:
> > > >
> > >
> > > Hi Vinay,
> > >
> > > > @@ -478,20 +507,15 @@ int intel_guc_slpc_set_min_freq(struct intel_guc_slpc *slpc, u32 val)
> > > > val > slpc->max_freq_softlimit)
> > > > return -EINVAL;
> > > >
> > > > +   /* Ignore efficient freq if lower min freq is requested */
> > > > +   ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < slpc->rp1_freq);
> > > > +   if (ret)
> > > > +   goto out;
> > > > +
> > >
> > > I don't agree with this. If we are now providing an interface explicitly to
> > > ignore RPe, that should be the /only/ way to ignore RPe. There should be no
> > > other "under the hood" ignoring of RPe. In other words, ignoring RPe should
> > > be minimized unless explicitly requested.
> > >
> > > I don't clearly understand why this was done previously but it makes even
> > > less sense to me now after this patch.
> >
> > well, I had suggested this previously. And just because without this we would
> > be breaking API expectations.
> >
> > When a user selects a minimal frequency they expect that to stick. But with the
> > efficient freq enabled in guc, if the minimal is less than the efficient one,
> > this request is likely ignored.
> >
> > Well, even worse is that we are actually caching the request in the soft values.
> > So we show a minimal, but the hardware without any workload is operating at
> > efficient.
> >
> > So, the thought process was: 'if user requested a very low minimal, we give them
> > the minimal requested, even if that means to disable the efficient freq.'
> 
> Hmm, I understand this even less now :)
> 
> * Why is RPe ignored when min < RPe? Since the freq can be between min and
>   max? Shouldn't the condition be min > RPe, that is turn RPe off if a min
>   higher than RPe is requested?

that is not how guc efficient freq selection works. (unless my memory is
tricking me right now.)

So, if we select a min that is between RPe and RP0, guc will respect and
use the selected min. So we don't need to disable guc selection of the
efficient.

This is not true when we select a very low min like RPn. If we select RPn
as min and guc efficient freq selection is enabled guc will simply ignore
our request. So the only way to give the user what is asked, is to also
disable guc's efficient freq selection. (I probably confused you in the
previous email because I used 'RP0' when I meant 'RPn'. I hope it gets
clear now).
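To make the behaviour described above concrete, here is a toy model in C. It is only a sketch under stated assumptions: the `toy_` names and the struct are hypothetical, RPe is held fixed (the thread notes that in reality it is dynamic and that the quoted patch compares against `slpc->rp1_freq`), and the "firmware" is reduced to a single clamp. A requested min at or above RPe is honoured as-is, while a requested min below RPe only takes effect when efficient-frequency selection is turned off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; field names and semantics are illustrative assumptions. */
struct toy_slpc {
	unsigned int rpn;	/* lowest frequency, MHz */
	unsigned int rpe;	/* efficient frequency, MHz (fixed in this toy) */
	unsigned int rp0;	/* highest frequency, MHz */
	unsigned int min_req;	/* min frequency requested by the user */
	bool ignore_eff;	/* true if efficient-freq selection is disabled */
};

/* Idle floor the toy firmware would apply for the current settings. */
unsigned int toy_effective_min(const struct toy_slpc *s)
{
	assert(s != NULL);
	if (!s->ignore_eff && s->min_req < s->rpe)
		return s->rpe;	/* a request below RPe is overridden */
	return s->min_req;
}

/*
 * Implicit policy discussed in the thread: when the requested min is
 * below RPe, also disable efficient-freq selection so the request sticks.
 * Returns 0 on success, -1 if val is outside [rpn, rp0].
 */
int toy_set_min(struct toy_slpc *s, unsigned int val)
{
	assert(s != NULL);
	if (val < s->rpn || val > s->rp0)
		return -1;
	s->ignore_eff = val < s->rpe;
	s->min_req = val;
	return 0;
}
```

The alternative raised in the thread would drop the `ignore_eff` assignment from `toy_set_min()` and leave that flag under a separate, explicit control; in this toy that means `toy_effective_min()` can report RPe even though a lower min was requested.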

> 
> * Also isn't RPe dynamic, so we can't say RPe == rp1 when using in KMD?

Oh... yep, this is an issue indeed. Especially with i915 where we have
the soft values cached instead of asking guc every time.

That's a good point. The variance is not big, but we will hit corner cases.
One way is to keep checking and updating every time a sysfs file is touched.
The other way is to do what you are suggesting and just accept and deal
with the reality that is: "we cannot guarantee a min freq selection if user
doesn't disable the efficient freq selection".

> 
> * Finally, we know that enabling RPe broke the kernel freq API because RPe
>   could go over max_freq. So it is actually the max freq which is not
>   obeyed after RPe is enabled.

Oh! so it was my bad memory indeed and everything is the other way around?
But I just looked at the Xe code, my most recent memory, and I just needed
to toggle the efficient freq off in the case that I mentioned, when the min
selection is below the efficient one. With that, all the API expectations
that I coded in IGT work neatly.

> 
> So we ignore RPe in some select cases (which also I don't understand as
> mentioned above but maybe I am missing something) to claim that we are
> obeying the freq API, but let the freq API stay broken in other cases?

In what cases does it stay broken? This is why we need the IGT tests for all the
API behavior in place.

> 
> > So, that was introduced to avoid API breakage. Removing it now would mean
> > breaking API. (Not sure if the IGT tests for the API got merged already,
> > but think that as the API contract).
> 
> I think we should take this patch as an opportunity to fix this and give
> the user a clean interface to ignore RPe and remove this other implicit way
> to ignore RPe. All IGT changes are unmerged at present.

Yeap, the IGT needs to come with whatever we concluded here and we need to
stick with that afterwards, so let's think with care.

Vinay, Ashutosh's strongest argument is the variable RPe. Do you have thoughts
on that?

> 
> Thanks.
> --
> Ashutosh
> 
> 
> 
> >
> > But I do agree with you that having something selected from multiple places
> > also has the potential to cause some mismatched expectations. So I was thinking
> > about multiple event orders where the user

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Alex Williamson
On Wed, 5 Apr 2023 16:21:09 -0300
Jason Gunthorpe  wrote:

> On Wed, Apr 05, 2023 at 12:56:21PM -0600, Alex Williamson wrote:
> > Usability needs to be a consideration as well.  An interface where the
> > result is effectively arbitrary from a user perspective because the
> > kernel is solely focused on whether the operation is allowed,
> > evaluating constraints that the user is unaware of and cannot control,
> > is unusable.  
> 
> Considering this API is only invoked by qemu we might be overdoing
> this usability and 'no shoot in foot' view.

Ok, I'm not sure why we're diminishing the de facto vfio userspace...

> > > This is a good point that qemu needs to make a policy decision if it
> > > is happy about the VFIO configuration - but that is a policy decision
> > > that should not become entangled with the kernel's security checks.
> > > 
> > > Today qemu can make this policy choice the same way it does right now
> > > - call _INFO and check the group_ids. It gets the exact same outcome
> > > as today. We already discussed that we need to expose the group ID
> > > through an ioctl someplace.  
> > 
> > QEMU can make a policy decision today because the kernel provides a
> > sufficiently reliable interface, ie. based on the set of owned groups, a
> > hot-reset is all but guaranteed to work.
> 
> And we don't change that with cdev. If qemu wants to make the policy
> decision it keeps using the exact same _INFO interface to make that
> decision the same way it always has.
> 
> We weaken the actual reset action to only consider the security side.
> 
> Applications that want this exclusive reset group policy simply must
> check it on their own. It is a reasonable API design.

I disagree, as I've argued before, the info ioctl becomes so weak and
effectively arbitrary from a user perspective at being able to predict
whether the hot-reset ioctl works that it becomes useless, diminishing
the entire hot-reset info/execute API.

> > > If this is too awkward we could add a query to the kernel if the cdev
> > > is "reset exclusive" - eg the iommufd covers all the groups that span
> > > the reset set.  
> > 
> > That's essentially what we have if there are valid dev-ids for each
> > affected device in the info ioctl.  
> 
> If you have dev-ids for everything, yes. If you don't, then you can't
> make the same policy choice using a dev-id interface.

Exactly, you can't make any policy choice because the success or
failure of the hot-reset ioctl can't be known.

> > I don't think it helps the user experience to create loopholes where
> > the hot-reset ioctl can still work in spite of those missing
> > devices.  
> 
> I disagree. The easy straightforward design is that the reset ioctl
> works if the process has security permissions. Mixing a policy check
> into the kernel on this path is creating complexity we don't really
> need.
> 
> I don't view it as a loophole, it is flexibility to use the API in a
> way that is different from what qemu wants - eg an app like dpdk may
> be willing to tolerate a reset group that becomes unavailable after
> startup. Who knows, why should we force this in the kernel?

Because look at all the problems it's causing to try to introduce these
loopholes without also introducing subtle bugs.  There's an argument
that we're overly strict, which is better than the alternative, which
seems to be what we're dabbling with.  It is a straightforward
interface for the hot-reset ioctl to mirror the information provided
via the hot-reset info ioctl.

> > For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
> > capability chains, we could add a capability that reports the group ID
> > for the device.
> 
> I was going to put that in an iommufd ioctl so it works with VDPA too,
> but sure, let's assume we can get the group ID from a cdev fd.
> 
> > The hot-reset info ioctl remains as it is today, reporting group-ids
> > and bdfs.  
> 
> Sure, but userspace still needs to know how to map the reset sets into
> dev-ids.

No, it doesn't. 

> Remember the reason we started doing this is because we don't
> have easy access to the BDF anymore.

We don't need it, the info ioctl provides the groups, the group
association can be learned from the DEVICE_GET_INFO ioctl, the
hot-reset ioctl only requires a single representative fd per affected
group.  dev-ids not required.
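The rule stated above, one representative fd per affected group, amounts to a set-coverage check. The following is a minimal sketch of that check only, not of any vfio code: groups are reduced to plain integers, the function name `toy_groups_covered` is hypothetical, and how a caller learns each fd's group (for example through a DEVICE_GET_INFO capability, as proposed in the thread) is assumed to have happened already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if every group id in affected[] appears at least once in
 * provided[], i.e. the caller holds a representative fd for each group
 * that the reset would touch. Duplicates in provided[] are harmless.
 */
bool toy_groups_covered(const int *affected, size_t n_affected,
			const int *provided, size_t n_provided)
{
	size_t i, j;

	assert(n_affected == 0 || affected != NULL);
	assert(n_provided == 0 || provided != NULL);

	for (i = 0; i < n_affected; i++) {
		bool found = false;

		for (j = 0; j < n_provided; j++) {
			if (provided[j] == affected[i]) {
				found = true;
				break;
			}
		}
		if (!found)
			return false;	/* a group with no representative fd */
	}
	return true;
}
```

Because the same check can be evaluated from the info ioctl's output before the reset is attempted, the outcome of the reset stays predictable from the caller's side, which is the property argued for in this message.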

> I like leaving this ioctl alone, let's go back to a dedicated ioctl to
> return the dev_ids.

I don't see any justification for this.  We could add another PCI
specific DEVICE_GET_INFO capability to report the bdf if we really need
it, but reporting the group seems sufficient for this use case.

> > The hot-reset ioctl itself is modified to transparently
> > support either group fds or device fds.  The user can now map cdevs
> > to group-ids and therefore follow the same rules as groups,
> > providing at least one representative device fd for each group.  
> 
> This looks like a very complex uapi compared to the empty list option,

Re: [Intel-gfx] [PATCH] i915/guc/slpc: Provide sysfs for efficient freq

2023-04-05 Thread Dixit, Ashutosh
On Wed, 05 Apr 2023 06:57:42 -0700, Rodrigo Vivi wrote:
>

Hi Rodrigo,

> On Fri, Mar 31, 2023 at 08:11:29PM -0700, Dixit, Ashutosh wrote:
> > On Fri, 31 Mar 2023 19:00:49 -0700, Vinay Belgaumkar wrote:
> > >
> >
> > Hi Vinay,
> >
> > > @@ -478,20 +507,15 @@ int intel_guc_slpc_set_min_freq(struct intel_guc_slpc *slpc, u32 val)
> > >   val > slpc->max_freq_softlimit)
> > >   return -EINVAL;
> > >
> > > + /* Ignore efficient freq if lower min freq is requested */
> > > + ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < slpc->rp1_freq);
> > > + if (ret)
> > > + goto out;
> > > +
> >
> > I don't agree with this. If we are now providing an interface explicitly to
> > ignore RPe, that should be the /only/ way to ignore RPe. There should be no
> > other "under the hood" ignoring of RPe. In other words, ignoring RPe should
> > be minimized unless explicitly requested.
> >
> > I don't clearly understand why this was done previously but it makes even
> > less sense to me now after this patch.
>
> well, I had suggested this previously. And just because without this we would
> be breaking API expectations.
>
> When a user selects a minimal frequency they expect that to stick. But with the
> efficient freq enabled in guc, if the minimal is less than the efficient one,
> this request is likely ignored.
>
> Well, even worse is that we are actually caching the request in the soft values.
> So we show a minimal, but the hardware without any workload is operating at
> efficient.
>
> So, the thought process was: 'if user requested a very low minimal, we give them
> the minimal requested, even if that means to disable the efficient freq.'

Hmm, I understand this even less now :)

* Why is RPe ignored when min < RPe? Since the freq can be between min and
  max? Shouldn't the condition be min > RPe, that is turn RPe off if a min
  higher than RPe is requested?

* Also isn't RPe dynamic, so we can't say RPe == rp1 when using in KMD?

* Finally, we know that enabling RPe broke the kernel freq API because RPe
  could go over max_freq. So it is actually the max freq which is not
  obeyed after RPe is enabled.

So we ignore RPe in some select cases (which also I don't understand as
mentioned above but maybe I am missing something) to claim that we are
obeying the freq API, but let the freq API stay broken in other cases?

> So, that was introduced to avoid API breakage. Removing it now would mean
> breaking API. (Not sure if the IGT tests for the API got merged already,
> but think that as the API contract).

I think we should take this patch as an opportunity to fix this and give
the user a clean interface to ignore RPe and remove this other implicit way
to ignore RPe. All IGT changes are unmerged at present.

Thanks.
--
Ashutosh



>
> But I do agree with you that having something selected from multiple places
> also has the potential to cause some mismatched expectations. So I was thinking
> about multiple event orders where the user selects the RP0 as minimal, then
> enables the efficient or vice versa, but I couldn't think of a bad case.
> Or at least not as bad as the user asking to get RP0 as minimal and only
> getting RPe back.
>
> With this in mind, and having checked the code:
>
> Reviewed-by: Rodrigo Vivi 
>
> But I won't push this immediately because I'm still open to hear another
> side/angle.
>
> >
> > Thanks.
> > --
> > Ashutosh
> >
> >
> > >   /* Need a lock now since waitboost can be modifying min as well */
> > > >   mutex_lock(&slpc->lock);
> > > >   wakeref = intel_runtime_pm_get(&i915->runtime_pm);
> > >
> > > - /* Ignore efficient freq if lower min freq is requested */
> > > - ret = slpc_set_param(slpc,
> > > -  SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY,
> > > -  val < slpc->rp1_freq);
> > > - if (ret) {
> > > > - guc_probe_error(slpc_to_guc(slpc), "Failed to toggle efficient freq: %pe\n",
> > > - ERR_PTR(ret));
> > > - goto out;
> > > - }
> > > -
> > >   ret = slpc_set_param(slpc,
> > >SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
> > >val);


[Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/display: Increase AUX timeout for Type-C (rev2)

2023-04-05 Thread Patchwork
== Series Details ==

Series: drm/i915/display: Increase AUX timeout for Type-C (rev2)
URL   : https://patchwork.freedesktop.org/series/116010/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12973 -> Patchwork_116010v2


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116010v2/index.html

Participating hosts (36 -> 35)
--

  Missing(1): fi-snb-2520m 

Known issues


  Here are the changes found in Patchwork_116010v2 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_exec_suspend@basic-s3@smem:
- bat-rpls-1: NOTRUN -> [ABORT][1] ([i915#6687] / [i915#7978])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116010v2/bat-rpls-1/igt@gem_exec_suspend@basic...@smem.html

  * igt@i915_selftest@live@reset:
- bat-rpls-2: [PASS][2] -> [ABORT][3] ([i915#4983] / [i915#7913] / 
[i915#7981])
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/bat-rpls-2/igt@i915_selftest@l...@reset.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116010v2/bat-rpls-2/igt@i915_selftest@l...@reset.html

  * igt@i915_selftest@live@slpc:
- bat-rpls-1: NOTRUN -> [DMESG-FAIL][4] ([i915#6367])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116010v2/bat-rpls-1/igt@i915_selftest@l...@slpc.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
- fi-cfl-8700k:   NOTRUN -> [SKIP][5] ([fdo#109271])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116010v2/fi-cfl-8700k/igt@kms_chamelium_...@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence:
- bat-dg2-11: NOTRUN -> [SKIP][6] ([i915#5354])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116010v2/bat-dg2-11/igt@kms_pipe_crc_ba...@nonblocking-crc-frame-sequence.html

  
 Possible fixes 

  * igt@i915_selftest@live@gt_heartbeat:
- fi-apl-guc: [DMESG-FAIL][7] ([i915#5334]) -> [PASS][8]
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116010v2/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@reset:
- bat-rpls-1: [ABORT][9] ([i915#4983]) -> [PASS][10]
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/bat-rpls-1/igt@i915_selftest@l...@reset.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116010v2/bat-rpls-1/igt@i915_selftest@l...@reset.html

  * igt@i915_suspend@basic-s2idle-without-i915:
- fi-cfl-8700k:   [ABORT][11] ([i915#8213] / [i915#8299]) -> [PASS][12]
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/fi-cfl-8700k/igt@i915_susp...@basic-s2idle-without-i915.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116010v2/fi-cfl-8700k/igt@i915_susp...@basic-s2idle-without-i915.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7978]: https://gitlab.freedesktop.org/drm/intel/issues/7978
  [i915#7981]: https://gitlab.freedesktop.org/drm/intel/issues/7981
  [i915#8213]: https://gitlab.freedesktop.org/drm/intel/issues/8213
  [i915#8299]: https://gitlab.freedesktop.org/drm/intel/issues/8299


Build changes
-

  * Linux: CI_DRM_12973 -> Patchwork_116010v2

  CI-20190529: 20190529
  CI_DRM_12973: 152344c378a9b634ea6f2424f038e365ad2894f8 @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7240: ef4550e3b7d3c11ba257006bc7d4f8e421667d46 @ 
https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_116010v2: 152344c378a9b634ea6f2424f038e365ad2894f8 @ 
git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

522edf922e66 drm/i915/display: Increase AUX timeout for Type-C

== Logs ==

For more details see: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116010v2/index.html


Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Jason Gunthorpe
On Wed, Apr 05, 2023 at 12:56:21PM -0600, Alex Williamson wrote:
> Usability needs to be a consideration as well.  An interface where the
> result is effectively arbitrary from a user perspective because the
> kernel is solely focused on whether the operation is allowed,
> evaluating constraints that the user is unaware of and cannot control,
> is unusable.

Considering this API is only invoked by qemu we might be overdoing
this usability and 'no shoot in foot' view.

> > This is a good point that qemu needs to make a policy decision if it
> > is happy about the VFIO configuration - but that is a policy decision
> > that should not become entangled with the kernel's security checks.
> > 
> > Today qemu can make this policy choice the same way it does right now
> > - call _INFO and check the group_ids. It gets the exact same outcome
> > as today. We already discussed that we need to expose the group ID
> > through an ioctl someplace.
> 
> QEMU can make a policy decision today because the kernel provides a
> sufficiently reliable interface, ie. based on the set of owned groups, a
> hot-reset is all but guaranteed to work.  

And we don't change that with cdev. If qemu wants to make the policy
decision it keeps using the exact same _INFO interface to make that
decision, the same way it always has.

We weaken the actual reset action to only consider the security side.

Applications that want this exclusive reset group policy simply must
check it on their own. It is a reasonable API design.
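
As a rough sketch of what "check it on their own" could look like in userspace: walk the devices reported by _INFO and refuse the reset unless every group_id is one the application owns. The struct below is a local stand-in for the per-device entries (group id plus BDF), not the uapi definition, and both helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for one entry reported by the hot-reset _INFO call. */
struct dep_dev {
	uint32_t group_id;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* True if group_id appears in the caller's list of owned groups. */
static bool group_is_owned(uint32_t group_id, const uint32_t *owned,
			   size_t n_owned)
{
	for (size_t i = 0; i < n_owned; i++)
		if (owned[i] == group_id)
			return true;
	return false;
}

/*
 * Userspace policy: the reset set is "exclusive" only if every affected
 * device sits in a group the application owns. An empty set passes.
 */
static bool reset_set_is_exclusive(const struct dep_dev *devs, size_t n_devs,
				   const uint32_t *owned, size_t n_owned)
{
	for (size_t i = 0; i < n_devs; i++)
		if (!group_is_owned(devs[i].group_id, owned, n_owned))
			return false;
	return true;
}
```

An application that does not care about this policy would skip the check and rely only on the kernel's permission check.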

> > If this is too awkward we could add a query to the kernel if the cdev
> > is "reset exclusive" - eg the iommufd covers all the groups that span
> > the reset set.
> 
> That's essentially what we have if there are valid dev-ids for each
> affected device in the info ioctl.

If you have dev-ids for everything, yes. If you don't, then you can't
make the same policy choice using a dev-id interface.

> I don't think it helps the user experience to create loopholes where
> the hot-reset ioctl can still work in spite of those missing
> devices.

I disagree. The easy straightforward design is that the reset ioctl
works if the process has security permissions. Mixing a policy check
into the kernel on this path is creating complexity we don't really
need.

I don't view it as a loophole, it is flexibility to use the API in a
way that is different from what qemu wants - eg an app like dpdk may
be willing to tolerate a reset group that becomes unavailable after
startup. Who knows, why should we force this in the kernel?

> For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
> capability chains, we could add a capability that reports the group ID
> for the device.  

I was going to put that in an iommufd ioctl so it works with VDPA too,
but sure, let's assume we can get the group ID from a cdev fd.

> The hot-reset info ioctl remains as it is today, reporting group-ids
> and bdfs.

Sure, but userspace still needs to know how to map the reset sets into
dev-ids. Remember the reason we started doing this is because we don't
have easy access to the BDF anymore.

I like leaving this ioctl alone, let's go back to a dedicated ioctl to
return the dev_ids.

> The hot-reset ioctl itself is modified to transparently
> support either group fds or device fds.  The user can now map cdevs
> to group-ids and therefore follow the same rules as groups,
> providing at least one representative device fd for each group.

This looks like a very complex uapi compared to the empty list option,
but it seems like it would work.

Jason


Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Alex Williamson
On Wed, 5 Apr 2023 12:56:21 -0600
Alex Williamson  wrote:

> On Wed, 5 Apr 2023 14:23:43 -0300
> Jason Gunthorpe  wrote:
> 
> > On Wed, Apr 05, 2023 at 10:52:15AM -0600, Alex Williamson wrote:  
> > > On Wed, 5 Apr 2023 13:37:05 -0300
> > > Jason Gunthorpe  wrote:
> > > 
> > > > On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:
> > > > 
> > > > > But that kind of brings to light the question of what does the user do
> > > > > when they encounter this situation.  
> > > > 
> > > > What does it do now when it encounters a group_id it doesn't
> > > > understand? Userspace already doesn't know if the foreign group is
> > > > open or not, right?
> > > 
> > > It's simple, there is currently no screwiness around opened devices.
> > > If the caller doesn't own all the groups mapping to the affected
> > > devices, hot-reset is not available.
> > 
> > That still has nasty edge cases. If the reset group spans beyond a
> > single iommu group you end up with qemu being unable to operate reset
> > at all, and it is unfixable from an API perspective as we can't pass
> > in groups that VFIO isn't going to use.  
> 
> Hmm, s/nasty/niche/?  Yes, QEMU currently has no way to own a group
> without assigning a device from the group, but technically that could
> be fixed within QEMU.  If QEMU doesn't own that affected group, then it
> can't very well count on that group to not be used in some other way
> when it comes time to actually do a hot-reset.
>  
> > I think you are right, the fact we'd have to return -1 dev_ids to this
> > modified API is pretty damaging, it doesn't seem like a good
> > direction.
> >   
> > > This leads to scenarios where the info ioctl indicates a hot-reset is
> > > initially available, perhaps only because one of the affected devices
> > > was not opened at the time, and now it fails when QEMU actually tries
> > > to use it.
> > 
> > I would like it if the APIs toward the kernel were only about the
> > > kernel's security apparatus. It makes it easier to reason about the
> > kernel side and gives nice simple well defined APIs.  
> 
> Usability needs to be a consideration as well.  An interface where the
> result is effectively arbitrary from a user perspective because the
> kernel is solely focused on whether the operation is allowed,
> evaluating constraints that the user is unaware of and cannot control,
> is unusable.
> 
> > This is a good point that qemu needs to make a policy decision if it
> > is happy about the VFIO configuration - but that is a policy decision
> > that should not become entangled with the kernel's security checks.
> > 
> > Today qemu can make this policy choice the same way it does right now
> > - call _INFO and check the group_ids. It gets the exact same outcome
> > as today. We already discussed that we need to expose the group ID
> > through an ioctl someplace.  
> 
> QEMU can make a policy decision today because the kernel provides a
> sufficiently reliable interface, ie. based on the set of owned groups, a
> hot-reset is all but guaranteed to work.  If we focus only on whether a
> given reset is allowed from a kernel perspective and ignore that
> userspace needs some predictability of the kernel behavior, then QEMU
> cannot reasonably make that policy decision.
> 
> > If this is too awkward we could add a query to the kernel if the cdev
> > is "reset exclusive" - eg the iommufd covers all the groups that span
> > the reset set.  
> 
> That's essentially what we have if there are valid dev-ids for each
> affected device in the info ioctl.  I don't think it helps the user
> experience to create loopholes where the hot-reset ioctl can still work
> in spite of those missing devices.  The group interface uses the fact
> that ownership of the group implies ownership of all devices within the
> group such that the user only needs to prove group ownership.
> 
> But we still have underlying groups even with the cdev model, with the
> same ownership principles, so don't we just need to prove group
> ownership based on a device fd rather than a group fd?
> 
> For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
> capability chains, we could add a capability that reports the group ID
> for the device.  The hot-reset info ioctl remains as it is today,
> reporting group-ids and bdfs.  The hot-reset ioctl itself is modified to
> transparently support either group fds or device fds.  The user can now
> map cdevs to group-ids and therefore follow the same rules as groups,
> providing at least one representative device fd for each group.  We've
> essentially already enabled this by allowing the limit of user provided
> fds equal to the number of affected devices.

If I'm not mistaken, I think this resolves cdev no-iommu to work
equivalently to groups as well.  Thanks,

Alex



Re: [Intel-gfx] [PULL] drm-misc-fixes

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 08:28:55PM +0200, Thomas Zimmermann wrote:
> Hi Dave and Daniel,
> 
> here's this week's PR for drm-misc-fixes. As requested, it comes
> a day earlier than usual due to Easter holidays.
> 
> Best regards
> Thomas
> 
> drm-misc-fixes-2023-04-05:
> Short summary of fixes pull:
> 
>  * ivpu: DMA fence and suspend fixes
>  * nouveau: Color-depth fixes
>  * panfrost: Fix mmap error handling
> The following changes since commit 25bbe844ef5c4fb4d7d8dcaa0080f922b7cd3a16:
> 
>   drm: test: Fix 32-bit issue in drm_buddy_test (2023-03-29 17:14:15 +0200)
> 
> are available in the Git repository at:
> 
>   git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-fixes-2023-04-05
> 
> for you to fetch changes up to 0ec8671837a61d841462179686c5819d951d3b10:
> 
>   accel/ivpu: Fix S3 system suspend when not idle (2023-04-05 09:07:26 +0200)

Pulled, thanks.

> 
> 
> Short summary of fixes pull:
> 
>  * ivpu: DMA fence and suspend fixes
>  * nouveau: Color-depth fixes
>  * panfrost: Fix mmap error handling
> 
> 
> Boris Brezillon (1):
>   drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
> 
> Jacek Lawrynowicz (1):
>   accel/ivpu: Fix S3 system suspend when not idle
> 
> Karol Herbst (1):
>   drm/nouveau/disp: Support more modes by checking with lower bpc
> 
> Karol Wachowski (1):
>   accel/ivpu: Add dma fence to command buffers only
> 
>  drivers/accel/ivpu/ivpu_job.c   | 18 +++---
>  drivers/accel/ivpu/ivpu_pm.c| 26 +++---
>  drivers/gpu/drm/nouveau/dispnv50/disp.c | 32 
>  drivers/gpu/drm/nouveau/nouveau_dp.c|  8 +---
>  drivers/gpu/drm/panfrost/panfrost_mmu.c |  1 +
>  5 files changed, 56 insertions(+), 29 deletions(-)
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Alex Williamson
On Wed, 5 Apr 2023 14:23:43 -0300
Jason Gunthorpe  wrote:

> On Wed, Apr 05, 2023 at 10:52:15AM -0600, Alex Williamson wrote:
> > On Wed, 5 Apr 2023 13:37:05 -0300
> > Jason Gunthorpe  wrote:
> >   
> > > On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:
> > >   
> > > > But that kind of brings to light the question of what does the user do
> > > > when they encounter this situation.
> > > 
> > > What does it do now when it encounters a group_id it doesn't
> > > understand? Userspace already doesn't know if the foreign group is
> > > open or not, right?  
> > 
> > It's simple, there is currently no screwiness around opened devices.
> > If the caller doesn't own all the groups mapping to the affected
> > devices, hot-reset is not available.  
> 
> That still has nasty edge cases. If the reset group spans beyond a
> single iommu group you end up with qemu being unable to operate reset
> at all, and it is unfixable from an API perspective as we can't pass
> in groups that VFIO isn't going to use.

Hmm, s/nasty/niche/?  Yes, QEMU currently has no way to own a group
without assigning a device from the group, but technically that could
be fixed within QEMU.  If QEMU doesn't own that affected group, then it
can't very well count on that group to not be used in some other way
when it comes time to actually do a hot-reset.
 
> I think you are right, the fact we'd have to return -1 dev_ids to this
> modified API is pretty damaging, it doesn't seem like a good
> direction.
> 
> > This leads to scenarios where the info ioctl indicates a hot-reset is
> > initially available, perhaps only because one of the affected devices
> > was not opened at the time, and now it fails when QEMU actually tries
> > to use it.  
> 
> I would like it if the APIs toward the kernel were only about the
> kernel's security apparatus. It makes it easier to reason about the
> kernel side and gives nice simple well defined APIs.

Usability needs to be a consideration as well.  An interface where the
result is effectively arbitrary from a user perspective because the
kernel is solely focused on whether the operation is allowed,
evaluating constraints that the user is unaware of and cannot control,
is unusable.

> This is a good point that qemu needs to make a policy decision if it
> is happy about the VFIO configuration - but that is a policy decision
> that should not become entangled with the kernel's security checks.
> 
> Today qemu can make this policy choice the same way it does right now
> - call _INFO and check the group_ids. It gets the exact same outcome
> as today. We already discussed that we need to expose the group ID
> through an ioctl someplace.

QEMU can make a policy decision today because the kernel provides a
sufficiently reliable interface, ie. based on the set of owned groups, a
hot-reset is all but guaranteed to work.  If we focus only on whether a
given reset is allowed from a kernel perspective and ignore that
userspace needs some predictability of the kernel behavior, then QEMU
cannot reasonably make that policy decision.

> If this is too awkward we could add a query to the kernel if the cdev
> is "reset exclusive" - eg the iommufd covers all the groups that span
> the reset set.

That's essentially what we have if there are valid dev-ids for each
affected device in the info ioctl.  I don't think it helps the user
experience to create loopholes where the hot-reset ioctl can still work
in spite of those missing devices.  The group interface uses the fact
that ownership of the group implies ownership of all devices within the
group such that the user only needs to prove group ownership.

But we still have underlying groups even with the cdev model, with the
same ownership principles, so don't we just need to prove group
ownership based on a device fd rather than a group fd?

For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
capability chains, we could add a capability that reports the group ID
for the device.  The hot-reset info ioctl remains as it is today,
reporting group-ids and bdfs.  The hot-reset ioctl itself is modified to
transparently support either group fds or device fds.  The user can now
map cdevs to group-ids and therefore follow the same rules as groups,
providing at least one representative device fd for each group.  We've
essentially already enabled this by allowing the limit of user provided
fds equal to the number of affected devices.
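
A rough sketch of the userspace side of that idea, assuming the group ID of each opened cdev is already known (for example via the capability suggested above, which does not exist yet). The struct and function names are hypothetical; the point is only the "one representative device fd per affected group" selection.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: an opened device cdev and the group it maps to. */
struct cdev_entry {
	int fd;
	uint32_t group_id;
};

/*
 * For each distinct group in affected_groups (one entry per affected
 * device, so groups may repeat), pick the first opened cdev in that group.
 * Returns the number of fds written to out_fds, or -1 if some affected
 * group has no opened device or out_fds is too small.
 */
static int pick_representative_fds(const uint32_t *affected_groups,
				   size_t n_affected,
				   const struct cdev_entry *cdevs,
				   size_t n_cdevs,
				   int *out_fds, size_t out_cap)
{
	size_t n_out = 0;

	for (size_t i = 0; i < n_affected; i++) {
		bool seen = false;
		int fd = -1;

		/* Skip groups already handled by an earlier entry. */
		for (size_t j = 0; j < i; j++) {
			if (affected_groups[j] == affected_groups[i]) {
				seen = true;
				break;
			}
		}
		if (seen)
			continue;

		for (size_t k = 0; k < n_cdevs; k++) {
			if (cdevs[k].group_id == affected_groups[i]) {
				fd = cdevs[k].fd;
				break;
			}
		}
		if (fd < 0 || n_out == out_cap)
			return -1;
		out_fds[n_out++] = fd;
	}
	return (int)n_out;
}
```

The resulting array is what would be handed to the hot-reset ioctl in place of group fds; a -1 return corresponds to the "user does not own all affected groups" case.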

Does that work?  Thanks,

Alex



[Intel-gfx] ✓ Fi.CI.BAT: success for Add hwmon support for dgfx selftests

2023-04-05 Thread Patchwork
== Series Details ==

Series: Add hwmon support for dgfx selftests
URL   : https://patchwork.freedesktop.org/series/116136/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12973 -> Patchwork_116136v1


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/index.html

Participating hosts (36 -> 32)
--

  Missing(4): fi-glk-j4005 fi-pnv-d510 bat-adlp-6 fi-snb-2520m 

Known issues


  Here are the changes found in Patchwork_116136v1 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_exec_suspend@basic-s3@smem:
- bat-rpls-1: NOTRUN -> [ABORT][1] ([i915#6687] / [i915#7978])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/bat-rpls-1/igt@gem_exec_suspend@basic...@smem.html

  * igt@i915_selftest@live@reset:
- bat-rpls-2: [PASS][2] -> [ABORT][3] ([i915#4983] / [i915#7913])
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/bat-rpls-2/igt@i915_selftest@l...@reset.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/bat-rpls-2/igt@i915_selftest@l...@reset.html

  * igt@i915_selftest@live@slpc:
- bat-dg2-9:  [PASS][4] -> [DMESG-FAIL][5] ([i915#7913])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/bat-dg2-9/igt@i915_selftest@l...@slpc.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/bat-dg2-9/igt@i915_selftest@l...@slpc.html
- bat-dg2-11: [PASS][6] -> [DMESG-FAIL][7] ([i915#7913])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/bat-dg2-11/igt@i915_selftest@l...@slpc.html
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/bat-dg2-11/igt@i915_selftest@l...@slpc.html
- bat-dg2-8:  [PASS][8] -> [DMESG-FAIL][9] ([i915#7913])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/bat-dg2-8/igt@i915_selftest@l...@slpc.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/bat-dg2-8/igt@i915_selftest@l...@slpc.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
- fi-cfl-8700k:   NOTRUN -> [SKIP][10] ([fdo#109271])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/fi-cfl-8700k/igt@kms_chamelium_...@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence:
- bat-dg2-11: NOTRUN -> [SKIP][11] ([i915#5354])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/bat-dg2-11/igt@kms_pipe_crc_ba...@nonblocking-crc-frame-sequence.html

  
 Possible fixes 

  * igt@i915_selftest@live@gt_heartbeat:
- fi-apl-guc: [DMESG-FAIL][12] ([i915#5334]) -> [PASS][13]
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@reset:
- bat-rpls-1: [ABORT][14] ([i915#4983]) -> [PASS][15]
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/bat-rpls-1/igt@i915_selftest@l...@reset.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/bat-rpls-1/igt@i915_selftest@l...@reset.html

  * igt@i915_suspend@basic-s2idle-without-i915:
- fi-cfl-8700k:   [ABORT][16] ([i915#8213] / [i915#8299]) -> [PASS][17]
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/fi-cfl-8700k/igt@i915_susp...@basic-s2idle-without-i915.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/fi-cfl-8700k/igt@i915_susp...@basic-s2idle-without-i915.html

  * igt@kms_pipe_crc_basic@nonblocking-crc@pipe-c-dp-1:
- bat-dg2-8:  [FAIL][18] ([i915#7932]) -> [PASS][19]
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12973/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-...@pipe-c-dp-1.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116136v1/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-...@pipe-c-dp-1.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7932]: https://gitlab.freedesktop.org/drm/intel/issues/7932
  [i915#7978]: https://gitlab.freedesktop.org/drm/intel/issues/7978
  [i915#8213]: https://gitlab.freedesktop.org/drm/intel/issues/8213
  [i915#8299]: https://gitlab.freedesktop.org/drm/intel/issues/8299


Build changes
-

  * Linux: CI_DRM_12973 -> Patchwork_116136v1

  CI-20190529: 20190529
  CI_DRM_12973: 152344c378a9b634ea6f2424f038e365ad2894f8 

[Intel-gfx] ✗ Fi.CI.SPARSE: warning for Add hwmon support for dgfx selftests

2023-04-05 Thread Patchwork
== Series Details ==

Series: Add hwmon support for dgfx selftests
URL   : https://patchwork.freedesktop.org/series/116136/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
-
+./arch/x86/include/asm/bitops.h:117:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:117:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:148:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:148:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:150:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:150:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:154:26: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:154:26: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:156:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:156:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:156:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:156:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:174:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:174:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:176:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:176:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:180:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:180:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:182:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:182:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:182:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:182:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:186:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:186:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:188:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:188:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:192:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:192:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:195:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:195:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:195:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:195:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:237:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:237:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:239:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:239:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:66:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:66:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:92:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:92:1: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:100:17: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:100:17: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:100:23: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:100:23: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:100:9: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:100:9: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:105:1: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:105:1: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:107:9: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:107:9: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:108:9: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:108:9: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:109:9: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:109:9: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:111:10: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:111:10: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:111:14: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:111:14: warning: unreplaced symbol 'old'

[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add hwmon support for dgfx selftests

2023-04-05 Thread Patchwork
== Series Details ==

Series: Add hwmon support for dgfx selftests
URL   : https://patchwork.freedesktop.org/series/116136/
State : warning

== Summary ==

Error: dim checkpatch failed
083f9b0b94c1 drm/i915/selftests: Rename librapl library to libpower
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:125: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#125: 
rename from drivers/gpu/drm/i915/selftests/librapl.c

total: 0 errors, 1 warnings, 0 checks, 127 lines checked
fc1cf0619833 drm/i915/hwmon: Add helper function to obtain energy values
209385ddb0c7 drm/i915/selftests: Add hwmon support in libpower for dgfx
079196ddb2bd drm/i915/selftests: skip comparison of power for discrete graphics




[Intel-gfx] [PULL] drm-misc-fixes

2023-04-05 Thread Thomas Zimmermann
Hi Dave and Daniel,

here's this week's PR for drm-misc-fixes. As requested, it comes
a day earlier than usual due to Easter holidays.

Best regards
Thomas

drm-misc-fixes-2023-04-05:
Short summary of fixes pull:

 * ivpu: DMA fence and suspend fixes
 * nouveau: Color-depth fixes
 * panfrost: Fix mmap error handling
The following changes since commit 25bbe844ef5c4fb4d7d8dcaa0080f922b7cd3a16:

  drm: test: Fix 32-bit issue in drm_buddy_test (2023-03-29 17:14:15 +0200)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-fixes-2023-04-05

for you to fetch changes up to 0ec8671837a61d841462179686c5819d951d3b10:

  accel/ivpu: Fix S3 system suspend when not idle (2023-04-05 09:07:26 +0200)


Short summary of fixes pull:

 * ivpu: DMA fence and suspend fixes
 * nouveau: Color-depth fixes
 * panfrost: Fix mmap error handling


Boris Brezillon (1):
  drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path

Jacek Lawrynowicz (1):
  accel/ivpu: Fix S3 system suspend when not idle

Karol Herbst (1):
  drm/nouveau/disp: Support more modes by checking with lower bpc

Karol Wachowski (1):
  accel/ivpu: Add dma fence to command buffers only

 drivers/accel/ivpu/ivpu_job.c   | 18 +++---
 drivers/accel/ivpu/ivpu_pm.c| 26 +++---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 32 
 drivers/gpu/drm/nouveau/nouveau_dp.c|  8 +---
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  1 +
 5 files changed, 56 insertions(+), 29 deletions(-)

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


Re: [Intel-gfx] [PATCH v9 04/25] vfio: Accept vfio device file in the KVM facing kAPI

2023-04-05 Thread Eric Auger
Hi Yi,

On 4/1/23 17:18, Yi Liu wrote:
> This makes the vfio file kAPIs to accept vfio device files, also a
> preparation for vfio device cdev support.
>
> For the kvm set with vfio device file, kvm pointer is stored in struct
> vfio_device_file, and use kvm_ref_lock to protect kvm set and kvm
> pointer usage within VFIO. This kvm pointer will be set to vfio_device
> after device file is bound to iommufd in the cdev path.
>
> Reviewed-by: Kevin Tian 
> Reviewed-by: Jason Gunthorpe 
> Tested-by: Terrence Xu 
> Tested-by: Nicolin Chen 
> Tested-by: Matthew Rosato 
> Tested-by: Yanting Jiang 
> Signed-off-by: Yi Liu 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  drivers/vfio/vfio.h  |  2 ++
>  drivers/vfio/vfio_main.c | 18 ++
>  2 files changed, 20 insertions(+)
>
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 56ad127ac618..e4672d91a6f7 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -18,6 +18,8 @@ struct vfio_container;
>  
>  struct vfio_device_file {
>   struct vfio_device *device;
> + spinlock_t kvm_ref_lock; /* protect kvm field */
> + struct kvm *kvm;
>  };
>  
>  void vfio_device_put_registration(struct vfio_device *device);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 748bde4d74d9..cb543791b28b 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -414,6 +414,7 @@ vfio_allocate_device_file(struct vfio_device *device)
>   return ERR_PTR(-ENOMEM);
>  
>   df->device = device;
> + spin_lock_init(&df->kvm_ref_lock);
>  
>   return df;
>  }
> @@ -1246,6 +1247,20 @@ bool vfio_file_enforced_coherent(struct file *file)
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
>  
> +static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
> +{
> + struct vfio_device_file *df = file->private_data;
> +
> + /*
> +  * The kvm is first recorded in the vfio_device_file, and will
> +  * be propagated to vfio_device::kvm when the file is bound to
> +  * iommufd successfully in the vfio device cdev path.
> +  */
> + spin_lock(&df->kvm_ref_lock);
> + df->kvm = kvm;
> + spin_unlock(&df->kvm_ref_lock);
> +}
> +
>  /**
>   * vfio_file_set_kvm - Link a kvm with VFIO drivers
>   * @file: VFIO group file or VFIO device file
> @@ -1259,6 +1274,9 @@ void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
>   group = vfio_group_from_file(file);
>   if (group)
>   vfio_group_set_kvm(group, kvm);
> +
> + if (vfio_device_from_file(file))
> + vfio_device_file_set_kvm(file, kvm);
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
>  



Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Eric Auger



On 4/5/23 18:25, Alex Williamson wrote:
> On Wed, 5 Apr 2023 14:04:51 +
> "Liu, Yi L"  wrote:
>
>> Hi Eric,
>>
>>> From: Eric Auger 
>>> Sent: Wednesday, April 5, 2023 8:20 PM
>>>
>>> Hi Yi,
>>> On 4/1/23 16:44, Yi Liu wrote:  
 for the users that accept device fds passed from management stacks to be
 able to figure out the host reset affected devices among the devices
 opened by the user. This is needed as such users do not have BDF (bus,
 devfn) knowledge about the devices it has opened, hence unable to use
 the information reported by existing VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
 to figure out the affected devices.

 Signed-off-by: Yi Liu 
 ---
  drivers/vfio/pci/vfio_pci_core.c | 58 
  include/uapi/linux/vfio.h| 24 -
  2 files changed, 74 insertions(+), 8 deletions(-)

 diff --git a/drivers/vfio/pci/vfio_pci_core.c 
 b/drivers/vfio/pci/vfio_pci_core.c
 index 19f5b075d70a..a5a7e148dce1 100644
 --- a/drivers/vfio/pci/vfio_pci_core.c
 +++ b/drivers/vfio/pci/vfio_pci_core.c
 @@ -30,6 +30,7 @@
  #if IS_ENABLED(CONFIG_EEH)
  #include 
  #endif
 +#include 

  #include "vfio_pci_priv.h"

 @@ -767,6 +768,20 @@ static int vfio_pci_get_irq_count(struct  
>>> vfio_pci_core_device *vdev, int irq_typ  
return 0;
  }

 +static struct vfio_device *
 +vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
 + struct pci_dev *pdev)
 +{
 +  struct vfio_device *cur;
 +
 +  lockdep_assert_held(&dev_set->lock);
 +
 +  list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
 +  if (cur->dev == &pdev->dev)
 +  return cur;
 +  return NULL;
 +}
 +
  static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
  {
(*(int *)data)++;
 @@ -776,13 +791,20 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, 
 void  
>>> *data)  
  struct vfio_pci_fill_info {
int max;
int cur;
 +  bool require_devid;
 +  struct iommufd_ctx *iommufd;
 +  struct vfio_device_set *dev_set;
struct vfio_pci_dependent_device *devices;
  };

  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
  {
struct vfio_pci_fill_info *fill = data;
 +  struct vfio_device_set *dev_set = fill->dev_set;
struct iommu_group *iommu_group;
 +  struct vfio_device *vdev;
 +
 +  lockdep_assert_held(&dev_set->lock);

if (fill->cur == fill->max)
return -EAGAIN; /* Something changed, try again */
 @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, 
 void  
>>> *data)  
if (!iommu_group)
return -EPERM; /* Cannot reset non-isolated devices */

 -  fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
 +  if (fill->require_devid) {
 +  /*
 +   * Report dev_id of the devices that are opened as cdev
 +   * and have the same iommufd with the fill->iommufd.
 +   * Otherwise, just fill IOMMUFD_INVALID_ID.
 +   */
 +  vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
 +  if (vdev && vfio_device_cdev_opened(vdev) &&
 +  fill->iommufd == vfio_iommufd_physical_ictx(vdev))
 +  vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
 +  else
 +  fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
 +  } else {
 +  fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
 +  }
fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
fill->devices[fill->cur].bus = pdev->bus->number;
fill->devices[fill->cur].devfn = pdev->devfn;
 @@ -1230,17 +1266,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
return -ENOMEM;

fill.devices = devices;
 +  fill.dev_set = vdev->vdev.dev_set;

 +  mutex_lock(&vdev->vdev.dev_set->lock);
 +  if (vfio_device_cdev_opened(&vdev->vdev)) {
 +  fill.require_devid = true;
 +  fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
 +  }
ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
&fill, slot);
 +  mutex_unlock(&vdev->vdev.dev_set->lock);

/*
 * If a device was removed between counting and filling, we may come up
 * short of fill.max.  If a device was added, we'll have a return of
 * -EAGAIN above.
 */
 -  if (!ret)
 +  if (!ret) {
hdr.count = fill.cur;
 +  if (fill.require_devid)
 +  hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
 +  }

  reset_info_exit:
if (copy_to_user(arg, &hdr, minsz))
 

Re: [Intel-gfx] [PATCH i-g-t 8/8] gputop: Basic vendor agnostic GPU top tool

2023-04-05 Thread Rob Clark
On Tue, Jan 31, 2023 at 3:33 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> Rudimentary vendor agnostic example of how lib_igt_drm_clients can be used
> to display a sorted by card and usage list of processes using GPUs.
>
> Borrows a bit of code from intel_gpu_top but for now omits the fancy
> features like interactive functionality, card selection, client
> aggregation, sort modes, JSON output  and pretty engine names. Also no
> support for global GPU or system metrics.
>
> On the other hand it shows clients from all DRM cards which
> intel_gpu_top does not do.
>
> Signed-off-by: Tvrtko Ursulin 
> Cc: Rob Clark 
> Cc: Christian König 
> Acked-by: Christian König 

Reviewed-by: Rob Clark 

> ---
>  tools/gputop.c| 260 ++
>  tools/meson.build |   5 +
>  2 files changed, 265 insertions(+)
>  create mode 100644 tools/gputop.c
>
> diff --git a/tools/gputop.c b/tools/gputop.c
> new file mode 100644
> index ..d259cac1ab17
> --- /dev/null
> +++ b/tools/gputop.c
> @@ -0,0 +1,260 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "igt_drm_clients.h"
> +#include "igt_drm_fdinfo.h"
> +
> +static const char *bars[] = { " ", "▏", "▎", "▍", "▌", "▋", "▊", "▉", "█" };
> +
> +static void n_spaces(const unsigned int n)
> +{
> +   unsigned int i;
> +
> +   for (i = 0; i < n; i++)
> +   putchar(' ');
> +}
> +
> +static void print_percentage_bar(double percent, int max_len)
> +{
> +   int bar_len, i, len = max_len - 2;
> +   const int w = 8;
> +
> +   assert(max_len > 0);
> +
> +   bar_len = ceil(w * percent * len / 100.0);
> +   if (bar_len > w * len)
> +   bar_len = w * len;
> +
> +   putchar('|');
> +
> +   for (i = bar_len; i >= w; i -= w)
> +   printf("%s", bars[w]);
> +   if (i)
> +   printf("%s", bars[i]);
> +
> +   len -= (bar_len + (w - 1)) / w;
> +   n_spaces(len);
> +
> +   putchar('|');
> +}
> +
> +static int
> +print_client_header(struct igt_drm_client *c, int lines, int con_w, int con_h,
> +   int *engine_w)
> +{
> +   const char *pidname = "PID   NAME ";
> +   int ret, len = strlen(pidname);
> +
> +   if (lines++ >= con_h || len >= con_w)
> +   return lines;
> +   printf("\033[7m");
> +   ret = printf("DRM minor %u", c->drm_minor);
> +   n_spaces(con_w - ret);
> +
> +   if (lines++ >= con_h)
> +   return lines;
> +   printf("\n%s", pidname);
> +
> +   if (c->engines->num_engines) {
> +   unsigned int i;
> +   int width;
> +
> +   *engine_w = width = (con_w - len) / c->engines->num_engines;
> +
> +   for (i = 0; i <= c->engines->max_engine_id; i++) {
> +   const char *name = c->engines->names[i];
> +   int name_len = strlen(name);
> +   int pad = (width - name_len) / 2;
> +   int spaces = width - pad - name_len;
> +
> +   if (!name)
> +   continue;
> +
> +   if (pad < 0 || spaces < 0)
> +   continue;
> +
> +   n_spaces(pad);
> +   printf("%s", name);
> +   n_spaces(spaces);
> +   len += pad + name_len + spaces;
> +   }
> +   }
> +
> +   n_spaces(con_w - len);
> +   printf("\033[0m\n");
> +
> +   return lines;
> +}
> +
> +
> +static bool
> +newheader(const struct igt_drm_client *c, const struct igt_drm_client *pc)
> +{
> +   return !pc || c->drm_minor != pc->drm_minor;
> +}
> +
> +static int
> +print_client(struct igt_drm_client *c, struct igt_drm_client **prevc,
> +double t, int lines, int con_w, int con_h,
> +unsigned int period_us, int *engine_w)
> +{
> +   unsigned int i;
> +
> +   /* Filter out idle clients. */
> +   if (!c->total_runtime || c->samples < 2)
> +   return lines;
> +
> +   /* Print header when moving to a different DRM card. */
> +   if (newheader(c, *prevc)) {
> +   lines = print_client_header(c, lines, con_w, con_h, engine_w);
> +   if (lines >= con_h)
> +   return lines;
> +   }
> +
> +   *prevc = c;
> +
> +   printf("%8u %17s ", c->pid, c->print_name);
> +   lines++;
> +
> +   for (i = 0; c->samples > 1 && i <= c->engines->max_engine_id; i++) {
> +   double pct;
> +
> +   if (!c->engines->capacity[i])
> +   
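
The bar-length arithmetic in the quoted print_percentage_bar() can be
looked at in isolation. The following is only a minimal sketch, not part
of the patch: struct bar_layout, bar_layout() and ceil_to_int() are
hypothetical names introduced here, ceil() is replaced by a small helper
so that no libm is needed, and the clamp of negative input to zero is an
addition the patch does not have.

```c
#include <assert.h>

/* Hypothetical result type: how many cells of each kind get printed. */
struct bar_layout {
	int full;	/* cells drawn with the full-block glyph */
	int partial;	/* 0, or the eighth-block index (1..7) of one extra cell */
	int spaces;	/* padding cells before the closing '|' */
};

/* Stand-in for ceil(), used so the sketch does not depend on libm. */
static int ceil_to_int(double x)
{
	int i = (int)x;

	return ((double)i < x) ? i + 1 : i;
}

static struct bar_layout bar_layout(double percent, int max_len)
{
	const int w = 8;		/* eighths per character cell */
	int len = max_len - 2;		/* two cells go to the '|' delimiters */
	struct bar_layout l;
	int bar_len;

	/* Stricter than the patch, which only asserts max_len > 0. */
	assert(max_len > 2);

	bar_len = ceil_to_int(w * percent * len / 100.0);
	if (bar_len > w * len)		/* same upper clamp as the patch */
		bar_len = w * len;
	if (bar_len < 0)		/* addition: guard against negative input */
		bar_len = 0;

	l.full = bar_len / w;
	l.partial = bar_len % w;
	l.spaces = len - (bar_len + (w - 1)) / w;

	return l;
}
```

With max_len = 12 there are ten usable cells, so 50% gives five full
cells and five spaces, while 1% rounds up to a single one-eighth cell
followed by nine spaces.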

Re: [Intel-gfx] [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Patrik Jakobsson
On Wed, Apr 5, 2023 at 7:15 PM Daniel Vetter  wrote:
>
> On Wed, 5 Apr 2023 at 18:54, Javier Martinez Canillas
>  wrote:
> >
> > Daniel Vetter  writes:
> >
> > > On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote:
> >
> > [...]
> >
> > >> > > >/*
> > >> > > > * WARNING: Apparently we must kick fbdev drivers before 
> > >> > > > vgacon,
> > >> > > > * otherwise the vga fbdev driver falls over.
> > >> > > > */
> > >> > > >ret = vga_remove_vgacon(pdev);
> > >> >
> > >> > This isn't enough, we also nuke stuff that's mapping the vga fb range.
> >
> > Ah, also need aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE) 
> > then.
> >
> > [...]
> >
> > >> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev)
> > >> {
> > >>  aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
> > >>
> > >>  return vga_remove_vgacon(pdev);
> > >> }
> > >>
> > >> And that can be called from gma500 and the pci aperture helper.
> > >
> > > But you still pass a pci_dev to that helper. Which just doesn't make any
> > > sense to me (assuming your entire point is that this isn't just a normal
> > > pci device but some special legacy vga thing), but if we go with (void)
> > > then there's more refactoring to do because the vga_remove_vgacon also
> > > wants a pdev.
> > >
> > > All so that we don't call aperture_detach_devices() on a bunch of pci
> > > bars, which apparently is not problem for any other driver, but absolutely
> > > is a huge problem for gma500 somehow.
> > >
> > > I don't understand why.
> > >
> >
> > Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper
> > is needed then starts to get a little silly. Maybe one option is to add a
> > 3rd param to aperture_remove_conflicting_pci_devices() and skip the logic
> > to iterate over PCI bars and call aperture_remove_conflicting_devices() ?
>
> The thing I don't get: Why does this matter for gma500 and not any of
> the other pci devices? Look at your gpu, realize there's a lot more
> than the one pci bar for vram or stolen memory, realize that we're
> nuking bars that cannot possibly contain the framebuffer for everyone
> else too. Like the entire "gpus have a lot of bars" thing is the
> reason why I pulled the sysfb_disable one level up, because we've been
> doing that quite a few times before this patch (yes it's not the main
> thing, but the side-effect cleanup is why I've gone down this rabbit
> hole and wrote the entire series here):
>
> https://lore.kernel.org/dri-devel/20230404201842.567344-7-daniel.vet...@ffwll.ch/
>
> But somehow for gma500 it's a problem, while for everyone else it's
> fine. That's the part I don't get, or Thomas and I have been talking past
> each other and there's another issue that I'm missing.
> -Daniel

I'm also getting confused here.

AFAIK the stolen memory works the same for gma500 hardware as other
Intel GPUs. Are you saying that there is a difference in how gma500
hardware works? I always assumed that i915 got away with not dealing
much with stolen memory because it simply doesn't use it for
allocations. In gma500 we use it for fbdev and cursors. The actual
pages reserved by the bios can be accessed through a pci bar if you
map it so (which IIRC we do) but I suppose that doesn't help
identifying it as a range reserved by other drivers.

The reason I've kept the stolen allocation logic is because some
gma500 systems don't have a lot of memory. But that is mostly the old
Poulsbo systems. Perhaps it is time to ditch the stolen allocation
code?

-Patrik

>
> > > Consider this me throwing in the towel. If you are convinced this
> > > makes sense please type it up and merge it, but I'm not going to type
> > > something that just doesn't make sense to me.
> >
> > Honestly, I would just go with the double drm_aperture_remove_*() helper
> > calls (your original patch) unless that causes real issues. There is no
> > point in blocking all your series just for this IMO.
> >
> > Then later, if Thomas has strong opinions, he can send a follow-up patch for
> > the gma500 driver and the aperture helpers.
> >
> > > -Daniel
> > >
> >
> > --
> > Best regards,
> >
> > Javier Martinez Canillas
> > Core Platforms
> > Red Hat
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

> On Wed, 5 Apr 2023 at 18:54, Javier Martinez Canillas
>  wrote:
>>
>> Daniel Vetter  writes:

[...]

>>
>> Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper
>> is needed then starts to get a little silly. Maybe one option is to add a
>> 3rd param to aperture_remove_conflicting_pci_devices() and skip the logic
>> to iterate over PCI bars and call aperture_remove_conflicting_devices() ?
>
> The thing I don't get: Why does this matter for gma500 and not any of
> the other pci devices? Look at your gpu, realize there's a lot more

Yes, I don't know why gma500 is special in that sense, but I'm not familiar
enough with that hardware to have an opinion on this.

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [Intel-gfx] [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

> On Wed, Apr 05, 2023 at 06:27:17PM +0200, Javier Martinez Canillas wrote:
>> Daniel Vetter  writes:

[...]

>> >
>> > The __fill_var is after this. I'm honestly not sure what the exact
>> 
>> Ah, your patch adds it after that indeed. Please ignore my comment then.
>
> So rb: you?
>

Yes, I already provided it in my previous email and it has been picked up by
patchwork. I could do it again, but that would probably confuse dim when applying.

The only patch from your series that is missing an {r,a}b is #1 right now:

https://patchwork.kernel.org/project/dri-devel/list/?series=736966=both

[...]

>> > What I'm wondering now is whether too small x/yres won't lead to problems
>> > of some sorts ... For multi-screen we set the virtual size to be big
>> > enough for all crtc, and then just set x/yres to be the smallest output.
>> > That way fbcon knows to only draw as much as is visible on all screens.
>> > But if you then pan that too much, the bigger screens might not have a big
>> > enough buffer anymore and things fail (but shouldn't).
>> >
>> > Not sure how to fix that tbh.
>> 
>> Would this be a problem in practice?
>
> I'm frankly not sure. You'd get a black screen for fbcon/fbdev across all
> outputs, but only if you have userspace doing this intentionally.
>
> In a way it's just another artifact of the drm fbdev emulation not using
> ATOMIC_TEST_ONLY in the various places where it should, and so doesn't
> really know whether a configuration change will work out.
>
> We already have this in obscure multi-monitor cases where adding another
> screen kills fbcon, because the display hw is running out of fifo or
> clocks or whatever, and because the drm fbdev code doesn't check but just
> blindly commits the entire thing as an atomic commit, the overall commit
> fails.
>
> This worked "better" with legacy kms because there we commit per-crtc, so
> if any specific crtc runs into a limit check, only that one fails to light
> up.
>
> Imo given that no one cared enough yet to write up atomic TEST_ONLY
> support for fbdev emulation I think we can continue to just ignore this
> problem.
>

Agreed. If that ends up being a problem for people in practice then I guess
someone can type up atomic TEST_ONLY support for the fbdev emulation layer.

> What should not happen is that fbcon code blows up drawing out of bounds
> or something like that, resulting in a kernel crash. So from that pov I
> think it's "safe" :-)

Great. Thanks a lot for your explanations.

> -Daniel

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Jason Gunthorpe
On Wed, Apr 05, 2023 at 10:52:15AM -0600, Alex Williamson wrote:
> On Wed, 5 Apr 2023 13:37:05 -0300
> Jason Gunthorpe  wrote:
> 
> > On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:
> > 
> > > But that kind of brings to light the question of what does the user do
> > > when they encounter this situation.  
> > 
> > What does it do now when it encounters a group_id it doesn't
> > understand? Userspace already doesn't know if the foreign group is
> > open or not, right?
> 
> It's simple, there is currently no screwiness around opened devices.
> If the caller doesn't own all the groups mapping to the affected
> devices, hot-reset is not available.

That still has nasty edge cases. If the reset group spans beyond a
single iommu group you end up with qemu being unable to operate reset
at all, and it is unfixable from an API perspective as we can't pass
in groups that VFIO isn't going to use.

I think you are right, the fact we'd have to return -1 dev_ids to this
modified API is pretty damaging, it doesn't seem like a good
direction.

> This leads to scenarios where the info ioctl indicates a hot-reset is
> initially available, perhaps only because one of the affected devices
> was not opened at the time, and now it fails when QEMU actually tries
> to use it.

I would like it if the APIs toward the kernel were only about the
kernel's security apparatus. It makes it easier to reason about the
kernel side and gives nice, simple, well-defined APIs.

This is a good point that qemu needs to make a policy decision if it
is happy about the VFIO configuration - but that is a policy decision
that should not become entangled with the kernel's security checks.

Today qemu can make this policy choice the same way it does right now
- call _INFO and check the group_ids. It gets the exact same outcome
as today. We already discussed that we need to expose the group ID
through an ioctl someplace.

If this is too awkward we could add a query to the kernel if the cdev
is "reset exclusive" - eg the iommufd covers all the groups that span
the reset set.

Jason
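
The userspace policy discussed above (call _INFO, then check the
reported group_ids against what the caller owns) can be modelled without
any VFIO headers. This is only a sketch under stated assumptions:
group_in_set() and hot_reset_allowed() are hypothetical names, the group
ids are plain ints, and nothing here is the actual QEMU or kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if id appears in set[0..n-1]. */
static bool group_in_set(int id, const int *set, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (set[i] == id)
			return true;
	return false;
}

/*
 * Model of the rule stated in the thread: hot-reset is only available
 * when the caller owns every group that maps to an affected device.
 * "affected" stands for the group_ids reported by the info ioctl and
 * "owned" for the groups the caller has opened; both are assumptions
 * made for illustration.
 */
static bool hot_reset_allowed(const int *affected, size_t n_affected,
			      const int *owned, size_t n_owned)
{
	assert(affected || n_affected == 0);
	assert(owned || n_owned == 0);

	for (size_t i = 0; i < n_affected; i++)
		if (!group_in_set(affected[i], owned, n_owned))
			return false;	/* a foreign group is affected */

	return true;
}
```

A caller owning groups 7 and 9 would be allowed to reset when the
affected set is {9, 7, 9}, and refused when it is {7, 12}.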


Re: [Intel-gfx] [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 06:27:17PM +0200, Javier Martinez Canillas wrote:
> Daniel Vetter  writes:
> 
> [...]
> 
> >> 
> >> but only the 'var->xres > fb->width || var->yres > fb->height' from the
> >> conditions checked could be false after your __fill_var() call above.
> >> 
> >> You should drop the 'var->bits_per_pixel > bpp', 'var->xres_virtual >
> >> fb->width' and 'var->yres_virtual > fb->height' checks I believe since
> >> those will always be true.
> >
> > The __fill_var is after this. I'm honestly not sure what the exact
> 
> Ah, your patch adds it after that indeed. Please ignore my comment then.

So rb: you?

> > semantics are supposed to be, but essentially if userspace asks for too
> > big virtual size, we reject it. And for anything else we then tell it
> > (with __fill_var) how big the actually available space is.
> >
> > What I'm wondering now is whether too small x/yres won't lead to problems
> > of some sorts ... For multi-screen we set the virtual size to be big
> > enough for all crtc, and then just set x/yres to be the smallest output.
> > That way fbcon knows to only draw as much as is visible on all screens.
> > But if you then pan that too much, the bigger screens might not have a big
> > enough buffer anymore and things fail (but shouldn't).
> >
> > Not sure how to fix that tbh.
> 
> Would this be a problem in practice?

I'm frankly not sure. You'd get a black screen for fbcon/fbdev across all
outputs, but only if you have userspace doing this intentionally.

In a way it's just another artifact of the drm fbdev emulation not using
ATOMIC_TEST_ONLY in the various places where it should, and so doesn't
really know whether a configuration change will work out.

We already have this in obscure multi-monitor cases where adding another
screen kills fbcon, because the display hw is running out of fifo or
clocks or whatever, and because the drm fbdev code doesn't check but just
blindly commits the entire thing as an atomic commit, the overall commit
fails.

This worked "better" with legacy kms because there we commit per-crtc, so
if any specific crtc runs into a limit check, only that one fails to light
up.

Imo given that no one cared enough yet to write up atomic TEST_ONLY
support for fbdev emulation I think we can continue to just ignore this
problem.

What should not happen is that fbcon code blows up drawing out of bounds
or something like that, resulting in a kernel crash. So from that pov I
think it's "safe" :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
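
The check-then-fill semantics described in this message (reject a
request that is larger than the framebuffer, otherwise report back how
much space is actually available) can be sketched as a standalone model.
It assumes simplified types: struct model_var, struct model_fb and
model_check_var() are hypothetical stand-ins, not the kernel's
fb_var_screeninfo, drm_framebuffer or drm_fb_helper_check_var(), and the
fill step is only a reading of what __fill_var is said to do.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-in for the fields under discussion. */
struct model_var {
	unsigned int xres, yres;
	unsigned int xres_virtual, yres_virtual;
	unsigned int bits_per_pixel;
};

/* Hypothetical, simplified stand-in for the backing framebuffer. */
struct model_fb {
	unsigned int width, height;
	unsigned int bpp;
};

static int model_check_var(struct model_var *var, const struct model_fb *fb)
{
	assert(var && fb);

	/* Reject anything that asks for more than the buffer provides. */
	if (var->bits_per_pixel > fb->bpp ||
	    var->xres > fb->width || var->yres > fb->height ||
	    var->xres_virtual > fb->width || var->yres_virtual > fb->height)
		return -EINVAL;

	/*
	 * Assumed fill step: tell the caller how big the available space
	 * really is. The visible size (xres/yres) is left as requested.
	 */
	var->xres_virtual = fb->width;
	var->yres_virtual = fb->height;
	var->bits_per_pixel = fb->bpp;

	return 0;
}
```

Because the fill happens after the checks, a rejected request is left
untouched, which matches the point that the comparisons are still
meaningful at the place where they are made.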


Re: [Intel-gfx] [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Daniel Vetter
On Wed, 5 Apr 2023 at 18:54, Javier Martinez Canillas
 wrote:
>
> Daniel Vetter  writes:
>
> > On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote:
>
> [...]
>
> >> > > >/*
> >> > > > * WARNING: Apparently we must kick fbdev drivers before 
> >> > > > vgacon,
> >> > > > * otherwise the vga fbdev driver falls over.
> >> > > > */
> >> > > >ret = vga_remove_vgacon(pdev);
> >> >
> >> > This isn't enough, we also nuke stuff that's mapping the vga fb range.
>
> Ah, also need aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE) 
> then.
>
> [...]
>
> >> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev)
> >> {
> >>  aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
> >>
> >>  return vga_remove_vgacon(pdev);
> >> }
> >>
> >> And that can be called from gma500 and the pci aperture helper.
> >
> > But you still pass a pci_dev to that helper. Which just doesn't make any
> > sense to me (assuming your entire point is that this isn't just a normal
> > pci device but some special legacy vga thing), but if we go with (void)
> > then there's more refactoring to do because the vga_remove_vgacon also
> > wants a pdev.
> >
> > All so that we don't call aperture_detach_devices() on a bunch of pci
> > bars, which apparently is not problem for any other driver, but absolutely
> > is a huge problem for gma500 somehow.
> >
> > I don't understand why.
> >
>
> Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper
> is needed then starts to get a little silly. Maybe one option is to add a
> 3rd param to aperture_remove_conflicting_pci_devices() and skip the logic
> to iterate over PCI bars and call aperture_remove_conflicting_devices() ?

The thing I don't get: Why does this matter for gma500 and not any of
the other pci devices? Look at your gpu, realize there's a lot more
than the one pci bar for vram or stolen memory, realize that we're
nuking bars that cannot possibly contain the framebuffer for everyone
else too. Like the entire "gpus have a lot of bars" thing is the
reason why I pulled the sysfb_disable one level up, because we've been
doing that quite a few times before this patch (yes it's not the main
thing, but the side-effect cleanup is why I've gone down this rabbit
hole and wrote the entire series here):

https://lore.kernel.org/dri-devel/20230404201842.567344-7-daniel.vet...@ffwll.ch/

But somehow for gma500 it's a problem, while for everyone else it's
fine. That's the part I don't get, or Thomas and I have been talking past
each other and there's another issue that I'm missing.
-Daniel

> > Consider this me throwing in the towel. If you are convinced this
> > makes sense please type it up and merge it, but I'm not going to type
> > something that just doesn't make sense to me.
>
> Honestly, I would just go with the double drm_aperture_remove_*() helper
> calls (your original patch) unless that causes real issues. There is no
> point in blocking all your series just for this IMO.
>
> Then later, if Thomas has strong opinions, he can send a follow-up patch for
> the gma500 driver and the aperture helpers.
>
> > -Daniel
> >
>
> --
> Best regards,
>
> Javier Martinez Canillas
> Core Platforms
> Red Hat
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[Intel-gfx] ✓ Fi.CI.BAT: success for Add hwmon support for dgfx selftests (rev9)

2023-04-05 Thread Patchwork
== Series Details ==

Series: Add hwmon support for dgfx selftests (rev9)
URL   : https://patchwork.freedesktop.org/series/109850/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12972 -> Patchwork_109850v9


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/index.html

Participating hosts (35 -> 35)
--

  Additional (1): fi-apl-guc 
  Missing(1): fi-snb-2520m 

Known issues


  Here are the changes found in Patchwork_109850v9 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_lmem_swapping@basic:
- fi-apl-guc: NOTRUN -> [SKIP][1] ([fdo#109271] / [i915#4613]) +3 
similar issues
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/fi-apl-guc/igt@gem_lmem_swapp...@basic.html

  * igt@i915_selftest@live@gt_lrc:
- bat-rpls-2: [PASS][2] -> [INCOMPLETE][3] ([i915#4983] / 
[i915#7913])
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12972/bat-rpls-2/igt@i915_selftest@live@gt_lrc.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/bat-rpls-2/igt@i915_selftest@live@gt_lrc.html

  * igt@i915_selftest@live@guc:
- bat-rpls-1: NOTRUN -> [DMESG-WARN][4] ([i915#7852])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/bat-rpls-1/igt@i915_selftest@l...@guc.html

  * igt@i915_selftest@live@slpc:
- bat-dg2-9:  [PASS][5] -> [DMESG-FAIL][6] ([i915#7913])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12972/bat-dg2-9/igt@i915_selftest@l...@slpc.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/bat-dg2-9/igt@i915_selftest@l...@slpc.html
- bat-rpls-1: NOTRUN -> [DMESG-FAIL][7] ([i915#6367] / [i915#6997])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/bat-rpls-1/igt@i915_selftest@l...@slpc.html
- bat-dg2-8:  [PASS][8] -> [DMESG-FAIL][9] ([i915#7913])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12972/bat-dg2-8/igt@i915_selftest@l...@slpc.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/bat-dg2-8/igt@i915_selftest@l...@slpc.html

  * igt@kms_chamelium_hpd@vga-hpd-fast:
- fi-apl-guc: NOTRUN -> [SKIP][10] ([fdo#109271]) +22 similar issues
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/fi-apl-guc/igt@kms_chamelium_...@vga-hpd-fast.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence@pipe-c-dp-1:
- bat-dg2-8:  [PASS][11] -> [FAIL][12] ([i915#7932]) +1 similar 
issue
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12972/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc-frame-seque...@pipe-c-dp-1.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/bat-dg2-8/igt@kms_pipe_crc_basic@nonblocking-crc-frame-seque...@pipe-c-dp-1.html

  * igt@kms_pipe_crc_basic@read-crc:
- bat-dg2-11: NOTRUN -> [SKIP][13] ([i915#5354])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/bat-dg2-11/igt@kms_pipe_crc_ba...@read-crc.html

  
 Possible fixes 

  * igt@i915_selftest@live@migrate:
- bat-adlp-6: [DMESG-FAIL][14] ([i915#7699]) -> [PASS][15]
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12972/bat-adlp-6/igt@i915_selftest@l...@migrate.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/bat-adlp-6/igt@i915_selftest@l...@migrate.html

  * igt@i915_selftest@live@mman:
- bat-rpls-1: [TIMEOUT][16] ([i915#6794]) -> [PASS][17]
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12972/bat-rpls-1/igt@i915_selftest@l...@mman.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109850v9/bat-rpls-1/igt@i915_selftest@l...@mman.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6794]: https://gitlab.freedesktop.org/drm/intel/issues/6794
  [i915#6997]: https://gitlab.freedesktop.org/drm/intel/issues/6997
  [i915#7699]: https://gitlab.freedesktop.org/drm/intel/issues/7699
  [i915#7852]: https://gitlab.freedesktop.org/drm/intel/issues/7852
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7932]: https://gitlab.freedesktop.org/drm/intel/issues/7932


Build changes
-------------

  * Linux: CI_DRM_12972 -> Patchwork_109850v9

  CI-20190529: 20190529
  CI_DRM_12972: fc3082f44d3bfc20f364535b978fa0ba053a00f0 @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7240: ef4550e3b7d3c11ba257006bc7d4f8e421667d46 @ 
https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_109850v9: 

[Intel-gfx] ✗ Fi.CI.SPARSE: warning for Add hwmon support for dgfx selftests (rev9)

2023-04-05 Thread Patchwork
== Series Details ==

Series: Add hwmon support for dgfx selftests (rev9)
URL   : https://patchwork.freedesktop.org/series/109850/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
-
+./arch/x86/include/asm/bitops.h:117:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:117:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:148:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:148:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:150:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:150:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:154:26: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:154:26: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:156:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:156:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:156:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:156:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:174:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:174:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:176:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:176:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:180:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:180:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:182:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:182:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:182:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:182:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:186:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:186:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:188:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:188:9: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:192:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:192:35: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:195:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:195:16: warning: unreplaced symbol 'oldbit'
+./arch/x86/include/asm/bitops.h:195:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:195:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:237:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:237:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:239:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:239:9: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:66:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:66:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:92:1: warning: unreplaced symbol 'return'
+./arch/x86/include/asm/bitops.h:92:1: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:100:17: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:100:17: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:100:23: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:100:23: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:100:9: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:100:9: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:105:1: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:105:1: warning: unreplaced symbol 'return'
+./include/asm-generic/bitops/generic-non-atomic.h:107:9: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:107:9: warning: unreplaced symbol 'mask'
+./include/asm-generic/bitops/generic-non-atomic.h:108:9: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:108:9: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:109:9: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:109:9: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:111:10: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:111:10: warning: unreplaced symbol 'p'
+./include/asm-generic/bitops/generic-non-atomic.h:111:14: warning: unreplaced symbol 'old'
+./include/asm-generic/bitops/generic-non-atomic.h:111:14: warning: unreplaced symbol 'old'

[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add hwmon support for dgfx selftests (rev9)

2023-04-05 Thread Patchwork
== Series Details ==

Series: Add hwmon support for dgfx selftests (rev9)
URL   : https://patchwork.freedesktop.org/series/109850/
State : warning

== Summary ==

Error: dim checkpatch failed
7b4acee0a7ff drm/i915/selftests: Rename librapl library to libpower
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:125: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#125: 
rename from drivers/gpu/drm/i915/selftests/librapl.c

total: 0 errors, 1 warnings, 0 checks, 127 lines checked
cd357b4afb3a drm/i915/hwmon: Add helper function to obtain energy values
20492dd6cf8c drm/i915/selftests: Add hwmon support in libpower for dgfx
87e91aa76fd5 drm/i915/selftests: skip comparison of power for discrete graphics




Re: [Intel-gfx] [PATCH v9 25/25] docs: vfio: Add vfio device cdev description

2023-04-05 Thread Alex Williamson
On Wed, 5 Apr 2023 14:00:00 +
"Liu, Yi L"  wrote:

> Hi Eric,
> 
> > From: Eric Auger 
> > Sent: Wednesday, April 5, 2023 9:46 PM
> > 
> > Hi Yi,
> > 
> > On 4/1/23 17:18, Yi Liu wrote:  
> > > This gives notes for userspace applications on device cdev usage.
> > >
> > > Reviewed-by: Kevin Tian 
> > > Signed-off-by: Yi Liu 
> > > ---
> > >  Documentation/driver-api/vfio.rst | 132 ++
> > >  1 file changed, 132 insertions(+)
> > >
> > > diff --git a/Documentation/driver-api/vfio.rst 
> > > b/Documentation/driver-api/vfio.rst
> > > index 363e12c90b87..4f21be7bda8a 100644
> > > --- a/Documentation/driver-api/vfio.rst
> > > +++ b/Documentation/driver-api/vfio.rst
> > > @@ -239,6 +239,130 @@ group and can access them as follows::
> > >   /* Gratuitous device reset and go... */
> > >   ioctl(device, VFIO_DEVICE_RESET);
> > >
> > > > +IOMMUFD and vfio_iommu_type1
> > > > +----------------------------
> > > +
> > > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > > +It intends to be the portal of delivering advanced userspace DMA
> > > +features (nested translation [5], PASID [6], etc.) while also providing
> > > +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> > > +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> > > +vfio container and group model is intended to be deprecated.
> > > +
> > > +The IOMMUFD backwards compatibility interface can be enabled two ways.
> > > +In the first method, the kernel can be configured with
> > > +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> > > +transparently provides the entire infrastructure for the VFIO
> > > +container and IOMMU backend interfaces.  The compatibility mode can
> > > +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> > > +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> > > +compatibility mode is not entirely feature complete relative to
> > > +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> > > +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> > > +it is not generally advisable at this time to switch from native VFIO
> > > +implementations to the IOMMUFD compatibility interfaces.
> > > +
> > > +Long term, VFIO users should migrate to device access through the cdev
> > > +interface described below, and native access through the IOMMUFD
> > > +provided interfaces.
> > > +
> > > > +VFIO Device cdev
> > > > +----------------
> > > +
> > > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > > +in a VFIO group.
> > > +
> > > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > > +by directly opening a character device /dev/vfio/devices/vfioX where
> > > +"X" is the number allocated uniquely by VFIO for registered devices.
> > > +For noiommu devices, the character device would be named with "noiommu-"
> > > +prefix. e.g. /dev/vfio/devices/noiommu-vfioX.
> > > +
> > > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > > +must adapt to the new cdev security model which requires using
> > > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > > +actually use the device.  Once BIND succeeds then a VFIO device can
> > > +be fully accessed by the user.
> > > +
> > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > +Hence those modules can be fully compiled out in an environment
> > > +where no legacy VFIO application exists.
> > > +
> > > > +So far SPAPR does not support IOMMUFD.  So it cannot support device
> > > > +cdev either.
> > > +
> > > > +Device cdev Example
> > > > +-------------------
> > > +
> > > > +Assume user wants to access PCI device 0000:6a:01.0::
> > > > +
> > > > + $ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
> > > + vfio0
> > > +
> > > +This device is therefore represented as vfio0.  The user can verify
> > > +its existence::
> > > +
> > > + $ ls -l /dev/vfio/devices/vfio0
> > > > + crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> > > > + $ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev  
> > > you mentioned in the pci hot reset series that the BDF couldn't be used
> > > if cdev is being used. According to the above, it could, no?  
> 
> It should be the device passing case, otherwise, BDF can be used. But
> from kernel p.o.v., it has no idea how user gets the device fd, so it needs
> to assume user may not have BDF knowledge. 
> 
> > > + 511:0
> > > + $ ls -l /dev/char/511\:0
> > > > + lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> > > +
> > > +Then provide the user with access to the device if unprivileged
> > > +operation is desired::
> > > +
> > > + $ chown user:user /dev/vfio/devices/vfio0
> > > +
> > > +Finally the user could get cdev fd by::
> > > +
> > > + cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> > > +
> > > +An opened cdev_fd doesn't give the user any 

Re: [Intel-gfx] [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

> On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote:

[...]

>> > > >/*
>> > > > * WARNING: Apparently we must kick fbdev drivers before vgacon,
>> > > > * otherwise the vga fbdev driver falls over.
>> > > > */
>> > > >ret = vga_remove_vgacon(pdev);
>> > 
>> > This isn't enough, we also nuke stuff that's mapping the vga fb range.

Ah, also need aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE) then.

[...]

>> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev)
>> {
>>  aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
>> 
>>  return vga_remove_vgacon(pdev);
>> }
>> 
>> And that can be called from gma500 and the pci aperture helper.
>
> But you still pass a pci_dev to that helper. Which just doesn't make any
> sense to me (assuming your entire point is that this isn't just a normal
> pci device but some special legacy vga thing), but if we go with (void)
> then there's more refactoring to do because the vga_remove_vgacon also
> wants a pdev.
>
> All so that we don't call aperture_detach_devices() on a bunch of pci
> bars, which apparently is not problem for any other driver, but absolutely
> is a huge problem for gma500 somehow.
>
> I don't understand why.
>

Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper
is needed, then it starts to get a little silly. Maybe one option is to add a
3rd param to aperture_remove_conflicting_pci_devices() and skip the logic
that iterates over PCI bars and calls aperture_remove_conflicting_devices()?
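
A rough userspace sketch of the third-parameter idea, purely for
illustration. Every name here (mock_pci_dev, mock_log, the skip_bars flag,
mock_remove_conflicting_pci_devices(), the MOCK_* constants) is hypothetical
and only mimics the control flow under discussion; it is not the kernel's
aperture code, and the legacy VGA range values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_NUM_BARS		6		/* illustrative BAR count */
#define MOCK_VGA_FB_BASE	0xa0000UL	/* placeholder legacy VGA range */
#define MOCK_VGA_FB_SIZE	0x10000UL	/* placeholder legacy VGA size */

/* Userspace stand-ins only; these are not the kernel's types. */
struct mock_bar {
	unsigned long base;
	unsigned long size;
	bool is_mem;		/* true for a memory BAR */
};

struct mock_pci_dev {
	struct mock_bar bars[MOCK_NUM_BARS];
};

/* Records what the mock helper did, so the control flow is observable. */
struct mock_log {
	int ranges_detached;
	int vgacon_removed;
};

/* Stand-in for detaching firmware devices from one address range. */
static void mock_detach_devices(struct mock_log *log, unsigned long base,
				unsigned long size)
{
	(void)base;
	(void)size;
	log->ranges_detached++;
}

/* Stand-in for the vgacon removal step; always succeeds here. */
static int mock_remove_vgacon(struct mock_log *log)
{
	log->vgacon_removed++;
	return 0;
}

/*
 * Hypothetical variant with a third parameter: when skip_bars is true the
 * per-BAR loop is skipped and only the legacy VGA handling runs.
 */
int mock_remove_conflicting_pci_devices(const struct mock_pci_dev *pdev,
					struct mock_log *log, bool skip_bars)
{
	assert(pdev && log);

	if (!skip_bars) {
		for (size_t i = 0; i < MOCK_NUM_BARS; i++) {
			const struct mock_bar *bar = &pdev->bars[i];

			if (!bar->is_mem)
				continue;
			mock_detach_devices(log, bar->base, bar->size);
		}
	}

	/* Range detaching happens before vgacon, as the quoted comment warns. */
	mock_detach_devices(log, MOCK_VGA_FB_BASE, MOCK_VGA_FB_SIZE);
	return mock_remove_vgacon(log);
}
```

With skip_bars set, a gma500-like caller would only get the legacy VGA
handling; every other caller would pass false and keep today's behaviour.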

> Consider this me throwing in the towel. If you are convinced this
> makes sense please type it up and merge it, but I'm not going to type
> something that just doesn't make sense to me.

Honestly, I would just go with the double drm_aperture_remove_*() helper
calls (your original patch) unless that causes real issues. There is no
point in blocking your whole series just for this IMO.

Then later, if Thomas has strong opinions, he can send a follow-up patch for
the gma500 driver and the aperture helpers.

> -Daniel
>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Alex Williamson
On Wed, 5 Apr 2023 13:37:05 -0300
Jason Gunthorpe  wrote:

> On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:
> 
> > But that kind of brings to light the question of what does the user do
> > when they encounter this situation.  
> 
> What does it do now when it encounters a group_id it doesn't
> understand? Userspace already doesn't know if the foreign group is
> open or not, right?

It's simple, there is currently no screwiness around opened devices.
If the caller doesn't own all the groups mapping to the affected
devices, hot-reset is not available.

> > reset can complete.  If the device is opened by a different user, the
> > reset is blocked.  The only logical conclusion is that the user should
> > try the reset regardless of the result of the info ioctl, which the  
> 
> IMHO my suggested version is still the overall saner uAPI.
> 
> An info that basically returns success/fail if reset is security
> authorized and information about the reset groupings.
> 
> Actual reset follows the returned groupings automatically.
> 
> Easy for qemu. Call the info at startup to confirm reset can be
> emulated, use the returned information to propagate the reset groups
> to the guest. Trigger the reset with no fuss when the guest asks for
> it.
> 
> Less weird corner cases.

This leads to scenarios where the info ioctl indicates a hot-reset is
initially available, perhaps only because one of the affected devices
was not opened at the time, and now it fails when QEMU actually tries
to use it.  In the group model, QEMU can know the set of affected
devices and the required groups, confirm it owns those, and for all
practical purposes guarantee that a hot-reset is available (yes, there
might be some exceptionally rare topology changes).
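
The group-model check described above could be sketched in userspace C
roughly as follows. This is only an illustration: struct dep_dev is a
simplified, hypothetical stand-in for the entries the info ioctl returns,
and owns_all_groups() is a made-up helper name, not QEMU's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one affected device in the reply. */
struct dep_dev {
	uint32_t group_id;	/* IOMMU group the affected device belongs to */
};

/* Return true if group_id appears in the caller's list of owned groups. */
static bool group_is_owned(uint32_t group_id, const uint32_t *owned,
			   size_t n_owned)
{
	for (size_t i = 0; i < n_owned; i++)
		if (owned[i] == group_id)
			return true;
	return false;
}

/*
 * Hot-reset is treated as available only if every affected device maps to
 * a group the caller owns; a single foreign group makes it unavailable.
 */
bool owns_all_groups(const struct dep_dev *devs, size_t n_devs,
		     const uint32_t *owned, size_t n_owned)
{
	assert(devs != NULL || n_devs == 0);

	for (size_t i = 0; i < n_devs; i++)
		if (!group_is_owned(devs[i].group_id, owned, n_owned))
			return false;
	return true;
}
```

The answer depends only on group ownership, which the caller controls, and
not on whether some other device happened to be open when the info ioctl
was called.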

This goofiness around unopened devices and null-arrays is killing this
API.  Thanks,

Alex



Re: [Intel-gfx] [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET

2023-04-05 Thread Jason Gunthorpe
On Wed, Apr 05, 2023 at 09:36:46AM -0600, Alex Williamson wrote:

> > If we don't support singleton dev_set hot-reset, noiommu devices in cdev
> > path shall fail the hot-reset if empty-fd array is provided. But we may just
> > document that empty-fd array does not work for noiommu. User should
> > use the device fd array.
> 
> I don't see any replies to my comment on 08/12 where I again question
> why we need an empty array option.

I was pressing we'd do empty-fd only and not do the device fd array at
all since it is such an ugly fit for the use cases we have.

But it is such a minor detail if you don't want it then take it out.

> This singleton dev-set notion seems equally unjustified.  Do we just
> need to deal with hot-reset being unsupported for no-iommu devices
> with iommufd?

It was to support no-iommu, if you want to de-support it then it can
go away too. AFAIK dpdk doesn't use this feature and it is the only
user we know of that has support for no-iommu so it is probably safe.

Jason


Re: [Intel-gfx] [PATCH 01/19] drm/i915/i915_scatterlist: Fix kerneldoc formatting issue - missing '@'

2023-04-05 Thread Lee Jones
On Wed, 05 Apr 2023, Jani Nikula wrote:

> On Wed, 05 Apr 2023, Lee Jones  wrote:
> > On Tue, 04 Apr 2023, Jani Nikula wrote:
> >
> >> On Mon, 03 Apr 2023, Lee Jones  wrote:
> >> > On Mon, 03 Apr 2023, Jani Nikula wrote:
> >> >
> >> >> On Fri, 31 Mar 2023, Lee Jones  wrote:
> >> >> > Fixes the following W=1 kernel build warning(s):
> >> >> >
> >> >> >  drivers/gpu/drm/i915/i915_scatterlist.c:62: warning: Function parameter or member 'size' not described in 'i915_refct_sgt_init'
> >> >> >
> >> >> > Cc: Jani Nikula 
> >> >> > Cc: Joonas Lahtinen 
> >> >> > Cc: Rodrigo Vivi 
> >> >> > Cc: Tvrtko Ursulin 
> >> >> > Cc: David Airlie 
> >> >> > Cc: Daniel Vetter 
> >> >> > Cc: intel-gfx@lists.freedesktop.org
> >> >> > Cc: dri-de...@lists.freedesktop.org
> >> >> > Signed-off-by: Lee Jones 
> >> >>
> >> >> Thanks for the patches!
> >> >>
> >> >> Applied all but one of the drm/i915 patches to drm-intel-next or
> >> >> drm-intel-gt-next depending on the area. There were a couple of issues
> >> >> that I fixed while applying. There was a conflict with patch 5/19
> >> >> against drm-intel-gt-next so I left that one out.
> >> >
> >> > Thanks Jani.  I'll rebase and see what's left.
> >>
> >> We also took notice and aim to track this more aggressively [1].
> >
> > Thanks.
> >
> > I did clean-up all of the GPU warnings already a couple of years ago,
> > but they seem to have crept back over time.  It would be great if we
> > could put some extra checks in place to prevent them in the future.
>
> We are pretty zealous about warnings in general in i915. We have a bunch
> of extra warnings in our local Makefile and use -Werror in
> development. Inspired by this series, we added kernel-doc check to the
> build, and hope to add kernel-doc -Werror too once we're done.

Sounds good that you're on it.  At least in your part of GPU.

kernel-doc warnings are surfaced by enabling W=1.

> > My aim, albeit ambitious, is to clean-up all of the W=1 warnings in the
> > kernel, then have them promoted to W=0, so they warn more loudly during
> > development, thus keeping them from reappearing.
>
> I wish it was easier to do the equivalent of W=1 on a driver or Makefile
> basis. I like to keep i915 clean, but I don't like to use W=1 because
> there are just so many warnings currently.

Well that's what I hope to improve (again). :)

> The other alternative is fixing and moving extra warnings from W=1 to
> W=0 one by one.

Right, that's where I'd like to end up eventually.

--
Lee Jones [李琼斯]


Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Jason Gunthorpe
On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:

> But that kind of brings to light the question of what does the user do
> when they encounter this situation.

What does it do now when it encounters a group_id it doesn't
understand? Userspace already doesn't know if the foreign group is
open or not, right?

> reset can complete.  If the device is opened by a different user, the
> reset is blocked.  The only logical conclusion is that the user should
> try the reset regardless of the result of the info ioctl, which the

IMHO my suggested version is still the overall saner uAPI.

An info that basically returns success/fail if reset is security
authorized and information about the reset groupings.

Actual reset follows the returned groupings automatically.

Easy for qemu. Call the info at startup to confirm reset can be
emulated, use the returned information to propagate the reset groups
to the guest. Trigger the reset with no fuss when the guest asks for
it.

Less weird corner cases.

Jason


Re: [Intel-gfx] [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

[...]

>> 
>> but only the 'var->xres > fb->width || var->yres > fb->height' from the
>> conditions checked could be false after your __fill_var() call above.
>> 
>> You should drop the 'var->bits_per_pixel > bpp', 'var->xres_virtual >
>> fb->width' and 'var->yres_virtual > fb->height' checks I believe since
>> those will always be true.
>
> The __fill_var is after this. I'm honestly not sure what the exact

Ah, your patch adds it after that indeed. Please ignore my comment then.

> semantics are supposed to be, but essentially if userspace asks for too
> big virtual size, we reject it. And for anything else we then tell it
> (with __fill_var) how big the actually available space is.
>
> What I'm wondering now is whether too small x/yres won't lead to problems
> of some sorts ... For multi-screen we set the virtual size to be big
> enough for all crtc, and then just set x/yres to be the smallest output.
> That way fbcon knows to only draw as much as is visible on all screens.
> But if you then pan that too much, the bigger screens might not have a big
> enough buffer anymore and things fail (but shouldn't).
>
> Not sure how to fix that tbh.

Would this be a problem in practice?
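
For reference, the reject-then-report semantics Daniel describes above could
be modelled in userspace C along these lines. This is a sketch under stated
assumptions: struct mock_var, struct mock_fb and mock_check_var() are
hypothetical stand-ins, not the drm_fb_helper code, and the "report" step
only imitates what __fill_var() is said to do.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified stand-ins; not the kernel's fb_var_screeninfo or drm_framebuffer. */
struct mock_var {
	uint32_t xres, yres;
	uint32_t xres_virtual, yres_virtual;
	uint32_t bits_per_pixel;
};

struct mock_fb {
	uint32_t width, height;
	uint32_t bpp;
};

/*
 * Reject requests that exceed the framebuffer, then report the space that
 * is actually available (a stand-in for the __fill_var() step).
 */
int mock_check_var(struct mock_var *var, const struct mock_fb *fb)
{
	assert(var && fb);

	if (var->bits_per_pixel > fb->bpp ||
	    var->xres > fb->width || var->yres > fb->height ||
	    var->xres_virtual > fb->width || var->yres_virtual > fb->height)
		return -EINVAL;

	/* Anything that fits is accepted; tell the caller the real limits. */
	var->xres_virtual = fb->width;
	var->yres_virtual = fb->height;
	var->bits_per_pixel = fb->bpp;
	return 0;
}
```

In this model a small xres/yres is left untouched, which is the case the
panning question above is about.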

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Alex Williamson
On Wed, 5 Apr 2023 14:04:51 +
"Liu, Yi L"  wrote:

> Hi Eric,
> 
> > From: Eric Auger 
> > Sent: Wednesday, April 5, 2023 8:20 PM
> > 
> > Hi Yi,
> > On 4/1/23 16:44, Yi Liu wrote:  
> > > for the users that accept device fds passed from management stacks to be
> > > able to figure out the host reset affected devices among the devices
> > > opened by the user. This is needed as such users do not have BDF (bus,
> > > devfn) knowledge about the devices it has opened, hence unable to use
> > > the information reported by existing VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > > to figure out the affected devices.
> > >
> > > Signed-off-by: Yi Liu 
> > > ---
> > >  drivers/vfio/pci/vfio_pci_core.c | 58 
> > >  include/uapi/linux/vfio.h| 24 -
> > >  2 files changed, 74 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> > > b/drivers/vfio/pci/vfio_pci_core.c
> > > index 19f5b075d70a..a5a7e148dce1 100644
> > > --- a/drivers/vfio/pci/vfio_pci_core.c
> > > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > > @@ -30,6 +30,7 @@
> > >  #if IS_ENABLED(CONFIG_EEH)
> > >  #include 
> > >  #endif
> > > +#include 
> > >
> > >  #include "vfio_pci_priv.h"
> > >
> > > @@ -767,6 +768,20 @@ static int vfio_pci_get_irq_count(struct  
> > vfio_pci_core_device *vdev, int irq_typ  
> > >   return 0;
> > >  }
> > >
> > > +static struct vfio_device *
> > > +vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
> > > +struct pci_dev *pdev)
> > > +{
> > > + struct vfio_device *cur;
> > > +
> > > > + lockdep_assert_held(&dev_set->lock);
> > > > +
> > > > + list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > > > + if (cur->dev == &pdev->dev)
> > > + return cur;
> > > + return NULL;
> > > +}
> > > +
> > >  static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> > >  {
> > >   (*(int *)data)++;
> > > @@ -776,13 +791,20 @@ static int vfio_pci_count_devs(struct pci_dev 
> > > *pdev, void  
> > *data)  
> > >  struct vfio_pci_fill_info {
> > >   int max;
> > >   int cur;
> > > + bool require_devid;
> > > + struct iommufd_ctx *iommufd;
> > > + struct vfio_device_set *dev_set;
> > >   struct vfio_pci_dependent_device *devices;
> > >  };
> > >
> > >  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > >  {
> > >   struct vfio_pci_fill_info *fill = data;
> > > + struct vfio_device_set *dev_set = fill->dev_set;
> > >   struct iommu_group *iommu_group;
> > > + struct vfio_device *vdev;
> > > +
> > > > + lockdep_assert_held(&dev_set->lock);
> > >
> > >   if (fill->cur == fill->max)
> > >   return -EAGAIN; /* Something changed, try again */
> > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, 
> > > void  
> > *data)  
> > >   if (!iommu_group)
> > >   return -EPERM; /* Cannot reset non-isolated devices */
> > >
> > > - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > > + if (fill->require_devid) {
> > > + /*
> > > +  * Report dev_id of the devices that are opened as cdev
> > > +  * and have the same iommufd with the fill->iommufd.
> > > +  * Otherwise, just fill IOMMUFD_INVALID_ID.
> > > +  */
> > > + vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
> > > + if (vdev && vfio_device_cdev_opened(vdev) &&
> > > + fill->iommufd == vfio_iommufd_physical_ictx(vdev))
> > > > + vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
> > > + else
> > > + fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
> > > + } else {
> > > + fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > > + }
> > >   fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> > >   fill->devices[fill->cur].bus = pdev->bus->number;
> > >   fill->devices[fill->cur].devfn = pdev->devfn;
> > > @@ -1230,17 +1266,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> > >   return -ENOMEM;
> > >
> > >   fill.devices = devices;
> > > + fill.dev_set = vdev->vdev.dev_set;
> > >
> > > > + mutex_lock(&vdev->vdev.dev_set->lock);
> > > > + if (vfio_device_cdev_opened(&vdev->vdev)) {
> > > > + fill.require_devid = true;
> > > > + fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > > > + }
> > > >   ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> > > >   &fill, slot);
> > > > + mutex_unlock(&vdev->vdev.dev_set->lock);
> > >
> > >   /*
> > >* If a device was removed between counting and filling, we may come up
> > >* short of fill.max.  If a device was added, we'll have a return of
> > >* -EAGAIN above.
> > >*/
> > > - if (!ret)
> > > + if (!ret) {
> > >   hdr.count = fill.cur;
> > > + if (fill.require_devid)
> > > + hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
> > > + }
> > >
> > >  reset_info_exit:
> > > >   if (copy_to_user(arg, &hdr, 

Re: [Intel-gfx] [PATCH 01/19] drm/i915/i915_scatterlist: Fix kerneldoc formatting issue - missing '@'

2023-04-05 Thread Jani Nikula
On Wed, 05 Apr 2023, Lee Jones  wrote:
> On Tue, 04 Apr 2023, Jani Nikula wrote:
>
>> On Mon, 03 Apr 2023, Lee Jones  wrote:
>> > On Mon, 03 Apr 2023, Jani Nikula wrote:
>> >
>> >> On Fri, 31 Mar 2023, Lee Jones  wrote:
>> >> > Fixes the following W=1 kernel build warning(s):
>> >> >
>> >> >  drivers/gpu/drm/i915/i915_scatterlist.c:62: warning: Function parameter or member 'size' not described in 'i915_refct_sgt_init'
>> >> >
>> >> > Cc: Jani Nikula 
>> >> > Cc: Joonas Lahtinen 
>> >> > Cc: Rodrigo Vivi 
>> >> > Cc: Tvrtko Ursulin 
>> >> > Cc: David Airlie 
>> >> > Cc: Daniel Vetter 
>> >> > Cc: intel-gfx@lists.freedesktop.org
>> >> > Cc: dri-de...@lists.freedesktop.org
>> >> > Signed-off-by: Lee Jones 
>> >>
>> >> Thanks for the patches!
>> >>
>> >> Applied all but one of the drm/i915 patches to drm-intel-next or
>> >> drm-intel-gt-next depending on the area. There were a couple of issues
>> >> that I fixed while applying. There was a conflict with patch 5/19
>> >> against drm-intel-gt-next so I left that one out.
>> >
>> > Thanks Jani.  I'll rebase and see what's left.
>>
>> We also took notice and aim to track this more aggressively [1].
>
> Thanks.
>
> I did clean-up all of the GPU warnings already a couple of years ago,
> but they seem to have crept back over time.  It would be great if we
> could put some extra checks in place to prevent them in the future.

We are pretty zealous about warnings in general in i915. We have a bunch
of extra warnings in our local Makefile and use -Werror in
development. Inspired by this series, we added kernel-doc check to the
build, and hope to add kernel-doc -Werror too once we're done.

> My aim, albeit ambitious, is to clean-up all of the W=1 warnings in the
> kernel, then have them promoted to W=0, so they warn more loudly during
> development, thus keeping them from reappearing.

I wish it was easier to do the equivalent of W=1 on a driver or Makefile
basis. I like to keep i915 clean, but I don't like to use W=1 because
there are just so many warnings currently.

The other alternative is fixing and moving extra warnings from W=1 to
W=0 one by one.


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [Intel-gfx] [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 05.04.23 um 15:18 schrieb Daniel Vetter:
> > On Wed, Apr 05, 2023 at 01:16:27PM +0200, Javier Martinez Canillas wrote:
> > > Thomas Zimmermann  writes:
> > > 
> > > [...]
> > > 
> > > > 
> > > > Your comment says that it calls a PCI function to clean up to vgacon.
> > > > That comment explains what is happening, not why. And how the PCI and
> > > > vgacon code work together is non-obvious.
> > 
> > Would a better comment help then:
> > 
> > /*
> >  * gma500 is a strange hybrid device, which both acts as a pci
> >  * device (for legacy vga functionality) but also more like an
> >  * integrated display on a SoC where the framebuffer simply
> >  * resides in main memory and not in a special pci bar (that
> >  * internally redirects to a stolen range of main memory) like all
> >  * other integrated pci display devices have.
> >  *
> >  * To catch all cases we need to both remove conflicting fw
> >  * drivers for the pci device and main memory.
> >  */
> 
> Together with the existing comment, this should be the comment to describe
> gma_remove_conflicting_framebuffers().
> 
> > > > 
> > > > Again, here's my proposal for gma500:
> > > > 
> > > > // call this from psb_pci_probe()
> > > > int gma_remove_conflicting_framebuffers(struct pci_dev *pdev, const
> > > > struct drm_driver *req_driver)
> > > > {
> > > > resource_size_t base = 0;
> > > > resource_size_t size = (resource_size_t)-1;
> > > > const char *name = req_driver->name;
> > > > int ret;
> > > > 
> > > > /*
> > > >  * We cannot yet easily find the framebuffer's location in
> > > >  * memory. So remove all framebuffers here.
> > > >  *
> > > >  * TODO: Refactor psb_driver_load() to map vdc_reg earlier. Then
> > > >  *   we might be able to read the framebuffer range from the
> > > >  *   device.
> > > >  */
> > > > ret = aperture_remove_conflicting_devices(base, size, name);
> > 
> > Why can't this be a call to drm_aperture_remove_framebuffers? At least as
> > long as we don't implement the "read out actual fb base and size" code,
> > which also none of the other soc drivers bother with?
> 
> It can. Feel free to use it.
> 
> But I have to say that those DRM helpers are somewhat empty and obsolete
> after the aperture code has been moved to drivers/video/. They exist mostly
> for convenience. As with other DRM helpers, if a driver needs something
> special, it can ignore them.
> 
> > 
> > > > if (ret)
> > > > return ret;
> > > > 
> > > > /*
> > > >  * WARNING: Apparently we must kick fbdev drivers before vgacon,
> > > >  * otherwise the vga fbdev driver falls over.
> > > >  */
> > > > ret = vga_remove_vgacon(pdev);
> > 
> > This isn't enough, we also nuke stuff that's mapping the vga fb range.
> > Which is really the reason I don't want to open code random stuff, pci is
> > self-describing, if it's decoding legacy vga it can figure this out and we
> > only have to implement the "how do I nuke legacy vga fw drivers from a pci
> > driver" once.
> 
> Sure, but it's really just one additional line:
> 
>   aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
> 
> as you mention below, this and vgacon can be exported in a single VGA
> aperture helper.
> 
> > 
> > Not twice like this would result in, with the gma500 version being only
> > half the thing.
> > 
> > If it absolutely has to be a separate function for the gma500 pci legacy
> > vga (I still don't get why, it's just a pci vga device, there's absolutely
> > nothing special about that part at all) then I think it needs to be at
> > least a common "nuke a legacy vga device for me pls" function, which
> > shares the implementation with the pci one.
> 
> Sure
> 
> /**
>  * kerneldoc goes here
>  *
>  * WARNING: Apparently we must remove graphics drivers before calling
>  *  this helper. Otherwise the vga fbdev driver falls over if
>  *  we have vgacon configured.
>  */
> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev)
> {
>   aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
> 
>   return vga_remove_vgacon(pdev);
> }
> 
> And that can be called from gma500 and the pci aperture helper.

But you still pass a pci_dev to that helper. Which just doesn't make any
sense to me (assuming your entire point is that this isn't just a normal
pci device but some special legacy vga thing), but if we go with (void)
then there's more refactoring to do because the vga_remove_vgacon also
wants a pdev.

All so that we don't call aperture_detach_devices() on a bunch of pci
bars, which apparently is not a problem for any other driver, but absolutely
is a huge problem for gma500 somehow.

I don't understand why.

Consider this me throwing in 

Re: [Intel-gfx] [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET

2023-04-05 Thread Alex Williamson
On Wed, 5 Apr 2023 08:01:49 +
"Liu, Yi L"  wrote:

> > From: Liu, Yi L 
> > Sent: Wednesday, April 5, 2023 3:55 PM  
>  
> > >
> > > Therefore, I think as written, the singleton dev_set hot-reset is
> > > enabled for iommufd and (unintentionally?) for the group path, while
> > > also negating a requirement for a group fd or that a provided group fd
> > > actually matches the device in this latter case.  The null-array
> > > approach is not however extended to groups for more general use.
> > > Additionally, limiting no-iommu hot-reset to singleton dev_sets
> > > provides only a marginal functional difference vs VFIO_DEVICE_RESET.  
> > 
> > I think the singleton dev_set hot-reset is for iommufd (or more accurately
> > for the noiommu case in cdev path).  
> 
> but actually, singleton dev_set hot-reset can work for group path as well.
> Based on this, I'm also wondering do we really want to have singleton dev_set
> hot-reset only for cdev noiommu case? or we allow it generally or just
> don't support it as it is equivalent with VFIO_DEVICE_RESET?

I think you're taking the potential that VFIO_DEVICE_RESET and
hot-reset could do the same thing too far.  The former is more likely
to do an FLR, or even a PM reset.  QEMU even tries to guess what reset
VFIO_DEVICE_RESET might use in order to choose to do a hot-reset if it
seems like the device might only support a PM reset otherwise.

Changing the reset method of a device requires privilege, which is
maybe something we'd compromise on for no-iommu, but the general
expectation is that VFIO_DEVICE_RESET provides a device level scope and
hot-reset provides a... hot-reset, and sometimes those are the same
thing, but that doesn't mean we can lean on the former.

> If we don't support singleton dev_set hot-reset, noiommu devices in cdev
> path shall fail the hot-reset if empty-fd array is provided. But we may just
> document that empty-fd array does not work for noiommu. User should
> use the device fd array.

I don't see any replies to my comment on 08/12 where I again question
why we need an empty array option.  It's causing all sorts of headaches
and I don't see the justification for it beyond some hand waving that
it reduces complexity for the user.  This singleton dev-set notion
seems equally unjustified.  Do we just need to deal with hot-reset
being unsupported for no-iommu devices with iommufd?  Thanks,

Alex
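
For readers following the distinction drawn above between VFIO_DEVICE_RESET and a hot reset, here is a minimal userspace sketch of the two ioctls as they exist in <linux/vfio.h> today, using the group-fd array form. The helper names (device_reset, hot_reset_alloc, hot_reset) are made up for illustration, and the fds are assumed to have been obtained elsewhere; this is not code from the series under review.

```c
#include <linux/vfio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Device-scoped reset: the kernel chooses the method (FLR, PM reset, ...). */
static int device_reset(int device_fd)
{
	return ioctl(device_fd, VFIO_DEVICE_RESET);
}

/*
 * Build a hot-reset request carrying the group fds that prove ownership
 * of every affected device. Hypothetical helper; caller frees the result.
 */
static struct vfio_pci_hot_reset *hot_reset_alloc(const int *group_fds,
						  unsigned int count)
{
	size_t sz = sizeof(struct vfio_pci_hot_reset) + count * sizeof(__s32);
	struct vfio_pci_hot_reset *req = calloc(1, sz);
	unsigned int i;

	if (!req)
		return NULL;
	req->argsz = (__u32)sz;
	req->count = count;
	for (i = 0; i < count; i++)
		req->group_fds[i] = group_fds[i];
	return req;
}

/* Bus/slot-scoped reset: affects every device behind the same reset domain. */
static int hot_reset(int device_fd, const int *group_fds, unsigned int count)
{
	struct vfio_pci_hot_reset *req = hot_reset_alloc(group_fds, count);
	int ret;

	if (!req)
		return -1;
	ret = ioctl(device_fd, VFIO_DEVICE_PCI_HOT_RESET, req);
	free(req);
	return ret;
}
```

The empty-array question in this thread is about what the kernel should do when count is 0; the sketch only shows the existing non-empty form.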



Re: [Intel-gfx] [PATCH 0/4] log2: make is_power_of_2() more generic

2023-04-05 Thread Steven Price
On 31/03/2023 09:31, Jani Nikula wrote:
> On Thu, 30 Mar 2023, Andrew Morton  wrote:
>> On Thu, 30 Mar 2023 21:53:03 + David Laight  
>> wrote:
>>
>>>> But wouldn't all these issues be addressed by simply doing
>>>>
>>>> #define is_power_of_2(n) (n != 0 && ((n & (n - 1)) == 0))
>>>>
>>>> ?
>>>>
>>>> (With suitable tweaks to avoid evaluating `n' more than once)
>>>
>>> I think you need to use the 'horrid tricks' from min() to get
>>> a constant expression from constant inputs.
>>
>> This
>>
>> --- a/include/linux/log2.h~a
>> +++ a/include/linux/log2.h
>> @@ -41,11 +41,11 @@ int __ilog2_u64(u64 n)
>>   * *not* considered a power of two.
>>   * Return: true if @n is a power of 2, otherwise false.
>>   */
>> -static inline __attribute__((const))
>> -bool is_power_of_2(unsigned long n)
>> -{
>> -return (n != 0 && ((n & (n - 1)) == 0));
>> -}
>> +#define is_power_of_2(_n)   \
>> +({  \
>> +typeof(_n) n = (_n);\
>> +n != 0 && ((n & (n - 1)) == 0); \
>> +})
>>  
>>  /**
>>   * __roundup_pow_of_two() - round up to nearest power of two
>> _
>>
>> worked for me in a simple test.
>>
>> --- a/fs/open.c~b
>> +++ a/fs/open.c
>> @@ -1564,3 +1564,10 @@ int stream_open(struct inode *inode, str
>>  }
>>  
>>  EXPORT_SYMBOL(stream_open);
>> +
>> +#include 
>> +
>> +int foo(void)
>> +{
>> +return is_power_of_2(43);
>> +}
>> _
>>
>>
>> foo:
>> # fs/open.c:1573: }
>>  xorl    %eax, %eax  #
>>  ret 
>>
>>
>> Is there some more tricky situation where it breaks?
> 
> It doesn't work with BUILD_BUG_ON_ZERO().

Like most programming problems, you just need another layer of
indirection! The below works for me in all the cases I could think of
(including __uint128_t).


#define __IS_POWER_OF_2(n) (n != 0 && ((n & (n - 1)) == 0))

#define _IS_POWER_OF_2(n, unique_n) \
({  \
typeof(n) unique_n = (n);   \
__IS_POWER_OF_2(unique_n);  \
})

#define is_power_of_2(n)\
__builtin_choose_expr(__is_constexpr((n)),  \
  __IS_POWER_OF_2((n)), \
  _IS_POWER_OF_2(n, __UNIQUE_ID(_n)))


Although Jani's original might be easier to understand.

Steve
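
For anyone who wants to try the layered macro outside the kernel tree, the following is a userspace sketch, assuming GCC or Clang with GNU extensions. __is_constexpr() is copied in the form the kernel has used, and PASTE()/UNIQUE_ID() are local stand-ins for the kernel's __UNIQUE_ID(); the extra parentheses in __IS_POWER_OF_2() and the two check_* helpers are additions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for kernel helpers (assumption: GCC or Clang, GNU extensions on). */
#define __is_constexpr(x) \
	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)
#define UNIQUE_ID(prefix) PASTE(PASTE(unique_id_, prefix), __COUNTER__)

/* Plain expression form: safe only when evaluating n twice is harmless. */
#define __IS_POWER_OF_2(n) ((n) != 0 && (((n) & ((n) - 1)) == 0))

/* Statement-expression form: copies n into a uniquely named local first. */
#define _IS_POWER_OF_2(n, unique_n)		\
({						\
	__typeof__(n) unique_n = (n);		\
	__IS_POWER_OF_2(unique_n);		\
})

/* Pick the plain form for constants, the single-evaluation form otherwise. */
#define is_power_of_2(n)				\
	__builtin_choose_expr(__is_constexpr((n)),	\
			      __IS_POWER_OF_2((n)),	\
			      _IS_POWER_OF_2(n, UNIQUE_ID(_n)))

/* Constant input: usable where an integer constant expression is required. */
static bool check_constant(void)
{
	static_assert(is_power_of_2(64), "64 is a power of two");
	static_assert(!is_power_of_2(43), "43 is not a power of two");
	return true;
}

/* Non-constant input with a side effect: the argument is evaluated once. */
static bool check_once(unsigned int *counter)
{
	return is_power_of_2((*counter)++);
}
```

The static_assert() calls sit inside a function because the unchosen branch still contains a statement expression, which GCC rejects at file scope.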


Re: [Intel-gfx] [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi

2023-04-05 Thread Liu, Yi L
> From: Alex Williamson 
> Sent: Wednesday, April 5, 2023 11:13 PM
> 
> On Wed, 5 Apr 2023 09:31:39 +
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Wednesday, April 5, 2023 5:01 AM
> > >
> > > On Sat,  1 Apr 2023 07:44:28 -0700
> > > Yi Liu  wrote:
> > >
> > > > as there are IOMMUFD users that want to check if an ID generated
> > > > by IOMMUFD is valid or not. e.g. vfio-pci optionally returns invalid
> > > > dev_id to user in the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. User
> > > > needs to check if the ID is valid or not.
> > > >
> > > > IOMMUFD_INVALID_ID is defined as 0 since the IDs generated by IOMMUFD
> > > > starts from 0.
> > > >
> > > > Signed-off-by: Yi Liu 
> > > > ---
> > > >  include/uapi/linux/iommufd.h | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> > > > index 98ebba80cfa1..aeae73a93833 100644
> > > > --- a/include/uapi/linux/iommufd.h
> > > > +++ b/include/uapi/linux/iommufd.h
> > > > @@ -9,6 +9,9 @@
> > > >
> > > >  #define IOMMUFD_TYPE (';')
> > > >
> > > > +/* IDs allocated by IOMMUFD starts from 0 */
> > > > +#define IOMMUFD_INVALID_ID 0
> > > > +
> > > >  /**
> > > >   * DOC: General ioctl format
> > > >   *
> > >
> > > If allocation "starts from 0" then 0 is a valid id, no?  Does allocation
> > > start from 1, ie. skip 0?  Thanks,
> >
> > yes, it starts from 1, that's why we can use 0 as invalid id.
> 
> So the comment is wrong, correct?

yes.

Regards
Yi Liu
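
The point settled above (allocation starts at 1, so 0 can serve as the invalid ID) can be illustrated with a small standalone sketch. Everything here is hypothetical: EXAMPLE_INVALID_ID mirrors the proposed IOMMUFD_INVALID_ID, and the counter-based allocator is only a toy, not how IOMMUFD allocates object IDs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the proposed IOMMUFD_INVALID_ID. */
#define EXAMPLE_INVALID_ID 0u

/* Toy allocator: hands out 1, 2, 3, ... and never returns 0 as a real ID. */
struct id_allocator {
	uint32_t next;
};

static void id_allocator_init(struct id_allocator *a)
{
	a->next = 1;	/* skip 0 so it stays free to mean "invalid" */
}

static uint32_t id_alloc(struct id_allocator *a)
{
	if (a->next == 0)	/* counter wrapped: the ID space is exhausted */
		return EXAMPLE_INVALID_ID;
	return a->next++;
}

static bool id_is_valid(uint32_t id)
{
	return id != EXAMPLE_INVALID_ID;
}
```

With that convention, a comment next to the define would read "IDs allocated start from 1", which is the correction the review asks for.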



Re: [Intel-gfx] [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset

2023-04-05 Thread Eric Auger
Hi Jason,

On 4/5/23 13:41, Jason Gunthorpe wrote:
> On Tue, Apr 04, 2023 at 05:59:01PM +0200, Eric Auger wrote:
>
>>> but the hot reset shall fail as the group is not owned by the user.
>> sure it shall but I fail to understand if the reset fails or the device
>> plug is somehow delayed until the reset completes.
> It is just racy today - vfio_pci_dev_set_resettable() doesn't hold any
> locks across the pci_walk_bus() check to prevent hot plug in while it is
> working on the reset.

OK thanks

Eric
>
> Jason
>



Re: [Intel-gfx] [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi

2023-04-05 Thread Alex Williamson
On Wed, 5 Apr 2023 09:31:39 +
"Liu, Yi L"  wrote:

> > From: Alex Williamson 
> > Sent: Wednesday, April 5, 2023 5:01 AM
> > 
> > On Sat,  1 Apr 2023 07:44:28 -0700
> > Yi Liu  wrote:
> >   
> > > as there are IOMMUFD users that want to check if an ID generated
> > > by IOMMUFD is valid or not. e.g. vfio-pci optionally returns invalid
> > > dev_id to user in the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. User
> > > needs to check if the ID is valid or not.
> > >
> > > IOMMUFD_INVALID_ID is defined as 0 since the IDs generated by IOMMUFD
> > > starts from 0.
> > >
> > > Signed-off-by: Yi Liu 
> > > ---
> > >  include/uapi/linux/iommufd.h | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> > > index 98ebba80cfa1..aeae73a93833 100644
> > > --- a/include/uapi/linux/iommufd.h
> > > +++ b/include/uapi/linux/iommufd.h
> > > @@ -9,6 +9,9 @@
> > >
> > >  #define IOMMUFD_TYPE (';')
> > >
> > > +/* IDs allocated by IOMMUFD starts from 0 */
> > > +#define IOMMUFD_INVALID_ID 0
> > > +
> > >  /**
> > >   * DOC: General ioctl format
> > >   *  
> > 
> > If allocation "starts from 0" then 0 is a valid id, no?  Does allocation
> > start from 1, ie. skip 0?  Thanks,  
> 
> yes, it starts from 1, that's why we can use 0 as invalid id.

So the comment is wrong, correct?



Re: [Intel-gfx] [PATCH] drm/i915/mtl: Add Wa_14017856879

2023-04-05 Thread Matt Roper
On Tue, Apr 04, 2023 at 03:29:15PM -0300, Gustavo Sousa wrote:
> Quoting Haridhar Kalvala (2023-04-04 14:32:20)
> > Wa_14017856879 implementation for mtl.
> > 
> > Bspec: 46046
> > 
> > Signed-off-by: Haridhar Kalvala 
> 
> Reviewed-by: Gustavo Sousa 

Applied to drm-intel-gt-next.  Thanks for the patch and review.


Matt

> 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_gt_regs.h | 2 ++
> >  drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +
> >  2 files changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
> > b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > index 35a4cfac2d20..492b3de6678d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > @@ -1177,7 +1177,9 @@
> >  #define   THREAD_EX_ARB_MODE_RR_AFTER_DEP  
> > REG_FIELD_PREP(THREAD_EX_ARB_MODE, 0x2)
> >  
> >  #define HSW_ROW_CHICKEN3   _MMIO(0xe49c)
> > +#define GEN9_ROW_CHICKEN3  MCR_REG(0xe49c)
> >  #define   HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE   (1 << 6)
> > +#define   MTL_DISABLE_FIX_FOR_EOT_FLUSH        REG_BIT(9)
> >  
> >  #define GEN8_ROW_CHICKEN   MCR_REG(0xe4f0)
> >  #define   FLOW_CONTROL_ENABLE  REG_BIT(15)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> > b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > index 1c8e0e91a2fe..6ea453ddd011 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > @@ -2971,6 +2971,11 @@ general_render_compute_wa_init(struct 
> > intel_engine_cs *engine, struct i915_wa_li
> >  
> > add_render_compute_tuning_settings(i915, wal);
> >  
> > +   if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_B0, STEP_FOREVER) ||
> > +   IS_MTL_GRAPHICS_STEP(i915, P, STEP_B0, STEP_FOREVER))
> > +   /* Wa_14017856879 */
> > +   wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN3, 
> > MTL_DISABLE_FIX_FOR_EOT_FLUSH);
> > +
> > if (IS_MTL_GRAPHICS_STEP(i915, M, STEP_A0, STEP_B0) ||
> > IS_MTL_GRAPHICS_STEP(i915, P, STEP_A0, STEP_B0))
> > /*
> > -- 
> > 2.25.1
> >

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [Intel-gfx] [PATCH v9 16/25] iommufd/device: Add iommufd_access_detach() API

2023-04-05 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Wednesday, April 5, 2023 10:29 PM
> > > >
> > > > Does this need to go in via iommufd first?  There seems to be quite a
> > > > bit of churn in iommufd/device.c vs the vfio_mdev_ops branch (ie. it
> > > > doesn't apply). Thanks,
> > >
> > > I think it is best to stay with this series, Yi has to rebase it
> >
> > The rebased version is here. Shall I resend a version which is rebased on
> > top of vfio_mdev_ops?
> >
> >
> https://github.com/yiliu1765/iommufd/commit/d3d8f65c82fe2ca2a7b1a635f4b40b2a0971daa9
> 
> When you post the v10 it should be based on top of the vfio_mdev_ops
> and the hot reset series.

yes. At least, I see the hot reset series needs to be refreshed w.r.t. the
comments from Alex and Eric.

Regards,
Yi Liu


Re: [Intel-gfx] [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Thomas Zimmermann

Hi

Am 05.04.23 um 15:18 schrieb Daniel Vetter:

On Wed, Apr 05, 2023 at 01:16:27PM +0200, Javier Martinez Canillas wrote:

Thomas Zimmermann  writes:

[...]



Your comment says that it calls a PCI function to clean up to vgacon.
That comment explains what is happening, not why. And how the PCI and
vgacon code work together is non-obvious.


Would a better comment help then:

/*
 * gma500 is a strange hybrid device, which both acts as a pci
 * device (for legacy vga functionality) but also more like an
 * integrated display on a SoC where the framebuffer simply
 * resides in main memory and not in a special pci bar (that
 * internally redirects to a stolen range of main memory) like all
 * other integrated pci display devices have.
 *
 * To catch all cases we need to both remove conflicting fw
 * drivers for the pci device and main memory.
 */


Together with the existing comment, this should be the comment to 
describe gma_remove_conflicting_framebuffers().




Again, here's my proposal for gma500:

// call this from psb_pci_probe()
int gma_remove_conflicting_framebuffers(struct pci_dev *pdev, const
struct drm_driver *req_driver)
{
resource_size_t base = 0;
resource_size_t size = (resource_size_t)-1;
const char *name = req_driver->name;
int ret;

/*
 * We cannot yet easily find the framebuffer's location in
 * memory. So remove all framebuffers here.
 *
 * TODO: Refactor psb_driver_load() to map vdc_reg earlier. Then
 *   we might be able to read the framebuffer range from the
 *   device.
 */
ret = aperture_remove_conflicting_devices(base, size, name);


Why can't this be a call to drm_aperture_remove_framebuffers? At least as
long as we don't implement the "read out actual fb base and size" code,
which also none of the other soc drivers bother with?


It can. Feel free to use it.

But I have to say that those DRM helpers are somewhat empty and obsolete 
after the aperture code has been moved to drivers/video/. They exist 
mostly for convenience. As with other DRM helpers, if a driver needs 
something special, it can ignore them.





if (ret)
return ret;

/*
 * WARNING: Apparently we must kick fbdev drivers before vgacon,
 * otherwise the vga fbdev driver falls over.
 */
ret = vga_remove_vgacon(pdev);


This isn't enough, we also nuke stuff that's mapping the vga fb range.
Which is really the reason I don't want to open code random stuff, pci is
self-describing, if it's decoding legacy vga it can figure this out and we
only have to implement the "how do I nuke legacy vga fw drivers from a pci
driver" once.


Sure, but it's really just one additional line:

  aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);

as you mention below, this and vgacon can be exported in a single VGA 
aperture helper.




Not twice like this would result in, with the gma500 version being only
half the thing.

If it absolutely has to be a separate function for the gma500 pci legacy
vga (I still don't get why, it's just a pci vga device, there's absolutely
nothing special about that part at all) then I think it needs to be at
least a common "nuke a legacy vga device for me pls" function, which
shares the implementation with the pci one.


Sure

/**
 * kerneldoc goes here
 *
 * WARNING: Apparently we must remove graphics drivers before calling
 *  this helper. Otherwise the vga fbdev driver falls over if
 *  we have vgacon configured.
 */
int aperture_remove_legacy_vga_devices(struct pci_dev *pdev)
{
aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);

return vga_remove_vgacon(pdev);
}

And that can be called from gma500 and the pci aperture helper.

Best regards
Thomas



But not open-coding just half of it only.


if (ret)
return ret;

return 0;
}



If this is enough, I agree that this is much easier code to understand.


It's still two calls and more code with more bugs? I'm not seeing the
point.
-Daniel


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev




Re: [Intel-gfx] [PATCH v9 16/25] iommufd/device: Add iommufd_access_detach() API

2023-04-05 Thread Jason Gunthorpe
On Wed, Apr 05, 2023 at 02:10:19PM +, Liu, Yi L wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, April 5, 2023 7:56 PM
> > 
> > On Tue, Apr 04, 2023 at 04:45:12PM -0600, Alex Williamson wrote:
> > > On Sat,  1 Apr 2023 08:18:24 -0700
> > > Yi Liu  wrote:
> > >
> > > > From: Nicolin Chen 
> > > >
> > > > Previously, the detach routine is only done by the destroy(). And it was
> > > > called by vfio_iommufd_emulated_unbind() when the device runs close(), 
> > > > so
> > > > all the mappings in iopt were cleaned in that setup, when the call trace
> > > > reaches this detach() routine.
> > > >
> > > > Now, there's a need of a detach uAPI, meaning that it does not only need
> > > > a new iommufd_access_detach() API, but also requires 
> > > > access->ops->unmap()
> > > > call as a cleanup. So add one.
> > > >
> > > > However, leaving that unprotected can introduce some potential of a race
> > > > condition during the pin_/unpin_pages() call, where access->ioas->iopt 
> > > > is
> > > > getting referenced. So, add an ioas_lock to protect the context of iopt
> > > > referencings.
> > > >
> > > > Also, to allow the iommufd_access_unpin_pages() callback to happen via
> > > > this unmap() call, add an ioas_unpin pointer, so the unpin routine won't
> > > > be affected by the "access->ioas = NULL" trick.
> > > >
> > > > Reviewed-by: Kevin Tian 
> > > > Tested-by: Terrence Xu 
> > > > Tested-by: Yanting Jiang 
> > > > Signed-off-by: Nicolin Chen 
> > > > Signed-off-by: Yi Liu 
> > > > ---
> > > >  drivers/iommu/iommufd/device.c  | 76 +++--
> > > >  drivers/iommu/iommufd/iommufd_private.h |  2 +
> > > >  include/linux/iommufd.h |  1 +
> > > >  3 files changed, 74 insertions(+), 5 deletions(-)
> > >
> > > Does this need to go in via iommufd first?  There seems to be quite a
> > > bit of churn in iommufd/device.c vs the vfio_mdev_ops branch (ie. it
> > > doesn't apply). Thanks,
> > 
> > I think it is best to stay with this series, Yi has to rebase it
> 
> The rebased version is here. Shall I resend a version which is rebased on
> top of vfio_mdev_ops?
> 
> https://github.com/yiliu1765/iommufd/commit/d3d8f65c82fe2ca2a7b1a635f4b40b2a0971daa9

When you post the v10 it should be based on top of the vfio_mdev_ops
and the hot reset series.

Jason
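
The commit message quoted above describes three pieces: an ioas_lock, clearing access->ioas to block new pins, and a separate ioas_unpin pointer so the unmap() callback can still unpin. A userspace analogue of that ordering, using a pthread mutex, might look like the sketch below. The struct layouts and function names are invented for illustration and do not reproduce the patch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the I/O address space: just counts pins. */
struct ioas {
	int pinned;
};

struct access {
	pthread_mutex_t ioas_lock;
	struct ioas *ioas;		/* used by pin; NULL once detach starts */
	struct ioas *ioas_unpin;	/* used by unpin; valid during unmap() */
	void (*unmap)(struct access *access);
};

static void access_attach(struct access *a, struct ioas *ioas)
{
	pthread_mutex_lock(&a->ioas_lock);
	a->ioas = ioas;
	a->ioas_unpin = ioas;
	pthread_mutex_unlock(&a->ioas_lock);
}

/* Fails with -1 once detach has cleared a->ioas. */
static int access_pin(struct access *a)
{
	int ret = -1;

	pthread_mutex_lock(&a->ioas_lock);
	if (a->ioas) {
		a->ioas->pinned++;
		ret = 0;
	}
	pthread_mutex_unlock(&a->ioas_lock);
	return ret;
}

/* Keeps working during detach because it looks at ioas_unpin, not ioas. */
static void access_unpin(struct access *a)
{
	pthread_mutex_lock(&a->ioas_lock);
	if (a->ioas_unpin && a->ioas_unpin->pinned > 0)
		a->ioas_unpin->pinned--;
	pthread_mutex_unlock(&a->ioas_lock);
}

static void access_detach(struct access *a)
{
	pthread_mutex_lock(&a->ioas_lock);
	a->ioas = NULL;			/* block any further pins */
	if (a->unmap) {
		/* Drop the lock so the callback may call access_unpin(). */
		pthread_mutex_unlock(&a->ioas_lock);
		a->unmap(a);
		pthread_mutex_lock(&a->ioas_lock);
	}
	a->ioas_unpin = NULL;		/* unpin path is closed from here on */
	pthread_mutex_unlock(&a->ioas_lock);
}

/* Example callback: the user releases its single outstanding pin. */
static void unmap_one(struct access *a)
{
	access_unpin(a);
}
```

The two pointers exist so that "no new pins" and "outstanding pins may still be released" can both hold while the callback runs.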


Re: [Intel-gfx] [PATCH v2] drm/i915/gt: Hold a wakeref for the active VM

2023-04-05 Thread Andrzej Hajda

On 31.03.2023 16:16, Andrzej Hajda wrote:

From: Chris Wilson 

There may be a disconnect between the GT used by the engine and the GT
used for the VM, requiring us to hold a wakeref on both while the GPU is
active with this request.

v2: added explanation to __queue_and_release_pm

Signed-off-by: Chris Wilson 
[ahajda: removed not-yet-upstreamed wakeref tracking bits]
Signed-off-by: Andrzej Hajda 



Queued.

Regards
Andrzej



Re: [Intel-gfx] [PATCH 4/5] drm/i915/display: Add helper func to get intel_fbdev from drm_fb_helper

2023-04-05 Thread Andrzej Hajda




On 05.04.2023 16:13, Andrzej Hajda wrote:



On 04.04.2023 16:30, Nirmoy Das wrote:

Add a helper function to retrieve struct intel_fbdev from
struct drm_fb_helper.

Cc: Matthew Auld 
Cc: Andi Shyti 
Cc: Ville Syrjälä 
Cc: Jani Nikula 
Cc: Imre Deak 
Signed-off-by: Nirmoy Das 
Reviewed-by: Jani Nikula 
Reviewed-by: Andi Shyti 
Reviewed-by: Andrzej Hajda 


Reviewed-by: Andrzej Hajda 



Ups, please ignore :)



Regards
Andrzej


---
  drivers/gpu/drm/i915/display/intel_fbdev.c | 23 ++
  1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c

index f7d48d00ae4b..2ac9e9f8a128 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -67,6 +67,11 @@ struct intel_fbdev {
  struct mutex hpd_lock;
  };
  +static struct intel_fbdev *to_intel_fbdev(struct drm_fb_helper 
*fb_helper)

+{
+    return container_of(fb_helper, struct intel_fbdev, helper);
+}
+
  static struct intel_frontbuffer *to_frontbuffer(struct intel_fbdev 
*ifbdev)

  {
  return ifbdev->fb->frontbuffer;
@@ -79,9 +84,7 @@ static void intel_fbdev_invalidate(struct 
intel_fbdev *ifbdev)

    static int intel_fbdev_set_par(struct fb_info *info)
  {
-    struct drm_fb_helper *fb_helper = info->par;
-    struct intel_fbdev *ifbdev =
-    container_of(fb_helper, struct intel_fbdev, helper);
+    struct intel_fbdev *ifbdev = to_intel_fbdev(info->par);
  int ret;
    ret = drm_fb_helper_set_par(info);
@@ -93,9 +96,7 @@ static int intel_fbdev_set_par(struct fb_info *info)
    static int intel_fbdev_blank(int blank, struct fb_info *info)
  {
-    struct drm_fb_helper *fb_helper = info->par;
-    struct intel_fbdev *ifbdev =
-    container_of(fb_helper, struct intel_fbdev, helper);
+    struct intel_fbdev *ifbdev = to_intel_fbdev(info->par);
  int ret;
    ret = drm_fb_helper_blank(blank, info);
@@ -108,9 +109,7 @@ static int intel_fbdev_blank(int blank, struct 
fb_info *info)

  static int intel_fbdev_pan_display(struct fb_var_screeninfo *var,
 struct fb_info *info)
  {
-    struct drm_fb_helper *fb_helper = info->par;
-    struct intel_fbdev *ifbdev =
-    container_of(fb_helper, struct intel_fbdev, helper);
+    struct intel_fbdev *ifbdev = to_intel_fbdev(info->par);
  int ret;
    ret = drm_fb_helper_pan_display(var, info);
@@ -136,8 +135,7 @@ static const struct fb_ops intelfb_ops = {
  static int intelfb_alloc(struct drm_fb_helper *helper,
   struct drm_fb_helper_surface_size *sizes)
  {
-    struct intel_fbdev *ifbdev =
-    container_of(helper, struct intel_fbdev, helper);
+    struct intel_fbdev *ifbdev = to_intel_fbdev(helper);
  struct drm_framebuffer *fb;
  struct drm_device *dev = helper->dev;
  struct drm_i915_private *dev_priv = to_i915(dev);
@@ -194,8 +192,7 @@ static int intelfb_alloc(struct drm_fb_helper 
*helper,

  static int intelfb_create(struct drm_fb_helper *helper,
    struct drm_fb_helper_surface_size *sizes)
  {
-    struct intel_fbdev *ifbdev =
-    container_of(helper, struct intel_fbdev, helper);
+    struct intel_fbdev *ifbdev = to_intel_fbdev(helper);
  struct intel_framebuffer *intel_fb = ifbdev->fb;
  struct drm_device *dev = helper->dev;
  struct drm_i915_private *dev_priv = to_i915(dev);






Re: [Intel-gfx] [PATCH 4/5] drm/i915/display: Add helper func to get intel_fbdev from drm_fb_helper

2023-04-05 Thread Andrzej Hajda




On 04.04.2023 16:30, Nirmoy Das wrote:

Add a helper function to retrieve struct intel_fbdev from
struct drm_fb_helper.

Cc: Matthew Auld 
Cc: Andi Shyti 
Cc: Ville Syrjälä 
Cc: Jani Nikula 
Cc: Imre Deak 
Signed-off-by: Nirmoy Das 
Reviewed-by: Jani Nikula 
Reviewed-by: Andi Shyti 
Reviewed-by: Andrzej Hajda 


Reviewed-by: Andrzej Hajda 

Regards
Andrzej


---
  drivers/gpu/drm/i915/display/intel_fbdev.c | 23 ++
  1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c
index f7d48d00ae4b..2ac9e9f8a128 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -67,6 +67,11 @@ struct intel_fbdev {
struct mutex hpd_lock;
  };
  
+static struct intel_fbdev *to_intel_fbdev(struct drm_fb_helper *fb_helper)

+{
+   return container_of(fb_helper, struct intel_fbdev, helper);
+}
+
  static struct intel_frontbuffer *to_frontbuffer(struct intel_fbdev *ifbdev)
  {
return ifbdev->fb->frontbuffer;
@@ -79,9 +84,7 @@ static void intel_fbdev_invalidate(struct intel_fbdev *ifbdev)
  
  static int intel_fbdev_set_par(struct fb_info *info)

  {
-   struct drm_fb_helper *fb_helper = info->par;
-   struct intel_fbdev *ifbdev =
-   container_of(fb_helper, struct intel_fbdev, helper);
+   struct intel_fbdev *ifbdev = to_intel_fbdev(info->par);
int ret;
  
  	ret = drm_fb_helper_set_par(info);

@@ -93,9 +96,7 @@ static int intel_fbdev_set_par(struct fb_info *info)
  
  static int intel_fbdev_blank(int blank, struct fb_info *info)

  {
-   struct drm_fb_helper *fb_helper = info->par;
-   struct intel_fbdev *ifbdev =
-   container_of(fb_helper, struct intel_fbdev, helper);
+   struct intel_fbdev *ifbdev = to_intel_fbdev(info->par);
int ret;
  
  	ret = drm_fb_helper_blank(blank, info);

@@ -108,9 +109,7 @@ static int intel_fbdev_blank(int blank, struct fb_info 
*info)
  static int intel_fbdev_pan_display(struct fb_var_screeninfo *var,
   struct fb_info *info)
  {
-   struct drm_fb_helper *fb_helper = info->par;
-   struct intel_fbdev *ifbdev =
-   container_of(fb_helper, struct intel_fbdev, helper);
+   struct intel_fbdev *ifbdev = to_intel_fbdev(info->par);
int ret;
  
  	ret = drm_fb_helper_pan_display(var, info);

@@ -136,8 +135,7 @@ static const struct fb_ops intelfb_ops = {
  static int intelfb_alloc(struct drm_fb_helper *helper,
 struct drm_fb_helper_surface_size *sizes)
  {
-   struct intel_fbdev *ifbdev =
-   container_of(helper, struct intel_fbdev, helper);
+   struct intel_fbdev *ifbdev = to_intel_fbdev(helper);
struct drm_framebuffer *fb;
struct drm_device *dev = helper->dev;
struct drm_i915_private *dev_priv = to_i915(dev);
@@ -194,8 +192,7 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
  static int intelfb_create(struct drm_fb_helper *helper,
  struct drm_fb_helper_surface_size *sizes)
  {
-   struct intel_fbdev *ifbdev =
-   container_of(helper, struct intel_fbdev, helper);
+   struct intel_fbdev *ifbdev = to_intel_fbdev(helper);
struct intel_framebuffer *intel_fb = ifbdev->fb;
struct drm_device *dev = helper->dev;
struct drm_i915_private *dev_priv = to_i915(dev);
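
The helper added by this patch is the usual container_of() wrapper pattern. A standalone sketch of the same idea, with made-up struct names in place of the DRM ones and a simplified container_of() without the kernel's type checking, could look like this:

```c
#include <stddef.h>

/* Simplified userspace container_of(); the kernel version adds type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for drm_fb_helper and intel_fbdev. */
struct fb_helper {
	int id;
};

struct fbdev {
	int preferred_bpp;
	struct fb_helper helper;	/* embedded by value, not a pointer */
};

/* One typed accessor instead of repeating container_of() at each call site. */
static struct fbdev *to_fbdev(struct fb_helper *fb_helper)
{
	return container_of(fb_helper, struct fbdev, helper);
}
```

Because the accessor takes a typed pointer, a void * such as info->par converts implicitly at the call site, which is what lets the patch drop the intermediate fb_helper local.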




Re: [Intel-gfx] [PATCH 2/5] drm/i915/display: Set I915_BO_ALLOC_USER for fb

2023-04-05 Thread Andrzej Hajda




On 04.04.2023 16:30, Nirmoy Das wrote:

Framebuffer is exposed to userspace so make sure we set
proper flags for it. Set I915_BO_PREALLOC for prealloced
fb so that ttm won't clear existing data.

Cc: Matthew Auld 
Cc: Andi Shyti 
Cc: Andrzej Hajda 
Cc: Ville Syrjälä 
Cc: Jani Nikula 
Cc: Imre Deak 
Signed-off-by: Nirmoy Das 

Reviewed-by: Andrzej Hajda 

Regards
Andrzej

---
  drivers/gpu/drm/i915/display/intel_fbdev.c | 3 ++-
  drivers/gpu/drm/i915/display/intel_plane_initial.c | 4 +++-
  2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c
index 673bcdfb7ff6..f7d48d00ae4b 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -163,7 +163,8 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
obj = ERR_PTR(-ENODEV);
if (HAS_LMEM(dev_priv)) {
obj = i915_gem_object_create_lmem(dev_priv, size,
- I915_BO_ALLOC_CONTIGUOUS);
+ I915_BO_ALLOC_CONTIGUOUS |
+ I915_BO_ALLOC_USER);
} else {
/*
 * If the FB is too big, just don't use it since fbdev is not 
very
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c 
b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index bb6ea7de5c61..736072a8b2b0 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -110,7 +110,9 @@ initial_plane_vma(struct drm_i915_private *i915,
size * 2 > i915->dsm.usable_size)
return NULL;
  
-	obj = i915_gem_object_create_region_at(mem, phys_base, size, 0);

+   obj = i915_gem_object_create_region_at(mem, phys_base, size,
+  I915_BO_ALLOC_USER |
+  I915_BO_PREALLOC);
if (IS_ERR(obj))
return NULL;
  




Re: [Intel-gfx] [PATCH v9 16/25] iommufd/device: Add iommufd_access_detach() API

2023-04-05 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Wednesday, April 5, 2023 7:56 PM
> 
> On Tue, Apr 04, 2023 at 04:45:12PM -0600, Alex Williamson wrote:
> > On Sat,  1 Apr 2023 08:18:24 -0700
> > Yi Liu  wrote:
> >
> > > From: Nicolin Chen 
> > >
> > > Previously, the detach routine is only done by the destroy(). And it was
> > > called by vfio_iommufd_emulated_unbind() when the device runs close(), so
> > > all the mappings in iopt were cleaned in that setup, when the call trace
> > > reaches this detach() routine.
> > >
> > > Now, there's a need of a detach uAPI, meaning that it does not only need
> > > a new iommufd_access_detach() API, but also requires access->ops->unmap()
> > > call as a cleanup. So add one.
> > >
> > > However, leaving that unprotected can introduce some potential of a race
> > > condition during the pin_/unpin_pages() call, where access->ioas->iopt is
> > > getting referenced. So, add an ioas_lock to protect the context of iopt
> > > referencings.
> > >
> > > Also, to allow the iommufd_access_unpin_pages() callback to happen via
> > > this unmap() call, add an ioas_unpin pointer, so the unpin routine won't
> > > be affected by the "access->ioas = NULL" trick.
> > >
> > > Reviewed-by: Kevin Tian 
> > > Tested-by: Terrence Xu 
> > > Tested-by: Yanting Jiang 
> > > Signed-off-by: Nicolin Chen 
> > > Signed-off-by: Yi Liu 
> > > ---
> > >  drivers/iommu/iommufd/device.c  | 76 +++--
> > >  drivers/iommu/iommufd/iommufd_private.h |  2 +
> > >  include/linux/iommufd.h |  1 +
> > >  3 files changed, 74 insertions(+), 5 deletions(-)
> >
> > Does this need to go in via iommufd first?  There seems to be quite a
> > bit of churn in iommufd/device.c vs the vfio_mdev_ops branch (ie. it
> > doesn't apply). Thanks,
> 
> I think it is best to stay with this series, Yi has to rebase it

The rebased version is here. Shall I resend a version which is rebased on
top of vfio_mdev_ops?

https://github.com/yiliu1765/iommufd/commit/d3d8f65c82fe2ca2a7b1a635f4b40b2a0971daa9

Regards,
Yi Liu


Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Liu, Yi L
Hi Eric,

> From: Eric Auger 
> Sent: Wednesday, April 5, 2023 8:20 PM
> 
> Hi Yi,
> On 4/1/23 16:44, Yi Liu wrote:
> > for the users that accept device fds passed from management stacks to be
> > able to figure out the host reset affected devices among the devices
> > opened by the user. This is needed as such users do not have BDF (bus,
> > devfn) knowledge about the devices it has opened, hence unable to use
> > the information reported by existing VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > to figure out the affected devices.
> >
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 58 
> >  include/uapi/linux/vfio.h| 24 -
> >  2 files changed, 74 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> > b/drivers/vfio/pci/vfio_pci_core.c
> > index 19f5b075d70a..a5a7e148dce1 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -30,6 +30,7 @@
> >  #if IS_ENABLED(CONFIG_EEH)
> >  #include 
> >  #endif
> > +#include 
> >
> >  #include "vfio_pci_priv.h"
> >
> > @@ -767,6 +768,20 @@ static int vfio_pci_get_irq_count(struct
> vfio_pci_core_device *vdev, int irq_typ
> > return 0;
> >  }
> >
> > +static struct vfio_device *
> > +vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
> > +  struct pci_dev *pdev)
> > +{
> > +   struct vfio_device *cur;
> > +
> > +   lockdep_assert_held(&dev_set->lock);
> > +
> > +   list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > +   if (cur->dev == &pdev->dev)
> > +   return cur;
> > +   return NULL;
> > +}
> > +
> >  static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> >  {
> > (*(int *)data)++;
> > @@ -776,13 +791,20 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> >  struct vfio_pci_fill_info {
> > int max;
> > int cur;
> > +   bool require_devid;
> > +   struct iommufd_ctx *iommufd;
> > +   struct vfio_device_set *dev_set;
> > struct vfio_pci_dependent_device *devices;
> >  };
> >
> >  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> >  {
> > struct vfio_pci_fill_info *fill = data;
> > +   struct vfio_device_set *dev_set = fill->dev_set;
> > struct iommu_group *iommu_group;
> > +   struct vfio_device *vdev;
> > +
> > +   lockdep_assert_held(&dev_set->lock);
> >
> > if (fill->cur == fill->max)
> > return -EAGAIN; /* Something changed, try again */
> > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > if (!iommu_group)
> > return -EPERM; /* Cannot reset non-isolated devices */
> >
> > -   fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > +   if (fill->require_devid) {
> > +   /*
> > +* Report dev_id of the devices that are opened as cdev
> > +* and have the same iommufd with the fill->iommufd.
> > +* Otherwise, just fill IOMMUFD_INVALID_ID.
> > +*/
> > +   vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
> > +   if (vdev && vfio_device_cdev_opened(vdev) &&
> > +   fill->iommufd == vfio_iommufd_physical_ictx(vdev))
> > +   vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
> > +   else
> > +   fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
> > +   } else {
> > +   fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > +   }
> > fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> > fill->devices[fill->cur].bus = pdev->bus->number;
> > fill->devices[fill->cur].devfn = pdev->devfn;
> > @@ -1230,17 +1266,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> > return -ENOMEM;
> >
> > fill.devices = devices;
> > +   fill.dev_set = vdev->vdev.dev_set;
> >
> > +   mutex_lock(&vdev->vdev.dev_set->lock);
> > +   if (vfio_device_cdev_opened(&vdev->vdev)) {
> > +   fill.require_devid = true;
> > +   fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > +   }
> > ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> > &fill, slot);
> > +   mutex_unlock(&vdev->vdev.dev_set->lock);
> >
> > /*
> >  * If a device was removed between counting and filling, we may come up
> >  * short of fill.max.  If a device was added, we'll have a return of
> >  * -EAGAIN above.
> >  */
> > -   if (!ret)
> > +   if (!ret) {
> > hdr.count = fill.cur;
> > +   if (fill.require_devid)
> > +   hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
> > +   }
> >
> >  reset_info_exit:
> > if (copy_to_user(arg, &hdr, minsz))
> > @@ -2346,12 +2392,10 @@ static bool vfio_dev_in_files(struct
> vfio_pci_core_device *vdev,
> >  static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
> >  {
> > struct 

Re: [Intel-gfx] [PATCH v9 25/25] docs: vfio: Add vfio device cdev description

2023-04-05 Thread Liu, Yi L
Hi Eric,

> From: Eric Auger 
> Sent: Wednesday, April 5, 2023 9:46 PM
> 
> Hi Yi,
> 
> On 4/1/23 17:18, Yi Liu wrote:
> > This gives notes for userspace applications on device cdev usage.
> >
> > Reviewed-by: Kevin Tian 
> > Signed-off-by: Yi Liu 
> > ---
> >  Documentation/driver-api/vfio.rst | 132 ++
> >  1 file changed, 132 insertions(+)
> >
> > diff --git a/Documentation/driver-api/vfio.rst 
> > b/Documentation/driver-api/vfio.rst
> > index 363e12c90b87..4f21be7bda8a 100644
> > --- a/Documentation/driver-api/vfio.rst
> > +++ b/Documentation/driver-api/vfio.rst
> > @@ -239,6 +239,130 @@ group and can access them as follows::
> > /* Gratuitous device reset and go... */
> > ioctl(device, VFIO_DEVICE_RESET);
> >
> > +IOMMUFD and vfio_iommu_type1
> > +
> > +
> > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > +It intends to be the portal of delivering advanced userspace DMA
> > +features (nested translation [5], PASID [6], etc.) while also providing
> > +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> > +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> > +vfio container and group model is intended to be deprecated.
> > +
> > +The IOMMUFD backwards compatibility interface can be enabled two ways.
> > +In the first method, the kernel can be configured with
> > +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> > +transparently provides the entire infrastructure for the VFIO
> > +container and IOMMU backend interfaces.  The compatibility mode can
> > +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> > +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> > +compatibility mode is not entirely feature complete relative to
> > +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> > +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> > +it is not generally advisable at this time to switch from native VFIO
> > +implementations to the IOMMUFD compatibility interfaces.
> > +
> > +Long term, VFIO users should migrate to device access through the cdev
> > +interface described below, and native access through the IOMMUFD
> > +provided interfaces.
> > +
> > +VFIO Device cdev
> > +
> > +
> > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > +in a VFIO group.
> > +
> > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > +by directly opening a character device /dev/vfio/devices/vfioX where
> > +"X" is the number allocated uniquely by VFIO for registered devices.
> > +For noiommu devices, the character device would be named with "noiommu-"
> > +prefix. e.g. /dev/vfio/devices/noiommu-vfioX.
> > +
> > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > +must adapt to the new cdev security model which requires using
> > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > +actually use the device.  Once BIND succeeds then a VFIO device can
> > +be fully accessed by the user.
> > +
> > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > +Hence those modules can be fully compiled out in an environment
> > +where no legacy VFIO application exists.
> > +
> > +SPAPR does not support IOMMUFD yet, so it cannot support device
> > +cdev either.
> > +
> > +Device cdev Example
> > +---
> > +
> > +Assume user wants to access PCI device :6a:01.0::
> > +
> > +   $ ls /sys/bus/pci/devices/:6a:01.0/vfio-dev/
> > +   vfio0
> > +
> > +This device is therefore represented as vfio0.  The user can verify
> > +its existence::
> > +
> > +   $ ls -l /dev/vfio/devices/vfio0
> > +   crw--- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> > +   $ cat /sys/bus/pci/devices/:6a:01.0/vfio-dev/vfio0/dev
> you mentioned in the pci hot reset series that the BDF couldn't be used
> if cdev is being used. According to the above, it could, no?

That applies to the device fd passing case; otherwise the BDF can be used.
But from the kernel's point of view, there is no way to tell how the user
got the device fd, so it has to assume the user may not know the BDF.

> > +   511:0
> > +   $ ls -l /dev/char/511\:0
> > +   lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> > +
> > +Then provide the user with access to the device if unprivileged
> > +operation is desired::
> > +
> > +   $ chown user:user /dev/vfio/devices/vfio0
> > +
> > +Finally the user could get cdev fd by::
> > +
> > +   cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> > +
> > +An opened cdev_fd doesn't give the user any permission of accessing
> > +the device except binding the cdev_fd to an iommufd.  After that point
> > +then the device is fully accessible including attaching it to an
> > +IOMMUFD IOAS/HWPT to enable userspace DMA::
> > +
> > +   struct vfio_device_bind_iommufd 

Re: [Intel-gfx] [PATCH] i915/guc/slpc: Provide sysfs for efficient freq

2023-04-05 Thread Rodrigo Vivi
On Fri, Mar 31, 2023 at 08:11:29PM -0700, Dixit, Ashutosh wrote:
> On Fri, 31 Mar 2023 19:00:49 -0700, Vinay Belgaumkar wrote:
> >
> 
> Hi Vinay,
> 
> > @@ -478,20 +507,15 @@ int intel_guc_slpc_set_min_freq(struct intel_guc_slpc 
> > *slpc, u32 val)
> > val > slpc->max_freq_softlimit)
> > return -EINVAL;
> >
> > +   /* Ignore efficient freq if lower min freq is requested */
> > +   ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < slpc->rp1_freq);
> > +   if (ret)
> > +   goto out;
> > +
> 
> I don't agree with this. If we are now providing an interface explicitly to
> ignore RPe, that should be /only/ way to ignore RPe. There should be no
> other "under the hood" ignoring of RPe. In other words, ignoring RPe should
> be minimized unless explicitly requested.
> 
> I don't clearly understand why this was done previously but it makes even
> less sense to me now after this patch.

Well, I had suggested this previously, simply because without it we would
be breaking API expectations.

When a user selects a minimum frequency, they expect it to stick. But with
the efficient freq enabled in GuC, if the minimum is lower than the
efficient one, the request is likely ignored.

Well, even worse is that we are actually caching the request in the soft values.
So we show a minimal, but the hardware without any workload is operating at
efficient.

So, the thought process was: 'if user requested a very low minimal, we give them
the minimal requested, even if that means to disable the efficient freq.'

So, that was introduced to avoid API breakage. Removing it now would mean
breaking API. (Not sure if the IGT tests for the API got merged already,
but think that as the API contract).

But I do agree with you that having something selected from multiple places
also has the potential to cause mismatched expectations. So I was thinking
about different event orders, where the user selects RP0 as the minimum and
then enables the efficient freq, or vice versa, but I couldn't think of a
bad case. Or at least not one as bad as the user asking for RP0 as the
minimum and only getting RPe back.

With this in mind, and having checked the code:

Reviewed-by: Rodrigo Vivi 

But I won't push this immediately because I'm still open to hear another
side/angle.

> 
> Thanks.
> --
> Ashutosh
> 
> 
> > /* Need a lock now since waitboost can be modifying min as well */
> > mutex_lock(&slpc->lock);
> > wakeref = intel_runtime_pm_get(&i915->runtime_pm);
> >
> > -   /* Ignore efficient freq if lower min freq is requested */
> > -   ret = slpc_set_param(slpc,
> > -SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY,
> > -val < slpc->rp1_freq);
> > -   if (ret) {
> > -   guc_probe_error(slpc_to_guc(slpc), "Failed to toggle efficient 
> > freq: %pe\n",
> > -   ERR_PTR(ret));
> > -   goto out;
> > -   }
> > -
> > ret = slpc_set_param(slpc,
> >  SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
> >  val);


Re: [Intel-gfx] [PATCH 1/5] drm/i915/ttm: Add I915_BO_PREALLOC

2023-04-05 Thread Andi Shyti
> Add a mechanism to preserve existing data when creating a TTM
> object with the I915_BO_ALLOC_USER flag. This will be used in the subsequent
> patch where the I915_BO_ALLOC_USER flag will be applied to the framebuffer
> object. For a pre-allocated framebuffer without the I915_BO_PREALLOC flag,
> TTM would clear the content, which is not desirable.

ack!

Andi


Re: [Intel-gfx] [PATCH] drm/atomic-helper: Don't set deadline for modesets

2023-04-05 Thread Rob Clark
On Wed, Apr 5, 2023 at 6:31 AM Daniel Vetter  wrote:
>
> If the crtc is being switched on or off then the semantics of
> computing the timestamp of the next vblank are somewhat ill-defined.
> And indeed, the code splats with a warning in the timestamp
> computation code. Specifically it hits the check to make sure that
> atomic drivers have fully set up the timing constants in the drm_vblank
> structure, and that's just not the case before the crtc is actually
> on.
>
> For robustness it seems best to just not set deadlines for modesets.
>
> v2: Also skip on inactive crtc (Ville)
>
> Link: 
> https://lore.kernel.org/dri-devel/dfc21f18-7e1e-48f0-c05a-d659b9c90...@linaro.org/
> Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank")
> Cc: Ville Syrjälä 
> Cc: Rob Clark 
> Cc: Daniel Vetter 
> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc: Thomas Zimmermann 
> Reported-by: Dmitry Baryshkov 
> Tested-by: Dmitry Baryshkov  # test patch only
> Cc: Dmitry Baryshkov 
> Signed-off-by: Daniel Vetter 

Reviewed-by: Rob Clark 

> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> b/drivers/gpu/drm/drm_atomic_helper.c
> index f21b5a74176c..d44fb9b87ef8 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1528,6 +1528,12 @@ static void set_fence_deadline(struct drm_device *dev,
> for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
> ktime_t v;
>
> +   if (drm_atomic_crtc_needs_modeset(new_crtc_state))
> +   continue;
> +
> +   if (!new_crtc_state->active)
> +   continue;
> +
> if (drm_crtc_next_vblank_start(crtc, &v))
> continue;
>
> --
> 2.40.0
>


Re: [Intel-gfx] [PATCH v9 25/25] docs: vfio: Add vfio device cdev description

2023-04-05 Thread Eric Auger
Hi Yi,

On 4/1/23 17:18, Yi Liu wrote:
> This gives notes for userspace applications on device cdev usage.
>
> Reviewed-by: Kevin Tian 
> Signed-off-by: Yi Liu 
> ---
>  Documentation/driver-api/vfio.rst | 132 ++
>  1 file changed, 132 insertions(+)
>
> diff --git a/Documentation/driver-api/vfio.rst 
> b/Documentation/driver-api/vfio.rst
> index 363e12c90b87..4f21be7bda8a 100644
> --- a/Documentation/driver-api/vfio.rst
> +++ b/Documentation/driver-api/vfio.rst
> @@ -239,6 +239,130 @@ group and can access them as follows::
>   /* Gratuitous device reset and go... */
>   ioctl(device, VFIO_DEVICE_RESET);
>  
> +IOMMUFD and vfio_iommu_type1
> +
> +
> +IOMMUFD is the new user API to manage I/O page tables from userspace.
> +It intends to be the portal of delivering advanced userspace DMA
> +features (nested translation [5], PASID [6], etc.) while also providing
> +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> +vfio container and group model is intended to be deprecated.
> +
> +The IOMMUFD backwards compatibility interface can be enabled two ways.
> +In the first method, the kernel can be configured with
> +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> +transparently provides the entire infrastructure for the VFIO
> +container and IOMMU backend interfaces.  The compatibility mode can
> +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> +compatibility mode is not entirely feature complete relative to
> +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> +it is not generally advisable at this time to switch from native VFIO
> +implementations to the IOMMUFD compatibility interfaces.
> +
> +Long term, VFIO users should migrate to device access through the cdev
> +interface described below, and native access through the IOMMUFD
> +provided interfaces.
> +
> +VFIO Device cdev
> +
> +
> +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> +in a VFIO group.
> +
> +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> +by directly opening a character device /dev/vfio/devices/vfioX where
> +"X" is the number allocated uniquely by VFIO for registered devices.
> +For noiommu devices, the character device would be named with "noiommu-"
> +prefix. e.g. /dev/vfio/devices/noiommu-vfioX.
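
A small sketch of the naming rule in the paragraph above, in case it helps readers. `vfio_cdev_path()` is a hypothetical userspace helper, not part of any kernel or library API; it only formats the path and does not check that the device exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the character device path for VFIO device
 * index "idx", adding the "noiommu-" prefix for noiommu devices.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int vfio_cdev_path(char *buf, size_t len, unsigned int idx, bool noiommu)
{
	int n;

	assert(buf != NULL);

	n = snprintf(buf, len, "/dev/vfio/devices/%svfio%u",
		     noiommu ? "noiommu-" : "", idx);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

The resulting string is what would be handed to open() with O_RDWR, as in the example further down in the quoted documentation.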
> +
> +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> +must adapt to the new cdev security model which requires using
> +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> +actually use the device.  Once BIND succeeds then a VFIO device can
> +be fully accessed by the user.
> +
> +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> +Hence those modules can be fully compiled out in an environment
> +where no legacy VFIO application exists.
> +
> +SPAPR does not support IOMMUFD yet, so it cannot support device
> +cdev either.
> +
> +Device cdev Example
> +---
> +
> +Assume user wants to access PCI device :6a:01.0::
> +
> + $ ls /sys/bus/pci/devices/:6a:01.0/vfio-dev/
> + vfio0
> +
> +This device is therefore represented as vfio0.  The user can verify
> +its existence::
> +
> + $ ls -l /dev/vfio/devices/vfio0
> + crw--- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> + $ cat /sys/bus/pci/devices/:6a:01.0/vfio-dev/vfio0/dev
you mentioned in the pci hot reset series that the BDF couldn't be used
if cdev is being used. According to the above, it could, no?
> + 511:0
> + $ ls -l /dev/char/511\:0
> + lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> +
> +Then provide the user with access to the device if unprivileged
> +operation is desired::
> +
> + $ chown user:user /dev/vfio/devices/vfio0
> +
> +Finally the user could get cdev fd by::
> +
> + cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> +
> +An opened cdev_fd doesn't give the user any permission of accessing
> +the device except binding the cdev_fd to an iommufd.  After that point
> +then the device is fully accessible including attaching it to an
> +IOMMUFD IOAS/HWPT to enable userspace DMA::
> +
> + struct vfio_device_bind_iommufd bind = {
> + .argsz = sizeof(bind),
> + .flags = 0,
> + };
> + struct iommu_ioas_alloc alloc_data  = {
> + .size = sizeof(alloc_data),
> + .flags = 0,
> + };
> + struct vfio_device_attach_iommufd_pt attach_data = {
> + .argsz = sizeof(attach_data),
> + .flags = 0,
> + };
> + struct iommu_ioas_map map = {
> + .size = sizeof(map),
> + 

Re: [Intel-gfx] [PATCH 01/19] drm/i915/i915_scatterlist: Fix kerneldoc formatting issue - missing '@'

2023-04-05 Thread Lee Jones
On Tue, 04 Apr 2023, Jani Nikula wrote:

> On Mon, 03 Apr 2023, Lee Jones  wrote:
> > On Mon, 03 Apr 2023, Jani Nikula wrote:
> >
> >> On Fri, 31 Mar 2023, Lee Jones  wrote:
> >> > Fixes the following W=1 kernel build warning(s):
> >> >
> >> >  drivers/gpu/drm/i915/i915_scatterlist.c:62: warning: Function parameter 
> >> > or member 'size' not described in 'i915_refct_sgt_init'
> >> >
> >> > Cc: Jani Nikula 
> >> > Cc: Joonas Lahtinen 
> >> > Cc: Rodrigo Vivi 
> >> > Cc: Tvrtko Ursulin 
> >> > Cc: David Airlie 
> >> > Cc: Daniel Vetter 
> >> > Cc: intel-gfx@lists.freedesktop.org
> >> > Cc: dri-de...@lists.freedesktop.org
> >> > Signed-off-by: Lee Jones 
> >>
> >> Thanks for the patches!
> >>
> >> Applied all but one of the drm/i915 patches to drm-intel-next or
> >> drm-intel-gt-next depending on the area. There were a couple of issues
> >> that I fixed while applying. There was a conflict with patch 5/19
> >> against drm-intel-gt-next so I left that one out.
> >
> > Thanks Jani.  I'll rebase and see what's left.
>
> We also took notice and aim to track this more aggressively [1].

Thanks.

I did clean-up all of the GPU warnings already a couple of years ago,
but they seem to have crept back over time.  It would be great if we
could put some extra checks in place to prevent them in the future.

My aim, albeit ambitious, is to clean-up all of the W=1 warnings in the
kernel, then have them promoted to W=0, so they warn more loudly during
development, thus keeping them from reappearing.

--
Lee Jones [李琼斯]


Re: [Intel-gfx] [PATCH] drm/atomic-helper: Don't set deadline for modesets

2023-04-05 Thread Ville Syrjälä
On Wed, Apr 05, 2023 at 03:31:05PM +0200, Daniel Vetter wrote:
> If the crtc is being switched on or off then the semantics of
> computing the timestamp of the next vblank are somewhat ill-defined.
> And indeed, the code splats with a warning in the timestamp
> computation code. Specifically it hits the check to make sure that
> atomic drivers have fully set up the timing constants in the drm_vblank
> structure, and that's just not the case before the crtc is actually
> on.
> 
> For robustness it seems best to just not set deadlines for modesets.
> 
> v2: Also skip on inactive crtc (Ville)
> 
> Link: 
> https://lore.kernel.org/dri-devel/dfc21f18-7e1e-48f0-c05a-d659b9c90...@linaro.org/
> Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank")
> Cc: Ville Syrjälä 
> Cc: Rob Clark 
> Cc: Daniel Vetter 
> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc: Thomas Zimmermann 
> Reported-by: Dmitry Baryshkov 
> Tested-by: Dmitry Baryshkov  # test patch only
> Cc: Dmitry Baryshkov 
> Signed-off-by: Daniel Vetter 

Reviewed-by: Ville Syrjälä 

> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> b/drivers/gpu/drm/drm_atomic_helper.c
> index f21b5a74176c..d44fb9b87ef8 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1528,6 +1528,12 @@ static void set_fence_deadline(struct drm_device *dev,
>   for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
>   ktime_t v;
>  
> + if (drm_atomic_crtc_needs_modeset(new_crtc_state))
> + continue;
> +
> + if (!new_crtc_state->active)
> + continue;
> +
>   if (drm_crtc_next_vblank_start(crtc, &v))
>   continue;
>  
> -- 
> 2.40.0

-- 
Ville Syrjälä
Intel


[Intel-gfx] ✗ Fi.CI.IGT: failure for Update DSC Bigjoiner BW check (rev3)

2023-04-05 Thread Patchwork
== Series Details ==

Series: Update DSC Bigjoiner BW check (rev3)
URL   : https://patchwork.freedesktop.org/series/115773/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12967_full -> Patchwork_115773v3_full


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_115773v3_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_115773v3_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (7 -> 7)
--

  No changes in participating hosts

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_115773v3_full:

### IGT changes ###

 Possible regressions 

  * 
igt@kms_atomic_transition@plane-all-modeset-transition-fencing@pipe-a-hdmi-a-1:
- shard-glk:  NOTRUN -> [INCOMPLETE][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115773v3/shard-glk7/igt@kms_atomic_transition@plane-all-modeset-transition-fenc...@pipe-a-hdmi-a-1.html

  
Known issues


  Here are the changes found in Patchwork_115773v3_full that come from known 
issues:

### CI changes ###

 Issues hit 

  * boot:
- shard-apl:  ([PASS][2], [PASS][3], [PASS][4], [PASS][5], 
[PASS][6], [PASS][7], [PASS][8], [PASS][9], [PASS][10], [PASS][11], [PASS][12], 
[PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], 
[PASS][19], [PASS][20], [PASS][21], [PASS][22], [PASS][23], [PASS][24], 
[PASS][25], [PASS][26]) -> ([PASS][27], [PASS][28], [PASS][29], [PASS][30], 
[PASS][31], [PASS][32], [PASS][33], [PASS][34], [PASS][35], [PASS][36], 
[PASS][37], [PASS][38], [PASS][39], [PASS][40], [FAIL][41], [PASS][42], 
[PASS][43], [PASS][44], [PASS][45], [PASS][46], [PASS][47], [PASS][48], 
[PASS][49], [PASS][50], [PASS][51]) ([i915#4386] / [i915#8293])
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl7/boot.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl7/boot.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl7/boot.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl7/boot.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl7/boot.html
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl1/boot.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl1/boot.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl1/boot.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl2/boot.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl2/boot.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl2/boot.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl2/boot.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl3/boot.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl3/boot.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl3/boot.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl3/boot.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl4/boot.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl4/boot.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl4/boot.html
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl4/boot.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl6/boot.html
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl6/boot.html
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl6/boot.html
   [25]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl6/boot.html
   [26]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl6/boot.html
   [27]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115773v3/shard-apl2/boot.html
   [28]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115773v3/shard-apl2/boot.html
   [29]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115773v3/shard-apl2/boot.html
   [30]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115773v3/shard-apl1/boot.html
   [31]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115773v3/shard-apl1/boot.html
   [32]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115773v3/shard-apl1/boot.html
   [33]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115773v3/shard-apl1/boot.html
   [34]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_115773v3/shard-apl1/boot.html
   [35]: 

[Intel-gfx] [PATCH] drm/atomic-helper: Don't set deadline for modesets

2023-04-05 Thread Daniel Vetter
If the crtc is being switched on or off then the semantics of
computing the timestamp of the next vblank are somewhat ill-defined.
And indeed, the code splats with a warning in the timestamp
computation code. Specifically it hits the check to make sure that
atomic drivers have fully set up the timing constants in the drm_vblank
structure, and that's just not the case before the crtc is actually
on.

For robustness it seems best to just not set deadlines for modesets.

v2: Also skip on inactive crtc (Ville)

Link: 
https://lore.kernel.org/dri-devel/dfc21f18-7e1e-48f0-c05a-d659b9c90...@linaro.org/
Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank")
Cc: Ville Syrjälä 
Cc: Rob Clark 
Cc: Daniel Vetter 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Reported-by: Dmitry Baryshkov 
Tested-by: Dmitry Baryshkov  # test patch only
Cc: Dmitry Baryshkov 
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/drm_atomic_helper.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index f21b5a74176c..d44fb9b87ef8 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1528,6 +1528,12 @@ static void set_fence_deadline(struct drm_device *dev,
for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
ktime_t v;
 
+   if (drm_atomic_crtc_needs_modeset(new_crtc_state))
+   continue;
+
+   if (!new_crtc_state->active)
+   continue;
+
if (drm_crtc_next_vblank_start(crtc, &v))
continue;
 
-- 
2.40.0
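
For illustration, the filtering this patch adds can be modelled in plain userspace C. This is a sketch under stated assumptions, not the helper itself: `struct crtc_state_model` and `count_deadline_candidates()` are hypothetical, and the two booleans stand in for drm_atomic_crtc_needs_modeset() and new_crtc_state->active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two pieces of crtc state the patch checks. */
struct crtc_state_model {
	bool needs_modeset;	/* models drm_atomic_crtc_needs_modeset() */
	bool active;		/* models new_crtc_state->active */
};

/*
 * Count the crtcs for which a vblank-based fence deadline would still be
 * computed: anything undergoing a modeset or left inactive is skipped,
 * because its vblank timing constants cannot be relied on.
 */
static size_t count_deadline_candidates(const struct crtc_state_model *states,
					size_t n)
{
	size_t i, count = 0;

	assert(states != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (states[i].needs_modeset)
			continue;
		if (!states[i].active)
			continue;
		count++;
	}
	return count;
}
```

Only a crtc that is active and not being reconfigured survives both checks, which matches the v2 note about also skipping inactive crtcs.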



Re: [Intel-gfx] [PATCH] drm/atomic-helper: Don't set deadline for modesets

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 03:25:15PM +0300, Ville Syrjälä wrote:
> On Wed, Apr 05, 2023 at 10:16:50AM +0200, Daniel Vetter wrote:
> > If the crtc is being switched on or off then the semantics of
> > computing the timestamp of the next vblank are somewhat ill-defined.
> > And indeed, the code splats with a warning in the timestamp
> > computation code. Specifically it hits the check to make sure that
> > atomic drivers have fully set up the timing constants in the drm_vblank
> > structure, and that's just not the case before the crtc is actually
> > on.
> > 
> > For robustness it seems best to just not set deadlines for modesets.
> > 
> > Link: 
> > https://lore.kernel.org/dri-devel/dfc21f18-7e1e-48f0-c05a-d659b9c90...@linaro.org/
> > Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank")
> > Cc: Rob Clark 
> > Cc: Daniel Vetter 
> > Cc: Maarten Lankhorst 
> > Cc: Maxime Ripard 
> > Cc: Thomas Zimmermann 
> > Reported-by: Dmitry Baryshkov 
> > Tested-by: Dmitry Baryshkov  # test patch only
> > Cc: Dmitry Baryshkov 
> > Signed-off-by: Daniel Vetter 
> > ---
> >  drivers/gpu/drm/drm_atomic_helper.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> > b/drivers/gpu/drm/drm_atomic_helper.c
> > index f21b5a74176c..6640d80d84f3 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1528,6 +1528,9 @@ static void set_fence_deadline(struct drm_device *dev,
> > for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
> > ktime_t v;
> >  
> > +   if (drm_atomic_crtc_needs_modeset(new_crtc_state))
> > +   continue;
> 
> Should this stuff also be skipped when !new_crtc_state->active?
> I didn't actually check what drm_crtc_next_vblank_start() ends
> up doing in that case.

Uh yes, I'll spin v2.
-Daniel
> 
> > +
> > if (drm_crtc_next_vblank_start(crtc, &v))
> > continue;
> >  
> > -- 
> > 2.40.0
> 
> -- 
> Ville Syrjälä
> Intel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 1/3] drm/fb-helper: set x/yres_virtual in drm_fb_helper_check_var

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 12:21:11PM +0200, Javier Martinez Canillas wrote:
> Daniel Vetter  writes:
> 
> > Drivers are supposed to fix this up if needed if they don't outright
> > reject it. Uncovered by 6c11df58fd1a ("fbmem: Check virtual screen
> > sizes in fb_set_var()").
> >
> 
> Should this have a Fixes: tag? I understand what was uncovered by that commit,
> but it helps distros figure out if something has to be cherry-picked by
> them. So I believe it would be useful to have it.
> 
> The patch looks good to me.

The cc: stable should go far enough back for that. Or that was at least my
idea ... I can add the Fixes: back since I had it but dropped it
intentionally because it's not really a bug in the fbmem patch.
-Daniel

> Reviewed-by: Javier Martinez Canillas 
> 
> -- 
> Best regards,
> 
> Javier Martinez Canillas
> Core Platforms
> Red Hat
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 12:52:12PM +0200, Javier Martinez Canillas wrote:
> Daniel Vetter  writes:
> 
> > Apparently drivers need to check all this stuff themselves, which for
> > most things makes sense I guess. And for everything else we luck out,
> > because modern distros stopped supporting any other fbdev drivers than
> > drm ones and I really don't want to argue anymore about who needs to
> > check stuff. Therefore fixing all this just for drm fbdev emulation is
> > good enough.
> >
> 
> Agreed.
> 
> > Note that var->active is not set or validated. This is just control
> > flow for fbmem.c and needs to be validated in there as needed.
> >
> > Signed-off-by: Daniel Vetter 
> > Cc: Maarten Lankhorst 
> > Cc: Maxime Ripard 
> > Cc: Thomas Zimmermann 
> > ---
> 
> [...]
> 
> >  
> > +static void __fill_var(struct fb_var_screeninfo *var,
> > +  struct drm_framebuffer *fb)
> > +{
> > +   int i;
> > +
> > +   var->xres_virtual = fb->width;
> > +   var->yres_virtual = fb->height;
> > +   var->accel_flags = FB_ACCELF_TEXT;
> > +   var->bits_per_pixel = drm_format_info_bpp(fb->format, 0);
> > +
> > +   var->height = var->width = 0;
> > +   var->left_margin = var->right_margin = 0;
> > +   var->upper_margin = var->lower_margin = 0;
> > +   var->hsync_len = var->vsync_len = 0;
> > +   var->sync = var->vmode = 0;
> > +   var->rotate = 0;
> > +   var->colorspace = 0;
> > +   for (i = 0; i < 4; i++)
> > +   var->reserved[i] = 0;
> > +}
> > +
> >  /**
> >   * drm_fb_helper_check_var - implementation for &fb_ops.fb_check_var
> >   * @var: screeninfo to check
> > @@ -1595,8 +1616,22 @@ int drm_fb_helper_check_var(struct fb_var_screeninfo 
> > *var,
> > return -EINVAL;
> > }
> >  
> > -   var->xres_virtual = fb->width;
> > -   var->yres_virtual = fb->height;
> > +   __fill_var(var, fb);
> > +
> 
> [...]
> 
> There is the following here (in latest drm-misc/drm-misc-next at least):
> 
>   /*
>* Changes struct fb_var_screeninfo are currently not pushed back
>* to KMS, hence fail if different settings are requested.
>*/
>   bpp = drm_format_info_bpp(format, 0);
>   if (var->bits_per_pixel > bpp ||
>   var->xres > fb->width || var->yres > fb->height ||
>   var->xres_virtual > fb->width || var->yres_virtual > fb->height) {
>   drm_dbg_kms(dev, "fb requested width/height/bpp can't fit in 
> current fb "
> "request %dx%d-%d (virtual %dx%d) > %dx%d-%d\n",
> var->xres, var->yres, var->bits_per_pixel,
> var->xres_virtual, var->yres_virtual,
> fb->width, fb->height, bpp);
>   return -EINVAL;
>   }
> 
> but only the 'var->xres > fb->width || var->yres > fb->height' from the
> conditions checked could be false after your __fill_var() call above.
> 
> You should drop the 'var->bits_per_pixel > bpp', 'var->xres_virtual >
> fb->width' and 'var->yres_virtual > fb->height' checks I believe since
> those will always be true.

The __fill_var is after this. I'm honestly not sure what the exact
semantics are supposed to be, but essentially if userspace asks for too
big virtual size, we reject it. And for anything else we then tell it
(with __fill_var) how big the actually available space is.

What I'm wondering now is whether too small x/yres won't lead to problems
of some sorts ... For multi-screen we set the virtual size to be big
enough for all crtc, and then just set x/yres to be the smallest output.
That way fbcon knows to only draw as much as is visible on all screens.
But if you then pan that too much, the bigger screens might not have a big
enough buffer anymore and things fail (but shouldn't).

Not sure how to fix that tbh.
-Daniel

> 
> > +   /*
> > +* fb_pan_display() validates this, but fb_set_par() doesn't and just
> > +* falls over. Note that __fill_var above adjusts y/res_virtual.
> > +*/
> > +   if (var->yoffset > var->yres_virtual - var->yres ||
> > +   var->xoffset > var->xres_virtual - var->xres)
> > +   return -EINVAL;
> > +
> > +   /* We neither support grayscale nor FOURCC (also stored in here). */
> > +   if (var->grayscale > 0)
> > +   return -EINVAL;
> > +
> > +   if (var->nonstd)
> > +   return -EINVAL;
> >  
> > /*
> >  * Workaround for SDL 1.2, which is known to be setting all pixel format
> > @@ -1612,11 +1647,6 @@ int drm_fb_helper_check_var(struct fb_var_screeninfo 
> > *var,
> > drm_fb_helper_fill_pixel_fmt(var, format);
> > }
> >  
> 
> Other than what I mentioned, the patch makes sense to me.
> 
> Reviewed-by: Javier Martinez Canillas 
> 
> -- 
> Best regards,
> 
> Javier Martinez Canillas
> Core Platforms
> Red Hat
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
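
The ordering described in this reply (reject an oversized request first, then report the available virtual size, then validate the pan offsets) can be sketched as a small standalone model. It is a sketch under stated assumptions, not the kernel function: `var_model`, `fb_model` and `check_var_model()` are hypothetical names, and the bpp, grayscale and nonstd checks from the patch are left out.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of struct fb_var_screeninfo. */
struct var_model {
	uint32_t xres, yres;
	uint32_t xres_virtual, yres_virtual;
	uint32_t xoffset, yoffset;
};

/* Hypothetical subset of struct drm_framebuffer. */
struct fb_model {
	uint32_t width, height;
};

static int check_var_model(struct var_model *var, const struct fb_model *fb)
{
	/* Step 1: reject requests that cannot fit in the existing framebuffer. */
	if (var->xres > fb->width || var->yres > fb->height ||
	    var->xres_virtual > fb->width || var->yres_virtual > fb->height)
		return -EINVAL;

	/* Step 2: tell userspace how big the available space actually is. */
	var->xres_virtual = fb->width;
	var->yres_virtual = fb->height;

	/*
	 * Step 3: validate the pan offsets. The unsigned subtraction cannot
	 * wrap because step 1 guarantees xres <= width and yres <= height.
	 */
	if (var->xoffset > var->xres_virtual - var->xres ||
	    var->yoffset > var->yres_virtual - var->yres)
		return -EINVAL;

	return 0;
}
```

Doing the size rejection before the fill is what keeps the later offset arithmetic safe; swapping the first two steps would make the virtual-size comparison meaningless.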


Re: [Intel-gfx] [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 01:16:27PM +0200, Javier Martinez Canillas wrote:
> Thomas Zimmermann  writes:
> 
> [...]
> 
> >
> > Your comment says that it calls a PCI function to clean up to vgacon. 
> > That comment explains what is happening, not why. And how the PCI and 
> > vgacon code work together is non-obvious.

Would a better comment help then:

/*
 * gma500 is a strange hybrid device, which both acts as a pci
 * device (for legacy vga functionality) but also more like an
 * integrated display on a SoC where the framebuffer simply
 * resides in main memory and not in a special pci bar (that
 * internally redirects to a stolen range of main memory) like all
 * other integrated pci display devices have.
 *
 * To catch all cases we need to both remove conflicting fw
 * drivers for the pci device and main memory.
 */
> >
> > Again, here's my proposal for gma500:
> >
> > // call this from psb_pci_probe()
> > int gma_remove_conflicting_framebuffers(struct pci_dev *pdev, const
> > struct drm_driver *req_driver)
> > {
> > resource_size_t base = 0;
> > resource_size_t size = (resource_size_t)-1;
> > const char *name = req_driver->name;
> > int ret;
> >
> > /*
> >  * We cannot yet easily find the framebuffer's location in
> >  * memory. So remove all framebuffers here.
> >  *
> >  * TODO: Refactor psb_driver_load() to map vdc_reg earlier. Then
> >  *   we might be able to read the framebuffer range from the
> >  *   device.
> >  */
> > ret = aperture_remove_conflicting_devices(base, size, name);

Why can't this be a call to drm_aperture_remove_framebuffers? At least as
long as we don't implement the "read out actual fb base and size" code,
which also none of the other soc drivers bother with?

> > if (ret)
> > return ret;
> >
> > /*
> >  * WARNING: Apparently we must kick fbdev drivers before vgacon,
> >  * otherwise the vga fbdev driver falls over.
> >  */
> > ret = vga_remove_vgacon(pdev);

This isn't enough, we also nuke stuff that's mapping the vga fb range.
Which is really the reason I don't want to open code random stuff, pci is
self-describing, if it's decoding legacy vga it can figure this out and we
only have to implement the "how do I nuke legacy vga fw drivers from a pci
driver" once.

Not twice like this would result in, with the gma500 version being only
half the thing.

If it absolutely has to be a separate function for the gma500 pci legacy
vga (I still don't get why, it's just a pci vga device, there's absolutely
nothing special about that part at all) then I think it needs to be at
least a common "nuke a legacy vga device for me pls" function, which
shares the implementation with the pci one.

But not open-coding just half of it only.

> > if (ret)
> > return ret;
> >
> > return 0;
> > }
> >
> 
> If this is enough I agree that it is much easier code to understand.

It's still two calls and more code with more bugs? I'm not seeing the
point.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 3/8] drm/aperture: Remove primary argument

2023-04-05 Thread Martin Blumenstingl
On Tue, Apr 4, 2023 at 10:18 PM Daniel Vetter  wrote:
>
> Only really pci devices have a business setting this - it's for
> figuring out whether the legacy vga stuff should be nuked too. And
> with the preceeding two patches those are all using the pci version of
I think it's spelled "preceding"

[...]
>  drivers/gpu/drm/meson/meson_drv.c   |  2 +-
for the meson driver:
Acked-by: Martin Blumenstingl 


Thank you and best regards,
Martin


[Intel-gfx] [PATCH] i915: Correct description of default value for enable_psr2_sel_fetch

2023-04-05 Thread Qiyu Yan
The default value of i915.enable_psr2_sel_fetch is true, while the
description given in i915_params.c says the default is 0. Correct the
description to match.

Signed-off-by: Qiyu Yan 
---
 drivers/gpu/drm/i915/i915_params.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c 
b/drivers/gpu/drm/i915/i915_params.c
index ade744ccc..fa9ddcbe8 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -119,7 +119,7 @@ i915_param_named(psr_safest_params, bool, 0400,
 i915_param_named_unsafe(enable_psr2_sel_fetch, bool, 0400,
"Enable PSR2 selective fetch "
"(0=disabled, 1=enabled) "
-   "Default: 0");
+   "Default: 1");
 
 i915_param_named_unsafe(force_probe, charp, 0400,
"Force probe options for specified supported devices. "
-- 
2.40.0



[Intel-gfx] ✓ Fi.CI.IGT: success for fdinfo: Enable some support for GuC based client busyness

2023-04-05 Thread Patchwork
== Series Details ==

Series: fdinfo: Enable some support for GuC based client busyness
URL   : https://patchwork.freedesktop.org/series/116120/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12967_full -> Patchwork_116120v1_full


Summary
---

  **SUCCESS**

  No regressions found.

  

Participating hosts (7 -> 7)
--

  No changes in participating hosts

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_116120v1_full:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@drm_fdinfo@isolation@vecs0:
- {shard-dg1}:NOTRUN -> [SKIP][1] +26 similar issues
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-dg1-16/igt@drm_fdinfo@isolat...@vecs0.html

  * igt@drm_fdinfo@virtual-busy-idle-all:
- {shard-dg1}:[SKIP][2] ([i915#5563]) -> [SKIP][3] +2 similar issues
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-dg1-14/igt@drm_fdi...@virtual-busy-idle-all.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-dg1-14/igt@drm_fdi...@virtual-busy-idle-all.html

  
Known issues


  Here are the changes found in Patchwork_116120v1_full that come from known 
issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_fair@basic-none-solo@rcs0:
- shard-apl:  [PASS][4] -> [FAIL][5] ([i915#2842])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl2/igt@gem_exec_fair@basic-none-s...@rcs0.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-apl4/igt@gem_exec_fair@basic-none-s...@rcs0.html

  * igt@gem_huc_copy@huc-copy:
- shard-apl:  NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#2190])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-apl4/igt@gem_huc_c...@huc-copy.html
- shard-glk:  NOTRUN -> [SKIP][7] ([fdo#109271] / [i915#2190])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-glk7/igt@gem_huc_c...@huc-copy.html

  * igt@gen9_exec_parse@allowed-single:
- shard-apl:  [PASS][8] -> [ABORT][9] ([i915#5566])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-apl2/igt@gen9_exec_pa...@allowed-single.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-apl3/igt@gen9_exec_pa...@allowed-single.html

  * igt@i915_pm_rpm@system-suspend-devices:
- shard-snb:  NOTRUN -> [SKIP][10] ([fdo#109271]) +22 similar issues
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-snb5/igt@i915_pm_...@system-suspend-devices.html

  * igt@i915_selftest@mock@sanitycheck:
- shard-snb:  [PASS][11] -> [ABORT][12] ([i915#4528])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12967/shard-snb4/igt@i915_selftest@m...@sanitycheck.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-snb4/igt@i915_selftest@m...@sanitycheck.html

  * igt@kms_ccs@pipe-b-random-ccs-data-4_tiled_dg2_rc_ccs:
- shard-apl:  NOTRUN -> [SKIP][13] ([fdo#109271]) +12 similar issues
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-apl2/igt@kms_ccs@pipe-b-random-ccs-data-4_tiled_dg2_rc_ccs.html

  * igt@kms_cursor_crc@cursor-random-max-size:
- shard-glk:  NOTRUN -> [SKIP][14] ([fdo#109271]) +10 similar issues
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-glk9/igt@kms_cursor_...@cursor-random-max-size.html

  * igt@kms_plane_alpha_blend@alpha-basic@pipe-a-dp-1:
- shard-apl:  NOTRUN -> [FAIL][15] ([i915#7862]) +1 similar issue
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-apl4/igt@kms_plane_alpha_blend@alpha-ba...@pipe-a-dp-1.html

  * igt@kms_plane_alpha_blend@alpha-basic@pipe-c-hdmi-a-1:
- shard-glk:  NOTRUN -> [FAIL][16] ([i915#7862]) +1 similar issue
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-glk7/igt@kms_plane_alpha_blend@alpha-ba...@pipe-c-hdmi-a-1.html

  * igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-fully-sf:
- shard-apl:  NOTRUN -> [SKIP][17] ([fdo#109271] / [i915#658])
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-apl7/igt@kms_psr2...@cursor-plane-move-continuous-exceed-fully-sf.html
- shard-glk:  NOTRUN -> [SKIP][18] ([fdo#109271] / [i915#658])
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116120v1/shard-glk6/igt@kms_psr2...@cursor-plane-move-continuous-exceed-fully-sf.html

  * igt@perf@stress-open-close@0-rcs0:
- shard-glk:  [PASS][19] -> [ABORT][20] ([i915#5213])
   [19]: 

Re: [Intel-gfx] [PATCH RESEND v3 0/3] drm/ttm: Small fixes / cleanups in prep for shrinking

2023-04-05 Thread Thomas Hellström



On 4/5/23 14:32, Christian König wrote:
> On 04.04.23 at 22:06, Thomas Hellström wrote:
> > I collected the, from my POV, uncontroversial patches from V1 of the TTM
> > shrinker series, some corrected after the initial patch submission, one
> > patch added from the Xe RFC ("drm/ttm: Don't print error message if
> > eviction was interrupted"). It would be nice to have these reviewed and
> > merged while reworking the rest.
> >
> > v2:
> > - Simplify __ttm_pool_free().
> > - Fix the TTM_TT_FLAG bit numbers.
> > - Keep all allocation orders for TTM pages at or below PMD order
> >
> > v3:
> > - Rename __ttm_pool_free() to ttm_pool_free_range(). Document.
> > - Compile-fix.
>
> Reviewed-by: Christian König  for the series.

Thanks, Christian.

/Thomas

> > Thomas Hellström (3):
> >    drm/ttm/pool: Fix ttm_pool_alloc error path
> >    drm/ttm: Reduce the number of used allocation orders for TTM pages
> >    drm/ttm: Make the call to ttm_tt_populate() interruptible when
> >  faulting
> >
> >   drivers/gpu/drm/ttm/ttm_bo_vm.c |  13 +++-
> >   drivers/gpu/drm/ttm/ttm_pool.c  | 111 
> >   2 files changed, 80 insertions(+), 44 deletions(-)





Re: [Intel-gfx] [PATCH 1/5] drm/i915/ttm: Add I915_BO_PREALLOC

2023-04-05 Thread Das, Nirmoy

Hi Andi,

On 4/5/2023 1:53 PM, Andi Shyti wrote:
> Hi Nirmoy,
>
> > > > Add a mechanism to keep existing data when creating
> > > > a ttm object with I915_BO_ALLOC_USER flag.
> > > why do we need this mechanism? What was the logic behind? These
> > > are all questions people might have when checking this commit.
> > > Please be a bit more explicative.
> >
> > Agree, the commit message is bit short. I will add more content in next
> > revision.
>
> you don't need to send a new version just for this commit log.
>
> You could just propose a new commit log in the reply and if it's
> OK, add it before pushing it.

Let me know what do you think about:

Add a mechanism to preserve existing data when creating a TTM
object with the I915_BO_ALLOC_USER flag. This will be used in the subsequent
patch where the I915_BO_ALLOC_USER flag will be applied to the framebuffer
object. For a pre-allocated framebuffer without the I915_BO_PREALLOC flag,
TTM would clear the content, which is not desirable.

Thanks,

Nirmoy

> As you wish.
>
> Andi
>
> > > > Cc: Matthew Auld
> > > > Cc: Andi Shyti
> > > > Cc: Andrzej Hajda
> > > > Cc: Ville Syrjälä
> > > > Cc: Jani Nikula
> > > > Cc: Imre Deak
> > > > Signed-off-by: Nirmoy Das
> > > Reviewed-by: Andi Shyti
> >
> > Thanks,
> >
> > Nirmoy
> > >
> > > Thanks,
> > > Andi
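
As a rough illustration of the behaviour the proposed commit log describes, here is a minimal sketch, assuming that a user-visible object is cleared on creation unless the preallocated flag is set. The flag names and bit values below are hypothetical placeholders, not the real I915_BO_ALLOC_USER / I915_BO_PREALLOC definitions, and the function is not the driver's actual code path.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real i915 flag definitions may differ. */
#define MODEL_BO_ALLOC_USER (1u << 0)
#define MODEL_BO_PREALLOC   (1u << 1)

/*
 * Returns true when the object's content would be cleared on creation
 * because it is user-visible, and false when existing data (for example a
 * pre-allocated framebuffer) has to be preserved.
 */
static bool user_clear_required(unsigned int flags)
{
	if (flags & MODEL_BO_PREALLOC)
		return false;

	return (flags & MODEL_BO_ALLOC_USER) != 0;
}
```

The point of the extra flag is visible in the first branch: without it, applying the user flag to a framebuffer that already holds content would wipe that content.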

Re: [Intel-gfx] [PATCH RESEND v3 0/3] drm/ttm: Small fixes / cleanups in prep for shrinking

2023-04-05 Thread Christian König

On 04.04.23 at 22:06, Thomas Hellström wrote:
> I collected the, from my POV, uncontroversial patches from V1 of the TTM
> shrinker series, some corrected after the initial patch submission, one
> patch added from the Xe RFC ("drm/ttm: Don't print error message if
> eviction was interrupted"). It would be nice to have these reviewed and
> merged while reworking the rest.
>
> v2:
> - Simplify __ttm_pool_free().
> - Fix the TTM_TT_FLAG bit numbers.
> - Keep all allocation orders for TTM pages at or below PMD order
>
> v3:
> - Rename __ttm_pool_free() to ttm_pool_free_range(). Document.
> - Compile-fix.

Reviewed-by: Christian König  for the series.

> Thomas Hellström (3):
>    drm/ttm/pool: Fix ttm_pool_alloc error path
>    drm/ttm: Reduce the number of used allocation orders for TTM pages
>    drm/ttm: Make the call to ttm_tt_populate() interruptible when
>  faulting
>
>   drivers/gpu/drm/ttm/ttm_bo_vm.c |  13 +++-
>   drivers/gpu/drm/ttm/ttm_pool.c  | 111 
>   2 files changed, 80 insertions(+), 44 deletions(-)





Re: [Intel-gfx] [PATCH] drm/atomic-helper: Don't set deadline for modesets

2023-04-05 Thread Ville Syrjälä
On Wed, Apr 05, 2023 at 10:16:50AM +0200, Daniel Vetter wrote:
> If the crtc is being switched on or off then the semantics of
> computing the timestamp of the next vblank is somewhat ill-defined.
> And indeed, the code splats with a warning in the timestamp
> computation code. Specifically it hits the check to make sure that
> atomic drivers have fully set up the timing constants in the drm_vblank
> structure, and that's just not the case before the crtc is actually
> on.
> 
> For robustness it seems best to just not set deadlines for modesets.
> 
> Link: 
> https://lore.kernel.org/dri-devel/dfc21f18-7e1e-48f0-c05a-d659b9c90...@linaro.org/
> Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank")
> Cc: Rob Clark 
> Cc: Daniel Vetter 
> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc: Thomas Zimmermann 
> Reported-by: Dmitry Baryshkov 
> Tested-by: Dmitry Baryshkov  # test patch only
> Cc: Dmitry Baryshkov 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> b/drivers/gpu/drm/drm_atomic_helper.c
> index f21b5a74176c..6640d80d84f3 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1528,6 +1528,9 @@ static void set_fence_deadline(struct drm_device *dev,
>   for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
>   ktime_t v;
>  
> + if (drm_atomic_crtc_needs_modeset(new_crtc_state))
> + continue;

Should this stuff also be skipped when !new_crtc_state->active?
I didn't actually check what drm_crtc_next_vblank_start() ends
up doing in that case.

> +
>   if (drm_crtc_next_vblank_start(crtc, &v))
>   continue;
>  
> -- 
> 2.40.0

-- 
Ville Syrjälä
Intel


Re: [Intel-gfx] [PATCH v9 03/25] vfio: Remove vfio_file_is_group()

2023-04-05 Thread Eric Auger
Hi Yi,

On 4/1/23 17:18, Yi Liu wrote:
> since no user of vfio_file_is_group() now.
>
> Reviewed-by: Kevin Tian 
> Reviewed-by: Jason Gunthorpe 
> Tested-by: Terrence Xu 
> Tested-by: Nicolin Chen 
> Tested-by: Yanting Jiang 
> Signed-off-by: Yi Liu 

Reviewed-by: Eric Auger 

Eric
> ---
>  drivers/vfio/group.c | 10 --
>  include/linux/vfio.h |  1 -
>  2 files changed, 11 deletions(-)
>
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index ede4723c5f72..4f937ebaf6f7 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -792,16 +792,6 @@ struct iommu_group *vfio_file_iommu_group(struct file 
> *file)
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
>  
> -/**
> - * vfio_file_is_group - True if the file is a vfio group file
> - * @file: VFIO group file
> - */
> -bool vfio_file_is_group(struct file *file)
> -{
> - return vfio_group_from_file(file);
> -}
> -EXPORT_SYMBOL_GPL(vfio_file_is_group);
> -
>  bool vfio_group_enforced_coherent(struct vfio_group *group)
>  {
>   struct vfio_device *device;
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index d9a0770e5fc1..7519ae89fcd6 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -264,7 +264,6 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>   * External user API
>   */
>  struct iommu_group *vfio_file_iommu_group(struct file *file);
> -bool vfio_file_is_group(struct file *file);
>  bool vfio_file_is_valid(struct file *file);
>  bool vfio_file_enforced_coherent(struct file *file);
>  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);



Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-05 Thread Eric Auger


Hi Yi,
On 4/1/23 16:44, Yi Liu wrote:
> for the users that accept device fds passed from management stacks to be
> able to figure out the host reset affected devices among the devices
> opened by the user. This is needed as such users do not have BDF (bus,
> devfn) knowledge about the devices it has opened, hence unable to use
> the information reported by existing VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> to figure out the affected devices.
>
> Signed-off-by: Yi Liu 
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 58 
>  include/uapi/linux/vfio.h| 24 -
>  2 files changed, 74 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> b/drivers/vfio/pci/vfio_pci_core.c
> index 19f5b075d70a..a5a7e148dce1 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -30,6 +30,7 @@
>  #if IS_ENABLED(CONFIG_EEH)
>  #include <asm/eeh.h>
>  #endif
> +#include 
>  
>  #include "vfio_pci_priv.h"
>  
> @@ -767,6 +768,20 @@ static int vfio_pci_get_irq_count(struct 
> vfio_pci_core_device *vdev, int irq_typ
>   return 0;
>  }
>  
> +static struct vfio_device *
> +vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
> +struct pci_dev *pdev)
> +{
> + struct vfio_device *cur;
> +
> + lockdep_assert_held(&dev_set->lock);
> +
> + list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> + if (cur->dev == &pdev->dev)
> + return cur;
> + return NULL;
> +}
> +
>  static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
>  {
>   (*(int *)data)++;
> @@ -776,13 +791,20 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, 
> void *data)
>  struct vfio_pci_fill_info {
>   int max;
>   int cur;
> + bool require_devid;
> + struct iommufd_ctx *iommufd;
> + struct vfio_device_set *dev_set;
>   struct vfio_pci_dependent_device *devices;
>  };
>  
>  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  {
>   struct vfio_pci_fill_info *fill = data;
> + struct vfio_device_set *dev_set = fill->dev_set;
>   struct iommu_group *iommu_group;
> + struct vfio_device *vdev;
> +
> + lockdep_assert_held(&dev_set->lock);
>  
>   if (fill->cur == fill->max)
>   return -EAGAIN; /* Something changed, try again */
> @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void 
> *data)
>   if (!iommu_group)
>   return -EPERM; /* Cannot reset non-isolated devices */
>  
> - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> + if (fill->require_devid) {
> + /*
> +  * Report dev_id of the devices that are opened as cdev
> +  * and have the same iommufd with the fill->iommufd.
> +  * Otherwise, just fill IOMMUFD_INVALID_ID.
> +  */
> + vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
> + if (vdev && vfio_device_cdev_opened(vdev) &&
> + fill->iommufd == vfio_iommufd_physical_ictx(vdev))
> + vfio_iommufd_physical_devid(vdev, 
> &fill->devices[fill->cur].dev_id);
> + else
> + fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
> + } else {
> + fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> + }
>   fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
>   fill->devices[fill->cur].bus = pdev->bus->number;
>   fill->devices[fill->cur].devfn = pdev->devfn;
> @@ -1230,17 +1266,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>   return -ENOMEM;
>  
>   fill.devices = devices;
> + fill.dev_set = vdev->vdev.dev_set;
>  
> + mutex_lock(&vdev->vdev.dev_set->lock);
> + if (vfio_device_cdev_opened(&vdev->vdev)) {
> + fill.require_devid = true;
> + fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> + }
>   ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
>   &fill, slot);
> + mutex_unlock(&vdev->vdev.dev_set->lock);
>  
>   /*
>* If a device was removed between counting and filling, we may come up
>* short of fill.max.  If a device was added, we'll have a return of
>* -EAGAIN above.
>*/
> - if (!ret)
> + if (!ret) {
>   hdr.count = fill.cur;
> + if (fill.require_devid)
> + hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
> + }
>  
>  reset_info_exit:
>   if (copy_to_user(arg, &hdr, minsz))
> @@ -2346,12 +2392,10 @@ static bool vfio_dev_in_files(struct 
> vfio_pci_core_device *vdev,
>  static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
>  {
>   struct vfio_device_set *dev_set = data;
> - struct vfio_device *cur;
>  
> - list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> - if (cur->dev == &pdev->dev)
> -   
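
The reporting rule from the patch comment above (report a dev_id only for devices opened as cdev under the caller's iommufd, otherwise an invalid ID) can be sketched as a standalone model. This is an illustration under stated assumptions: `dev_model`, `reported_dev_id()` and `MODEL_INVALID_ID` are hypothetical, and the placeholder value is not claimed to match IOMMUFD_INVALID_ID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder; not claimed to equal IOMMUFD_INVALID_ID. */
#define MODEL_INVALID_ID 0u

/* Hypothetical stand-in for the per-device state the patch consults. */
struct dev_model {
	bool cdev_opened;     /* device was opened through the cdev path */
	const void *ictx;     /* iommufd context the device is bound to */
	uint32_t dev_id;      /* ID of the device within that context */
};

/*
 * Returns the dev_id to report for one affected device, or the invalid ID
 * when the device is not opened as cdev, is bound to a different iommufd
 * context, or is not in the caller's device set at all (dev == NULL).
 */
static uint32_t reported_dev_id(const struct dev_model *dev,
				const void *caller_ictx)
{
	if (dev && dev->cdev_opened && dev->ictx == caller_ictx)
		return dev->dev_id;

	return MODEL_INVALID_ID;
}
```

A caller that only holds device fds can then match the reported IDs against the ones it received when binding, without needing any BDF knowledge.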

Re: [Intel-gfx] [PATCH v9 16/25] iommufd/device: Add iommufd_access_detach() API

2023-04-05 Thread Jason Gunthorpe
On Tue, Apr 04, 2023 at 04:45:12PM -0600, Alex Williamson wrote:
> On Sat,  1 Apr 2023 08:18:24 -0700
> Yi Liu  wrote:
> 
> > From: Nicolin Chen 
> > 
> > Previously, the detach routine is only done by the destroy(). And it was
> > called by vfio_iommufd_emulated_unbind() when the device runs close(), so
> > all the mappings in iopt were cleaned in that setup, when the call trace
> > reaches this detach() routine.
> > 
> > Now, there's a need of a detach uAPI, meaning that it does not only need
> > a new iommufd_access_detach() API, but also requires access->ops->unmap()
> > call as a cleanup. So add one.
> > 
> > However, leaving that unprotected can introduce some potential of a race
> > condition during the pin_/unpin_pages() call, where access->ioas->iopt is
> > getting referenced. So, add an ioas_lock to protect the context of iopt
> > referencings.
> > 
> > Also, to allow the iommufd_access_unpin_pages() callback to happen via
> > this unmap() call, add an ioas_unpin pointer, so the unpin routine won't
> > be affected by the "access->ioas = NULL" trick.
> > 
> > Reviewed-by: Kevin Tian 
> > Tested-by: Terrence Xu 
> > Tested-by: Yanting Jiang 
> > Signed-off-by: Nicolin Chen 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/iommu/iommufd/device.c  | 76 +++--
> >  drivers/iommu/iommufd/iommufd_private.h |  2 +
> >  include/linux/iommufd.h |  1 +
> >  3 files changed, 74 insertions(+), 5 deletions(-)
> 
> Does this need to go in via iommufd first?  There seems to be quite a
> bit of churn in iommufd/device.c vs the vfio_mdev_ops branch (ie. it
> doesn't apply). Thanks,

I think it is best to stay with this series, Yi has to rebase it

Jason


Re: [Intel-gfx] [PATCH 1/5] drm/i915/ttm: Add I915_BO_PREALLOC

2023-04-05 Thread Andi Shyti
Hi Nirmoy,

> > > Add a mechanism to keep existing data when creating
> > > a ttm object with I915_BO_ALLOC_USER flag.
> > why do we need this mechanism? What was the logic behind? These
> > are all questions people might have when checking this commit.
> > Please be a bit more explicative.
> 
> 
> Agree, the commit message is bit short. I will add more content in next
> revision.

you don't need to send a new version just for this commit log.

You could just propose a new commit log in the reply and if it's
OK, add it before pushing it.

As you wish.

Andi

> > 
> > > Cc: Matthew Auld 
> > > Cc: Andi Shyti 
> > > Cc: Andrzej Hajda 
> > > Cc: Ville Syrjälä 
> > > Cc: Jani Nikula 
> > > Cc: Imre Deak 
> > > Signed-off-by: Nirmoy Das 
> > Reviewed-by: Andi Shyti 
> 
> 
> Thanks,
> 
> Nirmoy
> 
> > 
> > Thanks,
> > Andi


Re: [Intel-gfx] [PATCH v3 10/12] vfio: Mark cdev usage in vfio_device

2023-04-05 Thread Eric Auger



On 4/1/23 16:44, Yi Liu wrote:
> There are users that need to check if vfio_device is opened as cdev.
> e.g. vfio-pci. This adds a flag in vfio_device, it will be set in the
> cdev path when device is opened. This is not used at this moment, but
> a preparation for vfio device cdev support.

better to squash this patch with the patch setting cdev_opened then?

Thanks

Eric
>
> Signed-off-by: Yi Liu 
> ---
>  include/linux/vfio.h | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index f8fb9ab25188..d9a0770e5fc1 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -62,6 +62,7 @@ struct vfio_device {
>   struct iommufd_device *iommufd_device;
>   bool iommufd_attached;
>  #endif
> + bool cdev_opened;
>  };
>  
>  /**
> @@ -151,6 +152,12 @@ vfio_iommufd_physical_devid(struct vfio_device *vdev, 
> u32 *id)
>   ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
>  #endif
>  
> +static inline bool vfio_device_cdev_opened(struct vfio_device *device)
> +{
> + lockdep_assert_held(&device->dev_set->lock);
> + return device->cdev_opened;
> +}
> +
>  /**
>   * @migration_set_state: Optional callback to change the migration state for
>   * devices that support migration. It's mandatory for


