Re: [PATCH 7/8] video/aperture: Only remove sysfb on the default vga pci device

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

> Instead of calling aperture_remove_conflicting_devices() to remove the
> conflicting devices, just call to aperture_detach_devices() to detach
> the device that matches the same PCI BAR / aperture range. Since the
> former is just a wrapper of the latter plus a sysfb_disable() call,
> and now that's done in this function but only for the primary devices.
>
> This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable
> sysfb device registration when removing conflicting FBs"), where we
> remove the sysfb when loading a driver for an unrelated pci device,
> resulting in the user loosing their efifb console or similar.
>
> Note that in practice this only is a problem with the nvidia blob,
> because that's the only gpu driver people might install which does not
> come with an fbdev driver of it's own. For everyone else the real gpu
> driver will restore a working console.
>
> Also note that in the referenced bug there's confusion that this same
> bug also happens on amdgpu. But that was just another amdgpu specific
> regression, which just happened to happen at roughly the same time and
> with the same user-observable symptoms. That bug is fixed now, see
> https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
>
> Note that we should not have any such issues on non-pci multi-gpu
> issues, because I could only find two such cases:
> - SoC with some external panel over spi or similar. These panel
>   drivers do not use drm_aperture_remove_conflicting_framebuffers(),
>   so no problem.
> - vga+mga, which is a direct console driver and entirely bypasses all
>   this.
>
> For the above reasons the cc: stable is just notionally, this patch
> will need a backport and that's up to nvidia if they care enough.
>
> v2:
> - Explain a bit better why other multi-gpu that aren't pci shouldn't
>   have any issues with making all this fully pci specific.
>
> v3
> - polish commit message (Javier)
>
> Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing 
> conflicting FBs")
> Tested-by: Aaron Plattner 
> Reviewed-by: Javier Martinez Canillas 
> References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
> Signed-off-by: Daniel Vetter 
> Cc: Aaron Plattner 
> Cc: Javier Martinez Canillas 
> Cc: Thomas Zimmermann 
> Cc: Helge Deller 
> Cc: Sam Ravnborg 
> Cc: Alex Deucher 
> Cc:  # v5.19+ (if someone else does the backport)
> ---
>  drivers/video/aperture.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH 7/8] video/aperture: Only remove sysfb on the default vga pci device

2023-04-05 Thread Daniel Vetter
On Tue, Apr 04, 2023 at 01:59:33PM -0700, Aaron Plattner wrote:
> On 4/4/23 1:18 PM, Daniel Vetter wrote:
> > Instead of calling aperture_remove_conflicting_devices() to remove the
> > conflicting devices, just call to aperture_detach_devices() to detach
> > the device that matches the same PCI BAR / aperture range. Since the
> > former is just a wrapper of the latter plus a sysfb_disable() call,
> > and now that's done in this function but only for the primary devices.
> > 
> > This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable
> > sysfb device registration when removing conflicting FBs"), where we
> > remove the sysfb when loading a driver for an unrelated pci device,
> > resulting in the user loosing their efifb console or similar.
> > 
> > Note that in practice this only is a problem with the nvidia blob,
> > because that's the only gpu driver people might install which does not
> > come with an fbdev driver of it's own. For everyone else the real gpu
> > driver will restore a working console.
> 
> It might be worth noting that this also affects devices that have no driver
> installed, or where the driver failed to initialize or was configured not to
> set a mode. E.g. I reproduced this problem on a laptop with i915.modeset=0
> and an NVIDIA driver that calls drm_fbdev_generic_setup. It would also
> reproduce on a system that sets modeset=0 (or has a GPU that's too new for
> its corresponding kernel driver) and that passes an NVIDIA GPU through to a
> VM using vfio-pci since that also calls
> aperture_remove_conflicting_pci_devices.
> 
> I agree that in practice this will mostly affect people with our driver
> until I get my changes to add drm_fbdev_generic_setup checked in. But these
> other cases don't seem all that unlikely to me.

Thomas Z. refactored the entire modeset=0 handling to be more consistent
across drivers, so I think in practice it'll again only happen with the
nvidia blob driver (unless you patch in the call to
drm_firmware_drivers_only()). Or if you dont use nomodeset or similar and
instead use a driver-specific module option, which isn't what howtos in
distros recommend.

I can add this to the commit message if you want?
-Daniel

> 
> -- Aaron
> 
> > Also note that in the referenced bug there's confusion that this same
> > bug also happens on amdgpu. But that was just another amdgpu specific
> > regression, which just happened to happen at roughly the same time and
> > with the same user-observable symptoms. That bug is fixed now, see
> > https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
> > 
> > Note that we should not have any such issues on non-pci multi-gpu
> > issues, because I could only find two such cases:
> > - SoC with some external panel over spi or similar. These panel
> >drivers do not use drm_aperture_remove_conflicting_framebuffers(),
> >so no problem.
> > - vga+mga, which is a direct console driver and entirely bypasses all
> >this.
> > 
> > For the above reasons the cc: stable is just notionally, this patch
> > will need a backport and that's up to nvidia if they care enough.
> > 
> > v2:
> > - Explain a bit better why other multi-gpu that aren't pci shouldn't
> >have any issues with making all this fully pci specific.
> > 
> > v3
> > - polish commit message (Javier)
> > 
> > Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when 
> > removing conflicting FBs")
> > Tested-by: Aaron Plattner 
> > Reviewed-by: Javier Martinez Canillas 
> > References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
> > Signed-off-by: Daniel Vetter 
> > Cc: Aaron Plattner 
> > Cc: Javier Martinez Canillas 
> > Cc: Thomas Zimmermann 
> > Cc: Helge Deller 
> > Cc: Sam Ravnborg 
> > Cc: Alex Deucher 
> > Cc:  # v5.19+ (if someone else does the backport)
> > ---
> >   drivers/video/aperture.c | 7 ---
> >   1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> > index 8f1437339e49..2394c2d310f8 100644
> > --- a/drivers/video/aperture.c
> > +++ b/drivers/video/aperture.c
> > @@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct 
> > pci_dev *pdev, const char *na
> > primary = pdev == vga_default_device();
> > +   if (primary)
> > +   sysfb_disable();
> > +
> > for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
> > if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> > continue;
> > base = pci_resource_start(pdev, bar);
> > size = pci_resource_len(pdev, bar);
> > -   ret = aperture_remove_conflicting_devices(base, size, name);
> > -   if (ret)
> > -   return ret;
> > +   aperture_detach_devices(base, size);
> > }
> > if (primary) {

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 7/8] video/aperture: Only remove sysfb on the default vga pci device

2023-04-05 Thread Aaron Plattner

On 4/4/23 1:18 PM, Daniel Vetter wrote:

Instead of calling aperture_remove_conflicting_devices() to remove the
conflicting devices, just call to aperture_detach_devices() to detach
the device that matches the same PCI BAR / aperture range. Since the
former is just a wrapper of the latter plus a sysfb_disable() call,
and now that's done in this function but only for the primary devices.

This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable
sysfb device registration when removing conflicting FBs"), where we
remove the sysfb when loading a driver for an unrelated pci device,
resulting in the user loosing their efifb console or similar.

Note that in practice this only is a problem with the nvidia blob,
because that's the only gpu driver people might install which does not
come with an fbdev driver of it's own. For everyone else the real gpu
driver will restore a working console.


It might be worth noting that this also affects devices that have no 
driver installed, or where the driver failed to initialize or was 
configured not to set a mode. E.g. I reproduced this problem on a laptop 
with i915.modeset=0 and an NVIDIA driver that calls 
drm_fbdev_generic_setup. It would also reproduce on a system that sets 
modeset=0 (or has a GPU that's too new for its corresponding kernel 
driver) and that passes an NVIDIA GPU through to a VM using vfio-pci 
since that also calls aperture_remove_conflicting_pci_devices.


I agree that in practice this will mostly affect people with our driver 
until I get my changes to add drm_fbdev_generic_setup checked in. But 
these other cases don't seem all that unlikely to me.


-- Aaron


Also note that in the referenced bug there's confusion that this same
bug also happens on amdgpu. But that was just another amdgpu specific
regression, which just happened to happen at roughly the same time and
with the same user-observable symptoms. That bug is fixed now, see
https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15

Note that we should not have any such issues on non-pci multi-gpu
issues, because I could only find two such cases:
- SoC with some external panel over spi or similar. These panel
   drivers do not use drm_aperture_remove_conflicting_framebuffers(),
   so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all
   this.

For the above reasons the cc: stable is just notionally, this patch
will need a backport and that's up to nvidia if they care enough.

v2:
- Explain a bit better why other multi-gpu that aren't pci shouldn't
   have any issues with making all this fully pci specific.

v3
- polish commit message (Javier)

Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing 
conflicting FBs")
Tested-by: Aaron Plattner 
Reviewed-by: Javier Martinez Canillas 
References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
Signed-off-by: Daniel Vetter 
Cc: Aaron Plattner 
Cc: Javier Martinez Canillas 
Cc: Thomas Zimmermann 
Cc: Helge Deller 
Cc: Sam Ravnborg 
Cc: Alex Deucher 
Cc:  # v5.19+ (if someone else does the backport)
---
  drivers/video/aperture.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 8f1437339e49..2394c2d310f8 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct 
pci_dev *pdev, const char *na
  
  	primary = pdev == vga_default_device();
  
+	if (primary)

+   sysfb_disable();
+
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
continue;
  
  		base = pci_resource_start(pdev, bar);

size = pci_resource_len(pdev, bar);
-   ret = aperture_remove_conflicting_devices(base, size, name);
-   if (ret)
-   return ret;
+   aperture_detach_devices(base, size);
}
  
  	if (primary) {


[PATCH 7/8] video/aperture: Only remove sysfb on the default vga pci device

2023-04-04 Thread Daniel Vetter
Instead of calling aperture_remove_conflicting_devices() to remove the
conflicting devices, just call to aperture_detach_devices() to detach
the device that matches the same PCI BAR / aperture range. Since the
former is just a wrapper of the latter plus a sysfb_disable() call,
and now that's done in this function but only for the primary devices.

This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable
sysfb device registration when removing conflicting FBs"), where we
remove the sysfb when loading a driver for an unrelated pci device,
resulting in the user loosing their efifb console or similar.

Note that in practice this only is a problem with the nvidia blob,
because that's the only gpu driver people might install which does not
come with an fbdev driver of it's own. For everyone else the real gpu
driver will restore a working console.

Also note that in the referenced bug there's confusion that this same
bug also happens on amdgpu. But that was just another amdgpu specific
regression, which just happened to happen at roughly the same time and
with the same user-observable symptoms. That bug is fixed now, see
https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15

Note that we should not have any such issues on non-pci multi-gpu
issues, because I could only find two such cases:
- SoC with some external panel over spi or similar. These panel
  drivers do not use drm_aperture_remove_conflicting_framebuffers(),
  so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all
  this.

For the above reasons the cc: stable is just notionally, this patch
will need a backport and that's up to nvidia if they care enough.

v2:
- Explain a bit better why other multi-gpu that aren't pci shouldn't
  have any issues with making all this fully pci specific.

v3
- polish commit message (Javier)

Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing 
conflicting FBs")
Tested-by: Aaron Plattner 
Reviewed-by: Javier Martinez Canillas 
References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
Signed-off-by: Daniel Vetter 
Cc: Aaron Plattner 
Cc: Javier Martinez Canillas 
Cc: Thomas Zimmermann 
Cc: Helge Deller 
Cc: Sam Ravnborg 
Cc: Alex Deucher 
Cc:  # v5.19+ (if someone else does the backport)
---
 drivers/video/aperture.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 8f1437339e49..2394c2d310f8 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct 
pci_dev *pdev, const char *na
 
primary = pdev == vga_default_device();
 
+   if (primary)
+   sysfb_disable();
+
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
continue;
 
base = pci_resource_start(pdev, bar);
size = pci_resource_len(pdev, bar);
-   ret = aperture_remove_conflicting_devices(base, size, name);
-   if (ret)
-   return ret;
+   aperture_detach_devices(base, size);
}
 
if (primary) {
-- 
2.40.0