Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-11 Thread Gerd Hoffmann
On Fri, Dec 10, 2021 at 07:54:34PM -0500, Felix Kuehling wrote:
 
> Do you actually need to restore the exact boot-up mode? If you have the same
> framebuffer memory layout (width, height, bpp, stride) the precise display
> timing doesn't really matter. So we "just" need to switch to a mode that's
> compatible with the efifb framebuffer parameters and point the display
> engine at the efifb as the scan-out buffer.

That'll probably be doable for a normal kexec, but in case of a crashdump
kexec I don't think it is a good idea to touch the gpu using the driver
of the kernel which just crashed ...

take care,
  Gerd



Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Felix Kuehling

On 2021-12-10 10:13 a.m., Christian König wrote:
> On 10.12.21 at 15:25, Guilherme G. Piccoli wrote:
> > On 10/12/2021 11:16, Alex Deucher wrote:
> > > [...]
> > > Why not just reload the driver after kexec?
> > >
> > > Alex
> >
> > Because the original issue is the kdump case, and we want a very very
> > tiny kernel - also, the crash originally could have been caused by
> > amdgpu itself, so if it's a GPU issue, we don't want to mess with that
> > in kdump. And I confess I tried modprobe amdgpu after a kdump, no
> > success - kdump won't call shutdown handlers, so GPU will be in a
> > "rogue" state...
> >
> > My question was about regular kexec because it's much simpler usually,
> > we can do whatever we want there. My line of thought was: if I make it
> > work in regular kexec with a simple framebuffer, I might be able to get
> > it working on kdump heheh
>
> How about issuing a PCIe reset and re-initializing the ASIC with just
> the VBIOS?
>
> That should be pretty straightforward I think.


Do you actually need to restore the exact boot-up mode? If you have the 
same framebuffer memory layout (width, height, bpp, stride) the precise 
display timing doesn't really matter. So we "just" need to switch to a 
mode that's compatible with the efifb framebuffer parameters and point 
the display engine at the efifb as the scan-out buffer.
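
A minimal sketch of the compatibility test described above, assuming a linear framebuffer described only by width, height, bpp and stride. The `FbLayout` type and the helper names are hypothetical and not part of amdgpu or efifb; display timings are deliberately left out of the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FbLayout:
    """Memory layout of a linear framebuffer (hypothetical helper type)."""
    width: int    # visible pixels per line
    height: int   # visible lines
    bpp: int      # bits per pixel
    stride: int   # bytes from the start of one line to the next


def layout_is_valid(fb: FbLayout) -> bool:
    """All fields must be positive and one line must fit inside the stride."""
    if min(fb.width, fb.height, fb.bpp, fb.stride) <= 0:
        return False
    bytes_per_line = (fb.width * fb.bpp + 7) // 8
    return bytes_per_line <= fb.stride


def can_scan_out(boot_fb: FbLayout, new_mode: FbLayout) -> bool:
    """True if the new mode could scan out the old buffer unchanged.

    Only the memory layout is compared; refresh rate and blanking
    intervals are ignored on purpose.
    """
    return layout_is_valid(boot_fb) and boot_fb == new_mode


def pixel_offset(fb: FbLayout, x: int, y: int) -> int:
    """Byte offset of pixel (x, y); assumes bpp is a multiple of 8."""
    return y * fb.stride + x * (fb.bpp // 8)
```

A mode with the same resolution but a different stride fails the check, because every line after the first would be read from the wrong offset.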


Regards,
  Felix

> Christian.


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Alex Deucher
On Fri, Dec 10, 2021 at 10:24 AM Guilherme G. Piccoli wrote:
>
> On 10/12/2021 12:13, Christian König wrote:
> > [...]
> > How about issuing a PCIe reset and re-initializing the ASIC with just
> > the VBIOS?
> >
> > That should be pretty straightforward I think.
> >
> > Christian.
>
>
> Thanks Christian, that'd be perfect! Is it feasible? Per Alex's comment,
> we'd need to run atombios commands to reprogram the timings, display
> info, etc...like a small driver would do, a full init.
>

You need the equivalent of a GOP driver or a full GPU driver.  I think
it would be less effort to just fix up any problems amdgpu has when
trying to load after the crash than to write a new mini driver.  By
the time you add everything you'd need, you'd be pretty close to a
full GPU driver.

> Also, what kind of PCIe reset is recommended for this adapter? Like a
> hot reset, powering-off/re-power, FLR or that MODE2 reset present in
> amdgpu code? Remembering this is an APU device.

You'd need to issue the relevant device specific reset sequence.  It
would be a mode2 reset on vangogh, but varies on other asics.  It
would probably be easiest to just fix up the logic in amdgpu to detect
bad GPU state on driver load and do a GPU reset before driver init.
We already have the logic in place for some dGPUs, but APUs only
recently got full GPU reset support due to architectural limitations
and hardware bugs.

Alex


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Alex Deucher
On Fri, Dec 10, 2021 at 9:25 AM Guilherme G. Piccoli wrote:
>
> On 10/12/2021 11:16, Alex Deucher wrote:
> > [...]
> > Why not just reload the driver after kexec?
> >
> > Alex
>
> Because the original issue is the kdump case, and we want a very very
> tiny kernel - also, the crash originally could have been caused by
> amdgpu itself, so if it's a GPU issue, we don't want to mess with that
> in kdump. And I confess I tried modprobe amdgpu after a kdump, no
> success - kdump won't call shutdown handlers, so GPU will be in a
> "rogue" state...
>
> My question was about regular kexec because it's much simpler usually,
> we can do whatever we want there. My line of thought was: if I make it
> work in regular kexec with a simple framebuffer, I might be able to get
> it working on kdump heheh
>

Well if the GPU is hung, I'm not sure if you'll be able to get back
the display environment without a GPU reset and once you do that,
you've lost any state you might have been trying to preserve.

Alex


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Christian König

On 10.12.21 at 16:24, Guilherme G. Piccoli wrote:
> On 10/12/2021 12:13, Christian König wrote:
> > [...]
> > How about issuing a PCIe reset and re-initializing the ASIC with just
> > the VBIOS?
> >
> > That should be pretty straightforward I think.
> >
> > Christian.
>
> Thanks Christian, that'd be perfect! Is it feasible? Per Alex's comment,
> we'd need to run atombios commands to reprogram the timings, display
> info, etc...like a small driver would do, a full init.
>
> Also, what kind of PCIe reset is recommended for this adapter? Like a
> hot reset, powering-off/re-power, FLR or that MODE2 reset present in
> amdgpu code? Remembering this is an APU device.

Well, Alex is the expert on that.

APU makes the whole thing pretty tricky since the VBIOS is part of the
system BIOS there and I'm not sure you can re-initialize only the GPU
without a complete reset.

On dGPUs just making sure the ROM is mapped and calling the VESA modeset
BIOS functions might already do the trick.

Christian.

> Thanks a lot!





Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Guilherme G. Piccoli
On 10/12/2021 12:13, Christian König wrote:
> [...]
> How about issuing a PCIe reset and re-initializing the ASIC with just 
> the VBIOS?
> 
> That should be pretty straightforward I think.
> 
> Christian.


Thanks Christian, that'd be perfect! Is it feasible? Per Alex's comment,
we'd need to run atombios commands to reprogram the timings, display
info, etc...like a small driver would do, a full init.

Also, what kind of PCIe reset is recommended for this adapter? Like a
hot reset, powering-off/re-power, FLR or that MODE2 reset present in
amdgpu code? Remembering this is an APU device.

Thanks a lot!



Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Christian König




On 10.12.21 at 15:25, Guilherme G. Piccoli wrote:
> On 10/12/2021 11:16, Alex Deucher wrote:
> > [...]
> > Why not just reload the driver after kexec?
> >
> > Alex
>
> Because the original issue is the kdump case, and we want a very very
> tiny kernel - also, the crash originally could have been caused by
> amdgpu itself, so if it's a GPU issue, we don't want to mess with that
> in kdump. And I confess I tried modprobe amdgpu after a kdump, no
> success - kdump won't call shutdown handlers, so GPU will be in a
> "rogue" state...
>
> My question was about regular kexec because it's much simpler usually,
> we can do whatever we want there. My line of thought was: if I make it
> work in regular kexec with a simple framebuffer, I might be able to get
> it working on kdump heheh

How about issuing a PCIe reset and re-initializing the ASIC with just
the VBIOS?

That should be pretty straightforward I think.

Christian.


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Guilherme G. Piccoli
On 10/12/2021 11:16, Alex Deucher wrote:
> [...]
> Why not just reload the driver after kexec?
> 
> Alex

Because the original issue is the kdump case, and we want a very very
tiny kernel - also, the crash originally could have been caused by
amdgpu itself, so if it's a GPU issue, we don't want to mess with that
in kdump. And I confess I tried modprobe amdgpu after a kdump, no
success - kdump won't call shutdown handlers, so GPU will be in a
"rogue" state...

My question was about regular kexec because it's much simpler usually,
we can do whatever we want there. My line of thought was: if I make it
work in regular kexec with a simple framebuffer, I might be able to get
it working on kdump heheh




Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Alex Deucher
On Fri, Dec 10, 2021 at 9:09 AM Guilherme G. Piccoli wrote:
>
> Thanks a lot Alex / Gerd and Thomas, very informative stuff! I'm glad
> there are projects to collect/save the data and reuse after a kdump,
> this is very useful.
>
> I'll continue my study on the atombios thing of AMD and QXL, maybe at
> least we can make it work in qemu, that'd be great (like a small
> initdriver to reprogram the paravirtual device on kexec boot).

Why not just reload the driver after kexec?

Alex


>
> Cheers,
>
>
> Guilherme


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Guilherme G. Piccoli
Thanks a lot Alex / Gerd and Thomas, very informative stuff! I'm glad
there are projects to collect/save the data and reuse after a kdump,
this is very useful.

I'll continue my study on the atombios thing of AMD and QXL, maybe at
least we can make it work in qemu, that'd be great (like a small
initdriver to reprogram the paravirtual device on kexec boot).

Cheers,


Guilherme


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Thomas Zimmermann

Hi

On 09.12.21 at 19:17, Guilherme G. Piccoli wrote:
> Thanks again Alex! Some comments inlined below:
>
> On 09/12/2021 15:06, Alex Deucher wrote:
> > Not really in a generic way.  It's asic and platform specific.  In
> > addition most modern displays require link training to bring up the
> > display, so you can't just save and restore registers.
>
> Oh sure, I understand that. My question is more like: is there a way,
> inside amdgpu driver, to save this state before taking
> over/overwriting/reprogramming the device? So we could (again, from
> inside the amdgpu driver) dump this pre-saved state in the shutdown
> handler, for example, having the device in a "pre-OS" state when the new
> kexec'ed kernel starts.

We have been talking about reading out and storing state of active
devices within DRM. So far nothing usable has emerged. In a distant
future, kexec might be able to store information about the active
framebuffer and the new kernel's simpledrm (or some other driver) could
use it as output.

But don't hold your breath for it. It won't happen anytime soon.

Best regards
Thomas

> > The drivers are asic and platform specific.  E.g., the driver for
> > vangogh is different from renoir is different from skylake, etc.  The
> > display programming interfaces are asic specific.
>
> Cool, that makes sense! But if you (or anybody here) know some of these
> GOP drivers, e.g. for the qemu/qxl device, I'm just curious to
> see/understand how complex is the FW driver to just put the
> device/screen in a usable state.
>
> Cheers,
>
>
> Guilherme



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Managing Director: Ivo Totev




Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Gerd Hoffmann
  Hi,

> > The drivers are asic and platform specific.  E.g., the driver for
> > vangogh is different from renoir is different from skylake, etc.  The
> > display programming interfaces are asic specific.
> 
> Cool, that makes sense! But if you (or anybody here) know some of these
> GOP drivers, e.g. for the qemu/qxl device,

OvmfPkg/QemuVideoDxe in tianocore source tree.

> I'm just curious to see/understand how complex is the FW driver to
> just put the device/screen in a usable state.

Note that qemu has a paravirtual interface for vesa vga mode programming
where you basically program a handful of registers with xres, yres,
depth etc. (after resetting the device to put it into vga compatibility
mode) and you are done.
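
A minimal sketch of what "a handful of registers" can look like, assuming the Bochs VBE "dispi" interface that QEMU's standard VGA device documents. The register indices and flag values below are quoted from memory of that interface and should be verified against the QEMU sources before use; `mode_set_sequence` is a hypothetical helper that only computes the (index, value) writes and never touches hardware.

```python
# Bochs VBE "dispi" register indices (assumed; verify against QEMU's docs).
VBE_DISPI_INDEX_XRES = 0x1
VBE_DISPI_INDEX_YRES = 0x2
VBE_DISPI_INDEX_BPP = 0x3
VBE_DISPI_INDEX_ENABLE = 0x4
VBE_DISPI_INDEX_VIRT_WIDTH = 0x6
VBE_DISPI_INDEX_VIRT_HEIGHT = 0x7
VBE_DISPI_INDEX_X_OFFSET = 0x8
VBE_DISPI_INDEX_Y_OFFSET = 0x9

# Flag values for the ENABLE register (assumed as above).
VBE_DISPI_DISABLED = 0x00
VBE_DISPI_ENABLED = 0x01
VBE_DISPI_LFB_ENABLED = 0x40


def mode_set_sequence(xres: int, yres: int, bpp: int) -> list[tuple[int, int]]:
    """Return the ordered (register index, value) writes for one mode set.

    The mode is disabled first, geometry and depth are programmed, and
    the linear framebuffer is enabled last.
    """
    return [
        (VBE_DISPI_INDEX_ENABLE, VBE_DISPI_DISABLED),
        (VBE_DISPI_INDEX_XRES, xres),
        (VBE_DISPI_INDEX_YRES, yres),
        (VBE_DISPI_INDEX_BPP, bpp),
        (VBE_DISPI_INDEX_VIRT_WIDTH, xres),
        (VBE_DISPI_INDEX_VIRT_HEIGHT, yres),
        (VBE_DISPI_INDEX_X_OFFSET, 0),
        (VBE_DISPI_INDEX_Y_OFFSET, 0),
        (VBE_DISPI_INDEX_ENABLE, VBE_DISPI_ENABLED | VBE_DISPI_LFB_ENABLED),
    ]
```

In a real driver each pair would become one write to the index register followed by one write to the data register; that part is omitted here.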

Initializing physical hardware is an order of magnitude harder than
that.

With qxl you could also go figure the current state of the hardware and
fill screen_info with that to get a working boot framebuffer in the
kexec'ed kernel.

Problem with this approach is this works only in case the framebuffer
happens to be in a format usable by vesafb/efifb.  So no modifiers
(tiling etc.) and continuous in physical address space.  That is true
for qxl.  With virtio-gpu it wouldn't work though (framebuffer can be
scattered), and I expect with most modern physical hardware it wouldn't
work either.
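
The two conditions above (no modifiers, contiguous in physical address space) can be sketched as a small check. This is an illustration only: the page list, the 4 KiB page size and the helper names are assumptions, and `DRM_FORMAT_MOD_LINEAR` is taken to be 0 as in the DRM fourcc definitions.

```python
PAGE_SIZE = 4096            # assumed page size for the illustration
DRM_FORMAT_MOD_LINEAR = 0   # "no tiling / no modifier" in DRM fourcc terms


def is_physically_contiguous(page_addrs: list[int],
                             page_size: int = PAGE_SIZE) -> bool:
    """True if each backing page directly follows the previous one."""
    return all(b - a == page_size
               for a, b in zip(page_addrs, page_addrs[1:]))


def usable_as_boot_fb(page_addrs: list[int], modifier: int) -> bool:
    """True if vesafb/efifb could, in principle, scan this buffer.

    The buffer must exist, use a linear layout and occupy one
    contiguous physical range.
    """
    return (bool(page_addrs)
            and modifier == DRM_FORMAT_MOD_LINEAR
            and is_physically_contiguous(page_addrs))
```

A scattered buffer, as described for virtio-gpu, fails the contiguity test even when its layout is linear.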

take care,
  Gerd



Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Guilherme G. Piccoli
Hi all, I have a question about the possibility of reusing a framebuffer
after a regular (or panic) kexec - my case is with amdgpu (APU, aka, not
a separate GPU hardware), but I guess the question is kinda generic
hence I've looped in most of the lists / people I think make sense
(apologies for duplicates).


The context is: we have hardware with an amdgpu-controlled device
(Vangogh model) and as soon as the machine boots, efifb is providing
graphics - I understand the UEFI/GRUB outputs rely on the EFI framebuffer
as well. As soon as the amdgpu module is available, the kernel loads it and
it takes over the GPU, providing graphics. The kexec_file_load syscall
allows passing a valid screen_info structure, so by kexec'ing a new kernel,
we have efifb taking over again at boot time, but this time I see nothing
on the screen. I've manually blacklisted amdgpu in this new kexec'ed
kernel, as I'd like to rely on the simple framebuffer - the goal is to have
a tiny kernel kexec'ed. I'm using kernel version 5.16.0-rc4.

I've done some other experiments, for example: I've forced screen_info
model to match VLFB, so vesafb took over after the kexec, with the same
result. Also noticed that BusMaster bit was off after kexec, in the AMD
APU PCIe device, so I've set it on efifb before probe, and finally
tested the same things in qemu, with qxl, all with the same result
(blank screen).
The most interesting result I got (both with amdgpu and qemu/qxl) is
that if I blacklist these drivers and let the machine continue using
efifb since the beginning, after kexec the efifb is still able to
produce graphics.
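
For readers unfamiliar with screen_info, here is a minimal sketch of the fields involved in the experiment above. It models only a small subset of the kernel's `struct screen_info`; the `VIDEO_TYPE_*` values are the ones from the kernel headers as far as this sketch assumes, while the `ScreenInfo` class, the helper names and the example base address are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

VIDEO_TYPE_VLFB = 0x23  # VESA linear framebuffer, claimed by vesafb
VIDEO_TYPE_EFI = 0x70   # EFI graphics mode, claimed by efifb


@dataclass(frozen=True)
class ScreenInfo:
    """Subset of screen_info relevant to a boot framebuffer."""
    orig_video_isVGA: int  # video type, one of VIDEO_TYPE_*
    lfb_width: int         # pixels per line
    lfb_height: int        # lines
    lfb_depth: int         # bits per pixel
    lfb_linelength: int    # stride in bytes
    lfb_base: int          # physical address of the framebuffer


def claiming_driver(si: ScreenInfo) -> Optional[str]:
    """Name of the simple fbdev driver that would bind for this type."""
    return {VIDEO_TYPE_EFI: "efifb",
            VIDEO_TYPE_VLFB: "vesafb"}.get(si.orig_video_isVGA)


def force_vlfb(si: ScreenInfo) -> ScreenInfo:
    """Copy of si with the type switched to VLFB, geometry untouched."""
    return replace(si, orig_video_isVGA=VIDEO_TYPE_VLFB)


def scanout_bytes(si: ScreenInfo) -> int:
    """Bytes the display engine reads per frame from lfb_base."""
    return si.lfb_linelength * si.lfb_height
```

Switching the type only changes which driver binds; neither driver reprograms the display hardware, which would be consistent with both variants giving the same blank screen.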

Which then led me to think that likely there's something fundamentally
"blocking" the reuse of the simple framebuffer after kexec, like maybe
DRM stack is destroying the old framebuffer somehow? What kind of
preparation is required at firmware level to make the simple EFI VGA
framebuffer work, and could we perform this in a kexec (or "save it"
before the amdgpu/qxl drivers take over and reuse later)?

Any advice is greatly appreciated!
Thanks in advance,


Guilherme


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Guilherme G. Piccoli
On 09/12/2021 14:31, Alex Deucher wrote:
> [...] 
> Once the driver takes over, none of the pre-driver state is retained.
> You'll need to load the driver in the new kernel to initialize the
> displays.  Note the efifb doesn't actually have the ability to program
> any hardware, it just takes over the memory region that was used for
> the pre-OS framebuffer and whatever display timing was set up by the
> GOP driver prior to the OS loading.  Once that OS driver has loaded
> the area is gone and the display configuration may have changed.
> 

Hi Christian and Alex, thanks for the clarifications!

Is there any way to save/retain this state before amdgpu takes over?
Would simpledrm be able to program the device again, to a working state?

Finally, do you have any example of such a GOP driver (open source) so I
can take a look? I tried to find something like that in Tianocore
project, but didn't find anything that seemed useful for my issue.

Thanks again!


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Guilherme G. Piccoli
Thanks again Alex! Some comments inlined below:

On 09/12/2021 15:06, Alex Deucher wrote:
> Not really in a generic way.  It's asic and platform specific.  In
> addition most modern displays require link training to bring up the
> display, so you can't just save and restore registers.

Oh sure, I understand that. My question is more like: is there a way,
inside amdgpu driver, to save this state before taking
over/overwriting/reprogramming the device? So we could (again, from
inside the amdgpu driver) dump this pre-saved state in the shutdown
handler, for example, having the device in a "pre-OS" state when the new
kexec'ed kernel starts.

> 
> The drivers are asic and platform specific.  E.g., the driver for
> vangogh is different from renoir is different from skylake, etc.  The
> display programming interfaces are asic specific.

Cool, that makes sense! But if you (or anybody here) know some of these
GOP drivers, e.g. for the qemu/qxl device, I'm just curious to
see/understand how complex is the FW driver to just put the
device/screen in a usable state.

Cheers,


Guilherme


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Alex Deucher
On Thu, Dec 9, 2021 at 1:18 PM Guilherme G. Piccoli  wrote:
>
> Thanks again Alex! Some comments inlined below:
>
> On 09/12/2021 15:06, Alex Deucher wrote:
> > Not really in a generic way.  It's asic and platform specific.  In
> > addition most modern displays require link training to bring up the
> > display, so you can't just save and restore registers.
>
> Oh sure, I understand that. My question is more like: is there a way,
> inside amdgpu driver, to save this state before taking
> over/overwriting/reprogramming the device? So we could (again, from
> inside the amdgpu driver) dump this pre-saved state in the shutdown
> handler, for example, having the device in a "pre-OS" state when the new
> kexec'ed kernel starts.

Sure, it could be done, it's just a fair amount of work.  Things like
legacy vga text mode is a bit more of a challenge, but that tends to
be less relevant as non-legacy UEFI becomes more pervasive.

>
> >
> > The drivers are asic and platform specific.  E.g., the driver for
> > vangogh is different from renoir is different from skylake, etc.  The
> > display programming interfaces are asic specific.
>
> Cool, that makes sense! But if you (or anybody here) know some of these
> GOP drivers, e.g. for the qemu/qxl device, I'm just curious to
> see/understand how complex is the FW driver to just put the
> device/screen in a usable state.

Most of the asic init and display setup on AMD GPUs is handled via
atombios command tables (basically little scripts stored in the
vbios) which are shared by the driver and the GOP driver for most
programming sequences.  In our case, the GOP driver is pretty simple.
Take a look at the pre-DC display code in amdgpu to see what a basic
display driver would look like (e.g., dce_v11_0.c).  The GOP driver
would call the atombios asic_init table to make sure the chip itself
is initialized (e.g., memory controller, etc.), then walk the display
data tables in the vbios to determine the display configuration
specific to this board, then probe the displays and use the atombios
display command tables to light them up.

Alex


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Alex Deucher
On Thu, Dec 9, 2021 at 1:00 PM Guilherme G. Piccoli  wrote:
>
> On 09/12/2021 14:31, Alex Deucher wrote:
> > [...]
> > Once the driver takes over, none of the pre-driver state is retained.
> > You'll need to load the driver in the new kernel to initialize the
> > displays.  Note the efifb doesn't actually have the ability to program
> > any hardware, it just takes over the memory region that was used for
> > the pre-OS framebuffer and whatever display timing was set up by the
> > GOP driver prior to the OS loading.  Once that OS driver has loaded
> > the area is gone and the display configuration may have changed.
> >
>
> Hi Christian and Alex, thanks for the clarifications!
>
> Is there any way to save/retain this state before amdgpu takes over?

Not really in a generic way.  It's asic and platform specific.  In
addition most modern displays require link training to bring up the
display, so you can't just save and restore registers.

> Would simpledrm be able to program the device again, to a working state?

No.  You need an asic specific driver that knows how to program the
specific hardware.  It's also platform specific in that you need to
determine platform specific details such as the number and type of
display connectors and encoders that are present on the system.

>
> Finally, do you have any example of such a GOP driver (open source) so I
> can take a look? I tried to find something like that in Tianocore
> project, but didn't find anything that seemed useful for my issue.

The drivers are asic and platform specific.  E.g., the driver for
vangogh is different from renoir is different from skylake, etc.  The
display programming interfaces are asic specific.

Alex


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Alex Deucher
On Thu, Dec 9, 2021 at 12:04 PM Guilherme G. Piccoli wrote:
>
> Hi all, I have a question about the possibility of reusing a framebuffer
> after a regular (or panic) kexec - my case is with amdgpu (APU, aka, not
> a separate GPU hardware), but I guess the question is kinda generic
> hence I've looped in most of the lists / people I think make sense
> (apologies for duplicates).
>
>
> The context is: we have hardware with an amdgpu-controlled device
> (Vangogh model) and as soon as the machine boots, efifb is providing
> graphics - I understand the UEFI/GRUB outputs rely on the EFI framebuffer
> as well. As soon as the amdgpu module is available, the kernel loads it and
> it takes over the GPU, providing graphics. The kexec_file_load syscall
> allows passing a valid screen_info structure, so by kexec'ing a new kernel,
> we have efifb taking over again at boot time, but this time I see nothing
> on the screen. I've manually blacklisted amdgpu in this new kexec'ed
> kernel, as I'd like to rely on the simple framebuffer - the goal is to have
> a tiny kernel kexec'ed. I'm using kernel version 5.16.0-rc4.
>
> I've done some other experiments, for example: I've forced screen_info
> model to match VLFB, so vesafb took over after the kexec, with the same
> result. Also noticed that BusMaster bit was off after kexec, in the AMD
> APU PCIe device, so I've set it on efifb before probe, and finally
> tested the same things in qemu, with qxl, all with the same result
> (blank screen).
> The most interesting result I got (both with amdgpu and qemu/qxl) is
> that if I blacklist these drivers and let the machine continue using
> efifb since the beginning, after kexec the efifb is still able to
> produce graphics.
>
> Which then led me to think that likely there's something fundamentally
> "blocking" the reuse of the simple framebuffer after kexec, like maybe
> DRM stack is destroying the old framebuffer somehow? What kind of
> preparation is required at firmware level to make the simple EFI VGA
> framebuffer work, and could we perform this in a kexec (or "save it"
> before the amdgpu/qxl drivers take over and reuse later)?
>

Once the driver takes over, none of the pre-driver state is retained.
You'll need to load the driver in the new kernel to initialize the
displays.  Note the efifb doesn't actually have the ability to program
any hardware, it just takes over the memory region that was used for
the pre-OS framebuffer and whatever display timing was set up by the
GOP driver prior to the OS loading.  Once that OS driver has loaded
the area is gone and the display configuration may have changed.

Alex


> Any advice is greatly appreciated!
> Thanks in advance,
>
>
> Guilherme


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Christian König

Hi Guilherme,

On 09.12.21 at 17:00, Guilherme G. Piccoli wrote:
> Hi all, I have a question about the possibility of reusing a framebuffer
> after a regular (or panic) kexec - my case is with amdgpu (APU, aka, not
> a separate GPU hardware), but I guess the question is kinda generic
> hence I've looped in most of the lists / people I think make sense
> (apologies for duplicates).
>
>
> The context is: we have hardware with an amdgpu-controlled device
> (Vangogh model) and as soon as the machine boots, efifb is providing
> graphics - I understand the UEFI/GRUB outputs rely on the EFI framebuffer
> as well. As soon as the amdgpu module is available, the kernel loads it and
> it takes over the GPU, providing graphics. The kexec_file_load syscall
> allows passing a valid screen_info structure, so by kexec'ing a new kernel,
> we have efifb taking over again at boot time, but this time I see nothing
> on the screen. I've manually blacklisted amdgpu in this new kexec'ed
> kernel, as I'd like to rely on the simple framebuffer - the goal is to have
> a tiny kernel kexec'ed. I'm using kernel version 5.16.0-rc4.
>
> I've done some other experiments, for example: I've forced screen_info
> model to match VLFB, so vesafb took over after the kexec, with the same
> result. Also noticed that BusMaster bit was off after kexec, in the AMD
> APU PCIe device, so I've set it on efifb before probe, and finally
> tested the same things in qemu, with qxl, all with the same result
> (blank screen).
> The most interesting result I got (both with amdgpu and qemu/qxl) is
> that if I blacklist these drivers and let the machine continue using
> efifb since the beginning, after kexec the efifb is still able to
> produce graphics.
>
> Which then led me to think that likely there's something fundamentally
> "blocking" the reuse of the simple framebuffer after kexec, like maybe
> DRM stack is destroying the old framebuffer somehow? What kind of
> preparation is required at firmware level to make the simple EFI VGA
> framebuffer work, and could we perform this in a kexec (or "save it"
> before the amdgpu/qxl drivers take over and reuse later)?

Unfortunately, what you are trying here will most likely not work easily.

During bootup the ASIC is initialized in a VGA compatibility mode by the
VBIOS which also allows efifb to display something. And among the first
things amdgpu does is to disable this compatibility mode :)

What you need to do to get this working again is to issue a PCIe reset
of the GPU and then re-init the ASIC with the VBIOS tables.

Alex should know more details about how to do this.

Regards,
Christian.

> Any advice is greatly appreciated!
> Thanks in advance,
>
>
> Guilherme