Re: DRM-based Oops viewer

2019-03-10 Thread Martin Steigerwald
Hello Ahmed,

Ahmed S. Darwish - 10.03.19, 02:31:
> Hello DRM/UEFI maintainers,
> 
> Several years ago, I wrote a set of patches to dump the kernel
> log to disk upon panic -- through BIOS INT 0x13 services. [1]
> 
> The overwhelming response was that it's unsafe to do this in a
> generic manner. Linus proposed a video-based viewer instead: [2]
[…]
> Of course it's 2019 now though, and it's well known that
> Intel is officially obsoleting the PC/AT BIOS by 2020.. [3]
[…]
> The maximum possible that UEFI can provide is a GOP-provided
> framebuffer that's ready to use by the OS -- even after the UEFI
> boot phase is marked as done through ExitBootServices(). [5]
> 
> Of course, once native drivers like i915 or radeon take over,
> such a framebuffer is toast... [6]
> 
> Thus a possible remaining option is to display the oops through
> "minimal" DRM drivers provided for each HW variant... Since
> these special drivers will run only and fully under a panic()
> context though, several constraints exist:

Thank you for your idea and willingness to work on something like this.

As a user I'd very much favor a solution that works not only with
UEFI but also with other firmware. I still dream of being able to buy
a laptop with up-to-date hardware and Coreboot/Libreboot some day.

While this would not solve every "I just freeze" kind of crash, it may
at least give some information about some of them. When testing rc
kernels I have often enough faced "I just freeze" crashes that happened
only *sometimes*. On a machine that I also use for production work I
find it infeasible to debug them, as bisecting could take a long, long
time. And well, the machine could crash at any moment… even while I am
doing important work on it.

In my ideal world an operating system would never ever crash or hang
without telling why. Well, it would not crash or hang at all… but there
you go. Maybe some day, with a widely usable microkernel-based OS that
can restart device drivers in a broken state – at least almost. No
discussion of the microkernel topic required here. :)

Thanks,
-- 
Martin




DRM-based Oops viewer

2019-03-09 Thread Ahmed S. Darwish
Hello DRM/UEFI maintainers,

Several years ago, I wrote a set of patches to dump the kernel
log to disk upon panic -- through BIOS INT 0x13 services. [1]

The overwhelming response was that it's unsafe to do this in a
generic manner. Linus proposed a video-based viewer instead: [2]

    If you want to do the BIOS services thing, do it for video: copy the
    oops to low RAM, return to real mode, re-run the graphics card POST
    routines to initialize text-mode, and use the BIOS to print out the
    oops.  That is WAY less scary than writing to disk.

Of course it's 2019 now though, and it's well known that
Intel is officially obsoleting the PC/AT BIOS by 2020.. [3]

While researching whether this can be done from UEFI, it also became
clear that UEFI "Runtime Services" do not provide any re-initialization
routines. [4]

The maximum possible that UEFI can provide is a GOP-provided
framebuffer that's ready to use by the OS -- even after the UEFI
boot phase is marked as done through ExitBootServices(). [5]
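
As a rough illustration of what drawing into such a framebuffer
involves, here is a minimal userspace C sketch. The struct and function
names (oops_fb, oops_fb_put_pixel, oops_fb_clear) are hypothetical, and
it assumes a linear framebuffer with 32-bit pixels whose scan lines may
be longer than the visible width. A real viewer would have to check the
pixel format reported by the firmware before writing anything.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical description of a linear framebuffer, loosely modelled on
 * the mode information a GOP implementation reports. This is not a real
 * kernel or UEFI type.
 */
struct oops_fb {
	uint32_t *base;   /* start of the pixel memory */
	uint32_t width;   /* visible pixels per row */
	uint32_t height;  /* visible rows */
	uint32_t stride;  /* pixels per scan line; may exceed width */
};

/* Write one 32-bit pixel; out-of-range coordinates are ignored. */
static void oops_fb_put_pixel(const struct oops_fb *fb,
			      uint32_t x, uint32_t y, uint32_t color)
{
	if (x >= fb->width || y >= fb->height)
		return;
	fb->base[(size_t)y * fb->stride + x] = color;
}

/* Fill the visible area with one colour, e.g. before printing the log. */
static void oops_fb_clear(const struct oops_fb *fb, uint32_t color)
{
	for (uint32_t y = 0; y < fb->height; y++)
		for (uint32_t x = 0; x < fb->width; x++)
			oops_fb_put_pixel(fb, x, y, color);
}
```

Note that indexing uses the stride, not the width, so any padding at
the end of a scan line is left untouched.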

Of course, once native drivers like i915 or radeon take over,
such a framebuffer is toast... [6]

Thus a possible remaining option is to display the oops through
"minimal" DRM drivers provided for each HW variant... Since
these special drivers will run only and fully under a panic()
context though, several constraints exist:

  - The code should be fully synchronous (irqs are disabled)
  - It should not allocate any dynamic memory
  - It should make minimal assumptions about HW state
  - It should not chain into any other kernel subsystem
  - It has ample freedom to use delay-based loops and the
    like, since the kernel is already dead.
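
To make these constraints a bit more tangible, here is a minimal
userspace C sketch of code written in that style. All names (oops_grid,
oops_puts, oops_wait_ready, the 80x25 grid size) are hypothetical and
only for illustration. It keeps the text in a statically allocated
character grid, so nothing is allocated at panic time, and it polls a
status word in a bounded busy loop instead of waiting for an interrupt.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OOPS_COLS 80
#define OOPS_ROWS 25

/* Static storage only: one extra byte per row keeps rows NUL-terminated. */
static char oops_grid[OOPS_ROWS][OOPS_COLS + 1];
static size_t oops_row, oops_col;

/* Drop the oldest row so the tail of the log stays visible. */
static void oops_scroll(void)
{
	memmove(oops_grid[0], oops_grid[1],
		sizeof(oops_grid[0]) * (OOPS_ROWS - 1));
	memset(oops_grid[OOPS_ROWS - 1], 0, sizeof(oops_grid[0]));
}

static void oops_newline(void)
{
	oops_col = 0;
	if (oops_row + 1 < OOPS_ROWS)
		oops_row++;
	else
		oops_scroll();
}

/* Store one character, wrapping at the right edge of the grid. */
static void oops_putc(char c)
{
	if (c == '\n') {
		oops_newline();
		return;
	}
	if (oops_col == OOPS_COLS)
		oops_newline();
	oops_grid[oops_row][oops_col++] = c;
}

static void oops_puts(const char *s)
{
	while (*s)
		oops_putc(*s++);
}

/*
 * Synchronous wait: poll a status word until a bit in mask is set.
 * Gives up after max_spins reads. Returns 0 on success, -1 on timeout.
 */
static int oops_wait_ready(const volatile uint32_t *status, uint32_t mask,
			   unsigned long max_spins)
{
	while (max_spins--) {
		if (*status & mask)
			return 0;
	}
	return -1;
}
```

A minimal driver would then only need to turn the grid into pixels and
call something like oops_wait_ready() wherever normal driver code would
sleep or wait for an interrupt.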

How feasible is it to have such a special "DRM viewoops"
framework + its minimal drivers in the kernel?

The target is to start from i915, since that's what is in my
laptop now, and work from there..

Some final notes:

  - The NT kernel has a similar concept, but for storage instead.
    They're used to dump core under kernel panic() situations,
    and are called "Miniport storage drivers". [7]

  - Since Windows 7+, a very fancy Blue Screen of Death is
    displayed, with Unicode and whatnot, implying GPU driver
    involvement. [8]

  - Mac OS X also does something similar [9]

  - On Linux laptops, the current situation is _really_ bad.

    In any graphical session, type "echo c > /proc/sysrq-trigger";
    the screen will just completely freeze...

Desired first goal: just print the panic() log

Thanks a lot,

[1] https://lore.kernel.org/lkml/20110125134748.GA10051@laptop
[2] https://lore.kernel.org/lkml/AANLkTinU0KYiCd4p=z+=ojbkeeot2g+cayvdru02k...@mail.gmail.com

[3] https://uefi.org/sites/default/files/resources/Brian_Richardson_Intel_Final.pdf

[4] UEFI v2.7 spec, Chapter 8, "Services — Runtime Services"
[5] UEFI v2.7 spec, Section 12.9, "Graphics Output Protocol"
    "The Graphics Output Protocol supports this capability by
     providing the EFI OS loader access to a hardware frame buffer
     and enough information to allow the OS to draw directly to
     the graphics output device."

[6] linux/drivers/gpu/drm/i915/i915_drv.c::i915_kick_out_firmware_fb()
    linux/drivers/gpu/drm/radeon/radeon_drv.c::radeon_pci_probe()

[7] https://docs.microsoft.com/en-us/windows-hardware/drivers/storage/restrictions-on-miniport-drivers-that-manage-the-boot-drive

[8] https://upload.wikimedia.org/wikipedia/commons/archive/5/56/20181019151937%21Bsodwindows10.png

[9] https://upload.wikimedia.org/wikipedia/commons/4/4a/Mac_OS_X_10.2_Kernel_Panic.jpg

--darwi
http://darwish.chasingpointers.com