On 9/29/23 13:17, Marc-André Lureau wrote: > Hi > > On Wed, Sep 27, 2023 at 7:46 PM Laszlo Ersek <ler...@redhat.com> wrote: >> >> On 9/19/23 15:19, Laszlo Ersek wrote: >>> The fw_cfg DMA write callback in ramfb prepares a new display surface in >>> QEMU; this new surface is put to use ("swapped in") upon the next display >>> update. At that time, the old surface (if any) is released. >>> >>> If the guest triggers the fw_cfg DMA write callback at least twice between >>> two adjacent display updates, then the second callback (and further such >>> callbacks) will leak the previously prepared (but not yet swapped in) >>> display surface. >>> >>> The issue can be shown by: >>> >>> (1) starting QEMU with "-trace displaysurface_free", and >>> >>> (2) running the following program in the guest UEFI shell: >>> >>>> #include <Library/ShellCEntryLib.h> // ShellAppMain() >>>> #include <Library/UefiBootServicesTableLib.h> // gBS >>>> #include <Protocol/GraphicsOutput.h> // >>>> EFI_GRAPHICS_OUTPUT_PROTOCOL >>>> >>>> INTN >>>> EFIAPI >>>> ShellAppMain ( >>>> IN UINTN Argc, >>>> IN CHAR16 **Argv >>>> ) >>>> { >>>> EFI_STATUS Status; >>>> VOID *Interface; >>>> EFI_GRAPHICS_OUTPUT_PROTOCOL *Gop; >>>> UINT32 Mode; >>>> >>>> Status = gBS->LocateProtocol ( >>>> &gEfiGraphicsOutputProtocolGuid, >>>> NULL, >>>> &Interface >>>> ); >>>> if (EFI_ERROR (Status)) { >>>> return 1; >>>> } >>>> >>>> Gop = Interface; >>>> >>>> Mode = 1; >>>> for ( ; ;) { >>>> Status = Gop->SetMode (Gop, Mode); >>>> if (EFI_ERROR (Status)) { >>>> break; >>>> } >>>> >>>> Mode = 1 - Mode; >>>> } >>>> >>>> return 1; >>>> } >>> >>> The symptom is then that: >>> >>> - only one trace message appears periodically, >>> >>> - the time between adjacent messages keeps increasing -- implying that >>> some list structure (containing the leaked resources) keeps growing, >>> >>> - the "surface" pointer is ever different. >>> >>>> 18566@1695127471.449586:displaysurface_free surface=0x7f2fcc09a7c0 >>>> 18566@1695127471.529559:displaysurface_free surface=0x7f2fcc9dac10 >>>> 18566@1695127471.659812:displaysurface_free surface=0x7f2fcc441dd0 >>>> 18566@1695127471.839669:displaysurface_free surface=0x7f2fcc0363d0 >>>> 18566@1695127472.069674:displaysurface_free surface=0x7f2fcc413a80 >>>> 18566@1695127472.349580:displaysurface_free surface=0x7f2fcc09cd00 >>>> 18566@1695127472.679783:displaysurface_free surface=0x7f2fcc1395f0 >>>> 18566@1695127473.059848:displaysurface_free surface=0x7f2fcc1cae50 >>>> 18566@1695127473.489724:displaysurface_free surface=0x7f2fcc42fc50 >>>> 18566@1695127473.969791:displaysurface_free surface=0x7f2fcc45dcc0 >>>> 18566@1695127474.499708:displaysurface_free surface=0x7f2fcc70b9d0 >>>> 18566@1695127475.079769:displaysurface_free surface=0x7f2fcc82acc0 >>>> 18566@1695127475.709941:displaysurface_free surface=0x7f2fcc369c00 >>>> 18566@1695127476.389619:displaysurface_free surface=0x7f2fcc32b910 >>>> 18566@1695127477.119772:displaysurface_free surface=0x7f2fcc0d5a20 >>>> 18566@1695127477.899517:displaysurface_free surface=0x7f2fcc086c40 >>>> 18566@1695127478.729962:displaysurface_free surface=0x7f2fccc72020 >>>> 18566@1695127479.609839:displaysurface_free surface=0x7f2fcc185160 >>>> 18566@1695127480.539688:displaysurface_free surface=0x7f2fcc23a7e0 >>>> 18566@1695127481.519759:displaysurface_free surface=0x7f2fcc3ec870 >>>> 18566@1695127482.549930:displaysurface_free surface=0x7f2fcc634960 >>>> 18566@1695127483.629661:displaysurface_free surface=0x7f2fcc26b140 >>>> 18566@1695127484.759987:displaysurface_free surface=0x7f2fcc321700 >>>> 18566@1695127485.940289:displaysurface_free surface=0x7f2fccaad100 >>> >>> We figured this wasn't a CVE-worthy problem, as only small amounts of >>> memory were leaked (the framebuffer itself is mapped from guest RAM, QEMU >>> only allocates administrative structures), plus libvirt restricts QEMU >>> memory footprint anyway, thus the guest can only DoS itself. >>> >>> Plug the leak, by releasing the last prepared (not yet swapped in) display >>> surface, if any, in the fw_cfg DMA write callback. >>> >>> Regarding the "reproducer", with the fix in place, the log is flooded with >>> trace messages (one per fw_cfg write), *and* the trace message alternates >>> between just two "surface" pointer values (i.e., nothing is leaked, the >>> allocator flip-flops between two objects in effect). >>> >>> This issue appears to date back to the introducion of ramfb (995b30179bdc, >>> "hw/display: add ramfb, a simple boot framebuffer living in guest ram", >>> 2018-06-18). >>> >>> Cc: Gerd Hoffmann <kra...@redhat.com> (maintainer:ramfb) >>> Cc: qemu-sta...@nongnu.org >>> Fixes: 995b30179bdc >>> Signed-off-by: Laszlo Ersek <ler...@redhat.com> >>> --- >>> hw/display/ramfb.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c >>> index 79b9754a5820..c2b002d53480 100644 >>> --- a/hw/display/ramfb.c >>> +++ b/hw/display/ramfb.c >>> @@ -97,6 +97,7 @@ static void ramfb_fw_cfg_write(void *dev, off_t offset, >>> size_t len) >>> >>> s->width = width; >>> s->height = height; >>> + qemu_free_displaysurface(s->ds); >>> s->ds = surface; >>> } > > Reviewed-by: Marc-André Lureau <marcandre.lur...@redhat.com> > > Incidentally I found the same issue: > https://patchew.org/QEMU/20230920082634.3349487-1-marcandre.lur...@redhat.com/
Which patch is better? I certainly didn't think of g_clear_pointer(); is that more idiomatic? > > > fwiw, my migration support patch is still unreviewed: > https://patchew.org/QEMU/20230920082651.3349712-1-marcandre.lur...@redhat.com/ > I don't have a copy of that patch (not subscribed, sorry...), but how simply you did it surprises me. I did expect to simulate an fw_cfg write in post_load, but I thought we'd approach the device (for the sake of including it in the migration stream) from the higher level device's vmstate descriptors (dc->vmsd) that set up / depend on ramfb in the first place. I didn't know that raw vmstate_register() was still accepted. If it is, then, for that patch (with Gerd's comment addressed): Acked-by: Laszlo Ersek <ler...@redhat.com> BTW: can you please assign <https://bugzilla.redhat.com/show_bug.cgi?id=1859424> to yourself and link your patch into it? The reason we ended up duplicating work here is that noone took RHBZ 1859424 before. ... Well, the ticket is RHEL-7478 in JIRA now, in fact. :/ Thanks! Laszlo