Hi Alex, On Mon, 21 Aug 2023 at 14:20, Alexander Graf <[email protected]> wrote: > > > On 21.08.23 21:57, Simon Glass wrote: > > Hi Alex, > > > > On Mon, 21 Aug 2023 at 13:33, Alexander Graf <[email protected]> wrote: > >> > >> On 21.08.23 21:11, Simon Glass wrote: > >>> Hi Alper, > >>> > >>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <[email protected]> > >>> wrote: > >>>> This is a rebase of Alexander Graf's video damage tracking series, with > >>>> some tests and other changes. The original cover letter is as follows: > >>>> > >>>>> This patch set speeds up graphics output on ARM by a factor of 60x. > >>>>> > >>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached, > >>>>> but need it accessible by the display controller which reads directly > >>>>> from a later point of consistency. Hence, we flush the frame buffer to > >>>>> DRAM on every change. The full frame buffer. > >>> It should not, see below. > >>> > >>>>> Unfortunately, with the advent of 4k displays, we are seeing frame > >>>>> buffers > >>>>> that can take a while to flush out. This was reported by Da Xue with > >>>>> grub, > >>>>> which happily print 1000s of spaces on the screen to draw a menu. Every > >>>>> printed space triggers a cache flush. > >>> That is a bug somewhere in EFI. > >> > >> Unfortunately not :). You may call it a bug in grub: It literally prints > >> over space characters for every character in its menu that it wants > >> cleared. On every text screen draw. > >> > >> This wouldn't be a big issue if we only flush the reactangle that gets > >> modified. But without this patch set, we're flushing the full DRAM > >> buffer on every u-boot text console character write, which means for > >> every character (as that's the only API UEFI has). > >> > >> As a nice side effect, we speed up the normal U-Boot text console as > >> well with this patch set, because even "normal" text prints that write > >> for example a single line of text on the screen today flush the full > >> frame buffer to DRAM. > > No, I mean that it is a bug that U-Boot (apparently) flushes the cache > > after every character. It doesn't do that for normal character output > > and I don't think it makes sense to do it for EFI either. > > > I see. Let's trace the calls: > > efi_cout_output_string() > -> fputs() > -> vidconsole_puts() > -> video_sync() > -> flush_dcache_range() > > Unfortunately grub abstracts character backends down to the "print > character" level, so it calls UEFI's sopisticated "output_string" > callback with single characters at a time, which means we do a full > dcache flush for every character that we print: > > https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165 > > > > > >> > >>>>> This patch set implements the easiest mitigation against this problem: > >>>>> Damage tracking. We remember the lowest common denominator region that > >>>>> was > >>>>> touched since the last video_sync() call and only flush that. The most > >>>>> typical writer to the frame buffer is the video console, which always > >>>>> writes rectangles of characters on the screen and syncs afterwards. > >>>>> > >>>>> With this patch set applied, we reduce drawing a large grub menu (with > >>>>> serial console attached for size information) on an RK3399-ROC system > >>>>> at 1440p from 55 seconds to less than 1 second. > >>>>> > >>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing its > >>>>> overhead compared to before as well. So even x86 systems should be > >>>>> faster > >>>>> with this now :). > >>>>> > >>>>> > >>>>> Alternatives considered: > >>>>> > >>>>> 1) Lazy sync - Sandbox does this. It only calls video_sync(true) > >>>>> ever > >>>>> so often. We are missing timers to do this generically. > >>>>> > >>>>> 2) Double buffering - We could try to identify whether anything > >>>>> changed > >>>>> at all and only draw to the FB if it did. That would require > >>>>> maintaining a second buffer that we need to scan. > >>>>> > >>>>> 3) Text buffer - Maintain a buffer of all text printed on the > >>>>> screen with > >>>>> respective location. Don't write if the old and new character are > >>>>> identical. This would limit applicability to text only and is an > >>>>> optimization on top of this patch set. > >>>>> > >>>>> 4) Hash screen lines - Create a hash (sha256?) over every line when > >>>>> it > >>>>> changes. Only flush when it does. I'm not sure if this would > >>>>> waste > >>>>> more time, memory and cache than the current approach. It would > >>>>> make > >>>>> full screen updates much more expensive. > >>> 5) Fix the bug mentioned above? > >>> > >>>> Changes in v5: > >>>> - Add patch "video: test: Split copy frame buffer check into a function" > >>>> - Add patch "video: test: Support checking copy frame buffer contents" > >>>> - Add patch "video: test: Test partial updates of hardware frame buffer" > >>>> - Use xstart, ystart, xend, yend as names for damage region > >>>> - Document damage struct and fields in struct video_priv comment > >>>> - Return void from video_damage() > >>>> - Fix undeclared priv error in video_sync() > >>>> - Drop unused headers from video-uclass.c > >>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED() > >>>> - Call video_damage() also in video_fill_part() > >>>> - Use met->baseline instead of priv->baseline > >>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH > >>>> - Update console_rotate.c video_damage() calls to pass video tests > >>>> - Remove mention about not having minimal damage for console_rotate.c > >>>> - Add patch "video: test: Test video damage tracking via vidconsole" > >>>> - Document new vdev field in struct efi_gop_obj comment > >>>> - Remove video_sync_copy() also from video_fill(), video_fill_part() > >>>> - Fix memmove() calls by removing the extra dev argument > >>>> - Call video_sync() before checking copy_fb in video tests > >>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it > >>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS > >>>> > >>>> v4: https://lore.kernel.org/all/[email protected]/ > >>>> > >>>> Changes in v4: > >>>> - Move damage clear to patch "dm: video: Add damage tracking API" > >>>> - Simplify first damage logic > >>>> - Remove VIDEO_DAMAGE default for ARM > >>>> - Skip damage on EfiBltVideoToBltBuffer > >>>> - Add patch "video: Always compile cache flushing code" > >>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it" > >>>> > >>>> v3: https://lore.kernel.org/all/[email protected]/ > >>>> > >>>> Changes in v3: > >>>> - Adapt to always assume DM is used > >>>> - Adapt to always assume DM is used > >>>> - Make VIDEO_COPY always select VIDEO_DAMAGE > >>>> > >>>> v2: https://lore.kernel.org/all/[email protected]/ > >>>> > >>>> Changes in v2: > >>>> - Remove ifdefs > >>>> - Fix ranges in truetype target > >>>> - Limit rotate to necessary damage > >>>> - Remove ifdefs from gop > >>>> - Fix dcache range; we were flushing too much before > >>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY" > >>>> > >>>> v1: https://lore.kernel.org/all/[email protected]/ > >>>> > >>>> Alexander Graf (9): > >>>> dm: video: Add damage tracking API > >>>> dm: video: Add damage notification on display fills > >>>> vidconsole: Add damage notifications to all vidconsole drivers > >>>> video: Add damage notification on bmp display > >>>> efi_loader: GOP: Add damage notification on BLT > >>>> video: Only dcache flush damaged lines > >>>> video: Use VIDEO_DAMAGE for VIDEO_COPY > >>>> video: Always compile cache flushing code > >>>> video: Enable VIDEO_DAMAGE for drivers that need it > >>>> > >>>> Alper Nebi Yasak (4): > >>>> video: test: Split copy frame buffer check into a function > >>>> video: test: Support checking copy frame buffer contents > >>>> video: test: Test partial updates of hardware frame buffer > >>>> video: test: Test video damage tracking via vidconsole > >>>> > >>>> arch/arm/mach-omap2/omap3/Kconfig | 1 + > >>>> arch/arm/mach-sunxi/Kconfig | 1 + > >>>> drivers/video/Kconfig | 26 +++ > >>>> drivers/video/console_normal.c | 27 ++-- > >>>> drivers/video/console_rotate.c | 94 +++++++---- > >>>> drivers/video/console_truetype.c | 37 +++-- > >>>> drivers/video/exynos/Kconfig | 1 + > >>>> drivers/video/imx/Kconfig | 1 + > >>>> drivers/video/meson/Kconfig | 1 + > >>>> drivers/video/rockchip/Kconfig | 1 + > >>>> drivers/video/stm32/Kconfig | 1 + > >>>> drivers/video/tegra20/Kconfig | 1 + > >>>> drivers/video/tidss/Kconfig | 1 + > >>>> drivers/video/vidconsole-uclass.c | 16 -- > >>>> drivers/video/video-uclass.c | 190 ++++++++++++---------- > >>>> drivers/video/video_bmp.c | 7 +- > >>>> include/video.h | 59 +++---- > >>>> include/video_console.h | 52 ------ > >>>> lib/efi_loader/efi_gop.c | 7 + > >>>> test/dm/video.c | 256 ++++++++++++++++++++++++------ > >>>> 20 files changed, 483 insertions(+), 297 deletions(-) > >>> It is good to see this tidied up into something that can be applied! > >>> > >>> I am unsure what is going on with the EFI performance, though. It > >>> should not flush the cache after every character, only after a new > >>> line. Is there something wrong in here? If so, we should fix that bug > >>> first and it should be patch 1 of this series. > >> > >> Before I came up with this series, I was trying to identify the UEFI bug > >> in question as well, because intuition told me surely this is a bug in > >> UEFI :). Turns out it really isn't this time around. > > I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI > > implementation. Where did you look for the bug? > > > The "real" bug is in grub. But given that it's reasonably simple to work > around in U-Boot and even with it "fixed" in grub we would still see > performance benefits from flushing only parts of the screen, I think > it's worth living with the grub deficiency.
OK thanks for digging into it. I suggest we add a param to vidconsole_puts() to tell it whether to sync or not, then the EFI code can indicate this and try to be a bit smarter about it. Regards, Simon

