Hi Alex, On Mon, 21 Aug 2023 at 13:59, Alexander Graf <[email protected]> wrote: > > > On 21.08.23 21:11, Simon Glass wrote: > > Hi Alper, > > > > On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak <[email protected]> > > wrote: > >> From: Alexander Graf <[email protected]> > >> > >> Now that we have a damage area tells us which parts of the frame buffer > >> actually need updating, let's only dcache flush those on video_sync() > >> calls. With this optimization in place, frame buffer updates - especially > >> on large screen such as 4k displays - speed up significantly. > >> > >> Signed-off-by: Alexander Graf <[email protected]> > >> Reported-by: Da Xue <[email protected]> > >> [Alper: Use damage.xstart/yend, IS_ENABLED()] > >> Co-developed-by: Alper Nebi Yasak <[email protected]> > >> Signed-off-by: Alper Nebi Yasak <[email protected]> > >> --- > >> > >> Changes in v5: > >> - Use xstart, ystart, xend, yend as names for damage region > >> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED() > >> > >> Changes in v2: > >> - Fix dcache range; we were flushing too much before > >> - Remove ifdefs > >> > >> drivers/video/video-uclass.c | 41 +++++++++++++++++++++++++++++++----- > >> 1 file changed, 36 insertions(+), 5 deletions(-) > > This is a little strange, since flushing the whole cache will only > > actually write out data that was actually written (to the display). Is > > there a benefit to this patch, in terms of performance? > > > I'm happy to see you go through the same thought process I went through > when writing these: "This surely can't be the problem, can it?". The > answer is "simple" in hindsight: > > Have a look at the ARMv8 cache flush function. It does the only "safe" > thing you can expect it to do: Clean+Invalidate to POC because we use it > for multiple things, clearing modified code among others: > > ENTRY(__asm_flush_dcache_range) > mrs x3, ctr_el0 > ubfx x3, x3, #16, #4 > mov x2, #4 > lsl x2, x2, x3 /* cache line size */ > > /* x2 <- minimal cache line size in cache system */ > sub x3, x2, #1 > bic x0, x0, x3 > 1: dc civac, x0 /* clean & invalidate data or unified > cache */ > add x0, x0, x2 > cmp x0, x1 > b.lo 1b > dsb sy > ret > ENDPROC(__asm_flush_dcache_range) > > > Looking at the "dc civac" call, we find this documentation page from > ARM: > https://developer.arm.com/documentation/ddi0601/2022-03/AArch64-Instructions/DC-CIVAC--Data-or-unified-Cache-line-Clean-and-Invalidate-by-VA-to-PoC > > This says we're writing any dirtyness of this cache line up to the POC > and then invalidate (remove the cache line) also up to POC. That means > when you look at a typical SBC, this will either be L2 or system level > cache. Every read afterwards needs to go and pull it all the way back to > L1 to modify it (or not) on the next character write and then flush it > again. > > Even worse: Because of the invalidate, we may even evict it from caches > that the display controller uses to read the frame buffer. So depending > on the SoC's cache topology and implementation, we may force the display > controller to refetch the full FB content on its next screen refresh cycle. > > I faintly remember that I tried to experiment with CVAC instead to only > flush without invalidating. I don't fully remember the results anymore > though. I believe CVAC just behaved identical to CIVAC on the A53 > platform I was working on. And then I looked at Cortex-A53 errata like > [1] and just accepted that doing anything but restricting the flushing > range is a waste of time :)
Yuck I didn't know it was invalidating too. That is horrible. Is there no way to fix it? Regards, Simon > > > Alex > > > [1] > https://patchwork.kernel.org/project/xen-devel/patch/[email protected]/ > >

