[libavcodec/jpeg2000dwt.c:dwt_decode97_float]
On Thu, May 15, 2025 at 10:19:57PM +0200, Michael Niedermayer wrote:
> Hi
>
> On Wed, May 14, 2025 at 06:40:03PM +0200, Michael Niedermayer wrote:
> > Hi Chitra
> >
> > On Wed, May 14, 2025 at 03:55:59AM +, Chitra Dey Sarkar
[libavcodec/jpeg2000dwt.c:dwt_decode97_float]
Hi Chitra
On Wed, May 14, 2025 at 03:55:59AM +, Chitra Dey Sarkar via ffmpeg-devel
wrote:
> Original Implementation:
> -
> In the original implementation, the "VER_SD" section processes image data
>
This is a test email
Original Implementation:
-
In the original implementation, the "VER_SD" section processes image data
stored in *data using strided memory access in a vertical fashion. This leads to
inefficient memory access patterns and cache thrashing due to non-sequential
data a
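
The access pattern described above can be illustrated with a minimal sketch. This is not the FFmpeg code or the submitted patch: the function names, the single "predict" step, and the coefficient k are hypothetical, chosen only to contrast a column-by-column (strided) traversal with a row-by-row one that produces the same result.

```c
#include <stddef.h>

/* Hypothetical vertical predict step: every odd row is updated from the
 * even rows directly above and below it. Illustration only, not the
 * dwt_decode97_float implementation. */

/* Column-by-column traversal: the inner loop jumps by `stride` floats
 * on every iteration, so consecutive accesses land on different cache lines. */
static void predict_by_column(float *data, size_t w, size_t h,
                              size_t stride, float k)
{
    for (size_t x = 0; x < w; x++)
        for (size_t y = 1; y + 1 < h; y += 2)
            data[y * stride + x] += k * (data[(y - 1) * stride + x] +
                                         data[(y + 1) * stride + x]);
}

/* Row-by-row traversal: the loops are interchanged, so the inner loop
 * walks consecutive floats in three rows at once. */
static void predict_by_row(float *data, size_t w, size_t h,
                           size_t stride, float k)
{
    for (size_t y = 1; y + 1 < h; y += 2) {
        const float *above = data + (y - 1) * stride;
        const float *below = data + (y + 1) * stride;
        float *cur = data + y * stride;

        for (size_t x = 0; x < w; x++)
            cur[x] += k * (above[x] + below[x]);
    }
}
```

The interchange is valid in this sketch because each odd row depends only on even rows, which the step never modifies. Whether the same holds for every step of the real transform has to be checked against the actual code.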
From d074ea81c12132e3a92211679adbe2d2cb4d5a69 Mon Sep 17 00:00:00 2001
From: ChitraDeySarkar
Date: Wed, 14 May 2025 11:51:35 -0700
Subject: [PATCH] Boost FPS and performance: Optimize vertical loop for
cache-friendly access [libavcodec/jpeg2000dwt.c:dwt_decode97_float]
From: chdey...@microsoft.co
H
We have been profiling FFmpeg at Microsoft and have identified that
ff_xyz12ToRgb48 has a high sample count (profiled every 1 ms).
It seems that ff_xyz12ToRgb48 incurs a performance penalty from:
1. Unaligned read and write access
2. Access to xyz2rgb_matrix
3. Multiplication
I would be int